NYU Center for Data Science Incredible Alumni: Alexandre Sablayrolles
Data science is a rapidly growing area of study. Yet many people are still left asking, “What actually is a data scientist?” If there were a simple answer, data science wouldn’t be all the rage it is now. What makes data science so unique, among other things, is the application of the knowledge and skills it takes to be a data scientist to a vast, interdisciplinary array of studies and professions.
Data is everywhere, and data scientists are following. CDS seeks to give students the tools and networks necessary to set them up for success in whatever industry they enter. But don’t take our word for it… Allow us to introduce the first of NYU Center for Data Science’s Incredible Alumni: Alexandre Sablayrolles!
Alexandre attended the Master’s in Data Science program at CDS from September 2015 to December 2016, and is now a Ph.D. Resident in AI Research at Facebook. He spoke with us about his experience as a member of one of the earlier cohorts at CDS, “radioactive data”, and real-world impact.
This interview has been lightly edited for clarity.
How did you first become interested in the field of data science?
My very first encounter with data science was during a computer vision class project at Ecole Polytechnique. Back then I was working on super-resolution: given a low-resolution image, we wanted to create a higher-resolution version of the same image, with 4x as many pixels. Our method was a mix of rule-based methods combining pixel values, with a few additional improvements we came up with. As I was reading literature on the subject, I came across a paper that was using convolutional neural networks for this task. Their approach sounded very elegant: just “train” a model on pairs of low- and high-resolution images, and given a new low-resolution image, the model will output its corresponding high-resolution version. This got me very excited about neural networks and data science.
What brought you to CDS?
I applied to CDS in 2014, which was one year after the first cohort entered the program. Even though the program was very young, the curriculum was appealing and the fact that Yann LeCun was the founder of CDS was a big factor for me. Also, in retrospect, living in New York is a fantastic experience.
What was your experience with CDS like?
I had an amazing experience at CDS. It’s a great environment: we have a strong faculty and the students are nice and smart. What I particularly liked about CDS is having a dedicated space where students go between classes to work. There are regular talks and research seminars in this space where renowned researchers present their latest results. It also encourages collaboration for understanding coursework, working on projects, and in general getting to know other students. I think this is one of the essential ingredients that creates the spirit of the cohort.
What has been the most challenging point in your career as a data scientist?
I think one of the most challenging parts of research is resilience: you have to keep working on a topic even though your technique is not working yet. For my first paper, I spent quite a lot of time testing various models for a privacy task called “membership inference”, but none of these models were able to get better performance than a very simple model. A couple of weeks later, we realized that there was a theoretical reason for it, which was one of the main contributions of this project.
What has been the most rewarding point in your career as a data scientist?
I think one of the most rewarding moments is when you get people excited about what you are doing. When I presented my first paper at a conference, I met other researchers who had encountered the same data science problem as me and were interested in my solution. Knowing that your research can help others is very satisfying.
What upcoming projects are you most excited for?
I am currently doing research in privacy for machine learning, and we have some exciting projects in the pipeline. Our current research direction is called “radioactive data”: it is a technique that allows users to add a marker to their data, so that they can identify any machine learning model trained on their data. This technique is very similar to the way radioactive tracers work in the medical domain, hence its name.
What role do you see your work playing in the future?
I think that privacy is a field of machine learning of growing importance, and it has seen huge progress in recent years: On the regulation side, new rules such as the European Union’s GDPR and the right to be forgotten set out the terms for the use of data. On the technology side, the industry has agreed on a standard, differential privacy, that matches people’s expectations of private usage of their data. I think that in the future, privacy technology will be integrated even more in most data science frameworks, and it will increase people’s confidence in machine learning systems.
Which classes that you took at CDS proved most helpful or valuable in your studies and/or career?
I think in terms of material, the deep learning class was the most useful class for my career, as it gave us the tools to understand and implement state-of-the-art algorithms in computer vision and natural language processing. During my studies, there were two other classes I particularly enjoyed. The first was “Statistical Natural Language Processing” by Prof. Slav Petrov: The class covered a lot of the traditional statistical approaches to NLP, and there was a bi-weekly competition to get the best performing implementation that got us very excited. The other one was “Inference and Representation” by Prof. David Sontag. This class taught the theory of graphical models and their applications, which I found intellectually satisfying and beautiful: It goes beyond the correlations used in machine learning and provides a simple way to express complex causal phenomena.
Do you have any advice for someone beginning a career in data science?
I would say to keep an open mind. Data science is a relatively new field, but it lies at the intersection of many domains which have existed for decades: Databases, statistics, optimization, information theory… There are often interesting applications that emerge as we apply ideas from a different domain to our data science problems.
By Mary Oliver