CDS Professor Develops New Course Accessible Online to All: Mathematical Tools for Data Science
CDS Assistant Professor of Data Science and Mathematics, Carlos Fernandez-Granda, has developed the course Mathematical Tools for Data Science. Contributors who participated in the development of the course include CDS visiting assistant professor Brett Bernstein and CDS PhD students Aakash Kaku, Sheng Liu, and Sreyas Mohan. The course provides an introduction to tools from several areas of mathematics such as linear algebra, Fourier analysis, probability theory, and convex optimization, which are useful in data science. Topics include covariance matrices, principal component analysis, linear regression, regularization, sparse regression, frequency representations, the short-time Fourier transform, wavelets, Wiener filtering, and convolutional neural networks. Course materials such as notes, slides, videos, and code are fully available on the CDS website.
We caught up with Carlos to discuss the origins of the course and how he sees it growing in the future.
What was the initial motivation/inspiration for developing Mathematical Tools for Data Science?
I wanted to teach fundamental topics in signal and image processing, such as frequency transformations, the sampling theorem, the spectrogram, or Wiener filtering, which are not covered in existing data-science classes. I also wanted to provide a slightly more mathematical perspective on other topics, such as linear regression, regularization, and sparse regression.
How do you hope the course will benefit those who take it?
On the one hand, it may expose them to some new topics (many students did not know about wavelets or stationarity). On the other hand, it will show them how certain mathematical tools (the singular value decomposition, subgradients, the Fourier series) can help us to analyze and understand data-science methods.
What would you say is unique about the course?
The course provides a self-contained and accessible analysis of phenomena that have important implications for data-science applications and are not often analyzed mathematically. For example, we study early stopping (a technique widely used in deep learning) on a linear model, showing that it performs implicit regularization in a very similar way to ridge regression.
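The early-stopping example can be illustrated with a short numerical sketch. The code below is not taken from the course materials. It is a minimal illustration that assumes a least-squares linear model, gradient descent started from zero with a fixed step size, and synthetic Gaussian data. The function names and the heuristic pairing of the ridge parameter with 1 / (step size × number of iterations) are assumptions made for this example. In the basis of the singular vectors of the data matrix, both methods shrink each least-squares coefficient by a factor between 0 and 1, and the sketch computes those factors side by side.

```python
import numpy as np

def gradient_descent_ls(X, y, step, n_iters):
    """Run n_iters gradient-descent steps on 0.5 * ||X b - y||^2, starting at zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        beta -= step * X.T @ (X @ beta - y)
    return beta

def ridge(X, y, lam):
    """Closed-form ridge-regression estimate with regularization parameter lam."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def shrinkage_factors(X, step, n_iters, lam):
    """Per-singular-direction shrinkage applied by early stopping and by ridge."""
    s = np.linalg.svd(X, compute_uv=False)
    gd = 1.0 - (1.0 - step * s**2) ** n_iters   # early-stopped gradient descent
    rr = s**2 / (s**2 + lam)                    # ridge regression
    return gd, rr

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = X @ rng.standard_normal(10) + 0.5 * rng.standard_normal(50)

step = 0.5 / np.linalg.norm(X, 2) ** 2   # safely below 1 / (largest singular value)^2
n_iters = 20
lam = 1.0 / (step * n_iters)             # heuristic pairing of the two methods

beta_gd = gradient_descent_ls(X, y, step, n_iters)
beta_ridge = ridge(X, y, lam)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
gd, rr = shrinkage_factors(X, step, n_iters, lam)

print("early-stopping factors:", np.round(gd, 3))
print("ridge factors:         ", np.round(rr, 3))
```

Stopping earlier (a smaller `n_iters`) corresponds to a larger `lam`, so both estimates are pulled further toward zero. The two sets of factors are similar but not identical, which matches the sense in which early stopping regularizes "in a very similar way" to ridge regression.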
How do you see the course evolving over time?
Nowadays my research group is mostly focused on designing and analyzing deep learning techniques for signal and image processing. This year I included a brief description of convolutional neural networks at the end of the course, using our work on deep denoising of electron-microscope images to illustrate the potential of these techniques. I will probably incorporate more material on deep learning in the future.
Carlos’ research focuses on the design and analysis of data-science methodology and its application to medicine, climate science, and scientific imaging. He is a member of the MaD group (a joint initiative of CDS and NYU’s Courant Institute of Mathematical Sciences).
To learn more about Mathematical Tools for Data Science, please visit the course’s webpage.
By Ashley C. McDonald