CDS Incredible Alumni: Recent Publications

NYU Center for Data Science
Nov 10, 2020

Preserving a strong sense of community is of utmost importance to CDS. We continue to cultivate it not only among our current students but among our alumni as well. In recognition of their successes, we feature below some of our alumni who have recently published papers:

Shuaiji Li, CDS MS Graduate

Shuaiji Li holds an MS degree in data science from CDS. He is currently a research engineer at DiDi, a leading mobile transportation and convenience platform. Prior to that, Shuaiji was a data science intern at NBCUniversal Media.

This year Shuaiji published a paper “System and Method for Detecting Generated Domain”, which proposes a computer-implemented method for domain analysis. In 2019, alongside peers, he published a paper called “Domain Generation Algorithms Detection Through Deep Neural Network and Ensemble”, which proposes “several new real-time detection models and frameworks which utilize meta-data generated from domains”(1) and combines “the advantages of a deep neural network model and a lexical features based model using the ensemble technique.”(1)

Manoj Kumar, CDS MS Graduate

Manoj Kumar (who was previously featured in our Incredible Alumni Series) is a graduate of the CDS MS program. During his time at CDS, he attended the Google Brain Residency program for machine learning research. In 2019, Manoj co-authored “VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation”, which describes “an approach for modeling the latent space dynamics”(2) and “demonstrates that flow-based generative models offer a viable and competitive approach to generative modeling of video.”(2) The paper was accepted to ICLR 2020 (the International Conference on Learning Representations), a premier gathering of professionals dedicated to the advancement of representation learning, also known as deep learning.

Felipe Ducau, CDS MS Graduate

Felipe Ducau holds an MS in data science from CDS. He is currently co-founder and CTO of an ag-tech company operating in stealth mode at the intersection of deep learning and microbiology, working to understand soil microbiomes.

In 2019, he co-authored “ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation”, which proposes a multi-objective loss function to improve the performance of a machine learning classifier “trained to predict whether a given file is malware or benignware.”(3) In this work, the team fits “deep neural networks to multiple additional targets derived from metadata in a threat intelligence feed for Portable Executable (PE) malware and benignware, including a multi-source malicious/benign loss, a count loss on multi-source detections, and a semantic malware attribute tag loss. We find that incorporating multiple auxiliary loss terms yields a marked improvement in performance on the main detection task.”(3) A video of the paper’s presentation can be viewed on YouTube.
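For readers curious what a multi-objective loss of this kind might look like, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the dictionary keys, the default weights of 0.1, and the choice of a Poisson likelihood for the count term are all assumptions made for illustration. It simply adds weighted auxiliary terms (per-source labels, detection count, attribute tags) to the main malware/benign loss for a single file.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_bce(probs, labels):
    """Average binary cross-entropy over several independent binary targets."""
    return sum(bce(p, y) for p, y in zip(probs, labels)) / len(probs)


def poisson_nll(rate, count, eps=1e-7):
    """Poisson negative log-likelihood, dropping the constant log(count!) term."""
    rate = max(rate, eps)
    return rate - count * math.log(rate)


def aloha_style_loss(pred, target, weights=None):
    """Main detection loss plus weighted auxiliary losses (hypothetical layout).

    pred / target are dicts with the assumed keys:
      "malware": main malicious/benign probability / label
      "vendors": per-source malicious/benign probabilities / labels
      "count":   predicted detection rate / observed number of detections
      "tags":    per-tag probabilities / labels
    """
    if weights is None:
        weights = {"vendors": 0.1, "count": 0.1, "tags": 0.1}  # illustrative values
    main = bce(pred["malware"], target["malware"])
    aux = {
        "vendors": mean_bce(pred["vendors"], target["vendors"]),
        "count": poisson_nll(pred["count"], target["count"]),
        "tags": mean_bce(pred["tags"], target["tags"]),
    }
    return main + sum(weights[name] * aux[name] for name in aux)


pred = {"malware": 0.5, "vendors": [0.5, 0.5], "count": 1.0, "tags": [0.5]}
target = {"malware": 1, "vendors": [1, 0], "count": 2, "tags": [1]}
total = aloha_style_loss(pred, target)
```

In this sketch, setting every auxiliary weight to zero recovers the plain malware/benign loss, so the weights control how strongly the extra targets shape training.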

Thibault Févry, CDS MS Graduate

Thibault Févry is an MS graduate of CDS. He is currently a researcher at Point72, a global firm led by Steven Cohen that invests in multiple asset classes and strategies worldwide. Prior to that, he was an AI Resident at Google.

Most recently, Thibault co-authored “Entities as Experts: Sparse Memory Access with Entity Supervision”, which focuses on the problem “of capturing declarative knowledge about entities in the learned parameters of a language model.”(4) The team introduces a new model — Entities as Experts (EAE) — “that can access distinct memories of the entities mentioned in a piece of text.”(4) The paper was accepted into EMNLP 2020, a conference that focuses on the study of empirical methods in natural language processing.

Thibault also co-authored “Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking” earlier this year. The paper presents “an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links.”(5) Thibault and his peers present “detailed analyses to understand what design choices are important for entity linking, including choices of negative entity candidates, Transformer architecture, and input perturbations.”(5)

References:

  1. Domain Generation Algorithms Detection Through Deep Neural Network and Ensemble
  2. VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation
  3. ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation
  4. Entities as Experts: Sparse Memory Access with Entity Supervision
  5. Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking
