CDS PhD Student Co-Authors Paper on Deep Multispecies Network-Based Protein Function Prediction

NYU Center for Data Science
Mar 16, 2021

CDS PhD Student Meet Barot

CDS PhD student Meet Barot recently co-authored “NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity” along with Simons Foundation research scientist Vladimir Gligorijević, CDS professor Kyunghyun Cho, and CDS associated professor Richard Bonneau.

In its introduction, the paper explains that sequences have been the central source of information for protein function prediction, primarily because sequence data is abundant and many models can easily incorporate large amounts of it. The authors go on to explain, however, that sequence information does not capture the context of a protein within an organism, and this context can be highly relevant in determining the protein’s function. Protein interaction networks, by contrast, provide a way to understand how proteins function in cellular pathways and “have been a powerful source of information for inferring the functions of unannotated proteins.”(1) Sequence- and structure-based function prediction methods are intrinsically able to predict functions for proteins of multiple organisms. However, the problem of how to use the extensive network information available from multiple species in a single model has yet to be solved.

As a solution, the team introduces NetQuilt, a method that achieves several goals in function prediction. First, it integrates sequences and networks, so that limited knowledge of the homology between proteins can be augmented by knowledge of the network topology, and vice versa. NetQuilt also creates “protein features that are not tied to a single species and that include evolutionary and functional information.”(1) Most importantly, the method “enables network-based function prediction even for species for which knowledge of their protein interaction networks is limited.”(1)

NetQuilt is the first method of its kind: a “multispecies network-based deep learning method for protein function prediction that effectively integrates PPI network information and homology.”(1) Ultimately, the team demonstrated that this approach performs well, even when a species has no network information available.

To learn more about NetQuilt, please visit the NetQuilt GitHub page. To read the paper in its entirety, please visit the paper’s Oxford Academic Bioinformatics page.

References:

  1. Barot, M., Gligorijević, V., Cho, K., and Bonneau, R. “NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity.” Bioinformatics.

By Ashley C. McDonald
