Neural Covidex: Searching For COVID-19

Image designed by Ashley C. McDonald (original stock image by 洋榤 郭)

Our Associate Professor Kyunghyun Cho, in collaboration with Jimmy Lin from the University of Waterloo and a small team of students, has developed an innovative search engine that provides current information to researchers, medical professionals, and others on the frontlines working to combat the COVID-19 pandemic.

Neural Covidex uses the Allen Institute for AI’s COVID-19 Open Research Dataset (CORD-19), a freely accessible collection of over 57,000 scholarly articles made available to the global research community. Covidex also supports search over more than 100 randomized controlled trials via Trialstreamer, an evolving annotated database that currently holds 571,818 randomized controlled trials. Additional sources that Covidex draws from include preprints from bioRxiv and medRxiv as well as PubMed, which houses a list of relevant articles from the World Health Organization.

In addition to its general search functionality, Covidex offers a baseline keyword search interface powered by open source systems such as Blacklight (a multi-institutional collaboration for building a better discovery platform framework), Solr (an Apache-based enterprise search platform), and Anserini (an IR toolkit built on Lucene). Because the entire software stack is available on GitHub, Covidex’s searcher, neural reranker, and passage highlighter can be reused by others in the research community. The Covidex team actively encourages other researchers to build on these components.
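For readers curious how a searcher, a reranker, and a passage highlighter might fit together, here is a minimal sketch in Python. It is not Covidex's code: every function name, the toy documents, and the query are hypothetical, and `overlap_score` is a simple term-overlap stand-in for what would be a neural relevance model in a real system.

```python
from collections import Counter

PUNCT = ".,;:!?()"

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]

def keyword_search(query, docs, k=3):
    """Stage 1 (searcher): rank documents by total query-term occurrences."""
    q_terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in q_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in scored[:k]]

def overlap_score(query, text):
    """Stand-in scorer (assumption): fraction of distinct query terms present."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(text))) / len(q_terms)

def rerank(query, candidates, docs, score_fn):
    """Stage 2 (reranker): reorder the candidates with a costlier scorer."""
    return sorted(candidates, key=lambda d: -score_fn(query, docs[d]))

def highlight(query, text):
    """Stage 3 (highlighter): pick the sentence that best matches the query."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return max(sentences, key=lambda s: overlap_score(query, s), default="")

# Toy corpus and query, invented purely for illustration.
docs = {
    "d1": "Virus virus virus virus spreads in crowded rooms.",
    "d2": "The incubation period of the virus is about five days. Symptoms vary.",
    "d3": "Hospital staffing schedules for the spring.",
}
query = "virus incubation period"

candidates = keyword_search(query, docs)
ranked = rerank(query, candidates, docs, overlap_score)
best_passage = highlight(query, docs[ranked[0]])
```

In this toy run, raw term counts put the repetitive document `d1` first, and the second stage moves `d2` ahead of it because `d2` covers all three query terms. The split reflects a common design choice: a cheap first stage narrows the collection so that a slower, more accurate scorer only has to look at a handful of candidates.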

To learn more about how Neural Covidex works, see the team’s system description and initial question answering experiments pages on the official website.

By Ashley C. McDonald

Official account of the Center for Data Science at NYU, home of the Undergraduate, Master’s, and Ph.D. programs in Data Science.