Coronavirus Drug Discovery Competition Results

The results of the world's first and only open-source Coronavirus Drug Discovery Competition are now in! There were many awe-inspiring submissions, but the top 3 not only identified potential Coronavirus treatments using machine learning, they also went above and beyond by creating high-quality, evidence-based reports of their results. Let's take a look at their submissions!

1st Place: Matt O'Connor

  • Current Position: Founder of Reboot.AI, Data Science Coach   
  • Background: Former Lead Data Scientist for Bridgewater Associates (algorithmic trading)          
  • Location: Hong Kong & New York City
  • Results: Identified Remdesivir as a potential COVID-19 treatment, along with a few novel molecules that still need to be tested for synthetic feasibility.
  • Award: $1000 cash + $500 cloud credits 
  • Code: https://github.com/mattroconnor/deep_learning_coronavirus_cure                                    

AI System Description:

Matt used a generative recurrent neural network trained on the MOSES and ChEMBL datasets, which together contain about 4 million molecules, to learn a representation (a compressed form) of all of them. Training took 4 days on a single GTX 1060 GPU. He then used that representation to generate 10,000 molecules. Since computing docking scores for all 10,000 molecules would take 100 hours of compute time, he used a genetic algorithm to find a subset of the most diverse molecules in the larger set. Through a series of steps that randomly mutated molecules and assessed their fitness by measuring molecular similarity, he ended up with 1,500 of the most diverse molecules. He then ranked all of them by docking score and found that several of the highest-scoring ones were existing HIV inhibitors, which have already been shown to be effective against the Coronavirus. Remdesivir, however, gave an even higher docking score. A few new molecules scored even higher than Remdesivir, but they still need to be assessed for synthetic feasibility.
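
To make the diversity-selection step more concrete, here is a minimal Python sketch of a genetic algorithm that picks a diverse subset from a larger pool. It is not Matt's code. It assumes each molecule is already reduced to a fingerprint (a set of "on" bits), uses Tanimoto similarity as the similarity measure, and "mutates" a candidate subset by swapping one molecule for another, which may differ from how the submission mutated molecules. The function names and the toy fingerprints are made up for illustration.

```python
import random
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mean_similarity(subset, fingerprints):
    """Average pairwise similarity inside a subset (lower means more diverse)."""
    pairs = list(combinations(sorted(subset), 2))
    total = sum(tanimoto(fingerprints[i], fingerprints[j]) for i, j in pairs)
    return total / len(pairs)


def mutate(subset, n_total, rng):
    """Swap one random member of the subset for a random molecule outside it."""
    removed = rng.choice(sorted(subset))
    added = rng.choice([i for i in range(n_total) if i not in subset])
    return frozenset((subset - {removed}) | {added})


def select_diverse(fingerprints, k, generations=200, population_size=20, seed=0):
    """Evolve index subsets of size k toward low mean pairwise similarity."""
    rng = random.Random(seed)
    n = len(fingerprints)
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of molecules")
    population = [frozenset(rng.sample(range(n), k)) for _ in range(population_size)]
    for _ in range(generations):
        children = [mutate(s, n, rng) for s in population]
        # Parents compete with children, so the best subset never gets worse.
        pool = set(population) | set(children)
        population = sorted(
            pool, key=lambda s: (mean_similarity(s, fingerprints), sorted(s))
        )[:population_size]
    return sorted(population[0])


if __name__ == "__main__":
    # Toy data (made up): three clusters of near-identical fingerprints.
    cores = [{1, 2, 3}, {10, 11, 12}, {20, 21, 22}]
    toy = [cores[i // 3] | {100 + i} for i in range(9)]
    chosen = select_diverse(toy, k=3)
    print(chosen, mean_similarity(chosen, toy))
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit, and only the selected subset would be passed on to the expensive docking step.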

Matt's Submission Presentations (Part 1 & 2):

2nd Place: Thomas MacDougall 

  • Current Position: Graduate Student in Computer Science at the University of Montreal
  • Background: Received a Bachelor's in Chemistry and a Bachelor's in Computer Science from the University of New Brunswick, Fredericton campus
  • Results: Identified a novel compound as a synthetically feasible potential treatment for COVID-19
  • Award: $1000 in cloud credits
  • Code: https://github.com/tmacdou4/2019-nCov                                                                                                          

AI System Description:

Tom built an impressive neural architecture for this problem. He used a constrained graph variational autoencoder to generate molecules and an edge memory neural network to classify them. He then computed how well they dock, or fit, with the Coronavirus protease using a program called AutoDock Vina. The idea is that the higher the docking score, the more likely a drug is to be effective against a given virus. Ultimately, his algorithm found a novel drug that could serve as a potential Coronavirus treatment.
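
As a rough illustration of the ranking step, the Python sketch below reads predicted binding affinities out of AutoDock Vina output text and sorts candidates by their best pose. It is not taken from Tom's repository. It relies on Vina writing one "REMARK VINA RESULT:" line per docked pose, with the affinity in kcal/mol as the first number, where a more negative value means stronger predicted binding (so a "higher docking score" corresponds to a more negative affinity). The function names, ligand names, and numbers are made up for illustration.

```python
def parse_vina_affinities(pdbqt_text):
    """Return the predicted affinity (kcal/mol) of each docked pose in the text."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # The first number after the colon is the binding affinity.
            affinities.append(float(line.split(":", 1)[1].split()[0]))
    return affinities


def rank_by_best_affinity(docked):
    """Sort ligands by their best (most negative) affinity; skip ligands with no poses."""
    best = {}
    for name, text in docked.items():
        affinities = parse_vina_affinities(text)
        if affinities:
            best[name] = min(affinities)
    return sorted(best.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical output snippets with made-up values, not real docking results.
    docked = {
        "candidate_A": "REMARK VINA RESULT:      -7.5      0.000      0.000\n"
                       "REMARK VINA RESULT:      -6.9      1.812      2.405\n",
        "candidate_B": "REMARK VINA RESULT:      -8.1      0.000      0.000\n",
    }
    for name, affinity in rank_by_best_affinity(docked):
        print(f"{name}: {affinity} kcal/mol")
```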

Tom's Submission Presentations (Part 1 & 2):

3rd Place: Tinka Vidovic

  • Current Position: PhD student at the Mediterranean Institute of Life Sciences
  • Background: Received her MD from the University of Zagreb School of Medicine
  • Location: Croatia
  • Results: Identified Valproic Acid as a potential treatment for COVID-19
  • Award: $1000 in cloud credits
  • Code: https://github.com/tinkavidovic/competition

AI System Description:

Tinka used the Connectivity Map, a genome-scale library of cellular signatures that catalogs transcriptional responses to chemical, genetic, and disease perturbation, in her analysis. As a starting point, she also used a dataset called 'Harmonizome', which contains 72 million functional associations between genes and their attributes. Specifically, she used a subset of it titled 'Gene expression omnibus signatures of differentially expressed genes for viral infections'. It contains data from the lungs of mice and from lung cell lines infected with 2 earlier coronaviruses similar to the novel coronavirus (SARS-1 and MERS). For this she used an R package called PharmacoGx, which helps analyze large biological datasets. Using it, she collected the relevant data and preprocessed it to include connectivity scores and p-values. A negative connectivity score means that a drug could reverse the disease's transcriptional signature toward normal levels. After ranking the drugs by connectivity score and p-value, she identified valproic acid, an FDA-approved drug for migraine headaches, epilepsy, and bipolar disorder that is currently being tested as an anti-HIV therapy, as a potential treatment for the coronavirus.
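
The idea behind a connectivity score can be sketched in a few lines of Python. This is a simplified stand-in, not PharmacoGx's method and not Tinka's R code: it assumes each signature is a mapping from gene to log fold change, scores a drug by the Pearson correlation between its signature and the disease signature, and estimates a p-value by shuffling the drug's values. The function names, gene names, and numbers are made up for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def connectivity(disease, drug, n_perm=1000, seed=0):
    """Return (score, p_value); a negative score suggests the drug reverses the disease signature."""
    genes = sorted(set(disease) & set(drug))  # compare shared genes only
    disease_values = [disease[g] for g in genes]
    drug_values = [drug[g] for g in genes]
    score = pearson(disease_values, drug_values)
    # Permutation test: how often does a shuffled signature score as extremely?
    rng = random.Random(seed)
    shuffled = list(drug_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(disease_values, shuffled)) >= abs(score):
            hits += 1
    return score, (hits + 1) / (n_perm + 1)


def rank_drugs(disease, drug_signatures, n_perm=1000, seed=0):
    """Rank drugs by connectivity score (most negative first), then by p-value."""
    results = []
    for name, signature in drug_signatures.items():
        score, p_value = connectivity(disease, signature, n_perm, seed)
        results.append((name, score, p_value))
    return sorted(results, key=lambda r: (r[1], r[2]))


if __name__ == "__main__":
    # Toy signatures (made up): log fold changes for six hypothetical genes.
    disease = {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.5, "G5": 0.5, "G6": -0.5}
    drugs = {
        "drug_reverser": {g: -v for g, v in disease.items()},
        "drug_mimic": dict(disease),
    }
    for name, score, p_value in rank_drugs(disease, drugs):
        print(name, round(score, 3), round(p_value, 3))
```

A drug at the top of this ranking (strongly negative score, small p-value) is one whose expression changes run opposite to those caused by the infection.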


Next Steps

In the short term, we'll coalesce our efforts around Matt's highest-scoring existing compound, Remdesivir, as a promising treatment for the Coronavirus. Because it has already been studied for a few years, it should be faster to gain approval from various governments. As such, we'll be donating samples of it for further analysis to the Wuhan Institute of Virology, since that region currently has the most deaths. We are also encouraging the Sage Health community to raise awareness of Remdesivir by sharing learning resources on social media with the hashtag #COVID2019_Potential_treatment.


We'll also be sending a report on the novel molecules that our winners discovered to several other institutions for further analysis. All of the code, data, and AI techniques that have been used are open-source and freely available. A huge thank you to all the participants in this competition! We plan on hosting more, so stay tuned for updates! Using the power of community-driven, open-source Artificial Intelligence, we can and will accomplish our goal of helping treat, cure, and prevent all human disease by the end of the 21st Century.