Researchers Develop New Classification Tool to Find COVID-19 Genetic Signature

An underlying genomic signature for 29 different COVID-19 virus sequences is discovered.

University of Waterloo biology professor Kathleen Hill co-led a team of researchers that has identified an underlying genomic signature for 29 different COVID-19 sequences. They did this using a new graphics-based software classification tool powered by machine learning. When classifying the genome of a virus such as the one that causes COVID-19, the software uses a decision tree method to work through successive choices, and the researchers report 100 percent accurate classification within minutes.

This is a 3D map from the researchers' machine learning classification tool. (a) 3,273 viral sequences from the researchers' first test, representing 11 viral families and realm Riboviria; (b) 2,779 viral sequences from the second test, classifying 12 viral families of realm Riboviria; and (c) 208 Coronaviridae sequences classified into genera. (Image courtesy of PLOS.)

The researchers reported their findings in a paper titled “Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study.” According to the paper, “The proposed method combines supervised machine learning with digital signal processing (MLDSP) for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman’s rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totaling 61.8 million bp, including the 29 COVID-19 virus sequences available on January 27, 2020.”
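To make the idea of pairing digital signal processing with supervised classification more concrete, here is a minimal Python sketch. It is not the researchers' MLDSP code. It assumes a simple integer encoding of nucleotides, uses the magnitude of a discrete Fourier transform as the "signature," and swaps in a plain nearest-neighbor rule in place of the paper's decision tree approach. The toy sequences, the family labels and every function name are hypothetical and chosen only for illustration.

```python
import cmath
import math

# Assumed integer encoding of nucleotides (an illustration, not the paper's choice).
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def magnitude_spectrum(seq, length):
    """Encode a sequence numerically, zero-pad or truncate it to `length`,
    and return the magnitudes of its discrete Fourier transform."""
    signal = [ENCODING.get(base, 0.0) for base in seq.upper()[:length]]
    signal += [0.0] * (length - len(signal))
    spectrum = []
    for k in range(length):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / length)
                    for n, x in enumerate(signal))
        spectrum.append(abs(coeff))
    return spectrum


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    denom = math.sqrt(var_u * var_v)
    return cov / denom if denom else 0.0


def distance(u, v):
    """Map a correlation in [-1, 1] to a distance in [0, 1]."""
    return (1.0 - pearson(u, v)) / 2.0


def classify(query, labelled, length=64):
    """Return the label of the training sequence whose spectrum is
    closest to the query's spectrum (nearest-neighbor rule)."""
    q = magnitude_spectrum(query, length)
    best_label, best_dist = None, float("inf")
    for seq, label in labelled:
        d = distance(q, magnitude_spectrum(seq, length))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical toy "genomes" with two made-up family labels.
TRAINING = [
    ("ACGT" * 16, "family_A"),
    ("AAAATTTT" * 8, "family_B"),
]

# A query that differs from the first training sequence by one base.
QUERY = "ACGT" * 15 + "ACGA"
print(classify(QUERY, TRAINING))
```

The point of the sketch is that no sequence alignment is involved: each genome is reduced to a numeric spectrum, and classification works on distances between spectra. A real pipeline would use full-length genomes, a fast Fourier transform and a trained classifier rather than this toy nearest-neighbor rule.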

The team’s results support the currently accepted hypothesis that the COVID-19 virus (SARS-CoV-2) originated in bats, and they classify it in the subgenus Sarbecovirus within the genus Betacoronavirus. The findings suggest that the machine learning approach is a reliable, scalable and quick option for taxonomic classification of novel viruses. This means the tool could support a faster, better-coordinated worldwide response to novel viruses in the future.

Being able to mobilize medical personnel as quickly as possible is invaluable in the fight against a global pandemic, and rapid classification could also help in developing treatments for outbreaks of novel viruses, including the rapid development of vaccines.

Bottom Line

The research is encouraging for future outbreaks, but what it means for treatment and rapid vaccine development in the current COVID-19 outbreak remains unclear. New studies point to the efficacy of remdesivir in treating patients infected with COVID-19, and two vaccine candidates, one from Oxford and one from Germany, appear to be making rapid progress and are set to begin human trials in the coming months. Let’s hope for the best.