BioNTech, the German biotech company that pioneered the messenger RNA technology behind the Pfizer COVID-19 vaccine, has teamed up with London-based A.I. company InstaDeep to create what the two firms say is an effective “early warning system” for spotting potentially dangerous new coronavirus variants.
In tests, the two companies said that their early warning system was able to pick up 12 of the 13 coronavirus variants that the World Health Organization has so far designated as potentially dangerous, doing so on average two months before the WHO reached that conclusion. For the Omicron variant, the system identified it as potentially dangerous on the same day its genetic sequence was first made available, according to a paper BioNTech and InstaDeep published on the non-peer-reviewed academic repository bioRxiv.org on Wednesday.
“Early flagging of potential high-risk variants could be an effective tool to alert researchers, vaccine developers, health authorities, and policymakers, thereby providing more time to respond to new variants of concern,” said Ugur Sahin, BioNTech’s cofounder and chief executive officer.
The global nature of the pandemic, the easy transmissibility of the virus behind COVID-19 (SARS-CoV-2), and the widespread use of genomic sequencing have deluged scientists with data. And since the virus constantly mutates, new variants are found continuously, even though the vast majority of them pose neither an increased risk nor a challenge to existing vaccines and treatments.
“More than 10,000 novel variant sequences are currently discovered every week, and human experts simply cannot cope with complex data at this scale,” Karim Beguir, the cofounder and chief executive of InstaDeep, said in a statement.
The system BioNTech and InstaDeep developed works in two ways, both based on the genetic sequence of a variant.
The first part of the system predicts the structure of a variant’s spike protein from its genetic sequence. The spike protein is the part of the virus used to infect cells, and it is also the part that antibodies latch on to in order to disable the virus.
In the past year, big leaps have been made in the ability to predict protein structures from sequence data alone. London-based A.I. company DeepMind, which is owned by Google-parent Alphabet, has made one such system, AlphaFold, freely available to researchers. Another system, called RoseTTAFold, created by researchers at the University of Washington, is also available. Colby Ford, a researcher at the University of North Carolina at Charlotte, used both of those systems to predict that the Omicron variant would be a significant variant of concern but would not fully escape vaccine-induced antibodies, weeks before such results were confirmed by traditional lab experiments.
The BioNTech and InstaDeep research did not use these new A.I.-based protein structure prediction tools, relying instead on older molecular simulation methods.
Based on this modeling, the system assigns the variant two scores. One reflects how easily the spike protein latches on to ACE2, the receptor it uses to invade human cells. The other reflects how easily antibodies, such as those generated in people inoculated with BioNTech’s vaccine, can bind with that spike protein, preventing the virus from infecting cells. Both scores were validated using lab experiments.
“With the advanced computational methods we have been developing over the past months, we can analyze sequence information of the spike protein and rank new variants according to their predicted immune escape and ACE2 binding score,” Sahin said.
The second part of the system takes the genetic sequence and treats it as if it were a kind of language. It then uses A.I. techniques that have been developed for natural-language processing to examine how similar the sequence for a particular variant’s spike protein is to other known coronavirus spike proteins. From this, the researchers derive two additional scores.
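To give a rough sense of what treating a sequence as language can mean, here is a minimal sketch in Python. It is not the companies' model, which the article does not describe in detail. It swaps in a much simpler stand-in: the sequence is cut into overlapping three-letter "words," and a candidate is compared with known sequences by cosine similarity of the word counts. The function names and the choice of three-letter words are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Cut a sequence into overlapping k-letter 'words' and count them."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def difference_score(candidate: str, known: list[str], k: int = 3) -> float:
    """Hypothetical novelty score: 1 minus the mean similarity to known sequences.

    Higher values mean the candidate looks less like anything seen before.
    """
    if not known:
        raise ValueError("need at least one known sequence to compare against")
    cand = kmer_counts(candidate, k)
    sims = [cosine_similarity(cand, kmer_counts(seq, k)) for seq in known]
    return 1.0 - sum(sims) / len(sims)
```

A candidate identical to every known sequence scores close to zero, while one that shares no three-letter words with any of them scores 1. A real language model would learn far richer patterns than word counts, but the idea of measuring distance from what has already been seen is the same.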
Ranking the variants
These four metrics—two from the structural models and two from the language models—are then combined into two aggregate scores.
One, called the “immune escape score,” combines the antibody binding score from the structural analysis with the spike protein difference score from the language model to gauge how likely the variant is to evade a natural or vaccine-induced immune response.
The other, called a “fitness prior score,” takes the ACE2 binding score from the structural analysis and the variant’s likelihood-of-existence metric from the language model and uses them to provide a sense of how likely that variant is to be able to outcompete other known variants. The fitness score also includes a metric on how quickly that variant seems to be spreading, based on how well represented it is among all variants sequenced over the past eight weeks.
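The spread metric can be pictured as a simple share calculation. The Python sketch below is one possible reading of it, not the researchers' formula: it counts what fraction of all sequences deposited in the past eight weeks belong to a given variant. The record layout (pairs of variant name and deposit date) and the function name are hypothetical.

```python
from datetime import date, timedelta


def recent_share(
    records: list[tuple[str, date]],
    variant: str,
    as_of: date,
    weeks: int = 8,
) -> float:
    """Fraction of sequences deposited in the last `weeks` weeks that are `variant`.

    `records` holds (variant_name, deposit_date) pairs; this layout is an
    assumption made for the example.
    """
    window_start = as_of - timedelta(weeks=weeks)
    in_window = [
        name for name, deposited in records
        if window_start < deposited <= as_of
    ]
    if not in_window:
        return 0.0
    return in_window.count(variant) / len(in_window)
```

Recomputing this share each week shows whether a variant is gaining ground: a share that keeps rising suggests it is outcompeting the others.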
Based on both the immune escape score and fitness score, a statistical method is then used to give the variant an overall rank. Those that ranked higher than most other known variants spreading at the time were considered variants of concern.
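The article does not name the statistical method. One plausible way to turn two scores into a single rank, assumed here purely for illustration, is to ask how many other circulating variants a given variant beats on both scores at once. The Python sketch below does that; the class, the function names, and the dominance rule are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    immune_escape: float  # higher = more likely to evade immunity
    fitness: float        # higher = more likely to outcompete others


def dominates(a: Variant, b: Variant) -> bool:
    """True if `a` is at least as high as `b` on both scores and higher on one."""
    at_least_as_high = a.immune_escape >= b.immune_escape and a.fitness >= b.fitness
    strictly_higher = a.immune_escape > b.immune_escape or a.fitness > b.fitness
    return at_least_as_high and strictly_higher


def dominance_score(target: Variant, others: list[Variant]) -> float:
    """Fraction of the other variants that `target` dominates."""
    if not others:
        return 0.0
    return sum(dominates(target, other) for other in others) / len(others)


def rank_variants(variants: list[Variant]) -> list[tuple[float, str]]:
    """Return (score, name) pairs sorted from most to least concerning."""
    scored = [
        (dominance_score(v, [o for o in variants if o is not v]), v.name)
        for v in variants
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under this rule a variant only climbs the ranking if it scores well on both immune escape and fitness, which matches the article's description of a rank built from both scores. Where to draw the line for flagging a variant is a separate choice that this sketch leaves open.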
The researchers tested the system by looking at the database where scientists deposit new SARS-CoV-2 genetic sequences and analyzing them every week from Sept. 16, 2020, to Nov. 23, 2021. During that period it spotted every variant of concern, except for one, on average 58 days before the WHO flagged it.
For variants Alpha to Mu, the researchers said, their early warning system was able to tell the variant was concerning when there were on average just 25 cases recorded in the database, compared to more than 1,500 cases by the time of WHO designation. For Omicron, the researchers said, the system flagged the variant immediately, finding it had the highest “immune escape” score of any of the 70,000 variants that had been analyzed during the study period.
Notably, the Delta variant was the one WHO “variant of concern” that the early warning system failed to accurately predict. The BioNTech and InstaDeep researchers attribute this to two possible factors. One is that one-quarter of their method is focused on changes to the spike protein that might enable it to evade antibodies. But it is known that antibodies continue to bind well with the Delta variant’s spike protein. Delta’s high transmissibility and ability to cause more severe disease seem to stem from mutations in other parts of the virus, the researchers said.
Also, the Delta variant first emerged in India, which conducts relatively little genomic sequencing compared to the vast number of infections in the country. This might mean the limited genomic sequence data available was “insufficient to fully describe the epidemiological landscape in time,” the researchers said. They also said Indian government restrictions on the export of biological data might have kept key sequence information from reaching the global databases that the researchers used to back-test their early warning system.