Researchers at Google DeepMind, the tech giant's artificial intelligence arm, on Tuesday introduced a tool that predicts whether genetic mutations are likely to cause harm, a breakthrough that could help research into rare diseases. The findings are "another step in recognising the impact that AI is having in the natural sciences," said Pushmeet Kohli, vice president for research at Google DeepMind.
The program, AlphaMissense, makes predictions about so-called missense
mutations, where a single letter is misspelt in the DNA code. Such mutations
are often harmless but they can disrupt how proteins work and cause diseases
from cystic fibrosis and sickle-cell anaemia to cancer and problems with brain
development.
The researchers used AlphaMissense to assess all 71m
single-letter mutations that could affect human proteins. When they set the
program’s precision to 90%, it predicted that 57% of missense mutations were
probably harmless and 32% were probably harmful. It was uncertain about the
impact of the rest.
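As a rough check on those figures, the short Python sketch below turns the reported percentages into approximate counts. The 71m total and the 57% and 32% shares come from the findings above; the share left uncertain is not stated, so the code derives it as the remainder. The variable and function names are illustrative only.

```python
# Rough counts implied by the reported percentages.
# 71 million is a rounded total, so every result here is approximate.
TOTAL_MUTATIONS = 71_000_000

PERCENT_LIKELY_HARMLESS = 57
PERCENT_LIKELY_HARMFUL = 32
# No figure is given for the uncertain share; it is taken as the remainder.
PERCENT_UNCERTAIN = 100 - PERCENT_LIKELY_HARMLESS - PERCENT_LIKELY_HARMFUL


def share(total: int, percent: int) -> int:
    """Return `percent` of `total`, using integer arithmetic."""
    return total * percent // 100


likely_harmless = share(TOTAL_MUTATIONS, PERCENT_LIKELY_HARMLESS)
likely_harmful = share(TOTAL_MUTATIONS, PERCENT_LIKELY_HARMFUL)
uncertain = share(TOTAL_MUTATIONS, PERCENT_UNCERTAIN)

print(f"uncertain share: {PERCENT_UNCERTAIN}%")
print(f"likely harmless: about {likely_harmless / 1e6:.1f}m")
print(f"likely harmful:  about {likely_harmful / 1e6:.1f}m")
print(f"uncertain:       about {uncertain / 1e6:.1f}m")
```

On these numbers, roughly 11% of the mutations, or close to 8m, remain unclassified.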
Based on the findings, the scientists have released a free
online catalogue of the predictions to help geneticists and clinicians who are
either studying how mutations drive diseases or diagnosing patients who have
rare disorders.
A typical person has about 9,000 missense mutations
throughout their genome. Of more than 4m seen in humans, only 2% have been
classified as either benign or pathogenic. Doctors already have computer
programs to predict which mutations may drive disease but because the
predictions are inaccurate, they can only provide supporting evidence for
making a diagnosis.
Writing in Science, Dr Jun Cheng and others describe how
AlphaMissense performs better than current “variant effect predictor” programs
and should help experts pinpoint more swiftly which mutations are driving
diseases. The program may also flag mutations that have not previously been
linked to specific disorders and guide doctors to better treatments.
The AI is an adaptation of DeepMind’s AlphaFold program,
which predicts the 3D structure of human proteins from their chemical makeup.
AlphaMissense was fed data on DNA from humans and closely
related primates to learn which missense mutations are common, and therefore
probably benign, and which are rare and potentially harmful. At the same time,
the program familiarised itself with the “language” of proteins by studying
millions of protein sequences and learning what a “healthy” protein looks like.
When the trained AI is fed a mutation, it generates a score
to reflect how risky the genetic change appears to be, though it cannot say how
the mutation causes any problems.
“This is very similar to human language,” Cheng said. “If we
substitute a word in an English sentence, a person familiar with English can
immediately see whether the word substitution will change the meaning of the
sentence or not.”
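The scoring step can be pictured as a simple thresholding rule: a score goes in, and one of three labels comes out. The Python sketch below is an illustration under stated assumptions, not DeepMind's implementation. It assumes the score runs from 0 to 1, and the two cut-off values are hypothetical placeholders, not the thresholds the researchers used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Cut-offs separating the three labels (hypothetical values)."""
    harmless_below: float
    harmful_above: float


# Placeholder cut-offs chosen for illustration only; they are NOT the
# values used by the DeepMind researchers.
ILLUSTRATIVE = Thresholds(harmless_below=0.3, harmful_above=0.6)


def classify(score: float, thresholds: Thresholds = ILLUSTRATIVE) -> str:
    """Map a risk score (assumed to lie between 0 and 1) to a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie between 0 and 1")
    if score < thresholds.harmless_below:
        return "likely harmless"
    if score > thresholds.harmful_above:
        return "likely harmful"
    # Scores between the two cut-offs are left unclassified.
    return "uncertain"


for example_score in (0.1, 0.45, 0.9):
    print(example_score, classify(example_score))
```

Moving the two cut-offs further apart makes the confident labels more reliable at the cost of leaving more mutations in the uncertain band, which is the trade-off behind fixing the program's precision at a chosen level.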
Prof Joe Marsh, a computational biologist at Edinburgh
University who was not involved in the work, said AlphaMissense had “great
potential”.
“We have this issue with computational predictors where
everybody says their new method is the best,” he said. “You can’t really trust
people, but [the DeepMind researchers] do seem to have done a pretty good job.”
If clinical experts decide that AlphaMissense is reliable,
its predictions may carry more weight in future disease diagnosis, he said.
Prof Ben Lehner, senior group leader in human genetics at
the Wellcome Sanger Institute, said the AI’s predictions need to be verified by
other scientists but it seemed good at identifying which DNA changes cause
disease and which do not.
“One concern about the DeepMind model is that it is
extremely complicated,” Lehner said. “A model like this may turn out to be more
complicated than the biology it is trying to predict. It’s humbling to realise
that we may never be able to understand how these models actually work. Is this
a problem? It may not be for some applications, but will doctors be comfortable
making decisions about patients that they don’t understand and can’t explain?
“The DeepMind model does a good job of predicting what is
broken,” he added. “Knowing what is broken is a good first step. But you also
need to know how something is broken if you want to fix it. Many of us are very
busy generating the massive data needed to train the next generation of AI
models that will tell us not only which changes in DNA are bad but also exactly
what the problem is and how we might go about fixing things.”