Google DeepMind has released an AI model named AlphaGenome that reads DNA sequences to predict how genes are regulated. The model analyzes up to 1 million DNA letters at once and predicts thousands of molecular properties tied to gene control. Researchers published their findings in Nature in January 2026. The work aims to make sense of non-coding DNA, which makes up most of the human genome and includes the regions that guide gene activity.
Background
DNA holds the instructions for building and running the body. Only a small part, about 2 percent, directly codes for proteins. The rest, often called non-coding DNA, plays a big role in controlling when and where genes turn on. Scientists have long wanted to understand the control regions within it, which include promoters and enhancers. They act like switches that set gene activity in different cells, such as nerve cells or muscle cells.
Past efforts used lab tests called MPRAs, short for massively parallel reporter assays, to check pieces of DNA. These tests show how changes in DNA affect gene output, but they take time and cover only short stretches of DNA. Computer models helped, but older ones could not handle long DNA sequences or many tasks at once. AlphaGenome changes that by using deep learning, a type of AI trained on huge data sets from projects like ENCODE and GTEx. These projects measured gene activity across hundreds of human and mouse cell types.
The model learns patterns in DNA, much like grammar in a language. It spots where genes start and end, how RNA gets cut and joined, how much RNA a gene makes, and where proteins bind to DNA. Training took just four hours on less computing power than some earlier models.
Key Details
AlphaGenome takes a DNA sequence up to 1 million base pairs long, about 0.03 percent of the full human genome, and makes predictions at the resolution of single letters. To assess a variant, such as a single-letter swap, it compares predictions for the normal sequence with those for the altered one and scores the difference in seconds.
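The comparison described above, predicting on the normal and the altered sequence and scoring the difference, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in that reports GC fraction per window, not AlphaGenome or its API, and the names `apply_snv`, `score_variant`, and `WINDOW` are invented for this example.

```python
from typing import Callable, List

WINDOW = 4  # hypothetical window size used only by the toy predictor


def toy_predict(seq: str) -> List[float]:
    """Placeholder for a real model: GC fraction in each fixed-size window."""
    track = []
    for start in range(0, len(seq), WINDOW):
        chunk = seq[start:start + WINDOW]
        gc_count = sum(base in "GC" for base in chunk)
        track.append(gc_count / len(chunk))
    return track


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based pos replaced by alt."""
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be one of A, C, G, T")
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def score_variant(
    seq: str,
    pos: int,
    alt: str,
    predict: Callable[[str], List[float]] = toy_predict,
) -> float:
    """Score a variant as the largest absolute change between the two tracks."""
    ref_track = predict(seq)
    alt_track = predict(apply_snv(seq, pos, alt))
    return max(abs(a - r) for r, a in zip(ref_track, alt_track))


# An A-to-G swap changes GC content in the first window; A-to-T does not.
print(score_variant("AAAACCCC", 0, "G"))
print(score_variant("AAAACCCC", 0, "T"))
```

Taking the maximum absolute difference is one simple choice of score. A real pipeline would swap in an actual model for `predict` and might aggregate the difference in other ways.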
How It Outperforms Others
Tests showed AlphaGenome beat the best existing models on 22 of 24 tasks that involve predictions for single sequences. For variant effects, it matched or topped others on 24 of 26 tasks. It handles many jobs at once, such as predicting splice sites, gene expression levels, and DNA openness. Splice sites are the spots where RNA gets cut and rejoined, and they follow strict sequence rules. The model excels there and could replace older tools right away.
It draws on 5,930 human and 1,128 mouse genome tracks, each a set of measurements from one type of experiment in one cell type or tissue. These cover RNA production, protein binding, and chromosome folding. Unlike past models that traded sequence length for detail, AlphaGenome combines long input with single-letter resolution without extra compute.
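One well-known example of a strict rule around splice sites is the canonical GT-AG pattern: in the DNA, most introns begin with GT and end with AG. The article does not say which rules it has in mind, so treat the choice of this rule as an assumption. The sketch below, with the hypothetical helper `candidate_splice_sites`, simply lists every position where those two-letter motifs occur.

```python
from typing import List, Tuple


def candidate_splice_sites(seq: str) -> Tuple[List[int], List[int]]:
    """Return 0-based start positions of GT (donor) and AG (acceptor) motifs."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


donors, acceptors = candidate_splice_sites("CAGGTAAGTTTCAGG")
print("candidate donors:", donors)
print("candidate acceptors:", acceptors)
```

A scan like this flags far more positions than are actually used, since GT and AG appear all over the genome. Telling real splice sites from the many false candidates depends on the surrounding sequence, which is where a learned model is useful.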
“By doing well on so many different genomic tasks simultaneously, we believe this demonstrates that the model has learned a powerful general representation of DNA sequences and the complex processes these sequences encode.” – Natasha Latysheva, Google DeepMind
Experts note its strength in areas governed by rigid rules, such as splice prediction. It also predicts gene expression from DNA alone, a hard task because a cell's environment affects expression too. Still, the accuracy it achieves from local DNA rules impresses researchers.
What This Means
AlphaGenome opens doors to better disease understanding. Many health issues come from changes in non-coding DNA, the so-called dark genome. Such variants can alter gene control far from the gene itself, sometimes hundreds of thousands of letters away. The model can capture many of these effects, helping link mutations to conditions like cancer or rare diseases.
In research, it speeds up mapping genome functions. Scientists can test huge DNA regions fast, finding key control elements for cell types. For synthetic biology, predictions guide custom DNA design, like genes active only in certain cells.
Drug development could benefit. By pinpointing the impact of variants, researchers could target regulatory flaws in diseases. The model improves predictions for quantitative trait loci (QTLs), variants linked to measurable differences in a trait, by 25.5 percent for gene expression and 8 percent for chromatin accessibility.
Limits exist. The model struggles with regulatory elements more than 100,000 bases away from the gene they control and with tissue-specific effects. Training focused on protein-coding genes, missing some non-coding RNAs. Cell-specific regulation needs more work. Still, it sets a base for future models.
The model is public via API, letting any researcher use it. This could speed discoveries in genome function and biology. Teams expect it to help find new treatments by decoding DNA's control code.
