Google DeepMind has released a new AI model called AlphaGenome that reads up to one million letters of DNA at once. It predicts how these sequences control genes and how small changes in the DNA might affect health. The work appeared in the journal Nature in January 2026. Researchers built the model to make sense of the parts of DNA that do not directly make proteins but still guide gene activity. This could help explain why some genetic changes lead to diseases like cancer.

Background

DNA carries the instructions for building and running the human body. The best-known part is the roughly 2% that codes for proteins, the building blocks of cells. The remaining 98% or so does other jobs. It tells genes when to start, stop, or adjust their work in different cells, like brain cells versus skin cells. Scientists call this the regulatory genome or non-coding DNA.

For years, researchers studied these regulatory parts with lab tests. They used methods like MPRA, which stands for massively parallel reporter assay. These tests check how DNA snippets affect gene output. But the tests cover short DNA pieces and take a lot of time and money. They also miss how far-away DNA bits influence genes.

AI has changed that. Earlier sequence models, including DeepMind's own, came with a trade-off: they either handled short sequences or gave low-detail predictions over long ones. AlphaGenome tackles both limits. It takes long DNA strings, up to a million base pairs, and predicts many gene control steps at the level of single letters. The team trained it on huge public datasets from projects like ENCODE and GTEx. These cover hundreds of human and mouse cell types.

The model spots short patterns with convolutional layers, links them across the whole sequence with transformers, and then outputs its predictions. Training one full model took four hours on special chips called TPUs. That is about half the compute budget used for DeepMind's earlier Enformer model, with better results.
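The convolution-then-attention flow described above can be illustrated with a toy sketch in plain Python. This is not AlphaGenome's architecture or code. The hand-written TATA kernel, the scalar attention step, and every function name here are assumptions chosen only to show how a local pattern detector and a sequence-wide mixing step fit together.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def motif_kernel(motif):
    """Hypothetical hand-made filter that fires on an exact motif match."""
    return one_hot(motif)

def conv1d(x, kernel):
    """Slide the kernel over the sequence with zero padding, then apply ReLU.

    Output position i scores the window covering input positions
    i - pad .. i - pad + width - 1, so the output has the input's length.
    """
    width = len(kernel)
    pad = width // 2
    zeros = [0.0] * 4
    padded = [zeros] * pad + x + [zeros] * (width - 1 - pad)
    out = []
    for i in range(len(x)):
        total = 0.0
        for k in range(width):
            for c in range(4):
                total += padded[i + k][c] * kernel[k][c]
        out.append(max(0.0, total))
    return out

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Single-head attention on scalar features: each position attends to all others."""
    out = []
    for query in features:
        weights = softmax([query * key for key in features])
        out.append(sum(w * v for w, v in zip(weights, features)))
    return out

def toy_forward(seq, kernel):
    """Local pattern detection followed by mixing across the whole sequence."""
    local = conv1d(one_hot(seq), kernel)
    mixed = self_attention(local)
    return local, mixed

local, mixed = toy_forward("GGTATAGG", motif_kernel("TATA"))
print(local)
print(mixed)
```

In this sketch the convolution only sees four letters at a time, while the attention step lets every position draw on every other position. That split between local detection and long-range mixing is what the paragraph above describes. A real model learns its filters from data and works with many channels rather than a single hand-set score.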

Key Details

AlphaGenome predicts a wide range of gene regulation steps. It forecasts where genes begin and end in various cell types and tissues. It tells how much RNA a gene makes, a sign of its activity level. The model also predicts splicing, where RNA gets cut and joined. It flags which DNA regions are open and accessible, which bases sit close to one another, and where proteins bind.

How It Handles Changes

The real power shows when testing DNA variants. Many of these are single-letter swaps, often called mutations. AlphaGenome compares its predictions for the original DNA with those for the changed version. It scores the impact on all those predictions in seconds. For example, it might show that a mutation boosts RNA output or blocks protein binding.
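The compare-and-score idea can be sketched in a few lines of Python. The predictor below is a hypothetical stand-in, not the AlphaGenome model or its API: `toy_predict`, its two made-up "tracks", and the helper names are all assumptions for illustration. Only the pattern of predicting twice and subtracting reflects the description above.

```python
def apply_snv(sequence, position, alt_base):
    """Return a copy of `sequence` with the base at 0-based `position` swapped."""
    if alt_base not in "ACGT":
        raise ValueError("alt_base must be one of A, C, G, T")
    if sequence[position] == alt_base:
        raise ValueError("alt_base equals the reference base")
    return sequence[:position] + alt_base + sequence[position + 1:]

def toy_predict(sequence):
    """Hypothetical stand-in for a trained model: one number per output track."""
    tata_hits = sum(
        1 for i in range(len(sequence) - 3) if sequence[i:i + 4] == "TATA"
    )
    gc = (sequence.count("G") + sequence.count("C")) / len(sequence)
    return {"tata_motif_count": float(tata_hits), "gc_fraction": gc}

def score_variant(sequence, position, alt_base, predict=toy_predict):
    """Score a variant as (alternate prediction - reference prediction) per track."""
    ref_pred = predict(sequence)
    alt_pred = predict(apply_snv(sequence, position, alt_base))
    return {track: alt_pred[track] - ref_pred[track] for track in ref_pred}

# Swap the A at position 3 for a C, which destroys the TATA motif.
delta = score_variant("GGTATAGG", 3, "C")
print(delta)
```

Here the swap removes the only TATA match, so that track drops by 1, while the GC fraction rises by 0.125. Passing a different `predict` function would reuse the same scoring logic with any model that maps a sequence to named outputs.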

Tests pitted it against top models built for single tasks. AlphaGenome beat the best external model on 22 of 24 property predictions. For variant effects, it matched or beat the best on 24 of 26. It was also the only model in the comparison that could predict all of the tested properties together. It even handles long-range control, where enhancers far from a gene still steer it.

One case looked at T-cell leukemia. Around the TAL1 gene, the model matched known protein binding sites. Predicted changes lined up with real data.

"By doing well on so many different genomic tasks simultaneously, we believe this demonstrates that the model has learned a powerful general representation of DNA sequences and the complex processes these sequences encode." – Natasha Latysheva, Google DeepMind

The team made AlphaGenome available through an API. Scientists can input DNA and get results fast. The team also plans a full model release so that others can fine-tune it on their own data.

What This Means

This model opens doors in medicine and research. Most disease-linked variants sit in non-coding DNA. AlphaGenome predicts how they might disrupt gene control. That could help explain cancers, rare diseases, or traits like height.

In drug design, it might spot targets in regulatory DNA. Researchers could test how changes fix faulty control. For synthetic biology, predictions guide custom DNA. Say, a sequence that turns a gene on only in nerve cells.

Basic science gains too. The model maps genome functions better. It reveals 'grammar' rules in DNA, patterns that act like code for regulation. Splice sites follow strict rules, and the model captures those well. Its gene expression predictions from DNA sequence alone are impressive, even though environment also plays a role.

Experts see it replacing old tools for some jobs right away. Over time, it speeds hypothesis testing. No need for many models; one call covers all. That saves time and lets teams chase new questions.

The non-coding genome was once called junk. Now, tools like this show its depth. Animals like mice, fish, and humans have similar gene counts. Differences come from regulatory tweaks: when and where genes fire. AlphaGenome deciphers those tweaks across huge DNA spans.

Researchers worldwide will use it to probe disease causes. It could lead to treatments that fix gene control, not just protein issues. The Nature paper sets a benchmark. Future models will build on this foundation.