Findings

Solving more of the genome puzzle

A new tool developed by a Yale researcher speeds up the process.

Alex Eben Meyer

Modern genome sequencing technology has existed for a quarter century, yet certain parts of the human genome continue to evade scientists’ attempts to sequence them. These regions—some of which harbor genes involved in devastating genetic diseases—are like especially tricky sections of a puzzle, says Haoyu Cheng, assistant professor of biomedical informatics and data science at Yale School of Medicine. 

“The puzzle is messy,” he says. “We need computational tools to figure out the real picture.” 

Cheng’s research has focused on building such tools, and earlier this year, he released new software that can read these tricky genomic sequences more easily and affordably. With this software, he hopes to make genome sequencing a more powerful clinical asset. 

Typical sequencing tools read a genome in short spurts. Each segment is like a puzzle piece, and software helps scientists fit these pieces together to create the full genome sequence—a complex picture of human life, disease, and history.

But what if the puzzle has a large section of identical pieces, such as dozens of blue pieces forming the sky? The human genome has an analogous feature, called repetitive regions: stretches of DNA in which the same sequence is repeated anywhere from dozens to hundreds of times. Figuring out how these pieces fit together has stumped traditional genome sequencing tools.

Newer sequencing tools solve this problem by reading longer stretches of the genome. “If you think about this like a puzzle, we need to make the puzzle pieces larger than each of the repetitive regions,” Cheng says. That is, instead of many small blue puzzle pieces, there is one large piece that captures the whole sky. In DNA sequencing, this technology is called “long-read.” But it comes with its own challenges: assembling a genome from long reads requires data from ultra-long segments, which must be derived from large amounts of DNA that are expensive and challenging to acquire. This makes it nearly impossible to sequence patients’ genomes in a hospital setting.
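
For readers who want to see the puzzle-piece argument in concrete terms, here is a minimal toy sketch in Python. It is not how hifiasm (ONT) or any real assembler works; the two tiny "genomes," the 8-letter repeat, and the function names are all made up for illustration, and the reads are assumed to be error-free. The two genomes differ only in the order of the unique stretches between three copies of the repeat.

```python
from collections import Counter

def read_counts(genome: str, read_length: int) -> Counter:
    """Count every error-free read of a fixed length, one per start position."""
    return Counter(
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    )

REPEAT = "GATCTAGC"  # hypothetical 8-letter repeat, present three times

# Two different toy genomes: the unique stretches between repeats are swapped.
genome_a = "AAAA" + REPEAT + "CCCC" + REPEAT + "GGGG" + REPEAT + "TTTT"
genome_b = "AAAA" + REPEAT + "GGGG" + REPEAT + "CCCC" + REPEAT + "TTTT"

def distinguishable(read_length: int) -> bool:
    """True if reads of this length differ between the two genomes."""
    return read_counts(genome_a, read_length) != read_counts(genome_b, read_length)

print(distinguishable(8))   # reads no longer than the repeat
print(distinguishable(10))  # reads that span the repeat plus a letter on each side
```

With reads of 8 letters, both genomes yield exactly the same collection of reads, so no software could tell which arrangement is the real one. With reads of 10 letters, some reads cover an entire repeat along with the letters on either side of it, and the two collections differ.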

Cheng’s new software, called hifiasm (ONT), avoids this limitation. It works with a simpler version of long-read sequencing data that can be extracted from 40 times less DNA. The tool also speeds up the process tenfold, making it possible to generate a full human genome in just three days. Researchers around the world are already using hifiasm (ONT) for everything from sequencing genes involved in aging to building a catalog of non-human vertebrate genomes. 

With this tool, Cheng and collaborators were able to generate full sequences of genes that cause diseases. For example, spinal muscular atrophy is a muscular disease caused by mutations in a pair of nearly identical genes. Cheng’s tool was able to construct sequences precise enough to tell the two genes apart, using far less DNA than was previously needed. He hopes that his software can make complete genomes more accessible in clinical settings, where they can hold valuable clues for diagnoses and treatments. 

“For undiagnosed genetic diseases, it can take two or three years to make a diagnosis for a single patient,” Cheng says. “But if we can [use this tool], it might be much faster.”  
