The Human Genome Is Finally Fully Sequenced
The Human Genome Project mapped the first human genome in 2001, but researchers knew that it was neither complete nor fully accurate. Now scientists have produced the most completely sequenced human genome yet, filling in gaps and correcting mistakes made in the earlier version.
It is the most complete reference sequence for any mammalian genome so far. The findings are described in six papers published in Science. The research could lead to a deeper understanding of human evolution and to new strategies for treating various diseases.
More precise DNA sequences for humans
“The Human Genome Project relied on DNA obtained through blood draws; that was the technology at the time,” says Adam Phillippy, head of genome informatics at the National Institutes of Health’s National Human Genome Research Institute (NHGRI) and senior author of one of the new papers. “The techniques at the time introduced errors and gaps that have persisted all of these years. It’s nice now to fill in those gaps and correct those mistakes.”
“We always knew there were parts missing, but I don’t think any of us appreciated how extensive they were, or how interesting,” says Michael Schatz, professor of computer science and biology at Johns Hopkins University and another senior author of the same paper.
The work was done by the Telomere-to-Telomere (T2T) Consortium, which is supported by NHGRI and involves genomics and computational biology specialists from many institutes around the globe. The group focused on filling in the remaining 8% of the human genetic code that was missing from the initial draft sequence, gaps that geneticists have been trying to close ever since. The latest group of studies identifies about an entire chromosome’s worth of new sequences, representing 200 million more base pairs (the letters making up the genome) and 1,956 new genes.
“Since the Human Genome Project [in 2001], we have declared victory a few times over the last two decades,” says Evan Eichler, professor of genome sciences at the University of Washington and a senior author of one of the papers. Eichler was involved in mapping the original sequence and says the focus of the sequencing effort is very different this time. “While the original goal of the Human Genome Project was to order and orientate every base pair, that couldn’t be achieved because the technology wasn’t sufficiently advanced enough. So we finished the parts that we could finish.”
New findings promise great things
These previously unreachable sections include the centromeres, the tightly wound central portions of chromosomes. Centromeres keep the long double strands of DNA organized as they unravel to be copied and then separate into two cells when a cell divides. They are essential for normal human development and also play a vital role in brain growth and neurodegenerative disease. “It’s been one of the great mysteries of biology that all eukaryotes—all plants, animals, people, trees, flowers and higher organisms—have centromeres. It’s a really fundamental part of how DNA replicates and how chromosomes organize and how cells divide. But it’s been a great paradox, because while its function has been around for billions of years, it was almost impossible to study because we didn’t have a centromere sequence to look at,” says Schatz. “Now we finally do.”
Scientists were also able to sequence long stretches of DNA containing repeated sequences, which genetic experts originally regarded as something like copying errors and dismissed as so-called “junk DNA.” These repeated sequences, however, may be involved in some human diseases. “Just because a sequence is repetitive doesn’t mean it’s junk,” says Eichler. He points out that critical genes are embedded in these repeated regions: genes that contribute to the machinery that creates proteins, genes that dictate how cells divide and split their DNA evenly between their two daughter cells, and human-specific genes that might distinguish the human species from our closest evolutionary relatives, the other primates. One of the papers showed that other primates have different numbers of these repeated regions than humans do, and that the regions sit in different parts of the genome.
“These are some of the most important functions that are essential to live, and for making us human,” says Eichler. “Clearly, if you get rid of these genes, you don’t live. That’s not junk to me.”
Working out what these repeated sections mean, and how sequences from previously unsequenced areas like the centromeres can be used to improve human health and treatment, is just beginning, according to Deanna Church, a vice president at Inscripta, a genome engineering company, who wrote a commentary on the work. Sequencing the entire human genome is not the same thing as decoding it. She estimates that scientists have only about half the information so far.
There’s still room for improvement. The new sequence comes from essentially half a human: that is, half of the genetic content normally found in a person’s DNA. Each person carries two sets of DNA, one maternal and one paternal, with slightly different versions of the same genes, which in effect gives each of us two genomes. Assembling both is not easy. Current sequencing technology makes it difficult to separate paternal from maternal DNA, and scientists can mismatch sections if they mistake a stretch of one parent’s chromosome for the other’s. “It’s similar to having two puzzles in the same box,” says Phillippy. “You have to sort out what the differences are and reconstruct both.”
To create the new sequence, the scientists took advantage of a fertilization error that left the resulting embryo with only paternal DNA. The resulting growth was removed, and a lab preserved the cell line in the 2000s because it was viable despite its abnormal chromosomal content. This made it much easier for the team to assemble the genome, as they had only one puzzle to solve.
Ultimately, however, researchers will need a human genome that includes the complete sequences of both maternal and paternal DNA. That is coming soon. Phillippy is working with others to collect DNA samples from trios: volunteers, their mothers and their fathers. This will allow scientists to separate paternal from maternal sequences and to assemble two distinct genomes. The teams expect to complete a diploid human genome sequence by the end of this year.
Already, says Winston Timp, associate professor of biomedical engineering at Johns Hopkins and a co-author on one of the papers, “the new genome assembly is paying dividends because it provides a more accurate map to understand what data we had from before meant.” That includes finding new variants that might distinguish healthy people from those affected by disease, for example, as well as variants that might put people at higher risk of developing certain diseases.
“We’ve discovered millions of genetic variants that were previously not known across samples of thousands of individuals whose genomes have already been sequenced,” says Rajiv McCoy, assistant professor of biology at Johns Hopkins and another co-author. “We will have to wait until future work to learn more about their associations with disease, but a big focus of work now will be on trying to discover new genetic variations that were previously uncharacterized.”
Even with the more complete version of the human genome, scientists likely won’t be clamoring to replace the old version, despite its gaps and errors. That’s because decades of work on human genetics have made the older version far more annotated than the new one, much like the difference between your favorite copy of a book, with your handwritten notes and highlighting in the margins, and a fresh copy from the bookstore. “A genome is only as good as its annotation,” says Eichler. “All the clinical and research labs have built decades worth of data based on the old, gap-filled genome. To redo all of that work for any individual lab would be horrific.” He predicts that many labs will gradually switch over to the new genome, first comparing smaller datasets in a test run to see how much richer and more comprehensive the information generated from the newer genome is. The new genome is available in a public database, just like the original human genome. “For now, both genomes will be kept up so there will be no replacement,” he says.
In the coming years, researchers expect to produce more complete genomes that include both paternal and maternal DNA. These should help scientists identify the most promising targets for potential new treatments and improve our understanding of human evolution and development. The more genomes there are, the more meaningful patterns will emerge that can lead to better understanding and treatment of human disease. The ultimate goal is for every individual to be able to have their entire genome sequenced as part of their medical record, so that doctors can examine the sequence and identify which variants might contribute to particular diseases.
“This is presenting the world with a whole additional chromosome that we have never seen before,” says Karen Miga, assistant professor of biomolecular engineering at the University of California, Santa Cruz, and a senior author of one of the papers. “We have new landscapes, new sequences and the opportunity and promise of new discoveries.”
The excitement in the medical and genomics communities is evident. “Hallelujah, we finally finished one human genome, but the best is yet to come,” Eichler said during a briefing. “No one should see this as the end, but the beginning of a transformation not only in genomic research but in clinical medicine as well.”