If DNA is the blueprint for what makes us human, scientists have just served up plans for the whole neighbourhood.
On Wednesday, an international team supported by the U.S. National Institutes of Health released 47 genomes that together provide the most comprehensive look yet at the molecular underpinnings of human diversity.
The collection, known as the “pangenome reference,” replaces a succession of single reference genomes that were constructed primarily from the DNA of one individual.
The update means that researchers studying everything from inherited disease to human evolution are no longer confined by one version of the linear sequence of genetic letters that represents the recipe for making a human being. Now they can draw on a rich matrix of possibilities that samples the manifold directions the genome has taken since humanity first arose.
“We are finding remarkable patterns of genetic variation,” Eric Green, director of the National Human Genome Research Institute in Bethesda, Md., said during an online news briefing. Results of the effort, conducted by the Human Pangenome Reference Consortium, were published in the journal Nature.
Dr. Green said those results go beyond highlighting alternative versions of the same genes and encompass entire sets of genes that are present only in some human subpopulations, or that have substantially different structures from one individual to the next.
“There were hints of all this before, but we didn’t actually have the right microscope to see it,” Dr. Green added. “We can now see this variation for the first time and it is amazing.”
Consortium members said the pangenome reference provides scientists with a better sense of what the full range of human diversity looks like at the molecular level, with practical implications.
“The pangenome is going to allow us to distinguish between common structural variations that are probably pretty benign … from things that are rare and potentially have some sort of consequence in health and disease,” said Benedict Paten, a consortium member and researcher at the University of California Santa Cruz Genomics Institute.
The human genome consists of some 3.2 billion base pairs or genetic letters strung out along approximately two metres of DNA packed into nearly every human cell.
While all people share approximately 99.9 per cent of that DNA, it is the remaining 0.1 per cent – the average genetic difference between individuals – that has the potential to reveal why health risks from cancer to a range of mental disorders vary so widely across the human population.
To better scrutinize those differences, project members used methods that provided more complete reads of longer strands of DNA. The pangenome reference, together with accompanying data and analyses, is now available for researchers anywhere.
Guillaume Bourque, director of bioinformatics at the McGill Genome Centre in Montreal, was a participant in the project who helped to validate the way the pangenomic data will be used.
In a separate study published in the journal Cell Genomics last month, he and colleagues compared the genes of 35 individuals to tease out sequences that play a role in the human immune response to influenza.
Only a small fraction of our DNA consists of genes that code for the specific proteins needed to build and sustain a human. However, Dr. Bourque said that much of the diversity in the pangenome lies in other areas that regulate which genes are switched on and off at different stages of development and which genetic factors are amplified.
“While there is a core set of genes that tend to be quite stable,” he said, “this is telling us that it’s not just what’s in your kitchen, but how you use it.”
The pangenome reference builds on the earlier achievement of the US$3-billion Human Genome Project, which reached its culmination in 2003. It produced the first ever nearly complete sequence of an individual’s DNA.
Many improvements to the genome have been made since then to fill in gaps in that sequence.
But along the way, scientists have come to recognize the limitations and biases in treatment options that result when differences between people of varied genetic heritage are not reflected in the tools that physicians have to work with.
“We now understand that having one map of a single human genome cannot adequately represent all of humanity,” said geneticist Karen Miga, also with the UC Santa Cruz Genomics Institute.
Last year, a team led by Dr. Miga released the most complete version of an individual human genome ever assembled as part of the Telomere to Telomere project. That became the foundation for the new pangenome reference.
Consortium members said the next step is to sequence a much larger set of 350 genomes to capture most of the remaining variation in the global population.
“I would call this a mid-term report,” said Peter Lansdorp, a researcher with the BC Cancer Agency who has worked on the methods employed by the consortium but was not involved in Wednesday’s release.
With more genomes, he said, would come more potential for linking genetic differences with various responses to drugs or to disease susceptibility.
“That’s going to be the next challenge, given that there’s so much variation,” he said. “But we’ve got to start somewhere and, like navigators, find our way around.”