A database detailing the genetic diversity of humankind makes it easier to find the genetic changes that cause severe diseases and may help scientists develop new medicines. Data from the Estonian Genome Centre played an important part in creating gnomAD and in demonstrating its potential.
Even the DNA of two people born in the same country and the same city differs on average by four million larger or smaller changes. “Similarly, I and others have on average 20–30 genes that technically do not work. By mapping this normal variability all over the world, we can say with great certainty whether, for example, a child’s disease is caused by a mutation which broke a protein,” explained Tõnu Esko, a leading research professor of human genomics at the University of Tartu.
Evaluating the impact of DNA changes is easier with plants and animals. A banana tree, for instance, can be irradiated, exposed to chemicals, or have its genome modified with new-generation precision tools to see what happens as a result. Doing the same with humans would be more time-consuming and ethically questionable. That leaves statistical methods. The more vital the gene, the more rarely we should see variants in it, because people with a mutation that severely disrupts the functioning of the body often die younger or are unable to produce as many offspring.
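The statistical idea can be illustrated with a minimal sketch. It assumes a simple observed-to-expected ratio of gene-disabling variants per gene, which is one common way of expressing such constraint and not necessarily the exact method used in the studies. The gene names and counts below are hypothetical and made up for illustration only.

```python
def oe_ratio(observed: int, expected: float) -> float:
    """Ratio of observed to expected gene-disabling variants for one gene.

    A value far below 1 means far fewer such variants were seen than
    chance alone would predict, hinting that the gene is vital.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


# Hypothetical genes: (observed count, expected count). Not real data.
genes = {
    "GENE_A": (2, 40.0),   # strongly depleted, likely vital
    "GENE_B": (38, 40.0),  # close to expectation, likely tolerant
}

# Rank genes from most to least depleted.
ranked = sorted(genes, key=lambda name: oe_ratio(*genes[name]))

for name in ranked:
    print(name, round(oe_ratio(*genes[name]), 2))
```

The larger and more diverse the database, the more precisely the expected counts can be compared with what is actually observed.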
To categorise the impact of the changes correctly, as many people as possible from different regions of the world should be studied. For example, a gene variant that is extremely rare in Estonia may be common in Mexico, and its carriers may lead a perfectly normal life. One of the largest such databases thus far, ExAC, included the gene sequences of 60,000 people. At the same time, it was heavily skewed towards people living in Europe and the USA.
The database that is now available to all scientists aggregates about 126,000 exome and 15,700 complete genome sequences. Approximately one fifth of the latter came from the Estonian Genome Centre. The Estonian genomes were sequenced in a way that made them of especially high quality. If scientists see added or lost DNA segments, or additional copies of some segments, these are highly likely to be real rather than errors.
“With complete genomes we are able to notice larger rearrangements in the genome. With children suffering from rare diseases, in about 10–15 per cent of cases we see that the disease is caused by a structural change,” said Katrin Õunap, a professor of clinical genetics at the University of Tartu, pointing out one of the main values of the new database.
As an example, Õunap and her colleagues were able to associate a specific mutation with a hereditary metabolic disease diagnosed in an Estonian child. Scientists had been debating the causes of this disease for decades. Õunap suspected that the dihydropteridine reductase deficiency was caused by an inversion, a 180-degree rotation of a DNA segment containing nine million pairs of letters. This mutation would have significantly changed the function of the QDPR gene associated with the disease.
At the professor’s request, Ryan Collins, one of the founders of the gnomAD database, confirmed that such a change was not present in any of the 15,000 people in the database. This makes the mutation a highly likely candidate cause of the disease. Even though a molecular diagnosis may not change the life of a specific child, parents carrying the mutation can take it into account when considering having more children.
In addition to 300,000 structural changes, the scientists found in the gnomAD database a total of over 17 million gene variants and 262 million variants across the complete genomes.
The other side of the coin
Studying the gene variants of healthy people has other benefits as well. Scientists can use the database to look for people in whom neither of the gene copies inherited from their parents works. For example, Tõnu Esko, Lili Milani, a research professor at the Genome Centre, and many other scientists involved in the gnomAD project focused on the gene LRRK2. Some of its variants have been associated with Parkinson’s and Crohn’s disease, for example. People whose genes produce more of the corresponding protein are at a greater risk of developing the disease.
Even though the protein is usually found in the brain, one of the most important human organs, the analysis showed that a lack of the protein did not have any visible effects on human health. “This suggests that a drug directed against a specific protein or a signalling pathway would not have very drastic or noxious side effects,” said Esko. Such a drug could potentially decrease the risk of developing Parkinson’s disease.
However, scientists associated with this project noted that for most other genes, at least a thousand times more genomes would need to be sequenced to draw similarly reliable conclusions. One of the published analyses also indicated that it is probably too early to fully automate the search for mutations leading to the loss of a gene’s function.
To reduce errors, the catalogues need to be regularly updated and amended by real people. There is also a lack of genomes from people living in the Middle East, Africa, Oceania, and Central and Southeast Asia.
Tõnu Esko noted that despite these limitations, gnomAD is several times more reliable than other similar databases, including ClinVar, where physicians and labs add mutations presumed to be related to a disease. The technologies and standards they use are not the same everywhere.
To evaluate the impact of different mutations, scientists would be helped not only by sequencing additional genomes but also by studying people’s other molecular markers. If only one gene copy is working properly but the by-products of the protein’s activity can still be found in the same amount, it is probably not a great loss. The relative content of proteins circulating in blood plasma should be monitored in the same way.
However, Tõnu Esko finds that when gathering genetic data, consideration should always be given to why it is being gathered. In his opinion, to make full use of precision medicine, 100,000–200,000 complete genomes should be sequenced in Estonia, and these should represent all families. For the rest, the cheaper gene chips targeting the protein-coding regions would be enough.
“This way we could already see at birth whether the child has any significant recessive deficiencies, give recommendations concerning the use of medicines later in life and predict the risk of diseases of older age such as Parkinson’s or Alzheimer’s disease,” explained the research professor.
An editorial published in the journal Nature noted that projects such as the Estonian and British biobanks are paving the way in combining genetic and clinical data, but that more diverse human populations need to be studied. “Certain types of research need to be done through international cooperation and in large teams,” said Esko.
The studies were published in the following journals: Nature, Nature Communications and Nature Medicine.
The translation of this article from the Estonian Public Broadcasting science news portal Novaator was funded by the European Regional Development Fund through the Estonian Research Council.