Two recent papers in the premier scientific journal Nature Genetics have showcased Estonia’s wealth of talent and global profile in the field of genetics research.
In the first study, scientists from the University of Tartu together with partners from the European Bioinformatics Institute in the UK described the development of a compendium of uniformly processed human gene expression and splicing quantitative trait loci, called the eQTL Catalogue. The project was developed by Open Targets, a public-private partnership based in the UK.
The purpose of the catalogue was to gather and harmonize a collection of genetic datasets that could be made accessible via one resource to researchers undertaking genome-wide association studies, or GWAS, where researchers look across genomic data from a specific cohort for genetic markers that could underlie a certain phenotype or condition. QTLs are variants in the human genome that have been found to be associated with a measurable phenotype. Such variants are often just coordinates that are adjacent to genes or splicing events, a disruption in genomic code. While GWAS identify phenotype-associated variants, studying QTLs for gene expression and other omics layers helps to understand which genes and biological pathways might be underlying those higher-level phenotypes.
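To make the idea of an expression QTL concrete, the sketch below regresses a gene's expression level on genotype dosage (0, 1 or 2 copies of an allele) and reports the per-allele effect and its standard error. It is a minimal illustration on simulated data, assuming a plain least-squares model; the function name, sample size and effect size are hypothetical, and it is not the eQTL Catalogue's processing pipeline.

```python
import math
import random


def eqtl_slope(dosage, expression):
    """Least-squares slope (per-allele effect) and its standard error.

    Illustrative only: real eQTL analyses also adjust for covariates
    such as ancestry, sex and technical batch effects.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual sum of squares around the fitted line
    rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


# Simulated toy data (an assumption, not real measurements):
# 200 donors, true effect of 0.5 expression units per allele.
rng = random.Random(0)
dosage = [rng.choice([0, 1, 2]) for _ in range(200)]
expression = [0.5 * g + rng.gauss(0.0, 1.0) for g in dosage]

beta, se = eqtl_slope(dosage, expression)
print(f"beta={beta:.2f}, se={se:.2f}, z={beta / se:.1f}")
```

A variant whose estimated effect is large relative to its standard error is a candidate eQTL for that gene; summary statistics of this kind (effect size and standard error per variant-gene pair) are what a harmonized resource can share without exposing individual-level data.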
Scientists have been churning out genome-wide association study data for nearly two decades using technologies such as genotyping arrays and next-generation sequencing. However, much of that data remains difficult to access because researchers must apply to use it, which is often a complex legal process that can take years to complete.
The concept of the eQTL Catalogue therefore was to do this work for others. The authors have gained access to datasets from 29 studies and counting and compiled a collection of quality-controlled and harmonized gene expression and splicing QTLs. Summary statistics are freely available on the EBI’s website and should aid researchers with the systematic interpretation of human GWAS associations across diverse cell types and tissues.
According to Kaur Alasoo, a lecturer in bioinformatics at the University of Tartu’s Institute of Computer Science, and corresponding author on the study, he and fellow researchers had to “jump through many bureaucratic hoops to get all the datasets to a single location,” in this case the University of Tartu, and then to process them locally.
“This was annoying, but worked reasonably well for the relatively small individual datasets that we had,” said Alasoo. “The advantage to the data owners was that they did not have to do any work to get the datasets to the eQTL Catalogue, we did all the work for them,” he said. “The disadvantage of our approach was that we had to exclude some datasets, because the consent obtained from the original study participants did not allow sharing of raw individual-level data.”
Alasoo noted that the initial release of the eQTL Catalogue has already served as a catalyst for many new studies, which have not yet been published.
Scientists working in the FinnGen Project, an effort to obtain genetic data from half a million Finns by next year, have been using the catalogue to interpret their findings, he said. Alasoo added that he and colleagues are currently preparing the next release of the Catalogue, which will include new studies, updated genotypes, and X chromosome data, and expect it to go live in February.
The eQTLGen Consortium paper
The second new Nature Genetics paper is also related to eQTL analysis and in this case is the product of the eQTLGen Consortium, a collaborative research effort aimed at identifying the downstream consequences of trait-related genetic variants. Scientists from the University of Tartu and the University of Groningen in the Netherlands led the consortium.
As noted in the paper, they set out to investigate the genetics of gene expression, performing cis- and trans-eQTL analyses using blood gene expression data from 31,684 individuals. A cis-eQTL is a variant located close to the gene whose expression it affects, whereas a trans-eQTL affects distant genes or even genes residing on other chromosomes.
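The cis/trans distinction comes down to where a variant sits relative to the gene it is tested against. The sketch below labels a variant-gene pair using a single distance cutoff. The 1,000,000 base-pair window, the `Locus` type and the example coordinates are assumptions chosen for illustration; studies set their own thresholds, and some leave a gap between the cis and trans ranges.

```python
from dataclasses import dataclass

# Assumed cutoff for illustration; not taken from the article.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    """A position in the genome (hypothetical helper type)."""
    chromosome: str
    position: int


def classify_eqtl(variant, gene, cis_window_bp=CIS_WINDOW_BP):
    """Return 'cis' if the variant lies near the gene, otherwise 'trans'."""
    same_chromosome = variant.chromosome == gene.chromosome
    if same_chromosome and abs(variant.position - gene.position) <= cis_window_bp:
        return "cis"
    # Far away on the same chromosome, or on a different chromosome
    return "trans"


gene = Locus("1", 5_000_000)
print(classify_eqtl(Locus("1", 5_200_000), gene))   # nearby, same chromosome
print(classify_eqtl(Locus("1", 90_000_000), gene))  # same chromosome, far away
print(classify_eqtl(Locus("7", 5_000_000), gene))   # different chromosome
```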
The researchers detected cis-eQTL for 88 percent of genes, which were replicable in numerous tissues, whereas trans-eQTL showed lower replication rates, due in part to low replication power and confounding by cell type composition. The researchers also reported that trans-eQTL exerted their effects via multiple mechanisms, mostly through regulation of transcription factors. Expression of 13 percent of the genes correlated with polygenic scores for 1,263 phenotypes, meaning that those genes might be associated with those traits. In total, the work represented a useful resource that genetics researchers could use in their studies.
Urmo Võsa, a research fellow at the Estonian Genome Center at the University of Tartu, described the work of the eQTLGen Consortium as a large-scale meta-analysis, and said the work began nearly seven years ago.
“In order to identify relatively weak trans-eQTL effects, we aimed to do a highly powered meta-analysis in a large number of samples, finally reaching more than 30,000. This is a six-fold increase compared to previous similar efforts,” said Võsa. “The limitation was that we could only reach such numbers using the most accessible tissue, in this case blood.”
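Why pooling cohorts helps with weak effects can be shown with a small worked example. The sketch below uses fixed-effect inverse-variance weighting, a standard textbook way to combine per-cohort effect estimates; it is an assumption made for illustration and not necessarily the method the consortium used. The function name and the cohort values are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Combine (beta, standard_error) pairs from several cohorts.

    Each cohort is weighted by 1 / se^2, so more precise cohorts
    contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Three hypothetical cohorts reporting the same weak effect.
cohorts = [(0.10, 0.05), (0.10, 0.05), (0.10, 0.05)]
beta, se = inverse_variance_meta(cohorts)
print(f"single cohort z = {0.10 / 0.05:.2f}")
print(f"pooled z = {beta / se:.2f}")
```

With equally precise cohorts the pooled standard error shrinks by the square root of the number of cohorts, so an effect that looks marginal in any single cohort becomes clearer once many are combined.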
Võsa noted that in the eQTLGen Consortium project they were well-powered to look at the distal effects of eQTL SNPs that affect not only nearby genes, but genes more distant in the genome. “We focused on those variants previously known to be associated with some phenotypic traits, like risks for different diseases, height, body mass index, these kinds of higher-level traits,” he said. Võsa pointed out that in this first phase of the project, they only tested around 10,000 SNPs associated with traits.
For the next phase of the project, the researchers aim to run a GWAS on every single gene, testing all the common variants in the genome against the expression of every blood-expressed gene. “This is quite a challenging task, as we want to run 20,000 GWAS on the numerous cohorts involved,” Võsa said. “Such analyses are computationally intensive and yield in terabytes of data, making classical meta-analysis approach impractical.” Võsa said he is working with partners on an analysis setup which overcomes those challenges.
According to Võsa, the Nature Genetics paper has been cited frequently, even while it was still a preprint. Other researchers have used it to interpret their GWAS findings, or to design new bioinformatics methods for GWAS interpretation, as it is the largest, most highly powered eQTL study published to date. He added that researchers also widely use the eQTL Catalogue which, while not so highly powered, has a “better overview of different tissues and cell types.”
Alasoo agreed. “The eQTL Catalogue has around 70 different cell types and tissues, but only a few hundred individuals or donors in each cell type or tissue,” he said. “eQTLGen has only one tissue, blood, but a massively larger sample size of more than 30,000,” he continued. “As a result, eQTL Catalogue is most useful for detecting genetic variants near each gene, while eQTLGen also enables the detection of associations much further away.”
Võsa said that working in a consortium had helped to raise the Estonian Genome Centre’s profile and competencies in carrying out such studies, though the center is also a go-to partner, partly because Estonia has a large biobank encompassing 20 percent of the Estonian adult population. “Most people working on complex trait genetics know the Estonian Genome Centre because Estonian Biobank is one of the larger ones in terms of population coverage,” said Võsa. “Anyone who wants to do meta-analysis on some complex trait wants to involve as many biobanks as possible and the Estonian Biobank is one of those,” he said. He noted that the recent papers have also helped to raise the profile of Estonian scientists.
“Researchers from the Estonian Genome Centre are leading some of such international collaborations too,” remarked Võsa. “It gives our scientists the possibility to pursue their own research interests.”
Written by: Justin Petrone
This article was funded by the European Regional Development Fund through Estonian Research Council.