University of Tartu's Jaak Vilo on Meeting Estonia, and the World's, Computer Science Needs

Jaak Vilo heads the Institute of Computer Science at the University of Tartu, the university’s largest institute, and arguably its most intermeshed with the private sector. By Vilo’s account, it is also its most international institute as roughly half of the institute’s researchers are not Estonians. The institute collaborates with the local IT industry as well as international organizations such as ELIXIR, which provides a distributed infrastructure for bioinformatics data across Europe.

Estonia was not always a hub for computer science though. When Vilo pursued his higher education in the mid-1980s, he first studied applied mathematics at the University of Tartu before transferring to the University of Helsinki where he enrolled in a PhD program for computer science. Yet Estonia eventually lured him back, both with its ongoing efforts to build the Estonian Biobank as well as an active IT sector that includes the development offices for Skype and Playtech. Today the institute collaborates with multiple partners to serve their needs.

Research in Estonia recently spoke with Vilo about the evolution of the field of computer science in Estonia, and, especially, where the field is headed.

I was lucky to get into an exchange program in Finland in my fourth year of university, so 1989 to 1990, even before the Soviet collapse. I graduated in 1991, just before Estonian independence was redeclared. Because I had spent a year in Helsinki studying hard, as well as working as a teaching assistant to earn a living, I was offered a chance to join the PhD program in computer science. It was there at the University of Helsinki that I got into bioinformatics and the DNA and protein sequence world, in the mid-1990s. For bioinformatics, that was relatively early. The big explosion happened in the late 1990s and 2000s. So I have been involved in it for quite a while.

You were also involved with the Estonian Biobank from the beginning.

In 1999, I went to the European Bioinformatics Institute in the UK, and I was working there just at the time that genome sequences started to appear. Microarrays emerged at this time and I was working with the data at the EBI. When the biobank idea emerged in Estonia, it was clear that bioinformatics would be needed. Andres Metspalu invited me back to Estonia, so I came back for the biobank already in the summer of 2002.

In 2006, I became a full professor at the University of Tartu. That was a complicated time for computer science at Tartu because around 2005 there were three professors all over retirement age and still working. Then I got a private donation from Swedbank to establish a software engineering professorship at Tartu. Thanks to that we were able to recruit Honduran-Australian computer scientist Marlon Dumas, who is the best researcher in Estonia now. I became more involved with the university from that time on and, in 2011, I became head of the institute.

What does the institute do?

Computer science is pretty much everything that is related to software development, programming, data analytics, and artificial intelligence nowadays. Computer science does not study computers as hardware or equipment but covers all the algorithms, software development, software engineering, anything related to programming, really. Since 2011, the institute has more than quadrupled in size. It has twice as many students overall and, in master’s studies, even four times as many. The PhD program has more than doubled and still needs to grow to match the need for training academic staff as well as to satisfy the needs of industry. As of this year we are the biggest unit at Tartu and perhaps in the entire Estonian higher education system. The reason for the growth is that there is such a huge demand for software developers at the companies and students are also interested in the discipline. Skype happened, Playtech happened, and these are only the big ones who are inseparable from software developers. All IT development was starting to experience rapid growth in the early 2000s, but you can’t grow exponentially because you need skills. I think since those days, the IT industry has been growing at a very sustainable rate of 1,200 people per year, especially on the software development side.

How do you interact with industry?

It’s a mix of different things. There is of course some kind of constant ongoing discussion because of the need for software developers. A lot of the people teaching at the university, lecturers, are involved in industry. There is a massive interaction in that sense.

I think the main issue is that if you think back 15 years, companies would say they just needed programmers. Even today, many say they just need someone who knows a particular programming language or has coding skills with a narrow focus, but this is not what a university education is about. University education is much broader, and much deeper than just becoming a programmer. It’s not that we don’t want to train a mass of programmers, but rather we want to make sure that everybody who goes through the curriculum at the university becomes a skilled professional, a deep specialist with a broad theoretical and practical understanding of the domain.

The problem we had, when you go back to 2003, 2004, 2005, is that there was almost nobody to hire to work at the university. That is why my role at the institute has also been to develop the PhD program, so that eventually we will get people with PhD degrees who can take up these academic positions in the long run. The Estonian education system did not have many computers in the 1980s, and in the 1990s it had a very poor head start. Computer science was not on the agenda. Only in the 2000s did the field develop, as we suddenly needed many software developers and constantly had more jobs, but no people. It was the same with the university, as we had neither sufficient teaching staff nor research strengths. That is why our institute now is not just the biggest but one of the most international, as this was simply the only way to respond to the needs of the companies and the university. Over half of the staff are now international hires, and this has tremendously increased the capacity and quality of teaching for local students. International students have in addition created an international atmosphere and helped improve the overall quality of education, as they are often more focused on completing their studies in nominal time. Many now work in key positions in local companies and have even established their own successful startups in Estonia.

How have you been able to keep up with the infrastructure needs?

Somehow we have managed, because Estonia has had years of financial support from EU structural funds. These have usually come with some investment goals, like infrastructure development. Through these funds we have managed to purchase hardware in Estonia. We managed to buy computer clusters, storage, disks, mostly through these types of projects. Estonia has also joined the Euro High Performance Computing [HPC] initiative and collaborates a lot with Nordic countries. In 2021 we will gain access to LUMI, a new supercomputer based in Finland, when it gets switched on. That will especially increase our AI capabilities, as it will have a lot of the GPU nodes needed for neural network training.

Where is the majority demand coming from, in terms of the investigators you work with?

In Estonia, researchers historically needed computing for computational chemistry and physics simulations, but biology has emerged now as one of the biggest data producers, because of genetics and genomics approaches, all of these different measurements in almost every lab. Of course you can add astronomy and satellite images. You can stream infinite amounts of data onto disk from various sensors, and then it depends really on what you want to compute. All these fields, physics, chemistry, genetics, astronomy, are data-rich disciplines now. Our computer scientists have also started to develop autonomous driving, which also needs massive amounts of data from the roads and AI technologies to process those data streams.

What are you working on at the moment, what are your development plans?

We are developing the infrastructure for the Estonian medical system, to provide access to personal genomic information in the form of valuable interpretations made available to medical doctors. The biobank has come out with various genetic risk scores and we are working on how these would get into everyone’s health records and to the medical doctors. We are developing this IT infrastructure at the moment. On the research side we continue with analysis of biological data, but increasingly also with electronic health records and disease progression analysis. As we need to better understand every disease, so to speak, we also need to bring additional genetic or other big data in. The focus has shifted more toward analyzing health data as it comes from medical doctors.

We are also interfacing with international infrastructures. We host the Estonian node of the international ELIXIR infrastructure for biological data, software tools, and services. You can imagine that biology has become one of the largest data producers in science. The main difference from physics is that physicists often have huge measuring devices in one location, such as CERN, or various telescopes, like the Hubble. Biology generates similarly large data in many tens of thousands of laboratories around the world. Data production is widespread, and in order not to repeat the same experiments everywhere, there has to be proper data sharing in place, so that research data is openly accessible in these common databases. ELIXIR provides this bonding of biological data, supporting standardization efforts and the use of standard tools, and providing a resource for training researchers. Millions of researchers who do biological data analyses around the world get access to the data. This is something that is needed. Some of our tools are being accessed with a million queries per month.

Written by: Justin Petrone

This article was funded by the European Regional Development Fund through the Estonian Research Council.
