Slovak National Corpus

For more information, please visit the website of the Slovak National Corpus.

About us and our research work

The main objective of our department is to build corpora. These are large electronic databases containing a wide range of texts in a unified format. The texts are annotated with linguistic and other tags. Our corpora, which can be searched free of charge, allow linguists and the public at large to study Slovak grammar, lexicon and stylistics in authentic texts, using their own criteria and query settings.

Our corpus resources consist mostly of Slovak texts written after 1955, but they also include records of older historical and dialectal forms of Slovak. Slovak can also be studied in comparison with other languages by means of parallel corpora. These consist of original texts aligned sentence by sentence with their translations and, in rare cases, of translations from a third language. Building such corpora and other specialized databases requires combining linguistic approaches with modern information technology. That is why the continuous improvement of our software tools, and of our solutions to individual tasks in Natural Language Processing, has been an essential part of our work.

The research conducted at our department relies on the methods of corpus linguistics and is therefore inextricably tied to our corpus resources. We have contributed to the lexicological and lexicographical tradition of our institute by publishing specialized dictionaries, such as Frekvenčný slovník slovenčiny na báze Slovenského národného korpusu [Frequency Dictionary of Slovak Based on the Slovak National Corpus] (2017), Retrográdny slovník súčasnej slovenčiny – slovné tvary na báze Slovenského národného korpusu [Reverse Dictionary of Contemporary Slovak: Word Forms Based on the Slovak National Corpus] (2018) and Frekvenčný slovník hovorenej slovenčiny na báze Slovenského národného korpusu [Frequency Dictionary of Spoken Slovak Based on the Slovak National Corpus] (2018). In addition to our work in Natural Language Processing and corpus linguistics, we are also partly involved in terminological research and in the relatively new field of Digital Humanities.