CURLICAT

2020-06-01 – 2022-11-30

The overall objective of the Curated Multilingual Language Resources for CEF AT Action is to compile curated datasets in the seven languages targeted by the consortium (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian), in domains relevant to the European Digital Service Infrastructures (DSIs), with a view to enhancing automated translation.

Project coordinator: Nyelvtudományi Kutatóközpont

Partners:

Project webpage: https://curlicat-project.eu/

News

2022-11-29 3rd version of the corpus available, 67 million tokens (51 million words). Download: vertical format curlicat-sk-20221025-v1.0.ver.xz, CoNLL-U Plus format curlicat-sk-20221025-v1.0.conllup.xz (~700 MB)
NoSketch Engine interface.
2022-06-24 2nd version of the corpus available, 67 million tokens (51 million words). Download: vertical format curlicat-sk-20220621-v0.7.ver.xz, CoNLL-U Plus format curlicat-sk-20220621-v0.7.conllup.xz (~400 MB)
NoSketch Engine interface.
2021-12-01 1st version of the corpus available (“proof of concept”): curlicat-sk-v0.1.tar

🇪🇺 Co-financed by the European Union – Connecting Europe Facility.

Ľ. Štúr Institute of Linguistics

Slovak Academy of Sciences
