2020-06-01 – 2022-11-30
The overall objective of the Curated Multilingual Language Resources for CEF AT Action is to compile curated datasets in the seven languages targeted by the consortium (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian) in domains relevant to the European Digital Service Infrastructures (DSIs), with a view to enhancing Automated Translation.
Project coordinator: Nyelvtudományi Kutatóközpont
Partners:
- Институт за български език „Професор Любомир Андрейчин“
- Filozofski fakultet Sveučilišta u Zagrebu
- Instytut Podstaw Informatyki Polskiej Akademii Nauk
- Institutul de Cercetări pentru Inteligență Artificială “Mihai Drăgănescu”
- Jazykovedný ústav Ľ. Štúra Slovenskej akadémie vied, v. v. i.
- Institut “Jožef Stefan”
Project webpage: https://curlicat-project.eu/
News
- 2022-11-29 3rd version of the corpus available, 67 million tokens (51 million words). Download: vertical format curlicat-sk-20221025-v1.0.ver.xz, CoNLL-U Plus format curlicat-sk-20221025-v1.0.conllup.xz (~700 MB)
- 2022-06-24 2nd version of the corpus available, 67 million tokens (51 million words). Download: vertical format curlicat-sk-20220621-v0.7.ver.xz, CoNLL-U Plus format curlicat-sk-20220621-v0.7.conllup.xz (~400 MB)
- 2021-12-01 1st version of the corpus available (“proof of concept”): curlicat-sk-v0.1.tar
🇪🇺 Co-financed by the European Union – Connecting Europe Facility.