Multilingual Resources for CEF.AT in the legal domain (MARCELL) (Curlicat)
2018 – 2021
Principal Investigator: Radovan Garabík
Coordinator: Hungarian Research Centre for Linguistics, Hungary
Number of participating institutions: 8
Outputs of the Institute in the framework of the project:
- compilation of a legal corpus (43 million tokens) processed with linguistic processing chains (LPCs) comprising tokenization, PoS/MSD tagging, named entity recognition and classification (NERC), and dependency parsing; the texts are classified with top-level EUROVOC descriptors and annotated with EUROVOC and IATE terms. The corpus has been updated three times and can be downloaded under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
- creation of a portable Docker image for comprehensive computational processing of Slovak, with a focus on legislative texts
Publications: 1 scientific publication in English