Multilingual Resources for CEF.AT in the legal domain (MARCELL) (Curlicat)

2018 – 2021

Principal Investigator: Radovan Garabík

Coordinator: Hungarian Research Centre for Linguistics, Hungary

Number of participating institutions: 8

Outputs of the Institute in the framework of the project:

- compilation of a legal corpus (containing 43 million tokens) processed with linguistic processing chains (LPCs) that include tokenization, PoS/MSD tagging, NERC, and dependency parsing; the corpus is classified with top-level EUROVOC descriptors, and EUROVOC and IATE terms are annotated in the texts. The corpus was updated three times and can be downloaded under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

- creation of a portable Docker image for comprehensive computational processing of Slovak, with a focus on legislative texts

Publications: 1 scientific publication in English