Lemmatization & morphosyntactic analysis of Slovak

Online demo: https://mrd.juls.savba.sk/

The process uses an updated morphology database from the Slovak National Corpus, together with MorphoDiTa for lemmatization and MSD tagging.

If you want to use the API, please obtain a key here first. API access is limited to about 10 requests per 5 minutes, and the text size is limited to about 16 KB.
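The limits above can be respected on the client side. The following is a minimal sketch, not part of the service: the constants take the approximate figures quoted above at face value (and assume that 16 KB means 16 × 1024 bytes of UTF-8 text), and RateLimiter and check_text_size are hypothetical helper names.

```python
import time
from collections import deque

MAX_REQUESTS = 10            # assumed from "about 10 requests"
WINDOW_SECONDS = 5 * 60      # assumed from "per 5 minutes"
MAX_TEXT_BYTES = 16 * 1024   # assumed reading of "about 16 KB"


class RateLimiter:
    """Sliding-window limiter: at most max_requests per window seconds."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recently sent requests

    def wait_time(self):
        """Return how many seconds to wait before the next request."""
        now = self.clock()
        # Forget requests that have already left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Note that a request is being sent now."""
        self.sent.append(self.clock())


def check_text_size(text, limit=MAX_TEXT_BYTES):
    """Return True if the UTF-8 encoded text fits the assumed size limit."""
    return len(text.encode('utf-8')) <= limit
```

Before each request, a caller would sleep for limiter.wait_time() seconds and then call limiter.record(). Since the documented limits are approximate, the server may still reject a request that this check lets through.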

API examples

The output will be application/text in either case. Do not forget to quote the input text properly if it contains unusual characters (such as apostrophes or backslashes).

x-www-form-urlencoded

This is the default for web forms and works well for mostly ASCII text (which Slovak mostly is). Do not forget to URL-encode the text properly (in the example below, curl does this for you).

curl -k --compressed -v -X 'POST' \
'https://mrd.juls.savba.sk/api/' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H Expect: \
--data-urlencode 'key=abcdef' \
--data-urlencode 'text=Toto je test API pre morfologickú analýzu. Analýzu slovenčiny.'
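
The same form-encoded request can also be built in Python. This is only a sketch mirroring the curl call above: build_form_request is a hypothetical helper name and abcdef is the placeholder key. urllib.parse.urlencode percent-encodes non-ASCII characters as UTF-8 by default, so the resulting body is pure ASCII.

```python
import urllib.parse
import urllib.request

API_URL = 'https://mrd.juls.savba.sk/api/'


def build_form_request(text, key):
    """Build (but do not send) a form-encoded POST request."""
    # urlencode handles the percent-encoding of both fields.
    body = urllib.parse.urlencode({'key': key, 'text': text}).encode('ascii')
    req = urllib.request.Request(API_URL, data=body, method='POST')
    req.add_header('Content-Type', 'application/x-www-form-urlencoded')
    return req


req = build_form_request('Analýzu slovenčiny.', 'abcdef')
print(req.data.decode('ascii'))
# prints: key=abcdef&text=Anal%C3%BDzu+sloven%C4%8Diny.
```

To actually send the request, pass it to urllib.request.urlopen(req) and read the response.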

JSON

This is an alternative, better suited for data encapsulation in high-level programming languages. It is also preferable if your text has many non-ASCII Unicode characters (which is not the case for Slovak).

curl -k --compressed -v -X 'POST' \
'https://mrd.juls.savba.sk/api/' \
-H 'Content-Type: application/json' \
-H Expect: \
-d '{"text": "Toto je test API pre morfologickú analýzu. Analýzu slovenčiny.",
"key": "abcdef"
}'

Example of using JSON in Python:

import json
import urllib.error
import urllib.request

url = 'https://mrd.juls.savba.sk/api/'

# JSON request body, encoded to bytes
data = json.dumps({
    'text': 'Toto je test API pre morfologickú analýzu. Analýzu slovenčiny.',
    'key': 'abcdef',
}).encode()

response = None

try:
    req = urllib.request.Request(url, method='POST')
    req.add_header('Content-Type', 'application/json')
    response = urllib.request.urlopen(req, data)
except urllib.error.HTTPError as e:
    # HTTPError is file-like, so the error body can be read like a response
    print('ERROR happened:', e.code)
    response = e

if response:
    print(response.read().decode())
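
As noted earlier, input containing apostrophes or backslashes has to be quoted properly. With JSON, json.dumps takes care of this. The sketch below only builds request bodies and sends nothing; the sample text and the abcdef key are placeholders. It also shows the two ways non-ASCII characters can be serialized. Both are standard JSON, but that the service treats them identically is an assumption.

```python
import json

# Text with characters that need escaping inside a JSON string.
text = 'It\'s a path: C:\\temp, and a "quoted" word.'

# json.dumps escapes the backslash and the double quotes; the apostrophe
# needs no escaping in JSON.
payload = json.dumps({'text': text, 'key': 'abcdef'})
print(payload)

# By default non-ASCII characters become \uXXXX escapes (pure ASCII output).
ascii_body = json.dumps({'text': 'slovenčiny'}).encode('ascii')

# With ensure_ascii=False they stay literal and are sent as UTF-8 bytes.
utf8_body = json.dumps({'text': 'slovenčiny'}, ensure_ascii=False).encode('utf-8')
```

Either body decodes to the same text on the receiving side; the UTF-8 variant is simply shorter for text with many non-ASCII characters.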

Citation

For the online demo:

Radovan Garabík, Kristína Bobeková: Lematizácia, morfologická anotácia a dezambiguácia slovenského textu – webové rozhranie. In: Slovenská reč, vol. 86, 2021, no. 1, pp. 104–109.

For the lemmatization & MSD:

Radovan Garabík, Denis Mitana: Analysing Accuracy of Slovak Language Lemmatization and MSD Tagging. In: Slovenská reč, vol. 88, 2023, no. 2, pp. 129–140.