Lemmatization & morphosyntactic analysis of Slovak
Online demo: https://mrd.juls.savba.sk/
The process uses an updated morphology database from the Slovak National Corpus, and MorphoDita for lemmatization and MSD (morphosyntactic description) tagging.
If you want to use the API, please obtain a key here first. API access is limited to about 10 requests per 5 minutes, and the input text is limited to about 16 KB.
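Because each request can carry only about 16 KB of text, a longer document has to be sent in pieces. Below is a minimal sketch of one way to split it, using only the Python standard library. It assumes a conservative limit of 15,000 UTF-8 bytes, because the exact server-side limit and how it is measured are not documented here. The names split_text, _pack and MAX_BYTES are hypothetical. The sketch packs whole sentences into each chunk so that the tagger sees complete sentences wherever possible.

```python
import re

# Assumption: "about 16KB" is read conservatively as 15,000 bytes of UTF-8.
MAX_BYTES = 15_000


def _pack(units, max_bytes):
    """Greedily join units with single spaces into strings of <= max_bytes (UTF-8)."""
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit  # may itself be too long; checked in split_text
    if current:
        chunks.append(current)
    return chunks


def split_text(text, max_bytes=MAX_BYTES):
    """Split text into chunks that each fit in max_bytes of UTF-8.

    Prefers sentence boundaries (., ! or ? followed by whitespace); a sentence
    that is too long on its own is split between words instead.
    """
    units = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if len(sentence.encode("utf-8")) <= max_bytes:
            units.append(sentence)
        else:
            units.extend(_pack(sentence.split(), max_bytes))
    chunks = _pack(units, max_bytes)
    if any(len(c.encode("utf-8")) > max_bytes for c in chunks):
        raise ValueError("a single word is longer than max_bytes")
    return chunks
```

Whitespace between sentences, including paragraph breaks, is collapsed to a single space. The sentence regex is naive and also splits after abbreviations such as "napr.". This is harmless here, because neighbouring pieces are rejoined whenever they fit in the same chunk.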
API examples
The output is application/text in either case. If the input text contains unusual characters (such as apostrophes or backslashes), make sure it is quoted properly.
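The curl examples wrap the text in single shell quotes, so an apostrophe inside the text would end the quoted string early. When such a command line is generated from a script, Python's shlex.quote can do the quoting. The sketch below assumes a POSIX shell and the form-encoded variant of the request. curl_command is a hypothetical helper name, and "abcdef" stands for your own key.

```python
import shlex

API_URL = "https://mrd.juls.savba.sk/api/"


def curl_command(text, key):
    """Return a POSIX-shell-safe curl command line for the form-encoded API call.

    shlex.quote wraps each argument in single quotes and escapes any single
    quotes inside it, so apostrophes and backslashes reach curl unchanged.
    """
    args = [
        "curl", "--compressed", "-X", "POST", API_URL,
        "-H", "Content-Type: application/x-www-form-urlencoded",
        "--data-urlencode", "key=" + key,
        "--data-urlencode", "text=" + text,
    ]
    return " ".join(shlex.quote(a) for a in args)


print(curl_command("Peter's test \\ x", "abcdef"))
```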
x-www-form-urlencoded
This is the default encoding for web forms and works well for mostly ASCII text (which Slovak mostly is). The text must be URL-encoded properly; in the example below, curl's --data-urlencode option does this for you.
curl -k --compressed -v -X 'POST' \
'https://mrd.juls.savba.sk/api/' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H Expect: \
--data-urlencode 'key=abcdef' \
--data-urlencode 'text=Toto je test API pre morfologickú analýzu. Analýzu slovenčiny.'
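The same form-encoded request can be sent from Python's standard library. Here urllib.parse.urlencode does the URL-encoding: spaces become "+" and the UTF-8 bytes of non-ASCII letters become %XX escapes. This is a sketch only. The helper names form_request and analyse are hypothetical, and decoding the reply as UTF-8 is an assumption, because the documentation only says the output is application/text.

```python
import urllib.parse
import urllib.request

API_URL = "https://mrd.juls.savba.sk/api/"


def form_request(text, key):
    """Build a POST request with an application/x-www-form-urlencoded body."""
    # urlencode percent-encodes non-ASCII UTF-8 bytes and turns spaces into '+'.
    body = urllib.parse.urlencode({"key": key, "text": text}).encode("ascii")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def analyse(text, key):
    """Send the request and return the response body (assumed to be UTF-8)."""
    with urllib.request.urlopen(form_request(text, key)) as resp:
        return resp.read().decode("utf-8")
```

For example, print(analyse('Toto je test API pre morfologickú analýzu.', 'abcdef')) sends one request, with 'abcdef' replaced by your key.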
JSON
This is an alternative approach, better suited to data encapsulation in higher-level programming languages. It is also preferable if your text contains many non-ASCII Unicode characters (which is not typically the case for Slovak).
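To illustrate the point about non-ASCII text, the following sketch compares request body sizes for a short string of Slovak letters with diacritics, using only the Python standard library. It assumes the server accepts raw UTF-8 in JSON bodies, as RFC 8259 requires for JSON exchanged between systems. Note that json.dumps escapes non-ASCII characters as \uXXXX by default, so the compact form needs ensure_ascii=False.

```python
import json
import urllib.parse

# Slovak letters with diacritics: each one is 2 bytes in UTF-8.
text = "ľščťžýáíé"

# Form encoding: every non-ASCII UTF-8 byte becomes a 3-character %XX escape.
form_body = urllib.parse.urlencode({"text": text}).encode("ascii")

# JSON with ensure_ascii=False keeps the characters as raw UTF-8 bytes.
json_raw = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")

# json.dumps' default (ensure_ascii=True) escapes each character as \uXXXX.
json_escaped = json.dumps({"text": text}).encode("ascii")

print(len(form_body), len(json_raw), len(json_escaped))  # 59 30 66
```

With the default escaping, the JSON body is the largest of the three. The size advantage therefore appears only with ensure_ascii=False, although both JSON forms decode to the same text.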
curl -k --compressed -v -X 'POST' \
'https://mrd.juls.savba.sk/api/' \
-H 'Content-Type: application/json' \
-H Expect: \
-d '{"text": "Toto je test API pre morfologickú analýzu. Analýzu slovenčiny.",
"key": "abcdef"
}'
Example of using JSON in Python:
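import json
import urllib.error
import urllib.request

url = 'https://mrd.juls.savba.sk/api/'
data = json.dumps(
    {'text': 'Toto je test API pre morfologickú analýzu. Analýzu slovenčiny.',
     'key': 'abcdef',
     }
).encode('utf-8')

response = None
try:
    req = urllib.request.Request(url, data=data, method='POST')
    req.add_header('Content-Type', 'application/json')
    response = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    # An HTTPError can be read like a response; its body may hold the server's message.
    print('ERROR happened:', e.code)
    response = e
except urllib.error.URLError as e:
    # Network-level failure (DNS, refused connection, TLS, ...): there is no body to read.
    print('ERROR happened:', e.reason)

if response is not None:
    print(response.read().decode('utf-8'))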
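Because of the limit of about 10 requests per 5 minutes, a batch of texts (for example, chunks of a long document) should be sent at a throttled pace. The sketch below builds on the Python example above. It assumes the limit is enforced as an average of one request every 30 seconds; the server's actual rate-limiting scheme is not documented here. The names post_json and analyse_all are hypothetical helpers. The clock and sleep functions are passed in as parameters, so the pacing can be tested without real waiting.

```python
import json
import time
import urllib.request

API_URL = "https://mrd.juls.savba.sk/api/"
# Assumption: 10 requests per 5 minutes -> at most one request every 30 s.
MIN_INTERVAL = 300 / 10


def post_json(text, key):
    """POST one text as JSON and return the reply (assumed to be UTF-8)."""
    data = json.dumps({"text": text, "key": key}).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def analyse_all(texts, send, min_interval=MIN_INTERVAL,
                clock=time.monotonic, sleep=time.sleep):
    """Call send(text) for each text, starting calls at least min_interval s apart."""
    results = []
    last = None
    for text in texts:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        results.append(send(text))
    return results
```

Usage: analyse_all(texts, lambda t: post_json(t, 'abcdef')), with 'abcdef' replaced by your key. A failed request raises an exception from urlopen and stops the batch. Catch it inside the send function if partial results should be kept.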
Links
- original morphological database: https://korpus.sk/en/corpora-and-databases/databases/morphology-database/
Citation
For the online demo:
Radovan Garabík, Kristína Bobeková. Lematizácia, morfologická anotácia a dezambiguácia slovenského textu – webové rozhranie [Lemmatization, morphological annotation and disambiguation of Slovak text: a web interface]. In: Slovenská reč, 2021, vol. 86, no. 1, pp. 104–109.
For the lemmatization & MSD:
Radovan Garabík, Denis Mitana. Analysing Accuracy of Slovak Language Lemmatization and MSD Tagging. In: Slovenská reč, 2023, vol. 88, no. 2, pp. 129–140.