MATAS v0.2 - Morphologically Annotated Lithuanian Corpus (manually checked)
Consists of four parts: Documents (21%), Fiction (19%), Periodicals (36%), Scientific texts (24%)
Wordform count: 1,641,263
Files: 92
Encoding: UTF-8
Tagset:
Human-readable (Lithuanian tags)
e.g. <word="liepos" lemma="liepa" type="dktv mot.gim vnsk K">
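A token line in this shape could be read with a short Python sketch like the one below. It assumes every annotated token follows the pattern of the single example above (attributes word, lemma, type, in that order, double-quoted); the names TOKEN_RE and parse_token are hypothetical and not part of the corpus distribution. In the example, the tags read as noun (dktv), feminine gender (mot.gim), singular (vnsk), genitive (K).

```python
import re

# Assumed token shape, inferred only from the one example in this README:
#   <word="..." lemma="..." type="...">
TOKEN_RE = re.compile(
    r'<word="(?P<word>[^"]*)"\s+lemma="(?P<lemma>[^"]*)"\s+type="(?P<type>[^"]*)">'
)


def parse_token(line):
    """Return a dict with word, lemma and a list of tags, or None if no match."""
    match = TOKEN_RE.search(line)
    if match is None:
        return None
    return {
        "word": match.group("word"),
        "lemma": match.group("lemma"),
        # The type attribute holds space-separated Lithuanian tags.
        "tags": match.group("type").split(),
    }


example = '<word="liepos" lemma="liepa" type="dktv mot.gim vnsk K">'
token = parse_token(example)
print(token["word"], token["lemma"], token["tags"])
```

Lines that do not match the assumed pattern return None, so punctuation or other markup in the corpus files can be skipped or handled separately.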
Date:
2014.08.06
Please use the following text to cite this item:
Rimkutė E., Daudaravičius V., Utka A. 2007: Morphological Annotation of the Lithuanian Corpus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; Workshop Balto-Slavonic Natural Language Processing 2007, Prague, 94–99.
Licence:
CLARIN-LT ACA