dc.contributor.author |
Rimkutė, Erika |
dc.contributor.author |
Bielinskienė, Agnė |
dc.contributor.author |
Dadurkevičius, Virginijus |
dc.contributor.author |
Kovalevskaitė, Jolanta |
dc.contributor.author |
Utka, Andrius |
dc.contributor.author |
Boizou, Loïc |
dc.date.accessioned |
2019-12-22T23:16:25Z |
dc.date.available |
2019-12-22T23:16:25Z |
dc.date.issued |
2019-12-23 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/33 |
dc.description |
MATAS corpus (version 1.0)
DESCRIPTION
Manually checked, morphologically annotated corpus MATAS
FORMATS
1. CoNLL-U (CONLLU, conllu)
2. Sketch Engine - tab-delimited, word per line (TAB-WPL, txt)
SIZE
Wordform count: 1,693,410
Sentence count: 144,047
GENRES
Contains 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%)
TAGSETS
Morphological annotation is presented with 3 different tagsets:
- Universal Dependencies (POS in column 4, morphological categories in column 6), see universaldependencies.org;
- Jablonskis (column 5), see Documentation folder;
- Multext-EAST (column 10), see Documentation folder.
JABLONSKIS AND MULTEXT-EAST TAGSETS
Jablonskis -> Lithuanian tagset -> human-readable
Multext-EAST -> English tagset -> machine-readable
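EXAMPLE (illustrative sketch, not part of the corpus distribution)
The column positions listed under TAGSETS can be read from a CoNLL-U token line roughly as shown below. This is a minimal sketch that assumes the standard 10-column, tab-separated CoNLL-U layout. The helper name parse_token_line, the TokenTags type, and the sample line are hypothetical; the values JABLONSKIS_TAG and MULTEXT_TAG are placeholders, not real tags from the corpus.

```python
from typing import Dict, NamedTuple


class TokenTags(NamedTuple):
    """The three annotations of one token (hypothetical container type)."""
    form: str
    ud_pos: str
    ud_feats: Dict[str, str]
    jablonskis: str
    multext_east: str


def parse_token_line(line: str) -> TokenTags:
    """Split one CoNLL-U token line and pick out the three tagsets.

    Column numbers follow the description above (1-based):
    4 = UD POS, 5 = Jablonskis, 6 = UD morphological categories,
    10 = Multext-EAST.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")

    # UD features are written as Key=Value pairs joined by '|'; '_' means none.
    feats: Dict[str, str] = {}
    if cols[5] != "_":
        for pair in cols[5].split("|"):
            key, _, value = pair.partition("=")
            feats[key] = value

    return TokenTags(
        form=cols[1],
        ud_pos=cols[3],
        ud_feats=feats,
        jablonskis=cols[4],
        multext_east=cols[9],
    )


# Invented sample line; the Jablonskis and Multext-EAST values are placeholders.
SAMPLE = "1\tnamas\tnamas\tNOUN\tJABLONSKIS_TAG\tCase=Nom|Number=Sing\t_\t_\t_\tMULTEXT_TAG"

token = parse_token_line(SAMPLE)
print(token.form, token.ud_pos, token.ud_feats, token.jablonskis, token.multext_east)
```

Comment lines (starting with "#") and blank sentence separators in a CoNLL-U file would need to be skipped before calling the helper; that handling is left out to keep the sketch short.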
Please use the following text to cite this item:
Rimkutė E., Daudaravičius V., Utka A. 2007: Morphological Annotation of the Lithuanian Corpus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; Workshop Balto-Slavonic Natural Language Processing 2007, Prague, 94–99. |
dc.language.iso |
lit |
dc.publisher |
Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.subject |
morphologically annotated |
dc.subject |
POS tagged |
dc.subject |
Lithuanian |
dc.title |
Lithuanian morphologically annotated corpus - MATAS v1.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University |
sponsor |
Vytautas Magnus University MTI-02/2015 Information System of Syntactic-Semantic Analysis of Lithuanian Language: Development of Public Services (SEMANTIKA-2) euFunds |
size.info |
1693410 words |
size.info |
276 files |
files.size |
34549005 |
files.count |
3 |