dc.contributor.author |
Rimkutė, Erika |
dc.contributor.author |
Bielinskienė, Agnė |
dc.contributor.author |
Dadurkevičius, Virginijus |
dc.contributor.author |
Kovalevskaitė, Jolanta |
dc.contributor.author |
Utka, Andrius |
dc.contributor.author |
Boizou, Loïc |
dc.date.accessioned |
2024-12-20T07:08:43Z |
dc.date.available |
2024-12-20T07:08:43Z |
dc.date.issued |
2024-12-19 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/61 |
dc.description |
MATAS corpus (version 3.0)
DESCRIPTION
Updated, manually checked, morphologically annotated corpus MATAS
LANGUAGE
Lithuanian
PREVIOUS VERSIONS
1. MATAS v0.2 (http://hdl.handle.net/20.500.11821/9)
2. MATAS v1.0 (http://hdl.handle.net/20.500.11821/33)
FORMATS, STANDARDS
1. CoNLL-U (https://universaldependencies.org/format.html)
2. JABLONSKIS tagset v2 (https://sitti.vdu.lt/jablonskis-en/)
3. MULTEXT-East tagset (http://nl.ijs.si/ME/V4/msd/html/index.html)
4. UTF-8
SIZE
Tokens (incl. punctuation): 2,137,287
Words: 1,694,819
Sentences: 144,047
Documents: 1,234
GENRES
Contains 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%)
PUBLISHER
Institute of Digital Resources and Interdisciplinary Research (SITTI), Vytautas Magnus University |
dc.language.iso |
lit |
dc.publisher |
Institute of Digital Resources and Interdisciplinary Research (SITTI), Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.subject |
corpus |
dc.subject |
Lithuanian |
dc.subject |
annotated |
dc.subject |
morphology |
dc.subject |
CoNLL-U |
dc.title |
Lithuanian morphologically annotated corpus - MATAS v3.0 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Erika Rimkutė erika.rimkute@vdu.lt Vytautas Magnus University |
size.info |
2137287 tokens |
files.size |
24224266 |
files.count |
3 |
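
The SIZE figures in the description (tokens incl. punctuation, words, sentences) are the kind of counts that can be recomputed from a CoNLL-U file. The following is a minimal sketch of one way to do that, not the procedure the corpus authors used. It assumes that a "word" is any token whose form is not made up entirely of Unicode punctuation characters, and that multiword-token ranges (IDs such as `1-2`) and empty nodes (IDs such as `1.1`) should not be counted. The names `count_conllu`, `CorpusCounts` and `is_punctuation`, and the sample sentences, are hypothetical.

```python
import unicodedata
from typing import Iterable, NamedTuple


class CorpusCounts(NamedTuple):
    tokens: int     # all token lines, punctuation included
    words: int      # tokens that are not pure punctuation (assumption)
    sentences: int  # blank-line-delimited blocks with at least one token


def is_punctuation(form: str) -> bool:
    """True if every character of the form is in a Unicode 'P*' category."""
    return bool(form) and all(
        unicodedata.category(ch).startswith("P") for ch in form
    )


def count_conllu(lines: Iterable[str]) -> CorpusCounts:
    tokens = words = sentences = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # comment / sentence-level metadata line
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed line: no FORM column
        token_id, form = fields[0], fields[1]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node
        tokens += 1
        in_sentence = True
        if not is_punctuation(form):
            words += 1
    if in_sentence:
        sentences += 1  # file did not end with a blank line
    return CorpusCounts(tokens, words, sentences)


# Hypothetical two-sentence sample; only ID and FORM are filled in.
BLANK = "\t".join(["_"] * 8)
SAMPLE = [
    "# sent_id = 1",
    f"1\tLabas\t{BLANK}",
    f"2\t,\t{BLANK}",
    f"3\tpasauli\t{BLANK}",
    f"4\t!\t{BLANK}",
    "",
    "# sent_id = 2",
    f"1\tAčiū\t{BLANK}",
    f"2\t.\t{BLANK}",
    "",
]

counts = count_conllu(SAMPLE)
print(counts)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `count_conllu(open("matas.conllu", encoding="utf-8"))`, where `matas.conllu` is a placeholder file name. The resulting numbers may differ from the published figures if the corpus defines words or punctuation differently.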