| dc.contributor.author | Rimkutė, Erika |
| dc.contributor.author | Bielinskienė, Agnė |
| dc.contributor.author | Dadurkevičius, Virginijus |
| dc.contributor.author | Kovalevskaitė, Jolanta |
| dc.contributor.author | Utka, Andrius |
| dc.contributor.author | Boizou, Loïc |
| dc.date.accessioned | 2019-12-22T23:16:25Z |
| dc.date.available | 2019-12-22T23:16:25Z |
| dc.date.issued | 2019-12-23 |
| dc.identifier.uri | http://hdl.handle.net/20.500.11821/33 |
| dc.description | MATAS corpus (version 1.0). DESCRIPTION: Manually checked, morphologically annotated corpus MATAS. FORMATS: 1. CoNLL-U (CONLLU, .conllu); 2. SketchEngine tab-delimited word-per-line (TAB-WPL, .txt). SIZE: wordform count: 1,693,410; sentence count: 144,047. GENRES: 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%). TAGSETS: The morphological annotation is presented in 3 different tagsets: Universal Dependencies (POS in column 4, morphological categories in column 6), see universaldependencies.org; Jablonskis (column 5), see the Documentation folder; Multext-East (column 10), see the Documentation folder. JABLONSKIS AND MULTEXT-EAST TAGSETS: Jablonskis is a Lithuanian, human-readable tagset; Multext-East is an English, machine-readable tagset. Please use the following text to cite this item: Rimkutė E., Daudaravičius V., Utka A. 2007: Morphological Annotation of the Lithuanian Corpus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; Workshop Balto-Slavonic Natural Language Processing 2007, Prague, 94–99. |
| dc.language.iso | lit |
| dc.publisher | Vytautas Magnus University |
| dc.rights | PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
| dc.rights.uri | https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
| dc.rights.label | PUB |
| dc.subject | morphologically annotated corpus |
| dc.subject | POS tagged |
| dc.subject | Lithuanian |
| dc.title | Lithuanian morphologically annotated corpus - MATAS v1.0 |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| hidden | false |
| hasMetadata | false |
| has.files | yes |
| branding | CLARIN-LT |
| contact.person | Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University |
| sponsor | Vytautas Magnus University MTI-02/2015 Information System of Syntactic-Semantic Analysis of Lithuanian Language: Development of Public Services (SEMANTIKA-2) euFunds |
| size.info | 1693410 words |
| size.info | 276 files |
| files.size | 34549005 |
| files.count | 3 |
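The description states that each CoNLL-U token line carries three tagsets in different columns (UD POS in column 4, UD features in column 6, Jablonskis in column 5, Multext-East in column 10). The following is a minimal Python sketch of pulling those columns out of a `.conllu` file. It assumes the column numbers refer to the standard 10-column CoNLL-U layout (so Jablonskis sits in XPOS and Multext-East in MISC); the function names and the sample line are hypothetical illustrations, and the placeholder tag values are not real MATAS tags.

```python
from typing import Dict, Iterable, Iterator, List

# Standard CoNLL-U field names: 10 tab-separated columns per token line.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a FEATS value like 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def read_tokens(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per word line, skipping comments, blank lines,
    multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '1.1')."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols: List[str] = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        row = dict(zip(CONLLU_FIELDS, cols))
        if "-" in row["id"] or "." in row["id"]:
            continue
        yield {
            "form": row["form"],
            "upos": row["upos"],                    # column 4: UD part of speech
            "ud_feats": parse_feats(row["feats"]),  # column 6: UD morphological features
            "jablonskis": row["xpos"],              # column 5 (assumed XPOS): Jablonskis tag
            "multext_east": row["misc"],            # column 10 (assumed MISC): Multext-East tag
        }


# Hypothetical sample; the tag values are placeholders, not actual MATAS annotation.
sample = [
    "# sent_id = hypothetical-1",
    "1\tnamas\tnamas\tNOUN\tJABLONSKIS_TAG\tCase=Nom|Gender=Masc|Number=Sing\t_\t_\t_\tMULTEXT_TAG",
    "",
]

tokens = list(read_tokens(sample))
for tok in tokens:
    print(tok)
```

To process a real file, pass an open file handle instead of the sample list, e.g. `read_tokens(open(path, encoding="utf-8"))`, where `path` is whichever `.conllu` file from the distribution you are reading.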