dc.contributor.author Rimkutė, Erika
dc.contributor.author Bielinskienė, Agnė
dc.contributor.author Dadurkevičius, Virginijus
dc.contributor.author Kovalevskaitė, Jolanta
dc.contributor.author Utka, Andrius
dc.contributor.author Boizou, Loïc
dc.date.accessioned 2019-12-22T23:16:25Z
dc.date.available 2019-12-22T23:16:25Z
dc.date.issued 2019-12-23
dc.identifier.uri http://hdl.handle.net/20.500.11821/33
dc.description MATAS corpus (version 1.0)

DESCRIPTION
Manually checked, morphologically annotated corpus MATAS

FORMATS
1. CoNLL-U (CONLLU, conllu)
2. SketchEngine - tab delimited word per line (TAB-WPL, txt)

SIZE
Wordform count: 1,693,410
Sentence count: 144,047

GENRES
Contains 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%)

TAGSETS
Morphological annotation is presented with 3 different tagsets:
- Universal Dependencies (POS in column 4, morphological categories in column 6), see universaldependencies.org;
- Jablonskis (column 5), see Documentation folder;
- Multext-EAST (column 10), see Documentation folder.

JABLONSKIS AND MULTEXT-EAST TAGSETS
Jablonskis -> Lithuanian tagset -> human-readable
Multext-East -> English tagset -> machine-readable

Please use the following text to cite this item:
Rimkutė E., Daudaravičius V., Utka A. 2007: Morphological Annotation of the Lithuanian Corpus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; Workshop Balto-Slavonic Natural Language Processing 2007, Prague, 94–99.
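The column layout described above (UD POS in column 4, Jablonskis in column 5, UD morphological categories in column 6, Multext-East in column 10) can be read with a few lines of Python. This is a minimal sketch, not the corpus authors' tooling: it assumes the CoNLL-U files follow the standard 10-column, tab-separated layout with word form in column 2 and lemma in column 3, and the names `COLUMNS` and `parse_conllu` are hypothetical. The sample sentence and its tag values are invented placeholders, not data from MATAS.

```python
from typing import Dict, Iterable, Iterator, List

# 1-based column numbers. Columns 4, 5, 6 and 10 are taken from the corpus
# description; columns 2 and 3 (form, lemma) come from the CoNLL-U standard.
COLUMNS = {
    "form": 2,
    "lemma": 3,
    "upos": 4,          # Universal Dependencies POS
    "jablonskis": 5,    # Jablonskis (Lithuanian) tagset
    "feats": 6,         # Universal Dependencies morphological categories
    "multext_east": 10, # Multext-East tagset
}


def parse_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines carry no token data.
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
        sentence.append({name: fields[col - 1] for name, col in COLUMNS.items()})
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Invented placeholder sentence; the tag strings are NOT real corpus tags.
SAMPLE = (
    "# placeholder sentence, not taken from the corpus\n"
    "1\tword1\tlemma1\tNOUN\tJABL_TAG_1\tCase=Nom|Number=Sing\t_\t_\t_\tMTE_TAG_1\n"
    "2\t.\t.\tPUNCT\tJABL_TAG_2\t_\t_\t_\t_\tMTE_TAG_2\n"
    "\n"
)

sentences = list(parse_conllu(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["form"], token["upos"], token["jablonskis"], token["multext_east"])
```

To run it against the real data, pass an open file handle for one of the `.conllu` files from the archive to `parse_conllu` in place of the sample lines.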
dc.language.iso lit
dc.publisher Vytautas Magnus University
dc.rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label PUB
dc.subject morphologically annotated
dc.subject POS tagged
dc.subject Lithuanian
dc.title Lithuanian morphologically annotated corpus - MATAS v1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CLARIN-LT
contact.person Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University
sponsor Vytautas Magnus University MTI-02/2015 Information System of Syntactic-Semantic Analysis of Lithuanian Language: Development of Public Services (SEMANTIKA-2) euFunds
size.info 1693410 words
size.info 276 files
files.size 34549005
files.count 3


Files in this item (32.95 MB in total)

This item is Publicly Available and licensed under: PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
Name            Size      Format           Description
readme-EN.txt   3.26 KB   Text file        readme-EN
readme-LT.txt   3.99 KB   Text file        readme-LT
MATAS-v1.0.zip  32.94 MB  application/zip