dc.contributor.author Dadurkevičius, Virginijus
dc.date.accessioned 2020-06-30T13:32:25Z
dc.date.available 2020-06-30T13:32:25Z
dc.date.issued 2020-06-30
dc.identifier.uri http://hdl.handle.net/20.500.11821/36
dc.description The resource is the assessment data of The Dictionary of Modern Lithuanian, 6th edition (DML6) [2], from the point of view of its coverage in the Joint Corpus of Lithuanian (JCL) [1]. The JCL is a merge of three corpora: 1) a Vilnius University corpus compiled from Lithuanian internet content from 2014 and primarily used for machine translation, 2) a legal document corpus in the form of a wordlist (courtesy of the Office of the Seimas of the Republic of Lithuania, 2011) and 3) the balanced corpus of present-day Lithuanian of Vytautas Magnus University (VMU). The total size of the JCL is more than 1.3 billion tokens. The resource consists of 5 files:
1. Frequency list of types (distinct tokens) in JCL versus DML6. Format: type<TAB>count<TAB>occurrence_in_dml6 (0 – no, 1 – main entries, 2 – geographic names, 3 – abbreviations).
2. List of explicit lemmas in DML6 versus JCL. Format: lemma<TAB>part_of_speech<TAB>occurrence_in_JCL, where occurrence_in_JCL is the count of all tokens in JCL that can be interpreted as a wordform of the particular lemma. Possible part_of_speech values: N – noun, V – verb, A – adjective, P – pronoun, R – adverb, S – preposition, C – conjunction, M – numeral, Q – particle, I – interjection, O – onomatopoeia, Y – abbreviation.
3. Hunspell affixes (inflection rules) for the Lithuanian language.
4. Hunspell dictionary, constructed from both explicit and implicit DML6 lemmas.
5. List of 254726 word-forms of JCL that are missing from DML6, filtered to exclude misspellings, foreign words, proper names, etc. Format: type<TAB>count.
Literature:
[1] Dadurkevičius, V., Petrauskaitė, R. 2020: Corpus based methods for assessment of the traditional dictionaries. Human language technologies - the Baltic perspective: the 9th international conference Baltic HLT, Kaunas, Lithuania, September 22–23, 2020.
[2] The Dictionary of Modern Lithuanian. Edited by Keinys S. 6th (3rd electronic) edition of the Dabartinės lietuvių kalbos žodynas. 2006, ISBN 978-9955-704-37-9.
dc.language.iso lit
dc.publisher Vilnius University
dc.publisher Vytautas Magnus University
dc.rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label PUB
dc.subject Lithuanian
dc.subject wordlist
dc.title Assessment Data of the Dictionary of Modern Lithuanian versus Joint Corpora
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CLARIN-LT
contact.person Virginijus Dadurkevičius virginijus.dadurkevicius@vdu.lt Vytautas Magnus University
size.info 85893 entries
size.info 4968125 entries
size.info 202275 entries
size.info 4932 rules
files.size 23396854
files.count 1


Files in this item

This item is publicly available and licensed under: PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT

Name: DML6_vs_JCL.zip
Size: 22.31 MB
Format: application/zip
Description: The assessment data of The Dictionary of Modern Lithuanian, 6th edition, from the point of view of its coverage in the Joint Corpus of Lithuanian
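
The tab-separated layouts given in the description (type, count, occurrence_in_dml6 for the frequency list; lemma, part_of_speech, occurrence_in_JCL for the lemma list) could be read along the following lines. This is only a minimal sketch: the function names are hypothetical, the sample rows are made up for illustration and are not taken from the resource, and the file names inside the archive are not stated in this record, so the example reads from in-memory strings instead.

```python
import csv
import io

# Part-of-speech codes for the lemma list, as listed in the description.
POS_NAMES = {
    "N": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "R": "adverb", "S": "preposition", "C": "conjunction", "M": "numeral",
    "Q": "particle", "I": "interjection", "O": "onomatopoeia", "Y": "abbreviation",
}


def read_type_list(handle):
    """Yield (type, count, dml6_code) from type<TAB>count<TAB>occurrence_in_dml6 rows."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield row[0], int(row[1]), int(row[2])


def read_lemma_list(handle):
    """Yield (lemma, pos_name, count) from lemma<TAB>part_of_speech<TAB>occurrence_in_JCL rows."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        # Unknown codes are passed through unchanged rather than guessed.
        yield row[0], POS_NAMES.get(row[1], row[1]), int(row[2])


def token_coverage(rows):
    """Share of corpus tokens whose type occurs in DML6 (code 1, 2 or 3)."""
    covered = 0
    total = 0
    for _type, count, code in rows:
        total += count
        if code != 0:
            covered += count
    return covered / total if total else 0.0


# Made-up rows for illustration only; they are NOT data from the resource.
SAMPLE_TYPES = "ir\t60\t1\nVilnius\t30\t2\nxyzq\t10\t0\n"
SAMPLE_LEMMAS = "namas\tN\t120\neiti\tV\t300\n"

coverage = token_coverage(read_type_list(io.StringIO(SAMPLE_TYPES)))
lemmas = list(read_lemma_list(io.StringIO(SAMPLE_LEMMAS)))
print(f"token coverage: {coverage:.2f}")
print(lemmas)
```

For the real files, the same functions would take a handle opened on the extracted file; the text encoding is not stated in this record, so UTF-8 would be an assumption to verify.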
