
 
dc.contributor.author Bielinskienė, Agnė
dc.contributor.author Boizou, Loïc
dc.contributor.author Bumbulienė, Ieva
dc.contributor.author Kovalevskaitė, Jolanta
dc.contributor.author Krilavičius, Tomas
dc.contributor.author Mandravickaitė, Justina
dc.contributor.author Rimkutė, Erika
dc.contributor.author Vilkaitė-Lozdienė, Laura
dc.date.accessioned 2019-11-27T13:56:38Z
dc.date.available 2019-11-27T13:56:38Z
dc.date.issued 2019
dc.identifier.uri http://hdl.handle.net/20.500.11821/25
dc.description Dataset of 2-grams with frequencies extracted from the Delfi.lt corpus (approximately 70 million words, covering March 2014 to November 2016). First, the corpus was split into sentences; then symbol analysis and analysis of intended structures made of symbols were performed. A dictionary of abbreviations was also used so that abbreviations were preserved. Finally, 2-grams were generated, yielding 67 million entries in total. The frequency of each entry is included in the dataset.
dc.language.iso lit
dc.publisher Baltic Institute of Advanced Technology
dc.publisher Vytautas Magnus University
dc.rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label PUB
dc.source.uri http://mwe.lt/
dc.subject n-grams
dc.subject Lithuanian
dc.title Lithuanian 2-gram dataset
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN-LT
contact.person Tomas Krilavičius tomas.krilavicius@vdu.lt Baltic Institute of Advanced Technology; Vytautas Magnus University
sponsor Research Council of Lithuania LIP-027/2016 Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU) nationalFunds
size.info 67000000 entries
files.size 94296211
files.count 1
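
The processing steps named in dc.description (sentence splitting, abbreviation preservation, 2-gram generation with frequencies) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abbreviation set, the naive sentence splitter, the punctuation stripping, and the function names `split_sentences`, `tokenize`, and `count_bigrams` are all assumptions made for illustration, and the symbol analysis step is not reproduced.

```python
from collections import Counter

# Hypothetical, tiny abbreviation list for illustration only;
# the dictionary actually used for the dataset is not part of this record.
ABBREVIATIONS = {"pvz.", "prof.", "dr."}


def split_sentences(text):
    """Naive splitter: end a sentence after ., ! or ? unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def tokenize(tokens):
    """Lowercase tokens; keep abbreviations intact, strip surrounding punctuation from the rest."""
    words = []
    for token in tokens:
        lowered = token.lower()
        if lowered in ABBREVIATIONS:
            words.append(lowered)
        else:
            stripped = lowered.strip(".,!?;:\"()")
            if stripped:
                words.append(stripped)
    return words


def count_bigrams(text):
    """Count 2-grams within each sentence, so no pair spans a sentence boundary."""
    counts = Counter()
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        counts.update(zip(words, words[1:]))
    return counts


counts = count_bigrams("Prof. Jonas skaito knygą. Jonas skaito laikraštį.")
print(counts[("jonas", "skaito")])  # 2
```

Counting per sentence means the last word of one sentence is never paired with the first word of the next, which is one plausible reason to split into sentences before generating 2-grams.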


 Files in this item

This item is Publicly Available and licensed under: PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
Name 2gram.zip
Size 89.93 MB
Format application/zip
Description Lithuanian 2gram dataset
