dc.contributor.author |
Bielinskienė, Agnė |
dc.contributor.author |
Boizou, Loïc |
dc.contributor.author |
Bumbulienė, Ieva |
dc.contributor.author |
Kovalevskaitė, Jolanta |
dc.contributor.author |
Krilavičius, Tomas |
dc.contributor.author |
Mandravickaitė, Justina |
dc.contributor.author |
Rimkutė, Erika |
dc.contributor.author |
Vilkaitė-Lozdienė, Laura |
dc.date.accessioned |
2019-11-27T13:56:38Z |
dc.date.available |
2019-11-27T13:56:38Z |
dc.date.issued |
2019 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/25 |
dc.description |
Dataset of 2-grams with frequencies, extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences; then symbol analysis and analysis of intended structures made of symbols were performed. A dictionary of abbreviations was also used in order to preserve various abbreviations. Finally, 2-grams were generated, yielding 67 million entries in total. The frequency of each entry is included in the dataset. |
dc.language.iso |
lit |
dc.publisher |
Baltic Institute of Advanced Technology |
dc.publisher |
Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.source.uri |
http://mwe.lt/ |
dc.subject |
n-grams |
dc.subject |
Lithuanian |
dc.title |
Lithuanian 2-gram dataset |
dc.type |
lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType |
other |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Tomas Krilavičius, tomas.krilavicius@vdu.lt, Baltic Institute of Advanced Technology; Vytautas Magnus University |
sponsor |
Research Council of Lithuania; LIP-027/2016; Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU); nationalFunds |
size.info |
67000000 entries |
files.size |
94296211 |
files.count |
1 |
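
The processing steps named in dc.description (sentence splitting, abbreviation preservation, 2-gram generation with frequencies) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abbreviation list, the whitespace tokenization, the lowercasing, and the choice to count 2-grams only within sentence boundaries are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical abbreviation list; the dictionary actually used for the
# dataset is not included in this record.
ABBREVIATIONS = {"pvz.", "t.y.", "m.", "d."}


def split_sentences(text):
    """Naively split whitespace tokens into sentences.

    A token ending in '.', '!' or '?' closes a sentence unless it is a
    known abbreviation (assumed rule, for illustration only).
    """
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def count_bigrams(text):
    """Count 2-grams inside each sentence and return a Counter."""
    counts = Counter()
    for tokens in split_sentences(text):
        # Keep abbreviations intact; strip surrounding punctuation elsewhere.
        words = [
            t.lower() if t.lower() in ABBREVIATIONS else t.strip(".,!?;:").lower()
            for t in tokens
        ]
        words = [w for w in words if w]
        # Pair each word with its successor to form the 2-grams.
        counts.update(zip(words, words[1:]))
    return counts


sample = "Jis gyvena Kaune. Jis gyvena Vilniuje nuo 2014 m. kovo."
bigram_counts = count_bigrams(sample)
```

Here `bigram_counts` maps each word pair to its frequency, and the abbreviation "m." does not end the second sentence, so the pair ("2014", "m.") is kept as one 2-gram.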