dc.contributor.author |
Bielinskienė, Agnė |
dc.contributor.author |
Boizou, Loïc |
dc.contributor.author |
Bumbulienė, Ieva |
dc.contributor.author |
Kovalevskaitė, Jolanta |
dc.contributor.author |
Krilavičius, Tomas |
dc.contributor.author |
Mandravickaitė, Justina |
dc.contributor.author |
Rimkutė, Erika |
dc.contributor.author |
Vilkaitė-Lozdienė, Laura |
dc.date.accessioned |
2019-11-27T13:57:19Z |
dc.date.available |
2019-11-27T13:57:19Z |
dc.date.issued |
2019 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/26 |
dc.description |
GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1) the vocabulary was compiled, eliminating words with a frequency of less than 5; 2) a word co-occurrence matrix was generated with a window size of 5; 3) this matrix was randomly shuffled; 4) word vectors were generated (100 iterations, 200 dimensions). The final result consists of 331 203 unique word vectors. |
dc.language.iso |
lit |
dc.publisher |
Baltic Institute of Advanced Technology |
dc.publisher |
Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.source.uri |
http://mwe.lt/ |
dc.subject |
word embeddings |
dc.subject |
Lithuanian |
dc.subject |
embeddings |
dc.title |
Lithuanian Word embeddings |
dc.type |
lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType |
other |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Tomas Krilavičius tomas.krilavicius@vdu.lt Baltic Institute of Advanced Technology; Vytautas Magnus University |
sponsor |
The Research Council of Lithuania LIP-027/2016 Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU) nationalFunds |
size.info |
331 203 entries |
files.size |
239127890 |
files.count |
1 |
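
The preprocessing stages listed in dc.description (vocabulary with a minimum frequency of 5, co-occurrence counts with a window of 5, random shuffling) can be illustrated with a minimal Python sketch. This is not the code used to build the resource: the function names `build_vocab`, `cooccurrences` and `shuffled_records` are hypothetical, the 1/distance weighting and the removal of rare words before windowing are assumptions of this sketch, and stage 4 (fitting the 200-dimensional vectors) is not shown.

```python
import random
from collections import Counter, defaultdict


def build_vocab(tokens, min_count=5):
    """Stage 1: keep only words that occur at least `min_count` times."""
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if n >= min_count}


def cooccurrences(tokens, vocab, window=5):
    """Stage 2: symmetric co-occurrence counts within `window` tokens.

    Assumption of this sketch: out-of-vocabulary tokens are dropped
    before windowing, and each pair is weighted by 1 / distance.
    """
    kept = [t for t in tokens if t in vocab]
    cooc = defaultdict(float)
    for i, word in enumerate(kept):
        for distance in range(1, window + 1):
            if i - distance < 0:
                break
            context = kept[i - distance]
            weight = 1.0 / distance
            cooc[(word, context)] += weight
            cooc[(context, word)] += weight
    return dict(cooc)


def shuffled_records(cooc, seed=0):
    """Stage 3: turn the sparse matrix into records and shuffle them."""
    records = [(word, context, value) for (word, context), value in cooc.items()]
    random.Random(seed).shuffle(records)
    return records
```

Calling `shuffled_records(cooccurrences(tokens, build_vocab(tokens)))` on a tokenized corpus yields `(word, context, value)` triples in random order, which is the kind of input a GloVe-style trainer iterates over.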