What's New
corpus

Description:
The English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. Version 1 of the corpus was compiled for the bilingual terminology ...
This item contains 3 files (9.1 MB).
Publicly Available
corpus

Description:
Two Lithuanian-language children’s corpora, collected during the EMVAKA project, consist of Lithuanian language production by children aged 7–13:
(1) spoken (73 files, c. 31,000 tokens) and written (77 files, c. 7,600 ...
This item contains 4 files (245.91 KB).
Academic Use


corpus

Description:
MATAS corpus (version 3.0)
DESCRIPTION
Updated, manually checked, morphologically annotated corpus MATAS
LANGUAGE
Lithuanian
PREVIOUS VERSIONS
1. MATAS v0.2 (http://hdl.handle.net/20.500.11821/9)
2. MATAS v1.0 ...
This item contains 3 files (23.1 MB).
Publicly Available
Most Viewed Items
Top Last Week
toolService

Description:
Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. A state-of-the-art tool representing words/tokens as contextually dependent word embeddings, ...
This item contains 3 files (1.83 GB).
Publicly Available
corpus

Description:
The specialised "Corpus of Discourse on Crime" is synchronic, monolingual, and unannotated, and consists of two subcorpora.
Subcorpus 1: all texts on crime, published in criminal columns on the most popular Lithuanian web portals ...
This item contains 1 file (1.11 MB).
Publicly Available
corpus

Description:
The Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent current Lithuanian. The corpus has been compiled since 1990. The corpus is designed to ...
This item contains no files.