dc.contributor.author |
Utka, Andrius |
dc.contributor.author |
Rackevičienė, Sigita |
dc.contributor.author |
Rokas, Aivaras |
dc.contributor.author |
Bielinskienė, Agnė |
dc.contributor.author |
Mockienė, Liudmila |
dc.contributor.author |
Laurinaitis, Marius |
dc.date.accessioned |
2022-02-05T22:23:36Z |
dc.date.available |
2022-02-05T22:23:36Z |
dc.date.issued |
2022-02-05 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/46 |
dc.description |
The English-Lithuanian parallel corpus DVITAS comprises original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. The corpus was compiled for a bilingual terminology extraction project, together with an English-Lithuanian comparable corpus. The parallel corpus includes EU legal acts and other documents from the period 2006-2021. The documents were extracted from the EUR-Lex database and other EU institutional repositories.
The dataset contains 80 aligned files in TMX format (English and Lithuanian), as well as 160 raw files (80 in English and 80 in Lithuanian). The total size of the corpus is 1.4 million words (EN: 0.77 million; LT: 0.63 million). The corpus contains 35,415 aligned segments. |
dc.language.iso |
lit |
dc.language.iso |
eng |
dc.publisher |
Vytautas Magnus University |
dc.publisher |
Mykolas Romeris University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.source.uri |
https://sitti.vdu.lt/dvitas/en |
dc.subject |
parallel corpus |
dc.subject |
specialized corpus |
dc.subject |
cybersecurity corpus |
dc.title |
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University |
sponsor |
Research Council of Lithuania P-MIP-20-282 Bilingual Automatic Terminology Extraction nationalFunds |
sponsor |
The project was included as a use case in COST Action CA18209 "European network for Web-centered linguistic data science (NexusLinguarum)" |
size.info |
35415 sentences |
size.info |
1407315 words |
files.size |
6873671 |
files.count |
3 |