dc.contributor.author Utka, Andrius
dc.contributor.author Rackevičienė, Sigita
dc.contributor.author Rokas, Aivaras
dc.contributor.author Bielinskienė, Agnė
dc.contributor.author Mockienė, Liudmila
dc.contributor.author Laurinaitis, Marius
dc.date.accessioned 2022-02-05T22:23:36Z
dc.date.available 2022-02-05T22:23:36Z
dc.date.issued 2022-02-05
dc.identifier.uri http://hdl.handle.net/20.500.11821/46
dc.description The English-Lithuanian parallel corpus DVITAS comprises original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. The corpus was compiled for the bilingual terminology extraction project, together with an English-Lithuanian comparable corpus. It includes EU legal acts and other documents from the period 2006-2021, extracted from the EUR-Lex database and other EU institutional repositories. The dataset contains 80 aligned English-Lithuanian files in TMX format, as well as 160 raw files (80 in English and 80 in Lithuanian). The total size of the corpus is 1.4 million words (EN: 0.77 million; LT: 0.63 million), in 35,415 aligned segments.
dc.language.iso lit
dc.language.iso eng
dc.publisher Vytautas Magnus University
dc.publisher Mykolas Romeris University
dc.rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label PUB
dc.source.uri https://sitti.vdu.lt/dvitas/en
dc.subject parallel corpus
dc.subject specialized corpus
dc.subject cybersecurity corpus
dc.title English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hasMetadata false
has.files yes
branding CLARIN-LT
contact.person Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University
sponsor Research Council of Lithuania P-MIP-20-282 Bilingual Automatic Terminology Extraction nationalFunds
sponsor The project was included as a use case in COST Action CA18209 "European network for Web-centered linguistic data science (NexusLinguarum)"
size.info 35415 sentences
size.info 1407315 words
files.size 6873671
files.count 3


Files in this item (6.56 MB in total)

This item is publicly available and licensed under PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.
Name                          Size     Format
readme-EN.txt                 1.44 KB  Text file
skaityk-LT.txt                1.37 KB  Text file
EN-LT_Parallel_CS_Corpus.zip  6.55 MB  application/zip
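
The record states that the aligned files are in TMX format. A minimal sketch of reading sentence pairs from one such file follows. It relies only on the standard TMX element names (`tu`, `tuv`, `seg`). The language codes `en` and `lt`, the function name `read_tmx_pairs`, and the sample sentence pair are assumptions made for illustration and are not taken from the corpus; the actual files in the archive may use different codes or additional markup.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="en", tgt_lang="lt"):
    """Yield (source_text, target_text) tuples from a TMX file path or file object.

    The default language codes are assumptions, not taken from the corpus.
    """
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older versions used a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                text = "".join(seg.itertext()).strip()
                # Reduce codes such as "lt-LT" to the primary subtag "lt".
                segs[lang.split("-")[0]] = text
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


# Hypothetical one-unit TMX document, invented for this example.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Network security</seg></tuv>
      <tuv xml:lang="lt"><seg>Tinklo saugumas</seg></tuv>
    </tu>
  </body>
</tmx>
"""

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE.encode("utf-8"))))
print(pairs)
```

To process a real file, pass its path instead of the in-memory sample, for example `read_tmx_pairs("some_file.tmx")`, where the file name is a placeholder for one of the files extracted from the archive.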
