English-Lithuanian Parallel Cybersecurity Corpus - DVITAS

Utka, Andrius; Rackevičienė, Sigita; Rokas, Aivaras; Bielinskienė, Agnė; Mockienė, Liudmila; Laurinaitis, Marius

dc.contributor.author	Utka, Andrius
dc.contributor.author	Rackevičienė, Sigita
dc.contributor.author	Rokas, Aivaras
dc.contributor.author	Bielinskienė, Agnė
dc.contributor.author	Mockienė, Liudmila
dc.contributor.author	Laurinaitis, Marius
dc.date.accessioned	2022-02-05T22:23:36Z
dc.date.available	2022-02-05T22:23:36Z
dc.date.issued	2022-02-05
dc.identifier.uri	http://hdl.handle.net/20.500.11821/46
dc.description	The English-Lithuanian parallel corpus DVITAS consists of original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. The corpus was compiled for a bilingual terminology extraction project, together with an English-Lithuanian comparable corpus. The parallel corpus includes EU legal acts and other documents from the period 2006-2021, extracted from the EUR-Lex database and other EU institutional repositories. The dataset contains 80 aligned English-Lithuanian files in TMX format, as well as 160 raw files (80 in English and 80 in Lithuanian). The total size of the corpus is 1.4 million words (EN: 0.77 million; LT: 0.63 million). The corpus contains 35,415 aligned segments.
dc.language.iso	lit
dc.language.iso	eng
dc.publisher	Vytautas Magnus University
dc.publisher	Mykolas Romeris University
dc.rights	PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri	https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label	PUB
dc.source.uri	https://sitti.vdu.lt/dvitas/en
dc.subject	parallel corpus
dc.subject	specialized corpus
dc.subject	cybersecurity corpus
dc.subject	cybersecurity
dc.title	English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
hasMetadata	false
has.files	yes
branding	CLARIN-LT
contact.person	Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University
sponsor	Research Council of Lithuania P-MIP-20-282 Bilingual Automatic Terminology Extraction nationalFunds
sponsor	The project was included as a use case in COST Action CA18209 "European network for Web-centered linguistic data science (NexusLinguarum)"
size.info	35415 sentences
size.info	1407315 words
files.size	6873671
files.count	3
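
The description above states that the aligned files are distributed in TMX format. As a minimal sketch of how such a file might be read into sentence pairs, the function below uses only the Python standard library. The function name read_tmx_pairs is hypothetical, and the language codes "en" and "lt" are assumptions: the codes actually used inside the corpus files are not stated in this record and may differ (for example "EN-GB"), which is why only the first two letters are compared.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="en", tgt_lang="lt"):
    """Return a list of (source, target) segment pairs from a TMX file.

    `source` may be a file path or a binary file object.
    `src_lang` and `tgt_lang` are two-letter codes (assumed, not confirmed
    by the corpus documentation).
    """
    root = ET.parse(source).getroot()
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment
        texts = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older versions used a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                texts[lang[:2]] = "".join(seg.itertext()).strip()
        if src_lang in texts and tgt_lang in texts:
            pairs.append((texts[src_lang], texts[tgt_lang]))
    return pairs
```

Calling the function on each TMX file extracted from the archive and summing the lengths of the returned lists gives a quick consistency check against the 35,415 aligned segments reported in this record, provided the language codes match the assumption above.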

Files in this item (6.56 MB in total)

This item is publicly available and licensed under: PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT

Name: readme-EN.txt
Size: 1.44 KB
Format: Text file

Name: skaityk-LT.txt
Size: 1.37 KB
Format: Text file

Name: EN-LT_Parallel_CS_Corpus.zip
Size: 6.55 MB
Format: application/zip
