dc.contributor.advisor |
|
dc.contributor.author |
Utka, Andrius |
dc.contributor.author |
Rackevičienė, Sigita |
dc.contributor.author |
Rokas, Aivaras |
dc.contributor.author |
Bielinskienė, Agnė |
dc.contributor.author |
Mockienė, Liudmila |
dc.contributor.author |
Laurinaitis, Marius |
dc.date.accessioned |
2022-02-05T22:24:06Z |
dc.date.available |
2022-02-05T22:24:06Z |
dc.date.issued |
2022-02-05 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/47 |
dc.description |
The English-Lithuanian comparable corpus (DVITAS COMPARABLE) is morphologically annotated. It includes original English and Lithuanian texts on cybersecurity from the period 2010-2021. The corpus was compiled for the bilingual terminology extraction project together with the English-Lithuanian parallel corpus. There are 1,708 files in English and 2,567 in Lithuanian. The total size of the corpus is 4m words (EN-2m; LT-2m). The corpus is composed of texts representing 4 text types: academic (EN-19%; LT-30%), administrative-informative (EN-8%; LT-11%), legal (EN-18%; LT-4%), media (EN-55%; LT-55%). |
dc.language.iso |
eng |
dc.language.iso |
lit |
dc.publisher |
Vytautas Magnus university |
dc.publisher |
Mykolas Romeris university |
dc.rights |
ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
ACA |
dc.source.uri |
https://sitti.vdu.lt/dvitas/en |
dc.subject |
comparable corpus |
dc.subject |
morphologically annotated corpus |
dc.subject |
cybersecurity corpus |
dc.title |
English-Lithuanian Comparable Cybersecurity Corpus - DVITAS |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Andrius Utka andrius.utka@vdu.lt Vytautas Magnus university |
sponsor |
Research Council of Lithuania P-MIP-20-282 Bilingual Automatic Terminology Extraction nationalFunds |
sponsor |
The project was included as a use case in COST Action CA18209 "European network for Web-centered linguistic data science (NexusLinguarum)" |
size.info |
4000932 words |
files.size |
69567534 |
files.count |
12 |