| dc.contributor.advisor |
|
| dc.contributor.author |
Utka, Andrius |
| dc.contributor.author |
Rackevičienė, Sigita |
| dc.contributor.author |
Rokas, Aivaras |
| dc.contributor.author |
Bielinskienė, Agnė |
| dc.contributor.author |
Mockienė, Liudmila |
| dc.contributor.author |
Laurinaitis, Marius |
| dc.date.accessioned |
2022-02-05T22:24:06Z |
| dc.date.available |
2022-02-05T22:24:06Z |
| dc.date.issued |
2022-02-05 |
| dc.identifier.uri |
http://hdl.handle.net/20.500.11821/47 |
| dc.description |
The English-Lithuanian comparable corpus (DVITAS COMPARABLE) is morphologically annotated. It includes original English and Lithuanian texts on cybersecurity from the period 2010-2021. The corpus was compiled for the bilingual terminology extraction project, together with an English-Lithuanian parallel corpus. There are 1,708 files in English and 2,567 in Lithuanian. The total size of the corpus is 4m words (EN-2m; LT-2m). The corpus is composed of texts representing 4 text types: academic (EN-19%; LT-30%), administrative-informative (EN-8%; LT-11%), legal (EN-18%; LT-4%), media (EN-55%; LT-55%). |
| dc.language.iso |
eng |
| dc.language.iso |
lit |
| dc.publisher |
Vytautas Magnus university |
| dc.publisher |
Mykolas Romeris university |
| dc.rights |
ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
| dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
| dc.rights.label |
ACA |
| dc.source.uri |
https://sitti.vdu.lt/dvitas/en |
| dc.subject |
comparable corpus |
| dc.subject |
morphologically annotated corpus |
| dc.subject |
cybersecurity corpus |
| dc.subject |
cybersecurity |
| dc.title |
English-Lithuanian Comparable Cybersecurity Corpus - DVITAS |
| dc.type |
corpus |
| metashare.ResourceInfo#ContentInfo.mediaType |
text |
| hidden |
false |
| hasMetadata |
false |
| has.files |
yes |
| branding |
CLARIN-LT |
| contact.person |
Andrius Utka andrius.utka@vdu.lt Vytautas Magnus university |
| sponsor |
Research Council of Lithuania P-MIP-20-282 Bilingual Automatic Terminology Extraction nationalFunds |
| sponsor |
The project was included as a use case in COST Action CA18209 "European network for Web-centered linguistic data science (NexusLinguarum)" |
| size.info |
4000932 words |
| files.size |
69567534 |
| files.count |
12 |