dc.contributor.advisor
dc.contributor.author Utka, Andrius
dc.contributor.author Rackevičienė, Sigita
dc.contributor.author Rokas, Aivaras
dc.contributor.author Bielinskienė, Agnė
dc.contributor.author Mockienė, Liudmila
dc.contributor.author Laurinaitis, Marius
dc.date.accessioned 2022-02-05T22:24:06Z
dc.date.available 2022-02-05T22:24:06Z
dc.date.issued 2022-02-05
dc.identifier.uri http://hdl.handle.net/20.500.11821/47
dc.description The English-Lithuanian comparable corpus (DVITAS COMPARABLE) is morphologically annotated. It comprises original English and Lithuanian texts on cybersecurity published between 2010 and 2021. The corpus was compiled for the bilingual terminology extraction project, together with the English-Lithuanian parallel corpus. It contains 1,708 files in English and 2,567 files in Lithuanian. The total size of the corpus is 4m words (EN: 2m; LT: 2m). The texts represent 4 text types: academic (EN: 19%; LT: 30%), administrative-informative (EN: 8%; LT: 11%), legal (EN: 18%; LT: 4%), and media (EN: 55%; LT: 55%).
dc.language.iso eng
dc.language.iso lit
dc.publisher Vytautas Magnus University
dc.publisher Mykolas Romeris University
dc.rights ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri https://clarin.vdu.lt/licenses/eula/ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label ACA
dc.source.uri https://sitti.vdu.lt/dvitas/en
dc.subject comparable corpus
dc.subject morphologically annotated corpus
dc.subject cybersecurity corpus
dc.title English-Lithuanian Comparable Cybersecurity Corpus - DVITAS
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CLARIN-LT
contact.person Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University
sponsor Research Council of Lithuania P-MIP-20-282 Bilingual Automatic Terminology Extraction nationalFunds
sponsor The project was included as a use case in COST Action CA18209 "European network for Web-centered linguistic data science (NexusLinguarum)"
size.info 4000932 words
files.size 69567534
files.count 12


Files in this item (total download size: 66.34 MB)

This item is for Academic Use and licensed under: ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT (Attribution Required, Noncommercial)
Name                                                    Size       Format           Description
readme-EN.txt                                           1.81 KB    Text file
skaityk-LT.txt                                          1.89 KB    Text file
EN-LT_Comparable_CS_Corpus-Text.zip                     12.57 MB   application/zip
EN-LT_Comparable_CS_Corpus-Morph.zip                    24.09 MB   application/zip
CS_EN_COMPARABLE_ACADEMIC.txt                           2.45 MB    Text file        txt
CS_EN_COMPARABLE_ADMINISTRATIVE-INFORMATIVE.txt         1.02 MB    Text file        txt
CS_EN_COMPARABLE_LEGAL.txt                              2.44 MB    Text file        txt
CS_EN_COMPARABLE_MEDIA.txt                              6.83 MB    Text file        txt
KS_LT_PALYGINAMASIS_ZINIASKLAIDA.txt                    9.13 MB    Text file        txt
KS_LT_PALYGINAMASIS_TEISINIAI.txt                       798.17 KB  Text file        txt
KS_LT_PALYGINAMASIS_ADMINISTRACINIAI-INFORMACINIAI.txt  1.92 MB    Text file        txt
KS_LT_PALYGINAMASIS_AKADEMINIAI.txt                     5.12 MB    Text file        txt
