dc.contributor.author Raškinis, Gailius
dc.date.accessioned 2025-04-24T05:22:40Z
dc.date.available 2025-04-24T05:22:40Z
dc.date.issued 2025-04-23
dc.identifier.uri http://hdl.handle.net/20.500.11821/68
dc.description The dataset was extracted from publicly available online sources, primarily Lithuanian news portal publications from 2014–2020 (~500M words). It contains text fragments matching the following Perl-style regular expression: [[:upper:]][-\x27[:lower:]]+\s+[[:upper:]][-\x27[:lower:]]+\s+\([[:upper:]][-\x27[:lower:]]+\s+[[:upper:]][-\x27[:lower:]]+\s*\) This pattern captures two adjacent capitalized words followed by another pair of capitalized words enclosed in parentheses: "Name Surname (Name Surname)". Each word may also contain hyphens and apostrophes (\x27). The matches were sorted and deduplicated, and a frequency count was added to each unique entry. The dataset was intended for training a foreign-to-Lithuanian transliteration model for text-to-speech (TTS) applications. Note: the dataset may contain noise, including invalid transliterations and inconsistent ordering of the Lithuanian and foreign name forms. Further cleaning or filtering may be required depending on the intended use.
dc.language.iso lit
dc.publisher Institute of Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University
dc.rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.uri https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm
dc.rights.label PUB
dc.subject transliteration
dc.subject Lithuanian language
dc.title Transliteration list of foreign person names into Lithuanian v.1
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CLARIN-LT
contact.person Gailius Raškinis gailius.raskinis@vdu.lt Institute of Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University
size.info 68167 entries
size.info 133254 tokens
files.size 767551
files.count 1
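
The extraction procedure described in dc.description (match the pattern, deduplicate, count frequencies, sort) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's actual script: Python's built-in re module does not support POSIX bracket classes such as [[:upper:]], so the sketch approximates them with basic Latin plus Lithuanian letters, which is narrower than the original classes. The function name count_name_pairs and the sample text are hypothetical.

```python
import re
from collections import Counter

# Assumption: [[:upper:]] and [[:lower:]] are approximated with basic Latin
# plus Lithuanian letters; the original POSIX classes cover more characters.
UPPER = "A-ZĄČĘĖĮŠŲŪŽ"
LOWER = "a-ząčęėįšųūž"

# One capitalized word: an uppercase letter followed by one or more
# lowercase letters, hyphens, or apostrophes (\x27 in the original pattern).
WORD = rf"[{UPPER}][-'{LOWER}]+"

# "Name Surname (Name Surname)", mirroring the structure of the original regex.
PATTERN = re.compile(rf"{WORD}\s+{WORD}\s+\({WORD}\s+{WORD}\s*\)")


def count_name_pairs(text):
    """Return a Counter mapping each matched string to its frequency."""
    # Collapse runs of whitespace so line-wrapped matches count as one entry.
    return Counter(" ".join(m.group(0).split()) for m in PATTERN.finditer(text))


# Hypothetical sample text, invented for this illustration.
sample = (
    "Vakar kalbėjo Džonas Smitas (John Smith). "
    "Vėliau Džonas Smitas (John Smith) išvyko, o Angela Merkel (Angela Merkel) liko."
)

counts = count_name_pairs(sample)
for entry, freq in sorted(counts.items()):
    print(freq, entry)
# prints:
# 1 Angela Merkel (Angela Merkel)
# 2 Džonas Smitas (John Smith)
```

As in the original pattern, nothing here checks which side of the parentheses holds the Lithuanian form, so entries such as identical name pairs or reversed orderings pass through and would need separate filtering.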


Files in this item

This item is Publicly Available and licensed under: PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT

Name data_0.zip
Size 749.56 KB
Format application/zip
Description Unknown
