dc.contributor.author |
Raškinis, Gailius |
dc.date.accessioned |
2025-04-24T05:22:40Z |
dc.date.available |
2025-04-24T05:22:40Z |
dc.date.issued |
2025-04-23 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/68 |
dc.description |
The dataset was extracted from publicly available online sources, primarily Lithuanian news portal publications from the period 2014–2020 (~500M words). It contains the text fragments that match the following Perl-style regular expression:
[[:upper:]][-\x27[:lower:]]+\s+[[:upper:]][-\x27[:lower:]]+\s+\([[:upper:]][-\x27[:lower:]]+\s+[[:upper:]][-\x27[:lower:]]+\s*\)
This pattern captures two adjacent uppercase-initial strings followed by another pair of uppercase-initial strings enclosed in parentheses: "Name Surname (Name Surname)".
The resulting list was sorted and deduplicated, and a frequency count was added to each unique entry.
The dataset was intended for training a foreign-to-Lithuanian transliteration model for text-to-speech (TTS) applications. Note: the dataset may contain noise, including invalid transliterations and inconsistent ordering of the Lithuanian and foreign name forms. Further cleaning or filtering may be required depending on the intended use. |
dc.language.iso |
lit |
dc.publisher |
Institute of Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.subject |
transliteration |
dc.subject |
Lithuanian language |
dc.title |
Transliteration list of foreign person names into Lithuanian v.1 |
dc.type |
lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType |
wordList |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Gailius Raškinis gailius.raskinis@vdu.lt Institute of Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University |
size.info |
68167 entries |
size.info |
133254 tokens |
files.size |
767551 |
files.count |
1 |