| Field | Value |
|---|---|
| dc.contributor.author | Raškinis, Gailius |
| dc.date.accessioned | 2025-04-24T05:22:40Z |
| dc.date.available | 2025-04-24T05:22:40Z |
| dc.date.issued | 2025-04-23 |
| dc.identifier.uri | http://hdl.handle.net/20.500.11821/68 |
| dc.description | The dataset was extracted from publicly available online sources, primarily Lithuanian news portal publications from 2014–2020 (~500M words). It consists of text fragments matching the following Perl-style regular expression: `[[:upper:]][-\x27[:lower:]]+\s+[[:upper:]][-\x27[:lower:]]+\s+\([[:upper:]][-\x27[:lower:]]+\s+[[:upper:]][-\x27[:lower:]]+\s*\)` (`\x27` denotes the apostrophe). The pattern captures two adjacent capitalized tokens followed by another pair of capitalized tokens enclosed in parentheses, i.e. "Name Surname (Name Surname)", where each token is an uppercase letter followed by one or more lowercase letters, hyphens or apostrophes. The resulting list was sorted, duplicates were removed, and each pattern was annotated with its frequency count. The dataset was intended for training a foreign-to-Lithuanian transliteration model for text-to-speech (TTS) applications. Note: the dataset may contain noise, including invalid transliterations and inconsistent ordering of the Lithuanian and foreign names. Further cleaning or filtering may be required depending on the intended use. |
| dc.language.iso | lit |
| dc.publisher | Institute of Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University |
| dc.rights | PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
| dc.rights.uri | https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
| dc.rights.label | PUB |
| dc.subject | transliteration |
| dc.subject | Lithuanian language |
| dc.title | Transliteration list of foreign person names into Lithuanian v.1 |
| dc.type | lexicalConceptualResource |
| metashare.ResourceInfo#ContentInfo.detailedType | wordList |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| hidden | false |
| hasMetadata | false |
| has.files | yes |
| branding | CLARIN-LT |
| contact.person | Gailius Raškinis gailius.raskinis@vdu.lt Institute of Digital Resources and Interdisciplinary Research (SITTI) at Vytautas Magnus University |
| size.info | 68167 entries |
| size.info | 133254 tokens |
| files.size | 767551 |
| files.count | 1 |
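
The extraction procedure in `dc.description` (regex matching, sorting, de-duplication and frequency counting) can be sketched in Python as below. This is a minimal, hypothetical reconstruction, not the author's actual pipeline. It assumes the corpus is processed line by line, approximates the POSIX classes `[[:upper:]]` and `[[:lower:]]` with Python's `str.isupper` / `str.islower` over the Basic Multilingual Plane (Python's standard `re` module has no POSIX classes), and sorts by code point rather than by any locale collation. The names `count_name_pairs`, `PATTERN`, `UPPER`, `LOWER` and `WORD` are illustrative.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable


def _chars(predicate) -> str:
    """Concatenate (regex-escaped) every BMP code point satisfying `predicate`.

    Assumption: this approximates POSIX [[:upper:]] / [[:lower:]] in a
    Unicode-aware Perl regex; it is not guaranteed to be identical.
    """
    return "".join(
        re.escape(chr(cp)) for cp in range(0x10000) if predicate(chr(cp))
    )


UPPER = _chars(str.isupper)
LOWER = _chars(str.islower)

# One token: an uppercase letter followed by one or more hyphens,
# apostrophes (\x27) or lowercase letters, as in the original pattern.
WORD = f"[{UPPER}][-'{LOWER}]+"

# "Name Surname (Name Surname)", allowing optional whitespace before ")".
PATTERN = re.compile(rf"{WORD}\s+{WORD}\s+\({WORD}\s+{WORD}\s*\)")


def count_name_pairs(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Return (pattern, frequency) pairs, de-duplicated and sorted.

    Hypothetical helper: matches are counted verbatim (no whitespace
    normalization) and sorted by code point, similar to `sort | uniq -c`.
    """
    counts: Counter[str] = Counter()
    for line in lines:
        counts.update(m.group(0) for m in PATTERN.finditer(line))
    return sorted(counts.items())


if __name__ == "__main__":
    # Made-up sample sentences, for illustration only.
    sample = [
        "Prezidentas Džo Baidenas (Joe Biden) lankėsi Vilniuje.",
        "Vėliau Džo Baidenas (Joe Biden) grįžo į Vašingtoną.",
        "Kalbėjo Emanuelis Makronas (Emmanuel Macron).",
    ]
    for pattern, freq in count_name_pairs(sample):
        print(freq, pattern)
    # prints:
    # 2 Džo Baidenas (Joe Biden)
    # 1 Emanuelis Makronas (Emmanuel Macron)
```

Because matches are counted verbatim, a variant such as "Džo Baidenas (Joe Biden )" (with a space before the closing parenthesis, which `\s*` permits) would be counted as a separate entry. This is one way the noise mentioned in the description can arise. Note also that the pattern, as written, does not match names whose second part starts with an uppercase letter after a hyphen (e.g. "Jean-Paul").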