Dadurkevičius, Virginijus 2024-03-21T17:53:16Z 2024-03-21T17:53:16Z 2024-03-13
dc.description We present a comparative wordlist based on the Corpus of the Contemporary Lithuanian Language (CCLL2, version 2, pre-2020), supplemented by media (courtesy of the news media company 15min) and social network lexicons from the period of the war in Ukraine (Feb 2022 to Feb 2024). For a fair comparison, all word counts have been normalized as if each source contained 100m words. CCLL2 has 162m words, the wartime media 36m words, and the wartime social networks 2m words. The term "word" here excludes punctuation, numbers, dates, URLs, hashtags, popular English words, etc. The data is a tab-separated values (TSV) text file with the following columns: word (token), CCLL2 count, CCLL2 docs, media count, media docs, social networks count, social networks docs, where "docs" means the normalized number of documents containing a particular word. All words are case-insensitive and written in capital letters.
dc.language.iso lit
dc.publisher SITTI
dc.rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT
dc.rights.label PUB
dc.subject wordlist
dc.subject Lithuanian
dc.subject Ukraine
dc.subject war
dc.subject wartime
dc.subject frequency counts
dc.title Wordlist of the Contemporary Corpus of Lithuanian Language in the Face of War in Ukraine
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
hidden false
hasMetadata false
has.files yes
branding CLARIN-LT
contact.person Virginijus Dadurkevičius SITTI 2264779 entries 2264780 entries
files.size 11111146
files.count 1
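
The column layout and the normalization described in dc.description can be illustrated with a minimal Python sketch. It is not part of the record. It assumes the seven columns appear in the order listed, that the normalized values parse as plain decimal numbers, and that normalization is simple linear scaling to 100m words; none of this is confirmed by the record. The names Entry, normalize and read_entries are hypothetical, and the sample rows are made up rather than taken from the dataset.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Entry(NamedTuple):
    """One row of the wordlist; field names are hypothetical labels
    for the seven columns listed in the description."""
    word: str
    ccll2_count: float
    ccll2_docs: float
    media_count: float
    media_docs: float
    social_count: float
    social_docs: float


# The description says counts are scaled as if each source had 100m words.
TARGET_SIZE = 100_000_000


def normalize(raw_count: float, source_size: int) -> float:
    """Scale a raw count to a 100m-word source (assumed linear scaling)."""
    return raw_count * TARGET_SIZE / source_size


def read_entries(handle: TextIO) -> Iterator[Entry]:
    """Yield one Entry per seven-column TSV row; other rows are skipped."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 7:
            continue
        try:
            numbers = [float(cell) for cell in row[1:]]
        except ValueError:
            continue  # non-numeric row, e.g. a header line if the file has one
        yield Entry(row[0], *numbers)


# Made-up sample rows for illustration only.
sample = (
    "KARAS\t1200\t300\t9500\t2100\t8800\t1900\n"
    "TAIKA\t800\t250\t2400\t900\t2600\t700\n"
)
entries = list(read_entries(io.StringIO(sample)))
```

With the real file, read_entries would be given an open text handle instead of the in-memory sample. Under the linear-scaling assumption, a word seen 36 times in the 36m-word media source would be reported as normalize(36, 36_000_000), that is, 100 occurrences per 100m words.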

 Files in this item

This item is
Publicly Available
and licensed under:
10.6 MB
zip file