dc.contributor.author |
Dadurkevičius, Virginijus |
dc.date.accessioned |
2024-03-21T17:53:16Z |
dc.date.available |
2024-03-21T17:53:16Z |
dc.date.issued |
2024-03-13 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/57 |
dc.description |
We present a comparative wordlist based on the Corpus of the Contemporary Lithuanian Language (CCLL2, version 2, pre-2020), supplemented by two lexicons from the period of the war in Ukraine (Feb 2022 to Feb 2024): news media (courtesy of the news media company 15min, www.15min.lt) and social networks.
For a fair comparison, all word counts have been normalized to a common size of 100m words per source. The actual sizes are 162m words for CCLL2, 36m words for wartime media, and 2m words for wartime social networks. Here the term "word" excludes punctuation, numbers, dates, URLs, hashtags, popular English words, etc.
The data is a tab-separated values (TSV) text file with the following columns: word (token), CCLL2 count, CCLL2 docs, media count, media docs, social networks count, social networks docs. Here "docs" means the normalized number of documents containing a particular word. Words are case-insensitive and written in capital letters. |
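The normalization and the column layout described above can be illustrated with a short Python sketch. It is not the author's code: the names `normalize`, `Entry`, and `read_wordlist` are hypothetical, the source sizes are the rounded figures from the description, and the sketch assumes that the file has seven tab-separated fields per line and that the count and docs fields parse as numbers (lines that do not, such as a possible header, are skipped).

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO

TARGET_SIZE = 100_000_000  # common size stated in the description

# Rounded source sizes quoted in the description (assumed, not exact).
SOURCE_SIZES = {"ccll2": 162_000_000, "media": 36_000_000, "social": 2_000_000}


def normalize(raw_count: float, source_size: int, target: int = TARGET_SIZE) -> float:
    """Scale a raw count as if the source contained `target` words."""
    return raw_count * target / source_size


class Entry(NamedTuple):
    # Field order follows the column list in the description.
    word: str
    ccll2_count: float
    ccll2_docs: float
    media_count: float
    media_docs: float
    social_count: float
    social_docs: float


def read_wordlist(handle: TextIO) -> Iterator[Entry]:
    """Yield one Entry per well-formed line of the TSV file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 7:
            continue  # not the assumed seven-column layout
        try:
            values = [float(v) for v in row[1:]]
        except ValueError:
            continue  # non-numeric fields, e.g. a header line
        yield Entry(row[0], *values)


# Illustrative line with made-up values, not taken from the dataset.
sample = "KARAS\t120.5\t80\t950\t600\t1200\t700\n"
for entry in read_wordlist(io.StringIO(sample)):
    print(entry.word, entry.media_count / entry.ccll2_count)
```

Because every count is already scaled to the same 100m-word basis, the columns of a row can be compared directly, for example as the ratio of the media count to the CCLL2 count.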
dc.language.iso |
lit |
dc.publisher |
SITTI |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.subject |
wordlist |
dc.subject |
Lithuanian |
dc.subject |
Ukraine |
dc.subject |
war |
dc.subject |
wartime |
dc.subject |
frequency counts |
dc.title |
Wordlist of the Contemporary Corpus of Lithuanian Language in the Face of War in Ukraine |
dc.type |
lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType |
wordList |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Virginijus Dadurkevičius virginijus.dadurkevicius@vdu.lt SITTI |
size.info |
2264779 entries |
size.info |
2264780 entries |
files.size |
11111146 |
files.count |
1 |