DIGIRES COVID-19 Corpus v.1 consists of 351 Lithuanian media articles about the COVID-19 pandemic. The corpus was compiled from various publicly available Lithuanian online media sources. It contains 351 files in plain text format (TXT) with UTF-8 encoding. Each article consists of a title (on the first line) and an article body. The files are classified into two subcorpora: 1) "unreliable", which contains articles identified by professional fact checkers as fake news; 2) "reliable", which contains trustworthy articles.
Subcorpus    Files    Word tokens
Reliable       175          67902
Unreliable     176         118747
Total          351         186649
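
The file layout described above (UTF-8 text, title on the first line, body after it) can be read with a few lines of Python. The following is a minimal sketch, not an official loader. The directory names `reliable` and `unreliable`, the `.txt` extension, and the function names are assumptions made for illustration and may differ from the distributed archive. The whitespace-based token count is also only a rough approximation and will not necessarily reproduce the word token figures in the table.

```python
from pathlib import Path


def parse_article(text):
    """Split raw file text into (title, body); the title is the first line."""
    title, _, body = text.partition("\n")
    return title.strip(), body.strip()


def load_corpus(root, labels=("reliable", "unreliable")):
    """Read every .txt file under root/<label>/ (assumed layout)."""
    root = Path(root)
    articles = []
    for label in labels:
        for path in sorted((root / label).glob("*.txt")):
            # "utf-8-sig" reads plain UTF-8 and also drops a leading BOM if present
            text = path.read_text(encoding="utf-8-sig")
            title, body = parse_article(text)
            articles.append(
                {"label": label, "file": path.name, "title": title, "body": body}
            )
    return articles


def count_word_tokens(text):
    """Rough token count: whitespace splitting only (an assumption)."""
    return len(text.split())
```

A typical use would be `articles = load_corpus("path/to/corpus")`, followed by grouping on `article["label"]` to compare the two subcorpora.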