LITUND contains two comparable corpora:
1. Unreliable news texts. 147 full-text articles (100,678 words) identified as misleading by professional fact-checkers. The corpus includes a metadata file with the following information: file name, text topic category, title, the specific false claim addressed, publication date, URL of the text, word count, debunking reference, and URL of the debunking reference.
2. LRT corpus. 147 full-text articles (131,640 words) published by Lithuania’s national broadcaster (LRT) on topics similar to those in the unreliable news corpus. The corpus includes a metadata file with the following information: file name, text topic category, publication date, URL of the text, and word count.
The corpora are provided in two formats: 1) plain text (UTF-8 encoding) and 2) morphologically tagged text in CoNLL-U format. The morphological annotation was produced with the morphological analyser MORFUOKLIS (https://sitti.vdu.lt/morfuoklis/en/about).
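As a rough illustration of how the CoNLL-U files might be read, the minimal Python sketch below parses CoNLL-U lines into sentences using only the standard library. It assumes the files follow the standard ten-column, tab-separated CoNLL-U layout, with comment lines starting with "#" and blank lines between sentences. The function name read_conllu, the file name in the usage note, and the sample sentence with its tags are hypothetical. They are not taken from the corpus or from MORFUOKLIS output.

```python
import io

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def read_conllu(lines):
    """Yield sentences as lists of token dicts from an iterable of CoNLL-U lines."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            # Assumption: lines without exactly ten columns are ignored.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        # Emit the last sentence if the input lacks a trailing blank line.
        yield sentence


# Made-up sample for illustration only; tags are placeholders.
SAMPLE = (
    "# text = Vilnius yra miestas.\n"
    "1\tVilnius\tVilnius\tPROPN\t_\t_\t_\t_\t_\t_\n"
    "2\tyra\tbūti\tAUX\t_\t_\t_\t_\t_\t_\n"
    "3\tmiestas\tmiestas\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(io.StringIO(SAMPLE)))
lemmas = [token["lemma"] for token in sentences[0]]
```

For a real file, the same function could be applied to an open file handle, for example `with open("example.conllu", encoding="utf-8") as f: sentences = list(read_conllu(f))`, where the file name is a placeholder.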
The corpora cover six topic categories: Environment, COVID-19, Health, Politics, War in Ukraine, and Others.