dc.contributor.author |
Kapočiūtė-Dzikienė, Jurgita |
dc.contributor.author |
Šarkutė, Ligita |
dc.contributor.author |
Utka, Andrius |
dc.date.accessioned |
2017-10-11T06:10:25Z |
dc.date.available |
2017-10-11T06:10:25Z |
dc.date.issued |
2017-10-05 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/17 |
dc.description |
The 23.9-million-word Lithuanian Parliament corpus is specially designed for the authorship attribution task.
The corpus consists of about 111 thousand samples of speech transcripts by 147 parliamentarians in the Lithuanian Seimas. It covers the period from March 1990 to December 2013. Each line in a corpus file contains a different text feature that can be used in the authorship attribution task (Kapočiūtė-Dzikienė et al. 2014).
References:
Kapočiūtė-Dzikienė, Jurgita; Utka, Andrius; Šarkutė, Ligita. 2014. Feature exploration for authorship attribution of Lithuanian parliamentary speeches. Text, speech and dialogue: 17th international conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014: proceedings, pp. 93-100.
Kapočiūtė-Dzikienė, Jurgita; Nivre, Joakim; Krupavičius, Algis. 2013. Lithuanian Dependency Parsing with Rich Morphological Features. Empirical Methods in Natural Language Processing - 4th Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL'2013), pp. 12-21.
Zinkevičius, Vytautas. 2000. Lemuoklis - morfologinei analizei. Gudaitis, L. (ed.) Darbai ir Dienos, 24: 246-273. |
dc.language.iso |
lit |
dc.publisher |
Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.source.uri |
http://dangus.vdu.lt/~jkd/eng/ |
dc.subject |
corpus |
dc.subject |
authorship attribution |
dc.subject |
Lithuanian |
dc.subject |
supervised machine learning |
dc.title |
Lithuanian Parliament Corpus for Authorship Attribution |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University |
contact.person |
Jurgita Kapočiūtė-Dzikienė jurgita.kapociute-dzikiene@vdu.lt Vytautas Magnus University |
sponsor |
Research Council of Lithuania LIT-8-69 Automatic Authorship Attribution and Author Profiling for the Lithuanian Language (ASTRA) nationalFunds |
size.info |
23908302 words |
size.info |
147 classes |
size.info |
110908 texts |
files.size |
1844091961 |
files.count |
4 |