dc.contributor.author | Bielinskienė, Agnė |
dc.contributor.author | Boizou, Loïc |
dc.contributor.author | Bumbulienė, Ieva |
dc.contributor.author | Kovalevskaitė, Jolanta |
dc.contributor.author | Krilavičius, Tomas |
dc.contributor.author | Mandravickaitė, Justina |
dc.contributor.author | Rimkutė, Erika |
dc.contributor.author | Vilkaitė-Lozdienė, Laura |
dc.date.accessioned | 2019-12-09T10:05:54Z |
dc.date.available | 2019-12-09T10:05:54Z |
dc.date.issued | 2019 |
dc.identifier.uri | http://hdl.handle.net/20.500.11821/30 |
dc.description | The DELFI.lt corpus consists of articles published by the news portal DELFI.lt from March 2014 to November 2016. The following metadata were also collected for each article: author, title, date, source, link, category, and number of words. The corpus contains 190 000 news articles from 12 thematic categories: DELFI Faces (DELFI Veidai), Projects (Projektai), DELFI Science (DELFI Mokslas), DELFI Auto, Unidentified category, Sport, DELFI Life (DELFI Gyvenimas), DELFI People (DELFI Žmonės), DELFI Citizen (DELFI Pilietis), Business (Verslas), DELFI FIT, and DELFI News (DELFI Žinios). In total, the DELFI.lt corpus contains 70 million words. The corpus is morphologically annotated with Universal Dependencies tags and can be searched online free of charge at http://tekstynas.mwe.lt/. |
dc.language.iso | lit |
dc.publisher | Baltic Institute of Advanced Technology |
dc.publisher | Vytautas Magnus University |
dc.source.uri | http://mwe.lt/ |
dc.subject | Lithuanian |
dc.subject | news articles |
dc.subject | media corpus |
dc.subject | POS tagged |
dc.subject | DELFI corpus |
dc.subject | corpus |
dc.title | DELFI.lt corpus |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | no |
branding | CLARIN-LT |
contact.person | Tomas Krilavičius tomas.krilavicius@vdu.lt Baltic Institute of Advanced Technology; Vytautas Magnus University |
sponsor | Research Council of Lithuania LIP-027/2016 Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU) nationalFunds |
size.info | 70000000 tokens |
files.size | 0 |
files.count | 0 |