dc.contributor.author Bielinskienė, Agnė
dc.contributor.author Boizou, Loïc
dc.contributor.author Bumbulienė, Ieva
dc.contributor.author Kovalevskaitė, Jolanta
dc.contributor.author Krilavičius, Tomas
dc.contributor.author Mandravickaitė, Justina
dc.contributor.author Rimkutė, Erika
dc.contributor.author Vilkaitė-Lozdienė, Laura
dc.date.accessioned 2019-12-09T10:05:54Z
dc.date.available 2019-12-09T10:05:54Z
dc.date.issued 2019
dc.identifier.uri http://hdl.handle.net/20.500.11821/30
dc.description The DELFI.lt corpus consists of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date, source, link, category, and number of words. The corpus comprises 190 000 news articles from 12 thematic categories: DELFI Faces (DELFI Veidai), Projects (Projektai), DELFI Science (DELFI Mokslas), DELFI Auto, Unidentified category, Sport, DELFI Life (DELFI Gyvenimas), DELFI People (DELFI Žmonės), DELFI Citizen (DELFI Pilietis), Business (Verslas), DELFI FIT, and DELFI News (DELFI Žinios). In total, the DELFI.lt corpus contains 70 million words. It is morphologically annotated with Universal Dependencies tags and is freely accessible for online search at http://tekstynas.mwe.lt/.
dc.language.iso lit
dc.publisher Baltic Institute of Advanced Technology
dc.publisher Vytautas Magnus University
dc.source.uri http://mwe.lt/
dc.subject Lithuanian
dc.subject news articles
dc.subject media corpus
dc.subject POS tagged
dc.subject DELFI corpus
dc.subject corpus
dc.title DELFI.lt corpus
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files no
branding CLARIN-LT
contact.person Tomas Krilavičius tomas.krilavicius@vdu.lt Baltic Institute of Advanced Technology; Vytautas Magnus University
sponsor Research Council of Lithuania LIP-027/2016 Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU) nationalFunds
size.info 70000000 tokens
files.size 0
files.count 0

