dc.contributor.author |
Utka, Andrius |
dc.date.accessioned |
2016-11-17T12:12:13Z |
dc.date.available |
2016-11-17T12:12:13Z |
dc.date.issued |
2016-11-17 |
dc.identifier.uri |
http://hdl.handle.net/20.500.11821/8 |
dc.description |
Dabartinės lietuvių kalbos tekstyno žodžių formų dažniniai sąrašai
Wordlists of Wordforms of the Contemporary Corpus of Lithuanian language
Tekstyno struktūra/Corpus Structure
Patekstynis/Subcorpus              Words (millions)   Proportion
Grožinė lit./Fiction                          15.54        12.6%
Negrožinė lit./Non-fiction                    19.99        16.2%
Administracinė lit./Documents                 11.19         9.1%
Periodika/Periodicals                         76.24        61.8%
Sakytinė kalba/Speech Corpus                   0.49         0.4%
---
Visas/Total                                  123.45         100%
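The proportions in the table follow from the word counts, and a short sketch can reproduce them. The helper name `shares` and the shortened English labels are illustrative assumptions, not part of the original record.

```shell
# Recompute each subcorpus share from the word counts (in millions).
# Input rows are "label count"; labels are shortened for illustration.
shares() {
  awk '{ name[NR] = $1; n[NR] = $2; total += $2 }
       END { for (i = 1; i <= NR; i++)
               printf "%s %.1f%%\n", name[i], 100 * n[i] / total }'
}

printf '%s\n' 'Fiction 15.54' 'Non-fiction 19.99' 'Documents 11.19' \
  'Periodicals 76.24' 'Speech 0.49' | shares
```

Because each share is rounded to one decimal place, the listed proportions add up to 100.1% rather than exactly 100%.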
Tinklalapiai/Website:
tekstynas.vdu.lt
corpus.vdu.lt
Data/Date:
2016.10.17
2022.11.15*
* upgraded method of handling punctuation and format
Metodas/Method:
sed -e 's/<[^>]*>//g' *.txt | tr '[:punct:]' ' ' | tr -s ' ' | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | grep -v '[^a-z]' | grep -v "^\s*$" | sort | uniq -c | sort -rn > freq-visas.txt
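The same steps can be read more easily one stage per line. The following is only a sketch: it wraps the pipeline in a hypothetical function, `word_freq`, that reads text from standard input instead of `*.txt`, and runs it on a made-up sample sentence. Whether letters with diacritics (for example ą or š) fall inside the range `a-z` depends on the locale in which `grep` runs, so results on real Lithuanian text may differ between systems.

```shell
# word_freq is a hypothetical helper name, not part of the original record.
word_freq() {
  sed -e 's/<[^>]*>//g' |          # strip markup tags
    tr '[:punct:]' ' ' |           # turn punctuation into spaces
    tr -s ' ' |                    # squeeze runs of spaces
    tr ' ' '\n' |                  # put one token per line
    tr '[:upper:]' '[:lower:]' |   # lowercase every token
    grep -v '[^a-z]' |             # drop tokens with characters outside a-z
    grep -v '^[[:space:]]*$' |     # drop empty lines
    sort | uniq -c | sort -rn      # count tokens, most frequent first
}

# Made-up sample input; prints one "count word" pair per line.
printf '<p>Namas, namas ir medis.</p>\n' | word_freq
```

Each output line holds a count followed by a word form, which is the layout that `uniq -c` produces for the frequency list.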
Kaip cituoti/Reference
Rimkutė E., Kovalevskaitė J., Melninkaitė V., Utka A., Vitkutė-Adžgauskienė D. 2010: Corpus of Contemporary Lithuanian Language – the Standardised Way. Proceedings of the Fourth International Conference Human Language Technologies – The Baltic Perspective, 154–160.
Licencija/Licence:
CLARIN-LT PUB |
dc.language.iso |
lit |
dc.publisher |
Vytautas Magnus University |
dc.rights |
PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT |
dc.rights.uri |
https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm |
dc.rights.label |
PUB |
dc.subject |
wordlist |
dc.subject |
Lithuanian |
dc.title |
Wordlist of the Contemporary Corpus of Lithuanian language |
dc.type |
lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType |
wordList |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
hidden |
false |
hasMetadata |
false |
has.files |
yes |
branding |
CLARIN-LT |
contact.person |
Andrius Utka andrius.utka@vdu.lt Vytautas Magnus University |
size.info |
1850477 entries |
files.size |
34766572 |
files.count |
2 |