The corpus of read Lithuanian speech „7G“ was compiled in 2015-2016. It consists of 352 audio recordings with a total duration of over 7 hours. Seven speakers read excerpts from books and a list of isolated words (the list reflects the diversity of triphones in Lithuanian). The audio recordings are stored as mono WAV files (PCM, 44.1 kHz, 16-bit). Annotations are stored in MLF (Master Label File) format, the format used by the HTK toolkit.
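For readers who want to check the files against this description, the following is a minimal sketch using only the Python standard library. It is not part of the corpus distribution. The function names `wav_matches_description` and `parse_mlf` and the sample file name `example.lab` are hypothetical. The sketch assumes the general HTK label layout (a `#!MLF!#` header, a quoted file pattern, label lines, and a terminating `.`); whether the 7G annotations include start and end times is not stated here, so the parser accepts label lines both with and without them.

```python
import wave

# Expected audio parameters, taken from the corpus description.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono

# HTK expresses label times in units of 100 nanoseconds.
HTK_TIME_UNIT_SECONDS = 1e-7


def wav_matches_description(path):
    """Return True if the WAV file is 44.1 kHz, 16-bit, mono PCM."""
    # wave.open raises wave.Error for non-PCM (compressed) WAV files.
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def parse_mlf(lines):
    """Parse MLF text lines into {file_pattern: [(start, end, label), ...]}.

    start and end are integers in HTK time units, or None when the label
    line carries no times (an assumption made for robustness).
    """
    entries = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if line.startswith('"') and line.endswith('"'):
            current = line.strip('"')  # start of a new label file entry
            entries[current] = []
        elif line == ".":
            current = None  # end of the current entry
        elif current is not None:
            fields = line.split()
            timed = (
                len(fields) >= 3
                and fields[0].isdigit()
                and fields[1].isdigit()
            )
            if timed:
                entries[current].append(
                    (int(fields[0]), int(fields[1]), fields[2])
                )
            else:
                entries[current].append((None, None, fields[0]))
    return entries


# Illustrative, made-up MLF content (not taken from the corpus).
sample = [
    "#!MLF!#",
    '"*/example.lab"',
    "0 2500000 sil",
    "2500000 6000000 labas",
    ".",
]
labels = parse_mlf(sample)
```

To convert a parsed time to seconds, multiply it by `HTK_TIME_UNIT_SECONDS`. In the made-up sample above, the first label ends at 2500000 units, which is 0.25 s.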
Most of the speakers are young women aged between 20 and 25. The aim was to obtain recordings made under conditions as natural as possible, so no requirements were placed on the speakers in terms of recording equipment, microphone settings or recording environment. Most of the speakers used personal laptops with built-in microphones.