About metadata
This page describes which metadata we require and how we disseminate them. Metadata are freely accessible and are distributed in the public domain (under CC0). However, we ask to be informed about any commercial use of metadata from the LINDAT/CLARIN repository; please send a description of your use case to the Help Desk.
Metadata formats
We can disseminate submission metadata in various formats, including but not limited to CMDI and oai_dc. See the full list of supported formats, but note that some formats may not be applicable to all items. Offering multiple formats helps us promote the submitted content in a number of aggregators and search engines.
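
Assuming the formats are disseminated over OAI-PMH (the protocol for which oai_dc is the required format), a harvester can first ask which formats exist, either repository-wide or for a single item, and then request records in one of them. The minimal Python sketch below only builds the request URLs; `OAI_BASE_URL`, the item identifier and the `cmdi` metadata prefix are hypothetical placeholders, not values documented on this page.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the repository's actual OAI-PMH base URL.
OAI_BASE_URL = "https://repository.example.org/oai/request"


def oai_request_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH GET request URL for `verb` with the given arguments."""
    query = {"verb": verb, **arguments}
    return f"{OAI_BASE_URL}?{urlencode(query)}"


# All metadata formats the repository supports.
all_formats = oai_request_url("ListMetadataFormats")

# Formats available for a single item (hypothetical OAI identifier);
# some formats may not apply to every item.
item_formats = oai_request_url(
    "ListMetadataFormats", identifier="oai:repository.example.org:0000/0000"
)

# Harvest records in a chosen format. "cmdi" as a prefix is an assumption;
# check the ListMetadataFormats response for the real value.
dc_records = oai_request_url("ListRecords", metadataPrefix="oai_dc")
cmdi_records = oai_request_url("ListRecords", metadataPrefix="cmdi")

if __name__ == "__main__":
    for url in (all_formats, item_formats, dc_records, cmdi_records):
        print(url)
```

Each URL returns an XML response (it can be fetched with `urllib.request.urlopen`, for instance). Long `ListRecords` results are paged by the protocol: the response carries a `resumptionToken`, which is sent back together with `verb` to get the next page.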
CMDI
See the CLARIN introduction to component metadata for more information on this topic. Although we can, to a degree, support submissions with an attached CMDI file, we prefer to guide users through our submission process, as this ensures metadata of reasonable quality.
Current submissions adhere to the clarin.eu:cr1:p_1403526079380 profile/schema. A portion of older submissions uses a different profile, clarin.eu:cr1:p_1349361150622.
Both profiles are reasonably well covered by links to a concept registry.
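
To check which of the two profiles a harvested CMDI record uses, one can read the profile identifier from its header. The sketch below assumes the record carries that identifier in an `MdProfile` element, as CMDI 1.1 headers typically do; it compares only the local element name, so it does not depend on the exact namespace. `SAMPLE` is a trimmed, made-up record, not an actual repository item.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CURRENT_PROFILE = "clarin.eu:cr1:p_1403526079380"
OLDER_PROFILE = "clarin.eu:cr1:p_1349361150622"


def cmdi_profile_id(xml_text: str) -> Optional[str]:
    """Return the text of the first MdProfile element, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Namespaced tags look like "{uri}MdProfile"; compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "MdProfile":
            return (elem.text or "").strip() or None
    return None


def profile_generation(xml_text: str) -> str:
    """Classify a record as using the current, the older, or an unknown profile."""
    profile = cmdi_profile_id(xml_text)
    if profile == CURRENT_PROFILE:
        return "current"
    if profile == OLDER_PROFILE:
        return "older"
    return "unknown"


# Made-up, heavily trimmed CMDI 1.1 record (real records also contain
# Resources and Components sections).
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/" CMDVersion="1.1">
  <Header>
    <MdProfile>clarin.eu:cr1:p_1403526079380</MdProfile>
  </Header>
</CMD>"""

print(profile_generation(SAMPLE))  # current
```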
oai_dc
oai_dc is the format required by OAI-PMH. See the mapping section to understand how we map our submissions to this format.
Submitted metadata
The following table lists the fields we ask for during submission; the list is likely to change. Metadata are submitted in English. Different formats contain different generated metadata (e.g., human-readable language names accompanying the ISO codes, identifiers, other dates).
Field name | Description | Status |
---|---|---|
Type | Type of the resource: "Corpus" refers to text, speech and multimodal corpora. "Lexical Conceptual Resource" includes lexica, ontologies, dictionaries, word lists, etc. "Language Description" covers language models and grammars. "Technology / Tool / Service" is used for tools, systems, system components, etc. | required |
Title | The main title of the item. | required |
Project URL | URL of a resource or project related to the submitted item (e.g., the project web page). Validated by a regular expression (must start with http or https). | regexp controlled |
Demo URL | URL of a demonstration or samples or, in the case of tools, of sample output. Validated by a regular expression (must start with http or https). | regexp controlled |
Date issued | The date when the submitted data were issued, if any, e.g., 2014-01-21, or at least the year. | required |
Author | Names of the authors of the item. For collections (e.g., corpora or other large databases of text), you usually want to provide the names of the people involved in compiling the collection, not the authors of individual pieces. A person's name is stored as the surname, a comma, and any other names (e.g., "Smith, John Jr."). | required repeatable |
Publisher | Name of the organization or entity that published any previous instance of the item, or your home institution. | required repeatable |
Contact person | Person to contact in case of issues with the submission: someone able to provide information about the resource, e.g., one of the authors or the submitter. Stored as a structured string containing given name, surname, email and home organization. | required repeatable |
Funding | Sponsors and funding that supported the work described by the submission. Stored as a structured string containing the project name, project code, funding organization and type of funds (own/national/eu). | repeatable |
Description | Textual description of the submission. | required |
Language | The language(s) of the main content of the item, stored as ISO 639-3 codes. Required for corpora, lexical conceptual resources and language descriptions. | repeatable type-bind required |
Subject Keywords | Keywords or phrases related to the subject of the item. | required repeatable |
Size | Extent of the submitted data, e.g., the number of tokens or the number of files. | repeatable |
Media type | Media type of the main content of the item, e.g., text or audio. Dropdown selection; required for corpora, language descriptions and lexical conceptual resources. | dropdown selection type-bind required |
Detailed type | Further classification of the resource type. Dropdown selection; required for tools, language descriptions and lexical conceptual resources. | dropdown selection type-bind required |
Language Dependent | Boolean value indicating whether the described tool or service is language dependent. Required for tools. | type-bind required |
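
The field rules in the table can be summarised as a small validator. This is a hedged sketch, not the repository's own submission checks: the dictionary keys (for example `project_url` and `date_issued`), the accepted date shapes (YYYY, YYYY-MM, YYYY-MM-DD), the `http://`/`https://` prefix test and the other regular expressions are assumptions chosen to mirror the descriptions above. Type-bound fields become required only when the resource type calls for them.

```python
import re
from typing import Any, Dict, List

# Resource type values as listed in the Type row.
CORPUS = "Corpus"
LCR = "Lexical Conceptual Resource"
LANG_DESC = "Language Description"
TOOL = "Technology / Tool / Service"

# Hypothetical keys for the fields required for every submission.
ALWAYS_REQUIRED = [
    "type", "title", "date_issued", "author", "publisher",
    "contact_person", "description", "subject_keywords",
]

# "type-bind required": field -> resource types for which it becomes required.
TYPE_BOUND = {
    "language": {CORPUS, LCR, LANG_DESC},
    "media_type": {CORPUS, LCR, LANG_DESC},
    "detailed_type": {TOOL, LCR, LANG_DESC},
    "language_dependent": {TOOL},
}

URL_RE = re.compile(r"^https?://")                  # assumed: scheme plus "://"
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # full date or at least the year
AUTHOR_RE = re.compile(r"^[^,]+, \S.*$")            # "Surname, other names"
ISO639_3_RE = re.compile(r"^[a-z]{3}$")             # shape check only, no code-list lookup


def validate(sub: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the submission passed."""
    errors = []
    required = ALWAYS_REQUIRED + [
        field for field, types in TYPE_BOUND.items() if sub.get("type") in types
    ]
    for field in required:
        # False is a valid answer for the boolean "language_dependent" field.
        if sub.get(field) in (None, "", []):
            errors.append(f"missing required field: {field}")
    for field in ("project_url", "demo_url"):
        value = sub.get(field)
        if value and not URL_RE.match(value):
            errors.append(f"{field} must start with http:// or https://")
    date = sub.get("date_issued")
    if date and not DATE_RE.match(date):
        errors.append("date_issued must look like 2014-01-21 or at least 2014")
    for name in sub.get("author", []):
        if not AUTHOR_RE.match(name):
            errors.append(f"author not stored as 'Surname, other names': {name}")
    for code in sub.get("language", []):
        if not ISO639_3_RE.match(code):
            errors.append(f"not an ISO 639-3 code: {code}")
    return errors


example = {
    "type": CORPUS,
    "title": "Example Corpus",
    "date_issued": "2014-01-21",
    "author": ["Smith, John Jr."],
    "publisher": ["Example University"],
    "contact_person": ["hypothetical structured contact string"],
    "description": "A small example corpus.",
    "subject_keywords": ["corpus"],
    "language": ["ces"],
    "media_type": "text",
    "project_url": "https://example.org/project",
}
print(validate(example))  # []
```

The language check only tests the shape of the code; confirming that a code actually exists would require a lookup in the ISO 639-3 code table.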
Metadata mapping
The following tables contain the mapping from submission fields to oai_dc and list some of the important automatically generated fields.
Submission field | Mapped field |
---|---|
Type | dc.type |
Title | dc.title |
Project URL | dc.source |
Demo URL | not mapped |
Date issued | dc.date |
Author | dc.creator |
Publisher | dc.publisher |
Contact person | not mapped |
Funding | not mapped |
Description | dc.description |
Language | dc.language |
Subject Keywords | dc.subject |
Size | not mapped |
Media type | not mapped |
Detailed type | not mapped |
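
The mapping table above translates almost directly into code. The sketch below, using Python's standard `xml.etree.ElementTree`, turns a submission given as a dict with hypothetical keys (such as `date_issued`) into an `oai_dc:dc` element; repeatable fields become repeated Dublin Core elements and unmapped fields are dropped. It does not produce the generated `dc.identifier` and `dc.rights` fields or the `xsi:schemaLocation` attribute that a real OAI-PMH record carries.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Submission field (hypothetical keys) -> Dublin Core element, per the table.
# Demo URL, Contact person, Funding, Size, Media type and Detailed type are
# "not mapped", so they are simply left out.
DC_MAPPING = {
    "type": "type",
    "title": "title",
    "project_url": "source",
    "date_issued": "date",
    "author": "creator",
    "publisher": "publisher",
    "description": "description",
    "language": "language",
    "subject_keywords": "subject",
}


def to_oai_dc(sub: Dict[str, Any]) -> ET.Element:
    """Build an <oai_dc:dc> element; repeatable fields become repeated elements."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, dc_name in DC_MAPPING.items():
        value = sub.get(field)
        if value in (None, "", []):
            continue
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(item)
    return root


record = to_oai_dc({
    "title": "Example Corpus",
    "author": ["Smith, John", "Doe, Jane"],
    "demo_url": "https://example.org/demo",  # not mapped, so dropped
})
print(ET.tostring(record, encoding="unicode"))
```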

Generated field | Description |
---|---|
dc.identifier | Persistent identifier (PID) of the resource; currently a Handle. |
dc.rights | Repeatable field that can contain the name of the license under which the resource is distributed, the URL of the full license text, and the so-called license label (PUB, ACA or RES, i.e., public, academic or restricted use). |
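
A harvester usually needs the generated fields too. The sketch below reads `dc.identifier` and splits the repeated `dc.rights` values into label, license URL and license name. This page does not say how these values are told apart, so the rule used here (exact match on PUB/ACA/RES, then an `http(s)://` prefix, otherwise the license name) is an assumption, and the sample record, including its handle, is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"
LICENSE_LABELS = {"PUB", "ACA", "RES"}


def read_generated_fields(xml_text: str) -> Tuple[List[str], Dict[str, Optional[str]]]:
    """Return the dc.identifier values and a heuristic split of the dc.rights values."""
    root = ET.fromstring(xml_text)
    identifiers = [e.text.strip() for e in root.iter(f"{{{DC_NS}}}identifier") if e.text]
    rights: Dict[str, Optional[str]] = {"label": None, "url": None, "name": None}
    for e in root.iter(f"{{{DC_NS}}}rights"):
        value = (e.text or "").strip()
        if value in LICENSE_LABELS:  # PUB, ACA or RES
            rights["label"] = value
        elif value.startswith(("http://", "https://")):
            rights["url"] = value  # assumed to be the link to the full license text
        elif value:
            rights["name"] = value  # anything else is treated as the license name
    return identifiers, rights


# Trimmed, made-up record; the handle below is a placeholder, not a real PID.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://hdl.handle.net/0000/0000</dc:identifier>
  <dc:rights>Creative Commons - Attribution 4.0 International (CC BY 4.0)</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
  <dc:rights>PUB</dc:rights>
</oai_dc:dc>"""

ids, rights = read_generated_fields(SAMPLE)
print(ids, rights)
```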