This page defines guidelines for users who want to archive their resources in the CEDIFOR CLARIN-D repository. Resource owners should read this page carefully and check whether they can meet all the criteria listed below. CEDIFOR can help with questions about the process. Please contact firstname.lastname@example.org for further information.
The repository focuses on multimodal corpora, but also includes purely textual, multimedia or pictorial corpora.
We only accept data into the repository that:
- are the result of research projects,
- are equipped with detailed metadata,
- have a data structure described by detailed documentation (PDF/A),
- include information about how the data was originally generated,
- have been checked by a third party.
We also encourage resource owners:
- To provide a list of usage scenarios for which the resource is intended or can be used
- To include related publications about the resource in the metadata
Metadata must be provided in CMDI format. Detailed documentation and ready-made tools are available for generating CMDI-compliant metadata.
We check metadata for conformity to CMDI standards as follows:
- Are the XML metadata well-formed and valid?
- Are the CMDI components and profiles used stored in the component registry and publicly accessible?
- Are the data categories used in the components/profiles present in the ISOcat data category registry?
- Do the provided CMDIs contain sufficient and consistent information (e.g. consistent indication of the “name” of the data producer and the license) according to the needs of the VLO?
- The metadata must include information about the data provider and/or producer (name and URL of the person/institution, contact information) and a statement of the legal status of the resource.
- The Data Controller undertakes to make this metadata publicly and freely accessible, without restriction, via technical interfaces to the repository such as OAI-PMH and to enable the calculation, reuse and redistribution of this metadata by third parties.
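Before submitting, the first check in the list above (well-formedness) can be run locally. The following is a minimal sketch, not the repository's own validation procedure. The function name `check_well_formed` is hypothetical, and the code assumes that a CMDI record uses `CMD` as its root element. It does not validate against a CMDI profile schema, which requires a schema-aware XML validator.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (ok, detail); ok is False if the text cannot be parsed as XML.

    This covers well-formedness only, not validity against a CMDI profile.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"
    # ElementTree stores qualified tags as "{namespace}local-name".
    local_name = root.tag.rsplit("}", 1)[-1]
    # Assumption: a CMDI record has "CMD" as its root element.
    if local_name != "CMD":
        return False, f"unexpected root element: {local_name}"
    return True, "well-formed, root element is CMD"


if __name__ == "__main__":
    # Hypothetical minimal record, for illustration only.
    sample = '<CMD xmlns="http://www.clarin.eu/cmd/1"><Header/></CMD>'
    print(check_well_formed(sample))
```

A record that passes this check may still be invalid against its profile, so it is only a first filter before full validation.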
Data and Formats
We recommend using the formats listed in the CLARIN standard recommendations.
If no recommended/known and documented format is used, detailed documentation on the syntax and semantics of the data (e.g. database dumps: names of tables and columns; specifications and examples of the contents of each column; examples of retrieval of different data types) shall be provided.
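For a database dump, the list of table and column names can be generated rather than written by hand. The sketch below assumes the dump can be loaded into SQLite, which may not match your database system. The function name `describe_schema` and the example table `utterance` are hypothetical. The output is only a starting point: the meaning of each column and example contents still have to be documented manually.

```python
import sqlite3


def describe_schema(conn):
    """Map each table name to a list of (column name, declared type)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # Quote the identifier so unusual table names are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


if __name__ == "__main__":
    # Hypothetical in-memory example; replace with a connection to your data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE utterance "
        "(id INTEGER PRIMARY KEY, speaker TEXT, start_ms INTEGER)")
    for table, columns in describe_schema(conn).items():
        print(table)
        for name, col_type in columns:
            print(f"  {name}: {col_type}")
```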
Only data that is publicly accessible, or that is licensed for restricted use within research institutions, is added to the repository. Access to metadata must not be restricted in any way or at any time.
Technical access to the data is not restricted for any user. The CEDIFOR CLARIN-D repository contains only resources or services that are publicly usable or carry a corresponding license.
If the privacy of data subjects is a concern, it must be governed by agreements signed by those persons (e.g. the persons interviewed expressly declare that the data may be freely passed on for research and teaching purposes).
Resource creators or resource owners can request to archive their data in the CEDIFOR CLARIN-D repository by signing the depositor agreement available on the Centre’s website. The following steps are necessary:
- Fill in the application form for resource storage and send it to email@example.com
- If the request is accepted, the resource and its metadata must be provided in the manner described above.
Further details will be provided in the response when a request is accepted.