This page describes the archiving guidelines in the CEDIFOR CLARIN-D repository.
If you have any questions, please contact: firstname.lastname@example.org
Data Storage and Backup
The data is stored in a graph database on a RAID system, and all content is routinely copied to separate hardware (e.g. tape backup). Deterioration of the storage media is monitored via Nagios probes (e.g. using S.M.A.R.T. – Self-Monitoring, Analysis and Reporting Technology – data).
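As an illustration of such a probe, the following minimal sketch maps a single S.M.A.R.T. value to a Nagios status. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `smart_status`, the choice of the reallocated-sector count as the monitored attribute, and both thresholds are assumptions made for this example and do not describe the probes actually deployed.

```python
# Conventional Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def smart_status(reallocated_sectors, warn=1, crit=50):
    """Map a S.M.A.R.T. reallocated-sector count to a Nagios status.

    The thresholds `warn` and `crit` are illustrative assumptions,
    not values used by the repository.
    """
    # A missing or negative reading cannot be interpreted.
    if reallocated_sectors is None or reallocated_sectors < 0:
        return UNKNOWN, "UNKNOWN - no usable S.M.A.R.T. value"
    if reallocated_sectors >= crit:
        return CRITICAL, f"CRITICAL - {reallocated_sectors} reallocated sectors"
    if reallocated_sectors >= warn:
        return WARNING, f"WARNING - {reallocated_sectors} reallocated sectors"
    return OK, f"OK - {reallocated_sectors} reallocated sectors"
```

A real plugin would print the message and exit with the status code so that Nagios can pick up both.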
Access to the archive system is restricted to a small group of people. While read access from external systems is possible, write access to data storage systems running on the archive server is limited to the system itself (e.g. no public write access to databases from external sources).
Depositors are advised to use standardised formats (UTF-8, documented XML formats, …) when submitting their data for archiving. If user-defined or proprietary formats are used, detailed documentation must be enclosed.
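Before submitting data, a depositor could run a quick format check along the following lines. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the checks cover only UTF-8 validity and XML well-formedness, not conformance to a particular documented XML schema.

```python
import xml.etree.ElementTree as ET


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the byte string parses as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False
```

For example, `is_valid_utf8("Grüße".encode("latin-1"))` returns `False`, because the Latin-1 bytes for "ü" and "ß" are not valid UTF-8 sequences.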
The CEDIFOR CLARIN-D repository is checked once a year to determine:
- whether the archived data is unchanged (e.g. via checksums).
- whether the stored data needs to be updated because the format used has become obsolete.
- whether the available metadata needs to be updated.
If an update is required, the depositor will be contacted and asked to provide an updated version.
In some cases, the CEDIFOR CLARIN-D Center may also decide to carry out the update itself. In this case, the original depositor will be informed (provided the person/institution is still available).
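The yearly check that archived data is unchanged could be sketched as follows. The sketch assumes that a manifest mapping file paths to SHA-256 digests was recorded at deposit time. The manifest layout, the choice of SHA-256, and the function names are assumptions for illustration, not a description of the repository's actual procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed(manifest):
    """Return the sorted paths that are missing or whose digest differs.

    `manifest` is a hypothetical dict mapping file paths to the digest
    recorded when the data was archived.
    """
    changed = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            changed.append(path)
    return sorted(changed)
```

An empty result means that every listed file still matches its recorded digest; any returned path would need to be restored from backup or investigated.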
Backups are stored on hardware located separately from the live system. For this purpose, the CEDIFOR CLARIN-D Center cooperates with the University Computer Center of the University of Frankfurt to store encrypted backups in a distributed manner.
Software Stack Preservation
The CEDIFOR CLARIN-D Center uses widespread open source software stacks (Neo4j, Tomcat, Apache) that are installed on virtual machines (Docker) to provide all repository services (data storage, OAI-PMH, …). This maximizes the long-term likelihood of support (updates, security fixes) for the tools used and makes it easier to run installations of these software stacks regardless of the underlying hardware and/or operating system.
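To illustrate how a client might address the OAI-PMH service, the sketch below builds request URLs for the six verbs defined by the OAI-PMH 2.0 protocol. The base URL `https://repository.example.org/oai` is a placeholder and not the repository's real endpoint, and the helper name `oai_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, used here only for illustration.
url = oai_request_url(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
```

Here `url` is `https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc`; the sketch only constructs the URL and does not send a request.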
The CEDIFOR CLARIN-D Center will inspect at least once a year:
- whether significant updates of software components are available and necessary.
- whether security patches are available.
- whether software components are still actively updated / developed, or whether a change to alternatives should be considered.
- whether access to the data/metadata stored in the repository should be enabled via additional interfaces (or updated versions of existing ones).