Glossary

In the course of ANC development, a number of terms have arisen that require clarification. These terms will be incorporated into the ANC policy, and their implications will be reflected in the repository. While the policy is under development, they are listed here for reference.

Anonymization | Authentication | BIDS | Copyright | Cognitive Ontology | Certification | Data lifecycle | Data management infrastructure | Data management plan | Data curation | Data protection | Digital objects | FAIR data | FAIR principles | HED | HPC | Licence | Metadata | Metadata schema | Persistent identifier | Personal data | Primary data | Pseudonymization | Repository | Reproducibility and replicability | Research data | Research data management | TRUST principles | Trusted digital repositories

Anonymization

The anonymization of personal data in science is part of good scientific practice. According to BDSG (Federal Data Protection Act) § 3, para. 6, anonymization means any measures that change personal data in such a way that "the individual details about personal or factual circumstances can no longer be assigned to a specific or identifiable natural person, or can only be assigned to a specific or identifiable natural person with a disproportionate amount of time, cost and manpower". forschungsdaten.info

Authentication

Access to certain data, systems or services must be restricted. Access control is regulated via authentication. The accessing person can be uniquely identified using various features: IP address, login and password, a security feature (key file, biometric feature, hardware token) or a combination of these (two-factor authentication). This requires functioning user administration/identity management (IDM), where password data etc. can be stored and managed. An alternative is the so-called single sign-on procedure, such as Shibboleth, where one person can use several services with one login. Authentication must be distinguished from authorization, in which the authenticated person is assigned certain rights on the system. forschungsdaten.info
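
As a toy sketch of the login-and-password feature mentioned above (purely illustrative, not the ANC's actual identity management), a system stores only a salted hash of each password and verifies a login by recomputing it:

```python
# Toy sketch of password-based authentication (not production code):
# only a salted hash is stored; logins are verified by recomputing it.
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify(password, salt, stored_digest):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)  # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
assert verify("correct horse battery staple", salt, stored)
assert not verify("wrong password", salt, stored)
```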

BIDS

The Brain Imaging Data Structure (BIDS) is an important tool for organizing and sharing neurocognitive and behavioral data. BIDS describes a standardized way of organizing such data that is widely accepted within the neurocognitive research community.
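
As a rough illustration (a hypothetical single-subject EEG dataset, not an official BIDS example), a BIDS dataset is a directory tree with standardized file names; the sketch below creates such a skeleton:

```python
# Hypothetical single-subject EEG dataset skeleton following BIDS
# naming conventions (illustrative only).
from pathlib import Path

root = Path("my_bids_dataset")              # hypothetical dataset root
files = [
    "dataset_description.json",             # required top-level metadata
    "participants.tsv",                     # one row per subject
    "sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg.edf",
    "sub-01/ses-01/eeg/sub-01_ses-01_task-rest_eeg.json",
    "sub-01/ses-01/eeg/sub-01_ses-01_task-rest_events.tsv",
]
for name in files:
    path = root / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()                            # empty placeholder files
```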

Certification

In the RDM sector, certification generally refers to repositories. By complying with certain standards, repositories can receive a so-called certificate. This certifies both the quality and the trustworthiness of the repository. forschungsdaten.info

Copyright

Whether or not research data is subject to protection under copyright law depends on whether the requirements for intellectual creativity or the requirements of database protection law are met. Since the existence of the requirements must be examined on a case-by-case basis, it is advisable to consult a specialist lawyer in case of doubt. In order to ensure maximum reusability of scientific research data, which may in principle be subject to copyright law, the granting of additional rights of use, e.g. by licensing the data accordingly, should be considered. The granting of such licenses usually leads to greater use of the data in scientific research and can thus contribute to a gain in reputation for the scientist, even beyond the boundaries of the respective specialist community. forschungsdaten.info

Cognitive Ontology

Scientific progress and innovation depend significantly on the systematic contextualization and integration of scientific knowledge. Crucial for this is above all the semantic networking of subject-specific concepts. This can be realized with the help of ontologies, which can be defined as "formal, explicit specifications of a common conceptualization of a domain of interest" and which play an essential role in the implementation of the FAIR Principles (Exelra). To enable this within the ANC environment, the ANC has developed a Cognitive Ontology, i.e., a formal description of knowledge from the field of Cognitive Neuroscience in which a set of domain-specific concepts is annotated and related by means of ontological links. The focus is on latent, non-observable cognitive constructs and the theories that define them.
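
As a toy sketch of what such ontological links look like in practice (the namespace and concept names below are invented placeholders, not the actual ANC Cognitive Ontology), two cognitive constructs can be declared as classes and related to each other in RDF:

```python
# Toy RDF sketch of ontological links between cognitive concepts,
# using rdflib (pip install rdflib). Namespace and concept names are
# hypothetical, not taken from the ANC Cognitive Ontology.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

ANC = Namespace("https://example.org/anc/")   # placeholder namespace
g = Graph()

g.add((ANC.WorkingMemory, RDF.type, RDFS.Class))
g.add((ANC.CognitiveControl, RDF.type, RDFS.Class))
g.add((ANC.WorkingMemory, RDFS.subClassOf, ANC.CognitiveControl))
g.add((ANC.WorkingMemory, RDFS.label, Literal("working memory")))

print(g.serialize(format="turtle"))           # human-readable Turtle output
```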

Data Lifecycle

The data lifecycle model illustrates all the stages that research data can pass through, from collection to subsequent use. The stages of the data lifecycle can vary, but in general the data lifecycle comprises the following phases: Planning the research project (including handling the data in the research project, see data management plan), creation/collection, preparation and analysis, sharing and publishing, archiving, subsequent use. forschungsdaten.info

Data management infrastructure

An infrastructure used to provide data management and enforce data management policies. A data management infrastructure should include resources such as a data repository and an information catalog. EOSC Preservation: Overview Discussion Paper (2022)

Data management plan

A data management plan (DMP) describes the handling of research data produced or used in a project, both during the project and beyond. The DMP contains rules that are agreed and applied within the project team. It helps to plan data management systematically and implement it transparently. The data management plan documents the (planned) collection, storage, documentation, maintenance, processing, forwarding, publication and archiving of data, as well as the required resources, legal framework conditions and responsible persons. A DMP thus contributes to the quality, long-term usability and security of the data and supports, for example, the implementation of the FAIR principles. A DMP is a living document, which means that it is regularly updated during the project. Based on their guidelines, some funding bodies require a DMP to be submitted with the project application, but an initial version is usually only required at or shortly after the start of the project. The following questions, for example, need to be clarified in the DMP: What data will be used in the project and where will it come from? What infrastructure, software and licenses are required? What data will be generated in the project (type, scope, etc.)? Which data should be published and/or archived after the end of the project? Where should the data be published and/or archived? Who is responsible for the description with metadata? Who is permitted to use the data after the project ends, and under what licence terms?
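
As a toy illustration, answers to such questions can also be captured as structured, machine-readable data. The field names below are invented for illustration; real machine-actionable DMPs follow dedicated schemas such as the RDA DMP Common Standard:

```python
# Toy sketch: recording DMP answers as structured, machine-readable data.
# Field names are invented; real maDMPs follow standardized schemas.
import json

dmp = {
    "project": "Example EEG study",
    "data_used": ["public resting-state EEG corpus"],
    "data_generated": {"type": "EEG recordings", "approx_volume_gb": 250},
    "publication_target": "certified domain repository",
    "metadata_responsible": "J. Doe",
    "reuse_licence": "CC BY 4.0",
}

with open("dmp.json", "w") as fh:
    json.dump(dmp, fh, indent=2)   # versioned with the project: a "living document"
```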

Data curation

A managed process, throughout the data lifecycle, by which data/data collections are cleansed, documented, standardized, formatted and inter-related. This includes versioning data, forming a new collection from several data sources, annotating with metadata, or adding codes to raw data (e.g., classifying a galaxy image with a galaxy type such as “spiral”). Higher levels of curation involve maintaining links with annotations and with other published materials. Thus a dataset may include a citation link to a publication whose analysis was based on the data. The goal of curation is to manage and promote the use of data from its point of creation to ensure it is fit for contemporary purposes and available for discovery and re-use. For dynamic datasets this may mean continuous enrichment or updating to keep them fit for purpose. Special forms of curation may be available in data repositories. The data curation process itself must be documented as part of curation; curation and provenance are thus closely related. EOSC Preservation: Overview Discussion Paper (2022)
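
A minimal sketch of a single curation step as described above (all record fields and names are invented): a classification code is added to a raw record together with a provenance note documenting the curation itself:

```python
# Toy sketch of a curation step: add a code to raw data and record
# provenance of the action. Field names are purely illustrative.
import datetime

raw_record = {"id": "img-0042", "pixels": "..."}    # raw data (content elided)

curated = {
    **raw_record,
    "galaxy_type": "spiral",                        # code added to the raw data
    "curation_log": [{                              # curation documented as provenance
        "action": "classified galaxy type",
        "agent": "J. Doe",
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }],
}
```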

Data protection

Data protection refers to technical and organizational measures to prevent the misuse of personal data. Misuse occurs when such data is collected, processed or used without authorization. Data protection is regulated in the EU General Data Protection Regulation (GDPR), in the German Federal Data Protection Act and in the corresponding laws at federal state level, e.g. in the State Data Protection Act of Baden-Württemberg. In research, personal data is generated particularly in medical and social science studies. Encryption and storage in specially secured locations are mandatory here. However, subsequent pseudonymization or anonymization can remove the personal reference to such an extent that it even becomes legally possible to publish the data.
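
A minimal sketch of the encryption measure mentioned above, using the third-party cryptography package (secure key management, the genuinely hard part, is elided here):

```python
# Toy sketch of encrypting personal data at rest (pip install cryptography).
# Key management is deliberately omitted; the key itself must be stored
# securely and separately from the data.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # must itself be kept in a secured location
f = Fernet(key)

token = f.encrypt(b"participant: Jane Doe, diagnosis: ...")
assert f.decrypt(token) == b"participant: Jane Doe, diagnosis: ..."
```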

Digital objects

A collation of data and metadata. Different actors will place different ‘boundaries’ around data and metadata to define different ‘objects’. An object might be easily defined as a collection of data and metadata files in a single directory, but might equally be geographically distributed. With linked data (RDF), the boundaries become even less defined. Objects may be linked to other related objects, may have different versions, or may depend on other digital objects (e.g. semantic artifacts such as ontologies). Different researchers and different archives will have different perspectives on what constitutes ‘a digital object’. But when a group of objects is cared for together by an organization it may be described as a ‘collection’ and that organization may be described as a ‘repository’ (see below). A repository collection may contain objects under different levels of curation and preservation. Object metadata does not come with a ‘level of curation/preservation’ attached that would allow us to identify its current level of care, or to highlight when that level of care changes. EOSC Preservation: Overview Discussion Paper (2022)
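
A conceptual sketch only (all class and field names are invented): a digital object as a bundle of data and metadata that may be versioned, depend on other objects, and be grouped into curated collections:

```python
# Conceptual sketch of the notions above; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    data_files: list[str]                  # boundaries around "the data" vary
    metadata: dict
    version: str = "1.0"
    depends_on: list["DigitalObject"] = field(default_factory=list)  # e.g. ontologies

@dataclass
class Collection:
    objects: list[DigitalObject]
    curation_level: str = "unspecified"    # per the paper, rarely recorded in metadata
```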

FAIR data

The term FAIR (Findable, Accessible, Interoperable and Reusable) Data was first coined in 2016 by the FORCE11 community for sustainable research data management. The main aim of the FAIR Data Principles is to optimize the preparation of research data, which should be findable, accessible, interoperable and reusable. The FAIR principles were included by the European Commission in the EU Horizon 2020 funding guidelines and also form part of applications under the successor programme, Horizon Europe.

FAIR principles

The (re-)usability of digital objects depends largely on the broad application of the FAIR Guiding Principles for the Management and Stewardship of Scientific Data to improve the findability, accessibility, interoperability, and reusability of digital resources (Wilkinson et al., 2016; see also GOFAIR). The FAIR Principles are an integral part of contemporary research data management and are ultimately intended to ensure that data are "born FAIR", i.e. that the research process along the entire data lifecycle is FAIR ("FAIR-by-design"; Dillo et al., 2021).

FAIR data and services can only exist in a FAIR ecosystem, which in Europe will be provided primarily through the European Open Science Cloud (EOSC; European Commission, 2018). The focus here is on FAIR-enabling Trusted Digital Repositories (TDRs), which are intended to offer long-term perspectives for making FAIR data available and (re-)usable (L'Hours et al., 2020; L'Hours et al., 2022; von Stein, 2021; Conzett et al., 2022; Lin et al., 2020). In this context, the EOSC can be seen as the European contribution to a worldwide Internet of FAIR Data and Services.

HED

The Hierarchical Event Descriptors (HED) system provides a controlled, hierarchical vocabulary for describing experimental events in a machine-readable way, which is approved for use with BIDS-formatted data and makes an important contribution to promoting the reliability and reproducibility of scientific findings.
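
As a rough illustration (the tag strings below are simplified, unvalidated examples rather than a complete HED annotation), HED tags are typically attached to a BIDS events.tsv file:

```python
# Illustrative sketch: a BIDS-style events.tsv with a HED column.
# The HED strings are abbreviated examples, not a validated annotation.
import pandas as pd

events = pd.DataFrame({
    "onset": [1.2, 3.5],
    "duration": [0.5, 0.5],
    "trial_type": ["stimulus", "response"],
    "HED": [
        "Sensory-event, Visual-presentation",  # machine-readable event description
        "Agent-action",
    ],
})
events.to_csv("sub-01_task-rest_events.tsv", sep="\t", index=False)
```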

HPC

An important aspect of the digitization of science and research, especially in data-intensive disciplines such as Cognitive Neuroscience, is the processing and analysis of very large amounts of data ("Big Data"). The ANC supports its users in this respect by providing access to a High Performance Computing (HPC) system.
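
A toy sketch of the kind of data-parallel workload an HPC system scales up (purely illustrative; real HPC jobs are typically submitted through a scheduler such as SLURM):

```python
# Toy sketch: per-subject analyses run in parallel across local cores;
# an HPC cluster applies the same pattern across many nodes.
from multiprocessing import Pool

def preprocess(subject_id):
    # placeholder for an expensive per-subject analysis
    return f"sub-{subject_id:02d} done"

if __name__ == "__main__":
    with Pool() as pool:
        for result in pool.map(preprocess, range(1, 9)):
            print(result)
```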

Licence

A license is a contractually agreed right of use. It allows the rights holder to allow their contractual partner to use a work in various ways (e.g. to copy, store or make it digitally accessible). In many cases, rights holders charge a license fee for this. In addition to such commercial licenses, free licenses such as Creative Commons licenses are also available. These allow the work to be used free of charge.

Metadata

Metadata is independent data that contains structured information about other data or resources and their characteristics. It is stored either independently of or together with the data it describes. A precise definition of metadata is difficult because the term is used in different contexts and the distinction between data and metadata varies depending on the perspective. A distinction is usually made between functional metadata on the one hand and technical or administrative metadata on the other. While the latter have a clear metadata status, functional metadata can sometimes also be understood as research data. In order to increase the effectiveness of metadata, it is essential to standardize the description. A metadata standard allows metadata from different sources to be linked and processed together. forschungsdaten.info

Metadata schema

A metadata schema organizes the structure of metadata. It specifies which elements are mandatory for the description of analog and digital objects such as research data and which information should be specified in which format. A standardized data schema simplifies data entry and increases the quality of the metadata. Above all, however, structured metadata enables machine readability and the exchange of information between different applications and ensures long-term reusability. forschungsdaten.info
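
As a toy sketch (field names are invented for illustration, loosely reminiscent of DataCite-style elements), a metadata schema can be expressed and enforced in a machine-readable way, for example with JSON Schema:

```python
# Toy sketch: a tiny metadata schema and a record validated against it.
# Requires the third-party jsonschema package (pip install jsonschema).
from jsonschema import validate

schema = {
    "type": "object",
    "required": ["title", "creator", "identifier", "publicationYear"],
    "properties": {
        "title": {"type": "string"},
        "creator": {"type": "string"},
        "identifier": {"type": "string"},   # e.g. a DOI
        "publicationYear": {"type": "integer"},
    },
}

record = {
    "title": "Resting-state EEG dataset",
    "creator": "Doe, Jane",
    "identifier": "10.1234/example",        # hypothetical DOI
    "publicationYear": 2023,
}

validate(record, schema)   # raises ValidationError if the record is malformed
```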

Persistent identifier

In research data management, a persistent identifier is a permanent (persistent) digital identifier consisting of numbers and/or alphanumeric characters that is assigned to a dataset (or another digital object) and refers directly to it. Frequently used identifier systems are DOI (Digital Object Identifiers) and URN (Uniform Resource Names). In contrast to other serial identifiers (e.g. URL addresses), a persistent identifier refers to the object itself and not to its location on the Internet. If the location of a digital object associated with a persistent identifier changes, the identifier remains the same. Only the URL location must be changed or added to the identifier database. This ensures that a dataset remains permanently findable, retrievable and citable. forschungsdaten.info
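
A small sketch of this mechanism: looking up a DOI through the doi.org resolver returns the object's current location, while the identifier itself never changes (the DOI below is that of Wilkinson et al., 2016, the FAIR principles paper):

```python
# Minimal sketch: resolving a DOI via the doi.org handle API. The
# persistent identifier stays fixed; only the URL stored in the
# resolver changes when the object moves.
import json
import urllib.request

doi = "10.1038/sdata.2016.18"   # Wilkinson et al. (2016)

with urllib.request.urlopen(f"https://doi.org/api/handles/{doi}") as resp:
    record = json.load(resp)

for value in record["values"]:
    if value["type"] == "URL":
        print(value["data"]["value"])   # the object's current location
```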

Personal data

The Federal Data Protection Act (BDSG) defines personal data as "individual details about personal or factual circumstances of an identified or identifiable natural person (data subject)". Data can be considered personal if it can be clearly assigned to a specific natural person. Typical examples are a person's name, profession, height or nationality. The BDSG also stipulates that information on ethnic origin, political opinion, religious or philosophical beliefs, trade union membership, health and sex life constitutes a particularly sensitive type of personal data and is therefore subject to stricter protection requirements. forschungsdaten.info

Primary data

Data obtained directly from data collection or from an investigation or observation of a phenomenon is referred to as primary data (or raw data). Primary data can be, for example, unprocessed, unchecked and uncommented measurement data or audio and video recordings. Data derived from primary data is referred to as secondary data. forschungsdaten.info

Pseudonymization

In contrast to anonymization, pseudonymization merely replaces certain identification features, such as the name, with a pseudonym (a letter and/or number code) in order to make it more difficult or impossible to identify the persons concerned (BDSG § 3, para. 6a). For the duration of a scientific study, it is often unavoidable to keep personal data and codes in a reference list and the study data in a separate database, i.e. to pseudonymize data. Anonymization of the data is achieved by deleting the reference list, for example after completion of the study, so that no reference can be made between individual persons and the study results. forschungsdaten.info
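
A minimal sketch of this procedure (column names and data are invented): identifiers are replaced by opaque codes, the code-to-name reference list is stored separately, and deleting it later anonymizes the study data:

```python
# Toy sketch of pseudonymization as described above. Names and columns
# are illustrative only.
import secrets
import pandas as pd

participants = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "score": [42, 37],
})

# Replace the identifying feature with an opaque pseudonym per person.
participants["code"] = [secrets.token_hex(4) for _ in range(len(participants))]

reference_list = participants[["code", "name"]]   # stored separately, access-restricted
study_data = participants[["code", "score"]]      # used for the analysis

# Anonymization after study completion: destroy the reference list.
del reference_list
```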

Repository

A repository can be regarded as a special form of archive. In the digital age, the term repository refers to a managed storage location for digital objects. As repositories are usually accessible to the public or a restricted group of users, this term is closely linked to open access. forschungsdaten.info

Reproducibility and replicability

Reproducibility is often defined as the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator. That is, a second researcher might use the same raw data to build the same analysis files and implement the same statistical analysis in an attempt to yield the same results. This is distinct from replicability which refers to the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected. A simpler way of thinking about this might be that reproducibility is methods-oriented, whereas replicability is results-oriented. FOSTER Open Science Training Handbook
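
A toy sketch of the distinction (invented numbers, purely illustrative): reproduction re-runs the same analysis on the same data, while replication applies the same procedure to newly collected data:

```python
# Toy sketch: reproducibility vs. replicability.
import random
import statistics

def analysis(data):
    return statistics.mean(data)

original_data = [4.1, 5.0, 4.7]                        # the original raw data

reproduced = analysis(original_data)                   # same data, same analysis
new_data = [random.gauss(4.6, 0.4) for _ in range(3)]  # new data collection
replicated = analysis(new_data)                        # same procedure, new data
```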

Research data

Research data is (digital) data that is generated during scientific activities (e.g. through measurements, surveys, source work). It forms the basis of scientific work and documents its results, resulting in a discipline- and project-specific understanding of research data with different requirements for the preparation, processing and management of the data: so-called research data management. Sometimes a distinction is also made between primary data and metadata, whereby the latter is often not considered research data in the narrower sense, depending on the discipline. forschungsdaten.info

Research data management

Research data management is the process of transforming, selecting and storing research data with the aim of keeping it accessible, reusable and verifiable in the long term, independently of the data producer. Structured measures can be taken at all stages of the data lifecycle in order to preserve the scientific validity of research data, maintain its accessibility for evaluation and analysis by third parties and secure the chain of custody. forschungsdaten.info

TRUST principles

FAIR-enabling, Trusted Digital Repositories (TDRs) are considered a core component in the lifecycle of research data and take a central role in federated data infrastructures such as the European Open Science Cloud (EOSC; L'Hours et al., 2019). Central here are the TRUST Principles for Digital Repositories (Transparency, Responsibility, User Focus, Sustainability, Technology), which form the core of TDRs and are intended to ensure the long-term preservation and curation of FAIR Data and Services (Lin et al., 2020; Conzett et al., 2022). Directly related to this is the certification of repositories. The best-known certification body at present is the CoreTrustSeal (CTS), which certifies only repositories that successfully implement the CTS requirements and therefore exhibit all the characteristics of a TDR. It can be assumed that depositing research data in certified data repositories will become mandatory in the future, as is already the case for some Horizon Europe work programmes (Burgelman et al., 2019).

Trusted digital repositories

A trusted digital repository (TDR) has been defined as having “a mission to provide reliable, long term access to managed digital resources to its designated community, now and into the future”. A TDR must include the following seven attributes: compliance with the reference model for an Open Archival Information System (OAIS), administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The concept has been particularly important in relation to the certification of digital repositories. EOSC Preservation: Overview Discussion Paper (2022)