The indicators of the FAIRplus-DSM Model are designed to enable researchers in the life sciences to measure dataset maturity using a finite set of dimensions in conjunction with maturity levels. The levels themselves are a collection of well-defined indicators across each maturity dimension, enabling users to better understand in which aspects the dataset can be improved in terms of FAIR. This assessment will help users to benchmark and visibly improve the re-usability of their data assets and to incrementally increase discoverability, interoperability and overall machine actionability.
The FAIRplus dataset maturity indicators were created based on previous work around the FAIR indicators, done by the Research Data Alliance (RDA) and the FAIRsFAIR projects:
FAIR Data Maturity Model Working Group. (2020). FAIR Data Maturity Model. Specification and Guidelines (1.0). https://doi.org/10.15497/rda00050
Dataset(s) are NOT Identifiable via a Unique Identifier
Maturity Level
0
Category
Content and Context
Granularity Level
Dataset
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
See DSM-1-C0 to satisfy this prerequisite requirement.
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
See DSM-1-C1 to satisfy the minimal level for this requirement.
Dataset Descriptor does NOT include a reference to the Dataset it describes.
Maturity Level
0
Category
Content and Context
Granularity Level
Dataset
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
This indicator refers to the absence of one of the prerequisites of the FAIR principles, which is to include in the metadata object (the Dataset Descriptor) a reference to the Identifier assigned to the FAIR Data Object being described (the Dataset). See DSM-1-C2 to satisfy this prerequisite requirement.
Data or metadata is hosted in non-accessible storage (e.g., personal desktop, local file system or archive)
Maturity Level
0
Category
Hosting Environment
Granularity Level
Storage Capability
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
See DSM-1-H1 to satisfy a basic accessibility related FAIR requirement.
Data or metadata hosted in an accessible resource but with no retrieval capability
Maturity Level
0
Category
Hosting Environment
Granularity Level
Retrieval Capability
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model. This indicator refers to a state in which data is stored but no formal retrieval mechanism is defined for data access. A simple URI for download over HTTP is usually the basic requirement to satisfy.
See DSM-1-H2 to satisfy this prerequisite requirement.
Dataset’s Metadata is NOT searchable via keywords or elements within the Descriptor
Maturity Level
0
Category
Hosting Environment
Granularity Level
Search Capability
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model. Although search is not deemed an essential requirement for FAIR Data accessibility, Findability of data depends on some form of searching capability that allows users or machines to make use of the associated metadata to promote the Dataset’s discoverability.
See DSM-1-H4 to satisfy this prerequisite requirement.
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
This indicator refers to the absence of any context-related metadata reported about the FAIR dataset, such as a description of the leading study or project, or of the assessment or assay that generated the data. See indicator [DSM-1-R1](https://fairplus.github.io/Data-Maturity/docs/Indicators/#DSM-1-R1) for the minimum level of Contextual Metadata representation.
Related DSM Indicator
N/A
Related FAIR Principle
F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
No representation of Data purposed for FAIR sharing is available.
Maturity Level
0
Category
Data Representation
Granularity Level
Dataset
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
This indicator refers to a foundational requirement to objectify and organise ‘FAIR-to-be’ data into units of data that are purposed for FAIR sharing. The FAIR-DSM model refers to a unit of FAIR data as a Dataset. See indicator DSM-1-R2 to satisfy this prerequisite requirement.
Dataset Metadata is NOT formally represented in a structured form, i.e. a Dataset Descriptor
Maturity Level
0
Category
Metadata Representation
Granularity Level
Dataset
Description
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model. This indicator refers to the absence of a structured object to contain the necessary and expected dataset metadata that will be required to reach higher levels of maturity.
See DSM-1-0R to be able to satisfy this prerequisite requirement.
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
This indicator refers to the necessity to establish machine-processable metadata objects. This is to avoid having metadata represented in PDFs or otherwise non-parsable formats. See DSM-1-R4 to satisfy this prerequisite requirement.
This is a ZERO-LEVEL indicator, which indicates the absence of one of the essential requirements for a Dataset to reach minimum level of FAIRness (Level 1) according to the FAIR-DSM model.
This indicator refers to the necessity of establishing FAIR digital objects and hence data should be available in a machine-processable format, which should not be a problem in today’s research data. See DSM-1-R5 to satisfy this prerequisite requirement.
Dataset Descriptor includes Descriptive Study/Project-Level summary information
Maturity Level
1
Category
Content and Context
Granularity Level
Project
Description
This is a metadata-related requirement. Metadata should include summary information about the study or project that the Data Object is related to. This is basic contextual-metadata that will allow minimum levels of human interpretation of the data being shared.
Related DSM Indicator
Related FAIR Principle
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Dataset Descriptor includes Identifying and Descriptive Dataset-Level metadata
Maturity Level
1
Category
Content and Context
Granularity Level
Dataset
Description
This is a metadata-related requirement. Metadata should include the Identifier of the Dataset it describes AND descriptive information about the Dataset as a whole (e.g., name, description, keywords) to enable searchability and Findability of the data.
Related DSM Indicator
Related FAIR Principle
F3. Metadata clearly and explicitly include the identifier of the data they describe, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes.
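As a rough illustration of this indicator, a minimal Dataset Descriptor could be sketched as below. The key names (`identifier`, `name`, `description`, `keywords`), the example.org identifier and the helper `missing_dataset_level_fields` are assumptions chosen for illustration; the DSM model does not prescribe them.

```python
import json

# Hypothetical set of dataset-level keys; not prescribed by the DSM model.
REQUIRED_FIELDS = ("identifier", "name", "description", "keywords")


def missing_dataset_level_fields(descriptor):
    """Return the required dataset-level keys that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if not descriptor.get(key)]


# Illustrative descriptor: it names the Dataset it describes (identifier)
# and carries descriptive metadata that can later be indexed for search.
descriptor = {
    "identifier": "https://example.org/datasets/ds-0001",
    "name": "Body temperature measurements",
    "description": "Temperature readings collected at screening and follow-up visits.",
    "keywords": ["temperature", "clinical visit"],
}

print(json.dumps(descriptor, indent=2))
print(missing_dataset_level_fields(descriptor))
```

A check of this kind can only confirm that the metadata is present; whether the description and keywords are actually informative still needs human review.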
Metadata hosting environment stores and maintains an identifiable Dataset Descriptor for each identifiable Dataset
Maturity Level
1
Category
Hosting Environment
Capability
Storage Capability
Description
For each data object, the hosting environment stores a related metadata record, which enables findability. At this basic level of maturity, there is no restriction on the persistence model for the metadata records as long as the representation for data exchange is offered in accordance with a standard generic metadata schema (DSM-1-R4).
Related DSM Indicator
DSM-1-R4
Related FAIR Principle
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
Metadata hosting environment offers the capability to browse and search contents of the Dataset Descriptor
Maturity Level
1
Category
Hosting Environment
Capability
Searching Capability
Description
This capability is enabled by the Metadata Hosting Environment storing and indexing the metadata that is included in the Dataset Descriptor (F+MM-1.H2). As a benefit, the hosting environment should be able to offer simple keyword search against its locally defined metadata schema to enable basic human-led discoverability of the associated datasets.
Related DSM Indicator
Related FAIR Principle
F4. (Meta)data are registered or indexed in a searchable resource
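The keyword search described for this capability can be approximated with a very small inverted index over descriptor text. This is only a sketch: the descriptor keys, the identifiers `ds-0001` and `ds-0002`, and the functions `build_index` and `search` are hypothetical, and a real hosting environment would normally rely on a database or search engine.

```python
def build_index(descriptors):
    """Map each lower-cased token to the identifiers of descriptors containing it."""
    index = {}
    for d in descriptors:
        text = " ".join([d["name"], d["description"], *d["keywords"]])
        for token in text.lower().split():
            # Strip simple punctuation so "visits." matches "visits".
            index.setdefault(token.strip(".,;:()"), set()).add(d["identifier"])
    return index


def search(index, keyword):
    """Return the sorted identifiers of descriptors matching a single keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Illustrative descriptors only.
descriptors = [
    {
        "identifier": "ds-0001",
        "name": "Body temperature measurements",
        "description": "Readings collected at screening and follow-up visits.",
        "keywords": ["temperature", "clinical visit"],
    },
    {
        "identifier": "ds-0002",
        "name": "Blood pressure measurements",
        "description": "Readings collected at screening.",
        "keywords": ["blood pressure"],
    },
]

index = build_index(descriptors)
print(search(index, "Temperature"))
print(search(index, "screening"))
```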
Structured and/or Unstructured Data are organised into Dataset (s) created for the purpose of FAIR sharing
Maturity Level
1
Category
Data Representation
Granularity Level
Dataset
Description
This is a pre-requisite requirement to define the unit of data that is the subject-matter of the FAIRification process. This requirement requires the data managers to consider the form and the representation of data into Datasets that are designed and purposed for sharing and re-use by users unfamiliar with the data. Once defined, a Dataset should be assigned an identifier as indicated by DSM-1-C0, which then makes it an Identifiable Dataset.
What this requirement advises against is deciding to FAIRify data stored in databases according to a defined schema without first defining the ‘data exchange unit’ that is meant for sharing and re-use.
This is a format-related requirement that focuses on the machine-readability aspect of the metadata. This is a pre-requisite requirement to having the metadata indexed and searchable in a hosting resource (DSM-1-H4).
This is a format-related requirement that focuses on the machine-readability aspect of the data. This is a pre-requisite requirement to having the data indexed and searchable in a hosting resource (DSM-1-H4).
A locally defined Domain Model contains concepts that describe the overall project/study design, the relationships between the Datasets, the key entities reported within the Datasets and the relationships between them.
Maturity Level
2
Category
Content and Context
Granularity Level
Dataset
Description
This is a metadata-related requirement focusing on context and domain description. This is an entry level requirement to describe the ‘Domain’ of the data, which at this level might not be fully represented by either the hosting environment or an adopted standard (level 3). The Metadata Record should include information that can help a researcher understand the data context, especially in relation to the overall project or study design that this dataset belongs to as well as the entities that are represented by the dataset content.
Related DSM Indicator
Related FAIR Principle
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
This is a data-related requirement that is a pre-requisite for the ‘accuracy’ of metadata that FAIR principle R1 refers to. The requirement is borrowed from one of the key Tidy Data Principles, which states that each column/field should be a single variable. This prevents the often-seen scenario in structured data whereby a single column header carries values for more than one variable. For example, in the columns ‘temperature_screening’ and ‘temperature_followup’, each column implicitly carries a value for a visit variable and a value for a temperature observation. This indicator therefore requires the data manager to split these into two fields, one per variable: a field for temperature and a field for visit. This is a pre-requisite to DSM-2-C5 and DSM-2-C6, since each Dataset Field is expected to control its terms and create a local dictionary. Unless individual concepts are reported per Dataset Field, it will not be possible to find suitable terms that can later be standardised for level 3.
Related DSM Indicator
DSM-2-C5, DSM-2-C6
Related FAIR Principle
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
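The temperature example in this indicator's description can be sketched in code as follows. The column prefix, the `subject_id` key, the sample values and the function name `split_visit_columns` are assumptions made for illustration; only the two column names come from the description.

```python
def split_visit_columns(rows, prefix="temperature_"):
    """Reshape wide rows so that 'visit' and 'temperature' become separate fields."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                tidy.append({
                    "subject_id": row["subject_id"],
                    # The visit was implicit in the column name; make it explicit.
                    "visit": column[len(prefix):],
                    "temperature": value,
                })
    return tidy


# Wide layout: each column mixes two variables (visit and temperature).
wide_rows = [
    {"subject_id": "S1", "temperature_screening": 36.8, "temperature_followup": 37.1},
    {"subject_id": "S2", "temperature_screening": 37.4, "temperature_followup": 36.9},
]

tidy_rows = split_visit_columns(wide_rows)
for row in tidy_rows:
    print(row)
```

After the reshape, the `visit` field holds a small set of repeated values, which is what makes it possible to control those values with a local dictionary.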
This is a data-related requirement that focuses on the consistency of a Dataset’s textual content. This is also related to the ‘accuracy’ and overall consistency of data content within and across multiple related project datasets. Level 2 content standardisation is not required to comply with standard terminologies or ontologies. However, to achieve Level 2 content standardisation, textual values reported in text-based Dataset Fields are expected to be consistently reported using locally defined terms. These local terms are defined in a local Data Dictionary that ought to be reported as well as part of the content-related metadata (DSM-2-C6).
Related DSM Indicator
DSM-2-C6
Related FAIR Principle
F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Dataset Descriptor includes [Field-level Metadata](https://fairplus.github.io/Data-Maturity/docs/Glossary/#field-level-metadata) as prescribed by the adopted Dataset Model
Maturity Level
2
Category
Content and Context
Granularity Level
Dataset Field
Description
This is a metadata-related requirement focusing on the Dataset’s structure. This is a requirement to include structural metadata in the Dataset’s Metadata Record irrespective of how this information is represented (DSM-2-R1). Dataset-Field metadata include ‘field name’, ‘description’ and ‘data type’.
Related DSM Indicator
DSM-2-R1
Related FAIR Principle
F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
This is a metadata-related requirement. This indicator is related to F+MM-2.C5, which requires that textual data values used within and across related datasets should be consistently reported using locally defined terms or values. Where numeric values are used instead of textual values, a data dictionary is needed to map these values and allow users to interpret the data. Therefore, a data dictionary that associates each Dataset Field with its list of permissible terms or values and their meanings should also be made available. This could either be represented inside the Metadata Record itself if the metadata schema allows (DSM-2-R3), or otherwise represented separately.
Related DSM Indicator
DSM-2-R3
Related FAIR Principle
F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
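A local data dictionary of this kind, and a consistency check against it, might look like the sketch below. The field names, the numeric codes and their meanings, and the function `find_undefined_values` are all hypothetical; they only illustrate how permissible values per Dataset Field can be recorded and enforced.

```python
# Hypothetical local data dictionary: field -> permitted value -> meaning.
DATA_DICTIONARY = {
    "sex": {1: "male", 2: "female", 9: "not recorded"},
    "visit": {"screening": "Screening visit", "followup": "Follow-up visit"},
}


def find_undefined_values(rows, dictionary):
    """Return (row index, field, value) for values missing from the dictionary."""
    problems = []
    for i, row in enumerate(rows):
        for field, permitted in dictionary.items():
            if field in row and row[field] not in permitted:
                problems.append((i, field, row[field]))
    return problems


rows = [
    {"sex": 1, "visit": "screening"},
    {"sex": 3, "visit": "Follow up"},  # neither value is defined locally
]

for index, field, value in find_undefined_values(rows, DATA_DICTIONARY):
    print(f"row {index}: {field}={value!r} is not in the local dictionary")
```

Fields that are not listed in the dictionary (free text, measurements) are deliberately ignored by this check.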
Data hosting environment stores data in accordance with a locally defined Domain Model for persistence purposes
Maturity Level
2
Category
Hosting Environment
Capability
Storage
Description
This is a data-storage related requirement. In order to provide a basic level of contextual browsing or searching capabilities, the data hosting environment/resource should offer a common data model, even if it is a locally defined or project-specific one, against which all hosted datasets can be navigated and explored.
Related DSM Indicator
Related FAIR Principle
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Metadata hosting environment provides programmatic access and retrieval (API) for the Dataset Descriptor
Maturity Level
2
Category
Hosting Environment
Capability
Metadata Retrieval
Description
This is a metadata-retrieval-related requirement. The metadata hosting environment (which could be the same as or different from the data hosting environment) should offer the capability to retrieve the Dataset’s Metadata Record using API technologies like REST, RPC or GraphQL.
Related DSM Indicator
Related FAIR Principle
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
Data hosting environment offers the capability to browse and search related Datasets
Maturity Level
2
Category
Hosting Environment
Capability
Searching Capability
Description
This capability provides enhanced contextual interpretation of multiple related datasets when they are commonly linked to a study or a project. This capability is enabled by the hosting environment capitalising on the contextual metadata and dataset structural metadata made available at this level of maturity and established by DSM-2-C2, DSM-2-C4 and DSM-2-C6.
Related DSM Indicator
DSM-2-C2, DSM-2-C4, DSM-2-C6
Related FAIR Principle
F4. (Meta)data are registered or indexed in a searchable resource
Contextual Metadata necessary to understand and interpret Datasets’ content is defined and conforms to a locally defined Domain Model
Maturity Level
2
Category
Metadata Representation
Granularity Level
Project
Description
This is a metadata-related requirement focusing on the representation of the reported Contextual Metadata (DSM-2-C2). For level 2, having a human interpretable representation suffices to pass this requirement. This can be a visual diagram, or textual documentation that can be available from the hosting environment’s documentation pages.
Related DSM Indicator
DSM-2-C2
Related FAIR Principle
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Project collected Data are organized into structured Dataset(s) and conform to a locally defined Dataset Model
Maturity Level
2
Category
Data Representation
Granularity Level
Dataset
Description
This is a data-modelling related requirement. More specifically, this indicator focuses on the data model used to describe the structure of the Dataset which is the form that data is modelled against for the purpose of being utilized for FAIR sharing and re-use (DSM-2-C1). At Level 2, this model is simply a set of defined Dataset Types or Names and their respective Dataset Fields to be used consistently by all project related Datasets.
This is often represented in the form of pre-defined templates that data owners define for their project data or made available by the data hosting environment to be used for importing and exporting the FAIRified Datasets. This is to guarantee a minimum level of consistency amongst similarly reported datasets, which directly affects the storage capability of the hosting environment. This consistency will enable the hosting environment to store and index multiple datasets against this locally defined dataset model and hence offer better searching and discovery capabilities.
Related DSM Indicator
DSM-2-C1
Related FAIR Principle
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
This is a metadata-modelling requirement focusing on the representation of the Dataset Field level and Field Value level metadata required by DSM-2-C4 and DSM-2-C6. This indicator requires that the chosen standard metadata schema used to describe the Dataset should be able to represent structural metadata about the Dataset. Each Dataset Field will have a name, description, data type, etc. Value-related metadata may include a reference to the local dictionary of controlled terms used in each field. Examples of generic metadata schemas supporting field-level metadata are DATS and BioSchemas Dataset.
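To make the field-level part concrete, the sketch below shows one possible shape for a descriptor that carries structural metadata per Dataset Field. It does not follow DATS or BioSchemas; the key names (`fields`, `data_type`, `dictionary`), the example.org URLs and the helper `incomplete_fields` are assumptions for illustration, and mapping them onto a standard schema is left out.

```python
import json

# Hypothetical minimum set of keys expected for every Dataset Field.
REQUIRED_FIELD_KEYS = ("name", "description", "data_type")

descriptor = {
    "identifier": "https://example.org/datasets/ds-0001",
    "name": "Body temperature measurements",
    "fields": [
        {
            "name": "subject_id",
            "description": "Pseudonymised subject identifier",
            "data_type": "string",
        },
        {
            "name": "visit",
            "description": "Study visit at which the reading was taken",
            "data_type": "string",
            # Value-level metadata: pointer to the local dictionary of terms.
            "dictionary": "https://example.org/dictionaries/visit",
        },
        {
            "name": "temperature",
            "description": "Body temperature in degrees Celsius",
            "data_type": "decimal",
        },
    ],
}


def incomplete_fields(descriptor):
    """Return (field name, missing keys) for fields lacking required metadata."""
    report = []
    for field in descriptor.get("fields", []):
        missing = [key for key in REQUIRED_FIELD_KEYS if not field.get(key)]
        if missing:
            report.append((field.get("name", "<unnamed>"), missing))
    return report


print(json.dumps(descriptor["fields"], indent=2))
print(incomplete_fields(descriptor))
```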
This is a data-related formatting requirement. The exchange format used to share the Dataset(s) should be readable by machines. This is not a requirement to use semantic representations; it is simply a requirement to use standard formats (e.g. CSV, JSON, XML or similar) for data exchanged via the relevant API (DSM-2-H2).
Related DSM Indicator
DSM-2-H2
Related FAIR Principle
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
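As a small sketch of what a machine-readable exchange format means in practice, the same rows can be serialised to CSV or JSON using only the Python standard library. The rows and the function names `to_csv` and `to_json` are illustrative assumptions, not part of any FAIRplus tooling.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialise a list of flat dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise the same rows to JSON text."""
    return json.dumps(rows, indent=2)


# Illustrative rows only.
rows = [
    {"subject_id": "S1", "visit": "screening", "temperature": 36.8},
    {"subject_id": "S1", "visit": "followup", "temperature": 37.1},
]

print(to_csv(rows))
print(to_json(rows))
```

Note that CSV carries no type information, so a consumer reading it back receives every value as text; this is one reason field-level metadata such as data type is reported separately.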
If applicable, study-level / experimental metadata is reported in compliance with relevant Standard Minimum Information Reporting Guidelines
Maturity Level
3
Category
Content and Context
Granularity Level
Project/Study
Description
This is a metadata-related requirement focusing on standardisation of context and domain representation. Minimum information standards are sets of guidelines and formats for reporting data derived by specific high-throughput methods. Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community.
Related DSM Indicator
Related FAIR Principle
R1.3. (Meta)data meet domain-relevant community standards
If applicable, Dataset(s) content is reported in compliance with relevant community-defined Data Reporting Guidelines
Maturity Level
3
Category
Content and Context
Granularity Level
Dataset
Description
This is a data-related requirement focusing on standardisation of the type and definition of Dataset(s) that should be reported for a given subject area or study type. Minimum information standards are sets of guidelines and formats for reporting data derived by specific high-throughput methods. Their purpose is to ensure the data generated by these methods can be easily verified, analysed and interpreted by the wider scientific community.
Related DSM Indicator
Related FAIR Principle
R1.3. (Meta)data meet domain-relevant community standards
This is a data-related requirement focusing on the standardisation of the terminologies used within and across related Datasets. This indicator focuses on the set of data values for a given dataset field. To promote interoperability and enable the hosting environment to carry out cross-study queries, dataset textual field values should be standardised against community-standard controlled terminology or ontologies.
Related DSM Indicator
DSM-3-C5
Related FAIR Principle
I2. (Meta)data use vocabularies that follow FAIR principles, R1.3. (Meta)data meet domain-relevant community standards
This is a metadata-related requirement related to F+MM-3.C4. Dataset Field Values are expected to use standard terminologies that are defined and described by external resources. This indicator requires that, for each term used, a reference to its external definition is included either in the dataset itself (e.g. as a separate related field) or in a metadata record as specified by the Dataset Exchange Model, if applicable.
Related DSM Indicator
DSM-3-C4
Related FAIR Principle
I2. (Meta)data use vocabularies that follow FAIR principles, I3. (Meta)data include qualified references to other (meta)data
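One way to include a reference to each term's external definition "as a separate related field" is sketched below. The two NCBI Taxonomy IRIs are real OBO-style identifiers; the field name `organism`, the `_term_iri` suffix and the function `add_term_references` are assumptions for illustration.

```python
# Mapping from reported value to the IRI of its external definition.
TERM_REFERENCES = {
    "Homo sapiens": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
    "Mus musculus": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
}


def add_term_references(rows, field, references):
    """Return copies of rows with an extra '<field>_term_iri' column."""
    annotated = []
    for row in rows:
        new_row = dict(row)
        # None marks a value that has no known external definition yet.
        new_row[field + "_term_iri"] = references.get(row[field])
        annotated.append(new_row)
    return annotated


rows = [
    {"sample_id": "A1", "organism": "Homo sapiens"},
    {"sample_id": "A2", "organism": "Mus musculus"},
    {"sample_id": "A3", "organism": "human"},  # not a standard term
]

for row in add_term_references(rows, "organism", TERM_REFERENCES):
    print(row)
```

Rows whose reference comes back empty are the ones that still need their values standardised before this indicator can be met.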
For each dataset, the hosting environment provides a permanent and persistent address containing its unique identifier for access and retrieval.
Maturity Level
3
Category
Hosting Environment related requirements
Capability
Storage
Description
This indicator is about the resolution of the identifier that identifies the dataset. The hosting environment should assign a persistent identifier to the dataset and associate it with a formally defined retrieval/resolution mechanism for dataset access and retrieval.
Related DSM Indicator
Related FAIR Principle
F1. (Meta)data are assigned a globally unique and persistent identifier.
Hosting environment offers Data Discovery capability
Maturity Level
3
Category
Hosting Environment related requirements
Capability
Searching Capability
Description
This indicator requires the hosting environment to offer enhanced data discovery capabilities. This can be achieved by linking multiple related datasets through an implementation of a common domain model (DSM-3-R1), which allows users to search and find contextually-related datasets. Datasets at this level are expected to use annotated and described standardised Dataset Fields (DSM-3-R2) and their content to use standardised value terms and concepts (DSM-3-C2). The hosting resource should therefore capitalise on these rich annotations and allow users to search across datasets for concepts and standard terms in their content through the use of ontology-related query expansions.
Related DSM Indicators
DSM-3-R1, DSM-3-R2, DSM-3-C2
Related FAIR Principle
F1. (Meta)data are assigned a globally unique and persistent identifier
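The ontology-related query expansion mentioned in this indicator's description can be illustrated with a toy term hierarchy: a search for a broad concept is expanded to its narrower terms before datasets are matched. The hierarchy, the dataset annotations and the functions `expand` and `search_with_expansion` are hypothetical; a real implementation would draw the hierarchy from an actual ontology.

```python
# Hypothetical parent -> children hierarchy, standing in for an ontology.
CHILDREN = {
    "infectious disease": ["viral infection", "bacterial infection"],
    "viral infection": ["influenza"],
}


def expand(term, children):
    """Return the term together with all of its narrower terms."""
    expanded, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in expanded:
            expanded.append(current)
            stack.extend(children.get(current, []))
    return expanded


def search_with_expansion(datasets, term, children):
    """Return identifiers of datasets annotated with the term or a narrower one."""
    wanted = set(expand(term, children))
    return sorted(ds_id for ds_id, terms in datasets.items() if wanted & set(terms))


# Illustrative dataset annotations (dataset identifier -> standard terms used).
datasets = {
    "ds-0001": ["influenza"],
    "ds-0002": ["bacterial infection"],
    "ds-0003": ["diabetes"],
}

print(search_with_expansion(datasets, "infectious disease", CHILDREN))
print(search_with_expansion(datasets, "viral infection", CHILDREN))
```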
If applicable, Dataset hosting environment offers dataset-level authentication and authorisation capabilities.
Maturity Level
3
Category
Hosting Environment related requirements
Capability
Data Retrieval
Description
This indicator requires the hosting resource to provide a more granular approach to data accessibility. A dataset-level authorisation capability would allow data owners to define user-access rights per dataset rather than at a project or study level.
Related DSM Indicator
Related FAIR Principle
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary.
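Per-dataset access rights can be pictured as an access control list keyed by dataset rather than by project. The sketch below is deliberately minimal and covers authorisation only; the user names, dataset identifiers, action names and the function `is_allowed` are assumptions, and authentication is assumed to have happened elsewhere.

```python
# Hypothetical access control list: dataset -> user -> permitted actions.
DATASET_ACL = {
    "ds-0001": {"alice": {"read", "write"}, "bob": {"read"}},
    "ds-0002": {"alice": {"read"}},
}


def is_allowed(acl, user, dataset_id, action):
    """Return True if the user may perform the action on that specific dataset."""
    return action in acl.get(dataset_id, {}).get(user, set())


# Bob can read one dataset of the project but not the other.
print(is_allowed(DATASET_ACL, "bob", "ds-0001", "read"))
print(is_allowed(DATASET_ACL, "bob", "ds-0002", "read"))
```

Unknown users and unknown datasets are denied by default, which is the safer behaviour for access-restricted data.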
When dataset(s) are typed semantically, the data is structured and represented in a logical way. Semantic typing adds basic meaning to the data and to the relationships between them, supporting data consistency and easy maintenance.
Related DSM Indicator
Related FAIR Principle
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
When dataset(s) are typed semantically, the data is structured and represented in a logical way. Semantic typing adds basic meaning to the data and to the relationships between them, supporting data consistency and easy maintenance.
Related DSM Indicator
Related FAIR Principle
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
If applicable, license information and/or permitted use and accessibility to parts of the dataset is formally represented and encoded in a Machine Readable Format.