Level 2

Description

Level 2 aims to enhance the usability of the structured data of a project or study, which are often represented by multiple related datasets. Different projects usually have their own data model and collect different subsets of clinical, molecular, imaging or other data. The FAIRplus-DSM model distinguishes between structured data, unstructured data, and data objects in a project. Structured data include subject-based clinical data, sample-based assay data and other data associated with the data schema. Therefore, indicators at this level treat the Dataset as the FAIR Data Object and add requirements on the structural metadata of the Dataset, namely the Dataset Fields and the corresponding Dataset Field Values.

This level of maturity aims to increase the FAIRness level of structured data by focusing on Dataset-level structural metadata and Project-level contextual metadata.

This level of maturity is aimed at data hosted within project-based data repositories, general purpose data repositories or data catalogues.

In terms of hosting, Level 2 compliant datasets would be hosted in project-specific or institutional data repositories that provide all the accessibility and storage capabilities required for sharing and reusing data among project data users.

Example

[Figure: Level2-Overview]

In order to comply with Level 2 maturity requirements, a dataset needs to conform to a locally defined domain model, such as a project data dictionary, or to a standard generic domain model, such as W3C’s DCAT or Bioschemas. This allows data values to be mapped uniformly, using standardised terms for both variables and values where possible.

[Figure: Level2-Model]

Where appropriate, datasets should also conform to “Tidy Data Principles”, i.e. each column or field should represent a single variable. In the example shown below, each column of the initial data encodes both the measured variable (e.g. sysbp, systolic blood pressure) and the visit during which the measurement was taken (e.g. sc, screening). After transformation, the measured variables have been reduced from 6 to 3 columns (one for each variable) and a further column has been added to represent the visit. This makes querying and filtering the data much easier.

[Figure: Level2-Tidy]
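
The reshaping described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not part of the FAIRplus-DSM specification: `sysbp` and the `sc` (screening) suffix come from the example, while the other column names (`diabp`, `hr`), the `fu` follow-up suffix, the `subject_id` field and the `to_tidy` helper are hypothetical.

```python
# Hypothetical wide-format record: each column name fuses a measured
# variable (e.g. sysbp) with a visit code (e.g. sc = screening).
wide_rows = [
    {"subject_id": "S01", "sysbp_sc": 120, "diabp_sc": 80, "hr_sc": 70,
     "sysbp_fu": 118, "diabp_fu": 78, "hr_fu": 68},
]

def to_tidy(rows, id_field="subject_id"):
    """Split '<variable>_<visit>' columns into one row per subject and visit."""
    tidy = {}
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            # Assumption: the visit code is the part after the last underscore.
            variable, visit = column.rsplit("_", 1)
            key = (row[id_field], visit)
            record = tidy.setdefault(key, {id_field: row[id_field], "visit": visit})
            record[variable] = value
    return list(tidy.values())

tidy_rows = to_tidy(wide_rows)
for record in tidy_rows:
    print(record)
```

With the visit held in its own field, a filter such as "all screening measurements" becomes a simple comparison on `visit` instead of a pattern match on column names.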

FAIR-DSM Level 2 Indicators

DSM-2-C1

Identifier: DSM-2-C1
Name: A locally defined Domain Model contains concepts that describe the overall project/study design, the relationships between the Datasets, the key entities reported within the Datasets and the relationships between them.
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset
Description: This is a metadata-related requirement focusing on context and domain description. This is an entry-level requirement to describe the ‘Domain’ of the data, which at this level might not be fully represented by either the hosting environment or an adopted standard (Level 3). The Metadata Record should include information that can help a researcher understand the data context, especially in relation to the overall project or study design that this dataset belongs to, as well as the entities that are represented by the dataset content.
Related DSM Indicator:
Related FAIR Principle: R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators: RDA-R1-01M, FsF-R1-01MD

DSM-2-C2

Identifier: DSM-2-C2
Name: Where applicable, the Dataset Model organises data values within a dataset according to the Tidy Data Principles
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset Fields
Description: This is a data-related requirement that is a pre-requisite for the ‘accuracy’ of metadata that FAIR principle R1 refers to. This requirement is borrowed from one of the key Tidy Data Principles, which states that each column/field should be a single variable. This prevents the often-seen scenario in structured data whereby a single column header carries values for more than one variable. For example, in ‘temperature_screening’ and ‘temperature_followup’, each column implicitly carries a value for a visit variable and a value for an observation (temperature, in this case). This indicator therefore requires the data manager to split these into two fields, one per variable: a field for temperature and a field for visit. This is a pre-requisite to DSM-2-C5 and DSM-2-C6, since each Dataset Field is expected to control its terms and create a local dictionary. Unless individual concepts are reported per Dataset Field, it will not be possible to find suitable terms that can later be standardised for Level 3.
Related DSM Indicator: DSM-2-C5, DSM-2-C6
Related FAIR Principle: R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators:

DSM-2-C3

Identifier: DSM-2-C3
Name: Dataset(s) include Reference Fields that enable joining related datasets
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset Fields
Description: The Dataset(s) include Reference Fields that allow joining with other datasets, which might or might not be part of the same project.
Related FAIR Principle:
Cross-reference FAIR indicators:
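
As a minimal sketch of what a Reference Field enables, the snippet below joins two small datasets on a shared field. The dataset contents, the `subject_id` field name and the `join_on` helper are assumptions made for illustration only.

```python
# Two hypothetical project datasets; "subject_id" acts as the Reference Field.
subjects = [
    {"subject_id": "S01", "sex": "Female"},
    {"subject_id": "S02", "sex": "Male"},
]
samples = [
    {"sample_id": "A1", "subject_id": "S01", "tissue": "blood"},
    {"sample_id": "A2", "subject_id": "S01", "tissue": "urine"},
    {"sample_id": "A3", "subject_id": "S02", "tissue": "blood"},
]

def join_on(left, right, reference_field):
    """Inner-join two lists of records on a shared Reference Field."""
    index = {}
    for row in right:
        index.setdefault(row[reference_field], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[reference_field], []):
            # Fields of the left record win if both sides share a name.
            joined.append({**match, **row})
    return joined

samples_with_subject = join_on(samples, subjects, "subject_id")
for record in samples_with_subject:
    print(record)
```

Without a field such as `subject_id` present in both datasets, there would be no reliable way to relate a sample to the subject it came from.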

DSM-2-C4

Identifier: DSM-2-C4
Name: Where applicable, Dataset Field Values are standardized against a locally defined Data Dictionary within and across related datasets
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset Field Values
Description: This is a data-related requirement that focuses on the consistency of a Dataset’s textual content. This is also related to the ‘accuracy’ and overall consistency of data content within and across multiple related project datasets. Level 2 content standardisation is not required to comply with standard terminologies or ontologies. However, to achieve Level 2 content standardisation, textual values reported in text-based Dataset Fields are expected to be consistently reported using locally defined terms. These local terms are defined in a local Data Dictionary, which should also be reported as part of the content-related metadata (DSM-2-C6).
Related DSM Indicator: DSM-2-C6
Related FAIR Principle: F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators:
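
One possible way to apply a local Data Dictionary to free-text values is sketched below. The dictionary content, the field it describes and the `standardise` helper are hypothetical; a real project would define its own terms.

```python
# Hypothetical local Data Dictionary for one text-based Dataset Field:
# each locally defined term lists the raw spellings it replaces.
SEX_TERMS = {
    "Male": {"m", "male"},
    "Female": {"f", "female"},
}

def standardise(value, terms):
    """Return the locally defined term for a raw value, or None if unmapped."""
    cleaned = value.strip().lower()
    for term, variants in terms.items():
        if cleaned in variants:
            return term
    return None

raw_values = ["M", "female ", "Male", "unknown"]
standardised = [standardise(v, SEX_TERMS) for v in raw_values]

# Values with no local term are surfaced for review rather than guessed.
unmapped = [v for v, s in zip(raw_values, standardised) if s is None]
print(standardised, unmapped)
```

Reporting unmapped values separately keeps the dictionary honest: either a new local term is added or the source value is corrected.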

DSM-2-C5

Identifier: DSM-2-C5
Name: Dataset Descriptor includes reference to related Datasets and if applicable the relevant joining Dataset Fields
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset Level
Description: The Reference Fields that allow you to join Datasets are included in the Dataset Descriptor (metadata).
Related DSM Indicator: DSM-2-C3
Related FAIR Principle:
Cross-reference FAIR indicators:

DSM-2-C6

Identifier: DSM-2-C6
Name: Dataset Descriptor includes [Field-level Metadata](https://fairplus.github.io/Data-Maturity/docs/Glossary/#field-level-metadata) as prescribed by the adopted Dataset Model
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset Field
Description: This is a metadata-related requirement focusing on the Dataset’s structure. This is a requirement to include structural metadata in the Dataset’s Metadata Record irrespective of how this information is represented (DSM-2-R1). Dataset-Field metadata include ‘field name’, ‘description’ and ‘data type’.
Related DSM Indicator: DSM-2-R1
Related FAIR Principle: F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators: RDA-F2-01M, FsF-R1-01MD
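
A Dataset Descriptor carrying field-level metadata could look roughly like the structure below. The layout, the key names (`name`, `description`, `data_type`) and the `fields_missing_metadata` check are assumptions for this sketch; the indicator itself does not prescribe a particular representation.

```python
import json

# Hypothetical Dataset Descriptor with field-level metadata for each Dataset Field.
descriptor = {
    "dataset": "vital_signs",
    "fields": [
        {"name": "subject_id", "description": "Pseudonymised subject identifier",
         "data_type": "string"},
        {"name": "visit", "description": "Visit at which the measurement was taken",
         "data_type": "string"},
        {"name": "sysbp", "description": "Systolic blood pressure",
         "data_type": "integer"},
    ],
}

REQUIRED_KEYS = {"name", "description", "data_type"}

def fields_missing_metadata(desc):
    """List the fields whose metadata lacks one of the required keys."""
    return [
        field.get("name", "<unnamed>")
        for field in desc["fields"]
        if not REQUIRED_KEYS.issubset(field)
    ]

print(json.dumps(descriptor, indent=2))
print("Incomplete fields:", fields_missing_metadata(descriptor))
```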

DSM-2-C7

Identifier: DSM-2-C7
Name: Dataset Descriptor includes Value-level Metadata or if applicable includes a reference to the Data Dictionary
Maturity Level: 2
Category: Content and Context
Granularity Level: Dataset Field Values
Description: This is a metadata-related requirement. This indicator is related to F+MM-2.C5, which requires that textual data values used within and across related datasets should be consistently reported using locally defined terms or values. Where numeric values are used instead of textual values, a data dictionary is needed to map these values and allow users to interpret the data. Therefore, a data dictionary that associates each Dataset Field with its list of permissible terms or values and their meanings should also be made available. This could either be represented inside the Metadata Record itself if the metadata schema allows (DSM-2-R3), or otherwise represented separately.
Related DSM Indicator: DSM-2-R3
Related FAIR Principle: F2. Data are described with rich metadata, R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators:
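
The role of value-level metadata for coded values can be illustrated with a small decoding routine. The `smoking_status` field, its codes and the `decode` helper are invented for this sketch and are not taken from the DSM.

```python
# Hypothetical Data Dictionary: permissible coded values per Dataset Field.
DATA_DICTIONARY = {
    "smoking_status": {0: "Never smoked", 1: "Former smoker", 2: "Current smoker"},
}

def decode(field, value, dictionary=DATA_DICTIONARY):
    """Translate a coded value into its meaning; reject values not in the dictionary."""
    permitted = dictionary[field]
    if value not in permitted:
        raise ValueError(f"{value!r} is not a permissible value for {field!r}")
    return permitted[value]

rows = [{"subject_id": "S01", "smoking_status": 2}]
decoded = [
    {**row, "smoking_status": decode("smoking_status", row["smoking_status"])}
    for row in rows
]
print(decoded)
```

A bare `2` in the data is uninterpretable on its own; the dictionary is what lets a data user read it as a smoking status.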

DSM-2-H1

Identifier: DSM-2-H1
Name: Data hosting environment stores data in accordance with a locally defined Domain Model for persistence purposes
Maturity Level: 2
Category: Hosting Environment
Capability: Storage
Description: This is a data-storage related requirement. In order to provide a basic level of contextual browsing or searching capabilities, the data hosting environment/resource should offer a common data model, even if it is a locally defined or project-specific one, against which all hosted datasets can be navigated and explored.
Related DSM Indicator:
Related FAIR Principle: R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators:

DSM-2-H2

Identifier: DSM-2-H2
Name: Metadata hosting environment provides programmatic access and retrieval (API) for the Dataset Descriptor
Maturity Level: 2
Category: Hosting Environment
Capability: Metadata Retrieval
Description: This is a metadata-retrieval-related requirement. The metadata hosting environment (which could be the same as or different from the data hosting environment) should offer the capability to retrieve the Dataset’s Metadata Record using API technologies like REST, RPC or GraphQL.
Related DSM Indicator:
Related FAIR Principle: A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
Cross-reference FAIR indicators: RDA-A1.1-01M, FsF-A1-02M
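
To make the idea of programmatic metadata retrieval concrete, here is a self-contained sketch of a REST-style endpoint and a client call, using only the Python standard library. The route `/datasets/<identifier>/metadata`, the in-memory `DESCRIPTORS` store and all function and class names are assumptions for illustration; an actual hosting environment would expose its own API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical in-memory store of Dataset Descriptors, keyed by identifier.
DESCRIPTORS = {
    "vital_signs": {"dataset": "vital_signs", "fields": ["subject_id", "visit", "sysbp"]},
}

class DescriptorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed route: /datasets/<identifier>/metadata
        parts = self.path.strip("/").split("/")
        valid = len(parts) == 3 and parts[0] == "datasets" and parts[2] == "metadata"
        record = DESCRIPTORS.get(parts[1]) if valid else None
        payload = record if record is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200 if record is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet

def fetch_descriptor(base_url, identifier):
    """Retrieve a Dataset Descriptor by identifier over HTTP."""
    with urlopen(f"{base_url}/datasets/{identifier}/metadata") as response:
        return json.load(response)

server = HTTPServer(("127.0.0.1", 0), DescriptorHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

retrieved = fetch_descriptor(base_url, "vital_signs")
print(retrieved)

server.shutdown()
server.server_close()
```

The point of the sketch is the contract, not the server: given a dataset identifier, a machine can obtain the descriptor over a standard protocol without a human navigating web pages.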

DSM-2-H3

Identifier: DSM-2-H3
Name: Data hosting environment offers the capability to browse and search related Datasets
Maturity Level: 2
Category: Hosting Environment
Capability: Searching Capability
Description: This capability provides enhanced contextual interpretation of multiple related datasets when they are commonly linked to a study or a project. This capability is enabled by the hosting environment capitalising on the contextual metadata and dataset structural metadata made available at this level of maturity and established by dsm-22c, dsm-24c and dsm-26c.
Related DSM Indicator: dsm-22c, dsm-24c, dsm-26c
Related FAIR Principle: F4. (Meta)data are registered or indexed in a searchable resource
Cross-reference FAIR indicators:

DSM-2-R1

Identifier: DSM-2-R1
Name: Contextual Metadata necessary to understand and interpret Datasets’ content is defined and conforms to a locally defined Domain Model
Maturity Level: 2
Category: Metadata Representation
Granularity Level: Project
Description: This is a metadata-related requirement focusing on the representation of the reported Contextual Metadata (DSM-2-C2). For Level 2, having a human-interpretable representation suffices to pass this requirement. This can be a visual diagram or textual documentation that is available from the hosting environment’s documentation pages.
Related DSM Indicator: DSM-2-C2
Related FAIR Principle: R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators: RDA-R1.3-01M, FsF-R1.3-01M

DSM-2-R2

Identifier: DSM-2-R2
Name: Project collected Data are organized into structured Dataset(s) and conform to a locally defined Dataset Model
Maturity Level: 2
Category: Data Representation
Granularity Level: Dataset
Description: This is a data-modelling related requirement. More specifically, this indicator focuses on the data model used to describe the structure of the Dataset, which is the form that data is modelled against for the purpose of FAIR sharing and re-use (DSM-2-C1). At Level 2, this model is simply a set of defined Dataset Types or Names and their respective Dataset Fields, to be used consistently by all project-related Datasets.

This is often represented in the form of pre-defined templates that data owners define for their project data, or that the data hosting environment makes available for importing and exporting the FAIRified Datasets. This is to guarantee a minimum level of consistency amongst similarly reported datasets, which directly affects the storage capability of the hosting environment. This consistency will enable the hosting environment to store and index multiple datasets against this locally defined dataset model and hence offer better searching and discovery capabilities.
Related DSM Indicator: DSM-2-C1
Related FAIR Principle: R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
Cross-reference FAIR indicators:
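
A template-based Dataset Model of this kind can be checked mechanically. The sketch below compares a dataset's header against a hypothetical template; the Dataset Types, field names and the `check_conformance` helper are all assumptions made for the example.

```python
# Hypothetical local Dataset Model: Dataset Types and their expected Dataset Fields.
DATASET_MODEL = {
    "vital_signs": ["subject_id", "visit", "sysbp"],
    "samples": ["sample_id", "subject_id", "tissue"],
}

def check_conformance(dataset_type, header, model=DATASET_MODEL):
    """Compare a dataset's header with its template; report missing and unexpected fields."""
    expected = model[dataset_type]
    return {
        "missing": [field for field in expected if field not in header],
        "unexpected": [field for field in header if field not in expected],
    }

# A dataset that still fuses the visit into a column name fails the check.
report = check_conformance("vital_signs", ["subject_id", "sysbp", "sysbp_sc"])
print(report)
```

Running such a check at import time is one way a hosting environment could enforce the consistency the indicator describes.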

DSM-2-R3

Identifier: DSM-2-R3
Name: Dataset Descriptor(s) provide a formal representation of the local Dataset Model, if applicable using or extending a Standard Generic Dataset Descriptor Model
Maturity Level: 2
Category: Metadata Representation
Granularity Level: Dataset
Description: This is a metadata-modelling requirement focusing on the representation of the Dataset Field level and Field Value level metadata required by DSM-2-C4 and DSM-2-C6. This indicator requires that the chosen standard metadata schema used to describe the Dataset should be able to represent structural metadata about the Dataset. Each Dataset Field will have a name, description, data type, etc. Value-related metadata may include a reference to a local dictionary of controlled terms used in each field. Examples of generic metadata schemas supporting field-level metadata are DATS and BioSchemas Dataset.
Related DSM Indicator: DSM-2-C1
Related FAIR Principle:
Cross-reference FAIR indicators: RDA-R1.3-02M

DSM-2-R4

Identifier: DSM-2-R4
Name: Dataset Descriptor(s) conform to or extend a Standard Generic Dataset Descriptor Model to describe and represent structural metadata of Dataset(s)
Maturity Level: 2
Category: Metadata Format
Granularity Level: Dataset
Description:
Related DSM Indicator:
Related FAIR Principle:
Cross-reference FAIR indicators:

DSM-2-R5

Identifier: DSM-2-R5
Name: Dataset(s) available in Machine Readable Format
Maturity Level: 2
Category: Data Format
Granularity Level: Dataset
Description: This is a data-related formatting requirement. The exchange format used to share the Dataset(s) should be readable by machines. This is not a requirement to use semantic representations; it is simply a requirement to use standard formats (e.g. CSV, JSON, XML or similar) for data exchanged via the relevant API (DSM-2-H2).
Related DSM Indicator: DSM-2-H2
Related FAIR Principle: I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
Cross-reference FAIR indicators: RDA-I1-02D
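
As a brief sketch of what a machine-readable exchange format means in practice, the snippet below serialises the same records as CSV and as JSON using Python's standard library. The records and the `to_csv` helper are hypothetical examples.

```python
import csv
import io
import json

# Hypothetical tidy records to be exchanged.
records = [
    {"subject_id": "S01", "visit": "sc", "sysbp": 120},
    {"subject_id": "S01", "visit": "fu", "sysbp": 118},
]

def to_csv(rows):
    """Serialise records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv(records)
json_text = json.dumps(records)

# Both formats can be parsed back by any standard library or tool.
parsed_csv = list(csv.DictReader(io.StringIO(csv_text)))
parsed_json = json.loads(json_text)
print(csv_text)
print(parsed_json)
```

Note that CSV returns every value as text on reading, so the data type recorded in the field-level metadata is what allows a consumer to restore `sysbp` to a number.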

The FAIRplus DSM Model content is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.