
Creating a Metadata Profile


Recipe metadata

identifier: RX.X

version: v1.0

Difficulty level

Reading Time

20 minutes

Recipe Type

Hands-on

Executable Code

Yes

Intended Audience

Principal Investigators

Data Managers

Data Scientists


Graphical Overview:

    graph TD
    A[Defining a Metadata Requirement Profile: Transcriptomics Data]:::box --> Z:::box
    Z(fa:fa-pie-chart Requirement Analysis) --> W[fa:fa-file-text fa:fa-bars List of Requirements: Minimal vs Recommended]
    W:::box --> |Survey State of the Art| C{fa:fa-binoculars<br/>Is there<br/>prior work?}:::box
    C --> |No| E[fa:fa-magic Create Checklist<br/>fa:fa-check-square<br/>fa:fa-check-square<br/>fa:fa-square-o<br/>fa:fa-check-square]:::box
    C --> |Yes| D[Minimum Information Checklist]:::box
    D --> G{Evaluation:<br/>is it enough?}:::box
    G --> |Yes| H[fa:fa-recycle Reuse Checklist<br/>fa:fa-check-square<br/>fa:fa-check-square<br/>fa:fa-square-o<br/>fa:fa-check-square]:::box
    G --> |No| I[fa:fa-code-fork Extend Checklist<br/>fa:fa-check-square<br/>fa:fa-check-square<br/>fa:fa-square-o<br/>fa:fa-check-square<br/>fa:fa-check-square]:::box
    H --> K{Machine<br/>actionable<br/>checklist?<br/>fa:fa-code<br/>fa:fa-cogs}:::box
    E --> K
    I --> K
    K --> |Entity Mapping to Ontology| L[ontology<br/>tagged<br/>requirements]:::box
    K --> |Entity Data Typing| M[Data<br/>typed<br/>requirements]:::box
    M --> |Definition of value sets| N[Ontology<br/>constrained<br/>fa:fa-link requirements]:::box
    K --> |Formalization| J[Machine Readable Metadata Profile<br/>fa:fa-code<br/>fa:fa-cogs<br/>fa:fa-code]:::box
    J --> |Implementation| O[User Friendly Metadata Collection<br/>e.g. Form, Tabular template<br/>fa:fa-file-excel-o fa:fa-file-excel-o fa:fa-file-excel-o<br/>fa:fa-group fa:fa-group fa:fa-group]:::box
    linkStyle 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 stroke:#2a9fc9,stroke-width:1px,color:#2a9fc9,font-family:avenir;
    classDef box font-family:avenir,font-size:14px,fill:#2a9fc9,stroke:#222,color:#fff,stroke-width:1px

How to generate a metadata template

The following steps are intended as a starting point to guide the generation of a metadata template.

Step 1: Define competency questions

  • What questions should the template help you answer? Without a set of competency questions, important variables are easily forgotten. Conversely, it is just as easy to collect too much metadata, which makes the resulting metadata model opaque and difficult to navigate. Competency questions serve as a guide for identifying the most relevant experimental factors.
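The sketch below assumes that each competency question can be mapped by hand to the attributes needed to answer it. The questions, attribute names and the helper `review_candidates` are hypothetical. The mapping shows both forgotten variables and candidate attributes that no question needs.

```python
# Hypothetical competency questions, each mapped to the attributes needed to answer it.
COMPETENCY_QUESTIONS = {
    "Which genes are differentially expressed between treated and control samples?":
        {"sample_id", "treatment", "expression_matrix"},
    "Does the response to treatment differ between tissues?":
        {"sample_id", "treatment", "tissue"},
    "Which sequencing platform produced each dataset?":
        {"sample_id", "instrument_model"},
}


def required_attributes(questions):
    """Union of all attributes needed to answer at least one competency question."""
    needed = set()
    for attrs in questions.values():
        needed |= attrs
    return needed


def review_candidates(candidates, questions):
    """Compare a candidate attribute list against the competency questions.

    Returns (justified, unjustified, missing):
    - justified: candidates needed by at least one question
    - unjustified: candidates no question needs (risk of collecting too much)
    - missing: attributes a question needs but the candidates forgot
    """
    needed = required_attributes(questions)
    candidates = set(candidates)
    return sorted(candidates & needed), sorted(candidates - needed), sorted(needed - candidates)


if __name__ == "__main__":
    draft = ["sample_id", "treatment", "tissue", "lab_temperature"]
    print(review_candidates(draft, COMPETENCY_QUESTIONS))
```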

Step 2: Define a Minimal Set of Metadata (MSOM) based on these questions

  • Compile metadata from different sources
  • Generate a consolidated view of the metadata by merging equivalent attributes wherever possible
  • Distinguish metadata available for most studies from metadata that occurs only rarely (the sparse matrix)
  • Identify gaps in the metadata available for most studies, i.e. data that is considered important but has not been captured in the past
  • Define an MSOM to be captured in the future, comprising the metadata available for most studies plus the metadata considered important
  • Identify available community standards regarding minimal sets of metadata
  • Add metadata attributes from those community standards to the MSOM, if they are not yet included
  • Assign cardinality to the MSOM: identify which attributes are mandatory and how many times each attribute may be reported. Some metadata may not be mandatory but are still important to capture when available
  • Identify appropriate ontologies representing your data and establish an application ontology (see recipe 4 of UC3)
  • Where possible, assign ontology terms to the attributes in the MSOM and the sparse matrix
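The coverage analysis above can be sketched as follows, assuming each study's metadata is available as a flat dictionary. Several details are illustrative choices, not part of the recipe: the synonym map used for merging, the 80% threshold that separates the MSOM from the sparse matrix, and the `(min, max)` encoding of cardinality. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical synonym map: merges attributes that differ only in naming across sources.
SYNONYMS = {"species": "organism", "organism_name": "organism", "tissue_type": "tissue"}

# Hypothetical metadata compiled from five studies.
STUDIES = [
    {"sample_id": "S1", "species": "human", "tissue": "liver"},
    {"sample_id": "S2", "organism": "mouse", "tissue_type": "brain", "treatment": "drug A"},
    {"sample_id": "S3", "organism_name": "human", "tissue": "lung"},
    {"sample_id": "S4", "organism": "human", "tissue": "heart", "batch": "B2"},
    {"sample_id": "S5", "organism": "mouse", "tissue": "liver"},
]


def consolidate(study):
    """Rename attributes to their consolidated name."""
    return {SYNONYMS.get(key, key): value for key, value in study.items()}


def split_msom(studies, threshold=0.8, important=()):
    """Return (msom, sparse) sets of attribute names.

    An attribute joins the MSOM if it occurs in at least `threshold` of the studies,
    or if it is listed in `important` (a gap worth closing). All other attributes
    form the sparse matrix.
    """
    consolidated = [consolidate(s) for s in studies]
    counts = Counter(key for s in consolidated for key in s)
    frequent = {k for k, n in counts.items() if n / len(consolidated) >= threshold}
    msom = frequent | set(important)
    sparse = set(counts) - msom
    return msom, sparse


def assign_cardinality(msom, mandatory, repeatable):
    """Map each MSOM attribute to (min, max) occurrences; max=None means unbounded."""
    return {
        attr: (1 if attr in mandatory else 0, None if attr in repeatable else 1)
        for attr in sorted(msom)
    }


if __name__ == "__main__":
    msom, sparse = split_msom(STUDIES, important=("treatment",))
    print(sorted(msom), sorted(sparse))
    print(assign_cardinality(msom, {"sample_id", "organism", "tissue"}, {"treatment"}))
```

Here `treatment` occurs in only one study but enters the MSOM because it was flagged as important. `batch` stays in the sparse matrix.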

Step 3: Introduce semantics into the template

  • Identify the most important objects to be represented in the model (e.g. study, sample, treatment, result, etc.)
  • Adopt an appropriate naming strategy for the objects (e.g. an NGSstudy is an OMICSstudy, which is a Study; do not simply call an NGSstudy a Study; make sure the granularity fits your purposes)
  • Assign MSOM and sparse matrix attributes to the respective objects
  • Identify and introduce relationships among the identified objects (e.g. “an NGSstudy contains samples”, “a result is derived from a sample”)
  • Identify dependencies on data that are not yet represented as objects but, e.g., as term lists
  • Make sure the model can later be extended to represent those data as objects as well
  • Integrate into the model the sparse matrix of metadata not contained in the MSOM
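Python dataclasses offer one way to express the objects, naming hierarchy and relationships described above. Inheritance encodes "an NGSstudy is an OMICSstudy is a Study", and typed fields encode relationships such as "an NGSstudy contains samples". The class and field names are hypothetical. The same structure could equally be written in a schema language or an ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    identifier: str
    organism: str
    # Kept as a plain term-list value for now; the model leaves room to promote it to an object later.
    tissue: str


@dataclass
class Result:
    identifier: str
    derived_from: Sample  # "a result is derived from a sample"


@dataclass
class Study:
    identifier: str
    title: str
    # Sparse-matrix attributes that are not part of the MSOM, stored as key/value pairs.
    extra: Dict[str, str] = field(default_factory=dict)


@dataclass
class OmicsStudy(Study):
    """An OMICSstudy is a Study."""


@dataclass
class NGSStudy(OmicsStudy):
    """An NGSstudy is an OMICSstudy; it contains samples."""
    instrument_model: str = ""
    samples: List[Sample] = field(default_factory=list)


if __name__ == "__main__":
    liver = Sample("S1", "Homo sapiens", "liver")
    study = NGSStudy("ST1", "Liver RNA-seq", samples=[liver])
    result = Result("R1", derived_from=liver)
    print(isinstance(study, Study), result.derived_from in study.samples)
```

The generic `extra` dictionary on `Study` holds sparse-matrix attributes. This lets the model absorb them without committing to new object types too early.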

Step 4: Reality check

  • Introduce validation measures that identify errors in reported data according to your model
  • Expose your model to actual data delivered by independent colleagues and record the errors and gaps that occur
  • Distinguish errors and gaps caused by the model from those caused by errors in the data
  • Adjust the model according to these errors and gaps
  • Repeat the reality check until no severe errors or gaps relevant to the previously defined competency questions remain
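The validation side of the reality check can be sketched as follows. The sketch assumes that submitted records arrive as dictionaries and that the model is reduced to a small profile of required flags, types and value sets. The `error_hotspots` heuristic is an assumption, not a rule from this recipe. It treats an attribute that fails in a large share of independent submissions as more likely to point at the model than at the data, and borderline cases still need human review. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical model: attribute -> (required, expected Python type, allowed values or None).
PROFILE = {
    "sample_id": (True, str, None),
    "organism": (True, str, {"Homo sapiens", "Mus musculus"}),
    "read_length": (False, int, None),
}

# Hypothetical records submitted by independent colleagues.
RECORDS = [
    {"sample_id": "S1", "organism": "Homo sapiens", "read_length": 150},
    {"sample_id": "S2", "organism": "human", "sequencing_kit": "X"},
    {"sample_id": "S3", "organism": "Mus musculus", "read_length": "150", "sequencing_kit": "Y"},
    {"organism": "Homo sapiens", "sequencing_kit": "Z"},
]


def validate(record, profile=PROFILE):
    """Return a list of (attribute, problem) tuples for one submitted record."""
    errors = []
    for attr, (required, expected_type, allowed) in profile.items():
        if attr not in record:
            if required:
                errors.append((attr, "missing"))
            continue
        value = record[attr]
        if not isinstance(value, expected_type):
            errors.append((attr, "wrong type"))
        elif allowed is not None and value not in allowed:
            errors.append((attr, "not in value set"))
    # Attributes the model has no place for: possible gaps in the model.
    for attr in sorted(record.keys() - profile.keys()):
        errors.append((attr, "not in model"))
    return errors


def error_hotspots(records, profile=PROFILE, min_share=0.5):
    """Attributes that fail in at least `min_share` of the records (heuristic)."""
    counts = Counter(attr for r in records for attr, _ in validate(r, profile))
    return sorted(a for a, n in counts.items() if n / len(records) >= min_share)


if __name__ == "__main__":
    for record in RECORDS:
        print(validate(record))
    print(error_hotspots(RECORDS))
```

In this example, `sequencing_kit` fails in three of four submissions because the model has no place for it. That suggests the model should be adjusted. The single `organism` and `read_length` failures look like ordinary data errors.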

Authors

Name Affiliation ORCID CRediT role
0000-0000-0000-0000 Writing - Original Draft
Writing - Original Draft

License