Packaging ISA as a Research Object (RO) - Dataset Maturity Level 4



Recipe Overview
Reading Time
15 minutes
Executable Code
Yes
Difficulty
Dissemination - Packaging ISA as a Research Object (RO)
Recipe Type
Hands-on
Maturity Level & Indicator
DSM-1-C3

Abstract:

The goal of this tutorial is to show how to package a dataset as a minimal Research Object crate (RO-Crate). In this example, the dataset comprises an ISA JSON-LD document, the associated raw data files, and a computational workflow available as a CWL file.

To do so, we will be using:

Let’s get started by getting all necessary modules:

import os
import json
import datetime
import isatools
import uuid
import hashlib
from json import load
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
from rocrate.model.dataset import Dataset
from rocrate.model.softwareapplication import SoftwareApplication
from rocrate.model.computationalworkflow import ComputationalWorkflow
from rocrate.model.computerlanguage import ComputerLanguage
from rocrate import rocrate_api

Packaging the various ISA serializations (Tab, JSON, JSON-LD) as a Research Object Crate

With the previous notebooks (recipes FCBXY1 and FCBXY2), we generated several distinct ISA documents:

  • a basic ISA-Tab descriptor.

  • a more completely described ISA-JSON descriptor, meeting community metadata annotation requirements.

  • a semantically typed ISA JSON-LD descriptor, which is an RDF serialization of the same information.

We will be using the RDF serialization, the associated raw data files (dummy FASTQ files), and a computational workflow available as a CWL file.

1. Instantiating a Research Object and providing basic metadata

ontology = "obo"
a_crate_for_isa = ROCrate()
# a_crate_for_isa.id = "#research_object/" + str(ro_id)
a_crate_for_isa.name = "ISA JSON-LD representation of BII-S-3"
a_crate_for_isa.description = "ISA study serialized as JSON-LD using " + ontology + " ontology mapping"
a_crate_for_isa.keywords = ["ISA", "JSON-LD"]
# The license and the creator are set in steps 2 and 3 below.

2. Improving Reusability by setting a license for the RO-Crate.

a_crate_for_isa.license = "https://creativecommons.org/licenses/by/4.0/"

3. Allowing proper credit by associating authors and creators with a globally unique identifier.

In this case, we show how to do so with an ORCID, by building a Person object and assigning it to the creator property of the RO-Crate object.

a_crate_for_isa.creator = Person(a_crate_for_isa,"https://www.orcid.org/0000-0001-9853-5668")
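
Since credit depends on the identifier resolving to the right person, it can be worth checking that an ORCID is well formed before attaching it to the crate. The sketch below is not part of the rocrate library: `orcid_check_digit` and `is_valid_orcid` are hypothetical helper names, and the check assumes the ISO 7064 MOD 11-2 scheme that ORCID uses for the final character of its identifiers.

```python
import re

# Four dash-separated groups; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"(\d{4})-(\d{4})-(\d{4})-(\d{3})([\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """Return True if the identifier (bare or as a URL) ends in a valid ORCID."""
    match = ORCID_PATTERN.search(identifier)
    if match is None:
        return False
    base_digits = "".join(match.groups()[:4])  # the 15 digits before the check character
    return orcid_check_digit(base_digits) == match.group(5)


print(is_valid_orcid("https://www.orcid.org/0000-0001-9853-5668"))
```

A check like this only catches typos in the identifier; it does not confirm that the ORCID record exists or belongs to the intended author.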

4. Adding two ISA RDF serializations to the newly created Research Object crate.

# instance_path = os.path.join("./output/BII-S-3-synth/", "isa-new_ids.json")
#
# with open(instance_path, 'r') as instance_file:
#         instance = load(instance_file)
#         instance_file.close()

isa_json_ld_path = os.path.join("./output/BII-S-3-synth/", "isa-new_ids-BII-S-3-ld-" + ontology + "-v1.json")
# this second serialization is a Turtle (.ttl) file
isa_turtle_path = os.path.join("./output/BII-S-3-synth/", "isa.ttl")

files = [isa_json_ld_path, isa_turtle_path]
# add each serialization to the crate as a file entity
for file in files:
    a_crate_for_isa.add_file(file)
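
The paths above point at files produced by the earlier recipes, so a quick pre-flight check can save a confusing failure later on. The following is a minimal sketch using only the standard library; `split_existing` is a hypothetical helper, and the temporary directory merely stands in for `./output/BII-S-3-synth/`.

```python
import os
import tempfile


def split_existing(paths):
    """Partition paths into (existing, missing) lists, preserving order."""
    existing, missing = [], []
    for path in paths:
        (existing if os.path.isfile(path) else missing).append(path)
    return existing, missing


# Demonstration: one placeholder file is created, the other is deliberately absent.
with tempfile.TemporaryDirectory() as tmp_dir:
    present = os.path.join(tmp_dir, "isa.ttl")
    with open(present, "w") as handle:
        handle.write("# placeholder Turtle content\n")
    absent = os.path.join(tmp_dir, "not-generated.json")

    existing, missing = split_existing([present, absent])
    print("would add:", [os.path.basename(p) for p in existing])
    print("missing:", [os.path.basename(p) for p in missing])
```

In the recipe itself, one would pass the list of ISA serialization paths to such a helper and only add the files reported as existing.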

5. Now adding a dataset to the Research Object, which is meant to describe a bag of associated images.

ds = Dataset(a_crate_for_isa, "raw_images")
ds.format_id = "http://edamontology.org/format_3604"
ds.datePublished = datetime.datetime.now()
ds.as_jsonld = isa_json_ld_path
a_crate_for_isa.add(ds)

6. Next, we create a Computational Workflow object and we add it to the Research Object

Tip: the Computational Workflow may also be represented as an ISA Protocol object.

wf = ComputationalWorkflow(a_crate_for_isa, "metagenomics-sequence-analysis.cwl")
wf.language = "http://edamontology.org/format_3857"
wf.datePublished = datetime.datetime.now()

# compute a SHA-256 checksum of the workflow file
with open("metagenomics-sequence-analysis.cwl", "rb") as f:
    workflow_bytes = f.read()  # avoid shadowing the built-in name `bytes`
    new_hash = hashlib.sha256(workflow_bytes).hexdigest()

wf.hash = new_hash
a_crate_for_isa.add(wf)
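
Reading the whole file into memory is fine for a small CWL document, but the same checksum idea extends to the raw data files, which can be large. As a sketch under that assumption, the hypothetical helper `sha256_of_file` below streams the file in chunks with the standard `hashlib` module; the demo file and its contents are placeholders, not the recipe's workflow.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration: the streamed digest matches the one computed in memory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.cwl")
    payload = b"cwlVersion: v1.2\nclass: Workflow\n"
    with open(demo_path, "wb") as handle:
        handle.write(payload)

    streamed = sha256_of_file(demo_path, chunk_size=8)
    in_memory = hashlib.sha256(payload).hexdigest()
    print(streamed == in_memory)
```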

7. Finally, we write the Research Object to file

ro_outpath = "./output/BII-S-3-synth/ISA_in_a_ROcrate"
a_crate_for_isa.write_crate(ro_outpath)

# read back the generated metadata file and pretty-print it
with open(os.path.join(ro_outpath, "ro-crate-metadata.json"), 'r') as handle:
    parsed = json.load(handle)

print(json.dumps(parsed, indent=4, sort_keys=True))
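
The printed JSON can be long, so a short summary of its `@graph` is often easier to scan. The sketch below works on a small hand-written document that follows the general RO-Crate layout (a metadata descriptor, a root dataset `./`, and its parts); it is an illustrative stand-in, not the actual output of this recipe, and `entities_by_type` and `root_parts` are hypothetical helper names.

```python
import json

# Hand-written stand-in for a parsed ro-crate-metadata.json (illustrative only).
example_metadata = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "ISA JSON-LD representation of BII-S-3",
     "hasPart": [{"@id": "isa.ttl"}, {"@id": "raw_images/"}]},
    {"@id": "isa.ttl", "@type": "File"},
    {"@id": "raw_images/", "@type": "Dataset"}
  ]
}
""")


def entities_by_type(metadata):
    """Group entity @id values by @type (an entity may carry several types)."""
    grouped = {}
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            grouped.setdefault(type_name, []).append(entity["@id"])
    return grouped


def root_parts(metadata):
    """List the @id of every entity the root dataset declares under hasPart."""
    root = next(e for e in metadata["@graph"] if e["@id"] == "./")
    return [part["@id"] for part in root.get("hasPart", [])]


print(entities_by_type(example_metadata))
print(root_parts(example_metadata))
```

Passing the `parsed` dictionary from the step above to these helpers would give the same kind of overview for the real crate.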

8. Alternatively, a zipped archive can be created as follows:

a_crate_for_isa.write_zip(ro_outpath)
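
Before sharing an archive, it can be reassuring to confirm that the metadata file sits at its root. The sketch below uses only the standard `zipfile` module and builds a small placeholder archive for the demonstration; `crate_zip_members` and `looks_like_crate` are hypothetical helpers, and the exact file name the library gives the archive is not assumed here.

```python
import os
import tempfile
import zipfile


def crate_zip_members(zip_path):
    """Return the sorted list of member names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def looks_like_crate(zip_path):
    """Return True if the archive has ro-crate-metadata.json at its root."""
    return "ro-crate-metadata.json" in crate_zip_members(zip_path)


# Demonstration on a placeholder archive standing in for the zipped crate.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo_crate.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("ro-crate-metadata.json", "{}")
        archive.writestr("isa.ttl", "# placeholder\n")

    members = crate_zip_members(demo_zip)
    is_crate = looks_like_crate(demo_zip)
    print(members, is_crate)
```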

Et voilà!

Conclusion:

With this recipe, we have briefly introduced RO-Crate as a mechanism to package data and associated metadata, using a Python library that offers a minimal implementation of the specification. The current iteration of the library has certain limitations: for instance, it does not provide the functionality needed to record provenance information, although this can be accomplished by extending the code. The key message of this recipe is that RO-Crate improves on simply zipping a bunch of files together by attaching some semantics to the different parts that make up an archive. It is also important to bear in mind that the Research Object crate is nascent, and more work is needed to define best practices and implementation profiles.

What to read next?

  • What is Provenance information?

  • Upload to Zenodo and get a DOI

  • How to make a workflow FAIR?

Authors