2. InChI and SMILES identifiers for chemical structures




Recipe Overview
Reading Time
15 minutes
Executable Code
Yes
Difficulty
Recipe Type
Hands-on
Audience
Chemoinformatician, Data Curator, Data Manager, Data Scientist

2.1. Main Objectives

The main purpose of this recipe is:

To take an SDF file, validate the content for chemical inconsistencies, and generate InChIs, InChIKeys, and SMILES for each entry in the SDF file.


2.2. Requirements

  • Skill dependency:

    • Bash experience

  • Technical requirements:

    • Groovy


2.3. FAIRification Objectives, Inputs and Outputs

Actions.Objectives.Tasks   Input                       Output
validation                 Structure Data File (SDF)   report
calculation                Structure Data File (SDF)   InChI
calculation                Structure Data File (SDF)   SMILES


2.4. Creating InChI and SMILES identifiers for chemical structures

To run the scripts below, you need a Groovy installation. The Groovy scripts use version 2.5 of the Chemistry Development Kit (see 2). This library and its use in Groovy are explained further in the book Groovy Cheminformatics with the Chemistry Development Kit. For more detailed usage instructions and where to find the tools, see this git repository: https://github.com/FAIRplus/fairplus-sdf

2.4.1. Record validation

When generating InChIs, the InChI library (see 1) returns a success state for each compound record. States such as WARNING and ERROR indicate issues with the record in the SDF file. This first script reports such issues:

groovy badRecords.groovy -f foo.sdf

The output may look like this:

Sulfinpyrazone  Omitted undefined stereo        WARNING
Isosorbide mononitrate  Charges were rearranged WARNING
Compound52      Proton(s) added/removed WARNING
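For larger files it can help to condense this report into a count per success state. The following is a minimal sketch, not part of the recipe's scripts: the function name summarize_report and the file name report.txt are hypothetical, and it assumes that the success state is always the last whitespace-separated token on each line, as in the sample output above.

```shell
# Hypothetical helper: count report lines per success state.
# Assumption: the state (e.g. WARNING, ERROR) is the last token on each line.
summarize_report() {
  awk 'NF { count[$NF]++ }
       END { for (s in count) printf "%s\t%d\n", s, count[s] }' "$1" | sort
}

# Possible usage (report.txt is a hypothetical file name):
#   groovy badRecords.groovy -f foo.sdf > report.txt
#   summarize_report report.txt
```

Using the last token rather than a fixed column keeps the sketch working for compound names that contain spaces, such as "Isosorbide mononitrate".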

2.4.2. Calculate InChIs

Similarly, InChIKeys can be generated:

groovy inchikeys.groovy -f foo.sdf

When the success state is ERROR, nothing is output.
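Because records with an ERROR state produce no output, it may be useful to compare the number of records in the SDF file with the number of lines the script produced. The sketch below is an illustration under two assumptions: records in an SDF file are terminated by a line starting with "$$$$", and the script prints one line per successfully processed record (the recipe does not state the output layout). The function names and the file name inchikeys.txt are hypothetical.

```shell
# Hypothetical helpers: detect records that were silently skipped.

# Count records in an SDF file; each record ends with a "$$$$" line.
count_sdf_records() {
  grep -c '^\$\$\$\$' "$1" || true
}

# Print how many records have no corresponding output line.
# Assumption: the script output holds one non-empty line per successful record.
missing_records() {
  sdf_count=$(count_sdf_records "$1")
  out_count=$(grep -c . "$2" || true)
  echo $((sdf_count - out_count))
}

# Possible usage (inchikeys.txt is a hypothetical file name):
#   groovy inchikeys.groovy -f foo.sdf > inchikeys.txt
#   missing_records foo.sdf inchikeys.txt
```

A result greater than zero suggests that some records ended in the ERROR state and should be inspected with the validation script from the previous section.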

2.4.3. Calculate SMILES strings

The last script calculates a SMILES for each entry in the SDF file:

groovy smiles.groovy -f foo.sdf
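When several SDF files need to be converted, the same command can be wrapped in a loop. This is a sketch only: the function name smiles_for_all and the .smi output naming are hypothetical, and it assumes that smiles.groovy writes its results to standard output.

```shell
# Hypothetical helper: run smiles.groovy on each SDF file given as argument.
# Assumption: the script writes its results to standard output.
smiles_for_all() {
  for sdf in "$@"; do
    # foo.sdf -> foo.smi (the output naming is an arbitrary choice for this sketch)
    groovy smiles.groovy -f "$sdf" > "${sdf%.sdf}.smi" \
      || echo "failed: $sdf" >&2
  done
}

# Possible usage:
#   smiles_for_all first.sdf second.sdf
```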

2.5. Conclusion

This recipe explained how to validate the chemical structures in an SDF file and convert them to SMILES, InChI, and InChIKey. The InChIKeys can then be used with BridgeDb and its metabolite ID mapping databases to get additional identifiers.
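Before passing InChIKeys to a mapping tool, a quick format check can catch truncated or malformed values. The sketch below relies on the standard InChIKey layout of 27 characters: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter. The function name is_inchikey is hypothetical, and the check covers the layout only, not whether the key corresponds to a real structure.

```shell
# Hypothetical helper: succeed if the argument has the InChIKey layout.
# Layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
is_inchikey() {
  printf '%s\n' "$1" | grep -Eq '^[A-Z]{14}-[A-Z]{10}-[A-Z]$'
}

# Possible usage:
#   is_inchikey "BSYNRYMUTXBXSQ-UHFFFAOYSA-N" && echo "looks valid"
```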


2.6. References

1. Jonathan M. Goodman, Igor Pletnev, Paul Thiessen, Evan Bolton, and Stephen R. Heller. InChI version 1.06: now more than 99.99% reliable. Journal of Cheminformatics, May 24, 2021.

2. Egon Willighagen, John W. Mayfield, Jonathan Alvarsson, Arvid Berg, Lars Carlsson, Nina Jeliazkova, Stefan Kuhn, Tomáš Pluskal, Miquel Rojas-Chertó, Ola Spjuth, Gilleain Torrance, Chris T. Evelo, Rajarshi Guha, and Christoph Steinbeck. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. Journal of Cheminformatics, June 6, 2017.


2.7. Authors

Name   ORCID   Affiliation             Type   ELIXIR Node   Contribution
               Maastricht University                        Writing - Original Draft, Conceptualization
               University of Oxford                         Writing - Review & Editing


2.8. License

This page is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license.