2. InChI and SMILES identifiers for chemical structures

Recipe Overview
Reading Time
15 minutes
Executable Code
No
Difficulty
Creating InChI & SMILES identifiers for chemical structures
Recipe Type
Hands-on
Audience
Maturity Level & Indicator
DSM-4-C4

2.1. Main Objectives

The main purpose of this recipe is:

To take an SDF file, validate the content for chemical inconsistencies, and generate InChIs, InChIKeys, and SMILES for each entry in the SDF file.


  • Skill dependency:

    • Bash experience

  • Technical requirements:

    • Groovy

2.2. Creating InChI and SMILES identifiers for chemical structures

To run the scripts below, you need a Groovy installation. The Groovy scripts use version 2.7.1 of the Chemistry Development Kit (see 2). This library and its use in Groovy are further explained in the book Groovy Cheminformatics with the Chemistry Development Kit. See this Git repository for more detailed usage instructions and for where to find the tools: https://github.com/FAIRplus/fairplus-sdf

2.2.1. Record validation

When generating InChIs, the InChI library (see 1) may return a success state that reflects issues with the compound record in the SDF file, such as WARNING or ERROR. This first script reports such issues:

groovy badRecords.groovy -f foo.sdf

The output may look like this:

Sulfinpyrazone  Omitted undefined stereo        WARNING
Isosorbide mononitrate  Charges were rearranged WARNING
Compound52      Proton(s) added/removed WARNING
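
For larger SDF files, a per-state tally can be easier to scan than the full report. The following is a minimal Bash sketch, not part of the fairplus-sdf repository. It assumes that the success state is the last whitespace-separated field on each line, as in the sample output above; the function name summarize_states is hypothetical.

```shell
#!/usr/bin/env bash
# Tally report lines by success state.
# Assumption: the state (e.g. WARNING or ERROR) is the last field on each line.
summarize_states() {
  awk 'NF { count[$NF]++ } END { for (s in count) print s "\t" count[s] }' "$@" | sort
}

# Demo using the three sample lines shown above, fed through standard input.
printf '%s\n' \
  'Sulfinpyrazone	Omitted undefined stereo	WARNING' \
  'Isosorbide mononitrate	Charges were rearranged	WARNING' \
  'Compound52	Proton(s) added/removed	WARNING' \
  | summarize_states
# prints: WARNING<TAB>3
```

With a real file, the report could be piped in directly, for example: groovy badRecords.groovy -f foo.sdf | summarize_states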

2.2.2. Calculate InChIs

Similarly, InChIKeys can be generated:

groovy inchikeys.groovy -f foo.sdf

When the success state is ERROR, nothing is output.
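
Because records with an ERROR state produce no output, it can help to compare the number of records in the SDF file with the number of lines the script wrote. The sketch below relies on the SDF convention that each record ends with a line starting with $$$$. It also assumes, without confirmation from the scripts themselves, that one output line is written per successful record. The function names and the file name inchikeys.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Count records in an SDF file: each record is terminated by a "$$$$" line.
count_sdf_records() {
  grep -c '^\$\$\$\$' "$1" || true   # grep exits non-zero when the count is 0
}

# Compare the record count with the number of non-empty output lines.
# Assumption: the script writes one line per record that did not fail.
report_missing() {
  local sdf="$1" out="$2"
  local records lines
  records=$(count_sdf_records "$sdf")
  lines=$(grep -c . "$out" || true)
  echo "records=$records output_lines=$lines missing=$((records - lines))"
}

# Usage (requires Groovy and the scripts from the repository):
#   groovy inchikeys.groovy -f foo.sdf > inchikeys.txt
#   report_missing foo.sdf inchikeys.txt
```

A non-zero missing value would suggest that some records failed, and the validation report from the previous step should show which ones.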

2.2.3. Calculate SMILES strings

The last script calculates a SMILES for each entry in the SDF file:

groovy smiles.groovy -f foo.sdf

2.3. Conclusion

This recipe explained how to validate the chemical structures in an SDF file and convert them to SMILES, InChI, and InChIKey. The InChIKeys can then be used with BridgeDb and its metabolite ID mapping databases to get additional identifiers.

2.4. References

2.5. Authors