Tools and data formats for chemical data handling
From BlueObelisk
Enabling chemists to submit experimental or theoretical data together with a publication requires software (commercial and open source) that can create, handle or transform chemistry-related data. This includes chemical drawings, reactions, spectral data and chemical property data.
Data for publication supplements should be submitted in open data formats (XML, CML, ThermoML, JCAMP) or at least in well-defined formats (such as SD format V3000). Chemical data supplements should not be submitted as PDF, a format that discards chemical information and hinders automated machine reading. Publishing molecule and reaction drawings as image data (TIFF, BMP, PNG) is needed for the print process, but it prevents any simple computer capture of the structures. Such bitmap data must instead be run through an optical character recognition (OCR) process to recover the chemical formulas. This process is error-prone and would be unnecessary if the chemical metadata were submitted as CML.
Most modern chemistry software can export CML for molecule and reaction drawings, and any software that captures experimental thermodynamic or spectroscopic data should support open data exchange formats (JCAMP, netCDF, ThermoML and others).
Tools
Molecule drawings
This section covers tools and data formats for molecules (mol, sdf, cml, SMILES) and reaction data (rxn, rdf, cml, SMARTS, SMIRKS). Chemical drawings should be exported as CML (Chemical Markup Language) or in mol format. Software- or vendor-specific formats should not be used, and picture formats (BMP, JPEG, TIFF) even less so. If possible, a list of InChI codes (or InChIKeys) should be created for all molecules. See the examples below.
| Name | Vendor | Open/Closed Source | Operating System | Note |
|---|---|---|---|---|
| ISISDraw | MDL | Closed | Windows | Deprecated; no CML import/export; copy/paste into other programs is possible |
| ChemDraw | CambridgeSoft | Closed | Windows | |
| ChemSketch | ACDLabs | Closed | Windows | |
| MarvinSketch | ChemAxon | Closed | Windows | |
| KnowItAll | BioRad | Closed | Windows | |
| XDrawChem | [1] | Open | Windows/Linux/OS X | |
| JChemPaint | [2] | Open | Platform independent | |
| Bioclipse | [3] | Open | Windows/Linux/OS X | |
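To illustrate what a CML export of a drawing carries, the following minimal sketch builds a CML document for water with Python's standard library. It assumes the commonly used CML namespace `http://www.xml-cml.org/schema` and includes only atoms and bonds (no coordinates); the function name `water_as_cml` is hypothetical, and real exports from drawing tools normally contain more detail.

```python
import xml.etree.ElementTree as ET

# Assumption: the CML schema namespace commonly used by CML-aware tools.
CML_NS = "http://www.xml-cml.org/schema"


def water_as_cml() -> str:
    """Return a minimal CML document describing a water molecule."""
    # Serialise CML elements with a default namespace instead of a prefix.
    ET.register_namespace("", CML_NS)

    molecule = ET.Element(f"{{{CML_NS}}}molecule", id="water")

    # One <atom> per atom, identified by an id and an element symbol.
    atoms = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for atom_id, element in (("a1", "O"), ("a2", "H"), ("a3", "H")):
        ET.SubElement(atoms, f"{{{CML_NS}}}atom", id=atom_id, elementType=element)

    # One <bond> per bond, referring to the two atom ids it connects.
    bonds = ET.SubElement(molecule, f"{{{CML_NS}}}bondArray")
    for first, second in (("a1", "a2"), ("a1", "a3")):
        ET.SubElement(
            bonds,
            f"{{{CML_NS}}}bond",
            atomRefs2=f"{first} {second}",
            order="1",
        )

    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    print(water_as_cml())
```

Unlike a bitmap of the same drawing, this text can be parsed directly by other programs without any recognition step.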
Chemical reaction drawings
Chemical reaction drawings should be exported in CML or RXN format. See the examples below.
| Name | Vendor | Open/Closed Source | Operating System | Note |
|---|---|---|---|---|
| ISISDraw | MDL | Closed | Windows | Deprecated; no CML import/export; copy/paste into other programs is possible |
| ChemDraw | CambridgeSoft | Closed | Windows | |
| ChemSketch | ACDLabs | Closed | Windows | |
| MarvinSketch | ChemAxon | Closed | Windows | |
| KnowItAll | BioRad | Closed | Windows | |
| JChemPaint | [4] | Open | Platform independent | |
| Bioclipse | [5] | Open | Windows/Linux/OS X | |
Building and visualising molecules
The table below is intended only as an overview of the functionality of a limited number of codes. The pages linked in the "Special Features" column are places for users and developers to highlight particular strengths or unique features of a code.
For a more comprehensive list of the various builders and visualisers that are available, please see the Linux4Chemistry list or Mario Valle's list of Free Chemistry Visualisation Tools.
| Program | Building: Small Mol. | Building: Large Struct. | Building: Periodic Struct. | Building: Internal Minimiser | Visualising: Molecules | Visualising: Isosurfaces | Visualising: Vector Fields | Windows | Mac OSX | Linux | Open | Special Features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aten | y | y | y | y | y | y | - | - | y | y | ? | AtenFeatures |
| Avogadro | y | y | y | y | y | y | - | y | y | y | y | AvogadroFeatures |
| CCP1GUI | y | - | - | y | y | y | y | y | y | y | ? | CCP1GUIFeatures |
| Jmol | y | y | y | y | y | y | - | y | y | y | y | JmolFeatures |
| Molden | y | - | - | y | y | y | - | y | y | y | ? | MoldenFeatures |
| Molekel | - | - | - | - | y | y | - | y | y | y | ? | MolekelFeatures |
| Zeobuilder | y | y | y | - | y | - | - | - | - | y | ? | ZeobuilderFeatures |
| Jamberoo | y | y | y | y | y | y | | | | | | ZeobuilderFeatures |
Chemical file format converters
Converter tools can be used to translate chemical data into accepted data formats (CML, MOL, SDF, PDB).
- OpenBabel, an open source project
- Molconvert, a free converter
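As a rough sketch of how such a conversion might be scripted, the example below wraps the Open Babel command-line program `obabel`, which infers the input and output formats from the file extensions and takes the output file after `-O`. The helper names `build_obabel_command` and `convert` and the file names are hypothetical, and the sketch assumes `obabel` is installed and on the `PATH`.

```python
import shutil
import subprocess


def build_obabel_command(src: str, dst: str) -> list:
    """Return the argument list that converts src into dst with obabel.

    The formats are not given explicitly; obabel derives them from the
    file extensions (for example .mol in, .cml out).
    """
    return ["obabel", src, "-O", dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion, failing loudly if obabel is not available."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel executable not found on PATH")
    # check=True raises CalledProcessError if obabel reports a failure.
    subprocess.run(build_obabel_command(src, dst), check=True)


if __name__ == "__main__":
    # Only show the command here; call convert() to actually run it.
    print(" ".join(build_obabel_command("caffeine.mol", "caffeine.cml")))
```

Calling `convert("caffeine.mol", "caffeine.cml")` would then produce a CML file suitable for a data supplement, provided the input file exists.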
Chemical property data handling and storage
Pure experimental and calculated molecular property data (mp, bp, logP, pKa, solubility, toxicity, molecular descriptors) should be supplied in open data formats such as XML. TXT (tab-separated) and XLS (BIFF4 or later) formats are allowed but discouraged. If molecular structure data is available, the files should be exported in SDF format so that properties and structures are kept together. Supplements in PDF format are not acceptable. Large files should be compressed in .gz or .zip format.
| Name | Vendor | Open/Closed Source | Note |
|---|---|---|---|
| Bioclipse | Bioclipse team | Open | |
| Excel | Microsoft | Closed | |
| Calc Spreadsheet | OpenOffice | Open | |
| Instant-JChem | ChemAxon | Closed | |
| ACDLabs | | Closed | several spectral data packages |
| 7ZIP | | ? | free compression and decompression tool for Windows/Linux/OS X |
| TRC | | ? | tools for ThermoML conversion, capturing of experimental data and data format conversions |
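The guidance above (property data as XML, large files gzip-compressed) can be sketched with Python's standard library alone. The element and attribute names (`propertyData`, `compound`, `property`, `unit`) are invented for this illustration and do not follow ThermoML or any other published schema; the two values for water are the familiar rounded figures at standard pressure.

```python
import gzip
import xml.etree.ElementTree as ET

# Illustrative records only; the layout of this structure is an assumption.
EXAMPLE_RECORDS = [
    {
        "name": "water",
        "properties": {
            "mp": (0.0, "degC"),
            "bp": (100.0, "degC"),
        },
    },
]


def records_to_xml(records) -> bytes:
    """Serialise property records into a small, self-describing XML file."""
    root = ET.Element("propertyData")
    for record in records:
        compound = ET.SubElement(root, "compound", name=record["name"])
        for prop, (value, unit) in record["properties"].items():
            node = ET.SubElement(compound, "property", name=prop, unit=unit)
            node.text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def to_gzipped_xml(records) -> bytes:
    """Return the XML serialisation compressed in gzip (.gz) format."""
    return gzip.compress(records_to_xml(records))


if __name__ == "__main__":
    blob = to_gzipped_xml(EXAMPLE_RECORDS)
    # Writing the bytes to a file yields a supplement such as properties.xml.gz
    with open("properties.xml.gz", "wb") as handle:
        handle.write(blob)
```

Because every value carries its property name and unit, a reader can process the file without consulting the article text, which is not possible with a table embedded in a PDF.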
Spectral data and hyphenated techniques data
This section covers NMR, MS, UV and IR data as well as data from the hyphenated techniques GC-MS, LC-MS and LC-UV.
- Vendor-specific software (hardware dependent)
- Bioclipse - Bioclipse team
- ACDLabs - several spectral data packages
- GRAMS - Thermo Grams/AI
Data Formats
Molecular data
Vendor-specific formats (such as .skc from ISIS Draw, or SMILES) are allowed but discouraged. Large files should be compressed in .gz (GNU zip) or .zip format.
- CML (Chemical Markup Language)
- SD file format (V2000, V3000 from MDL)
- MOL format (MDL)
- PDB format
- SMARTS (Daylight)
- InChI (IUPAC and NIST)
- InChIKey (IUPAC and NIST), a short hash code derived from the InChI
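When a list of InChIKeys accompanies a submission, a simple syntax check can catch copy-and-paste damage. The sketch below relies only on the fixed layout of an InChIKey: 27 characters made of blocks of 14, 10 and 1 uppercase letters separated by hyphens. The function name `looks_like_inchikey` is hypothetical, and the check is purely syntactic; it cannot tell whether a key belongs to the intended structure.

```python
import re

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """Return True if text has the layout of an InChIKey (syntax only)."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    candidates = [
        "XLYOFNOQVPJJNP-UHFFFAOYSA-N",  # InChIKey of water
        "XLYOFNOQVPJJNP-UHFFFAOYSA",    # truncated key
        "InChI=1S/H2O/h1H2",            # a full InChI, not a key
    ]
    for candidate in candidates:
        print(candidate, looks_like_inchikey(candidate))
```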
Quantum Chemistry
- Gaussian
- GAMESS-UK
- GAMESS (the US version)
Nuclear Magnetic Resonance (NMR)
- JCAMP
Mass Spectrometry (MS)
- JCAMP
Infrared Data (IR)
- JCAMP
Optical Spectroscopy in general
- JCAMP
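Part of what makes JCAMP machine readable is that each file starts with labelled records of the form `##LABEL=value`. The following minimal sketch reads only those records into a dictionary and ignores the data table itself. The sample text is made up for illustration, the function name `parse_jcamp_header` is hypothetical, and real files may need more careful handling (multi-line values, compressed data tables, several blocks per file).

```python
# A made-up JCAMP-DX fragment used only to exercise the parser.
SAMPLE = """##TITLE=Example spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM  $$ wavenumber
##YUNITS=TRANSMITTANCE
##END=
"""


def parse_jcamp_header(text: str) -> dict:
    """Collect the ##LABEL=value records that precede ##END=."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        # Labelled records start with "##"; everything else is skipped.
        if not line.startswith("##") or "=" not in line:
            continue
        label, _, value = line[2:].partition("=")
        label = label.strip().upper()
        if label == "END":
            break
        # "$$" introduces a comment that runs to the end of the line.
        header[label] = value.split("$$", 1)[0].strip()
    return header


if __name__ == "__main__":
    for label, value in parse_jcamp_header(SAMPLE).items():
        print(f"{label}: {value}")
```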
Crystal Structures
- CIF
- PDB
GC-MS data
LC-MS data
Thermodynamic Property Data
- ThermoML - IUPAC and NIST standard format for thermodynamic property data (bp, entropy, solubility and 120 other properties)

