Tools and data formats for chemical data handling

From BlueObelisk

Enabling chemists to send experimental or theoretical data together with a publication requires software (commercial and open access) that can create, handle or transform chemistry-related data. This includes chemical drawings, reactions, spectral data and chemical property data.

Data for publication supplements should be submitted in open data formats (XML, CML, ThermoML, JCAMP) or at least in well-defined data formats (such as the SD format V3000). Chemical data supplements should not be submitted in PDF format, which destroys chemical information and hinders automated machine reading. Publishing molecule and reaction drawings as picture data (TIFF, BMP, PNG) is needed for the print process, but it prevents any simple computer capture of the structures. Instead, such bitmap data has to be run through an optical character recognition (OCR) process to recover the chemical formulas. This process is error-prone and would not be needed if the chemical metadata were submitted as CML.

Modern chemistry software should be able to export CML for molecule and reaction drawings, and software that captures experimental thermodynamic or spectroscopic data should support open data exchange formats (JCAMP, netCDF, ThermoML and others).


Tools

Molecule drawings

This section covers tools and data formats for molecules (mol, sdf, cml, SMILES) and reaction data (rxn, rdf, cml, SMARTS, SMIRKS). Chemical drawings should be exported as CML (Chemical Markup Language) or in mol format. Vendor-specific formats or, even worse, picture formats (BMP, JPEG, TIFF) should not be used. If possible, a list of InChI codes (or InChIKeys) should be created for all molecules. For examples, see below.
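To make the CML recommendation concrete, the following is a minimal sketch of how a drawing could be serialised as a CML molecule using only the Python standard library. The function name `build_cml_molecule` and the 2D coordinates are illustrative assumptions, and the output uses only a small subset of CML (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`); a real drawing program would write considerably more.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML documents.
CML_NS = "http://www.xml-cml.org/schema"


def build_cml_molecule(mol_id, atoms, bonds):
    """Return a CML string for one molecule.

    atoms: list of (atom_id, element_symbol, x, y) tuples (2D coordinates)
    bonds: list of (atom_id_1, atom_id_2, order) tuples, order as a string
    """
    # Emit the CML namespace as the default namespace (no prefix).
    ET.register_namespace("", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})

    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for atom_id, element, x, y in atoms:
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom", {
            "id": atom_id,
            "elementType": element,
            "x2": f"{x:.4f}",
            "y2": f"{y:.4f}",
        })

    bond_array = ET.SubElement(molecule, f"{{{CML_NS}}}bondArray")
    for first, second, order in bonds:
        ET.SubElement(bond_array, f"{{{CML_NS}}}bond", {
            "atomRefs2": f"{first} {second}",
            "order": order,
        })

    return ET.tostring(molecule, encoding="unicode")


# Heavy-atom skeleton of ethanol; the coordinates are made up for illustration.
ethanol_cml = build_cml_molecule(
    "ethanol",
    [("a1", "C", 0.0, 0.0), ("a2", "C", 1.5, 0.0), ("a3", "O", 2.25, 1.3)],
    [("a1", "a2", "1"), ("a2", "a3", "1")],
)
print(ethanol_cml)
```

Because the result is plain XML, it can be read back with any XML parser, which is the main advantage over a bitmap of the same drawing.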

Name | Vendor | Open/Closed Source | Operating System | Note
ISISDraw | MDL | Closed | Windows | deprecated; no CML import/export; copy/paste into other programs is possible
ChemDraw | CambridgeSoft | Closed | Windows |
ChemSketch | ACDLabs | Closed | Windows |
MarvinSketch | ChemAxon | Closed | Windows |
KnowItAll | BioRad | Closed | Windows |
XDrawChem | [1] | Open | Windows/Linux/OSX |
JChemPaint | [2] | Open | Platform independent |
Bioclipse | [3] | Open | Windows/Linux/OSX |

Chemical reaction drawings

Chemical reaction drawings should be exported in CML or RXN format. For examples, see below.

Name | Vendor | Open/Closed Source | Operating System | Note
ISISDraw | MDL | Closed | Windows | deprecated; no CML import/export; copy/paste into other programs is possible
ChemDraw | CambridgeSoft | Closed | Windows |
ChemSketch | ACDLabs | Closed | Windows |
MarvinSketch | ChemAxon | Closed | Windows |
KnowItAll | BioRad | Closed | Windows |
JChemPaint | [4] | Open | Platform independent |
Bioclipse | [5] | Open | Windows/Linux/OSX |


Building and visualising molecules

The table below is only intended to provide an overview of the functionality of a limited number of codes. The pages linked in the "Special Features" column are places for users and developers to highlight particular strengths or unique features of a code.

For a more comprehensive list of the various builders and visualisers that are available, please see the Linux4Chemistry list or Mario Valle's list of Free Chemistry Visualisation Tools.

Program | Building: Small Mol., Large Struct., Periodic Struct., Internal Minimiser | Visualising: Molecules, Isosurfaces, Vector Fields | Platforms: Windows, Mac OSX, Linux | Open | Special Features
Aten y y y y y y - - y y ? AtenFeatures
Avogadro y y y y y y - y y y y AvogadroFeatures
CCP1GUI y - - y y y y y y y ? CCP1GUIFeatures
Jmol y y y y y y - y y y y JmolFeatures
Molden y - - y y y - y y y ? MoldenFeatures
Molekel - - - - y y - y y y ? MolekelFeatures
Zeobuilder y y y - y - - - - y ? ZeobuilderFeatures
Jamberoo y y y y y y ZeobuilderFeatures

Chemical file format converters

Such converter tools can be used to convert chemical data into accepted data formats (CML, MOL, SDF, PDB).
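As a rough illustration of what such a converter has to do at the lowest level, the sketch below writes and reads the fixed-width atom and bond blocks of an MDL MOL file (V2000). The function names `format_molfile_v2000` and `parse_molfile_v2000` are hypothetical, the coordinates are made up, and the code ignores charges, stereo flags and property lines, so it is not a substitute for a dedicated conversion tool.

```python
def format_molfile_v2000(name, atoms, bonds):
    """Write a minimal V2000 molfile.

    atoms: list of (symbol, x, y, z); bonds: list of (i, j, order), 1-based.
    """
    # Three header lines: molecule name, program/timestamp line, comment line.
    lines = [name, "  sketch", ""]
    # Counts line: atom count and bond count in 3-character fields.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for symbol, x, y, z in atoms:
        lines.append(
            f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}"
            " 0  0  0  0  0  0  0  0  0  0  0  0"
        )
    for first, second, order in bonds:
        lines.append(f"{first:3d}{second:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def parse_molfile_v2000(text):
    """Read name, atoms and bonds back from a V2000 molfile string."""
    lines = text.splitlines()
    counts = lines[3]
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Coordinates occupy three 10-character fields, the symbol columns 32-34.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    return {"name": lines[0].strip(), "atoms": atoms, "bonds": bonds}


# Round trip for the heavy-atom skeleton of ethanol (illustrative coordinates).
mol_text = format_molfile_v2000(
    "ethanol",
    [("C", 0.0, 0.0, 0.0), ("C", 1.5, 0.0, 0.0), ("O", 2.25, 1.3, 0.0)],
    [(1, 2, 1), (2, 3, 1)],
)
parsed = parse_molfile_v2000(mol_text)
print(parsed["atoms"], parsed["bonds"])
```

Once atoms and bonds are available in a neutral in-memory form like this, writing them out in another accepted format is a separate, independent step.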


Chemical property data handling and storage

Pure experimental and calculated molecular property data (mp, bp, logP, pKa, solubility, toxicity, molecular descriptors) should be supplied in open data formats such as XML. TXT (tab-separated) or XLS (BIFF4 or later) formats are allowed but discouraged. If molecular structure data is available, the files should be exported in SDF format together with the molecular information. Supplements in PDF format are forbidden. Large files should be compressed in .gz or .zip format.
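For the tab-separated fallback with compression, a minimal sketch using the Python standard library might look as follows. The function names, the file name and the column names (`name`, `inchi`, `mp_C`, `bp_C`) are assumptions chosen for illustration; the single row for water is only there to show the layout.

```python
import csv
import gzip


def write_property_table(path, fieldnames, rows):
    """Write property records as a gzip-compressed, tab-separated text file."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_property_table(path):
    """Read the compressed table back as a list of dictionaries (all strings)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    columns = ["name", "inchi", "mp_C", "bp_C"]
    records = [
        {"name": "water", "inchi": "InChI=1S/H2O/h1H2", "mp_C": "0.0", "bp_C": "100.0"},
    ]
    write_property_table("properties.tsv.gz", columns, records)
    print(read_property_table("properties.tsv.gz"))
```

Keeping an identifier such as the InChI in every row lets a reader match each property value to a structure without relying on the compound name.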

Name | Vendor | Open/Closed Source | Note
Bioclipse | Bioclipse team | Open |
EXCEL | Microsoft | Closed |
Calc Spreadsheet | OpenOffice | Open |
Instant-JChem | ChemAxon | Closed |
ACDLabs | | Closed | several spectral data packages
7ZIP | | ? | free compression and decompression tool for Windows/Linux/OSX
TRC | | ? | tools for ThermoML conversion, capturing of experimental data and data format conversions

Spectral data and hyphenated techniques data

This covers NMR, MS, UV, IR, GC-MS, LC-MS and LC-UV data.

  • Vendor-specific software (hardware dependent)
  • Bioclipse - Bioclipse team
  • ACDLabs - several spectral data packages
  • GRAMS - Thermo Grams/AI


Data Formats

Molecular data

Vendor-specific formats (such as .skc for ISIS Draw, or SMILES) are allowed but discouraged. Large files should be compressed in .gz (GNU zip) or .zip format.

  • CML (Chemical Markup Language)
  • SD file format (V2000, V3000 from MDL)
  • MOL format (MDL)
  • PDB format
  • SMARTS (Daylight)
  • InChI (IUPAC and NIST)
  • InChIKey (IUPAC and NIST), a short hash code derived from the InChI
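SD files bundle structures with named data items, which is why they suit combined structure and property supplements. The sketch below shows, under simplifying assumptions, how records separated by `$$$$` lines and their `> <FIELD>` data items could be read with plain Python. The function names and the field names in the sample (`NAME`, `BOILING_POINT_C`) are hypothetical, and the structure block itself is left unparsed.

```python
import re


def split_sdf_records(text):
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def read_data_items(record):
    """Collect '> <FIELD>' data items that follow the 'M  END' line."""
    lines = record.splitlines()
    try:
        index = lines.index("M  END") + 1
    except ValueError:
        index = 0  # no structure block terminator found; scan the whole record
    items = {}
    while index < len(lines):
        header = re.match(r">.*?<([^>]+)>", lines[index])
        index += 1
        if header is None:
            continue
        values = []
        # A data item runs until the next blank line.
        while index < len(lines) and lines[index].strip():
            values.append(lines[index])
            index += 1
        items[header.group(1)] = "\n".join(values)
    return items


sample_sdf = "\n".join([
    "water", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",  # empty structure block
    "M  END",
    "> <NAME>", "water", "",
    "> <BOILING_POINT_C>", "100.0", "",
    "$$$$",
]) + "\n"

for record in split_sdf_records(sample_sdf):
    print(read_data_items(record))
```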


Quantum Chemistry


Nuclear Magnetic Resonance (NMR)
  • JCAMP


Mass Spectrometry (MS)
  • JCAMP


Infrared Data (IR)
  • JCAMP


Optical Spectroscopy in general
  • JCAMP
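JCAMP-DX files are plain text made of labelled data records of the form `##LABEL=value`, which is what makes them readable without vendor software. As a minimal sketch, the header records could be collected as shown below. The function names are hypothetical, the sample values are invented for illustration, and the code does not decode compressed (ASDF) data tables.

```python
import re


def normalize_label(label):
    """Labels compare case-insensitively, ignoring spaces, '-', '/' and '_'."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_jcamp_records(text):
    """Return a dict of labelled data records up to the first ##END= record."""
    records = {}
    current = None
    for raw_line in text.splitlines():
        # '$$' starts a comment that runs to the end of the line.
        line = raw_line.split("$$", 1)[0].rstrip()
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            current = normalize_label(label)
            if current == "END":
                break
            records[current] = value.strip()
        elif current is not None and line.strip():
            # Continuation line: belongs to the most recent record.
            records[current] = (records[current] + "\n" + line.strip()).strip()
    return records


sample_jcamp = "\n".join([
    "##TITLE=Example spectrum",
    "##JCAMP-DX=4.24",
    "##DATA TYPE=INFRARED SPECTRUM",
    "##XUNITS=1/CM",
    "##YUNITS=TRANSMITTANCE",
    "##NPOINTS=3  $$ illustrative values only",
    "##XYDATA=(X++(Y..Y))",
    "4000 0.91 0.90 0.89",
    "##END=",
])

header = parse_jcamp_records(sample_jcamp)
print(header["DATATYPE"], header["XUNITS"], header["YUNITS"])
```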


Crystal Structures
  • CIF
  • PDB


GC-MS data


LC-MS data


Thermodynamic Property Data
  • ThermoML - IUPAC and NIST standard format for thermodynamic property data (bp, entropy, solubility and 120 other properties)

