Tuesday, 15 November 2011

GoBioSpace's depositors

GoBioSpace is a tool that turns measured masses into source-tagged sum formulas. In this blog entry I want to focus on its data sources.

In contrast to combinatorial sum-formula prediction tools, which return a bare formula, GoBioSpace gives the user a formula tagged with much more information, such as names, InChIs, hyperlinks and so on. For this, GoBioSpace depends heavily on its data sources, of which PubChem (Compound and Substance) and ChemSpider are the biggest. First, I want to thank those data sources for making their data available to the community, and the people behind them for the time and effort they spend developing these databases. Second, I want to give some short statistics on the data sources. The table below lists all databases imported into GoBioSpace, showing the total number of sum formulas coming from each database and the number of sum formulas available only from that depositor (no other depositor published the formula).

depositor name | total formulas | formulas only from this depositor
--- | --- | ---
PubChem (ChemSpider) | 1,582,073 | 251,834
ChemSpider 2011.06.01 | 1,268,051 | 105,062
ChemSpider 2008.09.28 | 1,178,614 | 124,046
PubChem (DiscoveryGate) | 1,011,345 | 8,049
PubChem (NextBio) | 958,696 | 1,417
PubChem (Thomson Pharma) | 761,176 | 42,357
PubChem (MolPort) | 428,117 | 10,980
PubChem (ChemDB) | 421,222 | 198
PubChem (ZINC) | 362,513 | 61,484
PubChem (Ambinter) | 297,530 | 9,077
PubChem (ChEMBL) | 196,496 | 4,182
PubChem (ChemBank) | 195,104 | 16,284
PubChem (Vitas-M Laboratory) | 117,951 | 35
PubChem (ChemIDplus) | 117,865 | 2,357
PubChem (BindingDB) | 111,452 | 2,458
PubChem (DTP/NCI) | 96,996 | 5,118
PubChem (NIAID) | 87,586 | 1,331
PubChem (ChemBridge) | 81,525 | 0
PubChem (ASINEX) | 75,916 | 0
PubChem (MLSMR) | 75,731 | 659
PubChem (Specs) | 67,722 | 3
PubChem (LeadScope) | 65,416 | 257
PubChem (ICCB-Longwood/NSRB Screening Facility, Harvard Medical School) | 64,003 | 493
PubChem (ChemExper Chemical Directory) | 61,331 | 0
PubChem (NIST) | 53,532 | 6
PubChem (AAA Chemistry) | 50,033 | 47
PubChem (ChemBlock) | 48,910 | 0
PubChem (NovoSeek) | 43,807 | 47
PubChem (Emory University Molecular Libraries Screening Center) | 42,481 | 6
PubChem (Southern Research Institute) | 39,224 | 1
PubChem (MTDP) | 36,849 | 2
PubChem (NCGC) | 35,575 | 291
PubChem (Burnham Center for Chemical Genomics) | 23,964 | 19
PubChem (Abbott Labs) | 22,196 | 287
PubChem (Broad Institute) | 20,506 | 187
PubChem (Sigma-Aldrich) | 18,920 | 2
PubChem (NIST Chemistry WebBook) | 18,501 | 0
PubChem (NMRShiftDB) | 16,896 | 20
PubChem (UPCMLD) | 15,836 | 6
PubChem (IS Chemical Technology) | 15,583 | 267
PubChem (GLIDA, GPCR-Ligand Database) | 14,497 | 318
KNApSAcK 2011 | 13,869 | 222
PubChem (The Scripps Research Institute Molecular Screening Center) | 12,546 | 14
PubChem (MMDB) | 12,495 | 862
PubChem (Kingston Chemistry) | 12,301 | 0
PubChem (KEGG) | 117,327 | 9
PubChem (MP Biomedicals) | 10,719 | 263
PubChem (ChemSynthesis) | 10,570 | 3
PubChem (ChEBI) | 9,901 | 428
PubChem (GlaxoSmithKline (GSK)) | 9,728 | 96
PubChem (Aronis) | 9,699 | 1
PubChem (HDH Pharma) | 9,643 | 3
PubChem (TCI (Tokyo Chemical Industry)) | 9,469 | 110
PubChem (Hangzhou APIChem Technology) | 6,893 | 0
KNApSAcK v1.200.03 | 6,772 | 1
KNApSAcK v1.200.02 | 6,724 | 4
PubChem (SMID) | 5,438 | 0
PubChem (EPA DSSTox) | 5,423 | 10
PubChem (DrugBank) | 5,398 | 15
PubChem (BioCyc) | 5,237 | 810
PubChem (Hangzhou Trylead Chemical Technology) | 4,998 | 3
PubChem (Prous Science Drugs of the Future) | 4,819 | 9
PubChem (LipidMAPS) | 4,780 | 76
PubChem (Tractus) | 4,515 | 33
PubChem (Alinda Chemical) | 4,421 | 0
PubChem (NMMLSC) | 4,357 | 2
PubChem (R&D Chemicals) | 4,202 | 0
PubChem (Nature Chemical Biology) | 3,918 | 125
PubChem (Jamson Pharmachem Technology) | 3,334 | 21
PubChem (Comparative Toxicogenomics Database) | 3,272 | 13
PubChem (KUMGM) | 3,215 | 6
PubChem (Shanghai Institute of Organic Chemistry) | 3,140 | 253
PubChem (Tyger Scientific) | 2,645 | 1
PubChem (PDSP) | 2,628 | 0
PubChem (xPharm) | 1,959 | 1
PubChem (Ennopharm) | 1,933 | 3
Human Metabolome Database | 1,815 | 6
Target Lipids | 1,574 | 615
PubChem (Nature Chemistry) | 1,514 | 142
PubChem (Calbiochem) | 1,456 | 13
PubChem (University of Pittsburgh Molecular Library Screening Center) | 1,400 | 0
PubChem (Vanderbilt Specialized Chemistry Center) | 1,384 | 71
PubChem (Biosynth) | 1,326 | 37
PubChem (BIDD) | 1,320 | 0
PubChem (Exchemistry) | 1,290 | 2
PubChem (CMLD-BU) | 1,269 | 0
PubChem (UCLA Molecular Screening Shared Resource) | 1,238 | 1
PubChem (MOLI) | 1,219 | 1
Yeast Metabolome Database (2011) | 1,183 | 35
PubChem (Circadian Research, Kay Laboratory, University of California at San Diego (UCSD)) | 1,179 | 0
Maximum Recommended Therapeutic Dose (MRTD) Database | 1,101 | 0
PubChem (BIND) | 982 | 0
PubChem (NINDS Approved Drug Screening Program) | 964 | 0
PubChem (UM-BBD) | 897 | 8
PubChem (Total TOSLab Building-Blocks) | 782 | 0
PubChem (InFarmatik) | 726 | 0
PubChem (Golm Metabolome Database (GMD), Max Planck Institute of Molecular Plant Physiology) | 723 | 2
PubChem (NIH Clinical Collection) | 705 | 2
YEASTNET Vers. 4, (2011) | 521 | 0
PubChem (Alsachim) | 517 | 8
PubChem (MIC Scientific) | 508 | 0
PubChem (Molecular Libraries Program, Specialized Chemistry Center, University of Kansas) | 431 | 5
PubChem (Selleck Chemicals) | 407 | 2
PubChem (Biological Magnetic Resonance Data Bank (BMRB)) | 394 | 0
PubChem (CC_PMLSC) | 358 | 0
PubChem (Shanghai Sinofluoro Scientific Company) | 349 | 0
PubChem (MICAD) | 339 | 36
PubChem (True PharmaChem) | 316 | 0
PubChem (Columbia University Molecular Screening Center) | 312 | 1
PubChem (SGCOxCompounds) | 277 | 0
PubChem (EMD Biosciences) | 267 | 7
PubChem (SRMLSC) | 249 | 0
PubChem (Avanti Polar Lipids) | 232 | 109
PubChem (Nantong Baihua Bio-Pharmaceutical Co., Ltd) | 174 | 0
PubChem (Creasyn Finechem) | 170 | 1
PubChem (Structural Genomics Consortium) | 89 | 0
PubChem (IUPHAR-DB) | 88 | 0
PubChem (Chemical Biology Department, Max Planck Institute of Molecular Physiology) | 82 | 0
PubChem (PCMD) | 79 | 1
PubChem (Amatye) | 68 | 0
PubChem (PANACHE) | 65 | 0
PubChem (iThemba Pharmaceuticals) | 65 | 0
PubChem (Vanderbilt University Medical Center) | 55 | 12
PubChem (Excenen Pharmatech) | 43 | 0
PubChem (PENN-ABS) | 39 | 0
PubChem (Ambit Biosciences) | 38 | 0
PubChem (Nature Communications) | 36 | 0
PubChem (Vanderbilt Screening Center for GPCRs, Ion Channels and Transporters) | 30 | 0
PubChem (Johns Hopkins Ion Channel Center) | 20 | 0
PubChem (SGCStoCompounds) | 17 | 0
PubChem (Web of Science) | 16 | 0
PubChem (Zancheng Functional Chemicals) | 14 | 1
PubChem (Southern Research Specialized Biocontainment Screening Center) | 14 | 0
PubChem (Laboratory of Environmental Genomics, Carolina Center for Computational Toxicology, University of North Carolina at Chapel Hill) | 14 | 0
PubChem (PennChem-GAM) | 12 | 1
PubChem (Annker Organics) | 12 | 0
PubChem (Nitric Oxide Research, National Cancer Institute (NCI)) | 8 | 0
PubChem (Paul Baures) | 6 | 0
PubChem (Isoprenoids) | 4 | 0
PubChem (Ganolix LifeScience) | 2 | 0
PubChem (CLRI (CSIR)) | 2 | 0
PubChem (Finley and King Labs, Harvard Medical School) | 1 | 0
PubChem (Bioprocess Technology Lab, Department of Microbiology, Bharathidasan University) | 1 | 0
PubChem (VIT University) | 1 | 0

PubChem was imported on 15 February 2011. PubChem refers to PubChem Compound, whereas PubChem (...) refers to PubChem Substance with the specific sub-database given in parentheses.
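For readers who want to reproduce the two counts for their own collection of databases, here is a minimal sketch in Python. It is not the code GoBioSpace uses; the function name `depositor_statistics`, the record layout (depositor, sum formula) and the toy records are all assumptions made for illustration only.

```python
from collections import defaultdict


def depositor_statistics(records):
    """Count total and depositor-exclusive sum formulas.

    records: iterable of (depositor, sum_formula) pairs (hypothetical layout).
    Returns {depositor: (total_formulas, formulas_only_from_this_depositor)}.
    """
    formulas_by_depositor = defaultdict(set)
    depositors_by_formula = defaultdict(set)
    for depositor, formula in records:
        # Sets collapse duplicates, so a formula deposited twice counts once.
        formulas_by_depositor[depositor].add(formula)
        depositors_by_formula[formula].add(depositor)

    stats = {}
    for depositor, formulas in formulas_by_depositor.items():
        # A formula is "only from this depositor" if nobody else published it.
        only_here = sum(
            1 for formula in formulas if len(depositors_by_formula[formula]) == 1
        )
        stats[depositor] = (len(formulas), only_here)
    return stats


# Toy records, invented for this example (not taken from the real import).
records = [
    ("PubChem (KEGG)", "C6H12O6"),
    ("PubChem (KEGG)", "C2H6O"),
    ("PubChem (KEGG)", "C2H6O"),
    ("Human Metabolome Database", "C6H12O6"),
    ("Human Metabolome Database", "C3H7NO2"),
    ("Target Lipids", "C18H36O2"),
]

stats = depositor_statistics(records)
# Print the rows sorted by total formula count, largest first.
for name, (total, only_here) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}\t{total}\t{only_here}")
```

Note that this sketch compares formulas as plain strings, so the same formula written in two different element orders would be counted as two different formulas unless the strings are normalised first.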
One striking observation is that the later ChemSpider import (2011) contains a large proportion of unique formulas that are not included in the 2008 import. In fact, many of those formulas are tagged as "This record is deprecated and may be removed soon." on the ChemSpider website. What does this mean for the chemical formula? Is the formula valid?

Another observation, from the import of the Yeast Metabolome Database, is that smaller databases also contribute formulas that are not yet included in the larger databases PubChem and ChemSpider.

Any thoughts?

