In contrast to combinatorial sum formula prediction tools, which produce a bare formula, GoBioSpace gives the user a formula tagged with much more information, such as names, InChIs and hyperlinks. This also means that GoBioSpace depends heavily on its data sources, of which PubChem (Compound and Substance) and ChemSpider are the biggest. First, I want to thank these data sources for making their data available to the community, and the people behind them for the time and effort they spend developing these databases. Second, I want to give a short statistic on the data sources. The table below lists all databases sourced into GoBioSpace. For each database it shows the total number of sum formulas it contributes and the number of sum formulas available only from that depositor (no other depositor published the formula).
depositor name | total formulas | formulas only from this depositor |
---|---|---|
PubChem | 1,968,759 | 32,049 |
PubChem (ChemSpider) | 1,582,073 | 251,834 |
ChemSpider 2011.06.01 | 1,268,051 | 105,062 |
ChemSpider 2008.09.28 | 1,178,614 | 124,046 |
PubChem (DiscoveryGate) | 1,011,345 | 8,049 |
PubChem (NextBio) | 958,696 | 1,417 |
PubChem (Thomson Pharma) | 761,176 | 42,357 |
PubChem (MolPort) | 428,117 | 10,980 |
PubChem (ChemDB) | 421,222 | 198 |
PubChem (ZINC) | 362,513 | 61,484 |
PubChem (Ambinter) | 297,530 | 9,077 |
PubChem (ChEMBL) | 196,496 | 4,182 |
PubChem (ChemBank) | 195,104 | 16,284 |
PubChem (Vitas-M Laboratory) | 117,951 | 35 |
PubChem (ChemIDplus) | 117,865 | 2,357 |
PubChem (BindingDB) | 111,452 | 2,458 |
PubChem (DTP/NCI) | 96,996 | 5,118 |
PubChem (NIAID) | 87,586 | 1,331 |
PubChem (ChemBridge) | 81,525 | 0 |
PubChem (ASINEX) | 75,916 | 0 |
PubChem (MLSMR) | 75,731 | 659 |
PubChem (Specs) | 67,722 | 3 |
PubChem (LeadScope) | 65,416 | 257 |
PubChem (ICCB-Longwood/NSRB Screening Facility, Harvard Medical School) | 64,003 | 493 |
PubChem (ChemExper Chemical Directory) | 61,331 | 0 |
PubChem (NIST) | 53,532 | 6 |
PubChem (AAA Chemistry) | 50,033 | 47 |
PubChem (ChemBlock) | 48,910 | 0 |
PubChem (NovoSeek) | 43,807 | 47 |
PubChem (Emory University Molecular Libraries Screening Center) | 42,481 | 6 |
PubChem (Southern Research Institute) | 39,224 | 1 |
PubChem (MTDP) | 36,849 | 2 |
PubChem (NCGC) | 35,575 | 291 |
Metabolome.JP | 25,396 | 806 |
PubChem (Burnham Center for Chemical Genomics) | 23,964 | 19 |
PubChem (Abbott Labs) | 22,196 | 287 |
PubChem (Broad Institute) | 20,506 | 187 |
PubChem (Sigma-Aldrich) | 18,920 | 2 |
PubChem (NIST Chemistry WebBook) | 18,501 | 0 |
PubChem (NMRShiftDB) | 16,896 | 20 |
PubChem (UPCMLD) | 15,836 | 6 |
PubChem (IS Chemical Technology) | 15,583 | 267 |
PubChem (GLIDA, GPCR-Ligand Database) | 14,497 | 318 |
KNApSAcK 2011 | 13,869 | 222 |
PubChem (The Scripps Research Institute Molecular Screening Center) | 12,546 | 14 |
PubChem (MMDB) | 12,495 | 862 |
PubChem (Kingston Chemistry) | 12,301 | 0 |
PubChem (KEGG) | 11,732 | 79 |
PubChem (MP Biomedicals) | 10,719 | 263 |
PubChem (ChemSynthesis) | 10,570 | 3 |
PubChem (ChEBI) | 9,901 | 428 |
PubChem (GlaxoSmithKline (GSK)) | 9,728 | 96 |
PubChem (Aronis) | 9,699 | 1 |
PubChem (HDH Pharma) | 9,643 | 3 |
PubChem (TCI (Tokyo Chemical Industry)) | 9,469 | 110 |
PubChem (Hangzhou APIChem Technology) | 6,893 | 0 |
KNApSAcK v1.200.03 | 6,772 | 1 |
KNApSAcK v1.200.02 | 6,724 | 4 |
KNApSAcK | 6,037 | 0 |
PubChem (SMID) | 5,438 | 0 |
PubChem (EPA DSSTox) | 5,423 | 10 |
PubChem (DrugBank) | 5,398 | 15 |
PubChem (BioCyc) | 5,237 | 810 |
PubChem (Hangzhou Trylead Chemical Technology) | 4,998 | 3 |
PubChem (Prous Science Drugs of the Future) | 4,819 | 9 |
PubChem (LipidMAPS) | 4,780 | 76 |
PubChem (Tractus) | 4,515 | 33 |
PubChem (Alinda Chemical) | 4,421 | 0 |
PubChem (NMMLSC) | 4,357 | 2 |
PubChem (R&D Chemicals) | 4,202 | 0 |
PubChem (Nature Chemical Biology) | 3,918 | 125 |
PubChem (Jamson Pharmachem Technology) | 3,334 | 21 |
PubChem (Comparative Toxicogenomics Database) | 3,272 | 13 |
PubChem (KUMGM) | 3,215 | 6 |
PubChem (Shanghai Institute of Organic Chemistry) | 3,140 | 253 |
PubChem (Tyger Scientific) | 2,645 | 1 |
PubChem (PDSP) | 2,628 | 0 |
PubChem (xPharm) | 1,959 | 1 |
PubChem (Ennopharm) | 1,933 | 3 |
Human Metabolome Database | 1,815 | 6 |
PubChem (ORST SMALL MOLECULE SCREENING CENTER) | 1,653 | 0 |
Target Lipids | 1,574 | 615 |
PubChem (Nature Chemistry) | 1,514 | 142 |
PubChem (Calbiochem) | 1,456 | 13 |
PubChem (University of Pittsburgh Molecular Library Screening Center) | 1,400 | 0 |
PubChem (Vanderbilt Specialized Chemistry Center) | 1,384 | 71 |
PubChem (Biosynth) | 1,326 | 37 |
PubChem (BIDD) | 1,320 | 0 |
PubChem (Exchemistry) | 1,290 | 2 |
PubChem (CMLD-BU) | 1,269 | 0 |
PubChem (UCLA Molecular Screening Shared Resource) | 1,238 | 1 |
PubChem (MOLI) | 1,219 | 1 |
Yeast Metabolome Database (2011) | 1,183 | 35 |
PubChem (Circadian Research, Kay Laboratory, University of California at San Diego (UCSD)) | 1,179 | 0 |
Maximum Recommended Therapeutic Dose (MRTD) Database | 1,101 | 0 |
PubChem (BIND) | 982 | 0 |
PubChem (NINDS Approved Drug Screening Program) | 964 | 0 |
PubChem (UM-BBD) | 897 | 8 |
PubChem (Total TOSLab Building-Blocks) | 782 | 0 |
PubChem (InFarmatik) | 726 | 0 |
PubChem (Golm Metabolome Database (GMD), Max Planck Institute of Molecular Plant Physiology) | 723 | 2 |
PubChem (NIH Clinical Collection) | 705 | 2 |
YEASTNET Vers. 4, (2011) | 521 | 0 |
PubChem (Alsachim) | 517 | 8 |
PubChem (MIC Scientific) | 508 | 0 |
PubChem (Molecular Libraries Program, Specialized Chemistry Center, University of Kansas) | 431 | 5 |
PubChem (Selleck Chemicals) | 407 | 2 |
PubChem (Biological Magnetic Resonance Data Bank (BMRB)) | 394 | 0 |
PubChem (CC_PMLSC) | 358 | 0 |
PubChem (Shanghai Sinofluoro Scientific Company) | 349 | 0 |
PubChem (MICAD) | 339 | 36 |
PubChem (True PharmaChem) | 316 | 0 |
PubChem (Columbia University Molecular Screening Center) | 312 | 1 |
PubChem (SGCOxCompounds) | 277 | 0 |
PubChem (EMD Biosciences) | 267 | 7 |
PubChem (SRMLSC) | 249 | 0 |
PubChem (Avanti Polar Lipids) | 232 | 109 |
PubChem (Nantong Baihua Bio-Pharmaceutical Co., Ltd) | 174 | 0 |
PubChem (Creasyn Finechem) | 170 | 1 |
PubChem (Structural Genomics Consortium) | 89 | 0 |
PubChem (IUPHAR-DB) | 88 | 0 |
PubChem (Chemical Biology Department, Max Planck Institute of Molecular Physiology) | 82 | 0 |
PubChem (PCMD) | 79 | 1 |
PubChem (Amatye) | 68 | 0 |
PubChem (PANACHE) | 65 | 0 |
PubChem (iThemba Pharmaceuticals) | 65 | 0 |
PubChem (Vanderbilt University Medical Center) | 55 | 12 |
PubChem (Excenen Pharmatech) | 43 | 0 |
PubChem (PENN-ABS) | 39 | 0 |
PubChem (Ambit Biosciences) | 38 | 0 |
PubChem (Nature Communications) | 36 | 0 |
PubChem (Vanderbilt Screening Center for GPCRs, Ion Channels and Transporters) | 30 | 0 |
PubChem (Johns Hopkins Ion Channel Center) | 20 | 0 |
PubChem (SGCStoCompounds) | 17 | 0 |
PubChem (Web of Science) | 16 | 0 |
PubChem (Zancheng Functional Chemicals) | 14 | 1 |
PubChem (Southern Research Specialized Biocontainment Screening Center) | 14 | 0 |
PubChem (Laboratory of Environmental Genomics, Carolina Center for Computational Toxicology, University of North Carolina at Chapel Hill) | 14 | 0 |
PubChem (PennChem-GAM) | 12 | 1 |
PubChem (Annker Organics) | 12 | 0 |
PubChem (Nitric Oxide Research, National Cancer Institute (NCI)) | 8 | 0 |
PubChem (Paul Baures) | 6 | 0 |
PubChem (Isoprenoids) | 4 | 0 |
PubChem (Ganolix LifeScience) | 2 | 0 |
PubChem (CLRI (CSIR)) | 2 | 0 |
PubChem (Finley and King Labs, Harvard Medical School) | 1 | 0 |
PubChem (Bioprocess Technology Lab, Department of Microbiology, Bharathidasan University) | 1 | 0 |
PubChem (VIT University) | 1 | 0 |
PubChem was imported on 15 February 2011. "PubChem" refers to PubChem Compound, whereas "PubChem (...)" refers to PubChem Substance with the specific depositing source given in parentheses.
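Both count columns can be derived from a list of (depositor, formula) pairs. The following is a minimal sketch, not the code GoBioSpace actually uses. It assumes a hypothetical input of such pairs and treats two formulas as the same only if their strings are identical. Each listed source (for example each of the two ChemSpider snapshots) counts as a separate depositor, so a formula is counted as unique only if no other listed source has it.

```python
from collections import defaultdict


def depositor_statistics(records):
    """Per depositor, count distinct formulas and formulas no one else published.

    Assumption: `records` is a hypothetical iterable of (depositor, formula)
    pairs, and formulas are compared as exact strings (no normalization).
    Returns (depositor, total, unique) tuples sorted by total, descending.
    """
    # formula -> set of depositors that published it (duplicate pairs collapse)
    depositors_by_formula = defaultdict(set)
    for depositor, formula in records:
        depositors_by_formula[formula].add(depositor)

    total = defaultdict(int)
    unique = defaultdict(int)
    for depositors in depositors_by_formula.values():
        for depositor in depositors:
            total[depositor] += 1
        if len(depositors) == 1:
            # exactly one depositor published this formula
            (only,) = depositors
            unique[only] += 1

    rows = [(d, total[d], unique[d]) for d in total]
    # largest contributors first, ties broken by name
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Toy input with placeholder depositor names, not real data.
    toy = [("A", "C6H12O6"), ("B", "C6H12O6"), ("A", "H2O"), ("C", "CO2")]
    for row in depositor_statistics(toy):
        print(row)
```

For the toy input this prints ('A', 2, 1), ('B', 1, 0) and ('C', 1, 1): depositor B contributes a formula, but one that A also has, so B has no unique formulas.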
One striking observation is that the later ChemSpider import (2011) contains a large proportion of unique formulas that are not included in the 2008 import. In fact, many of those formulas are tagged as "This record is deprecated and may be removed soon." on the ChemSpider website. What does this mean for the chemical formula? Is the formula valid?
Another observation, from the import of the Yeast Metabolome Database, is that smaller databases also contribute formulas that are not yet included in the larger databases PubChem and ChemSpider.
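Both observations come down to a set difference: formulas in one import that are absent from another import, or absent from the union of several larger sources. Here is a minimal sketch in Python, again assuming formulas are compared as plain strings. The function name and the toy sets are hypothetical placeholders, not actual database contents.

```python
def formulas_missing_from(candidate, *references):
    """Return the formulas in `candidate` that appear in none of `references`.

    Hypothetical helper; formulas are compared as exact strings.
    """
    covered = set().union(*references)  # everything the reference sources have
    return set(candidate) - covered


if __name__ == "__main__":
    # Placeholder sets standing in for real imports (not actual data).
    older_import = {"C6H12O6", "H2O"}
    newer_import = {"C6H12O6", "H2O", "C9H8O4"}
    small_db = {"C6H12O6", "C5H9NO4"}

    # formulas new in the later snapshot of the same source
    print(sorted(formulas_missing_from(newer_import, older_import)))
    # formulas a small database adds on top of the larger sources
    print(sorted(formulas_missing_from(small_db, older_import, newer_import)))
```

This prints ['C9H8O4'] and then ['C5H9NO4']. Taking the union of the reference sets first means each candidate formula is checked against all larger sources at once.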
Any thoughts?