Friday, 11 May 2012

How to import the GMD reference library into NIST MS Search

I was asked this question today and thought it was worth documenting. I only have version 2.0 of the NIST MS Search software at hand, but I assume that importing the reference library works much the same way in later versions.
  1. Start the program.
  2. Click "Librarian" on the tab control at the bottom.
  3. Create a library by clicking the rightmost button in the toolbar.
  4. Type an appropriate library name, such as "GMD".
  5. Close the dialog by clicking OK. Then click the Import button on the left of the toolbar.
  6. Select the MSP file downloaded from the GMD. To see it, first select "All files" in the file type menu.
  7. In the options, check "Include Synonyms" and click "Import All".
  8. The import starts.
  9. You can cancel the library matching process.
  10. The GMD reference library is now imported and ready for use.
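For readers who want to look inside the downloaded file, here is a minimal sketch (JavaScript, e.g. run with Node.js) that lists the name and synonyms of each record in MSP-formatted text, i.e. the information the "Include Synonyms" option brings along. It assumes the usual NIST MSP layout, in which a record begins with a "Name:" line and synonyms sit on "Synon:" lines; the function parseMspHeaders and the sample record are hypothetical, and peak lines are simply skipped.

```javascript
// Minimal sketch: list names and synonyms of MSP-formatted records.
// Assumption: each record starts with a "Name:" line and synonyms are on "Synon:" lines.
function parseMspHeaders(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    // "Key: value" lines; peak lines such as "73 999;" and blank lines do not match
    const match = /^([A-Za-z ]+):\s*(.*)$/.exec(rawLine.trim());
    if (!match) continue;
    const key = match[1].trim().toLowerCase();
    if (key === "name") {
      current = { name: match[2], synonyms: [] };
      records.push(current);
    } else if (key === "synon" && current !== null) {
      current.synonyms.push(match[2]);
    }
  }
  return records;
}

// Hypothetical record, not taken from the GMD file.
const sample = [
  "Name: Pentitol (5TMS)",
  "Synon: example synonym",
  "Num Peaks: 2",
  "73 999; 147 500;",
].join("\n");
console.log(parseMspHeaders(sample));
```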

Thursday, 22 March 2012

3 questions about the GMD mass spectral reference library

I received these three questions and think I should answer them here, because the topic might be interesting to other people as well.
1.- Could I consider the following examples (VAR5-Alk-NA 170001 (Classified unknown); VAR5-Alk-NA and VAR5-Alk-unknown) as unknown, or do they mean something different?
All of these terms refer to an unknown compound; they simply point to missing annotation. Because we also ask other laboratories for their spectral libraries, different users may use different terms. However, in your example NA170001 (classified unknown), Joachim Kopka wanted to highlight that the compound might be identified either as "[C5H12O5 (5TMS)|C20H52O5Si5]" or as "[Pentitol (5TMS)|C20H52O5Si5]".
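Both candidate annotations end with the same formula after the bar, which appears to be that of the derivatized compound. Each trimethylsilyl (TMS) group replaces an active hydrogen with Si(CH3)3, i.e. it adds C3H8Si, so a pentitol (C5H12O5) carrying five TMS groups becomes C20H52O5Si5. The small JavaScript sketch below checks this arithmetic; the helper names are hypothetical, and the parser only understands simple formulas without brackets.

```javascript
// Parse a simple formula such as "C5H12O5" into element counts (no brackets supported).
function parseFormula(formula) {
  const counts = {};
  for (const [, element, n] of formula.matchAll(/([A-Z][a-z]?)(\d*)/g)) {
    counts[element] = (counts[element] || 0) + (n === "" ? 1 : Number(n));
  }
  return counts;
}

// Each TMS group replaces one H with Si(CH3)3: net change +3 C, +8 H, +1 Si.
function addTms(formula, nTms) {
  const counts = parseFormula(formula);
  const delta = { C: 3, H: 8, Si: 1 };
  for (const [element, d] of Object.entries(delta)) {
    counts[element] = (counts[element] || 0) + d * nTms;
  }
  return counts;
}

// Write counts in Hill order (C, H, then alphabetical), e.g. "C20H52O5Si5".
function formatFormula(counts) {
  const rest = Object.keys(counts).filter((e) => e !== "C" && e !== "H").sort();
  return ["C", "H", ...rest]
    .filter((e) => counts[e])
    .map((e) => e + (counts[e] === 1 ? "" : counts[e]))
    .join("");
}

console.log(formatFormula(addTms("C5H12O5", 5))); // C20H52O5Si5
```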
2.- Sometimes the compounds are labelled as True-VAR5-Alk, False-VAR5-Alk and Pred-VAR5-Alk. Should I understand that the database was not curated, and that this is why some entries are False? And for the ones labelled PRED (= predicted?), can I trust this prediction, or should I only pay attention to the compounds labelled True?
The labels "True", "False" and "Pred" refer only to the retention index (RI). The retention index is specific to the chromatographic setup. As the GMD is a collection of reference spectra from different labs using different chromatographic setups, we distinguish the quality of the retention index values:
  • "True" is the highest quality level: this RI was actually observed experimentally on this chromatographic setup.
  • "Pred" means predicted and is the next lower quality level. When we import a library from another laboratory, we can correlate the retention index values of all compounds measured in both labs. We then use this correlation (a polynomial fit or a similar model) to map the RIs of new compounds in the other laboratory's library onto our own chromatographic system.
    The quality of the retention index values from such a regression depends on how similar the chromatographic variants are in terms of column polarity, temperature programming and so on. The retention index prediction in the plot looks excellent; however, as the plot also shows, the estimated error of such a predicted retention index is still too large for automated spectral identification. Nevertheless, it is valuable information when identifying unknown spectra.
    The plot below shows a retention index prediction whose quality is close to that of experimentally observed retention indices.
    If no experimentally measured retention index is available, but we can predict one from several different chromatographic setups, we choose the prediction that makes the most sense in terms of chromatographic similarity.
    As far as I understand this field, a final identification can only be proven with authentic reference substances.
  • "False" means that no retention index is available for the selected chromatographic setup.
[Plot: retention index regression between very similar chromatographic variants but different retention index markers.]
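The correlation step described in the "Pred" bullet can be sketched as follows. The post leaves the model open (a polynomial fit or similar); this JavaScript sketch assumes the simplest case, a straight line fitted by ordinary least squares. The function names and example values are hypothetical and not GMD data; a real transfer between chromatographic setups may well need a higher-degree polynomial.

```javascript
// Ordinary least-squares fit of y = a + b * x (a first-degree polynomial).
// x: RIs of shared compounds in the other lab; y: RIs of the same compounds in our setup.
function fitLinear(x, y) {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("need at least two paired retention indices");
  }
  const n = x.length;
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  if (sxx === 0) {
    throw new Error("all x values are identical; slope is undefined");
  }
  const b = sxy / sxx;
  return { a: meanY - b * meanX, b };
}

// Map an RI measured only in the other lab onto our system (a "pred" value).
function predictRI(model, otherLabRI) {
  return model.a + model.b * otherLabRI;
}

// Hypothetical RIs of four compounds measured in both labs (not GMD data).
const otherLab = [1100, 1300, 1500, 1700];
const ourLab = [1120, 1318, 1522, 1720];
const model = fitLinear(otherLab, ourLab);
console.log(model.b.toFixed(3), predictRI(model, 1400).toFixed(1)); // 1.002 1420.0
```

With more shared compounds, the residuals of such a fit give a rough idea of the prediction error discussed above.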
3.- PRED-Var5-Alk-Similar: does this mean there is a high probability that this compound will be true?
Again, "predicted" refers only to the retention index, and "similar" comes from a user's input. In this case it is an unknown compound measured in another lab with a different chromatographic setup. That is why only a predicted RI is available.


How to integrate mass spectral reference libraries into AMDIS

Recently, I was asked how to import GMD mass spectral reference libraries into the Automated Mass Spectral Deconvolution and Identification System (AMDIS). As this might be interesting for other people as well, I copy the question and my answer below:
I am XXX in YYY lab at ZZZ university. I have just downloaded the library GMD_20111121_VAR5_ALK_MSL.txt from the Golm Metabolome DB, because I would like to use it in the AMDIS software.
However, the library file has the extension TXT and I need to convert it to MSL. I would greatly appreciate it if you could tell me how to do it.

PS. I tried to rename the extension, but it did not work in AMDIS.
Dear XXX,
Thank you very much for using the Golm Metabolome Database (GMD).

Please use the AMDIS software to convert the downloaded file into a library. I have listed all the necessary steps below:
  • Open AMDIS :)
  • Click Library ==> Build One Library (this option is only available if a data file is open)
  • Click Files
  • Click Load Library and select the file downloaded from the GMD; you might need to change the file type to "all files *.*" to see your file with the extension .txt
  • The import now starts, and you should see a list of 2,594 imported spectra
  • Click Files
  • Click "Save Library As"
  • Choose an appropriate file location and name, and use the file extension .msl. The file is now exported and a ".cid" file (compound identification library) is generated; this is a crucial step
  • Click Exit to close the Library window
  • Click Analyse ==> Analyse GC/MS Data...
  • Click Target Library
  • Select the page "Libr."
  • Select "Target Compounds Library"
  • Click "Select New"
  • Select the newly generated file, not the file downloaded from the GMD
  • Click Save
  • Click Run
If you have any problems with the library, please don't hesitate to drop me a line.
Your feedback is highly appreciated.

Best regards
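To sanity-check a download, one can count its records and compare the number with what AMDIS lists (2,594 spectra for the file in the question above). The JavaScript sketch below assumes the usual AMDIS MSL text layout, in which each spectrum starts with a "NAME:" line; the helper countMslRecords and the sample records are hypothetical.

```javascript
// Count spectra in MSL-style text by counting lines that start with "NAME:".
// Assumption: every record begins with exactly one such line.
function countMslRecords(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => /^\s*name:/i.test(line))
    .length;
}

// Two hypothetical records, not taken from the GMD file.
const sample = [
  "NAME: Example compound 1",
  "RI: 1234.5",
  "NUM PEAKS: 2",
  "( 73 999) (147 500)",
  "",
  "NAME: Example compound 2",
  "NUM PEAKS: 1",
  "( 73 999)",
].join("\n");
console.log(countMslRecords(sample)); // 2
```

With Node.js, the contents of the downloaded file can be passed in via require("fs").readFileSync(path, "utf8").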


Friday, 3 February 2012

Update spectral library for TargetSearch

Ricardo Silva pointed me to a problem in the GMD spectrum export for the TargetSearch software.
He wrote:
I've started to work on GC-MS analysis in R, and TargetSearch recommends the Golm database, but the libraries don't have retention indices. Is this correct? How do I get a library with retention indices?
Indeed, I found a format error caused by globalisation (locale) settings, which made TargetSearch fail to load the text file.

Thanks Ricardo!

PS: If you find any problems, please drop me a line.
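The post does not say what exactly the format error was. One common way a locale ("globalisation") setting breaks a text export is the decimal separator: with, for example, German regional settings a value is written as 1234,5 instead of 1234.5, and a reader expecting a point then rejects or misreads it. The JavaScript sketch below only illustrates that pitfall and a locale-independent way to write numbers; it is an assumption, not the actual GMD fix, and formatRI is a hypothetical name.

```javascript
// Illustration of a locale pitfall (assumed, not confirmed as the GMD bug):
// a retention index written with a decimal comma, as some regional settings do.
const written = "1234,5";

// Parsers that expect a decimal point either reject or truncate such a value.
console.log(Number(written));     // NaN
console.log(parseFloat(written)); // 1234 (the fraction is silently dropped)

// Number.prototype.toFixed always writes a decimal point, independent of locale,
// so a hypothetical exporter could use it for numeric columns.
function formatRI(ri) {
  return ri.toFixed(1);
}
console.log(formatRI(1234.5)); // 1234.5
```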

Thursday, 2 February 2012

Tweaking ChemDoodle

Patrik Rydberg posted some code to automatically scale a molecule in the ChemDoodle canvas. I had been looking for something like this for quite some time. I could now improve it for my setting, where the molFiles come from many different sources, by first scaling the molecule with the scaleToAverageBondLength(Number length) function.

My code (based on Patrik's) is below:

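var structure = ChemDoodle.readMOL(molFile);
structure.scaleToAverageBondLength(20); // normalise molFiles from different sources (20 is an assumed target length)
var size = structure.getDimension();
// fit the molecule into the canvas, leaving a 10% margin
var scale = Math.min(canvas.width / size.x, canvas.height / size.y);
canvas.loadMolecule(structure);
canvas.specs.scale = scale * 0.9;
canvas.repaint();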
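The fitting step itself is plain arithmetic: the smaller of the two ratios belongs to the dimension that reaches the canvas edge first, and the factor 0.9 leaves a 10% margin. The JavaScript helper below pulls that calculation out of the ChemDoodle code so it can be checked without a canvas; fitScale, its fallback for a zero-size molecule and the example numbers are hypothetical.

```javascript
// Scale at which a molecule of size molWidth x molHeight fills the canvas,
// multiplied by a margin factor (0.9 leaves a 10% border).
function fitScale(canvasWidth, canvasHeight, molWidth, molHeight, margin = 0.9) {
  // The smaller ratio belongs to the dimension that reaches the canvas edge first.
  const scale = Math.min(canvasWidth / molWidth, canvasHeight / molHeight);
  // A single atom has zero width and height, which gives Infinity; fall back to 1 (assumed default).
  return (Number.isFinite(scale) ? scale : 1) * margin;
}

// 200 x 100 canvas, 100 x 100 molecule: the height limits the scale to 1, times 0.9.
console.log(fitScale(200, 100, 100, 100)); // 0.9
```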


Tuesday, 17 January 2012

PubChem update

We regret that the GoBioSpace service is likely to be unavailable today, 17 Jan. 2012, due to maintenance work and the import of the current PubChem Compound and Substance databases. More than 2.5 million structures from the IBM BAO (Business Analytics and Optimization) strategic IP insight platform (SIIP) are now available in PubChem, and we think they are very valuable for matching potentially unknown mass peaks.

Your GoBioSpace-Team

[update 2012/01/18]
We released a new data version of GoBioSpace, which now includes the latest version (yesterday, 2012/01/17) of the PubChem Compound and Substance databases and adds 119,958 new unique formulas to the GoBioSpace repository. However, approx. 190,000 formulas are no longer referenced and have therefore been purged from GoBioSpace.