I received these three questions and think I should answer them here, because the topic might be interesting to other people as well.
1.- Could I consider the following examples (VAR5-Alk-NA 170001 (Classified unknown); VAR5-Alk-NA and VAR5-Alk-unknown) as unknown, or do they mean something different?
[JH]
These terms all refer to an unknown compound. This just points to a lack of annotation. As we also ask other laboratories for their spectral libraries, it may happen that users use different terms. However, in your example with NA170001 (classified unknown), Joachim Kopka tried to highlight that there is a chance of identification as either "[C5H12O5 (5TMS)|C20H52O5Si5]" or "[Pentitol (5TMS)|C20H52O5Si5]".
2.- Sometimes the compounds are labelled as True-VAR5-Alk, False-VAR5-Alk and Pred-VAR5-Alk. Can I understand that the database was not curated, and that therefore there are some False? And for the ones that are PRED (=predicted?), can I trust this prediction? Or do I need to pay attention only to the compounds that say True?
[JH]
The labels "True", "False" and "pred" refer only to the retention index (RI). The retention index is specific to the chromatographic setup. As the GMD is a collection of reference spectra from different labs utilising different chromatographic setups, we differentiate the quality of the retention index values.
- "true" is the best and means that this RI was actually experimentally observed on this chromatographic setup.
- "pred" means predicted and is the next lower quality level. When importing a library from another laboratory, we can correlate the retention index values of all compounds which were measured in both labs. Next we use this correlation (a polynomial fit or similar) as a regression to transfer the RIs of new compounds in the other laboratory's library into our own chromatographic system.
The quality of the retention index values from such a regression depends on the similarity of the chromatographic variants in terms of column polarity, temperature programming and so on. This retention index prediction looks perfect; however, as can be seen from the plot, the estimated error in such a predicted retention index is already too large for automated spectral identification processing. Nevertheless, it is valuable information in the identification process of unknown spectra.
A retention index prediction with a quality close to that of experimentally observed retention indices is shown in the plot below.
If we don't have any experimentally measured retention index available, but can predict one from different chromatographic setups, we choose the one which makes the most sense from a chromatographic similarity point of view. As far as I understand this field, a final identification can only be proven by using authentic reference substances.
- "False" means that we don't have a retention index available for the selected chromatographic setup.
[Plot: retention index regression between very similar chromatographic variants but different retention index markers.]
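The regression described under "pred" can be sketched roughly as follows. This is only a minimal illustration in Python and not the actual GMD code: it assumes a straight-line fit (the simplest polynomial), and the RI values as well as the names fit_linear, predict_ri and residual_std_error are made up for this example.

```python
from math import sqrt


def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_ri(ri_other, slope, intercept):
    """Transfer an RI from the other lab's setup into our own setup."""
    return slope * ri_other + intercept


def residual_std_error(x, y, slope, intercept):
    """Rough error estimate of the fit (two fitted parameters)."""
    ss = sum((yi - predict_ri(xi, slope, intercept)) ** 2
             for xi, yi in zip(x, y))
    return sqrt(ss / (len(x) - 2))


# Hypothetical RIs of compounds measured in BOTH labs (invented numbers).
ri_other_lab = [1000.0, 1200.0, 1500.0, 1800.0, 2200.0]
ri_our_lab = [1012.0, 1210.0, 1517.0, 1816.0, 2223.0]

slope, intercept = fit_linear(ri_other_lab, ri_our_lab)
error = residual_std_error(ri_other_lab, ri_our_lab, slope, intercept)

# A compound that exists only in the other lab's library gets a "pred" RI.
predicted = predict_ri(1650.0, slope, intercept)
print(f"predicted RI: {predicted:.1f} (fit error about {error:.1f} RI units)")
```

The error estimate is the interesting part: whether a predicted RI is usable for automated identification depends on how large this error is compared to the RI window used for matching.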
3.- PRED-Var5-Alk-Similar: does this mean there is a high probability that this compound will be true?
[JH]
Again, predicted refers only to the retention index, and "similar" comes from a user input. In this case it is an unknown compound measured in another lab with a different chromatographic setup. That's why we just have a predicted RI available.
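To sum up how to read such labels, here is a small Python sketch. It assumes that a label always starts with the RI quality followed by a hyphen, as in the examples from the questions above; the function name split_label and the description texts are only made up for illustration and are not part of the GMD.

```python
# Meaning of the RI quality prefix, as explained in the answers above.
RI_QUALITY = {
    "true": "RI experimentally observed on this chromatographic setup",
    "pred": "RI predicted by regression from another setup",
    "false": "no RI available for this chromatographic setup",
}


def split_label(label):
    """Split a label into (RI quality, remaining name).

    Assumes the pattern '<quality>-<rest>', e.g. 'Pred-VAR5-Alk'.
    """
    prefix, _, rest = label.partition("-")
    quality = prefix.lower()
    if quality not in RI_QUALITY:
        raise ValueError(f"unrecognised RI quality prefix: {prefix!r}")
    return quality, rest


for label in ["True-VAR5-Alk", "False-VAR5-Alk", "PRED-Var5-Alk-Similar"]:
    quality, rest = split_label(label)
    print(f"{label}: {RI_QUALITY[quality]} ({rest})")
```

Note that the prefix says nothing about whether the compound identification itself is correct; it only describes the retention index.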
cheers,
Jan