1.- Could I consider the following examples (VAR5-Alk-NA 170001 (Classified unknown); VAR5-Alk-NA and VAR5-Alk-unknown) as unknowns, or do they mean something different?
All of these terms refer to an unknown compound. They simply reflect missing annotation. Because we also ask other laboratories for their spectral libraries, different users may end up using different terms. However, in your example NA170001 (classified unknown), Joachim Kopka wanted to point out that there is a chance of identifying it either as "[C5H12O5 (5TMS)|C20H52O5Si5]" or as "[Pentitol (5TMS)|C20H52O5Si5]".
2.- Sometimes the compounds are labeled as True-VAR5-Alk, False-VAR5-Alk and Pred-VAR5-Alk. Should I understand this to mean that the database was not curated, and that this is why some entries are False? And the ones labeled Pred (= predicted?): can I trust this prediction, or should I only pay attention to the compounds labeled True?
The labels "True", "False" and "Pred" refer only to the retention index (RI). The retention index is specific to the chromatographic setup. Because the GMD is a collection of reference spectra from different laboratories using different chromatographic setups, we distinguish between quality levels of the retention index values.
- "True" is the highest quality level: this RI was actually observed experimentally on this chromatographic setup.
- "Pred" means predicted and is the next lower quality level. When importing a library from another laboratory, we can correlate the retention index values of all compounds that were measured in both labs. We then use this correlation (a polynomial fit or similar) to regress the RIs of new compounds in the other laboratory's library onto our own chromatographic system.
The quality of the retention index values obtained from such a regression depends on how similar the chromatographic variants are in terms of column polarity, temperature program and so on. This retention index prediction looks perfect; however, as can be seen from the plot, the estimated error of such a predicted retention index is already too large for automated spectral identification. Nevertheless, it is valuable information when identifying unknown spectra.
The plot below shows a retention index prediction whose quality comes close to that of experimentally observed retention indices.
If no experimentally measured retention index is available, but one can be predicted from several different chromatographic setups, we choose the prediction that makes the most sense from a chromatographic similarity point of view. As far as I understand this field, a final identification can only be proven by using authentic reference substances.
- "False" means that no retention index is available for the selected chromatographic setup. It does not mean that the entry itself is wrong or uncurated.
[Plot: Retention index regression between very similar chromatographic variants but different retention index markers.]
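The regression step described under "Pred" can be sketched in Python. This is a minimal illustration under our own assumptions, not the GMD's actual implementation: it fits a least-squares polynomial (degree 2 by default, which is our choice) to the RIs of compounds measured in both laboratories and then maps RIs from the other laboratory's library onto our own scale. The class name `RiTransfer`, the helper `_solve` and all numbers are hypothetical. The root-mean-square residual over the shared compounds serves as a rough error estimate, in the spirit of the remark that a predicted RI can look good and still be too uncertain for automated identification.

```python
from math import sqrt
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class RiTransfer:
    """Hypothetical sketch: map RIs from another lab's setup onto our own RI scale."""

    def __init__(self, ri_other: Sequence[float], ri_own: Sequence[float], degree: int = 2):
        if len(ri_other) != len(ri_own) or len(ri_other) <= degree:
            raise ValueError("need paired RIs for more shared compounds than the degree")
        # Center and scale the other lab's RIs so the normal equations stay well conditioned.
        self.center = sum(ri_other) / len(ri_other)
        self.scale = (max(ri_other) - min(ri_other)) or 1.0
        xs = [self._norm(x) for x in ri_other]
        k = degree + 1
        # Least squares via the normal equations (X^T X) c = X^T y with X[i][j] = xs[i] ** j.
        xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
        xty = [sum(y * x ** i for x, y in zip(xs, ri_own)) for i in range(k)]
        self.coeffs = _solve(xtx, xty)
        # RMS residual over the shared compounds: a rough error estimate for predicted RIs,
        # only meaningful inside the RI range covered by the shared compounds.
        residuals = [self.predict(x) - y for x, y in zip(ri_other, ri_own)]
        self.rmse = sqrt(sum(r * r for r in residuals) / len(residuals))

    def _norm(self, ri: float) -> float:
        return (ri - self.center) / self.scale

    def predict(self, ri_other: float) -> float:
        """Predicted RI on our own system for an RI measured in the other lab."""
        x = self._norm(ri_other)
        return sum(c * x ** i for i, c in enumerate(self.coeffs))


# Usage with made-up RIs of compounds measured in both laboratories:
shared_other = [1100.0, 1350.0, 1600.0, 1850.0, 2100.0, 2400.0]
shared_own = [1105.2, 1352.1, 1606.9, 1853.0, 2111.4, 2409.8]
transfer = RiTransfer(shared_other, shared_own)
# A compound found only in the other lab's library (hypothetical RI of 1725):
print(round(transfer.predict(1725.0), 1), "+/-", round(transfer.rmse, 1))
```

Centering and scaling the input RIs is only there to keep the small linear system numerically stable; with real data a library routine such as NumPy's polynomial fitting would be the usual choice, and the residual error would be checked against a plot like the one referenced above.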
3.- Does PRED-Var5-Alk-Similar mean that there is a high probability that this compound is true?
Again, "Pred" refers only to the retention index, and "Similar" comes from user input. In this case it is an unknown compound measured in another laboratory with a different chromatographic setup. That is why only a predicted RI is available.
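To keep RI quality and identification apart, here is a hedged Python sketch that splits labels such as `Pred-VAR5-Alk-Similar` into their parts and applies the selection rule from question 2: an experimentally observed RI is preferred over a predicted one, and among predictions the one from the most similar chromatographic setup is chosen. The label layout `<quality>-<variant>-<marker>[-<annotation>]`, the reading of VAR5 as the chromatographic variant and Alk as the RI marker series, the numeric `similarity` score, and all names and values (`parse_label`, `best_ri_entry`, `RiEntry`, ...) are assumptions made for illustration only. The free-text annotation such as "Similar" plays no role in the ranking, since it is user input and says nothing about the RI.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical ranking of the RI quality levels described in question 2:
# an observed RI ("true") beats a regression-predicted one ("pred");
# "false" means no RI is available for the selected setup.
QUALITY_RANK = {"true": 2, "pred": 1, "false": 0}


@dataclass
class RiLabel:
    quality: str               # "true", "pred" or "false": describes the RI only
    variant: str               # assumed: chromatographic variant, e.g. "VAR5"
    marker: str                # assumed: RI marker series, e.g. "Alk"
    annotation: Optional[str]  # free text from user input, e.g. "Similar"


@dataclass
class RiEntry:
    label: RiLabel
    ri: Optional[float]        # None when no RI is available
    similarity: float = 0.0    # hypothetical similarity of the source setup to ours


def parse_label(label: str) -> RiLabel:
    """Split e.g. 'Pred-VAR5-Alk-Similar' into its parts (assumed label layout)."""
    parts = label.split("-", 3)
    if len(parts) < 3 or parts[0].lower() not in QUALITY_RANK:
        raise ValueError(f"unexpected label: {label!r}")
    annotation = parts[3] if len(parts) == 4 else None
    return RiLabel(parts[0].lower(), parts[1], parts[2], annotation)


def best_ri_entry(entries: Sequence[RiEntry]) -> Optional[RiEntry]:
    """Prefer the best RI quality, then the most similar chromatographic setup.

    The annotation (e.g. "Similar") is deliberately ignored: it is user input
    and says nothing about the retention index.
    """
    usable = [e for e in entries if e.label.quality != "false" and e.ri is not None]
    if not usable:
        return None
    return max(usable, key=lambda e: (QUALITY_RANK[e.label.quality], e.similarity))


# Usage with made-up values:
entries = [
    RiEntry(parse_label("Pred-VAR5-Alk"), 1523.4, similarity=0.6),
    RiEntry(parse_label("Pred-VAR5-Alk-Similar"), 1519.8, similarity=0.9),
    RiEntry(parse_label("False-VAR5-Alk"), None),
]
best = best_ri_entry(entries)
print(best.label.quality, best.ri)
```

Even the best-ranked entry is only a candidate RI; as noted above, a final identification still requires an authentic reference substance.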