February 24, 2005
Self-organizing and self-correcting classifications of biological data
An algorithm for automated classification based on evolutionary distance data was written in S. It was tested on a dataset of 1,436 small subunit ribosomal RNA sequences and was able to classify the sequences according to an extant scheme, to detect sequences that were misclassified within that scheme using statistical measurements of group membership, and to produce a new classification. This study discusses the use of the algorithm to address problems in prokaryotic taxonomy. The algorithm provides an intuitive approach to making and viewing classifications; conceivably, persons with no training could generate classifications and, by looking at the heatmaps, see how a classification might be improved. Our algorithm formalizes and automates the means used to achieve such improvements. Errors in data curation, classification and identification (of both sequences and source organisms) can be spotted easily and their effects corrected. The classification itself can also be modified to enhance the information content of the taxonomy.
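The general idea of checking group membership against distance data can be illustrated with a small sketch. This is not the authors' S implementation, and the abstract does not say which statistic they use; the sketch assumes a much simpler stand-in, in which an item is flagged when its mean distance to another group is smaller than its mean distance to its assigned group. The function names (`mean_distance_to_groups`, `flag_misclassified`, `reclassify`) and the toy data are hypothetical.

```python
from collections import defaultdict


def mean_distance_to_groups(dist, labels, i):
    """Mean distance from item i to the members of each group, excluding i itself."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for j, label in enumerate(labels):
        if j == i:
            continue
        totals[label] += dist[i][j]
        counts[label] += 1
    return {group: totals[group] / counts[group] for group in totals}


def flag_misclassified(dist, labels):
    """Return (index, assigned, suggested) for items closer on average to another group.

    Assumption: 'closest mean distance' stands in for the statistical
    measurements of group membership mentioned in the abstract.
    """
    flagged = []
    for i, assigned in enumerate(labels):
        means = mean_distance_to_groups(dist, labels, i)
        if not means:  # a single item has nothing to compare against
            continue
        suggested = min(means, key=means.get)
        if suggested != assigned:
            flagged.append((i, assigned, suggested))
    return flagged


def reclassify(dist, labels, max_rounds=10):
    """Repeatedly move flagged items to their suggested group until nothing changes."""
    current = list(labels)
    for _ in range(max_rounds):
        flagged = flag_misclassified(dist, current)
        if not flagged:
            break
        for i, _assigned, suggested in flagged:
            current[i] = suggested
    return current


# Toy data (hypothetical): six items on a line, with the third one mislabeled.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = ["A", "A", "B", "B", "B", "B"]

print(flag_misclassified(dist, labels))
print(reclassify(dist, labels))
```

In this toy case the item at position 2 sits among the "A" items but carries a "B" label, so it is the one reported and moved. A real analysis would start from an evolutionary distance matrix computed from aligned sequences and would need a more careful membership statistic than a plain mean.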