Publications

June 6, 2017

Systems and Methods for Automatically Identifying and Linking Names in Digital Resources

The present invention provides systems and methods for automatically identifying name-like-strings in digital resources, matching these name-like-strings against a set of names held in an expertly curated database, and for those name-like-strings found in said database, enhancing the content by associating additional matter with the name, wherein said matter includes information about the names that is held within said database and pointers to other digital resources which include the same name and its synonyms.

Parker, C.T., Lyons, C.M., Roston, G.R. and Garrity, G.M. Systems and Methods for Automatically Identifying and Linking Names in Digital Resources; 2017. United States Patent and Trademark Office.
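The identify-then-match flow described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the regular expression, the `CURATED` dictionary, its record fields, and the identifiers are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical stand-in for the expertly curated database: name -> record.
# The identifiers and synonyms below are invented for this example.
CURATED = {
    "Escherichia coli": {"id": "example-id-1", "synonyms": ["Bacterium coli"]},
    "Bacillus subtilis": {"id": "example-id-2", "synonyms": []},
}

# Assumption: a "name-like string" is a capitalized word followed by a
# lowercase word, a rough approximation of a binomial name.
NAME_LIKE = re.compile(r"\b[A-Z][a-z]+ [a-z]+\b")


def annotate(text):
    """Return (candidates, matches).

    candidates: every name-like string found in the text.
    matches: the subset present in CURATED, mapped to its record.
    """
    candidates = NAME_LIKE.findall(text)
    matches = {c: CURATED[c] for c in candidates if c in CURATED}
    return candidates, matches


text = ("Cultures of Escherichia coli were grown. "
        "Results were compared with Bacillus subtilis.")
candidates, matches = annotate(text)
```

The pattern deliberately over-generates (it also picks up phrases such as "Cultures of"), so the lookup against the curated set is what separates real names from look-alikes; only matched names would then be enhanced with the associated record.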

May 23, 2017

Classification of Nucleotide Sequences by Latent Semantic Analysis

DNA sequences are analyzed using latent semantic analysis. A set of nucleotide sequences is received in which the set has a first number of sequences. A set of basis vectors is determined, in which the set has a second number of basis vectors, the second number being smaller than the first number. Each basis vector represents a specific combination of predetermined nucleotide segments. For each of the nucleotide sequences, an approximate representation of the nucleotide sequence is determined based on a combination of the basis vectors. For each pair of nucleotide sequences, a distance between the pair of nucleotide sequences is determined according to the distance between the approximate representations of the pair of nucleotide sequences. The set of nucleotide sequences is classified based on the distances between the pairs of nucleotide sequences.

Sayood, K., Way, S., Ozkan, U.N. and Garrity, G.M. Classification of Nucleotide Sequences by Latent Semantic Analysis; 2017. United States Patent and Trademark Office.
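As a rough illustration of the steps in this abstract, the sketch below counts fixed-length segments (k-mers), derives a small set of basis vectors with a truncated singular value decomposition, and measures Euclidean distances between the reduced representations. The segment length, the use of NumPy's SVD, the Euclidean metric, and the toy sequences are assumptions made for this example, not details taken from the patent.

```python
from itertools import product

import numpy as np

K = 3  # assumed segment (k-mer) length
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(seq):
    """Count occurrences of every possible k-mer in one sequence."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])
        if j is not None:  # skip segments with non-ACGT characters
            counts[j] += 1
    return counts


def lsa_distances(seqs, rank):
    """Pairwise distances between rank-reduced sequence representations."""
    X = np.array([kmer_counts(s) for s in seqs])  # sequences x k-mers
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:rank]      # each basis vector is a combination of k-mers
    coords = X @ basis.T   # approximate representation of each sequence
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Toy sequences: two AT-rich-pattern relatives and two GC-rich relatives.
seqs = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGAACGT",
    "GGGCCCGGGCCCGGGC",
    "GGGCCCGGGCCTGGGC",
]
D = lsa_distances(seqs, rank=2)  # 2 basis vectors for 4 sequences
```

With `rank=2` the number of basis vectors is smaller than the number of sequences, as the abstract requires. Any distance-based classifier or clustering method could then be applied to `D`; which one the patent uses is not stated here.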

December 2, 2014

Semiotic Indexing of Digital Resources

A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.

Parker, C.T. and Garrity, G.M. Semiotic Indexing of Digital Resources; 2014. United States Patent and Trademark Office.
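The pipeline in this abstract (two term sets, two frequency arrays, two similarity matrices, an entrywise combination, then clustering) can be sketched as follows. Cosine similarity, the entrywise product as the combination, the threshold-based grouping, and the toy documents and term lists are all assumptions made for illustration; the abstract does not fix any of them.

```python
import math


def frequency_array(docs, terms):
    """Rows are documents; columns are occurrence counts of each term."""
    return [[doc.lower().split().count(t) for t in terms] for doc in docs]


def cosine_similarity_matrix(freq):
    """Document-by-document cosine similarity of frequency rows."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return [[cos(a, b) for b in freq] for a in freq]


def entrywise_product(A, B):
    """One possible entrywise combination: multiply matching entries."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def threshold_clusters(S, threshold):
    """Assumed clustering step: merge documents whose combined
    similarity is at least `threshold` (single-linkage grouping)."""
    labels = list(range(len(S)))
    for i in range(len(S)):
        for j in range(i + 1, len(S)):
            if S[i][j] >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


docs = [
    "bacillus subtilis spore formation in soil",
    "bacillus cereus spore germination in soil",
    "escherichia coli plasmid transfer in gut",
    "escherichia coli plasmid replication in gut",
]
name_terms = ["bacillus", "escherichia"]           # first term set
topic_terms = ["spore", "plasmid", "soil", "gut"]  # second term set

S1 = cosine_similarity_matrix(frequency_array(docs, name_terms))
S2 = cosine_similarity_matrix(frequency_array(docs, topic_terms))
combined = entrywise_product(S1, S2)
labels = threshold_clusters(combined, threshold=0.5)
```

Under the product, two documents score highly only when they are similar with respect to both term sets, which is one reason to combine the matrices entrywise rather than concatenating the term lists.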

October 11, 2011

Methods for data classification

The present invention provides methods for classifying data and uncovering and correcting annotation errors. In particular, the present invention provides a self-organizing, self-correcting algorithm for use in classifying data. Additionally, the present invention provides a method for classifying biological taxa.

Garrity, G.M. and Lilburn, T.G. Methods for data classification; 2011. United States Patent and Trademark Office.

June 22, 2011

Intellogist article on NamesforLife

Kristin Whitman from Landon IP has published an article about how NamesforLife adds value to your searches, from the perspective of the patent community.

..there are a number of patents in the green technology collection that include long lists of named species (in some cases redundantly), but fail to specify a given strain that actually performs the claimed invention…Patents that include “laundry lists” of organisms that may or may not perform according to claims (and in fact, may not even exist) open the door to what could be some interesting challenges and counter-claims in the courts dealing with both non-enablement and prior art.

George Garrity, NamesforLife, LLC

Based on this initial analysis from the NamesforLife team, the challenges faced by biological taxonomists directly affect the work of inventors and patent searchers. I think it’s likely that their data may become integrated into more patent and non-patent databases as the value of their work becomes more obvious.

Kristin Whitman, Landon IP

Whitman, K. Biotech patents and their pitfalls: NamesforLife adds value to your biology searches; 2011. Intellogist.

April 12, 2011

Systems and methods for resolving ambiguity between names and entities

The present invention provides systems and methods that utilize an information architecture for disambiguating scientific names and other classification labels and the entities to which those names are applied, as well as a means of accessing data on those entities in a networked environment using persistent, unique identifiers.

Garrity, G.M. and Lyons, C.M. Systems and methods for resolving ambiguity between names and entities; 2011. United States Patent and Trademark Office.

May 6, 2010

DOI News

NamesforLife has a mention in the DOI News. See ‘DOI-based Tool for Taxonomy’.

IDF member NamesforLife, in partnership with the Society for General Microbiology and the International Committee on the Systematics of Prokaryotes, has announced the launch of a specialist browser tool which provides current information on taxonomic nomenclature of Bacteria and Archaea, through DOI name links providing authoritative and persistent online annotation. This allows authors to obtain current data from the rapidly changing taxonomic literature easily, and allows third party re-use of the information as persistent and reliable current data. Expert annotation is presented via a menu that collocates with the occurrence of a name on a web page and links to other resources.

February 13, 2010

Microbiology Today

NamesforLife has a full page write-up in the February 2010 issue of Microbiology Today.

George Garrity explains the philosophy behind the new NamesforLife BrowserTool, developed in partnership with the SGM and ICSP to help the wider microbiological community keep in touch with and understand the changes in bacterial and archaeal systematics. Never again need a reader be ill-informed about the status or meaning of a name.

Garrity, G.M. NamesforLife: BrowserTool takes expertise out of the database and puts it right in the browser; 2010. Microbiology Today 2(2):9.

March 6, 2007

Taxonomic Outline of Bacteria and Archaea 7.7

The Taxonomic Outline of Bacteria and Archaea (TOBA) 7.7 has been published.

TOBA 7.7 provides coverage of the validly published named species and higher taxa of Bacteria and Archaea through October 1, 2006, including all those names included on Validation Lists through No. 111. In addition, TOBA 7.7 contains a limited number of well known taxa of Cyanobacteria that were included in earlier releases, the myxobacterial taxa described by Reichenbach (for which duplicate deposits had not been confirmed at the time of publication), and a number of provisional names of higher taxa that were used as placeholders in previous releases.

We also include NamesforLife name-ids (N4Lids) to provide direct, persistent links to content provided by that project. N4Lids are suffixes of Digital Object Identifiers (DOIs) that resolve to individual NamesforLife Information Objects that contain more detailed information about the nomenclature, taxonomy, and members of higher taxa and additional strain identifiers, sequences, and other information about the type strains and higher taxa. N4Lids preceded by the “DOI:” prefix will resolve to web pages that are part of Release 6.0 of the Taxonomic Outline.

Garrity, G.M., Lilburn, T.G., Cole, J.R., Harrison, S.H., Euzeby, J. and Tindall, B.J. Taxonomic Outline of Bacteria and Archaea; 2007. Michigan State University and NamesforLife, LLC.

April 24, 2006

Computational aspects of systematic biology

Lilburn, Harrison, Cole and Garrity survey the resources currently available to systematic biologists, and outline some steps forward to data integration and interoperability.

The barriers between databases, and between databases and applications need to be reduced. One giant step towards such interoperability will be the institution of methods to tame the nomenclature issues so that biologists can ensure that the names they use are correct or, if not, that they can find the correct name along with the history of labels associated with the organism they are interested in. The automation of identification will also free researchers to apply their intellectual energy to the exploration of new areas in systematics and biodiversity. The discovery of new species and novel, deep-branching lineages equivalent to phyla and the need to discriminate among organisms below the species level are certain to be drivers of future developments in computational systematic biology.

The ability of computational approaches to adapt to new discoveries, present clear depictions of alternative classifications and integrate disparate data types relevant to the classifications, will play a key role in the surveys of the natural world.

Lilburn, T.G., Harrison, S.H., Cole, J.R. and Garrity, G.M. Computational aspects of systematic biology; 2006. Briefings in Bioinformatics 7(2):186-195.

February 24, 2005

Self-organizing and self-correcting classifications of biological data

An algorithm for automated classification based on evolutionary distance data was written in S. The algorithm was tested on a dataset of 1,436 small subunit ribosomal RNA sequences and was able to classify the sequences according to an extant scheme, use statistical measurements of group membership to detect sequences that were misclassified within this scheme and produce a new classification. In this study, the use of the algorithm to address problems in prokaryotic taxonomy is discussed. The algorithm we have developed provides an intuitive approach to making and viewing classifications; conceivably, persons with no training could generate classifications and, by looking at the heatmaps, see how a classification might be improved. Our algorithm formalizes and automates the means used to achieve such improvements. Errors in data curation, classification and identification (of both sequences and source organisms) can be easily spotted and their effects corrected. Also, the classification itself can be modified so that the information content of the taxonomy is enhanced.

Garrity, G.M. and Lilburn, T.G. Self-organizing and self-correcting classifications of biological data; 2005. Bioinformatics 21:2309-2314.

November 10, 2004

19th International CODATA Conference: Digital Object Identifiers for scientific data
Berlin, Germany, November 10, 2004

Norman Paskin has published an article regarding the use of Digital Object Identifiers (DOIs) for scientific data. A description of the NamesforLife system is given on page 7.

The aim of this project is “future-proofing biological nomenclature”; it proposes DOIs as persistent identifiers of taxonomic definitions. A name ascribed to a given group in a biological taxonomy is fixed in both time and scope and may or may not be revised when new information is available.

The NamesforLife project is developing a model for assigning DOIs to prokaryotic taxa as a test case. Though the definition of a taxon may be refined and its nomenclature redefined, the DOI will persist, leaving a forward-pointing trail that can be used to reliably locate digital and physical resources, even when a name may be deemed obsolete. Forward linking from a synonym to a record of the publication that asserts synonymy is especially important, as there is currently no mandatory mechanism for asserting and resolving names that become ambiguous.

The model seeks to strengthen the association of names with taxa by using DOIs to track the taxonomic definition of a name over time. It is extensible to the level of individual genes within a given species. However, the real power of this method lies in the ability of DOIs to become embedded in the information environment, providing a direct and persistent link to the full record of taxonomic and nomenclatural revision and ensuring consistency and accuracy throughout online scientific resources. A DOI-based infrastructure for formally associating nomenclature with taxonomy enables a name to be used unambiguously and persistently, only one mouse-click away from a record of its current definition and historical development.

Paskin, N. Digital Object Identifiers for scientific data; 2005. Data Science Journal 4:12-20.

January 1, 2003

Future-proofing biological nomenclature

The original white paper behind the NamesforLife concept.

As biological data proliferates and interconnects, it depends increasingly on software infrastructure, and it becomes increasingly obvious that biological names do not meet the requirements of a good identifier, in strict computing terms. A good identifier should be unique and persistent. We believe that an implementation of the Digital Object Identifier (DOI) may provide the most robust and future-proof solution to this problem.

We are developing a model for assigning DOIs to prokaryotic taxa as a test case. The real power of this method lies in the ability of DOIs to become embedded in the information environment, providing a direct and persistent link to the full record of taxonomic and nomenclatural revision and ensuring consistency and accuracy throughout online scientific resources.

Garrity, G.M. and Lyons, C.M. Future-proofing biological nomenclature; 2003. OMICS: A Journal of Integrative Biology 7(1):31-33.
