N4L::Scribe is a web service that recognizes important terms in documents and embeds semantic links to authoritative resources. We also provide a web-based version, which we can adapt to your requirements.
N4L::Scribe is backed primarily by the N4L Nomenclature Database, a comprehensive bacterial and archaeal nomenclature and taxonomy that is kept current by professional curators at NamesforLife.
Historically, the IJSEM journal has used only the bacterial and archaeal vocabulary for tagging names, but several other vocabularies are available. N4L::Scribe can also recognize viral, zoological and plant names (which link to the NCBI Taxonomy), GenBank accessions (which link to the GenBank record), chemical names (which link to ChEBI), author names (which link to ORCID) and strain identifiers (which use NamesforLife StrainFinder to link to hundreds of culture collections, e.g. ATCC 6051). Other vocabularies and identifiers can be integrated if needed.
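As a rough illustration of vocabulary-based recognition, the Python sketch below matches terms from a toy vocabulary and reports the resource each category would link to. It is not the N4L::Scribe implementation: the `VOCABULARY` entries, the `recognize` function and the longest-match-first strategy are assumptions made for the example, and the target for prokaryotic names is assumed to be the N4L Nomenclature Database.

```python
import re

# Category -> resource, as listed in the text above.
# The prokaryotic-name target is an assumption, not stated explicitly.
LINK_TARGETS = {
    "prokaryotic name": "N4L Nomenclature Database",
    "viral, zoological or plant name": "NCBI Taxonomy",
    "GenBank accession": "GenBank",
    "chemical name": "ChEBI",
    "author name": "ORCID",
    "strain identifier": "NamesforLife StrainFinder",
}

# Hypothetical toy vocabulary; a real vocabulary would be far larger.
VOCABULARY = {
    "Bacillus subtilis": "prokaryotic name",
    "Bacillus": "prokaryotic name",
    "ATCC 6051": "strain identifier",
}


def recognize(text):
    """Return (offset, term, category, resource) for each vocabulary hit."""
    # Longest terms first, so "Bacillus subtilis" wins over "Bacillus".
    terms = sorted(VOCABULARY, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    hits = []
    for match in pattern.finditer(text):
        category = VOCABULARY[match.group()]
        hits.append((match.start(), match.group(), category,
                     LINK_TARGETS[category]))
    return hits


for hit in recognize("Bacillus subtilis ATCC 6051"):
    print(hit)
```

Ordering the alternatives longest-first is one simple way to keep a species name from being tagged as a bare genus name; other matching strategies would work equally well here.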
The exact form of the semantic links embedded in a document depends on the document format. The currently supported formats are Microsoft Word (DOC and DOCX), OpenOffice (ODT) and any well-formed XML file. N4L::Scribe specifically recognizes certain XML formats, and for those we use the annotation elements, attributes and namespaces supported by the appropriate schema or DTD.
NLM and JATS XML
When annotating NLM and JATS XML, we use the named-content element to tag prokaryotic names. On pages 5-8 below we describe the document-tagging approach used from 2011 to the present, and on pages 9-11 we propose changes to that approach to resolve some issues with the current method.
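To make the idea concrete, the following Python sketch wraps the first occurrence of a name in a named-content element. It is an illustration only: the `tag_name` helper, the `content-type` value, the example URL and the use of `xlink:href` to carry the link are all assumptions, and the tagging actually used is the one described on the pages referenced above.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def tag_name(paragraph_text, name, href, content_type="genus-species"):
    """Build a <p> whose first occurrence of `name` is wrapped in
    <named-content>. content_type and href are illustrative values."""
    before, found, after = paragraph_text.partition(name)
    p = ET.Element("p")
    p.text = before
    if found:
        nc = ET.SubElement(p, "named-content", {
            "content-type": content_type,
            f"{{{XLINK_NS}}}href": href,
        })
        nc.text = name   # the tagged name itself
        nc.tail = after  # text following the tagged name
    return p


p = tag_name(
    "Cells of Bacillus subtilis are rod-shaped.",
    "Bacillus subtilis",
    "https://example.org/name/placeholder",  # placeholder, not a real record
)
print(ET.tostring(p, encoding="unicode"))
```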
Microsoft Word (DOC and DOCX) and OpenOffice/LibreOffice Open Document Text (ODT)
DOC, DOCX and ODT each define their own mechanism for embedding hyperlinks, and we use the native mechanism of each format.
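For orientation, the Python sketch below builds the standard hyperlink markup for two of these formats: a `w:hyperlink` element for DOCX (which points to a relationship id rather than a URL) and a `text:a` element for ODT. We assume these standard constructs are the format-specific styles meant here. The helper names, the relationship id and the URL are hypothetical, and the binary DOC format is not covered.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("w", W), ("r", R), ("text", TEXT), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def docx_hyperlink(label, rel_id):
    """DOCX: the URL lives in the package's relationships part;
    the body only references it by relationship id."""
    link = ET.Element(f"{{{W}}}hyperlink", {f"{{{R}}}id": rel_id})
    run = ET.SubElement(link, f"{{{W}}}r")
    ET.SubElement(run, f"{{{W}}}t").text = label
    return link


def odt_hyperlink(label, href):
    """ODT: the URL is stored directly on the text:a element."""
    link = ET.Element(f"{{{TEXT}}}a", {
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": href,
    })
    link.text = label
    return link


# "rId9" and the URL are placeholders for illustration.
print(ET.tostring(docx_hyperlink("ATCC 6051", "rId9"), encoding="unicode"))
print(ET.tostring(odt_hyperlink("ATCC 6051", "https://example.org/strain"),
                  encoding="unicode"))
```

The difference in where the URL is stored is one reason a single linking routine cannot serve every format.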
Additionally, with Microsoft Word and OpenOffice documents, we are able to embed comments and reports describing the resources identified in the document (Figure 2). This greatly assists proofreaders by validating that GenBank identifiers are correct and pointing out errors in name usage.