Considerations for Automating Salmonella Serovar Identification within an Electronic Public Health Reporting Environment
CDC's requirements for Salmonella surveillance reporting include submission of serovars from the recognized naming scheme, Kauffmann-White (K-W), using identifiers curated by the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT®). Translating the serotype formula of a Salmonella isolate to the correct identifier has been a multistep manual process for users. Our goal was to determine whether a degree of automation could be achieved using an ontology based on K-W.
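The manual translation step described above can be pictured as a lookup from an antigenic formula to a serovar name and identifier. The following is a minimal sketch only, assuming a formula written as `O:H1:H2` with comma-separated factors and `-` for an absent phase. The function names, the two simplified example formulae, and the `PLACEHOLDER-…` identifiers are illustrative assumptions, not actual K-W entries or SNOMED CT concept identifiers.

```python
from typing import NamedTuple, Optional, Tuple


class Formula(NamedTuple):
    """Antigenic formula split into O, phase 1 H and phase 2 H factor sets."""
    o: frozenset
    h1: frozenset
    h2: frozenset


def parse_formula(text: str) -> Formula:
    """Parse 'O:H1:H2' into order-insensitive factor sets ('-' means absent)."""
    parts = [p.strip() for p in text.split(":")]
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts, got {text!r}")

    def factors(part: str) -> frozenset:
        if part == "-":
            return frozenset()
        return frozenset(f.strip() for f in part.split(",") if f.strip())

    return Formula(*(factors(p) for p in parts))


# Hypothetical lookup table: formulae are simplified and identifiers are
# placeholders, NOT real SNOMED CT concept identifiers.
LOOKUP = {
    parse_formula("4,5,12:i:1,2"): ("Typhimurium", "PLACEHOLDER-0001"),
    parse_formula("9,12:g,m:-"): ("Enteritidis", "PLACEHOLDER-0002"),
}


def identify(text: str) -> Optional[Tuple[str, str]]:
    """Return (serovar name, identifier), or None if the formula is unknown."""
    return LOOKUP.get(parse_formula(text))
```

Because an exact-match table of this kind cannot express optional or variable antigenic factors, it returns nothing for any formula that differs even slightly from a stored entry, which is one reason a logic-based representation is attractive.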
We investigated the information artifacts currently available, namely K-W, SNOMED CT and CDC's Public Health Information Network - Vocabulary Access and Distribution System (PHIN-VADS). Because SNOMED CT creates identifiers and associates them with serovar names, we analyzed its coverage of K-W in detail and found an overall error rate of 13.1%, which included simple omissions and transcription errors. We limited our assessment of K-W and PHIN-VADS to the functional characteristics of the resources they distribute. K-W creates serovar names but does not provide identifiers. PHIN-VADS includes the identifiers but not antigenic formulae for most isolates. In summary, neither K-W nor PHIN-VADS contained all the information users require.
Two ontology prototypes were developed. Prototype I placed K-W serovars as terminal nodes in the hierarchy and gave them logic-based definitions. Prototype II added isolate classes as serovar subtypes, and only these isolate classes had complete logical definitions. Both prototypes were logically sound and functioned as expected. Prototype I paralleled existing SNOMED CT content but required more robust description logic than SNOMED CT currently employs. Prototype II was more compatible with the current functionality of SNOMED CT but created identifiers that would not meet current requirements for public health reporting.
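To illustrate what a logic-based definition on a terminal serovar class might look like, the sketch below approximates the Prototype I idea in plain Python rather than in a description logic. It assumes that a serovar is defined by required O factors, optional O factors, and exact phase 1 and phase 2 H factor sets. The class name, the `classify` function, and the two example definitions are illustrative assumptions and are not taken from the ontology itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SerovarDefinition:
    """Hypothetical stand-in for a fully defined terminal serovar class."""
    name: str
    required_o: frozenset
    optional_o: frozenset
    h1: frozenset
    h2: frozenset

    def matches(self, o: Iterable[str], h1: Iterable[str], h2: Iterable[str]) -> bool:
        """True when the observed factors satisfy every condition of the definition."""
        observed_o = frozenset(o)
        allowed_o = self.required_o | self.optional_o
        return (
            self.required_o <= observed_o <= allowed_o  # required present, nothing extra
            and frozenset(h1) == self.h1
            and frozenset(h2) == self.h2
        )


# Illustrative definitions only; a real ontology would hold one per serovar.
DEFINITIONS = [
    SerovarDefinition("Typhimurium", frozenset({"4", "12"}), frozenset({"1", "5"}),
                      frozenset({"i"}), frozenset({"1", "2"})),
    SerovarDefinition("Enteritidis", frozenset({"9", "12"}), frozenset({"1"}),
                      frozenset({"g", "m"}), frozenset()),
]


def classify(o: Iterable[str], h1: Iterable[str], h2: Iterable[str]) -> List[str]:
    """Return the names of all serovar definitions the isolate satisfies."""
    o, h1, h2 = frozenset(o), frozenset(h1), frozenset(h2)
    return [d.name for d in DEFINITIONS if d.matches(o, h1, h2)]
```

In this sketch an isolate that satisfies no definition yields an empty list, which loosely mirrors an isolate that a reasoner cannot place under any serovar class.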
Prototype I was fully populated as the Salmonella Serotype Designation Ontology (SSDO). As it stands, SSDO reliably places isolates in the appropriate classes, with few and predictable exceptions. Although SNOMED CT cannot accommodate its functionality at this time, SSDO can serve as the basis for a stand-alone application.
Ultimately, whether by improving the functionality of existing systems or by providing a framework for an ancillary automated system, this work should facilitate real-time reporting and analysis of surveillance data, which in turn should help prevent new infectious disease outbreaks or reduce their severity.