This document presents the Concept Definition Generator (CDG), an open source tool for defining healthcare concepts according to the multilevel modeling specifications. The CDG allows domain experts to represent healthcare concepts graphically and to generate the associated XML Schemas automatically. It was developed in Python with a wxPython graphical interface so that it runs cross-platform. The CDG addresses the significant challenges of knowledge representation for semantic interoperability of electronic health records. Future work includes converting standardized terminologies into knowledge modeling artifacts and developing proper modeling tools.
CONSTRAINT DEFINITION GENERATOR: AN OPEN SOURCE TOOL FOR
MULTILEVEL MODELING OF HEALTHCARE INFORMATION SYSTEMS
Eduardo César Pimenta Ribeiro¹, Douglas Santos Möeller de Carvalho², Jesiree Iglesias Quadros²,
Lorena Silva de Moura², Joyce Rocha de Matos Nogueira³, Timothy Wayne Cook³ and Luciana Tricai Cavalini⁴
"Multilevel Healthcare Information Modeling" (MLHIM) Laboratory, associated with the
National Institute of Science and Technology – Medicine Assisted by Scientific Computing (INCT-MACC)
Introduction. Semantic interoperability is crucial when information is recorded in purpose-specific applications that need to synchronize with larger databases.
Hardware, including mobile computing and pervasive medicine, no longer poses obstacles, but software based on traditional data models is not
suited to the significant spatial and temporal complexity of healthcare information [6]. In practice, health information systems based on
traditional data models are not interoperable and have high maintenance costs. The approach best suited to the specific features of healthcare information
separates the domain model from data persistence. This multilevel modeling approach proposes the definition of at least two levels: the
Reference Model, which defines generic data types and data structures, and the Domain Model, defined by restrictions on the Reference Model [8]. This paper
presents the technical aspects of the first knowledge modeling software developed for the MLHIM specifications.
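The two-level idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, fields and the "Language" concept with its allowed values are hypothetical and are not taken from the MLHIM Reference Model. The generic type is implemented once in software, while the domain concept exists only as a set of restrictions on it.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DvString:
    """Generic reference-model type: a labelled string value (simplified, hypothetical)."""
    label: str
    value: str


@dataclass(frozen=True)
class StringConstraint:
    """Domain-level restriction on DvString; every field here is illustrative."""
    label: str
    max_length: Optional[int] = None
    allowed_values: Optional[Sequence[str]] = None

    def validate(self, item: DvString) -> List[str]:
        """Return a list of violations; an empty list means the item conforms."""
        errors = []
        if item.label != self.label:
            errors.append(f"label {item.label!r} does not match {self.label!r}")
        if self.max_length is not None and len(item.value) > self.max_length:
            errors.append(f"value is longer than {self.max_length} characters")
        if self.allowed_values is not None and item.value not in self.allowed_values:
            errors.append(f"value {item.value!r} is not in the allowed set")
        return errors


# Domain model: a concept defined purely as constraints on the generic type.
language = StringConstraint(label="Language", max_length=2, allowed_values=("en", "pt"))

conforming = DvString(label="Language", value="pt")
violating = DvString(label="Language", value="xx")

print(language.validate(conforming))
print(language.validate(violating))
```

Changing the concept definition means changing only the constraint object, not the `DvString` class, which mirrors the separation between the domain model and the software that persists the data.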
Methods. According to the MLHIM specifications, only the Reference Model is implemented in software when a healthcare application is developed. The
Domain Models are implemented in the XML Schema language as constraints on the Reference Model; these XML Schema files are called Concept Constraint
Definitions (CCDs). CCDs add knowledge model information to applications built on the MLHIM Reference Model. The Concept
Definition Generator (CDG) was devised to create the conceptual outline of any specific knowledge model defined by a domain expert. The CDG user
interface represents the CCD metadata and the MLHIM packages required for knowledge modeling: Content, Structures and Datatypes. The
healthcare concept representation, when complete, is converted to a CCD, expressed as an XML Schema that defines cardinality and constraints against the
MLHIM Reference Model. A CCD can range from the minimal to the maximal data definition for a specific concept, and it is sensitive to multicultural,
multilingual and geopolitical context.
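As a rough sketch of what "a CCD expressed as an XML Schema" could look like, the following Python function emits a schema that restricts a reference-model string type and sets cardinality on one element. It is not the CDG's actual output: the namespace URI, the `rm:DvStringType` base type, the element name `value` and the helper `build_ccd` are all assumptions made for illustration, and the standard library `xml.etree.ElementTree` is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RM_NS = "http://example.org/reference-model"  # placeholder, not the real MLHIM namespace

ET.register_namespace("xs", XS)


def xs(tag: str) -> str:
    """Qualify a tag name with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"


def build_ccd(concept: str, max_length: int, min_occurs: int = 1, max_occurs: int = 1) -> str:
    """Return a schema that restricts a (hypothetical) reference-model string type."""
    # The rm prefix only appears inside attribute values, so it is declared by hand.
    schema = ET.Element(xs("schema"), {"xmlns:rm": RM_NS})

    # CCD metadata, reduced here to a single documentation note.
    annotation = ET.SubElement(schema, xs("annotation"))
    ET.SubElement(annotation, xs("documentation")).text = concept

    # The concept is a complex type derived by restriction from the generic type.
    ctype = ET.SubElement(schema, xs("complexType"), {"name": concept.replace(" ", "")})
    content = ET.SubElement(ctype, xs("complexContent"))
    restriction = ET.SubElement(content, xs("restriction"), {"base": "rm:DvStringType"})
    sequence = ET.SubElement(restriction, xs("sequence"))

    # Cardinality is expressed with minOccurs/maxOccurs on the element.
    element = ET.SubElement(
        sequence,
        xs("element"),
        {"name": "value", "minOccurs": str(min_occurs), "maxOccurs": str(max_occurs)},
    )

    # The value constraint is a facet on xs:string.
    simple = ET.SubElement(element, xs("simpleType"))
    facets = ET.SubElement(simple, xs("restriction"), {"base": "xs:string"})
    ET.SubElement(facets, xs("maxLength"), {"value": str(max_length)})

    return ET.tostring(schema, encoding="unicode")


ccd_xml = build_ccd("Person Age", max_length=3)
print(ccd_xml)
```

A minimal definition would leave most facets out, while a maximal one would add further restrictions; in both cases the application code that implements the Reference Model stays unchanged.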
Results and Discussion. The CDG was developed in the Python programming language. We adopted wxPython, an open
source graphical library based on wxWidgets, which allows the application to run on any operating system supported by wxWidgets without source code changes. The CDG source
code and an executable file are openly available from the Healthcare Knowledge Component Repository (HKCR) website at http://www.hkcr.net/tools. The
current CDG user interface is shown in Figure 1. A CDG plug-in was developed to automate the conversion of the standardized data elements available in the
National Institutes of Health's Common Data Elements (NIH CDE) Browser. This script automatically populates CCDs with the metadata, context
and data structures of those data standards that can be modeled as Element root class CCDs with the DvString data type and no additional DvString class
constraints. The script was also implemented in Python, using the urllib and lxml libraries. Currently, the following caBIG Data Element subsets have
been converted into CCDs: Person Name, Person Age, Religion, Race, Ethnicity, Language, Organization, Address, Email Address, Organism Identification,
Equipment, Genomic Identifiers and Imaging Data Standards.
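The selection step performed by the plug-in described above might look roughly like the sketch below. It is an assumption-laden illustration, not the plug-in's code: the XML layout of `SAMPLE_EXPORT`, its tag names, its definitions and the `STRING_TYPES` set are invented for the example and do not reproduce the NIH CDE Browser export format. The sample is an inline string so that the example runs offline, whereas the original script used urllib to retrieve data and lxml to parse it.

```python
import xml.etree.ElementTree as ET

# Made-up export with two data elements; tag names and texts are hypothetical.
SAMPLE_EXPORT = """<DataElementsList>
  <DataElement>
    <LONGNAME>Person Age</LONGNAME>
    <DEFINITION>Sample definition text for a string-valued element.</DEFINITION>
    <CONTEXT>caBIG</CONTEXT>
    <DATATYPE>CHARACTER</DATATYPE>
  </DataElement>
  <DataElement>
    <LONGNAME>Sample Numeric Element</LONGNAME>
    <DEFINITION>Sample definition text for a numeric element.</DEFINITION>
    <CONTEXT>caBIG</CONTEXT>
    <DATATYPE>NUMBER</DATATYPE>
  </DataElement>
</DataElementsList>"""

# Source data types assumed to map onto an unconstrained DvString.
STRING_TYPES = {"CHARACTER", "ALPHANUMERIC"}


def to_ccd_outlines(export_xml: str) -> list:
    """Keep only string-valued data elements and map them to CCD outline fields."""
    outlines = []
    for data_element in ET.fromstring(export_xml).iter("DataElement"):
        datatype = data_element.findtext("DATATYPE", default="").strip().upper()
        if datatype not in STRING_TYPES:
            continue  # would need constraints beyond a plain DvString
        outlines.append(
            {
                "title": data_element.findtext("LONGNAME", default="").strip(),
                "description": data_element.findtext("DEFINITION", default="").strip(),
                "context": data_element.findtext("CONTEXT", default="").strip(),
                "root_class": "Element",
                "datatype": "DvString",
            }
        )
    return outlines


outlines = to_ccd_outlines(SAMPLE_EXPORT)
for outline in outlines:
    print(outline["title"], "->", outline["root_class"], "/", outline["datatype"])
```

Each resulting outline carries the metadata and context needed to fill in a CCD, and data elements that would require constraints beyond a plain DvString are simply skipped.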
Figure 1: The CDG user interface
The challenges of representing healthcare concepts so that electronic health records become semantically interoperable
are significantly complex. For this reason, the field of health informatics will probably continue to draw experts from a growing range of disciplines,
enriching the intellectual debate in the field. Information exchange will then be possible without failing to represent the specific domain concepts in
health information systems [17].
Conclusion. As this study shows, many challenges remain in knowledge representation for multilevel modeling of health information systems, which is
needed to achieve semantic interoperability of electronic health records. We highlight the need to develop proper modeling tools and to convert
standardized terminologies into knowledge modeling artifacts. Thus, information technology can be a powerful tool to support the practice of healthcare.
Acknowledgements. This work is a product of the "Multilevel Healthcare Information Modeling" (LA-MLHIM) Laboratory, associated with the National Institute of
Science and Technology – Medicine Assisted by Scientific Computing (INCT-MACC), with CNPq and FAPERJ funding. The author Eduardo Ribeiro receives a
Graduated Technical Training scholarship from CNPq.
[1] Hudson DL, Cohen ME. Uncertainty and complexity in personal health records. In: Conference Proceedings: 2010 Annual International Conference of the IEEE Engineering in Medicine and
Biology Society. Piscataway: IEEE; 2010. p. 6773-6.
[2] Saleem JJ, Russ AL, Neddo A, Blades PT, Doebbeling BN, Foresman BH. Paper persistence, workarounds, and communication breakdowns in computerized consultation management. Int. J.
Med. Inform. 2011; 80(7): 466-79.
[3] Ohmann C, Kuchinke W. Future developments of medical informatics from the viewpoint of networked clinical research. Interoperability and integration. Methods Inf. Med. 2009; 48(1): 45-54.
[4] De Vlieger P, Boire JY, Breton V, Legre Y, Manset D, Revillard J, Sarramia D, Maigne L. Sentinel e-health network on grid: developments and challenges. Stud. Health Technol. Inform. 2010; 159:
134-45.
[5] Garde S, Hovenga E, Buck J, Knaup P. Ubiquitous information for ubiquitous computing: expressing clinical data sets with openEHR archetypes. Stud. Health Technol. Inform. 2006; 124: 215-20.
[6] Wollersheim D, Sari A, Rahayu W. Archetype-based electronic health records: a literature review and evaluation of their applicability to health data interoperability and access. HIM J. 2009;
38(2): 7-17.
[7] Grimshaw J, Russell I. Achieving health gain through clinical guidelines. I: Developing scientifically valid guidelines. Qual. Health Care 1993; 2(4): 243-8.
[8] Michelsen L, Pedersen SS, Tilma HB, Andersen SK. Comparing different approaches to two-level modelling of electronic health records. Stud. Health Technol. Inform. 2005; 116: 113-8.
[9] Chen R, Klein G. The openEHR Java reference implementation project. Stud. Health Technol. Inform. 2007; 129: 58-62.
[10] Dias RDM, Cook TW, Freire SM. Modeling healthcare authorization and claim submissions using the openEHR dual-model approach. BMC Med. Inform. Decis. Mak. 2011; 11:60.
[11] Kashfi H. Applying a user centered design methodology in a clinical context. Stud. Health Technol. Inform. 2010; 160(Pt 2): 927-31.
[12] Beale T. Archetypes and the EHR. Stud. Health Technol. Inform. 2003; 96: 238-44.
[13] Sundvall E, Qamar R, Nyström M, Forss M, Petersson H, Karlsson D, Ahlfeldt H, Rector A. Integration of tools for binding archetypes to SNOMED CT. BMC Med. Inform. Decis. Mak. 2008; 8(Suppl 1):
S7.
[14] Maldonado JA, Moner D, Boscá D, Fernández-Breis JT, Angulo C, Robles M. LinkEHR-Ed: a multi-reference model archetype editor based on formal semantics. Int. J. Med. Inform. 2009; 78(8):
559-70.
[15] Cavalini LT, Cook TW. Health informatics: the relevance of open source and multilevel modeling. IFIP Adv. Inform. Commun. Tech. 2011; 365: 338-47.
[16] Hillman D. Using Dublin Core. Available at: http://dublincore.org/documents/usageguide/. Last access May 20, 2012.
[17] Blobel B. Ontologies, knowledge representation, artificial intelligence - hype or prerequisites for international pHealth Interoperability? Stud. Health Technol. Inform. 2011; 165: 11-20.
1. Undergraduate program in Computer Science, Universidade Federal de Minas Gerais (UFMG), Brazil; 2. Undergraduate program in Medicine, Universidade Federal Fluminense (UFF), Brazil; 3. Instituto Nacional de Ciência e
Tecnologia – Medicina Assistida por Computação Científica (INCT-MACC), Brazil; 4. Department of Epidemiology and Biostatistics, Universidade Federal Fluminense (UFF), Brazil - lutricav@vm.uff.br
Visit us: www.mlhim.org