Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions
1. Martin Chapman
King’s College London
#IS21
Phenotyping: Implementation and Application
S25
2. Learning Objectives
After participating in this session the learner should be better able to:
• Understand the current issues with converting phenotype definitions into executable code, and
how a novel structured phenotype definition can improve clarity and reduce implementation
burden.
2021 Informatics Summit | amia.org
3. Disclosure
I and my spouse/partner have no relevant relationships with commercial
interests to disclose.
4. Phenotype definition vs. computable form
Phenotype definitions are designed to ensure portability across multiple use
cases by providing an abstract outline of functionality (e.g. a data flow
diagram, a code list, etc.), which is then realised as a computable phenotype
for a given dataset (e.g. SQL script, Python code, etc.).
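As a minimal illustration of the distinction, the sketch below treats a code list as the definition and a small Python function as one possible computable form for a CSV-like dataset. The codes, column names and records are hypothetical, chosen only for illustration.

```python
# Definition: an abstract code list (hypothetical codes, for illustration only).
CODE_LIST = {"E11", "E11.9"}

# Dataset: rows as they might appear in a local CSV export (hypothetical schema).
RECORDS = [
    {"patient_id": "p1", "code": "E11"},
    {"patient_id": "p2", "code": "I10"},
    {"patient_id": "p3", "code": "E11.9"},
    {"patient_id": "p1", "code": "I10"},
]


def identify_cohort(records, code_list):
    """Computable form: IDs of patients with at least one code in the list."""
    return sorted({row["patient_id"] for row in records if row["code"] in code_list})


cohort = identify_cohort(RECORDS, CODE_LIST)
print(cohort)
```

The same code list could equally be realised as an SQL script for a different dataset, which is why the definition is kept separate from any one computable form.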
[Figure: Definition; Computable Form]
5. Definition challenges
1. Complex phenotype definitions, both in terms of structure and
terminology, are needed for accuracy but reduce portability.
2. An abstract definition says little about how to realise the phenotype in
practice (i.e. from a technical perspective), also reducing portability.
6. Workflow-based model
We introduce a new workflow-based model for the definition of a phenotype,
designed to address these issues. The layers of the model are:
1. Abstract - Expresses the logic of a phenotype through a set of simple
sequential, potentially nested steps, each of which is annotated with
multiple descriptions, in order to tackle complexity.
2. Functional - Specifies the metadata of entities passed between the
operations within the abstract layer, e.g., the format of an intermediate
cohort.
3. Computational - Defines an environment for the execution of one or
more implementation units (e.g. a script, data pipeline module, etc.) for each
step in the abstract layer, providing a template for development.
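One way to picture the three layers is as a small data structure in which every step carries an abstract, a functional and a computational part. This is only a sketch: all class names, field names and example values are assumptions made for illustration, not the model's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbstractStep:
    """Abstract layer: one step of the phenotype logic, with multiple descriptions."""
    name: str
    short_description: str
    long_description: str


@dataclass
class FunctionalSpec:
    """Functional layer: metadata of the entities passed between steps."""
    input_format: str
    output_format: str


@dataclass
class ImplementationUnit:
    """Computational layer: one executable unit for a step."""
    language: str
    file_name: str


@dataclass
class Step:
    """One step of a definition, described at all three layers."""
    abstract: AbstractStep
    functional: FunctionalSpec
    implementations: List[ImplementationUnit] = field(default_factory=list)


# Hypothetical two-step definition.
steps = [
    Step(
        AbstractStep("read-potential-cases", "Read records",
                     "Read potential cases from the local dataset."),
        FunctionalSpec(input_format="csv", output_format="potential cases (csv)"),
        [ImplementationUnit("python", "read-potential-cases.py")],
    ),
    Step(
        AbstractStep("identify-cases", "Apply code list",
                     "Keep patients with at least one code from the code list."),
        FunctionalSpec(input_format="potential cases (csv)", output_format="cohort (csv)"),
        [ImplementationUnit("python", "identify-cases.py")],
    ),
]

for position, step in enumerate(steps, start=1):
    print(position, step.abstract.name, "->", step.functional.output_format)
```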
9. Phenoflow
A researcher is not expected to develop definitions under this model directly.
Instead, definitions are authored using an online library, Phenoflow, which is
able to generate a computable form from a definition as a Common Workflow
Language (CWL) workflow.
Phenoflow comprises several microservices to enable the generation process.
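To give a sense of what generating a CWL workflow from a sequence of steps involves, the sketch below chains hypothetical step names into a CWL-style workflow document and serialises it as JSON. It is not Phenoflow's generator; the function name, step names and input/output identifiers are assumptions.

```python
import json


def build_workflow(step_names):
    """Chain steps so that each one consumes the previous step's output file."""
    workflow = {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"potential_cases": "File"},
        "outputs": {},
        "steps": {},
    }
    previous_source = "potential_cases"
    for name in step_names:
        workflow["steps"][name] = {
            "run": f"{name}.cwl",            # tool description for this step
            "in": {"cases": previous_source},
            "out": ["output"],
        }
        previous_source = f"{name}/output"   # feeds the next step
    workflow["outputs"]["cohort"] = {"type": "File", "outputSource": previous_source}
    return workflow


# Hypothetical step names.
workflow = build_workflow(["read-potential-cases", "identify-cases"])
print(json.dumps(workflow, indent=2))
```

Each step refers to a separate tool description through `run`, so the implementation behind a step can change without altering the workflow structure.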
10. Phenoflow
Authoring a new definition under our model:
Phenotypes can also be authored via an API (with accompanying Python
client), or by bulk importing existing definitions.
12. Phenoflow
The CWL workflow can then be generated (based on the definition and
supplied implementation units), downloaded and executed against a local
dataset in order to identify a given cohort:
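A downloaded workflow can be run with any CWL runner. The sketch below assembles a command line for the reference runner, cwltool, without executing it. The file names are hypothetical placeholders, not the names Phenoflow uses in its downloads.

```python
import shlex


def cwl_command(workflow_path, inputs_path, runner="cwltool"):
    """Build the command that runs a workflow with a given inputs file."""
    return [runner, workflow_path, inputs_path]


# Hypothetical file names for a downloaded workflow and its inputs.
command = cwl_command("phenotype.cwl", "phenotype-inputs.yml")
print(shlex.join(command))

# To execute, assuming cwltool is installed:
#   import subprocess
#   subprocess.run(command, check=True)
```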
13. Evaluation and results
Determine the suitability of the model as a representation format, and the
suitability of the CWL implementations:
1. Selected T2DM phenotype definition (logic-based), and example
computable form (phekb.org/phenotype/type-2-diabetes-mellitus).
2. Selected research cohort from Northwestern University (26,406 patients).
3. Re-authored the definition according to our model, using Phenoflow.
4. Generated a CWL implementation of the definition, using Phenoflow.
5. Executed both computable forms against the dataset, confirming that they
produce the same results, using a gold standard.
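The comparison in step 5 can be sketched as follows, assuming each computable form yields a set of patient identifiers and the gold standard is a set of known cases. The identifiers and the choice of precision and recall as measures are assumptions for illustration, not the evaluation's actual data or metrics.

```python
def agreement(cohort, gold_standard):
    """Return (precision, recall) of a cohort against a gold standard."""
    true_positives = len(cohort & gold_standard)
    precision = true_positives / len(cohort) if cohort else 0.0
    recall = true_positives / len(gold_standard) if gold_standard else 0.0
    return precision, recall


# Hypothetical identifiers.
gold = {"p1", "p2", "p3"}
original_form = {"p1", "p2", "p4"}   # cohort from the existing computable form
cwl_form = {"p1", "p2", "p4"}        # cohort from the generated CWL workflow

same_results = original_form == cwl_form
print(same_results, agreement(original_form, gold), agreement(cwl_form, gold))
```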
14. Evaluation and results
Determine the suitability of the model as a representation format, and the
suitability of the generated implementations:
6. Repeated for a COVID-19 phenotype (code-based), taken from
covid19-phenomics.org, and a set of 1468 individuals who tested positive for
COVID-19 at Guy's and St. Thomas' NHS Foundation Trust (GSTT).
15. Evaluation and results
Showed portability improvements in terms of clinical knowledge requirements
and programming expertise using the Knowledge conversion, clause
Interpretation, and Programming (KIP) phenotype portability scoring system
(Shang et al., JBI, 2019).
                    Knowledge  Clause  Programming  Total*
Traditional code        0        2          2          4
Structured code         0        0          0          0
Traditional logic       1        1          2          4
Structured logic        0        1          0          1
Table 1: KIP scores indicating the portability of traditional code-based (COVID-19) and logic-based (Type 2 Diabetes)
phenotype definitions and their structured counterparts.
*High scores = less portable
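The figures in Table 1 are consistent with each total being the sum of the three component scores. A short check, with the scores copied from the table (the variable names are illustrative):

```python
# (Knowledge, Clause, Programming) scores from Table 1.
KIP_SCORES = {
    "Traditional code": (0, 2, 2),
    "Structured code": (0, 0, 0),
    "Traditional logic": (1, 1, 2),
    "Structured logic": (0, 1, 0),
}

# Higher totals indicate a less portable definition.
totals = {name: sum(scores) for name, scores in KIP_SCORES.items()}

for name, total in totals.items():
    print(f"{name}: {total}")
```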
16. Definition challenges
1. Complex phenotype definitions, both in terms of structure and
terminology, are needed for accuracy but reduce portability.
   1. The Phenoflow model provides a specific structure and intelligible multi-dimensional
   descriptions to enable both accurate and portable definitions.
2. An abstract definition says little about how to realise the phenotype in
practice (i.e. from a technical perspective), also reducing portability.
   1. The Phenoflow model includes information to guide implementation, improving portability.
The Phenoflow library has an additional impact on portability, beyond the
model alone:
17. Library impact on portability
Adding an alternate implementation for an abstract step:
18. Library impact on portability
Selecting which type of implementation units to include in the computable form,
depending on local development requirements:
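The selection can be thought of as choosing, for every abstract step, the first available implementation unit in a local order of preference. The sketch below assumes a simple mapping from step to available units; the step names, languages and file names are hypothetical, and this is not the library's own selection logic.

```python
# Hypothetical implementation units available for each abstract step.
IMPLEMENTATIONS = {
    "read-potential-cases": {"python": "read-potential-cases.py"},
    "identify-cases": {"python": "identify-cases.py", "js": "identify-cases.js"},
}


def select_units(implementations, preferences):
    """Pick one unit per step, following the local order of preference."""
    selected = {}
    for step, units in implementations.items():
        for language in preferences:
            if language in units:
                selected[step] = units[language]
                break
        else:
            # No unit for this step matches any preferred language.
            raise LookupError(f"No implementation of {step!r} in {preferences}")
    return selected


selection = select_units(IMPLEMENTATIONS, ["js", "python"])
print(selection)
```

A step with only one implementation falls back to it, while a step with several uses the preferred one, so a single definition can yield different computable forms at different sites.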
19. Future work
1. Leveraging the multi-layer model to introduce advanced library search
criteria, and novel ways to search (e.g. uploading existing definitions).
2. Further leveraging the multi-layer model to express relationships between
phenotypes (e.g. sub-phenotypes) at each layer of the model.
3. Increasing the library of workflow modules (e.g. types of dataset connectors)
ready for download and use.
   1. We already provide connectors for i2b2 and OMOP (as well as local CSV files).
4. Automatic data conversion to enable the use of different implementation
techniques on the same dataset, e.g. conversion from CSV to DB to allow the use
of SQL scripts.
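The CSV-to-database conversion in item 4 could look roughly like the sketch below, which loads CSV text into an in-memory SQLite database so that an SQL-based implementation unit can query it. This illustrates the general idea only, not a planned Phenoflow feature as designed; the schema, table name and data are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical CSV dataset.
CSV_DATA = "patient_id,code\np1,E11\np2,I10\np3,E11\n"


def load_csv_into_sqlite(csv_text):
    """Copy CSV rows into an in-memory SQLite table named 'records'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE records (patient_id TEXT, code TEXT)")
    connection.executemany(
        "INSERT INTO records (patient_id, code) VALUES (:patient_id, :code)", rows
    )
    return connection


connection = load_csv_into_sqlite(CSV_DATA)

# An SQL script can now run against data that began as a CSV file.
query = "SELECT DISTINCT patient_id FROM records WHERE code = ? ORDER BY patient_id"
matching = [row[0] for row in connection.execute(query, ("E11",))]
print(matching)
```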