Phenoflow: A Microservice Architecture for Portable Workflow-based Phenotype Definitions

Martin Chapman
King’s College London
#IS21
Phenotyping: Implementation and Application
S25

Learning Objectives
After participating in this session the learner should be better able to:
• Understand the current issues with converting phenotype definitions into executable code, and
how a novel structured phenotype definition can improve clarity and reduce implementation
burden.
2021 Informatics Summit | amia.org

Disclosure
I and my spouse/partner have no relevant relationships with commercial
interests to disclose.

Phenotype definition vs. computable form
Phenotype definitions are designed to ensure portability across multiple use
cases by providing an abstract outline of functionality (e.g. a data flow
diagram, a code list, etc.), which is then realised as a computable phenotype
for a given dataset (e.g. SQL script, Python code, etc.).
[Figure: Definition | Computable Form]

Definition challenges
1. Complex phenotype definitions, both in terms of structure and
terminology, are needed for accuracy but reduce portability.
2. An abstract definition says little about how to realise the phenotype in
practice (i.e. from a technical perspective), also reducing portability.

Workflow-based model
We introduce a new workflow-based model for the definition of a phenotype,
designed to address these issues. The layers of the model are:
1. Abstract - Expresses the logic of a phenotype through a set of simple
sequential, potentially nested steps, each of which is annotated with
multiple descriptions, in order to tackle complexity.
2. Functional - Specifies the metadata of entities passed between the
operations within the abstract layer, e.g., the format of an intermediate
cohort.
3. Computational - Defines an environment for the execution of one or
more implementation units (e.g. a script, data pipeline module, etc.) for each
step in the abstract layer, providing a template for development.
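The three layers can be pictured as one record per step. The sketch below is illustrative only: the class names, field names, and example steps are assumptions made for this example, not the Phenoflow schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AbstractStep:
    """Abstract layer: one step of the phenotype logic, with several descriptions."""
    name: str
    short_description: str
    long_description: str


@dataclass
class FunctionalSpec:
    """Functional layer: metadata for the entities passed into and out of a step."""
    input_format: str
    output_format: str


@dataclass
class ComputationalSpec:
    """Computational layer: execution environment plus implementation units."""
    environment: str
    implementation_units: List[str] = field(default_factory=list)


@dataclass
class Step:
    """One step of a definition, described at all three layers."""
    abstract: AbstractStep
    functional: FunctionalSpec
    computational: ComputationalSpec


def describe(steps: List[Step]) -> List[str]:
    """Return a one-line summary per step, in execution order."""
    return [
        f"{i}. {s.abstract.name}: {s.functional.input_format} -> "
        f"{s.functional.output_format} [{s.computational.environment}]"
        for i, s in enumerate(steps, start=1)
    ]


# Hypothetical two-step definition used only to exercise the classes.
steps = [
    Step(
        AbstractStep("read_cohort", "Read potential cases",
                     "Load the local dataset as a candidate cohort."),
        FunctionalSpec("csv", "csv"),
        ComputationalSpec("python", ["read_cohort.py"]),
    ),
    Step(
        AbstractStep("identify_cases", "Identify cases",
                     "Keep patients who satisfy the phenotype logic."),
        FunctionalSpec("csv", "csv"),
        ComputationalSpec("python", ["identify_cases.py"]),
    ),
]
summary = describe(steps)
print("\n".join(summary))
```

Keeping the layers as separate records mirrors the idea that the logic of a step can stay fixed while its implementation units change.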


Phenoflow
A researcher is not expected to develop definitions under this model directly.
Instead, definitions are authored using an online library, Phenoflow, which is
able to generate a computable form from a definition as a Common Workflow
Language (CWL) workflow.
Phenoflow comprises several microservices to enable the generation process.

Phenoflow
Authoring a new definition under our model:
Phenotypes can also be authored via an API (with accompanying Python
client), or by bulk importing existing definitions.

Phenoflow
Proceed with implementation by matching each step in the model to an
implementation unit:

Phenoflow
The CWL workflow can then be generated (based on the definition and
supplied implementation units), downloaded, and executed against a local
dataset in order to identify a given cohort:
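To give a feel for what a generated workflow might contain, the sketch below chains a list of steps into a sequential CWL Workflow document and prints it as JSON (which CWL accepts, since JSON is valid YAML). The function, step names, port names, and `.cwl` file names are hypothetical, and this is not the output Phenoflow itself produces.

```python
import json
from typing import Dict, List


def build_cwl_workflow(step_names: List[str]) -> Dict:
    """Chain the named steps into a sequential CWL Workflow document."""
    steps = {}
    previous_output = "potential_cases"  # the workflow-level input feeds the first step
    for name in step_names:
        steps[name] = {
            # Hypothetical tool description wrapping one implementation unit.
            "run": f"{name}.cwl",
            "in": {"cohort": previous_output},
            "out": ["output"],
        }
        previous_output = f"{name}/output"  # the next step consumes this step's output
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"potential_cases": "File"},
        "outputs": {"cases": {"type": "File", "outputSource": previous_output}},
        "steps": steps,
    }


workflow = build_cwl_workflow(["read_cohort", "identify_cases", "write_cases"])
print(json.dumps(workflow, indent=2))
```

A document of this shape, together with the referenced tool descriptions, would typically be executed locally with a CWL runner such as cwltool.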

Evaluation and results
Determine the suitability of the model as a representation format, and the
suitability of the CWL implementations:
1. Selected T2DM phenotype definition (logic-based), and example
computable form (phekb.org/phenotype/type-2-diabetes-mellitus).
2. Selected research cohort from Northwestern University (26,406 patients).
3. Re-authored the definition according to our model, using Phenoflow.
4. Generated a CWL implementation of the definition, using Phenoflow.
5. Executed both computable forms against the dataset, confirming that they
produce the same results when compared with a gold standard.
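The comparison in step 5 can be sketched as a set comparison. The patient identifiers below are made up, and the use of precision and recall is an assumption; the slides do not state which measures were used.

```python
from typing import Dict, Set


def agreement(cohort: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Score an identified cohort against gold-standard cases."""
    true_positives = len(cohort & gold)
    precision = true_positives / len(cohort) if cohort else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical identifiers, for illustration only.
gold_standard = {"p1", "p2", "p3"}
original_form = {"p1", "p2", "p4"}  # cohort from the existing computable form
cwl_form = {"p1", "p2", "p4"}       # cohort from the generated CWL workflow

same_cohort = original_form == cwl_form
original_scores = agreement(original_form, gold_standard)
cwl_scores = agreement(cwl_form, gold_standard)
print(same_cohort, original_scores, cwl_scores)
```

If the two forms select the same patients, their scores against the gold standard are necessarily identical, which is the property the evaluation checks.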

Determine the suitability of the model as a representation format, and the
suitability of the generated implementations:
6. Repeated the process for a COVID-19 phenotype (code-based), taken from
covid19-phenomics.org, using a set of 1468 individuals who tested positive for
COVID-19 at Guy's and St. Thomas' NHS Foundation Trust (GSTT).

Showed portability improvements, in terms of the clinical knowledge and
programming expertise required, using the Knowledge conversion, clause
Interpretation, and Programming (KIP) phenotype portability scoring system
(Shang et al., JBI, 2019).
Definition          Knowledge   Clause   Programming   Total*
Traditional code    0           2        2             4
Structured code     0           0        0             0
Traditional logic   1           1        2             4
Structured logic    0           1        0             1
Table 1: KIP scores indicating the portability of traditional code-based (COVID-19) and logic-based (Type 2 Diabetes)
phenotype definitions and their structured counterparts.
*High scores = less portable
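The totals in Table 1 are the sum of the three component scores. A short check, using only the values reported in the table (the variable names are ours):

```python
# (Knowledge, Clause, Programming) scores as reported in Table 1.
kip_scores = {
    "Traditional code": (0, 2, 2),
    "Structured code": (0, 0, 0),
    "Traditional logic": (1, 1, 2),
    "Structured logic": (0, 1, 0),
}

# Total KIP score per definition; higher totals mean less portable.
totals = {name: sum(parts) for name, parts in kip_scores.items()}

# Drop in total score when moving from the traditional to the structured form.
reduction = {
    "code": totals["Traditional code"] - totals["Structured code"],
    "logic": totals["Traditional logic"] - totals["Structured logic"],
}
print(totals)
print(reduction)
```

On these figures the code-based definition drops from 4 to 0 and the logic-based definition from 4 to 1.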

Definition challenges
1. Complex phenotype definitions, both in terms of structure and
terminology, are needed for accuracy but reduce portability.
   • The Phenoflow model provides a specific structure and intelligible multi-dimensional
     descriptions to enable both accurate and portable definitions.
2. An abstract definition says little about how to realise the phenotype in
practice (i.e. from a technical perspective), also reducing portability.
   • The Phenoflow model includes information to guide implementation, improving portability.
The Phenoflow library has an additional impact on portability, beyond just the
model:

Library impact on portability
Adding an alternate implementation for an abstract step:

Library impact on portability
Selecting which type of implementation units to include in the computable form,
depending on local development requirements:
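Selecting among alternate implementations can be sketched as a lookup per abstract step. The step names, language labels, and file names below are hypothetical, and the preference-order rule is an assumption made for this example, not a description of how the library chooses.

```python
from typing import Dict, List

# Hypothetical registry: abstract step -> {unit type: implementation unit}.
implementations: Dict[str, Dict[str, str]] = {
    "read_cohort": {"python": "read_cohort.py"},
    "identify_cases": {
        "python": "identify_cases.py",
        "javascript": "identify_cases.js",
    },
}


def select_units(available: Dict[str, Dict[str, str]],
                 preferred: List[str]) -> Dict[str, str]:
    """Pick one implementation unit per step, honouring the preference order."""
    selected = {}
    for step, units in available.items():
        for unit_type in preferred:
            if unit_type in units:
                selected[step] = units[unit_type]
                break
        else:
            # No acceptable unit exists for this step.
            raise ValueError(f"No implementation of {step!r} among {preferred}")
    return selected


chosen = select_units(implementations, ["javascript", "python"])
print(chosen)
```

A site that prefers JavaScript here still falls back to the Python unit for the step that has no JavaScript alternative.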

Future work
1. Leveraging the multi-layer model to introduce advanced library search
criteria, and novel ways to search (e.g. uploading existing definitions).
2. Further leveraging the multi-layer model to express relationships between
phenotypes (e.g. sub-phenotypes) at each layer of the model.
3. Increasing the library of workflow modules (e.g. types of dataset connectors)
ready for download and use.
   • We already provide connectors for i2b2 and OMOP (as well as local CSV files).
4. Automatic data conversion to enable the use of different implementation
techniques on the same dataset, e.g. conversion from CSV to DB to allow use
of SQL scripts.
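The CSV-to-DB conversion in item 4 could look roughly like the sketch below, which loads CSV rows into an in-memory SQLite database so that a SQL implementation unit can run against them. The column names, table name, and sample rows are assumptions made for this example; this is future work in the slides, not an existing Phenoflow feature.

```python
import csv
import io
import sqlite3

# Hypothetical extract; a real dataset would be read from a file instead.
csv_text = "patient_id,code\np1,E11\np2,I10\np3,E11\n"


def load_csv_into_db(csv_file, connection, table="records"):
    """Create a table from the CSV header and insert every row as text."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'CREATE TABLE "{table}" ({column_list})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([row[c] for c in columns] for row in reader),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load_csv_into_db(io.StringIO(csv_text), connection)

# A SQL-based implementation unit can now query the converted data.
matches = [
    row[0]
    for row in connection.execute(
        "SELECT patient_id FROM records WHERE code = ? ORDER BY patient_id",
        ("E11",),
    )
]
print(matches)
```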

Links
https://kclhi.org/phenoflow
https://github.com/kclhi/phenoflow
