Carl Kesselman and I (along with our colleagues Stephan Erberich, Jonathan Silverstein, and Steve Tuecke) participated in an interesting workshop at the Institute of Medicine on July 14, 2009. Along with Patrick Soon-Shiong, we presented our views on how grid technologies can help address the challenges inherent in healthcare data integration.
Grid And Healthcare For IOM July 2009
1. Grid computing and health information sharing: a platform proposal. Ian Foster, Director, Computation Institute, Chan Soon-Shiong Scholar, U. Chicago & Argonne Natl Lab; National Coalition for Health Integration; Carl Kesselman, Co-Director, Center for Health Informatics, University of Southern California
6. We need to function in the zone of complexity (Ralph Stacey, Complexity and Creativity in Organizations, 1996) [diagram: agreement about outcomes and certainty about outcomes each run from low to high; "plan and control" lies where both are high, chaos where both are low, and the zone of complexity in between]
10. The Grid paradigm and healthcare information integration [diagram: data sources (radiology, medical records, pathology, genomics, labs, RHIO) flow into platform services that name data and move it around, make data usable and useful, make data accessible over the network, and manage who can do what]
11. The Grid paradigm and healthcare information integration [diagram: the same data sources flow into platform services (management, integration, publication, security and policy) that transform data into knowledge, enhance user cognitive processes, and incorporate it into business processes]
12. The Grid paradigm and healthcare information integration [diagram: data sources feed platform services (management, integration, publication, security and policy), which in turn support value services (analysis, cognitive support, applications)]
15. Access-control models, in order of increasing policy-language abstraction level and expressiveness:
- Identity-based authorization: simplest, but not scalable
- Unix access control lists (discretionary access control, DAC): groups, directories, simple administration
- POSIX ACLs / MS ACLs: finer-grained administrative policy
- Role-based access control (RBAC): separation of role/group administration from rule administration
- Mandatory access control (MAC): clearance, classification, compartmentalization
- Attribute-based access control (ABAC): generalization to arbitrary attributes
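To make the endpoint of that progression concrete, here is a minimal ABAC sketch in Python. The attribute names (role, vo, purpose) and the sample policy are hypothetical illustrations, not anything specified in the talk; the point is only that the policy is a predicate over attributes rather than a list of identities or roles.

```python
# Minimal ABAC sketch: a policy is a predicate over subject, resource,
# and environment attributes, rather than a list of identities or roles.
# All attribute names and the sample policy below are illustrative only.

def abac_decision(subject, resource, environment, policy):
    """Return True if the policy grants access for the given attributes."""
    return policy(subject, resource, environment)

# Example policy: oncologists in the same virtual organization as the data
# may read imaging records, but only for a treatment or trial purpose.
def imaging_read_policy(subject, resource, environment):
    return (
        subject.get("role") == "oncologist"
        and subject.get("vo") == resource.get("vo")
        and environment.get("purpose") in {"treatment", "clinical-trial"}
    )

if __name__ == "__main__":
    subject = {"role": "oncologist", "vo": "childrens-oncology-group"}
    resource = {"type": "dicom-image", "vo": "childrens-oncology-group"}
    environment = {"purpose": "clinical-trial"}
    print(abac_decision(subject, resource, environment, imaging_read_policy))  # True
```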
18. Imaging clinical trials use case: Children's Oncology Group VO; Neuroblastoma Cancer Foundation VO
20. As of Oct 19, 2008: 122 participants; 105 services (70 data, 35 analytical)
23. Health Object Identifier (HOI) naming system
- Example: uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE
- Random string for the identifier body: PHI-free and guaranteed unique
- 888: CHI's top-level naming authority
- National Provider Id used in the hierarchical identifier namespace
- Application context's namespace governed by the provider naming authority
- The HOI's URI schema identifier is based on Handle
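As a rough illustration of the naming scheme, the sketch below mints an identifier with the structure shown above: the 888 naming authority, a provider NPI, an application-context namespace, and a random, PHI-free identifier body. The function name and the use of a UUID for the body are assumptions made for illustration, not the behavior of the actual HOI service.

```python
import uuid

def mint_hoi(npi: str, app_context: str, authority: str = "888") -> str:
    """Mint a Handle-style Health Object Identifier (illustrative only).

    The identifier body is a random UUID, so it carries no PHI and is
    unique with overwhelming probability; the prefix encodes the naming
    authority hierarchy (top-level authority, provider NPI, app context).
    """
    identifier_body = uuid.uuid4().hex.upper()  # random, PHI-free body
    return f"uri:hdl://{authority}.us.npi.{npi}.{app_context}/{identifier_body}"

if __name__ == "__main__":
    # Example using the NPI and DICOM application context shown on the slide.
    print(mint_hoi(npi="1234567890", app_context="dicom"))
```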
26. Integration: making data usable and useful? [chart: degree of communication (0 to 100%) plotted against degree of prior syntactic and semantic agreement (0 to 100%), contrasting a rigid standards-based approach, a loosely coupled approach, and an adaptive approach]
30. ECOG 5202 integrated sample management [diagram: no coordinated data systems; MD Anderson, ECOG PCO, and ECOG CC each run a CHI appliance exposing their data through OGSA-DAI, a fourth CHI appliance behind a web portal acts as the mediator, and OGSA-DQP federates queries across the sites]
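The pattern behind that diagram, stripped to its essentials, is a mediator that fans a query out to per-site data services and merges the results. The sketch below is plain Python and is not the OGSA-DAI/OGSA-DQP API; the site names come from the slide, while the function names and record shapes are made up for illustration.

```python
# Schematic mediator: fan a query out to per-site data services and merge.
# This is NOT the OGSA-DAI/OGSA-DQP API, just the shape of the pattern.

from typing import Callable, Dict, List

SiteQuery = Callable[[str], List[dict]]

def mediate(query: str, sites: Dict[str, SiteQuery]) -> List[dict]:
    """Send the same query to every site service and merge the results,
    tagging each record with the site it came from."""
    merged = []
    for site_name, run_query in sites.items():
        for record in run_query(query):
            merged.append({**record, "source_site": site_name})
    return merged

if __name__ == "__main__":
    # Hypothetical per-site services standing in for the OGSA-DAI endpoints.
    sites = {
        "MD Anderson": lambda q: [{"sample_id": "A-001", "status": "shipped"}],
        "ECOG PCO":    lambda q: [{"sample_id": "B-042", "status": "received"}],
        "ECOG CC":     lambda q: [],
    }
    for row in mediate("SELECT * FROM samples", sites):
        print(row)
```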
33. Many, many tasks: identifying potential drug targets. 2M+ ligands x protein target(s). (Mike Kubal, Benoit Roux, and others)
34. [Workflow diagram: docking and scoring pipeline for one protein target]
- Inputs: PDB protein descriptions (1 protein, 1 MB); ZINC 3-D ligand structures (2M structures, 6 GB); DOCK6 and FRED receptor files (1 per protein, defining the pocket to bind to), each prepared manually
- DOCK6 and FRED docking: ~4M tasks x 60 s x 1 cpu, ~60K cpu-hrs; select best ~5K
- Amber score (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: generate NAB script from a template and parameters defining flexible residues and number of MD steps, 5. RunNABScript): ~10K tasks x 20 min x 1 cpu, ~3K cpu-hrs; select best ~500
- GCMC: ~500 tasks x 10 hr x 100 cpus, ~500K cpu-hrs
- For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
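The sizing on that slide is straightforward arithmetic over the stage counts and per-task times; here is a small sketch using the slide's own numbers, which the slide rounds to ~60K, ~3K, and ~500K cpu-hrs and to a headline of 500,000 cpu-hrs (roughly 50 cpu-years) per target.

```python
# Back-of-the-envelope sizing for one protein target, using the per-stage
# task counts and per-task times given on the slide (all figures rounded).

HOURS = 3600  # seconds per hour

dock_and_fred = 4_000_000 * 60 / HOURS    # ~4M tasks x 60 s x 1 cpu   -> ~67K cpu-hrs (slide: ~60K)
amber_rescore = 10_000 * 20 * 60 / HOURS  # ~10K tasks x 20 min x 1 cpu -> ~3.3K cpu-hrs (slide: ~3K)
gcmc_refine   = 500 * 10 * 100            # ~500 tasks x 10 hr x 100 cpus -> 500K cpu-hrs

total = dock_and_fred + amber_rescore + gcmc_refine
print(f"total ~{total:,.0f} cpu-hrs (~{total / 8760:.0f} cpu-years)")
```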
37. Functioning in the zone of complexity (Ralph Stacey, Complexity and Creativity in Organizations, 1996) [recap of the Stacey diagram from slide 6]
38. The Grid paradigm and healthcare information integration [recap of slide 12: data sources feed platform services (management, integration, publication, security and policy), which support value services (analysis, cognitive support, applications)]
Editor's notes
We were asked to consider an H1N1 pandemic, certainly a challenging use case for healthcare integration. As the pandemic proceeds, we see an expanding set of individuals and institutions involved: CDC, HHS, local hospitals, clinics. There is a need for rapid access to information from many sources that have not previously interacted, and for dynamic integration of new capabilities: data mining, simulation, etc. An explosion in the number of sick people. A need for new tests. Etc., etc.
-- Rapid integration of systems that haven't worked together before (incompatible EHR implementations)
-- One-off new data model
-- Rapidly changing set of participants
-- Dynamic integration of new capabilities
-- Unknown scale
A second, very different example – information integration in a poor urban setting. A more constrained set of participants, but otherwise not that different. Many different IT systems (or nonsystems) pose significant barriers to entry and make integration difficult. Thus many untapped opportunities: better patient care, healthcare effectiveness research, clinical trial recruitment, etc.
What these (and other examples that we will not have time to review) have in common …
We cite [Rouse, Health Care as a CAS: Implications for Design…, NAE 2008] for the right-hand part.
Must support:
-- Dynamic composition for a specific purpose
-- Evolving community, function, environment
-- Messy data, failure, incomplete knowledge
Nice, but insufficient:
-- Data standards
-- Platform standards
-- Federal policies
Another perspective on the problem. A few words of explanation. If we are deploying a hospital IT system, we are (hopefully) in the bottom left-hand corner. "You can't achieve success via central planning" (quoted in Crossing the Quality Chasm, p. 312). In our scenarios, we don't have that ability to control.
What is the alternative? We can put in place mechanisms that make it easy for groups with some common goal to form and function. Over time things change and these groups evolve. If we are successful, they can expand and perhaps merge. Challenges: make this easy; leverage scale effects.
These are issues that the grid community has been working on for many years. We call these groupings Virtual Organizations. In healthcare today, there are of course many such “VOs.” But they are hard to form, fragmented, …
Principles and mechanisms that have been under development for some years: first CS, then the physical sciences, then biology, and most recently biomedicine.
What are these grid mechanisms and concepts, then? Hard to say something sensible in a few minutes. But basically it is about separating out concerns in a way that reduces barriers to entry and permits flexible use.
Talk about API vs. protocol. Add the "ilities" and functional benefits to the stack.
[Create an image here.] For example, DICOM and HL7 combine messaging and data model in the same interoperability standard. People are framing this problem at the level of data interoperability; systems interoperability is often neglected. This is an area of differentiation, bringing best practice from industry and science into the healthcare space. Open-source platform. Experience with systems-interoperability standards: IETF, OASIS, W3C.
Attribute authorities emerge as an important system component. Bridge between local and global: an honest broker is an example. Not sure what "policy in the network" means.
List services from
DO SOMETHING INTERESTING ON THE RIGHT. Scaling via automated data adapters. Representations of those things and the semantics of those representations. Talk about how services are published, data modeling, etc. Publish databases. Publish services. Name published objects.
Built using the same mechanisms used to build SOI: PKI, delegation, attribute-based authorization; registries, monitoring. Operating a service is a pain! It would be nice to outsource it, but services need to be near the data, which also has privacy concerns, so things become complicated.
Objects are published; they need to be named; then they can be moved around without losing track of them. Bulk data movement. Fine-grained access for data integration.
Clinical, administrative, research. Issues are often hidden and escalate.
-- Uniqueness: no guaranteed global uniqueness
-- Name ownership: no ability to prove that a certain entity issued a name
-- PHI-tainted names: filenames for some images have the patient ID embedded, so sharing the name alone may constitute a HIPAA violation
Talk about handle….
TO PUT IN A SLIDE?
-- Loose coupling and encapsulation
-- Interoperability through integration based on data mediation
-- Evolutionary in nature
-- A set of scalable systems and methods
-- Explicit in the architecture: a data integration layer
-- Demonstrated in GSI, GridFTP, MDS, ECOG
-- Free text: common in electronic health records
-- Tight encoding: common in clinical trials and biomedical research
-- Post hoc: good but not sufficient to maintain context (e.g., the Google Health fiasco)
-- Constraining: ideal but burdensome (e.g., caBIG/caDSR deployment challenges)
-- Warehouses: querying is difficult
Granularity varies according to purpose. ICD-9: International Statistical Classification of Diseases. CPT: Current Procedural Terminology. Physicians prefer free text (maximum expressivity), but subsequent NLP/encoding loses context.
This would be a good place for a graphic, perhaps showing top down vs. bottom up.
Show the types of data below? Do we really have to use CHI appliances? (That seems a substantial barrier to entry.)
Workflows are becoming a widespread mechanism for coordinating the execution of scientific services and linking scientific resources: analytical and data-processing pipelines. Is this stuff real? EBI saw 3 million+ web-service API submissions in 2007. A lot? We want to publish workflows as services. Think of caBIG services as service providers that then invoke grid services to execute them (e.g., via TeraGrid gateways).
"docking" is the identification of the low-energy binding modes of a small molecule (ligands) within the active site of a macromolecule (receptor) whose structure is known A compound that interacts strongly with (i.e. binds) a receptor associated with a disease may inhibit its function and thus act as a drug Typical Workload: Application Size: 7MB (static binary) Static input data: 35MB (binary and ASCII text) Dynamic input data:10KB (ASCII text) Output data: 10KB (ASCII text) Expected execution time: 5~5000 seconds Parameter space: 1 billion tasks
More precisely, step 3 is "GCMC + hydration." Mike Kubal says: "This task is a Free Energy Perturbation computation using the Grand Canonical Monte Carlo algorithm for modeling the transition of the ligand (compound) between different potential states and the General Solvent Boundary Partition to explicitly model the water molecules in the volume around the ligand and pocket of the protein. The result is a binding energy just like the task at the top of the funnel; it is just a more rigorous attempt to model the actual interaction of protein and compound. To refer to the task in short hand, you can use "GCMC + hydration". This is a method that Benoit has pioneered."
Application efficiency was computed between the 16-rack and 32-rack runs. Sustained utilization is the utilization achieved during the part of the experiment when there was enough work to do, 0 to 5,300 sec. Overall utilization is the number of CPU hours used divided by the total number of CPU hours allocated. The experiment included caching the 36 MB (52 MB uncompressed) archive on each node at first access. We use "dd" to move data to and from GPFS. The application itself had some bad I/O patterns in the write phase, which prevented it from scaling well, so we decided to write to RAM and then dd back to GPFS. For this particular run, we had 464 Falkon services running on 464 I/O nodes, 118K workers (256 per Falkon service), and 1 client on a login node. The 32-rack job took 15 minutes to start. It took the client 6 minutes to establish a connection and set up the corresponding state with all 464 Falkon services. It took the client 40 seconds to dispatch 118K tasks to 118K CPUs. The rest can be seen from the graph and slide text.
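As a small formalization of the definitions in this note, the sketch below expresses the two utilization metrics as functions. The example inputs are placeholders, not figures from the actual 16-rack or 32-rack runs.

```python
# Utilization metrics as defined in this note. The example numbers are
# hypothetical placeholders, not measurements from the actual runs.

def sustained_utilization(cpu_hours_busy: float, cpu_hours_available: float) -> float:
    """Utilization during the window in which there was enough work to do."""
    return cpu_hours_busy / cpu_hours_available

def overall_utilization(cpu_hours_used: float, cpu_hours_allocated: float) -> float:
    """CPU hours actually used divided by total CPU hours allocated."""
    return cpu_hours_used / cpu_hours_allocated

print(f"{overall_utilization(cpu_hours_used=80_000, cpu_hours_allocated=100_000):.0%}")  # 80% (placeholder)
```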
We could show these things as moving if we wanted to be really clever. Over time things change and these groups evolve. If we are successful, they merge.