Dmla0609 Hoeck Presentation

  1. Interactive Visual Data Analytics
     Wolfgang G. Hoeck, Ph.D., Senior Manager, Therapeutic Area Systems, Amgen Inc.
     Laboratory Data Management, Munich, June 16-17, 2009
  2. Agenda
     - A bit about Amgen
     - Interactive visual data analytics explained
     - Screening and target identification/validation
     - Expectations from an interactive visual data analytics platform
     - Data formats ARE important
     - Registration systems: uniquely identifying what you are working with
     - Bringing data together: the art of data mapping
     - From tabular data to data networks
     Wolfgang G. Hoeck, Ph.D., Laboratory Data Management Conference, Munich, June 16-17, 2009
  3. Amgen: A Biotechnology Pioneer
     - Founded in 1980, Amgen was one of the first biotechnology companies to successfully discover, develop and make protein-based medicines
     - Today, we're leading the industry in its next wave of innovation by:
       - Developing therapies in multiple modalities
       - Driving cutting-edge research and development
       - Continuing to advance the science of biotechnological manufacturing
  4. Our Worldwide Presence
     [Map of locations]
     - North American sites: Cambridge, MA; Toronto, ON; West Greenwich, RI; Washington, DC; Burnaby, BC; Bothell, WA; Seattle, WA; Longmont, CO; Boulder, CO; Fremont, CA; South San Francisco, CA; Thousand Oaks, CA; Louisville, KY; Mexico City, Mexico; Juncos, Puerto Rico
     - Other countries and regions: Norway, Denmark, Luxembourg, Finland, The Netherlands, Sweden, Belgium, Estonia, Ireland, Latvia, Lithuania, Russia, England, Czech Republic, France, Poland, Switzerland, Slovakia, Hungary, India, United Arab Emirates, Hong Kong, Greece, Slovenia, Austria, Germany, Italy, Spain, Portugal, Australia, New Zealand
  5. Scientific data are complex, and it's not going to get any better
     - Target Identification & Validation
       - Gene expression of cell line panels: 200 x 45000 x 3
         - Understand differential expression of one or a handful of genes
         - Understand expression profile in a particular cell line only
       - Gene expression of tumor samples: The Cancer Genome Atlas
         - Pilot phase: 3 tumor types (500 GBM/ovarian cancer and 200 lung cancer samples)
         - Next years: 25 more tumor types
     - Compound/Target Profiling
       - 400+ targets across hundreds of small molecule compounds
         - Compare target properties with compound properties
     - Cell Line Profiling
       - 500 cell lines treated with 50 therapeutic molecules
       - Each cell line has genetic abnormalities in many genes (mutations, deletions, insertions, rearrangements, etc.)
  6. Visualization of complex data must be made available in interactive format
  7. Interactive browsing of pre-analyzed data: finding cell lines for in-vitro work
     - Step 1: Select gene of interest, e.g. EGFR
     - Step 2: Select study of interest
     - Step 3: Review relative expression pattern
     - Step 4: Select cell line(s) for further work
  8. Steps to share data in an interactive visual format
     - Determine the location of desired data (one or multiple places and/or formats)
     - Run a query against a database/data warehouse
     - Capture a dataset (table) of rows and columns
     - Decide on needed analytics & visualizations
     - Determine visualization settings and state
     - Share the results with other scientists
     - Enable scientists to interact with data
     - Enable scientists to download sets of data
     [Role labels shown beside the steps: Power User, Decision Maker]
  9. We have many choices to visualize data …
     - Table
     - Bar Chart
     - Box Plot
     - Scatter Plot (X/Y-Plot)
     - Line Chart
     - Heatmap
     - Parallel Coordinates Plot (Profile Chart)
     - Network
     - Map
     - TreeMap
     - e-Northern
  10. … and all choices should retain interactivity
      [Screenshot: filtering a set of cell line and gene alteration data to view a particular set of cells and the set of genes harboring deletions]
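
The filter described on this slide can be sketched in a few lines. This is only an illustration, assuming the alteration data are held as one row per (cell line, gene, alteration) event; the cell line names, the rows, and the helper names `filter_alterations` and `genes_by_cell_line` are hypothetical and do not come from the presentation.

```python
# Hypothetical alteration table: one row per observed event (made-up values).
ALTERATIONS = [
    {"cell_line": "CL-A", "gene": "CDKN2A", "alteration": "deletion"},
    {"cell_line": "CL-A", "gene": "EGFR", "alteration": "amplification"},
    {"cell_line": "CL-B", "gene": "PTEN", "alteration": "deletion"},
    {"cell_line": "CL-B", "gene": "TP53", "alteration": "mutation"},
    {"cell_line": "CL-C", "gene": "CDKN2A", "alteration": "deletion"},
]


def filter_alterations(rows, cell_lines, alteration):
    """Keep only rows for the chosen cell lines and alteration type."""
    wanted = set(cell_lines)
    return [
        row for row in rows
        if row["cell_line"] in wanted and row["alteration"] == alteration
    ]


def genes_by_cell_line(rows):
    """Group the gene names of the given rows by cell line."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["cell_line"], set()).add(row["gene"])
    return grouped


# Select two cell lines, then restrict to genes harboring deletions.
selected = filter_alterations(ALTERATIONS, ["CL-A", "CL-B"], "deletion")
deleted_genes = genes_by_cell_line(selected)
print(deleted_genes)
```

In an interactive tool each filter argument would be bound to a widget, so changing the selection re-runs the same two steps.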
  11. The ideal interactive, visual data analytics platform
      - A desktop client
        - Rich interactivity, visuals
        - Rich analytics tools
      - A server component
        - Configurable security
        - Configurable data access
      - A web client
        - Rich interactivity
        - Easy access
      - An API for extension capabilities
      [Architecture diagram labels: Desktop Clients; Zero-footprint Web Clients; Analysis Server; Web Server; Stats, etc. Server; DB1, DB2, DB3]
  12. From Desktop Client to Analysis Server
      - Data Access
        - From files
        - From databases
        - From clipboard
        - From services
      - Data Manipulations
        - Data Mapping
        - Data Merging
        - Calculations
        - Data Transformations
      - Data Content
        - Tabular Format
        - Multiple Tables
        - Relationships between Tables
      - Data Security
        - Group Level Security
        - Function Level Security
        - Integration with Corporate LDAP
      - Data Storage
        - One or many tabular datasets
      - Data Analysis
        - Clustering Methods
          - Hierarchical
          - K-Means
          - PCA
          - SOM
        - Profile Searching
      - Documentation
        - Space to explain what was done
      - Visualizations
        - Table
        - X/Y-Plot
        - Bar Chart
        - Parallel Coordinate Plot
        - Box-Plot
        - Networks
      - Action Logging
        - Who, When, What
  13. About Data Formats
      - Tall-Skinny
        - aka non-pivoted data format
        - Each row represents a single event
      - Short-Wide
        - aka pivoted data format
        - Each row represents a summary of events in particular circumstances
        - Typically results in "data loss"
      - Subject-Verb-Object
        - aka network data format
        - aka nodes and edges
        - Represents complex data relationships, i.e. everything has a potential many-to-many relationship
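
To make the three formats concrete, here is a minimal sketch that starts from a tall-skinny table and derives the other two. The measurements, the cell line names, and the helper names `pivot_mean` and `to_triples` are assumptions made for illustration. Using the mean as the summary is also an assumption; the slide does not say which aggregation is used.

```python
from collections import defaultdict
from statistics import mean

# Tall-skinny: one row per measurement event (made-up values).
# Columns: cell line, gene, expression value.
TALL = [
    ("CL-A", "EGFR", 8.0),
    ("CL-A", "EGFR", 10.0),  # replicate of the row above
    ("CL-A", "KRAS", 5.0),
    ("CL-B", "EGFR", 2.0),
]


def pivot_mean(rows):
    """Pivot to short-wide: one entry per cell line, one column per gene.

    Replicates collapse into their mean, which is the "data loss":
    the individual events cannot be recovered from the result.
    """
    grouped = defaultdict(list)
    for cell_line, gene, value in rows:
        grouped[(cell_line, gene)].append(value)
    wide = defaultdict(dict)
    for (cell_line, gene), values in grouped.items():
        wide[cell_line][gene] = mean(values)
    return dict(wide)


def to_triples(rows, verb="is expressed in"):
    """Express the same rows as subject-verb-object edges (gene -> cell line)."""
    return sorted({(gene, verb, cell_line) for cell_line, gene, _ in rows})


wide = pivot_mean(TALL)
triples = to_triples(TALL)
print(wide)
print(triples)
```

The two EGFR replicates for CL-A survive in `TALL` but appear only as a single averaged value in `wide`, and the triples keep the relationship while dropping the values entirely.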
  14. We are dealing with a Complex Data Concept Network: register your entities!
      [Concept network diagram]
      - Entities: Disease, Project, Target, BioProcess, Gene, Pathway, Protein, Protein Status, Gene Status, Cell Line, Tissue
      - Relationship labels: is critical in, has a, is represented by, works in, occurs in, is translated into, is functional in, is expressed in, is derived from
      - Status values: differentially expressed, post-translationally modified, wildtype, mutated, amplified/deleted
      Slide footer: VIBEvents, Laboratory Data Management Conference, Munich, June 16-17, 2009
  15. Data Assembly and Integration
      Data sources:
      - Contract Screening Results on monthly spreadsheets
        - POC/Kd values
        - Entrez Gene Symbol
        - Compound Concentration
        - Compound ID
      - Human Gene Nomenclature Database
        - Gene Symbol
        - Full Name
        - Gene Synonyms
      - KinomeTree Kinase Map
        - Kinase Classification
        - Manual mapping of Gene Symbols to Kinome classes
      - Amgen Project/Compound Association
        - Compound registered for specific Amgen Project
      Contract Screening Data Assembly in Desktop Client:
      - Data get assembled in Spotfire based on matching data keys such as Gene Symbol or CompoundID.
      - Visualizations are prepared based on scientist's input.
      - Filters are organized according to frequency of usage.
      - Adjustments can typically be made in a couple of hours.
      Publication Step:
      - The final file is published into a web-library accessible via hyperlink.
      - Announcements are made via e-mail and embedded hyperlink.
      Contract Screening Data Assembly in Web Client
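
The slide describes assembling data in Spotfire by matching keys such as Gene Symbol or CompoundID. The following is not Spotfire code; it is a plain-Python sketch of the same key-matching idea, assuming a left join that keeps every screening row. All field names, compound IDs, Kd values, and the project name are hypothetical.

```python
# Hypothetical screening rows keyed by gene symbol and compound ID.
SCREENING = [
    {"gene_symbol": "EGFR", "compound_id": "CMPD-1", "kd_nM": 12.0},
    {"gene_symbol": "KDR", "compound_id": "CMPD-1", "kd_nM": 450.0},
    {"gene_symbol": "EGFR", "compound_id": "CMPD-2", "kd_nM": 3.5},
]

# Lookup tables standing in for the other three sources (illustrative values).
NOMENCLATURE = {
    "EGFR": "epidermal growth factor receptor",
    "KDR": "kinase insert domain receptor",
}
KINASE_CLASS = {"EGFR": "TK", "KDR": "TK"}
PROJECTS = {"CMPD-1": "Project X"}  # CMPD-2 deliberately has no project


def assemble(screening, nomenclature, kinase_class, projects):
    """Left-join the lookup tables onto each screening row by its keys.

    Keys with no match yield None, so unmapped genes or compounds stay
    visible instead of silently dropping out of the dataset.
    """
    merged = []
    for row in screening:
        gene = row["gene_symbol"]
        compound = row["compound_id"]
        merged.append({
            **row,
            "full_name": nomenclature.get(gene),
            "kinase_class": kinase_class.get(gene),
            "project": projects.get(compound),
        })
    return merged


assembled = assemble(SCREENING, NOMENCLATURE, KINASE_CLASS, PROJECTS)
unmapped = [row["compound_id"] for row in assembled if row["project"] is None]
print(assembled)
print(unmapped)
```

Listing the unmapped keys after each monthly load is one way to catch rows that need manual mapping before the file is published.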
  16. Viewing and interacting with integrated data
  17. Biology Visualizations: Pathway Example
  18. Network Visualizations: The hairball principle
  19. Network Visualizations: The hairball principle resolved
      - Tools to connect nodes
      - Tools to extend nodes
      [Network screenshot labels: Kidney, Bladder]
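
One way to read the two tools named on this slide is that "extend" adds the direct neighbors of the selected nodes, while "connect" finds a shortest path between two selected nodes, so that only a relevant slice of the hairball is drawn. That reading is an assumption, as are the edge list and the function names in this sketch; the slide shows only the labels.

```python
from collections import deque

# Hypothetical undirected edges; only Kidney and Bladder appear on the slide.
EDGES = [
    ("Kidney", "GeneA"),
    ("GeneA", "PathwayX"),
    ("PathwayX", "GeneB"),
    ("GeneB", "Bladder"),
    ("GeneA", "GeneC"),
    ("GeneC", "GeneD"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def extend(adjacency, selected):
    """Return the selected nodes plus their direct neighbors."""
    result = set(selected)
    for node in selected:
        result |= adjacency.get(node, set())
    return result


def connect(adjacency, start, goal):
    """Return a shortest path from start to goal (breadth-first), or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(EDGES)
neighborhood = extend(adjacency, {"Kidney"})
path = connect(adjacency, "Kidney", "Bladder")
print(neighborhood)
print(path)
```

Drawing only `path` or `neighborhood` instead of every edge keeps the view readable as the network grows.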
  20. Combining tabular & network visualizations
      - Step 1: Select Disease, then select Therapeutic Molecule
      - Step 2: Study Therapeutic Molecule network
  21. Concluding Thoughts
      - Interactive visualizations are a key to making complex data shareable and understandable
      - If interactivity is self-explanatory, adoption is very rapid; nobody wants to read a manual
      - Analytics can be accomplished in the hands of the power user; it does not need to be available to everyone
      - Data complexity is not getting any simpler; however, with more sophisticated tools even complex data can be made accessible and understandable
      Thank you for your time and interest
