Tamr | Biogen data unification imperative


  1. THE DATA UNIFICATION IMPERATIVE
     Andy Palmer | Co-Founder, Tamr
  2. BACKGROUND
     Career is a mashup of: start-ups + enterprise; customer + vendor; data + application; technical + business
  3. HEALTHCARE INVESTMENTS
  4. HUGE INVESTMENT IN ENTERPRISE IT & BIG DATA
     Companies have invested $3-4 trillion in IT over the last 20+ years, and are now investing billions in “Big Data” and Analytics 3.0...
  5. DIRTY LITTLE SECRET: DATA VARIETY IN THE ENTERPRISE
     Most investments are oriented towards some “silo” in the enterprise:
     ● application
     ● function
     ● division
     ● geography
     Data tied to these investments is extremely siloed.
  6. BIG DATA & ANALYTICS NEED CLEAN + UNIFIED DATA
     “Consider the more than $44 billion projected by Gartner to be spent on big data in 2014. The vast majority of it, $37.4 billion, is going to IT services. Enterprise software only accounts for about a tenth. The disproportionate spending on services is a sign of immaturity in how we manage data.” - Mahesh S. Kumar, Harvard Business Review
  7. TACKLING THE ENTERPRISE DATA SILO PROBLEM
     All are necessary but not sufficient to truly address next-gen challenges:
     ● Democratized visualization and modeling: radical consumption heterogeneity
     ● Semantic Web / Linked Data: radical source heterogeneity
     ● Provenance for data to improve reliability
     ● Rapid iteration/change requires reproducibility from source
     ● Desire for longitudinal data across many entities
     ● Need for automated data quality / assurance
     Traditional approaches...
     ● Standardization: worth trying
     ● Aggregation: yes, but it actually makes the problem worse
     ● Top-down modeling (MDM/ETL): OK for app-specific or well-defined data
  8. THE MYTH OF THE SINGLE TECH VENDOR SOLUTION
     “Use my brand and data unification will just happen!” REALLY?
  9. HEALTHCARE/BIOPHARMA IS THE FRONT LINE
     The diversity of data and the decentralized nature of healthcare, and specifically of biopharmaceutical research, make our industry the place where next-gen data management will develop.
  10. TABULAR DATA IS A KEY ASSET
     But it’s messy...
  11. CURATION AT SCALE
     Hiring more data scientists makes the problem worse.
     (Diagram labels: Goal; Reality; Enterprise Reality)
     ● Manual data collection and preparation
     ● Long lead time to analyses
     ● Limited individual view on variety of data
     ● Extensive rework
     ● No cohesive view of data efforts
     ● Expertise across the organization underutilized
  12. NEW TOOLS ARE NECESSARY
     New transformation tools are necessary… but not sufficient to solve the enterprise data variety problem.
     (Diagram labels: A few sources...; Thousands of sources; Unified View)
  13. SOLUTION: BOTTOM-UP, PROBABILISTIC DATA MODELING & “COLLABORATIVE CURATION”
     Time to embrace the reality of extreme data variety across the entire enterprise: “Unified Data”
     Back to the future:
     ● 1990s web: probabilistic search / website connection
     ● 2020s enterprise: probabilistic data source connection & curation
     Requires a bottom-up, probabilistic and collaborative approach to data (complements deterministic):
     ● Rules for transformation are necessary but not sufficient to solve the broad problem of broad integration
     ● Mix of 80% probabilistic & 20% deterministic
     ● Iteratively and systematically engage data experts
  14. CORE OF TAMR
     Machine Learning with Human Insight
     Identify sources, understand relationships and curate the massive variety of siloed data.
     (Diagram labels: Structured and Semi-structured Data Sources; Collaborative Curation; Data Experts (Source owners); Data Stewards and Curators; Data Inventory; APIs; Systems; Tools; Data Scientists; Advanced Algorithms & Machine Learning; Expert Input; Integrated Data & Metadata; Expert Directory)
  15. FORTUNE 5 BIOPHARMA
     Challenges:
     ● 7k+ scientists
     ● Decentralized organization
     ● Assay data in spreadsheets
     ● 30k+ tables
     ● 100k+ unique attributes
     ● Error detection in units
     (Diagram labels: Thousands of Potential Sources; Tamr; Unified View)
  16. SOLUTION OVERVIEW: CDISC CONVERSION
     The Problem
     ● Clinical trial data is reported in a wide variety of formats, ontologies and standards
     ● Underspecified attribute names, varying quality of annotation, duplicate data, etc.
     The Solution
     ● A scalable, replicable way to automatically unify and convert clinical trial data to CDISC format
     Benefit
     ● Tamr technology solves common CDISC problems: schema mapping and expert sourcing
     ● Faster way to aggregate and report ongoing trial data for regulatory filings
     ● Simplified reporting for various agency ontologies
  17. TAMR
  18. TAMR
  19. Thank You
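
The bottom-up approach on slide 13 (deterministic rules complemented by probabilistic matching, with data experts engaged for the uncertain cases) could be sketched roughly as below. This is a minimal illustration under stated assumptions, not Tamr's algorithm: plain string similarity from Python's `difflib` stands in for a learned model, and the `RULES` table, the `TARGETS` list of CDISC-style variable names, the thresholds and the `match_attribute` helper are all hypothetical names chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical deterministic rules: curated, exact source-to-target mappings.
RULES = {"pt_id": "USUBJID", "subject_id": "USUBJID"}

# Illustrative target schema (CDISC-style variable names, assumed for the demo).
TARGETS = ["USUBJID", "AGE", "SEX", "VISITNUM"]


def normalize(name: str) -> str:
    """Lower-case a column name and drop everything but letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(source: str, target: str) -> float:
    """String similarity in [0, 1]; a stand-in for a trained matching model."""
    return SequenceMatcher(None, normalize(source), normalize(target)).ratio()


def match_attribute(source, targets=TARGETS, accept=0.8, review=0.5):
    """Return (target, score, route) for one source attribute name.

    route is "rule" for a deterministic hit, "auto" for a confident
    probabilistic match, "expert" when a person should confirm the
    suggestion, and "unmatched" when no candidate is plausible.
    """
    if source in RULES:
        return RULES[source], 1.0, "rule"
    best = max(targets, key=lambda t: similarity(source, t))
    score = similarity(source, best)
    if score >= accept:
        return best, score, "auto"
    if score >= review:
        return best, score, "expert"
    return None, score, "unmatched"


if __name__ == "__main__":
    for column in ["pt_id", "Age", "visit", "gender"]:
        print(column, match_attribute(column))
```

In this sketch `gender` ends up unmatched even though a person would map it to `SEX` immediately, which is the kind of case where name similarity alone falls short and expert input, fed back as new rules or training examples, carries the mapping.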
