Big Data in Genomics and Personalized Medicine – Challenges and Solutions

Gaurav Kaul
Software Architect, Intel
JAX London 2013
Agenda
Global Healthcare Trends
The Rise of Personalized Medicine
Big Data Scenarios in Healthcare
Methods to Manage Big Data
Use Cases
Summary and Next Steps


*Other names and brands may be claimed as the property of others
We are at an Inflection Point in Healthcare - TRENDS

Global AGING: the worldwide share of the population aged 60+ is growing from 10% to 21% by 2050.
[Map: % of population over age 60 in 2050, in bands 0-9%, 10-19%, 20-24%, 25-29% and 30+%; worldwide average for age 60+: 21%. Source: United Nations, "Population Aging 2002"]

Healthcare costs are RISING: a significant % of GDP.

Source: McKinsey Global Institute Analysis; ESG Research Report 2011 – North American Health Care Provider Market Size and Forecast


US Healthcare BIG DATA Value: $300 billion in value per year, ~0.7% annual productivity growth
Storage Growth
[Chart: Total data for healthcare providers (PB), 2010-2015, by category: Admin, Imaging, EMR, Email, File, Non-Clinical Imaging, Research. Inset: Medical imaging archive projection, a case from just 1 healthcare system.]

Data explosion: projected to reach 35 zettabytes by 2020, a 44-fold increase from 2009.

Source: McKinsey Global Institute Analysis; ESG Research Report 2011 – North American Health Care Provider Market Size and Forecast

Sequencing Cost Trend

Vision for Personalized Medicine

How can we take Personalized Medicine mainstream by 2020?
A “bioinformatics computing system” includes technologies from this entire “stack”:
Software Frameworks
Applications
Programming Model (abstraction)
Virtualization
System Software and Resource Management
Computer Hardware, Storage and Networks

Multiple cores – shared memory, multiple threads, OpenMP
Multiple nodes – MPI; GAS, PGAS; Hadoop
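The multi-core option (shared memory, multiple threads) can be illustrated with a minimal Python sketch, assuming a toy workload of counting bases in short reads: the reads are split into chunks, each worker thread counts one chunk, and the partial counts are merged in a final reduction. OpenMP would express the same decomposition in C/C++/Fortran; here `concurrent.futures.ThreadPoolExecutor` stands in for it. Because of CPython's global interpreter lock, pure-Python threads do not speed up CPU-bound work like this, so the sketch shows the decomposition pattern only. The reads, thread count and chunk size are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_bases(chunk):
    """Count nucleotide occurrences in one chunk of reads."""
    counts = Counter()
    for read in chunk:
        counts.update(read)
    return counts

def parallel_base_counts(reads, n_threads=4, chunk_size=2):
    """Split reads into chunks, count each chunk on a worker thread, then merge.

    All threads read the same in-memory list (one shared address space);
    only the per-chunk Counter objects are combined at the end (a reduction).
    """
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for partial in pool.map(count_bases, chunks):
            total += partial
    return total

# Hypothetical toy reads, for illustration only.
reads = ["ACGT", "AACC", "GGTT", "ACGA", "TTTT"]
print(parallel_base_counts(reads))
```

The result does not depend on how the reads are chunked or how many threads run, which is the property that makes this kind of decomposition safe; the multi-node options (MPI, Hadoop) apply the same split-then-merge idea across machines instead of threads.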

galaxy.psu.edu

The Crossbow Pipeline: Langmead, Schatz et al., “Searching for SNPs with cloud computing”
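Crossbow runs read alignment (Bowtie) in the map phase and SNP calling (SOAPsnp) in the reduce phase on Hadoop, with the shuffle grouping alignments by genome region. The pure-Python sketch below imitates only that data flow. It assumes reads are already aligned (each given as chromosome, 0-based start, sequence), uses a hypothetical bin size as the shuffle key, and replaces SOAPsnp's statistical model with a naive majority-vote caller and a made-up minimum depth. It is a toy illustration, not Crossbow's code.

```python
from collections import Counter, defaultdict

BIN_SIZE = 1000  # hypothetical width (in bases) of the genome bins used as shuffle keys
MIN_DEPTH = 3    # hypothetical minimum read depth before a call is reported

def map_phase(alignments):
    """Emit ((chrom, bin), (pos, base)) for every base of every aligned read."""
    for chrom, start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            yield (chrom, pos // BIN_SIZE), (pos, base)

def shuffle(pairs):
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values, reference):
    """Naive caller: report positions whose majority base differs from the reference."""
    chrom, _ = key
    pileup = defaultdict(Counter)
    for pos, base in values:
        pileup[pos][base] += 1
    calls = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, _ = pileup[pos].most_common(1)[0]
        ref_base = reference[chrom][pos]
        if depth >= MIN_DEPTH and base != ref_base:
            calls.append((chrom, pos, ref_base, base))
    return calls

def call_snps(alignments, reference):
    calls = []
    for key, values in sorted(shuffle(map_phase(alignments)).items()):
        calls.extend(reduce_phase(key, values, reference))
    return calls

# Toy data: 0-based positions; reads already aligned as (chrom, start, sequence).
reference = {"chr1": "ACGTACGTAC"}
alignments = [
    ("chr1", 0, "ACGTT"),
    ("chr1", 2, "GTTC"),
    ("chr1", 3, "TTCG"),
    ("chr1", 4, "TCGT"),
    ("chr1", 8, "GC"),  # mismatch at position 8, but depth 1 is below MIN_DEPTH
]
print(call_snps(alignments, reference))  # [('chr1', 4, 'A', 'T')]
```

Because each reducer only sees the bases for its own genome bin, bins can be processed on different nodes independently, which is what lets this style of pipeline scale out on a Hadoop cluster.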

Big Data – A Foundation For Delivering Big Value

Big Data Building Blocks
Compute: Intel® Xeon® Product Family E3/E5/E7; Intel® Atom™; Intel® Xeon Phi™
Network: Intel® Ethernet Controllers; Intel® Ethernet Adapters; Intel® Ethernet Switch Silicon; Intel® True Scale Fabric
Storage: Intelligent Storage¹; Scale-out Storage¹; Scale-up Storage¹; Intel® SSD 710 series and DC S3700 (SATA); Intel® SSD 910 series (PCIe)
Software & Technologies: Intel® Distribution for Apache Hadoop; Intel® Node Manager; Intel® Data Center Manager; Intel® Expressway Service Gateway; Intel® Cache Acceleration Software; Intel’s Lustre; Intel® VT and Intel® TXT; Intel® AES-NI
Attributes: energy efficient, responsive, choice, high availability, secure

Intel’s Foundational Technologies Offer Advanced Solutions for Big Data Analytics

¹ Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors.
Big Data Compute Platform Optimizations

Intel® Xeon® E5 Family
• Up to 8 cores, up to 20 MB cache
• Up to 4 channels of DDR3 1600 MHz memory
• Integrated PCI Express* 3.0, up to 40 lanes per socket
• SCALE-OUT with Hadoop and analytic/DW engines
• Proof point: Hadoop on E5, 25X improvement in analytics

Intel® Xeon® E7 Family (Xeon E7-4800)
• Up to 10 cores, up to 30 MB cache
• Up to 8 channels of DDR3 1066 MHz memory
• 4 QPI 1.0 links for robust scalability
• SCALE-UP in-memory analytic engines and databases: Oracle*, SAS*, SAP HANA*
• Proof point: SAP HANA
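The memory channel counts above translate into a theoretical peak DRAM bandwidth per socket of channels × transfer rate × 8 bytes (each DDR3 channel has a 64-bit data bus). The sketch below does that arithmetic. It assumes "DDR3 1600 MHz" and "DDR3 1066 MHz" mean 1600 and 1066 MT/s, ignores controller and memory-buffer overheads (which matter on the E7's buffered memory), and gives an upper bound rather than a measured figure.

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel moves 64 bits per transfer

def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes/s) for one socket."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9

e5 = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=1600)
e7 = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=1066)
print(f"Xeon E5: {e5:.1f} GB/s per socket")  # Xeon E5: 51.2 GB/s per socket
print(f"Xeon E7: {e7:.1f} GB/s per socket")  # Xeon E7: 68.2 GB/s per socket
```

More, slower channels on the scale-up part keep aggregate bandwidth competitive while allowing much larger memory capacity, which is the property in-memory engines such as SAP HANA depend on.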
Intel® Ethernet Reduces Time to Process Large Data Sets

Trends and Challenges
Big data is hitting the enterprise with unprecedented volume, velocity, variety, complexity, and OPPORTUNITY.

Intel® Ethernet Solution
Up to 20x performance boost over legacy infrastructure with optimizations on Intel® Xeon® processors, Intel® SSD storage, and 10Gb Intel® Ethernet networking. 10 Gigabit Ethernet allows quicker import and export of large data sets for processing.

Moving the Data with 10GbE
[Diagram: virtualized servers (VMs on a hypervisor) connected by 10 ports of 1GbE versus 2 ports of 10GbE]
• Up to 80% reduction in cables & switch ports
• Up to 15% reduction in infrastructure costs
• Up to 2x improved bandwidth per server

Source: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/10gbe-10gbase-t-hadoop-clusters-paper.pdf
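The 80% and 2x figures follow from port arithmetic: consolidating 10 × 1GbE ports into 2 × 10GbE ports cuts the port count by 80% and doubles aggregate bandwidth (20 Gb/s versus 10 Gb/s). The sketch below reproduces that and adds an idealized line-rate transfer time for a hypothetical 1 TB data set; real throughput will be lower because of protocol overhead, link bonding limits and storage speed.

```python
def consolidation(old_ports, old_gbps, new_ports, new_gbps):
    """Compare two NIC configurations by aggregate bandwidth and port count."""
    old_bw = old_ports * old_gbps
    new_bw = new_ports * new_gbps
    port_reduction = 1 - new_ports / old_ports
    return old_bw, new_bw, port_reduction, new_bw / old_bw

def ideal_transfer_hours(terabytes, gbps):
    """Line-rate transfer time, ignoring protocol overhead (1 TB = 10**12 bytes)."""
    bits = terabytes * 1e12 * 8
    return bits / (gbps * 1e9) / 3600

old_bw, new_bw, reduction, speedup = consolidation(10, 1, 2, 10)
print(f"{old_bw} Gb/s -> {new_bw} Gb/s ({speedup:.0f}x), {reduction:.0%} fewer ports")
# 10 Gb/s -> 20 Gb/s (2x), 80% fewer ports
print(f"1 TB at 10 Gb/s: {ideal_transfer_hours(1, 10):.2f} h")
print(f"1 TB at 20 Gb/s: {ideal_transfer_hours(1, 20):.2f} h")
```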
Intel® CAS with Intel® SSD Solution
Added as a cache layer, Intel® CAS with Intel® SSDs accelerates big data workloads:
• 50X IOPS
• 3X TPC-C throughput performance
• 20X TPC-H throughput performance
Performance nearly equal to replacing all hard drives with SSDs, at significantly lower cost.
Source: http://www.intel.com/content/www/us/en/mission-critical/mission-critical-scalability-oracle-intel-brief.html
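A cache layer helps because hot blocks are served from flash while cold blocks stay on hard drives. The sketch below is a generic least-recently-used (LRU) read cache over a hypothetical block-access trace, meant only to show why a small cache can absorb most reads when accesses are skewed. It is not Intel CAS's actual caching policy; the capacity, the trace and the `hdd_read` stand-in are made up for illustration.

```python
from collections import OrderedDict

class LRUBlockCache:
    """Fixed-capacity read cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> data, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def read(self, block_id, backing_read):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = backing_read(block_id)  # slow path: fetch from the hard drive
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def hdd_read(block_id):
    """Hypothetical stand-in for a slow disk read."""
    return f"block-{block_id}"

# Hypothetical skewed trace: blocks 1-3 are hot, blocks 10-13 are read once.
trace = [1, 2, 3, 1, 2, 3, 10, 1, 2, 3, 11, 1, 2, 3, 12, 1, 2, 3, 13]
cache = LRUBlockCache(capacity=4)
for block in trace:
    cache.read(block, hdd_read)
print(f"hits={cache.hits} misses={cache.misses} ratio={cache.hit_ratio():.2f}")
# hits=12 misses=7 ratio=0.63
```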
Data Methods for the Right Data Structure
[Diagram: analytical paradigms mapped to data structure, from unstructured data (emerging technologies such as MapReduce/Hive) to structured data (relational database, EXALYTICS*)]
Intel® Distribution for Apache Hadoop* & Tools

Intel® Distribution for Apache Hadoop*
• File-based encryption in HDFS: up to 20x faster decryption with AES-NI*
• Role-based access control for Hadoop services
• Up to 8.5X faster Hive queries using HBase co-processor
• Optimized for SSD with Cache Acceleration Software
• Adaptive replication in HDFS and HBase
• Integrated text search with Lucene
• Simplified deployment & comprehensive monitoring
• Deployment of HBase across multiple datacenters
• Automated configuration with Intel® Active Tuner
• Detailed profiling of Hadoop jobs
• Simplified design of HBase schemas (added in 2.4)
• REST APIs for deployment and management (added in 2.4)

HiTune (URL): MapReduce jobs are instrumented, and the HiTune Controller coordinates the Instrument, Aggregation Engine and Report Engine stages.

HiBench (URL):
1. Micro benchmarks: Sort, WordCount, TeraSort
2. Web search: Nutch Indexing, PageRank
3. Machine learning: Bayesian Classification, K-Means Clustering
4. HDFS: Enhanced DFSIO

Result = many Hadoop optimization tips (IDF2012 presentation “Big Data Analytics on a Performance-optimized Hadoop Infrastructure”)
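TeraSort, one of the HiBench micro benchmarks, sorts data across many reducers by sampling the input to choose split points, so that each reducer receives a contiguous key range and the concatenated reducer outputs are globally sorted. The single-process sketch below imitates that sampling-based range partitioning; the sample size, seeds and reducer count are arbitrary assumptions, and Hadoop's TeraSort example implements the idea with its own input sampler and partitioner.

```python
import bisect
import random

def choose_split_points(keys, n_reducers, sample_size, seed=0):
    """Sample keys and pick n_reducers - 1 split points from the sorted sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / n_reducers
    return [sample[int(step * i)] for i in range(1, n_reducers)]

def partition(key, split_points):
    """Reducer index = number of split points <= key, so each reducer owns a key range."""
    return bisect.bisect_right(split_points, key)

def range_partitioned_sort(keys, n_reducers=4, sample_size=100):
    splits = choose_split_points(keys, n_reducers, sample_size)
    buckets = [[] for _ in range(n_reducers)]
    for key in keys:            # "map + shuffle": route each key to its range
        buckets[partition(key, splits)].append(key)
    for bucket in buckets:      # "reduce": each reducer sorts only its own range
        bucket.sort()
    return [k for bucket in buckets for k in bucket], buckets

rng = random.Random(42)
keys = [rng.randrange(1_000_000) for _ in range(1000)]
result, buckets = range_partitioned_sort(keys)
print([len(b) for b in buckets])  # roughly balanced bucket sizes
```

Sampling keeps the ranges roughly balanced without a full pass over the data, which matters because the slowest reducer determines when the whole job finishes.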

Life Sciences 2013: Key Industry Challenges and Solutions

Challenge: Many (most) applications are single-threaded, single address space.
Solution: Intel is delivering optimizations working with the open source community, and developing an NGS+HPC curriculum.

Challenge: Some algorithms scale quadratically with the size of the problem; large data sets exceed available memory and storage.
Solution: Innovations in acceleration, compute, storage, networking, security, and *-as-a-service.

Challenge: International collaboration is an imperative, and bioinformatics expertise is scarce.
Solution: Intel is working closely with the ecosystem to address enterprise-to-cloud transmission of terabyte payloads.

Challenge: Databases are distributed, data is siloed, and will likely stay that way.
Solution: Need for balanced compute infrastructure.
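To make the quadratic-scaling challenge concrete: an all-against-all comparison of n items (for example, pairwise comparison of samples or genomes) needs n(n-1)/2 comparisons, so doubling n roughly quadruples the work. The sketch below simply evaluates that count for a few hypothetical collection sizes.

```python
def pairwise_comparisons(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

for n in (1_000, 2_000, 10_000, 1_000_000):
    print(f"{n:>9,} items -> {pairwise_comparisons(n):>15,} comparisons")

# Doubling the input roughly quadruples the work.
ratio = pairwise_comparisons(2_000) / pairwise_comparisons(1_000)
print(f"2x items -> {ratio:.2f}x comparisons")
```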

Examples of Intel®-powered Servers in Big Data and Analytics
• Cisco* UCS Server¹ (Intel® Xeon® 5600): Cisco UCS server with EMC Greenplum MR software, an “enterprise-class” Hadoop* distribution that features technology from MapR
• Dell* PowerEdge* C Series² (Intel Xeon 5500/5600): the Dell | Cloudera* solution for Apache* Hadoop, sold pre-configured
• Oracle* Sun Fire* server³ (Intel Xeon E7-4800): Oracle Exalytics* In-Memory Machine, featuring the Oracle BI Foundation Suite and Oracle TimesTen In-Memory Database for Exalytics

¹ http://gigaom.com/cloud/ciscos-servers-now-tuned-for-hadoop/
² http://www.businesswire.com/news/home/20110804005376/en/Dell-Cloudera-Collaborate-Enable-Large-Scale-Data
³ http://www.itp.net/mobile/588145-oracle-unveils-exalytics-in-memory-machine
INTEL CONFIDENTIAL
Solution 4.0 – NGS Appliances
[Rack diagram: compute configurations of 16 cores, 96 GB RAM, 18 TB redundant storage and SSD for OS; and 32 cores, 1.2 TFlops and 18-56 TB RAID; NSS-HA pair with NSS user data; HSS metadata pair; HSS OSS pair; HSS user data; 2U plenum. Actual placement in racks may vary.]

Scale through independent solutions, each targeting a different segment & usage model.
NGS Appliance
Dell Scalable Unit “SANGER”
Infrastructure: Dell PE, PC & F10
• Dell NSS (NFS), up to 180 TB: NSS-HA pair, NSS user data
• Dell HSS (Lustre), up to 360 TB: HSS metadata pair, HSS OSS pair, HSS user data
• M420 (compute), up to 32 nodes
• 2U plenum
Single rack solution. Actual placement in racks may vary.

Challenge: Experiment processing takes 7 days with current infrastructure, delaying treatment for sick patients.
Solution: Dell Next Generation Sequencing Appliance
• 9 teraflops of Sandy Bridge processors
• Lustre file storage
• Intel SW tools and engineers
Benefits: RNA-Seq processing reduced to 4 hours

Includes everything you need for NGS: compute, storage, software, networking, infrastructure, installation, deployment, training, service & support.

Use Case: NEXTBIO
Analytics for Genomics Data
• Cost to sequence a genome has fallen by 800x in the last 4 years
• Each genome has ~4 million variants
• Growth in genomics data in the public and private domain
• Data available in a variety of sources
  – Structured, semi-structured, unstructured
• New aggregated data growing exponentially

[Diagram: pipeline stages: Sequencing (3 billion base pairs), Data Processing, Cloud Storage (millions of variants), Interpretation & Analytics (millions of variants, millions of patients) and Visualization, leading to commercialization of targeted therapeutics, companion diagnostics and actionable biomarkers]
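A quick size estimate for the "3 billion base pairs" stage: each of the four nucleotides A, C, G, T fits in 2 bits, so a 3-billion-base sequence packs into about 750 MB, versus about 3 GB at one byte per base. The sketch below shows 2-bit packing for a short sequence. It is a simplification that ignores N bases, quality scores and the many-fold read coverage of raw sequencing output, which is why real NGS data sets are far larger than this estimate.

```python
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"

def pack(seq):
    """Pack an A/C/G/T string into bytes, 4 bases per byte (2 bits each)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= ENCODE[base] << (2 * (i % 4))
    return bytes(out)

def unpack(data, length):
    """Inverse of pack(); length is needed because the last byte may be partial."""
    return "".join(DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))

seq = "ACGTTGCAAC"
packed = pack(seq)
assert unpack(packed, len(seq)) == seq
print(f"{len(seq)} bases -> {len(packed)} bytes")  # 10 bases -> 3 bytes

genome_bases = 3_000_000_000
print(f"~{genome_bases / 4 / 1e6:.0f} MB packed vs ~{genome_bases / 1e9:.0f} GB at 1 byte/base")
# ~750 MB packed vs ~3 GB at 1 byte/base
```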
Data-Intensive Discovery: Genomics

Value
• Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
• 90% gain in throughput; 6X data compression

Analytics
• Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
• Provide APIs for applications to combine and analyze public and private data sets

Data Management
• Use Hive and Hadoop for query and search
• Dynamically partition and scale HBase
• 10-node cluster with Intel Xeon E5 processors, 10GbE network, Intel Distribution
Nextbio & Intel Collaboration
Technical Challenge:
• Immutable data – write once, never change, read many times
• Traditional Bloom filters work
• Hadoop & HBase are well suited
• 1 genome → 10 million rows
• 100 genomes → 1 billion rows
• 1M genomes → 10 trillion rows
• 100M genomes → 1 quadrillion (1,000,000,000,000,000) rows
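Bloom filters suit write-once, read-many data: before looking up a variant key in a large store, a compact Bloom filter answers "definitely absent" or "possibly present", so lookups for absent keys can skip disk reads (HBase supports Bloom filters on its store files for this reason). Below is a minimal Bloom filter sketch; the bit-array size, the number of hash functions and the `chrom:pos:ref>alt` key format are hypothetical choices, and it derives hash positions from salted SHA-256 in the standard library rather than the hash functions HBase uses.

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: no false negatives, small chance of false positives."""

    def __init__(self, n_bits=8192, n_hashes=5):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key):
        # Derive n_hashes bit positions by hashing the key with different salts.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

# Hypothetical variant keys in a chrom:pos:ref>alt format.
variants = BloomFilter()
for key in ["chr1:12345:A>G", "chr2:67890:C>T", "chrX:555:G>A"]:
    variants.add(key)

print(variants.might_contain("chr1:12345:A>G"))  # True: added keys are always found
print(variants.might_contain("chr7:1:T>C"))      # False unless a rare false positive occurs
```

Because the data never changes after it is written, the filter never needs deletions, which is exactly the case where a plain Bloom filter works without extensions.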

The app can dynamically partition HBase as data size grows.

Intel Optimizations for Hadoop:
• Optimized Hadoop stack in open source
• Stabilize HBase to provide reliable, scalable deployment
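HBase scales a table by dividing it into sorted row-key ranges (regions) and splitting a region once it grows past a configured size. The in-memory sketch below mimics that range-based splitting so that "partition dynamically as data grows" is concrete. The threshold (counted in rows rather than bytes), the midpoint split rule and the `sample|chrom|position` row-key format are simplifying assumptions for illustration, not HBase's actual split policy or NextBio's schema.

```python
import bisect

MAX_ROWS_PER_REGION = 4  # hypothetical threshold; HBase splits on store size and policy

class RangePartitionedTable:
    """Keeps row keys in sorted ranges ("regions") and splits a range when it grows too big."""

    def __init__(self):
        self.start_keys = [""]  # region i holds keys >= start_keys[i] and < start_keys[i + 1]
        self.regions = [[]]     # each region is a sorted list of row keys

    def _region_index(self, row_key):
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        idx = self._region_index(row_key)
        region = self.regions[idx]
        bisect.insort(region, row_key)
        if len(region) > MAX_ROWS_PER_REGION:
            mid = len(region) // 2
            # Split at the middle key: the upper half becomes a new region.
            self.regions[idx:idx + 1] = [region[:mid], region[mid:]]
            self.start_keys.insert(idx + 1, region[mid])

table = RangePartitionedTable()
for sample in range(1, 4):
    for pos in (100, 200, 300):
        table.put(f"sample{sample:03d}|chr1|{pos:09d}")
print(len(table.regions), table.start_keys)
```

Splitting by key range keeps each region small enough to serve from one node while preserving sorted order, so range scans (for example, all variants of one sample) stay efficient as the table grows.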

Putting it together
Software Frameworks
Applications
Programming Model (abstraction)
Virtualization
System Software and Resource Management
Computer Hardware, Storage and Networks
Summary
• Enabling an ecosystem of partners to innovate and make the Personalized Medicine vision a reality
• Delivering hardware-enhanced capabilities and software to deploy Personalized Medicine
• Working with Big Data vendors to onboard an increasing number of life science workloads to Hadoop and other analytics technologies
Q&A

GAURAV.KAUL@INTEL.COM

 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Dernier (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Jax 2013 - Big Data and Personalised Medicine

  • 1. Big Data in Genomics and Personalized Medicine – Challenges and Solutions. Gaurav Kaul, Software Architect, Intel. JAX London 2013
  • 2. Agenda: Global Healthcare Trends; The Rise of Personalized Medicine; Big Data Scenarios in Healthcare; Methods to Manage Big Data; Use Cases; Summary and Next Steps. *Other names and brands may be claimed as the property of others
  • 3. We are at an Inflection Point in Healthcare - TRENDS. Global AGING: the share of the population aged 60+ grows from 10% to 21% by 2050 (map: % of population over age 60 in 2050, in bands of 0-9%, 10-19%, 20-24%, 25-29% and 30+%; worldwide average 21%; source: United Nations, “Population Aging 2002”). Healthcare costs are RISING: a significant % of GDP. US healthcare BIG DATA value: $300 billion in value per year, ~0.7% annual productivity growth. Sources: McKinsey Global Institute Analysis; ESG Research Report 2011 – North American Health Care Provider Market Size and Forecast.
  • 4. We are at an Inflection Point in Healthcare - TRENDS. Storage Growth (chart): Total Data, Healthcare Providers (PB), 2010-2015, broken down into Admin, Imaging, EMR, Email, File, Non-Clinical Imaging and Research; inset: Medical Imaging Archive Projection, a case from just one healthcare system. The data explosion is projected to reach 35 zettabytes by 2020, a 44-fold increase from 2009. Sources: McKinsey Global Institute Analysis; ESG Research Report 2011 – North American Health Care Provider Market Size and Forecast.
  • 5. Sequencing Cost Trend
  • 7. Vision for Personalized Medicine
  • 8. How can we take Personalized Medicine mainstream by 2020?
  • 9. A “bioinformatics computing system” includes technologies from this entire “stack”: Software Frameworks; Applications; Programming Model (abstraction); Virtualization; System Software and Resource Management; Computer Hardware, Storage and Networks
  • 10. A “bioinformatics computing system” includes technologies from this entire “stack”: Software Frameworks; Applications; Programming Model (abstraction); Virtualization; System Software and Resource Management; Computer Hardware, Storage and Networks. Multiple cores: shared memory, multiple threads, OpenMP. Multiple nodes: MPI; GAS, PGAS; Hadoop. Examples: galaxy.psu.edu; “Searching for SNPs with cloud computing”, Langmead, Schatz et al.
  • 11. The Crossbow Pipeline
  • 12. Big Data – A Foundation For Delivering Big Value. Big Data Building Blocks (energy efficient, responsive compute, choice, high availability, secure). Compute: Intel® Xeon® Product Family E3, E5-E7; Intel® Atom™; Xeon Phi™. Network: Intel® Ethernet Controllers; Ethernet Adapters; Intel® Ethernet Switch Silicon; Intel® True Scale Fabric. Storage: Intelligent Storage¹; Intel® Scale-out Storage¹; Scale-up Storage¹; Intel® SSD 710 series, DC S3700 (SATA); Intel® SSD 910 series (PCIe). Software & Technologies: Intel® Distribution for Apache Hadoop; Intel® Node Manager; Intel® Data Center Manager; Intel® Expressway Service Gateway; Intel® Cache Acceleration Software; Intel’s Lustre; Intel® VT and Intel® TXT; Intel® AES-NI. Intel’s Foundational Technologies Offer Advanced Solutions for Big Data Analytics. ¹Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors.
  • 13. Big Data Compute Platform Optimizations. Intel® Xeon® E5 Family: up to 8 cores; up to 20 MB cache; up to 4 channels DDR3 1600 MHz memory; Integrated PCI Express* 3.0, up to 40 lanes per socket; SCALE-OUT with Hadoop and analytic/DW engines. Proof points: E5 Analytics 25X Improvement; Hadoop on E5. Intel® Xeon® E7 Family (e.g. Xeon E7-4800): up to 10 cores; up to 30 MB cache; up to 8 channels DDR3 1066 MHz memory; 4 QPI 1.0 lanes for robust scalability; SCALE-UP in-memory analytic engines and databases: Oracle*, SAS*, SAP HANA*. Proof point: SAP HANA.
  • 14. Big Data – A Foundation For Delivering Big Value. Intel® Ethernet Reduces Time to Process Large Data Sets. Trends and challenges: big data is hitting the enterprise with unprecedented volume, velocity, variety, complexity, and OPPORTUNITY; legacy: 1GbE network connections. Intel® Ethernet solution: up to 20x performance boost over legacy infrastructure with optimizations on Intel® Xeon® processors, Intel® SSD storage, and 10Gb Intel® Ethernet networking; 10 Gigabit Ethernet allows quicker import and export of large data sets for processing. Moving the data with 10GbE (diagram: VMs on hypervisors, 10 ports of 1GbE vs. 2 ports of 10GbE): up to 80% reduction in cables & switch ports; up to 15% reduction in infrastructure costs; up to 2x improved bandwidth per server.¹ ¹http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/10gbe-10gbase-t-hadoop-clusters-paper.pdf
  • 15. Big Data – A Foundation For Delivering Big Value. Intel® CAS with Intel® SSD Solution: added as a cache layer, it accelerates Big Data workloads: 50X IOPS, 3X TPC-C and 20X TPC-H throughput performance, near-equal to replacing all hard drives with SSDs at significantly lower cost. http://www.intel.com/content/www/us/en/mission-critical/mission-critical-scalability-oracle-intel-brief.html
  • 16. Big Data – A Foundation For Delivering Big Value. Data Methods for the Right Data Structure, from unstructured to structured data: Emerging Technologies (MapReduce/Hive); Relational Database; Analytical Paradigms (EXALYTICS*).
  • 17. Big Data – A Foundation For Delivering Big Value. Intel® Distribution for Apache Hadoop* & Tools. Distribution features: file-based encryption in HDFS, with up to 20x faster decryption with AES-NI*; role-based access control for Hadoop services; up to 8.5X faster Hive queries using an HBase co-processor; optimized for SSD with Cache Acceleration Software; adaptive replication in HDFS and HBase; integrated text search with Lucene; simplified deployment & comprehensive monitoring; deployment of HBase across multiple datacenters; automated configuration with Intel® Active Tuner; detailed profiling of Hadoop jobs; simplified design of HBase schemas (+ in 2.4); REST APIs for deployment and management (+ in 2.4). HiTune (URL), for MapReduce: instrumentation, aggregation engine, report engine, HiTune controller. HiBench (URL) workloads: (1) micro benchmarks: Sort, WordCount, TeraSort; (2) web search: Nutch Indexing, Page Rank; (3) machine learning: Bayesian Classification, K-Means Clustering; (4) HDFS: Enhanced DFSIO. Result = many Hadoop optimization tips (IDF2012 presentation “Big Data Analytics on a Performance-optimized Hadoop Infrastructure”).
  • 18. Life Sciences 2013: Key Industry Challenges and Solutions. (1) Many (most) applications are single-threaded, single address space → Intel is delivering optimizations working with the open source community and developing an NGS+HPC curriculum. (2) Some algorithms scale quadratically with the size of the problem, and large data sets exceed available memory and storage → innovations in acceleration, compute, storage, networking, security, and *-as-a-service. (3) International collaboration is an imperative, and bioinformatics expertise is scarce → Intel is working closely with the ecosystem to address enterprise-to-cloud transmission of terabyte payloads. (4) Databases are distributed, data is siloed, and it will likely stay that way → need for a balanced compute infrastructure.
  • 19. Examples of Intel®-powered Servers in Big Data and Analytics. Cisco* UCS Server¹ (Intel® Xeon® 5600): Cisco UCS server with EMC Greenplum MR software, an “enterprise-class” Hadoop* distribution that features technology from MapR. Dell* PowerEdge* C Series² (Intel Xeon 5500/5600): the Dell | Cloudera* solution for Apache* Hadoop, sold pre-configured. Oracle* Sun Fire* server³ (Intel Xeon E7-4800): Oracle Exalytics* In-Memory Machine, featuring the Oracle BI Foundation Suite and Oracle TimesTen In-Memory Database for Exalytics. ¹http://gigaom.com/cloud/ciscos-servers-now-tuned-for-hadoop/ ²http://www.businesswire.com/news/home/20110804005376/en/Dell-Cloudera-Collaborate-Enable-Large-Scale-Data ³http://www.itp.net/mobile/588145-oracle-unveils-exalytics-in-memory-machine INTEL CONFIDENTIAL
  • 20. Solution 4.0 – NGS Appliances. Scale through independent solutions, each targeting a different segment & usage model: 16 cores, 96 GB RAM, 18 TB redundant storage, SSD for OS; 32 cores, 1.2 TFlops, 18-56 TB RAID; NSS-HA pair with NSS user data; HSS metadata pair, HSS OSS pair and HSS user data; 2U plenum. Actual placement in racks may vary.
  • 21. NGS Appliance: Dell Scalable Unit “SANGER”. Infrastructure: Dell PE, PC & F10. Challenge: experiment processing takes 7 days with current infrastructure, which delays treatment for sick patients. Solution: Dell Next Generation Sequencing Appliance: 9 teraflops of Sandy Bridge processors; Lustre file storage; Intel SW tools and engineers. Benefits: RNA-Seq processing reduced to 4 hours. Single-rack solution: Dell NSS (NFS) HA pair with NSS user data (up to 180 TB); Dell HSS (Lustre) metadata pair, OSS pair and user data (up to 360 TB); M420 compute (up to 32 nodes); 2U plenum. Actual placement in racks may vary. Includes everything you need for NGS: compute, storage, software, networking, infrastructure, installation, deployment, training, service & support.
  • 23. Use Case: NEXTBIO Analytics for Genomics Data. The cost to sequence a genome has fallen by 800x in the last 4 years; each genome has ~4 million variants; genomics data is growing in the public and private domain; data is available from a variety of sources (structured, semi-structured, unstructured); new aggregated data is growing exponentially. Pipeline: Sequencing (3 billion base pairs), Data Processing, Cloud Storage, Visualization, and Interpretation & Analytics, scaling from millions of variants to millions of patients, leading to Commercializing: targeted therapeutics, companion diagnostics, actionable biomarkers.
  • 24. Data-Intensive Discovery: Genomics. Value: enable researchers to discover biomarkers and drug targets by correlating genomic data sets; 90% gain in throughput; 6X data compression. Analytics: provide curated data sets with pre-computed analysis (classification, correlation, biomarkers); provide APIs for applications to combine and analyze public and private data sets. Data management: use Hive and Hadoop for query and search; dynamically partition and scale HBase. Infrastructure: 10-node cluster, Intel Xeon E5 processors, 10GbE network, Intel Distribution.
  • 25. Use Case: NEXTBIO. NextBio & Intel Collaboration. Technical challenge: immutable data, written once, read many times and never changed; traditional Bloom filters work; Hadoop & HBase are well suited. 1 genome → 10 million rows; 100 genomes → 1 billion rows; 1M genomes → 10 trillion rows; 100M genomes → 1 quadrillion (1,000,000,000,000,000) rows. The app can dynamically partition HBase as the data size grows. Intel optimizations for Hadoop: optimized Hadoop stack in open source; stabilize HBase to provide a reliable, scalable deployment.
  • 26. Putting it together: Software Frameworks; Applications; Programming Model (abstraction); Virtualization; System Software and Resource Management; Computer Hardware, Storage and Networks
  • 27. Summary: enabling an ecosystem of partners to innovate and make the Personalized Medicine vision a reality; delivering hardware-enhanced capabilities and software to deploy Personalized Medicine; working with Big Data vendors to onboard an increasing number of life science workloads to Hadoop and other analytics technologies.
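Slide 25 notes that the genomic variant data is written once, read many times and never changed, and that traditional Bloom filters work well for it. As an illustration only, here is a minimal Python sketch of a Bloom filter, the probabilistic set structure that remark refers to. It is not NextBio's, Intel's or HBase's implementation, and the row-key format `sample:chromosome:position` is a hypothetical example. Because the data is never updated in place, a filter can be built once per immutable file and consulted before any expensive read: a negative answer is always correct, while a positive answer is occasionally a false positive.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hashed bit positions per key.

    might_contain() never returns False for a key that was added (no false
    negatives); it may return True for a key that was never added, at roughly
    the false-positive rate chosen at construction time.
    """

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        n, p = expected_items, false_positive_rate
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd, so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely never added"; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    # Hypothetical write-once row keys: "<sample>:<chromosome>:<position>".
    bf = BloomFilter(expected_items=10_000, false_positive_rate=0.01)
    for i in range(10_000):
        bf.add(f"sample{i:05d}:chr1:{1000 + i}")
    print(bf.might_contain("sample00042:chr1:1042"))  # True: added keys are always found
    print(bf.num_bits, bf.num_hashes)  # 95851 7
```

HBase offers the same idea on its store files, configured per column family, so that a point lookup can skip files that cannot contain the requested row; the constructor above uses the standard sizing formulas for a target false-positive rate.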

Speaker notes

  1. Our main building blocks consist of the following. Server: the Xeon family of processors consists of the E3, E5 and E7 product lines, which offer different combinations of capabilities and price points for different workloads. The upcoming Intel MIC (Many Integrated Core) processor is targeted primarily at the portion of the HPC market that values maximum parallel processing density, such as…. And our Atom line aims at the low-cost, low-power, ultra-dense microserver market, where node density is paramount. Networking: Intel is the industry’s #1 seller of 1GbE and 10GbE adapters and silicon, and also offers a family of industry-leading, low-latency 10GbE/40GbE switch silicon products. Storage: one of the biggest trends in storage is the increasing use of compute within the storage box to reduce latencies and to lower the overall cost/GB of storage through more efficient storage. For large data sets and for storage workloads requiring the lowest latencies, Xeon is the industry choice: Xeon provides the compute capability in over 80% of the storage market. And Intel enterprise SSDs are designed for the demanding performance and endurance needs of the datacenter. Software and other technologies: we are developing strong open-source components such as our Intel Distribution of Hadoop. Intel Datacenter Manager enables better power management at the server, rack and datacenter level. Advanced RAS (reliability, availability and serviceability) features ensure high levels of system resiliency and availability. And Intel’s heavy investment in industry enabling ensures these become available in the widest choice of systems. The most popular are general-purpose systems, but many of our partners innovate further to create highly workload-optimized platforms and converged architecture systems. The greater level of bundling and integration in these systems allows for simpler and faster deployments and ongoing maintenance. Now let’s look at the specific building blocks….
  2. Field note: There are a few hyperlinks on this presentation in the blue boxes. The first link in E5 leads to a solution showing a 25x increase in data analytics running on Intel architecture, which shows the capability of the new Xeon E5 processor family, using AVX technology and a variety of other performance optimizations from IBM. The second link in E5 leads to a solution brief highlighting how Intel® Xeon® E5 processor based servers running Hadoop are at least three times faster than the previous solution: they can load, sort, and perform their data analyses faster, and Intel® Hyper-Threading Technology really helps with Hadoop workloads. The link in the E7 proof point is focused on a scale-up in-memory analytics solution, SAP HANA, running on Intel’s Xeon E7 processor family. All these proof points help the customer understand the power and versatility of our processor solutions for Big Data. Key points: significant performance gains are delivered by features such as the new Intel® Advanced Vector Extensions and improved Intel® Turbo Boost Technology 2.0. To improve flexibility and operational efficiency, I/O is significantly improved with the new Intel® Integrated I/O, which reduces latency ~30% while adding more lanes and higher bandwidth with support for PCI Express 3.0. Story: to meet the growing demands of IT, such as readiness for cloud computing, the growth in users and the ability to tackle the most complex technical problems, Intel has focused on increasing the capabilities of the processor that lies at the heart of a next-generation data center. The Intel Xeon processor E5-2600 product family is the next-generation Xeon processor that replaces platforms based on the Intel Xeon processor 5600 & 5500 series.
Continuing to build on the success of the Xeon 5600, the E5-2600 product family has increased core count and cache size, in addition to supporting more efficient instructions with Intel® Advanced Vector Extensions, to deliver up to an average of 80% more performance across a range of workloads. These processors will offer better than ever performance no matter what your constraint is (floor space, power or budget) and on workloads that range from the most complicated scientific exploration to simple, yet crucial, web serving and infrastructure applications. In addition to the raw performance gains, we’ve invested in improved I/O with Intel Integrated I/O, which reduces latency ~30% while adding more lanes and higher bandwidth with support for PCIe 3.0. This helps to reduce network and storage bottlenecks to unleash the performance capabilities of the latest Xeon processor. The Intel® Xeon® processor E5-2600 product family: versatile processors at the heart of today’s data center. Let’s look at just what kind of performance these products are capable of… Legal info: Configuration for 80% claim: Source: Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Baseline score of 271 published by Itautec on the Servidor Itautec MX203* and Servidor Itautec MX223* platforms based on the prior generation Intel® Xeon® processor X5690. New score of 492 submitted for publication by Dell on the PowerEdge T620 platform and Fujitsu on the PRIMERGY RX300 S7* platform based on the Intel® Xeon® processor E5-2690. For additional details, please visit www.spec.org. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document.
Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase. Configuration for latency reduction: Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing the Intel® Xeon® processor E5-2600 product family (230 ns) vs. the Intel® Xeon® processor 5500 series (340 ns). Baseline Configuration: Green City system with two Intel® Xeon® processor E5520 (2.26GHz, 4C), 12GB memory @ 1333, C-States Disabled, Turbo Disabled, SMT Disabled. New Configuration: Meridian system with two Intel® Xeon® processor E5-2665 (2.4GHz, 8C), 32GB memory @1600 MHz, C-States Enabled, Turbo Enabled. The measurements were taken with a LeCroy* PCIe* protocol analyzer using Intel internal Rubicon (PCIe* 2.0) and Florin (PCIe* 3.0) test cards running under Windows* 2008 R2 w/SP1.
  3. Field note: There is a link to a proof point on this slide. Intel IT has a whitepaper on the performance benefits of 10GbE on Apache Hadoop. This whitepaper is at our Intel IT Resource Center, which is useful in many ways for your customer; we would recommend pointing the customer to this site for answers to a variety of questions and configurations. Up to 20x performance boost over legacy infrastructure with optimizations on Intel® Xeon processors, SSD storage, and 10GbE networking. 10 Gigabit Ethernet (10GbE) networks allow you to quickly import large data sets for processing in multiple locations. Network: 10GbE networking demonstrates its value in the form of high levels of network utilization in the Hadoop cluster. The full use of greater bandwidth can reduce the time to ingest and to export data by 80 percent. Moreover, the cost per gigabit of bandwidth with 10GbE is now much lower than with 1GbE, making it a natural choice for big data. Much of the performance gain from the underlying hardware requires deep optimization in the software as well as careful tuning of Hadoop configuration parameters. The Intel Distribution is optimized with the latest Intel® processor, storage, and networking hardware components to ensure that the platform delivers balanced performance for the widest range of use cases. The need for a balanced system: Hadoop is designed and optimized for commonly available hardware. The pace of server innovation has continued unabated for many years, and mainstream systems now deliver massive processing power. To keep pace with that capability, it is vital to deploy Hadoop in the environment it was designed for, one that is balanced between compute, storage, and networking. Hadoop* is increasingly popular for processing big data. Dramatic improvements in mainstream compute and storage resources help make Hadoop clusters viable for most organizations.
But to provide a balanced system, those building blocks must be complemented by 10 Gigabit Ethernet (10GbE), rather than legacy Gigabit Ethernet (GbE) networking. This study found success by building on a 10GBASE-T foundation that combines Arista switches, Intel® Ethernet 10 Gigabit Converged Network Adapters, and Intel® Xeon® processor based servers. In the area of networking for this balanced system, the performance of GbE implementations for Hadoop has been a major limiting factor to overall performance. Using a large block size means that, for example, when a packet is dropped and retransmitted, the system needs to handle a large piece of data, which strains network bandwidth in a GbE environment. 10GbE networking proves its value in Hadoop clusters through high observed levels of network utilization, demonstrating the benefit of the higher bandwidth. 4x increase in write performance: the Hadoop* PUT operation completed in 80 percent less time using 10 Gigabit Ethernet, compared to Gigabit Ethernet.
  4. Field note: There is a link to Intel CAS throughput performance data in the backup of this presentation. Field note: There is a link to a proof point for the performance of Intel SSDs on Oracle TimesTen. This is a useful whitepaper that shows how adding SSDs to a system configuration saves on both hardware acquisition and software license costs, paying back the initial investment many times over. There are a variety of new opportunities for solid state disk technologies in the enterprise, and these are enhanced by our new Intel CAS software. Intel Solid State Drives come in a variety of form factors and have enterprise-class levels of reliability, along with capacities that are near those of fast rotating media. They can be used as a direct replacement for rotating media. For high-performance needs in the datacenter, Intel SSDs are a great solution that will likely pay for themselves in a short time; we have a pointer to an example that uses Oracle TimesTen if you’re interested in further information or examples. For some applications, adding the Intel Cache Acceleration Software (Intel CAS) solution enables an SSD to act as a local buffer for data on rotating media in the server. This lets you add a minimum of cost and get performance at near-SSD levels for all your data, which is a good hybrid solution for cost-conscious deployments. We can look at the performance data in backup if you’re interested.
  5. Key Message: Whatever the solution, Intel is actively working with partners to optimize solutions for analyzing the huge variety of data, providing new insight models, and delivering real-time or near real-time information services. Intel is at the core of Big Data across provisioning models and in understanding the right data methods for the right data structure. In the last 24 months there has been more innovation in the DB product market than at any time in the last 10 years. While the locality and distribution of compute, storage and IO platforms may vary, Intel has been actively working to optimize its technology portfolio within relational databases, emerging technologies and the analytical engines that are commercially available.
  6. While Intel has started working in Big Data with a distribution of Apache Hadoop, you should not assume that this is the only thing we plan to do. It's useful to look at what we're doing and understand the type of capability we can bring to your company with our optimized tools.
We are currently focusing our IDH (Intel Distribution for Apache Hadoop) efforts on adding key functionality that we can uniquely provide. For instance, we have added AES-NI support to the distribution, which makes encryption of the data set up to 20x faster. In other words, you can encrypt your data essentially "for free" in terms of performance, securing it without a penalty.
We are also using our Intel CAS software to optimize data acquisition, and we are adding a variety of other features. Many of these features will be contributed back to Apache open source, providing broader benefit. If you are interested in our Hadoop roadmap, we would be happy to set up a more detailed meeting with our team.
Note to field: There is an additional slide in the backup on the Intel Lustre file system distribution, another example of where Intel is contributing to Big Data, specifically in open-source file systems for better performance.
Intel Tools for Apache Hadoop: getting under the hood of Hadoop for tuning and insight.
HiTune: monitors key performance metrics on each server in the cluster, then aggregates and correlates these low-level indicators with high-level data-flow models, providing insight into performance bottlenecks, hardware problems, application hot spots, and more.
HiBench: measures, validates, and compares the performance of Hadoop clusters across a variety of workloads. Cluster performance can be measured for specific common tasks such as sorting, word counting, web searching, and data analytics.
Distributed Hadoop environments can be challenging to fine-tune because the framework automatically handles data partitioning, load balancing, fault tolerance, and other low-level operations. Intel recently introduced two open-source tools, HiBench and HiTune, to help optimize Hadoop clusters for faster analytics.
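Word counting, one of the HiBench workloads listed above, is the classic MapReduce example. As a rough, single-process illustration of the map, shuffle, and reduce phases it exercises (a sketch under our own assumptions, not HiBench's or Hadoop's actual code; all function names are hypothetical):

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (word, 1) pairs for one input line, as a Hadoop mapper would."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run map over every line, then shuffle and reduce the intermediate pairs."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["the genome is big data", "big data in the clinic"]
    print(sorted(word_count(docs).items()))
```

On a real cluster the map and reduce calls run in parallel across nodes and the shuffle moves intermediate data over the network, which is where partitioning and load balancing become the tuning concerns that tools like HiTune are meant to expose.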
  7. 1) Many (most) applications are single-threaded.
2) Many (most) applications are written for a single address space.
NGS-size data quickly pushes 1) and 2) beyond the capacity of a single node, which calls for multiple threads and a large memory footprint.
Some algorithms (Smith-Waterman, for example) scale quadratically with the size of the problem, motivating algorithmic substitution or hardware acceleration.
Cloud: Building in house means capital equipment investment, data center operating costs, and fixed capacity for growing workloads. Building in the cloud offers elastic, hourly capacity expansion, but brings challenges around management, ease of use, and data movement. How can cloud resources best be leveraged in the HPC business process, for example as a service?
Working subsets are growing too large to fit into available memory. Mapping/aligning with Burrows-Wheeler (BW) and assembly with De Bruijn graphs are good examples, motivating algorithmic innovations and novel approaches to large-memory computers.
The amount of data barely fits into currently available disk space (and soon might not).
Databases are distributed and will likely stay that way, motivating much talk of "bringing the computing to the data", of preprocessing for downstream upload, etc.
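The quadratic scaling of Smith-Waterman (SW) comes from its dynamic-programming table, which has one cell for every pair of positions in the two sequences. A minimal Python sketch of the score computation follows; the linear gap penalty (a common simplification) and the match/mismatch/gap values are illustrative assumptions, and no traceback of the alignment itself is performed.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Fills a (len(a)+1) x (len(b)+1) table, so time is O(len(a) * len(b)).
    Linear gap penalty; scoring values are illustrative assumptions.
    """
    cols = len(b) + 1
    # Only the previous row is needed for the score, so memory stays O(len(b)).
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero (start fresh).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # prints 8: the shared "ACGT" gives 4 matches x 2
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
    # Cells filled grow with the product of the lengths: 2x longer inputs -> 4x the work.
    for n in (100, 200, 400):
        print(f"{n} x {n} -> {n * n} cells")
```

For read-against-genome alignment, where one sequence has billions of bases, this product is what makes SW impractical without acceleration, and why index-based approaches such as Burrows-Wheeler mapping are used instead.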
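De Bruijn graph assembly is memory-hungry because every distinct k-mer in the reads must be held in memory at once. A toy Python sketch of the data structure follows; the reads, the value of k, and the simplified contig walk (which simply stops at branches and cycles) are illustrative assumptions. Real assemblers also have to handle sequencing errors, reverse complements, and repeats.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every overlapping substring of length k from a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix.

    Returns a dict mapping each node to the set of its successor nodes. All distinct
    k-mers are stored, so memory grows with genome size and read diversity.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ATGGC", "GGCGT", "CGTAA"]  # hypothetical overlapping reads
    graph = build_de_bruijn(reads, k=3)
    print(walk_unambiguous(graph, "AT"))  # prints ATGGCGTAA
```

Overlapping reads collapse into shared edges, so the graph stays smaller than the raw reads; but at human-genome scale the number of distinct k-mers still runs into the billions, which is the memory pressure described above.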
  8. Cisco* UCS Server¹: Intel® Xeon® 5600
Dell* PowerEdge* C Series²: Intel Xeon 5500/5600. The Dell | Cloudera* solution for Apache* Hadoop combines Dell servers and networking components with Cloudera's Distribution Including Apache Hadoop (CDH), as well as management tools, training, technology support, and professional services, giving customers a single source to deploy, manage, and scale a comprehensive Apache Hadoop-based stack.
Oracle* Sun Fire* server³: Intel Xeon E7-4800. The Oracle Exalytics* In-Memory Machine features the Oracle BI Foundation Suite and Oracle TimesTen In-Memory Database for Exalytics, enhanced for an Oracle server designed for in-memory analytics. It contains 1 TB of RAM, 40 Gb/s InfiniBand and 10 Gb/s Ethernet connectivity, and Integrated Lights Out Management.
  9. IMS demo unit provided to BioTeam, configured with:
3 blades, each with dual 5650 CPUs, 24 GB of RAM, and 4 GbE NICs
Dual Ethernet switches
7 x 600 GB Intel 320 Series SSD drives
Turnkey solution: MiniLIMS + local analysis engine
Plan is to link to cloud resources: automatic backup and link to hosted MiniLIMS
Will ship with Ion Torrent initially
Solution for any lab needing LIMS
  10. The cost to sequence a full genome will soon reach $1000: http://www.youtube.com/watch?v=F27BvqqNcY4