Open source Apache Hadoop is a great framework for distributed processing of large data sets. But there’s a difference between “playing” with big data and solving real problems. The reality is that Hadoop alone is not enough. In fact, almost every organization that plans to use Hadoop in production quickly discovers that it lacks features required for enterprise use. And few have the Hadoop specialists on hand to navigate the complexity and build reliable, robust applications. As a result, many Hadoop projects never make it to production, with executives saying, “we just don’t have the skills.” In this session, we will discuss these enterprise capabilities and why they’re important: analytics, visualization, security, enterprise integration, developer/admin tools, and more. Additionally, we will share several real-world examples of clients who have found it necessary to use an enterprise-grade Hadoop platform to tackle some of the most interesting and challenging business problems.
2. Big Data is the next Natural Resource
Harvesting any resource requires Mining, Refining and Delivering
“We have for the first time an economy
based on a key resource (Information)
that is not only renewable, but self-generating.
Running out of it is not a problem, but
drowning in it is.”
— John Naisbitt
Cost efficiently
processing the
growing Volume
300x
2005 to 2020. Source: IDC
Responding to the
increasing Velocity
19 Billion
RFID
sensors and
counting
Source: RFID Forecasts
Collectively
analyzing the
broadening Variety
Source: IBM Market Information
80% of the
world’s data
is unstructured
Establishing the
Veracity of big
data sources
1 in 3 business leaders don’t trust
the information they use to make
decisions
Source: IBM. BAO for the Intelligent Enterprise
40 ZB
3. What if…
You could detect neonatal
infections sooner?
24 hours
earlier detection of infections
Big Data enabled doctors from the University of Ontario Institute of Technology to apply neonatal infant
monitoring to predict infection in the ICU 24 hours in advance
Solution
120 children monitored: 120K messages
per second, billions of messages per day
9. Emerging Pattern of Big Data Implementation
Ingest
Landing and Analytics Sandbox Zone
Indexes,
facets
Hive/HBase
Col Stores
Documents
In Variety
of Formats
Analytics
MapReduce
Repository, Workbench
Ingestion and Real-time Analytic Zone
Data
Sinks
Filter, Transform
Ingest
Correlate, Classify
Extract, Annotate
Warehousing Zone
Enterprise
Warehouse
Data Marts
Query
Engines
Cubes
Descriptive,
Predictive
Models
Models
Widgets
Discovery,
Visualizer
Search
Analytics and
Reporting Zone
Metadata and Governance Zone
Connectors
10. Big Data Exploration
Find, visualize, understand
all big data to improve
decision making
Enhanced 360° View
of the Customer
Extend existing customer
views (MDM, CRM, etc) by
incorporating additional
internal and external
information sources
Operations Analysis
Analyze a variety of machine
data for improved business results
Data Warehouse Augmentation
Integrate big data and data warehouse
capabilities to increase operational efficiency
Security/Intelligence
Extension
Lower risk, detect fraud
and monitor cyber security
in real-time
The 5 Key Use Cases
11. Cloud | Mobile | Security
Big Data Platform and Application Framework
Gather, extract and
explore data using
best of breed
visualization
Speed time to value
with analytic and
application
accelerators
BI /
Reporting
Exploration /
Visualization
Functional
App
Industry
App
Predictive
Analytics
Content
Analytics
Analytic Applications
IBM Big Data Platform
Systems
Management
Applications &
Development
Visualization
& Discovery
Analyze streaming
data and large data
bursts for real-time
insights
Govern data quality
and manage
information lifecycle
Cost-effectively
analyze
petabytes of
structured and
unstructured
information
Deliver deep insight
with advanced
in-database analytics
and operational
analytics
Accelerators
Information Integration & Governance
Hadoop
System
Stream
Computing
Data
Warehouse
Contextual
Discovery
Index and federated
discovery for
contextual
collaborative insights
University of Ontario Institute of Technology
http://www.youtube.com/watch?v=YosyLqbCrD4
ftp://public.dhe.ibm.com/common/ssi/ecm/en/odc03157usen/ODC03157USEN.PDF [UOIT Case study]

Fifteen million babies are born prematurely every year. Of those, over 1 million die, often in the first month of life. Many of these babies are in ICUs, connected to numerous monitors that measure key statistics such as heart rate and temperature. Until recently, these measurements were only sampled and aggregated into 2-3 readings to indicate the health of the baby. IBM collaborated with UOIT to develop a solution that processes 1000 pieces of information/sec, identifies patterns, correlates them with doctors’ notes and family history, and applies predictive analytics. This has allowed us to spot the onset of an infection 24 hours in advance. Same data, but saved lives.

-----------------------------------------------------

To better detect subtle warning signs of complications, clinicians need greater insight into the moment-by-moment condition of neonatal infants in an ICU. Fifteen million babies, one in 10 births, are born prematurely every year, a global project led by the WHO suggests. Of those, over 1 million die, often in the first 30 days of life, a terrible tragedy. Yet many of these babies are in NICUs, connected to all sorts of monitors that measure key statistics such as heart rate, skin temperature and respiration. These measurements add up to 90M/patient/day, yet most of this data is just sampled periodically and written into the patient record, not used for its predictive value.

IBM and UOIT developed a first-of-its-kind analytics solution that uses stream computing to capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues. Early warning gives caregivers the ability to deal proactively with potential complications, such as detecting infections in premature infants up to 24 hours before they exhibit symptoms. The solution monitors 120 children, analyzing 120K messages per second, billions of messages per day. Trials are expanding beyond Canada to include hospitals in the US, China and Australia.

IBM Innovate 2013 07/10/13 16:10 Drury Design Dynamics
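
As a rough illustration of the stream-computing idea in the notes above, the sketch below keeps a small rolling window over incoming monitor readings and raises an alert when a condition holds. The notes do not describe the actual analytics, so the alert rule (low variability across the window), the function names, the window size and the threshold are all hypothetical. The helper at the top only checks that the quoted 120K messages per second is consistent with "billions of messages per day".

```python
from collections import deque
from statistics import pstdev

MESSAGES_PER_SECOND = 120_000      # figure quoted in the notes
SECONDS_PER_DAY = 24 * 60 * 60


def messages_per_day(rate_per_second=MESSAGES_PER_SECOND):
    """Convert a per-second message rate into a per-day volume."""
    return rate_per_second * SECONDS_PER_DAY


def low_variability_alerts(readings, window=5, min_stdev=1.0):
    """Yield (index, stdev) when the last `window` readings barely vary.

    Hypothetical rule for illustration only; not the UOIT/IBM method.
    """
    recent = deque(maxlen=window)  # holds only the newest `window` readings
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            spread = pstdev(recent)  # population standard deviation
            if spread < min_stdev:
                yield index, spread


if __name__ == "__main__":
    print(messages_per_day())  # 10368000000, i.e. about 10.4 billion
    heart_rates = [140, 150, 138, 152, 141, 145, 145, 145, 145, 145]
    for index, spread in low_variability_alerts(heart_rates):
        print(f"alert at reading {index}: stdev={spread:.2f}")
```

Because the window is a fixed-size deque, each reading is processed as it arrives and nothing older than the window is retained, which is the property that separates this style of analysis from periodic sampling into the patient record.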