EDF2012 Wolfgang Nimfuehr - Bringing Big Data to the Enterprise
- 1. Bringing Big Data to the Enterprise
Dipl.Ing. Wolfgang Nimfuehr
Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe
7 June 2012
wolfgang.nimfuehr@at.ibm.com
© 2012 IBM Corporation
- 2. Legal Disclaimer
© IBM Corporation 2012. All Rights Reserved.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify
the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty
of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy,
which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the
use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended
to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or
altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this presentation to IBM products, programs, or services do not imply that they will be available in all
countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may
change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be
a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to,
nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales,
revenue growth or other results.
Information regarding potential future products is intended to outline our general product direction and it should not be
relied on in making a purchasing decision. The information mentioned regarding potential future products is not a
commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract. The development, release, and timing of any future
features or functionality described for our products remains at our sole discretion.
- 3. The Information Explosion in Data and Real World Events
44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
80% of the world’s data is unstructured
1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
1 in 2 business leaders say they don’t have access to the information they need to do their jobs
83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
Organizations Need Deeper Insights
- 4. Challenge
Study a Large Volume and Variety of Data to Find New Insights
Multi-channel customer sentiment and experience analysis
Support medical diagnostics
Detect life-threatening conditions
Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
Make risk decisions and detect fraud based on real-time transactional data
Identify criminals and threats from disparate video, audio, and data feeds
- 5. Leveraging Big Data Analytics can improve Experience
… Client Mgr, Data Scientist, Dashboards, Call Center …
Information Management Capabilities, Natural Language
Big Data Analytics Hub
External Data
• Web Logs
• Twitter feeds
• Facebook chats
• YouTube Video
• Blogs/Posting
• Appraisal data
• Company website logs
• Credit bureau data
Internal Data
• Relationship / risk data
• Product profitability data
• Email correspondents
• Policy & Procedure data
• Event triggers
• Customer Profitability analysis
• Complaint Data
• Voice to Text Data
• Transactional data
- 6. On Feb 16 2011 the IBM Watson system won Jeopardy!
Can we design a computing system that rivals a human’s ability to answer
questions posed in natural language, interpreting meaning and context and
retrieving, analyzing and understanding vast amounts of information in real-time?
- 7. IBM Watson’s project started in 2007
• Project started in 2007, led by David Ferrucci
• Initial goal: create a system able to process natural language & extract knowledge faster than any other computer or human
• Jeopardy! was chosen because it’s a huge challenge for a computer to find the questions to such “human” answers under time pressure
• Watson was NOT online!
• Watson weighs the probability of his answer being right – doesn’t ring the buzzer if he’s not confident enough
• Which questions Watson got wrong is almost as interesting as which he got right!
“IBM is not in the entertainment business. But we are in the business of technology and pushing frontiers.”
David Shepler, IBM Research Program Manager
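The confidence-gated buzzing described on this slide can be illustrated with a minimal sketch. The `Candidate` class, the `decide_to_buzz` function and the 0.5 threshold are hypothetical names and values chosen for illustration; the slides do not say how Watson actually set its threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    answer: str
    confidence: float  # estimated probability that the answer is right


def decide_to_buzz(candidates: List[Candidate], threshold: float = 0.5) -> Optional[str]:
    """Return the top answer only if its confidence clears the threshold.

    Returning None models "don't ring the buzzer".
    The threshold value is an assumption, not a figure from the talk.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best.answer if best.confidence >= threshold else None


candidates = [Candidate("Vasco da Gama", 0.87), Candidate("Gary", 0.10)]
print(decide_to_buzz(candidates))                 # buzzes with the top answer
print(decide_to_buzz(candidates, threshold=0.9))  # stays silent
```

Raising the threshold trades fewer wrong answers for fewer attempts, which is the trade-off the slide alludes to.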
- 8. Different Types of Evidence: Keyword Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.
Keyword matching (clue term ↔ passage term):
• In May ↔ In May
• celebrated ↔ celebrated
• anniversary ↔ anniversary
• Portugal ↔ in Portugal
• arrival in ↔ arrived in
• India ↔ India
• explorer ↔ Gary
• 1898, 400th: no match in the passage
Evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
- 9. Different Types of Evidence: Deeper Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach.
Search far and wide. Explore many hypotheses. Find and judge evidence. Many inference algorithms.
Deeper matching (clue term ↔ passage term):
• May 1898, 400th anniversary ↔ 27th May 1498: temporal reasoning (date math)
• arrival in ↔ landed in: statistical paraphrasing (paraphrases)
• India ↔ Kappad Beach: geospatial reasoning (Geo-KB)
• explorer ↔ Vasco da Gama
• celebrated, Portugal: no match in the passage
Stronger evidence can be much harder to find and score.
The evidence is still not 100% certain.
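The contrast between keyword evidence and deeper evidence can be made concrete with a toy sketch. It scores the two passages from the slides against the clue, once by plain keyword overlap and once by the date math (1898 minus 400 years). The stopword list, the regular expressions and the function names are assumptions made for this illustration and are not taken from DeepQA.

```python
import re

CLUE = ("In May 1898 Portugal celebrated the 400th anniversary "
        "of this explorer's arrival in India.")
PASSAGE_GARY = ("In May, Gary arrived in India after he celebrated "
                "his anniversary in Portugal.")
PASSAGE_GAMA = "On 27th May 1498, Vasco da Gama landed in Kappad Beach."

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"in", "the", "of", "this", "he", "his", "after", "on"}


def keywords(text: str) -> set:
    """Lower-cased alphanumeric tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_overlap(clue: str, passage: str) -> int:
    """Number of distinct keywords shared by clue and passage."""
    return len(keywords(clue) & keywords(passage))


def temporal_match(clue: str, passage: str) -> bool:
    """True if (clue year - anniversary count) appears as a year in the passage."""
    year = re.search(r"\b(1\d{3}|20\d{2})\b", clue)
    nth = re.search(r"\b(\d+)(?:st|nd|rd|th) anniversary\b", clue)
    if not (year and nth):
        return False
    event_year = int(year.group(1)) - int(nth.group(1))
    return str(event_year) in re.findall(r"\b\d{4}\b", passage)


for name, passage in (("Gary", PASSAGE_GARY), ("Vasco da Gama", PASSAGE_GAMA)):
    print(name, keyword_overlap(CLUE, passage), temporal_match(CLUE, passage))
```

Keyword overlap favours the "Gary" passage, while the temporal check supports only the Vasco da Gama passage, which is why different evidence types need to be weighed against each other.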
- 10. DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture
Pipeline: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence
Hypothesis Generation and Hypothesis and Evidence Scoring run many times in parallel (...).
Scale: multiple interpretations of the question, 100s of sources, 100s of possible answers, 1000’s of pieces of evidence, 100,000’s of scores from many simultaneous text analysis algorithms.
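To illustrate the "Final Confidence Merging & Ranking" stage, here is a minimal sketch that combines several evidence scores per candidate answer into one confidence using a weighted logistic function. The feature names, weights and bias are invented for this example; a real system would learn them from training data, and the sketch should not be read as IBM's implementation.

```python
import math

# Hypothetical weights: keyword overlap counts for little, temporal and
# geospatial agreement count for much more. All values are assumptions.
WEIGHTS = {"keyword": 0.3, "temporal": 2.5, "geospatial": 2.0}
BIAS = -3.0


def confidence(scores: dict) -> float:
    """Map a dict of evidence scores to a confidence between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: dict) -> list:
    """Return (confidence, answer) pairs, most confident first."""
    return sorted(((confidence(s), a) for a, s in candidates.items()), reverse=True)


candidates = {
    "Gary": {"keyword": 5, "temporal": 0, "geospatial": 0},
    "Vasco da Gama": {"keyword": 1, "temporal": 1, "geospatial": 1},
}

for conf, answer in rank(candidates):
    print(f"{answer}: {conf:.2f}")
```

With these illustrative weights, strong keyword overlap alone is not enough to outrank a candidate supported by temporal and geospatial evidence.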
- 11. Maximum Benefit Requires Combining Deep and Reactive Analytics
[Chart: Data Scale (Kilo, Mega, Giga, Tera, Peta, Exa) against Decision Frequency (yr, mo, wk, day, hr, min, sec … ms, µs; Occasional, Frequent, Real-time). Deep Analytics works from History to produce Hypotheses and Predictions; Fast Reactive Analytics turns Observations of Reality into Actions; Integration and Feedback connect the two.]
Examples plotted on the chart:
• Traditional Data Warehouse and Business Intelligence
• DeepQA: 100s GB for Deep Analytics, 3 sec/decision, 1 PB training corpus
• Smart Traffic: 250K GPS probes/sec, 630K segments/sec, 2 ms/decision, 4K vehicles
• Predictive Analytics: 100,000 records/sec, 6B/day, 10 ms/decision, 6 PB for Deep Analytics
• Real time Optimization: 100,000 updates/sec, 5 ms/decision, round-trip automation, 10 PB for Deep Analytics
- 12. Big Data use cases across all industries
Financial Services
• Fraud detection
• Risk management
• 360° View of the Customer
Utilities
• Weather impact analysis on power generation
• Transmission monitoring
• Smart grid management
Transportation
• Weather and traffic impact on logistics and fuel consumption
IT
• Transition log analysis for multiple transactional systems
• Cybersecurity
Health & Life Sciences
• Epidemic early warning system
• ICU monitoring
• Remote healthcare monitoring
Retail
• 360° View of the Customer
• Click-stream analysis
• Real-time promotions
Telecommunications
• CDR processing
• Churn prediction
• Geomapping / marketing
• Network monitoring
Law Enforcement
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection
- 13. Monetizing Relationships - not just Transactions
[Diagram: a Calling Network (Telco company), a Social Network and a Public Database are combined into a Merged Network.]
Amy Bearn: 32, married, mother of 3, accountant
Telco Score: 91
CPG Score: 76
Fashion Score: 88
Telco: How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?
Retail: How valuable is Amy to my retail sales? Who does she influence? What do they spend?
- 14. Sample: Big Data 360° Lead Generation
Social Media based 360-degree Consumer Profiles
Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental
Timely Insights
• Intent to buy various products
• Current Location
• Sentiment on products, services, campaigns
• Incidents damaging reputation
• Customer satisfaction/attrition
Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…
Products Interests
• Personal preferences of products
• Product Purchase history
• Suggestions on products & services
Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…
Sample posts
Monetizable intent to buy products
• I need a new digital camera for my food pictures, any recommendations around 300?
• What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!
Life Events
• College: Off to Stanford for my MBA! Bbye chicago!
• Looks like we'll be moving to New Orleans sooner than I thought.
Intent to buy a house
• I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin
Location announcements
• I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj
- 15. Sample: Big Data 360° Lead Generation
Real-time product intents enriched with consumer attributes
Entries contain promotional messages, wishful thinking, questions, etc.
Integration across Social Media sites
Micro-segmentation of product intents by occupation
Real-time tracking by micro-segmentation
For many of the attributes we need to extract, cleanse, normalize and categorize
Micro-segmentation of consumers by hobbies
- 16. Sample: Institutional Risk Application
Comprehensive view of publicly traded companies and related
people based on regulatory filings
Extract
Integrate
- 17. Requirements for a Big Data Solution Platform
Analyze a Variety of Information
Novel analytics on a broad set of mixed information that
could not be analyzed before
Multiple relational & non-relational data types and schemas
Analyze Information in Motion
Streaming data analysis
Large volume data bursts & ad-hoc analysis
Analyze Extreme Volumes of Information
Cost-efficiently process and analyze petabytes of information
Manage & analyze high volumes of structured, relational data
Discover & Experiment
Ad-hoc analytics, data discovery &
experimentation
Manage & Plan
Enforce data structure, integrity and control to
ensure consistency for repeatable queries
- 18. IBM Big Data Platform for Ingest, Data and Analytics
New analytic applications drive the requirements for a big data platform
• Integrate and manage the full variety, velocity and volume of data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad-hoc analysis
• Development environment for building new analytic applications
• Workload optimization and scheduling
• Security and Governance
[Platform diagram, top to bottom]
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management; Accelerators; Hadoop System, Stream Computing, Data Warehouse
Information Integration & Governance
- 19. Big Data Capabilities
Big Data Challenges and IBM Big Data Solutions
SQL Data: IBM Netezza, an analytic appliance for high speed, advanced analytics on large structured data sets
• High volume of structured data
• Valuable Information
• Compute intensive analytics
• Low latency response on queries
• Business Intelligence and Analytics
• Understanding the customer through segmentation and analysis
NoSQL Data: IBM BigInsights, Hadoop-based processing for analytics on variety and volumes of data
• Very high volumes (TBs to PBs) of unstructured data
• Exploration and discovery
• Text, Entity and Social Media Analytics
Streaming: IBM Streams, low latency analytics for streaming data
• Real time processing
• Detect failure patterns
• High volume, low latency processing
• Scoring and decision analytics
- 20. InfoSphere BigInsights
Analytical platform for Big Data at-rest
Based on open source & IBM technologies
Distinguishing characteristics
• Built-in analytics enhances business knowledge
• Enterprise software integration complements and extends existing capabilities
• Production-ready platform with tooling for analysts, developers, and administrators speeds time-to-value and simplifies development/maintenance
IBM advantage
• Combination of software, hardware, services and advanced research
[IBM Big Data Platform diagram repeated from slide 18]
- 21. InfoSphere BigInsights
Embrace and Extend Hadoop
[Stack diagram; components are marked as IBM or Open Source]
Analytics: BigSheets, Text Analytics, ML Analytics *)
Application: Pig, Hive, Jaql, Avro, Zookeeper, Oozie, Lucene; MapReduce with IBM LZO Compression, AdaptiveMR, FLEX, BigIndex
Storage: HBase, HDFS, GPFS-SNC *)
Data Sources/Connectors: Streams, Netezza, BoardReader, R, Data Stage, DB2, CSV/XML/JSON, SPSS, Flume, JDBC, Web Crawler
Interface: Management Console (browser based), Development Tooling (Eclipse Plug-Ins), REST API (for Applications)
*) future release
- 22. BigSheets
A visual tool for data manipulation and prototyping
• Ad-hoc analytics for LOB user
• Analyze a variety of data - unstructured and structured
• Spreadsheet metaphor for exploring/ visualizing data
• Browser-based
- 23. Text Analytics
Turns disparate words into measurable insights
[Process flow, left to right]
• Physically assemble data, standardize formats, auto-identify language, process punctuation and non-grammatical characters, standardize spelling.
• Part-of-speech identification, standard address and customized extraction dictionaries, proper noun identification, concept categorization, synonyms, exclusions, multi-terms, regular expressions, fuzzy-matching
• Identify positive or negative sentiment, NLP-based analytics, define variables, macros and rules.
• Iterative classification using automated and manual techniques. Concept derivation & inclusion, semantic networks and co-occurrence rules
• Reporting/Monitoring social commentary, combination w/structured data, clustering, associated concepts, correlated concepts, auto-classification of documents, sites, posts.
Pre-configured text annotators ready for distributed processing on Big Data
Support for native languages including double-byte
- 24. Perspective: The Vestas Wind library
Public wind data is available on 284 km x 284 km grids (2.5° LAT/LONG)
More data means more accurate and richer models (adding hundreds of variables)
- Vestas wind library at 2.5 PB: to grow to over 6 PB in the near-term
- Granularity 27 km x 27 km grids: driving to 9x9, 3x3 to 10 m x 10 m simulations
Reduced turbine placement identification from weeks to hours
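The grid refinements on this slide imply a steep growth in the number of cells to model. As a rough illustration, the hypothetical helper below counts how many finer cells fit into one 284 km public-data cell. It assumes square cells and ignores altitude levels and time steps, which the slide does not describe.

```python
def fine_cells_per_coarse_cell(coarse_km: float, fine_km: float) -> float:
    """Number of fine square cells that tile one coarse square cell."""
    return (coarse_km / fine_km) ** 2


COARSE_KM = 284.0  # public wind data grid size quoted on the slide

# Grid sizes named on the slide; 0.01 km corresponds to 10 m.
for fine_km in (27.0, 9.0, 3.0, 0.01):
    ratio = fine_cells_per_coarse_cell(COARSE_KM, fine_km)
    print(f"{fine_km:>6} km grid: about {ratio:,.0f} cells per 284 km cell")
```

Under these assumptions, moving from 27 km to 3 km cells alone multiplies the cell count by 81, which gives a feel for why the library is expected to grow from 2.5 PB to over 6 PB.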
- 25. InfoSphere Streams
Analytical platform for Big Data in-motion
Built to analyze data in motion
• Multiple concurrent input streams
• Massive scalability
Process and analyze a variety of data
• Structured, unstructured content, video, audio
• Advanced analytic operators
[IBM Big Data Platform diagram repeated from slide 18]
- 26. InfoSphere Streams
Massively Scalable Stream Analytics
Linear Scalability
• Clustered deployments – unlimited scalability
Automated Deployment
• Automatically optimize operator deployment across clusters
Performance Optimization
• JVM Sharing – minimize memory use
• Fuse operators on same cluster
• Telco client – 25 Million messages per second
Analytics on Streaming Data
• Analytic accelerators for a variety of data types
• Optimized for real-time performance
[Diagram: Streaming Data Sources → Source Adapters → Analytic Operators → Sink Adapters → Visualization, running on the Streams Runtime; Streams Studio IDE; Automated and Optimized Deployment]
- 27. University of Ontario Institute of Technology
Use case
– Neonatal infant monitoring
– Predict infection in ICU 24 hours in advance
Solutions
– 120 children monitored :120K msg/sec, billion msg/day
– Trials expanding to include hospitals in US and China
[Diagram: Sensor Network → Event Pre-processor → Analysis Framework → Solutions (Applications), on a Stream-based Distributed Interoperable Health care Infrastructure]
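A quick back-of-the-envelope check of the message rates quoted on this slide. It assumes the 120K msg/sec figure is a sustained aggregate rate spread evenly over the 120 monitored children, which the slide does not state explicitly.

```python
SECONDS_PER_DAY = 24 * 60 * 60

patients = 120                 # children monitored (from the slide)
messages_per_second = 120_000  # aggregate rate (from the slide)

# Assumption: the load is spread evenly and sustained for a full day.
per_patient_per_second = messages_per_second / patients
messages_per_day = messages_per_second * SECONDS_PER_DAY

print(f"{per_patient_per_second:,.0f} msg/sec per patient")
print(f"{messages_per_day:,} msg/day at a sustained rate")
```

Under that assumption each patient produces about 1,000 messages per second, and the daily total lands in the billions, in line with the order of magnitude given on the slide.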
- 28. Without a Big Data Platform You Code…
• Event Handling
• Custom SQL and Scripts
• Multithreading
• Check Pointing
• Application Management
• HA
• Performance Optimization
• Debug
• Connectors
• Security
Streams provides development, deployment, runtime, and infrastructure services
Accelerators and Tool kits: over 100 sample applications and toolkits with industry focused toolkits with 300+ functions and operators
“TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language…”
– Alex Philip, CEO and President, TerraEchos
- 29. IBM is Committed to Innovation 2012
IBM Research: Almaden, Austin, Melbourne, Sao Paulo, Beijing, Haifa, Delhi, Ireland, Yamato, Watson, Zurich
• $16B+ in acquisitions since 2005
• 10,000+ technical professionals
• ~8000 dedicated consultants
• 27,000+ business partner certifications
• 88 Analytics Solutions Centers
• 100 analytics-based research assets; almost 300 researchers
Selected SW Acquisitions (timeline starting in 2005)
* TeaLeaf, Varicent, Vivisimo: pending acquisition close
“Watson is going to revolutionize many, many industries and it will fundamentally change the way we interact with computers & machines.”
John Kelly, SVP & Head of IBM Research
- 30. Making Learning Easy and Fun
bigdatauniversity.com/
ibm.com/software/data/bigdata/
ibm.com/software/data/infosphere/biginsights/
youtube.com/user/ibmbigdata
- 31. Questions & Answers
Dipl.Ing. Wolfgang Nimführ
Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe
IBM Austria
Obere Donaustrasse 95
A1020 Vienna
Tel +43-664-618-5389
wolfgang.nimfuehr@at.ibm.com