1. December 2013
Some Smarter Analytics Innovation Trends
Cognitive Computing, Big Data and Statistical Analytics
A talk with students of the University of Bari (Italy), Computing Science Department
Knowledge Bases and Data Mining (Basi di Conoscenza e Data Mining) Course
Pietro Leo
IBM GBS Executive Architect – Member of IBM Academy of Technology Leadership Team
@pieroleo www.linkedin.com/in/pieroleo
© 2012 IBM Corporation
2. December 2012
My Personal IT Mind-Map
[Figure: mind map of IT topics; links denote conceptual connection, evolution path, cause-effect, etc.]
Topics: Data Models; World Instrumentation; eBusiness; Services/Legacy Applications; Enterprise Data Storage (IMS, DBMS, etc.); Pervasive Computing; Portals - Webization; Internet of Things; Big Data (structured & unstructured); Social-ization; App-ization; Virtualization; Web-App-ization; Cloud Computing; Cloud Services; IT Consumerization/BYOD; Cognitive Computing; Workload-Optimized Computing; Business Analytics; Mobile Computing; Parallel Computing; Optimization; Data Warehousing / Business Intelligence; Computing Models, Architectures & Styles; Social Business & Mobility; Analytics - Information-based Intelligence
3. December 2012
Agenda
Research Overview and Grand Challenges
1 Cognitive Systems Era
Data Centric ← Beyond Big Data
Statistical Analytics ← Beyond Machine Learning
2 Cognitive Systems Strategic challenges for Our Organizations
3 Statistical Analytics Strategy
4 Examples of Statistical Analytics Problems & Benefits
4. December 2012
IBM - Continually Looking Forward
C-suite Studies
Executive Exchange: http://www-935.ibm.com/services/c-suite/insights/index.html
IBM Institute for Business Value
IBM Global Technology Outlook
Smarter Planet
5. December 2012
Nothing Is Changing More than IT …
The way it's accessed: ubiquitously
The way it's applied: for insight
The way it's architected: integrated and flexible
6. December 2012
Grand Challenges are the trigger of new changes…
1911: IBM Is Founded
1920: The IBM Punched Card
1954: RAMAC
1957: FORTRAN
1959: IBM 1401: The Mainframe
1969: Magnetic Stripe Technology
1973: Universal Product Code (UPC) barcode
1981: The PC
1986: Scanning Tunneling Microscope
1988: Optimizing the Food Chain
1990s: e-business
2000: Linux
2006: The Globally Integrated Enterprise
2008: Breaking the Petaflop Barrier
2009: The DNA Transistor
2008: Smarter Planet
2011: A Computer Called Watson
7. December 2012
Ultimately Leading to
Tremendous New Value
Provide New Types of Insights
8. December 2012
Agenda
1 Cognitive Systems Era
Data Centric ← Beyond Big Data
Statistical Analytics ← Beyond Machine Learning
2 Cognitive Systems Strategic challenges for Our Organizations
3 Statistical Analytics Strategy
4 Examples of Statistical Analytics Problems & Benefits
9. December 2012
Eras of computing
[Figure: computer intelligence plotted against time across three eras: Tabulating Systems Era, Programmable Systems Era, Cognitive Systems Era]
10. December 2012
Cognitive Systems
Programmable Systems Era
1. Processor-centric
2. Fixed calculation
3. Scale up/out
4. Manual systems management
Cognitive Systems Era
1. Data-centric
2. Statistical analytics
3. Scale in
4. Automated systems/workload management
12. December 2012
Data-Centric: Big Data, and this is just the beginning
[Figure: computer intelligence over time across the Tabulating, Programmable and Cognitive Systems Eras, overlaid with the percentage of uncertain data]
13. December 2012
Data-centric models are driving us to a new era of computing
Volume: terabytes to exabytes of existing data to process
Variety: structured, semi-structured, unstructured, text & multimedia
Velocity: streaming data, milliseconds to seconds to respond
Veracity: uncertainty from inconsistency, ambiguities, etc.
Share of data: <20% Traditional Enterprise Data; >80% Content, Social (data from and about people) and Physical (sensors & streams)
14. Big data is a business priority – inspiring new models and
processes for organizations, and even entire industries
15. December 2012
Statistical analytics: Develop tools that augment human intelligence and
productivity
[Figure: computer intelligence over time across the Tabulating, Programmable and Cognitive Systems Eras, with two trajectories]
Information-based Intelligence
Artificial Intelligence, Strong Approach: surpass humans in intelligence
The Singularity! Kurzweil > 2045: The Year Man Becomes Immortal
16. December 2012
Strong Approach
Early efforts approached AI based on programming logic, reasoning, planning, learning
A number of government-supported academic efforts in the 1960s and 1970s, primarily in the US (MIT, Stanford, etc.) and UK. Many felt that the problem was the speed of machines, so machines would catch up with human intelligence within a generation based on advances in technology
Fifth Generation Project: major Japanese effort in the 1980s to leap ahead of the US in computer development by creating a new generation of intelligent, reasoning machines
All these efforts failed. They grossly underestimated the difficulty of developing machines exhibiting human intelligence
Information-based Intelligence Approach
Statistical, brute force approach based on analyzing vast amounts of information using powerful computers and sophisticated algorithms
Scales very nicely: the more information you have, the more powerful the computer, the more sophisticated the analytical algorithms . . . the better the results
Data & Knowledge Integration: the more insights you have, the more methods and approaches you have, the more longitudinal abilities you have to generate points of view … the more effective the final result will be
Originated in science, especially high energy physics
Statistical Analytics: Data mining (mainly from 1990s); Deep Blue (1997); Watson (2011)
17. December 2012
Agenda
1 Cognitive Systems Era
Data Centric ← Beyond Big Data
Statistical Analytics ← Beyond Machine Learning
2 Cognitive Systems Strategic challenges for Our Organizations
3 Statistical Analytics Strategy
4 Examples of Statistical Analytics Problems & Benefits
18. December 2012
Statistical Analytics challenges for Our Organizations
From Data to Insight to Context
Not about bigger or faster data from any one source… It's about fusing data and analytics from 100s-1000s of sources
Analyze Structured, Semi-structured and Unstructured Data and Integrate Insights
Channels: Analyst; From the Field; Contact Center - Interactions; Web/digital; Social
These capabilities exist today: High Value Context Requires a Wide Variety of High-V Data Sources
19. December 2012
Cognitive Systems Strategic challenges for Our Organizations
Create an integrated view of Data & Content coming from ALL data channels, including social business
Data Channels: Analysts/Cases; From the Field; Interactions; Web/digital; Social
Data & Content: Structured (agent/case data); Semi-structured and Unstructured (call logs, transcripts, emails…); Structured (web logs, observation data)
Big Data & Business Analytics: Integrate and Analyze Structured and Unstructured Data
Insights Distribution & Utilization (Organization / Enterprise): Crime Intelligence; Alerts & warning generation; Statistical Reports; Analytics Reports; Relation Resolution; Identity Resolution; Predictive Models; Geo-spatial Display; Deep Text analytics
20. December 2012
Analytics challenge: Fusion reduces uncertainty by constructing context
Required: tight integration to maximize context discovery
Required: common practices followed by multiple standards for representing uncertain data and uncertainty of all types, provenance, lineage and other metadata
Required: common APIs to enable sharing across the uncertainty management pipeline
No such common practices, standards or APIs exist today
[Figure: FUSION example. Credit data, loyalty data, in-store pricing and discounts ($999 or $560), location (NY; San Jose, CA; customer at mall; customer in store #42) and social signals ("Buying a DSLR today!", Michael, mother, son, birthday date, influencers, intent) are combined through fact discovery, spatial reasoning, temporal reasoning, sense making, correlation and corroboration (evidence combination) to reach maximum context for minimum uncertainty]
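The "corroboration (evidence combination)" step on this slide can be illustrated with a minimal sketch. It assumes a simple Bayesian odds update in which each source contributes an independent likelihood ratio; this is a textbook technique chosen for illustration, not the method used in the talk, and the function name, the prior and the ratios are all hypothetical.

```python
from math import prod


def combine_evidence(prior: float, likelihood_ratios: list[float]) -> float:
    """Return the posterior probability of a hypothesis (e.g. "this customer
    intends to buy a DSLR today") after fusing several independent sources.

    Assumption: sources are conditionally independent, so their likelihood
    ratios can simply be multiplied into the prior odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)      # probability -> odds
    odds *= prod(likelihood_ratios)   # each source scales the odds
    return odds / (1.0 + odds)        # odds -> probability


# Hypothetical numbers: a weak prior, then three corroborating signals
# (social post, presence in the store, loyalty history).
prior = 0.05
posterior = combine_evidence(prior, [4.0, 3.0, 2.0])
print(f"prior={prior:.2f} posterior={posterior:.2f}")
```

Each additional source with a likelihood ratio above 1 raises the posterior, which is the sense in which fusing more context reduces uncertainty.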
21. December 2012
The value of analytics grows by incorporating new sources of data,
composing a variety of analytic techniques, spanning organizational
silos, and enabling iterative, user-driven interaction
[Figure: analytic techniques plotted by sources and types of data (from structured or standardized to new format or usage of data) against scope of decision (low to high)]
Sales-based demand forecasting
Price-based demand forecasting (own & competitors)
Segmentation-based market impact estimates
Intent-to-buy trends
Multi-modal demand forecasting
22. December 2012
Analytics toolkits will be expanded to support ingestion and interpretation of
unstructured data, and enable adaptation and learning
Stages, top to bottom: Learn (in the context of the decision process); Decide and Act; Understand and Predict; Report; Collect and Ingest/Interpret (decide what to count; enable accurate counting)
New Methods
Adaptive Analysis: responding to context
Continual Analysis: responding to local change/feedback
Optimization under Uncertainty: quantifying or mitigating risk
Traditional
Optimization: decision complexity, solution speed
Predictive Modeling: causality, probabilistic, confidence levels
Simulation: high fidelity, games, data farming
Forecasting: larger data sets, nonlinear regression
Alerts: rules/triggers, context sensitive, complex events
Query/Drill Down: in-memory data, fuzzy search, geo spatial
Ad hoc Reporting: query by example, user-defined reports
Standard Reporting: real time, visualizations, user interaction
New Data
Entity Resolution: people, roles, locations, things
Relationship, Feature Extraction: rules, semantic inferencing, matching
Annotation and Tokenization: automated, crowd sourced
Extended from: Competing on Analytics, Davenport and Harris, 2007
23. December 2012
Analytics solution development requires several interacting design steps
Design steps: Data Evaluation and Fusion; Algorithm Composition and Invention; Testing and Execution Optimization
Data types: streaming data; text data; multi-dimensional; time series; geo spatial; video & image; relational; social network
Algorithms: data mining & statistics; optimization & simulation; semantic analysis; fuzzy matching; network algorithms; new algorithms; business rules engine
Pipeline: Data Acquisition → Filtering and Extraction → Core Analytics → Composition and Validation → Deployment Packaging
24. December 2012
Agenda
1 Cognitive Systems Era
Data Centric ← Beyond Big Data
Statistical Analytics ← Beyond Machine Learning
2 Cognitive Systems Strategic challenges for Our Organizations
3 Statistical Analytics Strategy
4 Examples of Statistical Analytics Problems & Benefits
25. December 2012
Statistical Analytics Strategy
Content Access & Integration: disorganized and/or siloed content
Content & Data Organization: organized content
Analytics: investigation, added value from content and data
Insight Distribution & Utilization: knowledge accumulation & distribution
From chaos to order; new visibility and knowledge; insight generation and investigation support
26. December 2012
A full set of functional capabilities is needed to support a Statistical Analytics Strategy
Stages: Content Access & Integration; Content & Data Organization; Analytics; Insight Distribution & Utilization
Capabilities shown across the stages: Content Management; Process Management; Content Federation; Master Data Management; Adv. Case Management; Natural LP; Inf. Extraction; Adv. Enterprise Search; Entity resolution & Relation discovery; Content Classification; Image Analytics; Advanced Analytics; Content Analytics & Mining; Predictive Analytics; Business Rules; Social Media Analytics; Broadcast News Monitoring; User Profiles Analytics; Deep Question & Answer; Reporting & Dashboards; Network Visualization
Underlying models: Standard Datawarehouse models; Advanced Big Data models (streams and restfull data)
Content state by stage: disorganized and/or siloed content; organized content; investigation, added value from content and data; knowledge accumulation & distribution
27. December 2012
Agenda
1 Cognitive Systems Era
Data Centric ← Beyond Big Data
Statistical Analytics ← Beyond Machine Learning
2 Cognitive Systems Strategic challenges for Our Organizations
3 Statistical Analytics Strategy
4 Examples of Statistical Analytics Problems & Benefits
28. December 2012
Examples of Statistical Analytics Benefits
Retail Banking Customer Care
Analyzing: Call logs, internal and external media, claim
For: Buyer Behavior
Benefits: Improve customer satisfaction, marketing campaigns, find new revenue opportunities
Retail Customer Care
Analyzing: Call logs, online media
For: Brand Reputation Management
Benefits: Improve customer satisfaction, marketing campaigns
Healthcare Analytics
Analyzing: Care records
For: Clinical analysis; treatment protocol optimization
Benefits: Better management of chronic diseases; optimized drug formularies; improved patient outcomes
Crime Analytics
Analyzing: Police records, emergency calls…
For: Rapid crime solving & crime trend analysis
Benefits: Safer communities & optimized force deployment
Insurance Fraud
Analyzing: Insurance claims
For: Detecting fraudulent activity & patterns
Benefits: Reduced losses, faster detection, more efficient claims processes
Automotive Quality Insight
Analyzing: Tech notes, call logs, online media
For: Brand Reputation Management
Benefits: Reduce warranty costs, improve customer satisfaction, marketing campaigns
29. December 2012
Agenda
Ongoing Research Project with University of Bari:
Recognise a “Complex Event” from Social Media Data
Students:
Francesco Tangari
Rocco Caruso
30. December 2012
Dipartimento di Informatica
Università degli Studi di Bari
Research challenge and its business value (1/2)
A complex event has a defined spatio-temporal connotation: it involves one or more individuals (who) that organize and/or participate and/or are followers to set up a defined action (what) in a defined location, real or virtual (where), at a given moment (when).
A "flash mob" is an example of a complex event; other examples are strikes, sport events, protests, etc.
Complex Event & its Dimensions (PROFILE): Who, What, Where, When
Who (People Attributes): who is planning? who is going to participate/attend? who is interested and follows? which network is created around the event?
What (Argument Attributes): what was planned for the event? which is the topic and the motivation? e.g. people will dance, will freeze, etc.
Where (Spatial Attributes): where is it located? it can be a square, a station, a virtual place, etc. where everyone can see the event
When (Temporal Attributes): the date and the time at which the event will take place, the date and the time at which the event preparation will take place…
A Statistical Approach to Predicting Flash Mobs from Social Networks (Un approccio Statistico per la Predizione di Flashmob da Reti Sociali)
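The four dimensions above map naturally onto a small record type. The following is a minimal sketch of how such a profile might be represented; the class name, field names and sample values are hypothetical and are not taken from the prototype described in the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ComplexEventProfile:
    """Hypothetical container for the Who / What / Where / When dimensions."""
    who: set[str] = field(default_factory=set)   # organizers, participants, followers
    what: Optional[str] = None                   # planned action or topic
    where: Optional[str] = None                  # real or virtual location
    when: Optional[datetime] = None              # date and time of the event

    def missing_dimensions(self) -> list[str]:
        """List the dimensions that have not been reconstructed yet."""
        missing = []
        if not self.who:
            missing.append("who")
        if self.what is None:
            missing.append("what")
        if self.where is None:
            missing.append("where")
        if self.when is None:
            missing.append("when")
        return missing

    def is_complete(self) -> bool:
        return not self.missing_dimensions()


# Illustrative values only.
profile = ComplexEventProfile(who={"@organizer"}, what="dance", where="Central Station")
print(profile.missing_dimensions())
profile.when = datetime(2012, 2, 14, 18, 0)
print(profile.is_complete())
```

Tracking which dimensions are still missing makes it easy to tell a partially reconstructed event from one that is ready to be raised as an alert.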
31. December 2012
Research challenge and its business value (2/2)
Leveraging social media data to generate insights about complex, business-relevant phenomena by connecting the dots
…in the case of predicting a Flash Mob:
• Ex. 1: Knowing that a flash mob will be used for the promotion of a new product, a firm competing in the same market can organize a counter-action.
• Ex. 2: A law enforcement organization, knowing that a flash mob will be organized for a political purpose or for a demonstration, can effectively relocate its forces.
32. December 2012
Information about the event is spread across a number of social media channels: an example of flash mob organization dynamics
33. December 2012
General System Context: First Prototype & Experimentation based on the Twitter channel
Twitter Channel (*): acquiring tweets that include the #flashmob hashtag and/or the keyword "flashmob" and/or "flash mob"
Data Access & Basic Feature Extraction (tokenization, hashtags, URLs, geotags, social network metadata, etc.)
Information Extraction (POS, named entities: person, organization, location, date, etc.; high-level concepts, wikification, etc.)
Social Network Analysis (cliques, relevant nodes, PageRank nodes, etc.)
Event Prediction & Alerting (clustering, incremental clustering, burst recognition, etc.)
Flash Mobs Profile Reconstruction & Alerts (Who, What, Where, When for each flash mob)
Analytics Consumers (What's up App for smartphone, social analytics client, etc.)
Legend: implemented path; planned integration
(*) In our vision a number of "channels" should provide data to the system, such as Facebook, YouTube, etc., as well as other social analytics applications such as IBM COBRA or CCO, etc.
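The acquisition filter and the basic feature extraction step can be sketched with regular expressions. This is a minimal illustration under stated assumptions: the patterns, the `TweetFeatures` type and the function names are hypothetical, and the real prototype may tokenize quite differently (geotags and social network metadata are not covered here because they come from tweet metadata rather than from the text).

```python
import re
from dataclasses import dataclass

# Assumed patterns; real tweet tokenization is more involved.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")
FLASHMOB = re.compile(r"flash\s?mob", re.IGNORECASE)


@dataclass
class TweetFeatures:
    tokens: list[str]
    hashtags: list[str]
    urls: list[str]
    mentions: list[str]


def is_candidate(text: str) -> bool:
    """Acquisition filter: keep tweets mentioning 'flashmob' or 'flash mob'."""
    return FLASHMOB.search(text) is not None


def extract_features(text: str) -> TweetFeatures:
    """Pull URLs out first so they are not split into word tokens."""
    urls = URL.findall(text)
    stripped = URL.sub(" ", text)
    return TweetFeatures(
        tokens=re.findall(r"\w+", stripped.lower()),
        hashtags=[h.lower() for h in HASHTAG.findall(stripped)],
        urls=urls,
        mentions=[m.lower() for m in MENTION.findall(stripped)],
    )


sample = "Join the #FlashMob at Central Station, 6pm! http://example.com/fm @anna"
if is_candidate(sample):
    print(extract_features(sample))
```

Because the keyword pattern also matches inside the hashtag, a single check covers the three acquisition conditions listed on the slide.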
34. December 2012
Working on real data and applying the prediction model
Period: 1 Jan – 29 Feb
Analyzed 5148 English-language tweets that included the word or the hashtag "flashmob"
Generated a total of 59 Flash Mob Alerts (clusters) involving 1267 tweets
20 Alerts correctly aggregated data about 20 Flash Mobs, with an accuracy of about 100%
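Two derived figures help put these counts in proportion. The short calculation below uses only the numbers reported on the slide; the ratios themselves are derived here and were not reported in the talk.

```python
# Counts as reported on the slide.
analyzed_tweets = 5148
alerts = 59
tweets_in_alerts = 1267

# Derived ratios (not reported in the talk).
share_in_alerts = tweets_in_alerts / analyzed_tweets
mean_tweets_per_alert = tweets_in_alerts / alerts

print(f"Share of analyzed tweets that ended up in an alert: {share_in_alerts:.1%}")
print(f"Mean tweets per alert: {mean_tweets_per_alert:.1f}")
```

So roughly a quarter of the collected tweets were grouped into alerts, at a little over twenty tweets per alert on average.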
35. December 2012
In a new research phase we are now extending the prediction model to reconstruct the main Complex Event attributes
[Figure: architecture] Components: streaming JSON into HDFS; Hadoop; Hive DW; DBSCAN clustering; Complex Event Profiler; Extractors {1..N}; NLP annotation tools; output: Complex Event Attributes
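The architecture names DBSCAN as its clustering component. As a reminder of how that algorithm groups dense regions and marks isolated points as noise, here is a minimal pure-Python sketch. It assumes tweets have already been turned into numeric feature vectors; the sample points, `eps` and `min_pts` are hypothetical, and the project itself would run clustering at scale on Hadoop rather than with this quadratic-time loop.

```python
from math import dist

NOISE = -1


def dbscan(points: list[tuple[float, ...]], eps: float, min_pts: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: list = [None] * len(points)

    def neighbors(i: int) -> list[int]:
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point joins the cluster
            if labels[j] is not None:
                continue                 # already assigned
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D feature vectors: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike k-means, DBSCAN does not need the number of clusters in advance, which suits a setting where the number of flash mobs in a period is unknown.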
37. December 2012
My Personal IT Mind-Map
[Figure: the same mind map as on slide 2]
38. December 2012
Thank you!