From BigData to Data Science:
Predictive Analytics in a Changing Data Landscape
Usama Fayyad, Ph.D.
Group Chief Data Officer
and CIO Risk, Finance, & Treasury Technology
Barclays
Twitter: @usamaf
Overviews for South Africa
July 15-16, 2015
Outline
• Big Data all around us: the problems that Data
Science is NOT addressing
• The CDO role and Data Axioms
• Some of the issues in BigData
• Case studies
• Context and sentiment analysis
• A case-study on IOT
• Yahoo! predictions at scale
• Summary and conclusions
Why should banks worry about data?
100 years ago
• Smaller
• Much more local
• Deep understanding of customer needs
• Deep knowledge of customer
Now
• Digital, decline of the branch
• Global
• No face of the customer
• New generation of millennials
A big part of banking was knowing the customer intimately – personal relationships making KYC, risk
modelling simple
Why data is important: Every customer interaction is an opportunity
to capture data, learn & act
An instant car loan
product offering is
displayed in the app
Connie logs into BMB to
check her balance. From
cookies and browsing
behaviour we know she
is looking for a car loan
CRM data is used to
pre-calculate
Connie’s borrowing
limit for a car loan
Based on multiple
internal and external
data sources and
predictive models, we
identify cross-sell
opportunities
Connie is offered a
competitively priced
‘bespoke’ offer for car
servicing / MOT
Connie’s journey is
enhanced based on
previous multivariate
testing results
User experience is
continuously, iteratively
improved by capturing
user interaction in real-
time during every
session
Connie has a
personalised journey
based on pre-calculated
limit for the car loan
amount
Data Capture/
Opportunity to
learn
Actions
Customer Interaction Event (branch, telephony, digital, mobile, sales)
Millions of customer
interactions per day
Customer Interaction CRM Predictive Analytics Multivariate Testing
Targeted offers
during browsing
Product Discovery Cross-sell
Measure feedback
per session
Every
interaction is
an opportunity
to capture
data, to learn
and to act
Big Data
platform
Imagine we could move back to the model of 100 years ago – on demand, consumable, understandable
information to build intimate relationships & an understanding of the customer
What matters in the age of analytics?
1. Being able to exploit all the data that is available
• Not just what you have available
• What you can acquire and use to enhance your actions
2. Proliferating analytics throughout the organization
• Make every part of your business smarter
• Actions and not just insights
3. Driving significant business value
• Embedding analytics into every area of your business can significantly
drive top line revenues and/or bottom line cost efficiencies
Data Fusion: Can we bring all data together for analysis & action?
Customer and Client
Interactions
Central Data
Fusion Engine
Ingesting,
persisting,
processing
and servicing in
Real-time.
Analysis
Transactions
Social Data
Trade
Application Logs
Network Traffic
Risk
Marketing
Financial Crime
Fraud
Cyber Security
DaaS
• Big Data is a mix of structured, semi-structured, and
unstructured data
• Typically breaks barriers for traditional RDB storage
• Typically breaks limits of indexing by “rows”
• Typically requires intensive pre-processing before each query to extract “some
structure” – usually using Map-Reduce type operations
• Above leads to “messy” situations with no standard recipes or
architecture: hence the need for “data scientists”
• Conduct “Data Expeditions”
• Discovery and learning on the spot
Why Big Data?
A new term, with associated ‘Data Scientist’ positions
A Data Scientist is someone who
Knows a lot more software engineering than Statisticians
& Knows a lot more Statistics than software engineers
The 4-V’s of ‘Big Data’
Big Data is Characterized by the 3-V’s:
Volume: larger than “normal” – challenging to load/process
• Expensive to do ETL
• Expensive to figure out how to index and retrieve
• Multiple dimensions that are “key”
Velocity: Rate of arrival poses real-time constraints on what are typically “batch
ETL” operations
• If you fall behind, catching up is extremely expensive (replicate very expensive
systems)
• Must keep up with rate and service queries on-the-fly
Variety: Mix of data types and varying degrees of structure
• Non-standard schema
• Lots of BLOBs and CLOBs
• DB queries don’t know what to do with semi-structured and unstructured data
9 Invited talk – #ODSC, Boston– Copyright Usama Fayyad © 2015
Male, age 32
Lives in SF
Lawyer
Searched on
from London
last week
Searched on:
“Italian
restaurant
Palo Alto”
Checks Yahoo!
Mail daily via PC
& Phone
Has 25 IM Buddies,
Moderates 3 Y!
Groups, and hosts a
360 page viewed by
10k people
Searched on:
“Hillary Clinton”
Clicked on
Sony Plasma TV
SS ad
Registration Campaign Behavior Unknown
Spends 10 hours/week
on the internet
Purchased Da Vinci Code
from Amazon
“Classic” Data: e.g. Yahoo! User DNA
Male, age 32
Lives in SF
Lawyer
Searched on from
London last week
Searched on:
“Italian
restaurant
Palo Alto”
Checks Yahoo! Mail
daily via PC & Phone
Has 25 IM Buddies, Moderates
3 Y! Groups, and hosts a 360
page viewed by 10k people
Searched on:
“Hillary Clinton”
Clicked on
Sony Plasma TV
SS ad
Spends 10 hours/week
on the internet
Purchased Da Vinci Code
from Amazon
How Data Explodes: really big
Social Graph (FB)
Likes &
friends likes
Professional network
- reputation
Web searches on
this person,
hobbies, work,
location
Metadata on everything
Blogs, publications,
news, local papers, job
info, accidents
The Distinction between “Classic Data” and “Big Data” is fast
disappearing
• Most real data sets nowadays come with a serious mix of semi-
structured and unstructured components:
o Images
o Video
o Text descriptions and news, blogs, etc…
o User and customer commentary
o Reactions on social media: e.g. Twitter is a mix of data anyway
• Using standard transforms, entity extraction, and new generation
tools to transform unstructured raw data into semi-structured
analyzable data
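As a minimal sketch of that last point, the snippet below turns one free-text customer comment into a semi-structured record. The regular expressions, field names, and sample comment are hypothetical and chosen only for illustration; real pipelines typically rely on trained entity extractors rather than hand-written patterns.

```python
import re

# Hypothetical patterns for illustration only; production systems usually
# use trained entity extractors instead of hand-written regular expressions.
PATTERNS = {
    "mentions": re.compile(r"@(\w+)"),
    "hashtags": re.compile(r"#(\w+)"),
    "amounts": re.compile(r"[$£€]\d+(?:,\d{3})*(?:\.\d+)?"),
}


def extract_record(text: str) -> dict:
    """Turn one free-text comment into a semi-structured record."""
    record = {"text": text, "n_words": len(text.split())}
    for field, pattern in PATTERNS.items():
        record[field] = pattern.findall(text)
    return record


# Hypothetical comment, not taken from the talk.
comment = "@BankHelp quoted me £12,500 for a car loan, too slow! #carloan"
record = extract_record(comment)
print(record)
```

Once every comment has the same fields, the records can be loaded into a table and queried like any other structured data.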
Text Data: The Big Driver
• We speak of “big data” and the “Variety” in 3-V’s
• Reality: biggest driver of growth of Big Data has been text data
o Most work on analysis of “images” and “video” data has really been reduced to
analysis of surrounding text
Nowhere more so than on the internet
• Map-Reduce popularized by Google to address the problem of processing
large amounts of text data:
o Many operations with each being a simple operation but done at large scale
o Indexing a full copy of the web
o Frequent re-indexing
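The indexing workload described above can be sketched as a toy, single-process map-reduce job that builds an inverted index. This is an illustration of the pattern only, not Google's or Hadoop's implementation; the corpus and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: collapse the values for one key into a sorted posting list."""
    return word, sorted(doc_ids)


# Hypothetical toy corpus standing in for a crawl of web pages.
docs = {
    "page1": "italian restaurant palo alto",
    "page2": "car loan offers",
    "page3": "italian car reviews",
}

pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
index = dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())
print(index["italian"])
```

Each map and reduce call is simple and independent of the others, which is what lets a real framework spread them across many machines and rerun the whole job for every re-index.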
A few words on: The Chief Data Officer
Why are companies creating this position?
• There is a fundamental realisation that Data needs to become a primary value
driver at organizations
o We have lots of Data
o We spend much on it: in technology and people
o We are not realising the value we expect from it
• A strong business need to create the CDO role:
o Traditional companies are not merely following a trend; they are adopting the model that
actually works in other data-intensive industries
• CDO has a seat at executive table: the voice of Data
• Data done right is an essential element to unify large enterprises to unlock
value from business synergies
Fundamental Data Principles to Support Analytics
Usama’s Obvious Data Axioms
1. Data gains value exponentially when integrated and coalesced.
• When fragmented: dramatic value loss takes place;
• Increased costs;
• Reduced utility/integrity;
• Increased security risks
2. Fusing Data together from disparate/independent sources is
difficult to achieve and impossible to maintain
Hence only viable approach is:
• Intercepting and documenting at the source
• Fusing at the source
• Controlling lifecycle and flow
Fundamental Data Principles to Support Analytics
Usama’s Obvious Data Axioms
3. Standardisation is essential
• For sustained ability to integrate data sources and hence growing value;
• For simplifying down-stream systems and apps
• For enforcing discipline as a firm increases its data sources
4. Data governance and policy must be centralised
• Needs to be enforced strongly else we slip into chaos and a Babel of
terms/languages
• An Enterprise Data Architecture spanning structured and unstructured data
5. Recency Matters -> data streaming in modelling and scoring
• Often, accuracy of prediction drops quickly with time (e.g. consumer shopping)
• Value of alerts drops exponentially with time…
• Ability to trigger responses based on real-time scoring critical
• Streaming, real-time model updates, real-time scoring
Fundamental Data Principles to Support Analytics
Usama’s Obvious Data Axioms
6. Abstraction layer of data from Apps/platforms is a must
• Rapid renewal & modernization: the pace of change and development of
technology are very rapid
• Design for migration and infrastructure replacement via abstraction layers that
remove tech dependencies
• Encryption and Masking: Persisting unencrypted confidential and secret data
(even within secure firewalls) is an invitation for problems and risks
7. Data is a primary competency and not a side-activity supporting
other processes
• Hence specialized skills and know-how are a must
• Generalists will create a hopeless mess
• Data is difficult: modelling, architecture, and design to support analytics
Driving the need for integrated processes, data & architecture
Regulatory Demand
• Data quality, completeness, lineage
and aggregation
• Company financial reporting presented
at most granular level through various
lenses: business units, legal entities,
regional, sector level, …
• Faster turnaround for increasingly
complex ad hoc data requests
• Additional sophisticated stress tests
delivered in a joined-up manner
between Risk and Finance
• Enhanced hybrid capital computation
methodologies
Business Drivers
• Better and smarter controls
• Capital precision
• Better customer
experience
• Better colleague
experience
• More cost effective
Data Governance is BORING for most…
Policy, governance
& control
Business ownership
& data stewardship
Definitions &
metadata
Permit to build
(change governance)
Reporting &
analytics
Data architecture
Data sourcing
Reconciliation
Data quality
management
Documentation,
tracking & audit
…but it shouldn’t be!
Reality check: So what should users of analytics in Big Data world
do?
Load the Data into a “Data Lake”
Your life will be so much easier as you can now do Data Acrobatics
& other amazing data feats
The Data Lake – according to Waterline
We loaded the Data!
Congratulations
Now What?
From a Data Lake to Amazon Browser
Amazon Simple
Search
Amazon gives you
facets
Product
details
Reality check: So where do analysts & Data Scientists spend all
their time?
Let’s Mine
the Data
Reality check: So what do Technology people worry about these
days?
To Hadoop or not to Hadoop?
When to use techniques requiring Map-Reduce and grid
computing?
Typically organizations try to use Map-Reduce for everything to do with Big Data
• This is actually very inefficient and often irrational
• Certain operations require specialized storage
o Updating segment memberships over large numbers of users
o Defining new segments on user or usage data
Drivers of Hadoop in large enterprises
Cost of storage
Fastest growing demand is more storage
Data in Data Warehouses has traditionally required expensive storage technology:
• $20K to $50K per terabyte per year – cost of premium storage
• $2K per terabyte per year (much lower) – cost of Hadoop
on commodity storage
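A back-of-the-envelope calculation makes the gap concrete. The per-terabyte rates are the ones quoted above; the 500 TB volume is a hypothetical figure picked for illustration, and the flat per-terabyte pricing is a simplifying assumption.

```python
# Per-terabyte annual costs quoted on the slide (US dollars).
PREMIUM_LOW, PREMIUM_HIGH = 20_000, 50_000
HADOOP = 2_000


def annual_cost(terabytes, cost_per_tb):
    """Yearly storage cost for a given volume at a flat per-terabyte rate."""
    return terabytes * cost_per_tb


volume_tb = 500  # hypothetical warehouse size, not a figure from the talk

premium_range = (annual_cost(volume_tb, PREMIUM_LOW), annual_cost(volume_tb, PREMIUM_HIGH))
hadoop_cost = annual_cost(volume_tb, HADOOP)
savings_range = tuple(premium - hadoop_cost for premium in premium_range)
ratio_range = (PREMIUM_LOW / HADOOP, PREMIUM_HIGH / HADOOP)

print(f"Premium: ${premium_range[0]:,.0f} to ${premium_range[1]:,.0f} per year")
print(f"Hadoop:  ${hadoop_cost:,.0f} per year")
print(f"Premium storage costs {ratio_range[0]:.0f}x to {ratio_range[1]:.0f}x more")
```

At the quoted rates the ratio is 10x to 25x regardless of volume, so the absolute saving grows linearly with the amount of data stored.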
Analysis & programming software
PIG
HIPI
Hadoop Stack
Big Data Landscape
Data Platforms Landscape Map
Reality check: If storage is biggest driver of Hadoop adoption, what
is next biggest?
ETL
• Replaces expensive licenses
• Much higher performance with lower infrastructure costs (processors,
memory)
• Flexibility in changing schema and representation
• Flexibility on taking on unstructured and semi-structured data
• Plus suite of really cool tools…
Turning the 3-Vs of Big Data into Value
Understand context and content
• What are appropriate actions?
• Is it Ok to associate my brand with this content?
• Is the content sad, happy, serious, informative?
Understand community sentiment
• What is the emotion?
• Is it negative or positive?
• What is the health of my brand online?
Understand customer intent
• What is each individual trying to achieve?
• Can we predict what to do next?
• Critical in cross-sell, personalization, monetization, advertising, etc…
Many business uses of predictive analytics
Analytic technique: Uses in business
Marketing and sales: Identify potential customers; establish the effectiveness of a …
Understanding customer behavior: model churn, affinities, propensities, …
Web analytics & metrics: model user preferences from data, collaborative filtering, targeting, …
Fraud detection: Identify fraudulent transactions
Credit scoring: credit worthiness of a customer requesting a loan, secured and … loans
Manufacturing process analysis: Identify the causes of manufacturing problems
Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks
Healthcare Application: fraud detection, cost optimization, detection of events like epidemics, etc...
Insurance: fraudulent claim detection, risk assessment
Security and Surveillance: intrusion detection, sensor data analysis, remote sensing, … detection, link analysis, etc...
Application Log File Analytics: Understanding application, network, event logs in IT
Case Studies:
1. Context Analysis (unstructured data)
2. IOT Case Study
3. Yahoo! Predictive Modeling
Reality Check
So who is the company we think is best at handling Big
Data?
Case Study: Biggest Big Data in advertising?
Understanding context for ads
What Ad would you
place here?
Case Study: Biggest Big Data in advertising?
Understanding context for ads
Damaging to Brand?
The Display Ads Challenge Today
What Ad
would you
place here?
NetSeer: Solving accuracy issues
Ambiguity, waste, brand, safety
Why did Google Serve
this Ad?
this is how NetSeer
actually sees this
content
NetSeer: How it works
Concept clusters shown in the diagram:
ECONOMY CARS: high MPG, ford, low emission, fuel efficiency, economy vehicles
VISION TOOLS: microscope, lenses, reading glasses, autofocus, bifocal, refraction, eye chart
MARKET RESEARCH: focus groups, A/B testing, consumer study, surveying, blind study, analytics
Terms found on the page (WEBSITE.COM): electric vehicles, service record, safety rating, focus
<CONCEPT>
DISCERNS AND MONETIZES HUMAN INTENT
+ Identifies Concepts expressed on a page
+ Disambiguates language
+ Builds increasingly rich profile over time
52M CONCEPTS
2.3B RELATIONSHIPS BETWEEN CONCEPTS
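The disambiguation step can be illustrated with a toy scorer that picks the concept whose vocabulary overlaps most with the other words on the page. This is a sketch under stated assumptions, not NetSeer's actual method: the word-overlap scoring is a stand-in technique, and the grouping of terms into clusters (including placing "eye chart" under VISION TOOLS) is read off the diagram.

```python
# Concept clusters copied from the diagram. A real system would hold
# millions of concepts, so this three-entry dictionary is only a toy.
CONCEPTS = {
    "ECONOMY CARS": {"high mpg", "ford", "low emission", "fuel efficiency",
                     "economy vehicles"},
    "VISION TOOLS": {"microscope", "lenses", "reading glasses", "autofocus",
                     "bifocal", "refraction", "eye chart"},
    "MARKET RESEARCH": {"focus groups", "a/b testing", "consumer study",
                        "surveying", "blind study", "analytics"},
}


def tokens(phrases):
    """Split a collection of phrases into a set of lowercase words."""
    return {word for phrase in phrases for word in phrase.lower().split()}


def disambiguate(ambiguous_word, page_phrases, concepts=CONCEPTS):
    """Pick the concept whose vocabulary best overlaps the page context.

    The ambiguous word itself is removed from the context so that it cannot
    vote for a cluster merely by appearing in it.
    """
    context = tokens(page_phrases) - {ambiguous_word.lower()}
    scores = {name: len(context & tokens(terms)) for name, terms in concepts.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Page terms from the diagram.
page = ["electric vehicles", "service record", "safety rating", "focus"]
concept, scores = disambiguate("focus", page)
print(concept, scores)
```

With only one overlapping word the evidence here is thin; the slide's point is that a graph of 52M concepts and 2.3B relationships supplies far richer context than a keyword match on "focus" alone.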
NetSeer: Intent for Display
Positive vs negative content re a particular topic
Problem: Hard to understand user intent
Contextual Ad served by Google What NetSeer Sees:
Case Studies:
1. Context Analysis (unstructured data)
2. IOT Case Study
3. Yahoo! Predictive Modeling
The Connected Cow
Joseph Sirosh
VP, Machine Learning
Microsoft
Case Studies:
1. Context Analysis (unstructured data)
2. IOT Case Study
3. Yahoo! Predictive Modeling
Yahoo! – One of Largest Destinations on the Web
80% of the U.S. Internet population uses Yahoo!
– Over 600 million users per month globally!
Global network of content, commerce, media, search and access
products
100+ properties including mail, TV, news, shopping, finance, autos,
travel, games, movies, health, etc.
25+ terabytes of data collected each day
• Representing 1000’s of cataloged consumer behaviors
More people visited
Yahoo! in the past
month than:
• Use coupons
• Vote
• Recycle
• Exercise regularly
• Have children living
at home
• Wear sunscreen
regularly
Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
Data is used to develop content, consumer, category and campaign insights for
our key content partners and large advertisers
Yahoo! Big Data – A league of its own…
Terabytes of Warehoused Data
Amazon: 25
Korea Telecom: 49
AT&T: 94
Y!LiveStor: 100
Y!Panama Warehouse: 500
Walmart: 1,000
Y!Main warehouse: 5,000
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING
TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET
Y! Data Challenge Exceeds others by 2 orders of magnitude
Millions of Events Processed Per Day
SABRE: 50
VISA: 120
NYSE: 225
YSM: 2,000
Y! Global: 14,000
Behavioral Targeting (BT)
Search
Ad Clicks
Content
Search Clicks
BT
Targeting ads to
consumers whose recent
behaviors online indicate
which product category is
relevant to them
Male, age 32
Lives in SF
Lawyer
Searched on
from London
last week
Searched on:
“Italian
restaurant
Palo Alto”
Checks Yahoo!
Mail daily via PC
& Phone
Has 25 IM Buddies,
Moderates 3 Y!
Groups, and hosts a
360 page viewed by
10k people
Searched on:
“Hillary Clinton”
Clicked on
Sony Plasma TV
SS ad
Registration Campaign Behavior Unknown
Spends 10 hours/week
on the internet
Purchased Da Vinci Code
from Amazon
Yahoo! User DNA
• On a per consumer basis: maintain a behavioral/interests profile and profitability
(user value and LTV) metrics
How it works | Network + Interests + Modelling
Analyze predictive patterns for purchase
cycles in over 100 product categories
In each category, build models to describe
behaviour most likely to lead to an ad
response (i.e. click).
Score each user for fit with every
category…daily.
Target ads to users who get highest
‘relevance’ scores in the targeting categories
Varying Product Purchase Cycles
Match Users to the Models
Rewarding Good Behaviour
Identify Most Relevant Users
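The score-then-target loop described on this slide might look like the sketch below. The talk does not disclose the model form, so the linear weights, event names, and user data are all hypothetical; only the overall flow (one model per category, score every user in every category, target the top scorers) comes from the slide.

```python
# Hypothetical per-category weights over behaviour counts. The talk does not
# describe the real models or features.
CATEGORY_WEIGHTS = {
    "automotive": {"pages": 0.2, "searches": 1.0, "search_clicks": 2.0, "ad_clicks": 3.0},
    "travel": {"pages": 0.5, "searches": 1.5, "search_clicks": 1.0, "ad_clicks": 2.0},
}

# Hypothetical behaviour counts per user and category for one day.
USER_ACTIVITY = {
    "u1": {"automotive": {"pages": 5, "searches": 2, "search_clicks": 1, "ad_clicks": 1}},
    "u2": {"automotive": {"pages": 1}, "travel": {"pages": 4, "searches": 2}},
    "u3": {"travel": {"pages": 1, "ad_clicks": 1}},
}


def score(category, counts):
    """Relevance of one user's behaviour to one category (weighted sum)."""
    weights = CATEGORY_WEIGHTS[category]
    return sum(weights[event] * n for event, n in counts.items())


def daily_scores(activity):
    """Score every user against every category, as in a daily batch run."""
    return {
        user: {cat: score(cat, per_cat.get(cat, {})) for cat in CATEGORY_WEIGHTS}
        for user, per_cat in activity.items()
    }


def target(scores, category, top_n):
    """Return the top_n users by relevance score for a category."""
    ranked = sorted(scores, key=lambda user: scores[user][category], reverse=True)
    return ranked[:top_n]


scores = daily_scores(USER_ACTIVITY)
print(target(scores, "automotive", 1), target(scores, "travel", 2))
```

Keeping a separate weight table per category is what allows purchase cycles of different lengths and shapes to be modelled independently.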
Recency Matters, So Does Intensity
Active now… …and with feeling
Differentiation | Category specific modelling
[Charts: intensity score over time, with the Intense Click Zone marked]
Example 1: Category Automotive
Example 2: Category Travel/Last Minute
Different models allow us to weight and determine intensity and recency
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
Differentiation | Category specific modelling
[Chart: intensity score over time, with the Intense Click Zone marked]
Example 1: Category Automotive
Different models allow us to weight and determine intensity and recency
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
user is in the Intense Click Zone
with no further activity, decay takes effect
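One way to picture intensity plus recency is a weighted event sum with exponential time decay. The slides show only the shape of the curves, so the event weights, half-lives, threshold, and the choice of exponential decay are all assumptions made for this sketch.

```python
import math

# Hypothetical parameters: the slides show the shape of the curves but not
# the actual weights, half-lives, or threshold.
EVENT_WEIGHTS = {"page": 1.0, "search": 2.0, "search_click": 3.0, "ad_click": 5.0}
HALF_LIFE_DAYS = {"automotive": 14.0, "travel_last_minute": 2.0}
INTENSE_CLICK_THRESHOLD = 10.0


def intensity(events, category, now_day):
    """Sum event weights, each discounted by its age via exponential decay.

    `events` is a list of (day, event_type) tuples.
    """
    decay_rate = math.log(2) / HALF_LIFE_DAYS[category]
    return sum(
        EVENT_WEIGHTS[kind] * math.exp(-decay_rate * (now_day - day))
        for day, kind in events
    )


def in_intense_click_zone(events, category, now_day):
    """True while the decayed intensity stays at or above the threshold."""
    return intensity(events, category, now_day) >= INTENSE_CLICK_THRESHOLD


# The behaviour from the slide: 5 pages, 2 search keywords, 1 search click,
# 1 ad click. All events are assumed here to happen on day 0.
burst = [(0, "page")] * 5 + [(0, "search")] * 2 + [(0, "search_click"), (0, "ad_click")]

for day in (0, 2, 14):
    print(day,
          round(intensity(burst, "automotive", day), 2),
          round(intensity(burst, "travel_last_minute", day), 2))
```

With these assumed half-lives the same burst of activity keeps an automotive intender in the zone for days, while a last-minute travel intender drops out within about two days, which is the category-specific behaviour the two charts contrast.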
Automobile Purchase Intender Example
A test ad-campaign with a major European automobile manufacturer
• Designed a test that served the same ad creative to test and control groups on
Yahoo!
• Success metric: performing specific actions on Jaguar website
Test results: 900% conversion lift vs. control group
• Purchase Intenders were 9 times more likely to configure a vehicle, request a
price quote or locate a dealer than consumers in the control group
• ~3x higher click through rates vs. control group
Mortgage Intender Example
We found:
1,900,000 people looking for
mortgage loans.
+122% CTR Lift
+626% Conv Lift
Example search terms qualified for this target:
Mortgages, Home Loans, Refinancing, Ditech
Example Yahoo! Pages visited:
Financing section in Real Estate
Mortgage Loans area in Finance
Real Estate section in Yellow Pages
Source: Campaign click-through rate lift is determined by Yahoo! internal
research. Conversion is the number of qualified leads from clicks over number
of impressions served. Audience size represents the audience within this behavioral interest
category that has the highest propensity to engage with a brand or product and to click on an offer.
Date: March 2006
Results from a client campaign on Yahoo!
Network
Example: Mortgages
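For readers unfamiliar with lift figures, the sketch below shows how a CTR lift and a conversion lift of this size could be computed. Conversion follows the definition in the source note (qualified leads over impressions served); measuring lift relative to a control group is an assumption, and the counts are hypothetical numbers picked only so that the output matches the +122% and +626% on the slide.

```python
def rate(events, impressions):
    """Events per impression served (used for both CTR and conversion)."""
    return events / impressions


def lift(test_rate, control_rate):
    """Relative improvement of test over control, as a percentage."""
    return (test_rate - control_rate) / control_rate * 100.0


# Hypothetical counts, not campaign data: chosen so the lifts come out at
# the same size as the slide's figures.
impressions = 1_000_000
control_clicks, test_clicks = 1_000, 2_220
control_leads, test_leads = 50, 363

ctr_lift = lift(rate(test_clicks, impressions), rate(control_clicks, impressions))
conv_lift = lift(rate(test_leads, impressions), rate(control_leads, impressions))

print(f"CTR lift:        +{ctr_lift:.0f}%")
print(f"Conversion lift: +{conv_lift:.0f}%")
```

Because both groups are measured per impression served, the lift isolates the effect of targeting rather than the effect of simply showing more ads.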
Experience summary at Yahoo!
• Dealing with one of the largest data sources (25 terabytes
per day)
• Behavioral Targeting business was grown from $20M to >
$400M in 3 years of investment!
• Yahoo! Specific? -- BigData critical to operations
– Ad targeting creates huge value
– Right teams to build technology (3 years of recruiting)
– Search is a BigData problem (but this has moved to mainstream)
Lessons Learned
A lot more data than qualified talent
• Finding talent in BigData is very difficult
• Retaining talent in BigData is even harder
At Yahoo! we created a central group that drove huge value to the
company
Data people need to feel like they have critical mass
• Makes it easier to attract the right people
• Makes it easier to retain
Drive data efforts by business need, not by technology priorities
• Chief Data Officer role at Yahoo! – now popular
Where does this leave us?
1. What matters in the age of analytics?
• Being able to exploit all the data that is available
• Proliferating analytics throughout the organization
• Driving significant business value
2. Where are we today?
• New Data Landscape via Hadoop
• Confusion by marketers, analysts, and technical community
• Struggling with basics of managing data because of the new flexibility
3. What are the real issues?
• Data management and governance
• Talent for Data and Data Science is rare but critical
• Data and Analytics are specialisms that need management and know-how: not for
generalists…
The early days of mass
auto production
Today’s Auto:
It just works!
No need to understand what happens
when you turn on ignition
Very complex inside, but all simplicity on
the outside
Data Science, AI, Algorithms
Data
Science
Artificial
Intelligence /
Machine Learning
Algorithms
Data as a Service (DaaS)
to provide our business units with better,
faster, cheaper data
Algorithms
Create leading edge Algorithms for
Automation via AI, Machine Learning,
and leading edge data science
Automation Tech via AI
CoE for Intelligent Automation via AI, ML,
case-based reasoning & Planning
Secure Cloud Data Lab
Secure Cloud data lab with open source
stack to be able to leverage external
datasets and work with the start-ups to
innovate at scale
Experimentation evaluation
Data-fuelled evaluation of experimentation
at scale, e.g. digital experiences
empowered by data
Data Solutions for Customers/ Clients
Create a Lab for innovative solutions for
customers and Clients
KEY ADVANCED CAPABILITIES
FRAUD / CYBERSECURITY
• Reduced false alarms
• Fewer write-offs
• Better intrusion detection, cyber attacks, internal
financial crime
BUSINESS IMPACT - EXAMPLES
RAPID AUTOMATION
• Learning algorithms automate and roboticise
manual tasks by ‘crowd-sourcing’
• Automated reporting and reduced manual
reconciliation
SMARTER TRADING ALGORITHMS
• Smarter trading algorithms and quantitative
analytics enabled by new and more powerful
technology
TARGETED MARKETING
• Leverage automated learning and classification for
customer acquisition, retention, and activation
REVENUE DRIVER
• Leverage social media / unstructured data for
Cross-sell, up-sell, next-best-action, CRM, etc.
Barclays Local Insights: Providing
insights about people
www.insights.barclays.co.uk
Application Dates:
Rising Eagles Graduate Programme: April-June | Explorer Programme: 1 week in July
We recruit on a rolling basis & early applications are advised – find out more and apply at
joinus.barclays.com/Africa
Technology opportunities at Barclays
We seek world class technologists, data scientists, problem solvers, Data systems engineers – the
team that will reinvent Financial Services
Create game changing new products and services for our Africa businesses across retail,
business banking, Card, Corporate/IB, and Wealth/Investment management that generate not only
new experiences and growth but also new business models.
We are looking for hungry data talent that is looking to move the needle in all our businesses.
Yassi Hadjibashi
Chief Data Officer, Barclays Africa - DSI
Africa is a huge growth opportunity for Barclays. If you know your Data Science, We want you!
If you love Hadoop & BigData, We love you! And if you want to be part of an industry
transformation around Data in Financial Services, come help us make it happen!
Dr. Usama Fayyad
Group Chief Data Officer, Barclays Bank
Security, Data, and Software as code: basically everything automated for sophisticated rapid
innovation, deployment cycles, and new age software engineering. Come join our journey and
drive this with the backing and attention of the most senior leadership!
Peter Rix
Chief Technology Officer, Barclays Africa
Thank you & questions
Usama Fayyad
Geraldine.Bolton@barclays.com
@Usamaf
joinus.barclays.com/
Africa
Yasaman.Hadjibashi@barclayscorp.com
Pieter.Vorster@absacapital.com
Ekow.Duker@barclaysafrica.com

Contenu connexe

Tendances

Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...
Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...
Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...
Leon Kappelman
 
5 big data at work linking discovery and bi to improve business outcomes from...
5 big data at work linking discovery and bi to improve business outcomes from...5 big data at work linking discovery and bi to improve business outcomes from...
5 big data at work linking discovery and bi to improve business outcomes from...
Dr. Wilfred Lin (Ph.D.)
 
Data Architecture Process in a BI environment
Data Architecture Process in a BI environmentData Architecture Process in a BI environment
Data Architecture Process in a BI environment
Sasha Citino
 

Tendances (20)

Data Integration, Interoperability and Virtualization
Data Integration, Interoperability and VirtualizationData Integration, Interoperability and Virtualization
Data Integration, Interoperability and Virtualization
 
Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...
Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...
Early Warning Signs of IT Project Failure -- The Deadly Dozen and the Four Ho...
 
Lean Modeling for Any Methodology
Lean Modeling for Any MethodologyLean Modeling for Any Methodology
Lean Modeling for Any Methodology
 
Using Graphs to Comply with GDPR
Using Graphs to Comply with GDPRUsing Graphs to Comply with GDPR
Using Graphs to Comply with GDPR
 
5 big data at work linking discovery and bi to improve business outcomes from...
5 big data at work linking discovery and bi to improve business outcomes from...5 big data at work linking discovery and bi to improve business outcomes from...
5 big data at work linking discovery and bi to improve business outcomes from...
 
ADV Slides: The Impact of Machine Learning on the Enterprise Today
ADV Slides: The Impact of Machine Learning on the Enterprise TodayADV Slides: The Impact of Machine Learning on the Enterprise Today
ADV Slides: The Impact of Machine Learning on the Enterprise Today
 
HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...
HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...
HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Querie...
 
Data Architecture Process in a BI environment
Data Architecture Process in a BI environmentData Architecture Process in a BI environment
Data Architecture Process in a BI environment
 
Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...
 
Fasten you seatbelt and listen to the Data Steward
Fasten you seatbelt and listen to the Data StewardFasten you seatbelt and listen to the Data Steward
Fasten you seatbelt and listen to the Data Steward
 
The Future Paradigm Shifts of the Cloud and Big Data: Security Impacts & New ...
The Future Paradigm Shifts of the Cloud and Big Data: Security Impacts & New ...The Future Paradigm Shifts of the Cloud and Big Data: Security Impacts & New ...
The Future Paradigm Shifts of the Cloud and Big Data: Security Impacts & New ...
 
Learning New Skills for the Digital Age
Learning New Skills for the Digital AgeLearning New Skills for the Digital Age
Learning New Skills for the Digital Age
 
[AIIM16] The Last Mile in Information Management
[AIIM16] The Last Mile in Information Management[AIIM16] The Last Mile in Information Management
[AIIM16] The Last Mile in Information Management
 
Align Business Data & Analytics for Digital Transformation
Align Business Data & Analytics for Digital TransformationAlign Business Data & Analytics for Digital Transformation
Align Business Data & Analytics for Digital Transformation
 
Building Competitive Moats With Data
Building Competitive Moats With DataBuilding Competitive Moats With Data
Building Competitive Moats With Data
 
TBR IT Trends Study
TBR IT Trends StudyTBR IT Trends Study
TBR IT Trends Study
 
MDM - The Key to Successful Customer Experience Managment
MDM - The Key to Successful Customer Experience ManagmentMDM - The Key to Successful Customer Experience Managment
MDM - The Key to Successful Customer Experience Managment
 
CTAM Making Analytics Actionable RJA FINAL
CTAM Making Analytics Actionable RJA FINALCTAM Making Analytics Actionable RJA FINAL
CTAM Making Analytics Actionable RJA FINAL
 
Why Information Architecture is Vital for Effective Information Management
Why Information Architecture is Vital for Effective Information ManagementWhy Information Architecture is Vital for Effective Information Management
Why Information Architecture is Vital for Effective Information Management
 
Real-Time Data Integration for Modern BI
Real-Time Data Integration for Modern BIReal-Time Data Integration for Modern BI
Real-Time Data Integration for Modern BI
 

En vedette

Social media-101-presentation
Social media-101-presentationSocial media-101-presentation
Social media-101-presentation
aja1973
 

Usama Fayyad talk in South Africa: From BigData to Data Science

  • 1. From BigData to Data Science: Predictive Analytics in a Changing Data Landscape Usama Fayyad, Ph.D. Group Chief Data Officer and CIO Risk, Finance, & Treasury Technology Barclays Twitter: @usamaf Overviews for South Africa July 15-16, 2015
  • 2. Outline • Big Data all around us: the problems that Data Science is NOT addressing • The CDO role and Data Axioms • Some of the issues in BigData • Case studies • Context and sentiment analysis • A case-study on IOT • Yahoo! predictions at scale • Summary and conclusions
  • 3. Why should banks worry about data? 100 years ago • Smaller • Much more local • Deep understanding of customer needs • Deep knowledge of customer Now • Digital, decline of the branch • Global • No face of the customer • New generation of millennials A big part of banking was knowing the customer intimately – personal relationships making KYC, risk modelling simple
  • 4. Why data is important: Every customer interaction is an opportunity to capture data, learn & act An instant car loan product offering is displayed in the app Connie logs into BMB to check her balance. From cookies and browsing behaviour we know she is looking for a car loan CRM data is used to pre-calculate Connie’s borrowing limit for a car loan Based on multiple internal and external data sources and predictive models, we identify cross-sell opportunities Connie is offered a competitively priced ‘bespoke’ offer for car servicing / MOT Connie’s journey is enhanced based on previous multivariate testing results User experience is continuously, iteratively improved by capturing user interaction in real-time during every session Connie has a personalised journey based on pre-calculated limit for the car loan amount Data Capture/ Opportunity to learn Actions Customer Interaction Event (branch, telephony, digital, mobile, sales) Millions of customer interactions per day Customer Interaction CRM Predictive Analytics Multivariate Testing Targeted offers during browsing Product Discovery Cross-sell Measure feedback per session Every interaction is an opportunity to capture data, to learn and to act Big Data platform Imagine we could move back to this model 100 years ago – on demand, consumable, understandable information to build intimate relationships & an understanding of the customer
  • 5. What matters in the age of analytics? 1. Being able to exploit all the data that is available • Not just what you have available • What you can acquire and use to enhance your actions 2. Proliferating analytics throughout the organization • Make every part of your business smarter • Actions and not just insights 3. Driving significant business value • Embedding analytics into every area of your business can significantly drive top line revenues and/or bottom line cost efficiencies
  • 6. Data Fusion: Can we bring all data together for analysis & action? Customer and Client Interactions; Transactions; Social Data; Trade; Application Logs; Network Traffic. Central Data Fusion Engine: Ingesting, persisting, processing and servicing in Real-time. Analysis: Risk; Marketing; Financial Crime; Fraud; Cyber Security. DaaS
  • 7. Why Big Data? A new term, with associated ‘Data Scientist’ positions • Big Data is a mix of structured, semi-structured, and unstructured data • Typically breaks barriers for traditional RDB storage • Typically breaks limits of indexing by “rows” • Typically requires intensive pre-processing before each query to extract “some structure” – usually using Map-Reduce type operations • Above leads to “messy” situations with no standard recipes or architecture: hence the need for “data scientists” • Conduct “Data Expeditions” • Discovery and learning on the spot A Data Scientist is someone who knows a lot more software engineering than statisticians and a lot more statistics than software engineers
  • 8. The 4-V’s of ‘Big Data’ Big Data is Characterized by the 3-V’s: Volume: larger than “normal” – challenging to load/process • Expensive to do ETL • Expensive to figure out how to index and retrieve • Multiple dimensions that are “key” Velocity: Rate of arrival poses real-time constraints on what are typically “batch ETL” operations • If you fall behind catching up is extremely expensive (replicate very expensive systems) • Must keep up with rate and service queries on-the-fly Variety: Mix of data types and varying degrees of structure • Non-standard schema • Lots of BLOB’s and CLOB’s • DB queries don’t know what to do with semi-structured and unstructured data
  • 9. 9 Invited talk – #ODSC, Boston– Copyright Usama Fayyad © 2015 Male, age 32 Lives in SF Lawyer Searched on from London last week Searched on: “Italian restaurant Palo Alto” Checks Yahoo! Mail daily via PC & Phone Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people Searched on: “Hillary Clinton” Clicked on Sony Plasma TV SS ad Registration Campaign Behavior Unknown Spends 10 hour/week On the internet Purchased Da Vinci Code from Amazon “Classic” Data: e.g. Yahoo! User DNA
  • 10. Male, age 32 Lives in SF Lawyer Searched on from London last week Searched on: “Italian restaurant Palo Alto” Checks Yahoo! Mail daily via PC & Phone Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people Searched on: “Hillary Clinton” Clicked on Sony Plasma TV SS ad Spends 10 hour/week On the internet Purchased Da Vinci Code from Amazon How Data Explodes: really big Social Graph (FB) Likes & friends likes Professional netwk - reputation Web searches on this person, hobbies, work, location MetaData on everything Blogs, publications, news, local papers, job info, accidents
  • 11. The Distinction between “Classic Data” and “Big Data” is fast disappearing • Most real data sets nowadays come with a serious mix of semi-structured and unstructured components: o Images o Video o Text descriptions and news, blogs, etc… o User and customer commentary o Reactions on social media: e.g. Twitter is a mix of data anyway • Using standard transforms, entity extraction, and new generation tools to transform unstructured raw data into semi-structured analyzable data
  • 12. Text Data: The Big Driver • We speak of “big data” and the “Variety” in 3-V’s • Reality: biggest driver of growth of Big Data has been text data o Most work on analysis of “images” and “video” data has really been reduced to analysis of surrounding text Nowhere more so than on the internet • Map-Reduce popularized by Google to address the problem of processing large amounts of text data: o Many operations with each being a simple operation but done at large scale o Indexing a full copy of the web o Frequent re-indexing
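The "many simple operations done at large scale" pattern on this slide can be illustrated with a toy inverted index. This is a minimal single-process sketch of the map and reduce steps, not Google's implementation; the function names and the two sample pages are hypothetical.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word in a document."""
    return [(word, doc_id) for word in set(text.lower().split())]

def reduce_phase(pairs):
    """Reduce step: group document ids by word to form an inverted index."""
    index = defaultdict(set)
    for word, doc_id in pairs:
        index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def build_index(docs):
    """Run the map step over every document, then reduce all emitted pairs."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)

# Hypothetical two-page "web" used only for illustration.
docs = {"page1": "big data text", "page2": "text data on the web"}
index = build_index(docs)
```

In a real cluster the map calls run in parallel across machines and the framework shuffles pairs by key before the reduce step; re-indexing means rerunning the same job over a fresh crawl.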
  • 13. A few words on: The Chief Data Officer Why are companies creating this position? • There is a fundamental realisation that Data needs to become a primary value driver at organizations o We have lots of Data o We spend much on it: in technology and people o We are not realising the value we expect from it • A strong business need to create the CDO role: o Traditional companies are not following, but adopting the model that actually works in other data-intensive industries • CDO has a seat at executive table: the voice of Data • Data done right is an essential element to unify large enterprises to unlock value from business synergies
  • 14. Fundamental Data Principles to Support Analytics Usama’s Obvious Data Axioms 1. Data gains value exponentially when integrated and coalesced. • When fragmented: dramatic value loss takes place; • Increased costs; • Reduced utility/integrity; • Increased security risks 2. Fusing Data together from disparate/independent sources is difficult to achieve and impossible to maintain Hence only viable approach is: • Intercepting and documenting at the source • Fusing at the source • Controlling lifecycle and flow
  • 15. Fundamental Data Principles to Support Analytics Usama’s Obvious Data Axioms 3. Standardisation is essential • For sustained ability to integrate data sources and hence growing value; • For simplifying down-stream systems and apps • For enforcing discipline as a firm increases its data sources 4. Data governance and policy must be centralised • Needs to be enforced strongly else we slip into chaos and a Babylon of terms/languages • An Enterprise Data Architecture spanning structured and unstructured data 5. Recency Matters -> data streaming in modelling and scoring • Often, accuracy of prediction drops quickly with time (e.g. consumer shopping) • Value of alerts drops exponentially with time… • Ability to trigger responses based on real-time scoring critical • Streaming, real-time model updates, real-time scoring
  • 16. Fundamental Data Principles to Support Analytics Usama’s Obvious Data Axioms 6. Abstraction layer of data from Apps/platforms is a must • Rapid renewal & modernization: the pace of change and development of technology are very rapid • Design for migration and infrastructure replacement via abstraction layers that remove tech dependencies • Encryption and Masking: Persisting unencrypted confidential and secret data (even within secure firewalls) is an invitation for problems and risks 7. Data is a primary competency and not a side-activity supporting other processes • Hence specialized skills and know-how are a must • Generalists will create a hopeless mess • Data is difficult: modelling, architecture, and design to support analytics
  • 17. Driving the need for integrated processes, data & architecture Regulatory Demand • Data quality, completeness, lineage and aggregation • Company financial reporting presented at most granular level through various lenses: business units, legal entities, regional, sector level, … • Faster turnaround for increasingly complex ad hoc data requests • Additional sophisticated stress tests delivered in a joined-up manner between Risk and Finance • Enhanced hybrid capital computation methodologies Business Drivers • Better and smarter controls • Capital precision • Better customer experience • Better colleague experience • More cost effective
  • 18. Data Governance is BORING for most… Policy, governance & control Business ownership & data stewardship Definitions & metadata Permit to build (change governance) Reporting & analytics Data architecture Data sourcing Reconciliation Data quality management Documentation, tracking & audit …but it shouldn’t be!
  • 19. Reality check: So what should users of analytics in Big Data world do? Load the Data into a “Data Lake” Your life will be so much easier as you can now do Data Acrobatics & other amazing data feats
  • 20. The Data Lake – according to Waterline We loaded the Data! Congratulations Now What?
  • 21. From a Data Lake to Amazon Browser Amazon Simple Search Amazon gives you facets Product details
  • 22. Reality check: So where do analysts & Data Scientists spend all their time? Let’s Mine the Data
  • 23. Reality check: So what do Technology people worry about these days? To Hadoop or not to Hadoop? When to use techniques requiring Map-Reduce and grid computing? Typically organizations try to use Map-Reduce for everything to do with Big Data • This is actually very inefficient and often irrational • Certain operations require specialized storage o Updating segment memberships over large numbers of users o Defining new segments on user or usage data
  • 24. Drivers of Hadoop in large enterprises Cost of storage Fastest growing demand is more storage Data in Data Warehouses have traditionally required expensive storage technology: • $20K to $50K per terabyte per year – cost of premium storage • $2K per terabyte per year – much lower – cost of Hadoop on commodity storage
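The cost gap on this slide is easy to work through. The sketch below assumes the Hadoop figure is also per terabyte per year, and uses a hypothetical 100 TB volume; only the three price points come from the slide.

```python
def annual_storage_cost(terabytes, cost_per_tb_year):
    """Yearly cost in dollars for a given volume at a per-terabyte yearly price."""
    return terabytes * cost_per_tb_year

# Price points quoted on the slide (dollars per terabyte per year).
PREMIUM_LOW, PREMIUM_HIGH, HADOOP = 20_000, 50_000, 2_000

volume_tb = 100  # hypothetical warehouse size, not from the talk

premium_range = (
    annual_storage_cost(volume_tb, PREMIUM_LOW),
    annual_storage_cost(volume_tb, PREMIUM_HIGH),
)
hadoop_cost = annual_storage_cost(volume_tb, HADOOP)

# How many times cheaper commodity storage is at each end of the premium range.
savings_factor = (PREMIUM_LOW / HADOOP, PREMIUM_HIGH / HADOOP)
```

Under these assumptions commodity storage is 10 to 25 times cheaper per terabyte, which is why storage cost alone can justify a move.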
  • 25. Analysis & programming software PIG HIPI
  • 31. Reality check: If storage is biggest driver of Hadoop adoption, what is next biggest? ETL • Replaces expensive licenses • Much higher performance with lower infrastructure costs (processors, memory) • Flexibility in changing schema and representation • Flexibility on taking on unstructured and semi-structured data • Plus suite of really cool tools…
  • 32. Turning the 3-Vs of Big Data into Value Understand context and content • What are appropriate actions? • Is it Ok to associate my brand with this content? • Is content sad?, happy?, serious?, informative? Understand community sentiment • What is the emotion? • Is it negative or positive? • What is the health of my brand online? Understand customer intent? • What is each individual trying to achieve? • Can we predict what to do next? • Critical in cross-sell, personalization, monetization, advertising, etc…
  • 33. Many business uses of predictive analytics (Analytic technique: Uses in business)
      • Marketing and sales: Identify potential customers; establish the effectiveness of a
      • Understanding customer behavior: model churn, affinities, propensities, …
      • Web analytics & metrics: model user preferences from data, collaborative filtering, targeting,
      • Fraud detection: Identify fraudulent transactions
      • Credit scoring: credit worthiness of a customer requesting a loan, secured and loans
      • Manufacturing process analysis: Identify the causes of manufacturing problems
      • Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks
      • Healthcare Application: fraud detection, cost optimization, detection of events like epidemics, etc...
      • Insurance: fraudulent claim detection, risk assessment
      • Security and Surveillance: intrusion detection, sensor data analysis, remote sensing, detection, link analysis, etc...
      • Application Log File Analytics: Understanding application, network, event logs in IT
  • 34. Case Studies: 1. Context Analysis (unstructured data) 2. IOT Case Study 3. Yahoo! Predictive Modeling
  • 35. Reality Check So who is the company we think is best at handling Big Data?
  • 36. Case Study: Biggest Big Data in advertising? Understanding context for ads What Ad would you place here?
  • 37. Case Study: Biggest Big Data in advertising? Understanding context for ads Damaging to Brand?
  • 38. The Display Ads Challenge Today What Ad would you place here?
  • 39. NetSeer: Solving accuracy issues Ambiguity, waste, brand, safety Why did Google Serve this Ad? This is how NetSeer actually sees this content
  • 40. NetSeer: How it works DISCERNS AND MONETIZES HUMAN INTENT + Identifies Concepts expressed on a page + Disambiguates language + Builds increasingly rich profile over time 52M CONCEPTS, 2.3B RELATIONSHIPS BETWEEN CONCEPTS [Diagram: a page on WEBSITE.COM containing “electric vehicles”, “service record”, “safety rating” and the ambiguous term “focus” <CONCEPT>, set against three concept clusters. ECONOMY CARS: high MPG, ford, low emission, fuel efficiency, economy vehicles. VISION TOOLS: microscope lenses, reading glasses, autofocus, bifocal, refraction, eye chart. MARKET RESEARCH: focus groups, A/B testing, consumer study, surveying, blind study, analytics.]
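One simple way to see how an ambiguous word such as "focus" might be resolved is to count how many of a page's other terms co-occur with each candidate concept. The sketch below illustrates that idea only; it is not NetSeer's algorithm, and the term sets are hypothetical, loosely based on the words on the slide.

```python
# Hypothetical term sets per concept; "focus" appears in all three on purpose.
CONCEPT_TERMS = {
    "ECONOMY CARS": {"focus", "ford", "high mpg", "fuel efficiency",
                     "electric vehicles", "safety rating", "service record"},
    "VISION TOOLS": {"focus", "lenses", "reading glasses", "autofocus",
                     "bifocal", "eye chart"},
    "MARKET RESEARCH": {"focus", "focus groups", "a/b testing",
                        "consumer study", "surveying", "analytics"},
}

def rank_concepts(page_terms):
    """Rank concepts by how many page terms fall inside each concept's term set."""
    terms = {t.lower() for t in page_terms}
    scores = {c: len(terms & vocab) for c, vocab in CONCEPT_TERMS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Terms shown on the example page in the slide.
page = ["electric vehicles", "service record", "safety rating", "focus"]
ranking = rank_concepts(page)
best = ranking[0][0]
```

On its own "focus" matches every concept equally; the surrounding terms are what tip the ranking toward the car reading.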
  • 41. NetSeer: Intent for Display Positive vs negative content re a particular topic
  • 42. Problem: Hard to understand user intent Contextual Ad served by Google What NetSeer Sees:
  • 43. Case Studies: 1. Context Analysis (unstructured data) 2. IOT Case Study 3. Yahoo! Predictive Modeling
  • 44. The Connected Cow Joseph Sirosh VP, Machine Learning Microsoft
  • 59. Case Studies: 1. Context Analysis (unstructured data) 2. IOT Case Study 3. Yahoo! Predictive Modeling
  • 60. Yahoo! – One of Largest Destinations on the Web 80% of the U.S. Internet population uses Yahoo! – Over 600 million users per month globally! Global network of content, commerce, media, search and access products 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc. 25+ terabytes of data collected each day • Representing 1000’s of cataloged consumer behaviors More people visited Yahoo! in the past month than: • Use coupons • Vote • Recycle • Exercise regularly • Have children living at home • Wear sunscreen regularly Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005. Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers
  • 61. Yahoo! Big Data – A league of its own… GRAND CHALLENGE PROBLEMS OF DATA PROCESSING: TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET. Terabytes of Warehoused Data: Amazon 25; Korea Telecom 49; AT&T 94; Y!LiveStor 100; Y!Panama Warehouse 500; Walmart 1,000; Y!Main warehouse 5,000. Millions of Events Processed Per Day: SABRE 50; VISA 120; NYSE 225; YSM 2,000; Y! Global 14,000. Y! Data Challenge Exceeds others by 2 orders of magnitude
  • 62. Behavioral Targeting (BT) Search Ad Clicks Content Search Clicks BT Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them
  • 63. Male, age 32 Lives in SF Lawyer Searched on from London last week Searched on: “Italian restaurant Palo Alto” Checks Yahoo! Mail daily via PC & Phone Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people Searched on: “Hillary Clinton” Clicked on Sony Plasma TV SS ad Registration Campaign Behavior Unknown Spends 10 hour/week On the internet Purchased Da Vinci Code from Amazon Yahoo! User DNA • On a per consumer basis: maintain a behavioral/interests profile and profitability (user value and LTV) metrics
  • 64. How it works | Network + Interests + Modelling Varying Product Purchase Cycles: Analyze predictive patterns for purchase cycles in over 100 product categories. Match Users to the Models: In each category, build models to describe behaviour most likely to lead to an ad response (i.e. click). Rewarding Good Behaviour: Score each user for fit with every category…daily. Identify Most Relevant Users: Target ads to users who get highest ‘relevance’ scores in the targeting categories
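The score-then-target loop described on this slide can be sketched as a weighted sum over a user's recent behaviour counts, followed by picking the top scorers in a category. The event names, weights, and users below are hypothetical; the talk does not say what form Yahoo!'s per-category models took.

```python
def score_user(behaviour, weights):
    """Relevance score: weighted sum of a user's event counts for one category."""
    return sum(weights.get(event, 0.0) * count for event, count in behaviour.items())

def top_users(behaviours, weights, k):
    """Return the k users with the highest relevance score for the category."""
    scored = {user: score_user(b, weights) for user, b in behaviours.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical weights for one category: stronger signals count for more.
AUTO_WEIGHTS = {"page_view": 1.0, "search": 2.0, "search_click": 3.0, "ad_click": 5.0}

# Hypothetical daily behaviour counts per user.
behaviours = {
    "u1": {"page_view": 5, "search": 2, "search_click": 1, "ad_click": 1},
    "u2": {"page_view": 1},
    "u3": {"search": 3, "ad_click": 1},
}

targets = top_users(behaviours, AUTO_WEIGHTS, k=2)
```

Rerunning this once per day for each category corresponds to the "score each user for fit with every category…daily" step.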
  • 65. Recency Matters, So Does Intensity Active now… …and with feeling
  • 66. Differentiation | Category specific modelling Example 1: Category Automotive Example 2: Category Travel/Last Minute Different models allow us to weight and determine intensity and recency Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click [Charts: intensity score vs. time for each example, with the Intense Click Zone marked]
  • 67. Differentiation | Category-specific modelling. Different models allow us to weight and determine intensity and recency. Chart: intensity score against time, with the Intense Click Zone marked. Example 1: Category Automotive. Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click; the user is in the Intense Click Zone. With no further activity, decay takes effect.
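One way to picture the recency-and-intensity idea on slides 65 to 67 is an event-weighted score with exponential decay. The sketch below is an assumption for illustration only: the talk does not state the functional form, and the event weights, half-lives, and zone threshold are hypothetical values chosen to show how the same behaviour can score differently per category.

```python
# Hypothetical event weights and per-category half-lives (not from the talk).
EVENT_WEIGHTS = {
    "page_view": 1.0,
    "search_keyword": 2.0,
    "search_click": 3.0,
    "ad_click": 5.0,
}
HALF_LIFE_DAYS = {"automotive": 14.0, "travel_last_minute": 2.0}


def intensity_score(events, now_day, category):
    """Sum event weights, each halved for every half-life elapsed since the event.

    events: list of (day, event_type) tuples; now_day: current time in days.
    """
    half_life = HALF_LIFE_DAYS[category]
    total = 0.0
    for day, kind in events:
        age = now_day - day
        total += EVENT_WEIGHTS[kind] * 0.5 ** (age / half_life)
    return total


def in_intense_click_zone(score, threshold=10.0):
    """Assumed rule: a user is in the zone while the score stays above a threshold."""
    return score >= threshold


# The slide's example behaviour, all placed on day 0 for simplicity:
# 5 pages, 2 search keywords, 1 search click, 1 ad click.
behaviour = (
    [(0, "page_view")] * 5
    + [(0, "search_keyword")] * 2
    + [(0, "search_click"), (0, "ad_click")]
)

for category in HALF_LIFE_DAYS:
    for day in (0, 14):
        score = intensity_score(behaviour, day, category)
        print(category, day, round(score, 2), in_intense_click_zone(score))
```

A long half-life suits slow purchase cycles such as cars, while a short one suits last-minute travel, where interest that is two weeks old is nearly worthless.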
  • 68. Automobile Purchase Intender Example. A test ad campaign with a major European automobile manufacturer: designed a test that served the same ad creative to test and control groups on Yahoo!; success metric: performing specific actions on the Jaguar website. Test results: 900% conversion lift vs. control group; Purchase Intenders were 9 times more likely to configure a vehicle, request a price quote, or locate a dealer than consumers in the control group; ~3x higher click-through rates vs. control group.
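For readers who want to relate "times more likely" to "percent lift", here is a small arithmetic sketch. It uses the common definition lift = (test - control) / control, which is an assumption: the slide does not say how its lift figure was computed. Under this definition a 9x ratio corresponds to +800% and a +900% lift to a 10x ratio, so the slide appears to use the two figures loosely. The conversion rates below are hypothetical.

```python
def lift(test_rate, control_rate):
    """Relative lift of test over control, as a fraction (1.0 means +100%)."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (test_rate - control_rate) / control_rate


def ratio_to_lift_percent(ratio):
    """Convert an 'N times as likely' ratio into a percent lift."""
    return (ratio - 1.0) * 100.0


def lift_percent_to_ratio(lift_percent):
    """Convert a percent lift back into a test/control ratio."""
    return 1.0 + lift_percent / 100.0


# Hypothetical rates: 1% of the control group converts, 9% of the test group.
print(lift(0.09, 0.01))
print(ratio_to_lift_percent(9.0))
print(lift_percent_to_ratio(900.0))
```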
  • 69. Mortgage Intender Example. Results from a client campaign on the Yahoo! Network (example: mortgages; date: March 2006). We found 1,900,000 people looking for mortgage loans. +122% CTR lift; +626% conversion lift. Example search terms qualified for this target: Mortgages, Home Loans, Refinancing, Ditech. Example Yahoo! pages visited: Financing section in Real Estate; Mortgage Loans area in Finance; Real Estate section in Yellow Pages. Source: campaign click-through rate lift is determined by Yahoo! internal research. Conversion is the number of qualified leads from clicks over the number of impressions served. Audience size represents the audience within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on an offer.
  • 70. Experience summary at Yahoo! • Dealing with one of the largest data sources (25 terabytes per day) • The Behavioral Targeting business grew from $20M to more than $400M in 3 years of investment • Yahoo!-specific? BigData is critical to operations; ad targeting creates huge value; the right teams were needed to build the technology (3 years of recruiting); search is a BigData problem (but this has moved to the mainstream)
  • 71. Lessons Learned. There is a lot more data than qualified talent: finding talent in BigData is very difficult; retaining talent in BigData is even harder. At Yahoo! we created a central group that drove huge value to the company. Data people need to feel like they have critical mass: this makes it easier to attract the right people and easier to retain them. Drive data efforts by business need, not by technology priorities: hence the Chief Data Officer role at Yahoo!, which is now popular.
  • 72. Where does this leave us? 1. What matters in the age of analytics? • Being able to exploit all the data that is available • Proliferating analytics throughout the organization • Driving significant business value 2. Where are we today? • New data landscape via Hadoop • Confusion among marketers, analysts, and the technical community • Struggling with the basics of managing data because of the new flexibility 3. What are the real issues? • Data management and governance • Talent for Data and Data Science is rare but critical • Data and Analytics are specialisms that need management and know-how: not for generalists…
  • 73. The early days of mass auto production
  • 74. Today’s Auto: It just works! No need to understand what happens when you turn on the ignition. Very complex inside, but all simplicity on the outside.
  • 75. Data Science, AI, Algorithms. Pillars: Data Science; Artificial Intelligence / Machine Learning; Algorithms. KEY ADVANCED CAPABILITIES • Data as a Service (DaaS) to provide our business units with better, faster, cheaper data • Algorithms: create leading-edge algorithms for automation via AI, machine learning, and leading-edge data science • Automation Tech via AI: CoE for Intelligent Automation via AI, ML, case-based reasoning, and planning • Secure Cloud Data Lab: secure cloud data lab with an open source stack, to leverage external datasets and work with start-ups to innovate at scale • Experimentation evaluation: data-fuelled evaluation of experimentation at scale, e.g. digital experiences empowered by data • Data Solutions for Customers/Clients: create a lab for innovative solutions for customers and clients. BUSINESS IMPACT - EXAMPLES • FRAUD / CYBERSECURITY: reduced false alarms; fewer write-offs; better detection of intrusions, cyber attacks, and internal financial crime • RAPID AUTOMATION: learning algorithms automate and roboticise manual tasks automatically by ‘crowd-sourcing’; automated reporting and reduced manual reconciliation • SMARTER TRADING ALGORITHMS: smarter trading algorithms and quantitative analytics enabled by new and more powerful technology • TARGETED MARKETING: apply automated learning and classification to customer acquisition, retention, and activation • REVENUE DRIVER: leverage social media / unstructured data for cross-sell, up-sell, next-best-action, CRM, etc.
  • 76. Barclays Local Insights: Providing insights about people www.insights.barclays.co.uk
  • 77. Application Dates: Rising Eagles Graduate Programme: April-June | Explorer Programme: 1 week in July. We recruit on a rolling basis and early applications are advised. Find out more and apply at joinus.barclays.com/Africa
  • 78. Technology opportunities at Barclays. We seek world-class technologists, data scientists, problem solvers, and data systems engineers: the team that will reinvent Financial Services. “Create game-changing new products and services for our Africa businesses across retail, business banking, Card, Corporate/IB, and Wealth/Investment management that generate not only new experiences and growth but new business models. We are looking for hungry data talent that is looking to move the needle in all our businesses.” (Yassi Hadjibashi, Chief Data Officer, Barclays Africa - DSI) “Africa is a huge growth opportunity for Barclays. If you know your Data Science, we want you! If you love Hadoop & BigData, we love you! And if you want to be part of an industry transformation around Data in Financial Services, come help us make it happen!” (Dr. Usama Fayyad, Group Chief Data Officer, Barclays Bank) “Security, Data, and Software as code: basically everything automated for sophisticated rapid innovation, deployment cycles, and new-age software engineering. Come join our journey and drive this with the backing and attention of the most senior leadership!” (Peter Rix, Chief Technology Officer, Barclays Africa)
  • 79. Thank you & questions. Usama Fayyad (@Usamaf). Contacts: Geraldine.Bolton@barclays.com; Yasaman.Hadjibashi@barclayscorp.com; Pieter.Vorster@absacapital.com; Ekow.Duker@barclaysafrica.com. Careers: joinus.barclays.com/Africa