​Harnessing Big Data
​Methods and Cases
​Paul Barsch, Marketing Director
2 © 2014 Teradata
Agenda
• What is Big Data?
• IoT – How it’s collected and Application of Analytics at Volvo
• Methods to Harness Big Data
o Integrated Data Warehouse and Geospatial at USAF
o Data Discovery Processes with Siemens
o Using Hadoop as a Data Lake with Yahoo
o There’s No Technology Silver Bullet
3
Big Data – The Key is Variety!
Definition: Datasets so complex and large that they are
awkward to work with using standard tools and techniques
Location, Social, Images, Weblogs, Videos, Text, Audio, Sensor
Size is not what matters most; variety is
4
Data Growth
Source: IDC - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009
Chart labels: Transactions, Interactions
Scale:
Yottabyte: 10^24 bytes
Zettabyte: 10^21 bytes
Exabyte: 10^18 bytes
Petabyte: 10^15 bytes
Terabyte: 10^12 bytes
Gigabyte: 10^9 bytes
5
IoT, Sensors, and Tiny Computers
Internet of Things
Sensors
Embedded computers
Industry specific (e.g., CAT scans)
Data center systems
6
The Quantified Self
• A new wave of devices will help you track health statistics, but security and privacy concerns loom for this health “Big Data”
Examples: Google Glass, Jawbone UP, Ingestible Sensors
7
• Events generating data
– Vibration
– Temperature, humidity
– Wind speed, direction
– Air/liquid flow or pressure
– Location, navigation
– Tilt level, rotation
– Light, sound
– Radiation, chemicals
– Biological
- Heart rate, blood pressure
- Brain activity, chemicals
– Inventory, sales (RFID)
• Data format: JSON or proprietary
The Data Sensors Collect
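To make the "Data format: JSON" point concrete, here is a minimal Python sketch of decoding one sensor event. The field names (sensor_id, type, value, unit, ts) and the sample values are hypothetical, chosen only for illustration; real devices may use different or proprietary layouts.

```python
import json

# Hypothetical sensor event; the field names are illustrative assumptions,
# not taken from any vendor's format.
raw_event = (
    '{"sensor_id": "eng-07", "type": "temperature", '
    '"value": 92.5, "unit": "C", "ts": "2014-05-01T12:00:00Z"}'
)


def parse_event(raw):
    """Decode one JSON sensor event and keep only the fields we expect."""
    event = json.loads(raw)
    return {
        "sensor_id": event["sensor_id"],
        "type": event["type"],
        "value": float(event["value"]),
        "unit": event.get("unit", ""),  # unit is treated as optional here
        "ts": event["ts"],
    }


reading = parse_event(raw_event)
print(reading["type"], reading["value"], reading["unit"])
```

Keeping only a known set of fields is one simple way to normalize events from different sensor types before loading them for analysis.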
8
Gartner: Growth of the Internet of Things
Source: Gartner, Forecast: The Internet of Things, Worldwide, 2013, Nov 2013
[Chart: billions of things in use, 2009 to 2020, on a scale of 0 to 30; series: connected PCs, smartphones, and tablets versus IoT]
9
How Teradata Has Powered Our Connected Car Vision
• Vital support in
product development
• Ensuring product quality
and functionality
• Enabling innovation of new
products and services for the
connected consumer
• Manage customer interactions
Video Case Link
10
Applications of Analytics at Volvo
• Customer-centric experience – connected cars and hazards ahead
• Dealer Analytics – Scorecards and Action Plan
• Warranty/Repair/Failure Predictions
• Add New Services
• Features Used/Not Used – learn for the future
• Buy and customize online – most sales activity is done prior to walk-ins…
“Game-changing” connected car
Driver model
​Method #1
​The Advanced Data Warehouse
12
Big Data in the EDW
• An Enterprise Data Warehouse is a Centralized and Historical repository of
Integrated, Detailed and Enriched data that supports multiple decision-
making applications for multiple groups and is the single source of analytics
data for the enterprise.
Transactional
Systems
Users
Enterprise Data Warehouse
• Accts. Payable/Rec
• Sales/Orders
• Finance G/L
• HR
• Payroll
• Purchasing
• Manufacturing
• Inventory
13
> Find Teradata R&D Facility
– 17095 Via Del Campo
– San Diego, CA 92127
> Use Geospatial coordinates
– 33° 01’20.90” N
– 117° 05’33.75” W
What is Geospatial?
• It’s Location Data and Analysis
– New data type that captures the exact location
- Latitude (north-south position; lines run horizontally)
- Longitude (east-west position; lines run vertically)
14
Customer
My Store
Competitors
What Can I do With Geospatial Data?
• ST_Geometry functions…
– Measurements
- Distance, surface, perimeter…
– Relationship between two objects
- Intersect, contains, within, adjacent…
• Real-world applications?
– Calculate the distance between
customers and my store
– Do I have customers within a
10-mile radius?
– Identify customers who overlap with
my competitor
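The distance and radius questions above can be sketched in plain Python. This is not Teradata's ST_Geometry API; it is a stand-in using the haversine great-circle formula, and the customer names and coordinates are made up for illustration. The store location is approximately the San Diego coordinates shown earlier, converted to decimal degrees.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Store: roughly 33° 01'20.90" N, 117° 05'33.75" W in decimal degrees.
store = (33.0225, -117.0927)

# Hypothetical customers (name -> (lat, lon)).
customers = {
    "alice": (33.05, -117.10),   # a couple of miles from the store
    "bob": (34.05, -118.24),     # Los Angeles area, far outside the radius
}

# "Do I have customers within a 10-mile radius?"
nearby = [
    name for name, (lat, lon) in customers.items()
    if haversine_miles(store[0], store[1], lat, lon) <= 10
]
print(nearby)
```

A database with native geospatial types would typically answer the same question with a distance or within-style predicate in SQL, so the calculation runs next to the data.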
15
• Which customers should I target for my campaign?
– Typical data
- Demographic information
- Sales history (RFM)
- Customer segmentation
- Customer loyalty
– Enhanced with geospatial data
- How far will customers drive to
purchase my product?
- Which of my competitor’s
customers can I draw to my
store with an aggressive
campaign?
- Which customers live close
to my store?
Customer
Profile
• Demographic
• Recency
• Frequency
• Monetary value
• Segment
• Loyalty score
• Price sensitivity
Geospatial Intelligence
• Willing to drive 30 miles for 25% discount
• Lives 25 miles away from Store ID: 143
• Lives within 10 miles of my competitors
Integrate Geospatial and Customer Data
Target Marketing
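As a rough sketch of how the customer profile and the geospatial attributes might be combined for targeting, the Python below filters on two of the slide's example facts: the customer is willing to drive far enough to reach the store, and lives near a competitor. The Customer fields, the sample records, and the 10-mile competitor radius are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    miles_to_store: float       # distance from home to my store
    max_drive_miles: float      # how far they will drive for the offer
    miles_to_competitor: float  # distance from home to nearest competitor


def campaign_targets(customers, competitor_radius=10.0):
    """Names of customers who would drive to my store and live near a competitor."""
    return [
        c.name for c in customers
        if c.miles_to_store <= c.max_drive_miles
        and c.miles_to_competitor <= competitor_radius
    ]


# Hypothetical records; the first mirrors the slide's example
# (25 miles from the store, willing to drive 30, within 10 miles of a competitor).
sample = [
    Customer("A", miles_to_store=25, max_drive_miles=30, miles_to_competitor=8),
    Customer("B", miles_to_store=40, max_drive_miles=30, miles_to_competitor=5),
    Customer("C", miles_to_store=5, max_drive_miles=30, miles_to_competitor=20),
]

targets = campaign_targets(sample)
print(targets)
```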
16
3D Geospatial Use Cases
• Insurance companies can use the height of water to determine whether, and how much, a customer was affected by a flood
• Planogram* analytics, where we want to analyze the performance of shelf space by the height of product placement (x, y, and z coordinates)
• City planning (WSJ says the number of tall buildings of 60 stories or more will double in the next 10 years)
• Oil exploration (locating oil reserves, with depth as the z coordinate)
*Planograms are visual representations of a store's products or services
17
• With geospatial capabilities, the USAF knows where every aircraft, piece of equipment, and part is located and where it’s been, anywhere in the world.
• 100 data sources feed into Teradata, with a Google Maps overlay on top
• Can see:
– Inventory control, including drill-down capabilities to the part and supply level
– Where an asset has been
– Exceptions in real time, to track movement of materials, vehicles, commodities, and assets
– Proximity analysis: are the assets we need nearby and available?
Photo Credit: Flickr. Creative Commons. By Prayitno
USAF and Geospatial
​Method #2
​Data Discovery
19
• Discovery as a “process”*:
– PoC/experimentation (8-10 weeks)
– Rapid modeling before scaling out on a global basis
– Freedom to experiment without impacting production systems
• Types of discovery analysis:
– Customer Path
– Fraud
– Social Network
– Attrition
– Online testing/targeting
• Go beyond expensive data scientists and “democratize” discovery
What is Data Discovery?
Fraudulent Paths
Customer Paths To Attrition
* Content Courtesy of
Thomas Davenport
20
Siemens: Predicting train-set failure
Link to Video Case Study
21
Relevant data (several million train sensor observations and several thousand engineers’ reports) and their preparation…
• Sensor readings can be categorised according to various threshold levels; understanding the relevant thresholds requires domain expertise.
• Engineers’ reports describe failure incidents and root causes, but they must first be digitised and have entity extraction techniques applied to them before they yield data that can be compared with sensor observations.
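A minimal sketch of the threshold step described above: each raw sensor reading is mapped to a level such as low, mid, or high. The cut-off values here are placeholders; in practice they would come from domain experts, as the slide notes.

```python
def categorise(reading, low_max, mid_max):
    """Map a numeric reading to 'low', 'mid', or 'high' using two cut-offs."""
    if reading <= low_max:
        return "low"
    if reading <= mid_max:
        return "mid"
    return "high"


# Hypothetical engine temperature readings and thresholds (assumed values).
readings = [68.0, 85.5, 104.2, 79.9]
levels = [categorise(r, low_max=70.0, mid_max=95.0) for r in readings]
print(levels)
```

Turning continuous readings into a small set of levels makes them easier to compare with events extracted from the engineers' reports.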
22
• Nodes represent single repair codes;
• A line between nodes means that the two connected repair codes have appeared in the same
train at least once (thicker lines mean more occurrences);
• This analysis supports the identification of components that fail in combination - and variables
that are likely to be useful in predicting the target variable (failure of a train).
…using path and graph Analytics…
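The graph described above can be approximated as a weighted co-occurrence count: nodes are repair codes, and an edge's weight is the number of trains in which both codes appeared. This is a simplified sketch with made-up repair codes and train identifiers, and it counts trains rather than individual occurrences, which is an assumption about how line thickness is computed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of repair codes seen on each train.
repairs_by_train = {
    "train-1": {"R10", "R22", "R31"},
    "train-2": {"R10", "R22"},
    "train-3": {"R22", "R31"},
}


def cooccurrence_edges(repairs):
    """Count, for every pair of repair codes, how many trains had both."""
    edges = Counter()
    for codes in repairs.values():
        # sorted() gives each pair a stable (a, b) ordering
        for a, b in combinations(sorted(codes), 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(repairs_by_train)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Pairs with high weights are candidates for components that fail in combination, and so for variables worth testing in a failure model.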
23
• Pathing the predictive variables identified in the affinity analysis leads to further insight;
• For example, a daily pattern of Engine Temperature readings of mid – low – mid often appears 3
days ahead of engine failure.
…exploring the “path to failure”…
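The "path to failure" idea can be illustrated by scanning a daily sequence of categorised readings for the mid, low, mid pattern mentioned above. The sequence below is invented for illustration, and the function is a plain sliding-window match, not the pathing tool used in the case study.

```python
def pattern_end_days(levels, pattern=("mid", "low", "mid")):
    """Return the day indices on which the given pattern completes."""
    n = len(pattern)
    return [
        start + n - 1
        for start in range(len(levels) - n + 1)
        if tuple(levels[start:start + n]) == pattern
    ]


# Hypothetical daily Engine Temperature levels for one train (day 0, 1, 2, ...).
daily_levels = ["mid", "mid", "low", "mid", "high", "high", "mid"]

hits = pattern_end_days(daily_levels)
print(hits)

# Per the slide, the pattern often appears about 3 days ahead of failure,
# so these are the days to watch most closely.
watch_days = [day + 3 for day in hits]
print(watch_days)
```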
​Method #3
​Advanced Analytics on Non-Relational Data
25
Non-Normalized: Big Data
Driving Value From Big Data
Social Media
Customer Interactions (Clickstream, CDR, etc.)
Images/Docs
26
Multi-Structured Examples
• Raw Click Stream Data
• Other multi-structured data examples:
• Images, text files, PDFs, sensor data, Word
documents
27
What is a Data Lake?
A data lake is a collection of long-term data containers that capture,
refine, and explore any form of raw data at scale, enabled by low-cost
technologies, from which multiple downstream facilities may draw.
Data sources: Sensors, Email, Transactions, Machine logs, Geolocation, Media
Data Lake
Downstream: BI Tools, IDW, Data Marts, Analysis, Apps, Other
28
Benefits of Hadoop
• Runs on 10 to 4,000 servers
– Extreme scalability
• Data analyzed where it is stored
– Move function to data
– Don’t move data to the function
• Use popular developer tools
– Java, grep, Python, etc.
• Average programmers can do parallel processing
– Millions of Java programmers
• All open source (free)
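The "move function to data" idea can be sketched without Hadoop itself. In the Python below, each list stands in for a block of clickstream data stored on a different server; a map step runs on each block where it sits, and only the small per-block summaries are merged in a reduce step. The partitions and page paths are invented, and this is a local simulation, not the Hadoop API.

```python
from collections import Counter
from functools import reduce

# Hypothetical clickstream partitions, one per storage node.
partitions = [
    ["/home", "/news", "/home"],
    ["/news", "/sports"],
]


def map_partition(lines):
    """Map step: count page hits within a single partition."""
    return Counter(lines)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Only the compact Counter objects travel between steps, not the raw lines.
partial_counts = [map_partition(p) for p in partitions]
total = reduce(merge_counts, partial_counts, Counter())
print(total.most_common())
```

In a real cluster the map step would run in parallel on the servers holding each block, which is what lets ordinary code scale to very large data sets.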
29
What Yahoo! Does with Hadoop
• ≈42,000 machines running Hadoop
• Largest Hadoop clusters are currently 4,000 nodes
• Several petabytes of user data (compressed, unreplicated)
• Run hundreds of thousands of jobs every month
• News stories on home page
30
There’s No Technology Silver Bullet
Source: eBay, eBay Extreme Analytics in a Virtual World, Nov 10, 2010
Permission to use publicly granted by eBay.
31
Summary
• What is Big Data?
• IoT – How it’s collected and Application of Analytics at Volvo
• Methods to Harness Big Data
o Integrated Data Warehouse and Geospatial at USAF
o Data Discovery Processes with Siemens
o Using Hadoop as a Data Lake with Yahoo
o There’s No Technology Silver Bullet
32
Paul Barsch
Marketing Director, Teradata
https://www.linkedin.com/in/barsch
Twitter @paul_a_barsch
Let’s Connect
33
Questions
and Answers
Thank You!
