4. 4
Data Growth
Source: IDC (sponsored by EMC), “As the Economy Contracts, the Digital Universe Expands,” May 2009
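[Chart: growth of the digital universe, split into transactions and interactions; y-axis runs from gigabyte (10^9 bytes) through terabyte (10^12), petabyte (10^15), exabyte (10^18), and zettabyte (10^21) to yottabyte (10^24)]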
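Each step on that scale is a factor of 1,000 (decimal SI prefixes). A minimal Python sketch, assuming decimal units rather than binary (1,024-based) ones, that expresses a raw byte count in the largest unit on the scale:

```python
# Decimal (SI) byte units, each 1,000x the previous one.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]

def humanize_bytes(n: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1."""
    i = 0
    while n >= 1000 and i < len(UNITS) - 1:
        n /= 1000
        i += 1
    return f"{n:.1f} {UNITS[i]}"

print(humanize_bytes(10**21))        # 1.0 zettabytes
print(humanize_bytes(2.5 * 10**15))  # 2.5 petabytes
```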
5. 5
IoT, Sensors, and Tiny Computers
• Internet of Things
• Sensors
• Embedded computers
• Industry-specific systems (e.g., CAT scans)
• Data center systems
6. 6
The Quantified Self
• A new wave of devices will help you track health statistics, but security and privacy concerns loom over this health “Big Data”
[Pictured: Google Glass, Jawbone UP, ingestible sensors]
7. 7
The Data Sensors Collect
• Events generating data
– Vibration
– Temperature, humidity
– Wind speed, direction
– Air/liquid flow or pressure
– Location, navigation
– Tilt level, rotation
– Light, sound
– Radiation, chemicals
– Biological
- Heart rate, blood pressure
- Brain activity, chemicals
– Inventory, sales (RFID)
• Data format: JSON or proprietary
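Because many sensors emit JSON, a reading can be turned into a typed record with a standard JSON parser. A minimal Python sketch; the payload and its field names (`device_id`, `type`, `value`, `unit`, `ts`) are hypothetical, since each device family defines its own schema (or a proprietary format):

```python
import json
from dataclasses import dataclass

@dataclass
class Reading:
    device_id: str
    kind: str
    value: float
    unit: str
    ts: str  # ISO 8601 timestamp, kept as a string in this sketch

def parse_reading(payload: str) -> Reading:
    """Parse one JSON sensor event into a typed record (hypothetical schema)."""
    doc = json.loads(payload)
    return Reading(
        device_id=doc["device_id"],
        kind=doc["type"],
        value=float(doc["value"]),
        unit=doc["unit"],
        ts=doc["ts"],
    )

raw = ('{"device_id": "pump-17", "type": "vibration", "value": 4.2, '
       '"unit": "mm/s", "ts": "2014-03-01T12:00:00Z"}')
r = parse_reading(raw)
print(r.kind, r.value, r.unit)  # vibration 4.2 mm/s
```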
8. 8
Gartner: Growth of the Internet of Things
Source: Gartner, “Forecast: The Internet of Things, Worldwide, 2013,” Nov 2013
[Chart: billions of things in use, 2009 to 2020, comparing connected PCs, smartphones, and tablets with IoT devices; y-axis from 0 to 30 billion]
9. 9
How Teradata Has Powered Our Connected Car Vision
• Vital support in product development
• Ensuring product quality and functionality
• Enabling innovation of new products and services for the connected consumer
• Managing customer interactions
10. 10
Applications of Analytics at Volvo
• Customer-centric experience: connected cars and hazards ahead
• Dealer analytics: scorecards and action plans
• Warranty, repair, and failure predictions
• Adding new services
• Features used/not used: learning for the future
• Buying and customizing online: most sales activity happens before the customer walks into the dealership
[Figure: “Game-changing” connected car driver model]
12. 12
Big Data in the EDW
• An enterprise data warehouse (EDW) is a centralized, historical repository of integrated, detailed, and enriched data that supports multiple decision-making applications for multiple groups and serves as the single source of analytics data for the enterprise.
[Diagram: transactional systems feed the Enterprise Data Warehouse, which serves users]
Transactional systems:
• Accounts payable/receivable
• Sales/orders
• Finance G/L
• HR
• Payroll
• Purchasing
• Manufacturing
• Inventory
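Integrating several transactional systems into one queryable store can be illustrated with Python's built-in `sqlite3` module. This is a toy sketch, not a Teradata EDW: an in-memory SQLite database stands in for the warehouse, and the table names and rows are hypothetical. Orders from a sales system and invoices from accounts receivable are loaded side by side, keyed on a shared customer ID, so a single query can answer a cross-system question:

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (illustration only).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_orders (customer_id INTEGER, order_total REAL);
    CREATE TABLE ar_invoices  (customer_id INTEGER, amount_due REAL);
""")

# Hypothetical extracts from two separate transactional systems.
con.executemany("INSERT INTO sales_orders VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 300.0)])
con.executemany("INSERT INTO ar_invoices VALUES (?, ?)",
                [(1, 50.0), (2, 0.0)])

# One query spanning both sources: total sales vs. amount still owed.
rows = con.execute("""
    SELECT s.customer_id, s.total_sales, COALESCE(a.due, 0) AS due
    FROM (SELECT customer_id, SUM(order_total) AS total_sales
          FROM sales_orders GROUP BY customer_id) AS s
    LEFT JOIN (SELECT customer_id, SUM(amount_due) AS due
               FROM ar_invoices GROUP BY customer_id) AS a
      ON a.customer_id = s.customer_id
    ORDER BY s.customer_id
""").fetchall()

for row in rows:
    print(row)
# (1, 200.0, 50.0)
# (2, 300.0, 0.0)
```

Aggregating each source in a subquery before the join keeps one customer's multiple orders from multiplying their invoice totals.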
13. 13
What is Geospatial?
• It’s location data and analysis
– A new data type that captures an exact location
- Latitude (north/south position; lines of latitude run horizontally)
- Longitude (east/west position; lines of longitude run vertically)
> Find the Teradata R&D facility
– 17095 Via Del Campo
– San Diego, CA 92127
> Use geospatial coordinates
– 33° 01′ 20.90″ N
– 117° 05′ 33.75″ W
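Coordinates written in degrees, minutes, and seconds (DMS), like the R&D facility's, are usually converted to signed decimal degrees before being stored as a point. A minimal Python sketch of that conversion, using the common convention that south latitudes and west longitudes are negative:

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value

lat = dms_to_decimal(33, 1, 20.90, "N")
lon = dms_to_decimal(117, 5, 33.75, "W")
print(round(lat, 6), round(lon, 6))  # 33.022472 -117.092708
```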
14. 14
What Can I Do With Geospatial Data?
[Figure: map showing a customer, my store, and competitors]
• ST_Geometry functions…
– Measurements
- Distance, area, perimeter…
– Relationships between two objects
- Intersects, contains, within, adjacent…
• Real-world applications?
– Calculate the distance between customers and my store
– Do I have customers within a 10-mile radius?
– Identify customers who overlap with my competitor
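The first questions reduce to a distance computation followed by a filter. In Teradata that is the job of the ST_Geometry methods; as a library-independent illustration, here is a minimal Python sketch that instead uses the haversine great-circle formula on a spherical Earth (an approximation). The store point is the R&D facility from the previous slide; the customer and competitor locations are hypothetical, and "overlap" is interpreted here as being within 10 miles of both locations:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; treats the Earth as a sphere

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_radius(center, points, radius_miles):
    """Sorted IDs of points no farther than radius_miles from center."""
    return sorted(pid for pid, (lat, lon) in points.items()
                  if haversine_miles(center[0], center[1], lat, lon) <= radius_miles)

store = (33.0225, -117.0927)        # R&D facility, rounded
competitor = (33.2000, -117.1000)   # hypothetical competitor location
customers = {                       # hypothetical customer locations
    "C1": (32.7157, -117.1611),     # roughly downtown San Diego
    "C2": (32.9628, -117.0359),     # roughly Poway
    "C3": (33.1192, -117.0864),     # roughly Escondido
}

near_store = within_radius(store, customers, 10)
near_competitor = within_radius(competitor, customers, 10)
shared = sorted(set(near_store) & set(near_competitor))
print(near_store, shared)  # ['C2', 'C3'] ['C3']
```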
15. 15
Target Marketing
• Which customers should I target for my campaign?
– Typical data
- Demographic information
- Sales history (RFM: recency, frequency, monetary value)
- Customer segmentation
- Customer loyalty
– Enhanced with geospatial data
- How far will customers drive to purchase my product?
- Which of my competitor’s customers can I draw to my store with an aggressive campaign?
- Which customers live close to my store?
[Diagram: integrate geospatial and customer data]
Customer profile
• Demographic
• Recency
• Frequency
• Monetary value
• Segment
• Loyalty score
• Price sensitivity
Geospatial intelligence
• Willing to drive 30 miles for a 25% discount
• Lives 25 miles from store ID 143
• Lives within 10 miles of my competitors
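Integrating the two views amounts to joining them on the customer and applying rules that use both. A minimal Python sketch; the record fields and the targeting rule (target a price-sensitive customer whose distance to the store is within the distance they will drive for a discount) are hypothetical illustrations of the profile and geospatial attributes listed above:

```python
from dataclasses import dataclass

@dataclass
class Profile:
    customer_id: int
    segment: str
    price_sensitive: bool

@dataclass
class GeoIntel:
    customer_id: int
    miles_to_store: float
    max_drive_for_discount: float  # miles the customer will drive for a discount
    near_competitor: bool

def campaign_targets(profiles, geo):
    """Join profile and geospatial records by customer ID and apply a simple targeting rule."""
    geo_by_id = {g.customer_id: g for g in geo}
    targets = []
    for p in profiles:
        g = geo_by_id.get(p.customer_id)
        if g is None:
            continue  # no location data for this customer
        reachable = g.miles_to_store <= g.max_drive_for_discount
        if p.price_sensitive and reachable:
            targets.append((p.customer_id, g.near_competitor))
    return targets

profiles = [Profile(1, "value", True), Profile(2, "premium", False), Profile(3, "value", True)]
geo = [GeoIntel(1, 25.0, 30.0, True), GeoIntel(2, 5.0, 10.0, False), GeoIntel(3, 40.0, 30.0, True)]
print(campaign_targets(profiles, geo))  # [(1, True)]
```

Customer 1 mirrors the slide's example: 25 miles from the store, willing to drive 30 for a discount, and near a competitor, so they are a target who could be drawn away from that competitor.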
16. 16
3D Geospatial Use Cases
• Insurance: insurers can use the height of floodwater to determine whether, and how much, a customer was affected by a flood
• Planogram* analytics: analyzing shelf-space performance by placement height (x, y, and z coordinates)
• City planning: the WSJ says the number of tall buildings (60 stories or more) will double in the next 10 years
• Oil exploration: locating oil reserves, with depth as the z coordinate
*Planograms are visual representations of a store’s products or services.
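The flood case shows what the z coordinate adds: two properties at nearly the same (x, y) location can fare differently depending on elevation. A minimal Python sketch, assuming each insured property is stored as an (x, y, z) point whose z is its floor elevation and that the flood is summarized by a single water-surface elevation in the same units (real flood models vary the water surface across space); the policies and values are hypothetical:

```python
from typing import NamedTuple

class Property3D(NamedTuple):
    policy_id: str
    x: float  # e.g., longitude or projected easting
    y: float  # e.g., latitude or projected northing
    z: float  # floor elevation, same units as the water level

def flood_depth(prop: Property3D, water_surface_elevation: float) -> float:
    """Depth of water above the property's floor; 0.0 means the floor stayed dry."""
    return max(0.0, water_surface_elevation - prop.z)

props = [
    Property3D("P-1", 10.00, 20.00, 1.5),
    Property3D("P-2", 10.01, 20.01, 4.0),
]
water_level = 3.2  # hypothetical flood crest, meters
for p in props:
    print(p.policy_id, round(flood_depth(p, water_level), 2))
# P-1 1.7
# P-2 0.0
```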
17. 17
USAF and Geospatial
• With geospatial capabilities, the USAF knows where every aircraft, piece of equipment, and part is located, and where it has been, anywhere in the world.
• Data from 100 sources is loaded into Teradata, then overlaid on Google Maps
• The USAF can see:
– Inventory, with drill-down to the part and supply level
– Where an asset has been
– Exceptions in real time, to track the movement of materials, vehicles, commodities, and assets
– Proximity: are the assets we need nearby and available?
Photo Credit: Flickr. Creative Commons. By Prayitno
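Answering “where has an asset been?” and flagging exceptions both start from a time-ordered position history per asset. A minimal Python sketch, assuming position reports arrive as (asset ID, timestamp, latitude, longitude) tuples and that an “exception” means a report outside an allowed rectangular zone; the asset IDs, zone, and rule are hypothetical. The sketch processes a batch, but the same check could run on each report as it arrives:

```python
from collections import defaultdict

# Hypothetical allowed operating zone: a latitude/longitude bounding box.
ZONE = {"lat_min": 30.0, "lat_max": 35.0, "lon_min": -120.0, "lon_max": -110.0}

def in_zone(lat: float, lon: float) -> bool:
    return (ZONE["lat_min"] <= lat <= ZONE["lat_max"]
            and ZONE["lon_min"] <= lon <= ZONE["lon_max"])

def process(reports):
    """Build per-asset position histories and collect out-of-zone exceptions."""
    history = defaultdict(list)
    exceptions = []
    for asset_id, ts, lat, lon in sorted(reports, key=lambda r: r[1]):
        history[asset_id].append((ts, lat, lon))
        if not in_zone(lat, lon):
            exceptions.append((asset_id, ts))
    return history, exceptions

reports = [
    ("truck-7", 1, 32.7, -117.2),
    ("part-42", 2, 33.0, -117.1),
    ("truck-7", 3, 36.1, -115.2),  # outside the zone
]
history, exceptions = process(reports)
print([ts for ts, _, _ in history["truck-7"]])  # [1, 3]
print(exceptions)  # [('truck-7', 3)]
```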
19. 19
What is Data Discovery?
• Discovery as a “process”*:
– Proof of concept (PoC)/experimentation (8–10 weeks)
– Rapid modeling before scaling out globally
– Freedom to experiment without impacting production systems
• Types of discovery analysis:
– Customer path
– Fraud
– Social network
– Attrition
– Online testing/targeting
• Go beyond expensive data scientists and “democratize” discovery
[Figures: Fraudulent Paths; Customer Paths to Attrition]
*Content courtesy of Thomas Davenport
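Customer path analysis looks at the sequences of events that precede an outcome. A minimal Python sketch of the attrition case, assuming a hypothetical event log of (customer, timestamp, event) tuples in which attrition is recorded as a `cancel` event; it counts the most common three-event paths leading up to cancellation:

```python
from collections import Counter, defaultdict

def paths_to_outcome(events, outcome="cancel", length=3):
    """Count the `length` events immediately preceding each occurrence of `outcome`.

    Outcomes with fewer than `length` earlier events are skipped.
    """
    by_customer = defaultdict(list)
    for customer, ts, event in sorted(events, key=lambda e: (e[0], e[1])):
        by_customer[customer].append(event)
    paths = Counter()
    for seq in by_customer.values():
        for i, event in enumerate(seq):
            if event == outcome and i >= length:
                paths[tuple(seq[i - length:i])] += 1
    return paths

events = [
    ("a", 1, "login"), ("a", 2, "billing_page"), ("a", 3, "support_call"),
    ("a", 4, "support_call"), ("a", 5, "cancel"),
    ("b", 1, "billing_page"), ("b", 2, "support_call"), ("b", 3, "support_call"),
    ("b", 4, "cancel"),
    ("c", 1, "login"), ("c", 2, "purchase"), ("c", 3, "login"),
]
print(paths_to_outcome(events).most_common(1))
# [(('billing_page', 'support_call', 'support_call'), 2)]
```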
21. 21
Relevant data (several million train sensor observations and several thousand engineers’ reports) and their preparation…
…sensor readings can be categorised according to various threshold levels; understanding the relevant thresholds requires domain expertise.
Engineers’ reports describe failure incidents and their root causes, but they must first be digitised, and entity-extraction techniques applied to them, before they yield data that can be compared with sensor observations…
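Categorising readings by threshold amounts to binning each value against an ordered list of cut-offs. A minimal Python sketch using the standard `bisect` module; the cut-offs and labels are placeholders, since, as noted above, choosing the real thresholds requires domain expertise:

```python
from bisect import bisect_right

# Placeholder cut-offs (degrees C) for an engine temperature sensor.
CUTOFFS = [70.0, 95.0]           # below 70 -> low, 70 up to 95 -> mid, 95 and above -> high
LABELS = ["low", "mid", "high"]

def categorise(reading: float) -> str:
    """Map a numeric reading to a threshold band."""
    return LABELS[bisect_right(CUTOFFS, reading)]

print([categorise(r) for r in [65.2, 70.0, 88.4, 101.7]])  # ['low', 'mid', 'mid', 'high']
```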
22. 22
…using path and graph analytics…
• Nodes represent individual repair codes;
• A line between two nodes means that those repair codes have appeared on the same train at least once (thicker lines mean more co-occurrences);
• This analysis helps identify components that fail in combination, as well as variables that are likely to be useful in predicting the target variable (failure of a train).
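The graph's edge weights are co-occurrence counts. A minimal Python sketch, assuming each train's history has been reduced to a set of repair codes (the codes here are hypothetical); an edge's weight is the number of trains on which both codes appear, which is what the line thickness encodes:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(repairs_by_train):
    """Count, for every pair of repair codes, how many trains had both codes."""
    edges = Counter()
    for codes in repairs_by_train.values():
        # sorted() gives each unordered pair a single canonical (a, b) key
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges

repairs_by_train = {
    "T1": {"BRK-02", "ENG-11", "HVAC-3"},
    "T2": {"BRK-02", "ENG-11"},
    "T3": {"BRK-02", "ENG-11", "PANTO-7"},
}
edges = cooccurrence_edges(repairs_by_train)
print(edges.most_common(1))  # [(('BRK-02', 'ENG-11'), 3)]
```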
23. 23
…exploring the “path to failure”…
• Pathing the predictive variables identified in the affinity analysis leads to further insight;
• For example, a daily pattern of engine temperature readings of mid–low–mid often appears 3 days ahead of engine failure.
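Finding such a pattern means scanning each train's daily sequence of categorised readings for the mid, low, mid run and measuring how far ahead of the failure it ends. A minimal Python sketch, assuming one categorised engine-temperature band per day and a known failure day (both hypothetical here); it returns the lead time, in days, for each occurrence that completes before the failure:

```python
PATTERN = ("mid", "low", "mid")

def lead_times(daily_bands, failure_day, pattern=PATTERN):
    """Days between the end of each pattern occurrence and the failure day."""
    n = len(pattern)
    leads = []
    # Only consider occurrences that finish strictly before the failure day.
    for start in range(0, failure_day - n + 1):
        if tuple(daily_bands[start:start + n]) == pattern:
            end = start + n - 1  # day index on which the pattern completes
            leads.append(failure_day - end)
    return leads

# Day index -> band for one hypothetical train that fails on day 8.
bands = ["mid", "mid", "high", "mid", "low", "mid", "mid", "mid", "high"]
print(lead_times(bands, failure_day=8))  # [3]
```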
26. 26
Multi-Structured Examples
• Raw clickstream data
• Other multi-structured data examples:
– Images, text files, PDFs, sensor data, Word documents
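Raw clickstream data often arrives as web-server log lines that need to be parsed before analysis. A minimal Python sketch, assuming logs in the Apache/NCSA “combined” format (an assumption; clickstream sources vary); the sample line is made up:

```python
import re

# Combined log format: host ident user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_click(line: str):
    """Turn one raw log line into a dict of fields, or None if it doesn't match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

sample = ('203.0.113.9 - - [10/Nov/2010:13:55:36 -0800] "GET /products/42 HTTP/1.1" '
          '200 5120 "http://example.com/search?q=shoes" "Mozilla/5.0"')
click = parse_click(sample)
print(click["path"], click["status"], click["referer"])
# /products/42 200 http://example.com/search?q=shoes
```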
27. 27
What is a Data Lake?
A data lake is a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, and from which multiple downstream facilities may draw.
[Diagram: data sources (sensors, transactions, geolocation, email, machine logs, media) flow into the data lake, which feeds downstream consumers (BI tools, IDW, data marts, analysis, apps, other)]
28. 28
Benefits of Hadoop
• Runs on 10 to 4,000 servers
– Extreme scalability
• Data analyzed where it is stored
– Move the function to the data
– Don’t move the data to the function
• Use popular developer tools
– Java, grep, Python, etc.
• Average programmers can do parallel processing
– Millions of Java programmers
• All open source (free)
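“Move the function to the data” is the MapReduce model: a small map function runs on each node against its local block of data, and only compact intermediate results are shuffled to reducers. A minimal in-process Python sketch of word count in that style; it simulates the map, shuffle, and reduce phases locally and does not use the Hadoop API (with Hadoop Streaming, mapper and reducer logic like this can run as separate scripts):

```python
from collections import defaultdict
from itertools import chain

def map_phase(block: str):
    """Mapper: runs where a block of text is stored and emits (word, 1) pairs."""
    for word in block.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: combine each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}

# Each string stands in for a data block stored on a different node.
blocks = ["big data big clusters", "data moves less than code", "big wins"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(b) for b in blocks)))
print(counts["big"], counts["data"])  # 3 2
```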
29. 29
What Yahoo! Does with Hadoop
• ≈42,000 machines running Hadoop
• Largest Hadoop clusters are currently 4,000 nodes
• Several petabytes of user data (compressed, unreplicated)
• Runs hundreds of thousands of jobs every month
• News stories on home page
30. 30
There’s No Technology Silver Bullet
Source: eBay, “eBay Extreme Analytics in a Virtual World,” Nov 10, 2010
Permission to use publicly granted by eBay.