What Drives the Car Business: Moving from Anecdotes to Data


WHO WE ARE
- TrueCar’s mission is to prove that truth and transparency are a more profitable way of doing business, starting with automotive.
- The TrueCar Platform allows data to be dissected and transformed into easily digestible and usable purchasing tools for the consumer. A first-time car buyer does not have to be an expert to understand the difference between a bad price, a fair price and a great price.
- www.TrueCar.com, TRUE.com, NASDAQ: TRUE

ABOUT US
JOHN WILLIAMS, SVP PLATFORM OPERATIONS
RUSSELL FOLTZ-SMITH, VP DATA PLATFORM
Russ is the VP of Data Platform at TrueCar.com, where he creates the intelligence
systems driving TrueCar’s innovative interactive product set. Prior to TrueCar, he
held executive, product and technical leadership positions at category leaders like
IAC, Grind Networks, and Wolfram|Alpha. Russ holds a degree in mathematics
from the University of Chicago and currently lives in Marina Del Rey, CA with his
wife and two daughters.
John Williams is the SVP, Platform Operations at TrueCar, which he joined in March 2011. He has over 20 years
of experience designing, building and operating large-scale Internet infrastructure, and is responsible for the
technology, security and operations strategy that facilitates explosive growth while still meeting
strict requirements for performance, security and reliability. Before joining TrueCar,
John was retained as a consultant by numerous world-class technology, financial
services, entertainment, military and government organizations. Previously, John
was the CTO and co-founder of Preventsys (acquired by McAfee) where he
created the world’s first automated security policy compliance system for large
enterprise networks. Prior to that he founded and led the network penetration
testing team for Internet security pioneer Trusted Information Systems. At the start
of his career, John co-founded and built one of New York City’s first Internet
Service Providers.

OUR CORE SERVICE
CONSUMERS: Provide interactive transaction guidance to consumers via web and mobile
INDUSTRY: Provide interactive transaction tools to OEMs and dealers via web and mobile
REVENUE MODEL: Pay per sale

THE SITUATION
INCREASING DATA APPETITE
GROWING TECH DIVERSITY
MORE PRODUCTS
Data Movement Pressure
Too much time keeping it together
SQL Wizardry

DATA FLOW
(Numbers are all approximate)
- 1,000+ inbound data feeds
- 7,500+ dealers
- 1,500,000+ TC dealer vehicles tracked daily
- 8,000,000+ industry-wide vehicles tracked daily
- Multiple data warehouses
- 100s of enrichment processes
- Feedback loops
- 400+ websites powered
- 1,000,000+ cars sold
- 20,000,000+ customers serviced
- 250,000,000+ vehicle images
- Industry-leading analytic products
- And more…

WHOLESALE
SHIFT NEEDED
It’s not just an economics exercise.
WE NEED NEW CAPABILITIES.

FUNDAMENTAL ROLE TRANSFORMATION
NOT THIS: SQL, but faster
YES, THIS: Data Scientists, Database Developers, Programmers and Analysts become INTELLIGENCE ENGINEERS

FOCUS ON MAKING THINGS
INTELLIGENCE ENGINEERS should not have to worry about:
- COMPUTE CYCLES
- STORAGE
- SYSTEM SCALE
- MOVING DATA
THEY SHOULD BE MAKING SMARTER THINGS

DATA then APPs
EXISTING DEVELOPMENT MODEL IS BROKEN & LIMITING
1. Define app
2. Create highly tuned DB for specific app
3. Load specific data
NEW MODEL
1. Get all the data you can
2. Keep it in HDFS
3. Make and remake apps

PHILOSOPHY
DON’Ts
- DELETE DATA
- MOVE DATA
- TAKE SHORTCUTS
DO’s
- LEARN MAP REDUCE WELL
- USE NATIVE COMPONENTS

NO PROOF OF CONCEPTS
POCs are:
- TOO SMALL
- TOO SIMPLE
- TOO EASY
THE ONLY WAY TO BUILD THE LHC is to BUILD THE LHC

OUR DATA EVOLUTION
12-month execution path (data platform capabilities over time):
- JUNE ’13: Initiate Hadoop execution
- JULY ’13: Partner with Hortonworks
- AUG. ’13: Training & dev begins
- NOV. ’13: 60-node, 2PB production cluster live
- DEC. ’13: 3 production apps launch
- JAN. ’14: 40% of dev staff proficient
- FEB. ’14: 3 more production apps launch
- MAY ’14: IPO
We addressed our data platform capabilities strategically as a precursor to the IPO.

OUR SETUP
TrueCar Hadoop Cluster:
- 60 nodes, 2.55PB usable HDFS, 960 Xeon CPU cores, 7.7TB RAM
- 10GbE networking, 3 racks, HDP 2.1
Final price point:
- $0.23/GB hardware & software/support
- $0.003/GB/mo space/power/cooling
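
The setup figures above can be turned into per-node resources and rough cost totals. The sketch below is only back-of-the-envelope arithmetic on the slide's numbers. It assumes decimal units (1 PB = 1,000,000 GB) and that both prices apply per usable GB; neither assumption is stated on the slide, and `cluster_summary` is a hypothetical helper name.

```python
# Figures quoted on the slide.
NODES = 60
USABLE_PB = 2.55
CPU_CORES = 960
RAM_TB = 7.7
CAPEX_USD_PER_GB = 0.23         # hardware & software/support
OPEX_USD_PER_GB_MONTH = 0.003   # space/power/cooling

# Assumption (not stated on the slide): decimal units and per-usable-GB pricing.
GB_PER_PB = 1_000_000
GB_PER_TB = 1_000


def cluster_summary():
    """Return per-node resources and cost totals derived from the slide figures."""
    usable_gb = USABLE_PB * GB_PER_PB
    return {
        "cores_per_node": CPU_CORES / NODES,
        "ram_gb_per_node": RAM_TB * GB_PER_TB / NODES,
        "usable_tb_per_node": usable_gb / GB_PER_TB / NODES,
        "upfront_usd": usable_gb * CAPEX_USD_PER_GB,
        "monthly_usd": usable_gb * OPEX_USD_PER_GB_MONTH,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions each node works out to 16 cores, roughly 128 GB of RAM and 42.5 TB of usable HDFS, and the full cluster to roughly $586,500 upfront plus about $7,650 per month.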

SOME OF OUR
HADOOP BASED SYSTEMS
Vehicle Data Systems
Intelligent Image Processing
And of course… better BI

EXAMPLE SYSTEM 1:
VEHICLE DATA
The Situation:
- We keep track of 8,000,000+ new and used vehicles in inventory in the marketplace every day
- We enrich and use vehicle data to power our market reports, Live Offers, value/pricing systems, industry data products and more
- The previous non-Hadoop system took 6-24 hours to complete a full processing run
The Goal with Hadoop:
- Scale up to allow reprocessing of 50 years of inventory/vehicle record data available to us
- Enable attaching additional enrichment data and processing without a massive overhaul (plug and play)
- Complete a full processing run of daily inbound data in 1 hour, with speedy one-off/small-batch CRUD operations

EXAMPLE SYSTEM 1:
VEHICLE INVENTORY DATA
1. Dealer Data Feeds
   - Provide daily snapshot of raw vehicle inventory
2. MapReduce – Data Loader
   - Normalize into a standard record
   - Filter out bad records
   - Validate fields
3. MapReduce – VIN Decoder
   - Identify trim/options for each vehicle
4. Hive – Data Enhancer
   - Join against other data sources to enrich the vehicle information
5. MapReduce – CRUD
   - Decide which entries are new, updated or should be deleted
   - Put entries in a queue for exporting to SQL
[Diagram: dealer inventory feeds land in HDFS. Inside Hadoop, four stages run in sequence: MR Filter/Verify, MR VIN Decode, Hive Enrich, MR Rabbit/CRUD. Results pass through a message queue and a queue service to the database.]
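
The loader (step 2) and CRUD (step 5) stages above can be illustrated in plain Python. This is a single-process sketch of the logic only, not the MapReduce jobs from the talk. The record fields (`vin`, `dealer_id`, `price`) and the function names are hypothetical, and the only validation rule assumed is the standard VIN shape (17 characters, never I, O or Q).

```python
import re

# Standard VINs are 17 characters and never contain I, O or Q.
VIN_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")


def load(raw_records):
    """Step 2 sketch: normalize, validate and filter raw feed rows, keyed by VIN."""
    clean = {}
    for raw in raw_records:
        vin = str(raw.get("vin", "")).strip().upper()
        if not VIN_PATTERN.match(vin):
            continue  # filter out bad records
        try:
            price = float(raw.get("price"))
        except (TypeError, ValueError):
            continue  # price missing or not numeric
        clean[vin] = {"vin": vin, "dealer_id": raw.get("dealer_id"), "price": price}
    return clean


def crud_diff(previous, current):
    """Step 5 sketch: compare two snapshots and list create/update/delete operations."""
    ops = []
    for vin, record in current.items():
        if vin not in previous:
            ops.append(("create", vin))
        elif previous[vin] != record:
            ops.append(("update", vin))
    for vin in previous:
        if vin not in current:
            ops.append(("delete", vin))
    return ops


yesterday = load([
    {"vin": "1HGCM82633A004352", "dealer_id": 1, "price": "18500"},
    {"vin": "JH4KA8260MC000000", "dealer_id": 2, "price": "9900"},
])
today = load([
    {"vin": "1hgcm82633a004352 ", "dealer_id": 1, "price": "17900"},  # price changed
    {"vin": "1FTFW1ET1EFA00001", "dealer_id": 3, "price": "31000"},   # new vehicle
    {"vin": "NOT-A-VIN", "dealer_id": 3, "price": "100"},             # rejected
])
operations = crud_diff(yesterday, today)
```

In the pipeline described above, each operation would be placed on a queue for export to SQL rather than collected in a list.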

EXAMPLE SYSTEM 1:
VEHICLE DATA VIN DECODER
Challenge: Understand EXACTLY what options are on all cars.
Hadoop components: just a MAPPER, with Avro format for I/O.
Mapper input:
- Inventory or transaction data from dealers (HDFS)
Pre-staged in memory:
- VIN decode rules (general & make-specific)
- Canonical vehicle color data (HDFS)
- Canonical vehicle trim/style data (HDFS)
Mapper processing:
- Compute F1 score for matches, used to compute similarity between inventory and canonical data (http://en.wikipedia.org/wiki/F1_score)
Mapper output:
- Vehicle trim & probability
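
As a minimal sketch of how an F1 score can rank canonical trims against the options observed on an inventory record, the code below treats both sides as sets of option names. The set-based representation, the trim names, and the `f1_score` and `best_trim` helpers are assumptions for illustration; the slides do not say how features are encoded or how the score is turned into the reported probability.

```python
def f1_score(observed, canonical):
    """Harmonic mean of precision and recall between two sets of options."""
    observed, canonical = set(observed), set(canonical)
    overlap = len(observed & canonical)
    if overlap == 0:
        return 0.0
    precision = overlap / len(observed)   # share of observed options the trim explains
    recall = overlap / len(canonical)     # share of the trim's options that were observed
    return 2 * precision * recall / (precision + recall)


def best_trim(observed, candidates):
    """Return the (trim name, score) pair with the highest F1 score."""
    scores = {trim: f1_score(observed, options) for trim, options in candidates.items()}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


# Hypothetical canonical data for one make/model.
candidates = {
    "LX": {"cloth seats"},
    "EX": {"cloth seats", "sunroof"},
    "EX-L": {"leather seats", "sunroof", "navigation", "heated seats"},
}
observed = {"leather seats", "sunroof", "navigation"}
trim, score = best_trim(observed, candidates)
```

Here `EX-L` wins with precision 1.0 and recall 0.75, which gives an F1 score of 6/7 (about 0.857).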

EXAMPLE SYSTEM 2:
INTELLIGENT IMAGE PROCESSING
The Situation:
- 250,000,000+ vehicle images currently under asset management for live data
- 1,000,000,000+ images have passed through the system
- 1,000,000+ images processed daily (and growing)
- The original image processing system could take up to 1 day to fully process all daily images
The Goal with Hadoop:
- Scale to store 1,000,000,000+ images online
- Allow for advanced image recognition, OCR
- Process a full run of the latest images in less than 2 hours, and allow for speedy one-off/small-batch real-time CRUD operations

EXAMPLE SYSTEM 2: IMAGE DOWNLOADER
Pulls images from providers into HDFS (runs on Hadoop)
- Downloads multiple images simultaneously
- Downloads from multiple providers simultaneously
- Download times scale with cluster size

BUNDLER
BUNDLES MILLIONS OF DAILY IMAGES INTO SINGLE HDFS FILE
[Diagram: one image bundle per day in Hadoop, e.g. May 30, 2014 and May 31, 2014]
- Uses HIPI (http://hipi.cs.virginia.edu) to store multiple images in an HDFS sequence file
- Instead of millions of small daily image files (<< block size), have 1 large daily file with all images bundled inside (>> block size)
- We tag images with metadata, permanently linking images to our vehicle database (e.g., VIN, Make, Model, Model Year, etc.)
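
The bundling idea above (many small images plus metadata packed into one large file) can be sketched with a simple length-prefixed record format. This is not the HIPI bundle format or a Hadoop SequenceFile; the layout, the JSON metadata encoding, and the `write_bundle` and `read_bundle` names are assumptions chosen only to show the pattern.

```python
import io
import json
import struct


def write_bundle(stream, images):
    """Append (metadata dict, image bytes) pairs to one binary stream."""
    for meta, data in images:
        header = json.dumps(meta, sort_keys=True).encode("utf-8")
        # Two big-endian 32-bit lengths, then the metadata, then the image bytes.
        stream.write(struct.pack(">II", len(header), len(data)))
        stream.write(header)
        stream.write(data)


def read_bundle(stream):
    """Yield (metadata dict, image bytes) pairs back out of the stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of bundle
        header_len, data_len = struct.unpack(">II", prefix)
        meta = json.loads(stream.read(header_len).decode("utf-8"))
        yield meta, stream.read(data_len)


# Placeholder bytes stand in for real JPEG data.
daily_images = [
    ({"vin": "1HGCM82633A004352", "make": "Honda", "model_year": 2003}, b"\xff\xd8image-one"),
    ({"vin": "1FTFW1ET1EFA00001", "make": "Ford", "model_year": 2014}, b"\xff\xd8image-two"),
]
bundle = io.BytesIO()
write_bundle(bundle, daily_images)
bundle.seek(0)
restored = list(read_bundle(bundle))
```

Keeping the metadata next to each image is what lets later processing stages link results back to a vehicle without a separate lookup.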

PROCESSOR
PROCESSES IMAGE BUNDLE THROUGH HADOOP
Processing stages:
- Thumbnailing: builds thumbnail library
- Vehicle Locator: finds vehicle in image
- Color Decoder: determines vehicle RGB color code (example label in diagram: COCOCO)
- Orientation: determines image orientation (example label in diagram: Driver Side)
Implementation notes:
- Image bundles can be processed through multiple Java MapReduce routines
- Thumbnailing is done with ImageJ
- Vehicle locator will be done with OpenCV, using edge detection and shape-based features
- Average color will be determined from pixel value ratios in the RGB layers of the JPEG
- Orientation will be determined with shape-based features and gradient algorithms (see Rybski, Huber, Morris, and Hoffman 2010)
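
A rough illustration of the color decoder step: the sketch below averages each RGB channel over a list of already-decoded pixels and formats the result as a hex color code. It uses a plain per-channel mean as a stand-in; the exact pixel-ratio method from the talk is not specified, JPEG decoding is out of scope here, and `average_rgb` and `to_hex` are hypothetical names.

```python
def average_rgb(pixels):
    """Return the per-channel mean of an iterable of (r, g, b) tuples, rounded to ints."""
    totals = [0, 0, 0]
    count = 0
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        count += 1
    if count == 0:
        raise ValueError("no pixels to average")
    return tuple(round(total / count) for total in totals)


def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 ints as a six-digit hex color code."""
    return "{:02X}{:02X}{:02X}".format(*rgb)


# Three hypothetical pixels taken from the region where the vehicle was located.
pixels = [(200, 0, 0), (100, 0, 0), (0, 0, 30)]
color = average_rgb(pixels)
code = to_hex(color)
```

For these three pixels the mean is (100, 0, 10), which formats as `64000A`. In practice the average would be taken only over pixels inside the region found by the vehicle locator, so that background does not skew the color.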

EXAMPLE SYSTEM 3:
ADVANCED BUSINESS INTELLIGENCE
The Situation:
- 8 years of web/app behavior
- 25,000+ data fields
- 50,000,000+ configured vehicles
- 1,000,000+ TrueCar car transactions
- Previous approaches spread data across 4+ data warehouses, kept only a small portion of the data online and available for query, and required extensive data movement pipelines to integrate
The Goal with Hadoop:
- All behavioral data for all time available for analytics
- Data ingested at least once per day, with most arriving in near real time
- Remove worry from analysts and DBAs regarding deletion or offline archive
- Reduce data warehouses, consolidate analytic tooling
2.4%
EXAMPLE SYSTEM 3:
BI GROWTH
[Chart: accelerating BI data growth; vertical axis in millions, from 0 to 1,800]

EXAMPLE SYSTEM 3:
MULTI-DIMENSIONAL BI

WAS IT WORTH IT?
ECONOMIC
- Storage costs, compute costs: from $19.00/GB to $0.23/GB
- Elimination of expensive proprietary tools
FUNCTIONALITY
- Development effort of complex data applications reduced by 3x
- Automated trend hunting
- Consolidation of data into immediately computable, searchable infrastructure
- Unified ETL and storage system: near-zero data movement environment
- Functional programming approach
FUTURE PREVIEW
- COMPREHENSIVE DATA
- REAL TIME MARKET SIMULATION
- REAL TIME TRANSACTION PROCESSING
- PRESCRIPTIVE MOBILE REAL TIME TOOLS
- TOTAL AUTO MARKETPLACE
