5. About Nationwide
• 16+ million policies
• $25 million contributed to nonprofits and communities
• #1 insurer of farms and ranches
• 7th largest homeowner and auto insurance provider in the U.S.
• Gallup Great Place to Work award winner 3 years running
• Largest pet insurer in the U.S.
• 9th largest commercial insurer
• $23.9 billion in revenue for 2013
• Approximately 31,000 associates serving customers in nearly every state
• #1 provider of public-sector retirement plans
• Founded in 1926 by members of the Ohio Farm Bureau
• 28th Computerworld Great Place to Work in IT
6. About SmartRide
• SmartRide is Nationwide's version of Telematics, offered to
customers to help them improve their driving behavior and save
on insurance premiums.
• Customers install a small device into their vehicle for 6 months
which measures…
7. SmartRide Data Characteristics
Multiple vendors
Files of different layouts arriving at different frequencies:
Hourly
Every 4 hrs
Four CSV files per vendor
~ 30 GB to ~ 60 GB of data per day
Data challenges
Late arriving trips
Partial trips
Duplicate trips
Orphan trips
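The data challenges listed above can be illustrated with a minimal sketch. The slides do not define these categories, so the definitions here are assumptions: a duplicate trip is a trip ID that appears more than once among the summaries, an orphan trip is a trip ID that has trip points but no summary, and a late arriving trip is one that ended before the current processing window started. The field names `trip_id` and `end_epoch` and the function `classify_trips` are hypothetical.

```python
from collections import Counter


def classify_trips(summaries, points, window_start_epoch):
    """Flag problem trips in one batch (definitions are assumptions, see lead-in).

    summaries: list of dicts with 'trip_id' and 'end_epoch' (hypothetical fields)
    points:    list of dicts with 'trip_id'
    """
    counts = Counter(s["trip_id"] for s in summaries)

    # Duplicate: the same trip ID was delivered more than once.
    duplicate = {trip_id for trip_id, n in counts.items() if n > 1}

    # Orphan: trip points whose trip ID has no matching summary record.
    orphan = {p["trip_id"] for p in points} - set(counts)

    # Late arriving: the trip ended before the current window began.
    late = {s["trip_id"] for s in summaries if s["end_epoch"] < window_start_epoch}

    return {"duplicate": duplicate, "orphan": orphan, "late": late}
```

Partial trips are left out of the sketch because detecting them depends on vendor-specific completeness rules that the slides do not describe.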
11. IBM® BigInsights™ for Apache™ Hadoop Configuration
• Version 2.1.2
6 Management Nodes and 16 Data Nodes
Each with 128 GB RAM and 18 TB of storage
Hadoop 2.2, BigSQL 1.0, Hive 0.12, Hbase 0.96
• Three environments
Dev, Test, and Production
All same configuration
• Limitations
No workload management
No environment for DR
Used Test Cluster for Hbase failover
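As a rough back-of-the-envelope check on this configuration, the sketch below estimates how much raw input the cluster could hold. It rests on assumptions not stated in the slides: that the 18 TB figure applies to each of the 16 data nodes, that HDFS uses its default replication factor of 3, and that 1 TB is 1000 GB. It ignores derived tables, HBase storage, compression, and reserved space, so it is an upper bound on raw-file retention rather than a sizing result.

```python
DATA_NODES = 16            # from the slide
STORAGE_PER_NODE_TB = 18   # from the slide, assumed to describe data nodes
REPLICATION_FACTOR = 3     # assumption: HDFS default replication


def usable_capacity_tb(nodes, per_node_tb, replication):
    """Raw disk capacity divided by the number of replicas kept per block."""
    return nodes * per_node_tb / replication


def days_of_raw_data(capacity_tb, daily_ingest_gb):
    """Days of raw input that fit, using 1 TB = 1000 GB."""
    return capacity_tb * 1000 / daily_ingest_gb


capacity = usable_capacity_tb(DATA_NODES, STORAGE_PER_NODE_TB, REPLICATION_FACTOR)
for daily_gb in (30, 60):  # the ~30 GB to ~60 GB per day range from the slides
    print(f"{daily_gb} GB/day: about {days_of_raw_data(capacity, daily_gb):.0f} days")
```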
15. Design Considerations
• One hour window for end-to-end processing
o Handling data issues
o Summarization
o Multiple cycles per day
• Predictable run time for backlog processing when jobs fail
• Reloading incorrect batch
• Restart failed batch
[Chart: elapsed time in minutes per cycle type (1 Hr Batch, 12 Hrs Loop, 12 Hrs batch), broken down by stage (raw2can, TripScrub, TripSecDetails, TripSum, State Median, Hbase load, AuditEn), plotted against input size in GB. Data labels: 0.67, 8.30, 8.30.]
16. Acquire Phase
• Raw trip files copied into HDFS using WebHDFS protocol
• Folders created by vendor, file, load event ID, and batch #
• Used Sqoop to transfer 4 TB of historical data from Data Warehouse
• Hive external tables for each file
o Partitioned by load event ID and batch #
o Used both BigSQL 1.0 and HiveQL
• Partitioned external tables helped in
o Processing backlog data
o Reprocessing incorrect batches
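The folder layout and WebHDFS upload described above might look like the following sketch. The root directory, the `key=value` folder naming, the host name, and both function names are assumptions for illustration; only the WebHDFS REST form (`/webhdfs/v1/<path>?op=CREATE`) comes from the Hadoop documentation. A real upload is a two-step `PUT`: the NameNode answers with a redirect to a DataNode, which then receives the file body.

```python
from urllib.parse import quote, urlencode


def landing_dir(vendor, file_name, load_event_id, batch_no,
                root="/data/smartride/raw"):
    """Build one folder per vendor, file, load event ID, and batch #.

    The root path and key=value naming are hypothetical, not from the slides.
    """
    return (f"{root}/vendor={vendor}/file={file_name}"
            f"/load_event_id={load_event_id}/batch={batch_no}")


def webhdfs_create_url(namenode, hdfs_path, overwrite=False):
    """Return the NameNode URL for the first step of a WebHDFS file CREATE."""
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    # Keep "/" and "=" readable in the path; everything else is percent-encoded.
    return f"http://{namenode}/webhdfs/v1{quote(hdfs_path, safe='/=')}?{query}"


folder = landing_dir("vendorA", "trip_summary", 1001, 3)
print(webhdfs_create_url("nn.example.com:50070", folder + "/trip_summary.csv"))
```

Keeping load event ID and batch # in the path is what lets each folder be attached to the external table as its own partition, so a bad batch can be replaced without touching its neighbors.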
17. Standardize Phase
• Select data from external tables based on load event ID
o Each load event ID can include one or more batches
o More than one load event ID can be processed in one cycle
• Work tables contain data for the CURRENT processing cycle
o Data moves to the next stage only from work tables
o Helped performance
• Canonical tables partitioned by source and batch #
o Loaded using dynamic partitioning
o Dynamic partitions helped in loading multiple batches
o Partitions are overwritten if they already exist
o Helped in reprocessing an incorrect batch
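The overwrite behavior of dynamic partitioning is what makes reprocessing safe, and it can be modeled in a few lines. The sketch below is an in-memory simulation, not the team's Hive job: a dictionary keyed by (source, batch #) stands in for the canonical table, and the field names `source` and `batch_no` are hypothetical. In Hive the equivalent load would typically be an `INSERT OVERWRITE TABLE ... PARTITION (...)` statement with dynamic partitioning enabled.

```python
from collections import defaultdict


def load_with_dynamic_partitions(canonical, work_rows):
    """Load work-table rows into a canonical table partitioned by (source, batch #).

    canonical: dict mapping (source, batch_no) -> list of rows
    work_rows: rows for the current processing cycle
    Every partition present in work_rows is replaced as a whole; partitions
    that do not appear in work_rows are left untouched.
    """
    incoming = defaultdict(list)
    for row in work_rows:
        incoming[(row["source"], row["batch_no"])].append(row)

    for partition_key, rows in incoming.items():
        canonical[partition_key] = rows  # overwrite, never append
    return canonical


table = {}
load_with_dynamic_partitions(table, [
    {"source": "vendorA", "batch_no": 1, "miles": 10},
    {"source": "vendorA", "batch_no": 2, "miles": 99},  # incorrect batch
])
# Reloading batch 2 replaces only that partition.
load_with_dynamic_partitions(table, [
    {"source": "vendorA", "batch_no": 2, "miles": 12},
])
```

Because a reload replaces the partition instead of appending to it, running the same batch twice leaves the table in the same state as running it once.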
18. Data Scrubbing & Event Calculation
Inputs: Trip summary, Trip point
Map side join, single read multi write
Outputs: Orphan trips, Trip points (Work table), Events at seconds level (Work table)
• Java M/R program for
o Scrubbing
o Events calculation: night time driving, hard brake, fast acceleration, miles driven
• Very good performance gain
o Using Java for complex scrubbing rules
o Single read multiple writes
o Only required data points processed
o No data persisted to corpse tables
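The event calculation can be sketched as a single pass over per-second trip points. This is an illustration in Python, not the Java M/R program from the slides, and every threshold is a hypothetical placeholder: the slides do not say what speed change counts as a hard brake or fast acceleration, or which hours count as night. Each point is assumed to carry a local `hour` and a `speed_mph` sampled once per second.

```python
HARD_BRAKE_MPH_PER_SEC = -7.0   # hypothetical threshold
FAST_ACCEL_MPH_PER_SEC = 7.0    # hypothetical threshold
NIGHT_HOURS = range(0, 5)       # hypothetical: midnight through 4:59 local time


def calculate_events(points):
    """Compute trip events in one read of the points.

    points: list of dicts with 'hour' (0-23) and 'speed_mph', one per second,
            already sorted by time.
    """
    events = {"hard_brake": 0, "fast_accel": 0, "night_seconds": 0, "miles": 0.0}
    previous_speed = None
    for point in points:
        speed = point["speed_mph"]
        if previous_speed is not None:
            delta = speed - previous_speed  # mph gained or lost in one second
            if delta <= HARD_BRAKE_MPH_PER_SEC:
                events["hard_brake"] += 1
            elif delta >= FAST_ACCEL_MPH_PER_SEC:
                events["fast_accel"] += 1
        if point["hour"] in NIGHT_HOURS:
            events["night_seconds"] += 1
        events["miles"] += speed / 3600.0   # one second at speed mph
        previous_speed = speed
    return events
```

The sketch counts every second that crosses a threshold, so one long braking maneuver registers several times; a production rule would likely merge consecutive seconds into one event.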
19. Summarization Phase
Flow: Events at seconds level (Work table) → SRE summary (Work table) → SRE summary partitioned by source → SRE summary in HBase
• Gather all trips related to devices from current trip and aggregate at various levels
o Union ALL
o UDF to store data points for trip graph
• Replace new summary info into final table
• Parallelized the Union All operation
• Partitioning by source allowed data from both vendors to be processed at the same time when cycles overlap
• PUT from Hive to HBase, WAL disabled
o Shorten column names
o Changed to epoch time
o Prefix salting key
o Generate rowkey
o Column family mapping
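Three of the HBase preparation steps above (shortening column names, converting to epoch time, and mapping to a column family) can be sketched as plain data transformation. The long field names, the UTC assumption, and both function names are hypothetical; the short qualifiers `miles`, `hb`, `fa`, `nt` and the `SM` family prefix follow the example on the rowkey design slide.

```python
from datetime import datetime, timezone

# Long name (hypothetical) -> short HBase qualifier (from the rowkey slide).
SHORT_NAMES = {
    "miles_driven": "miles",
    "hard_brake_count": "hb",
    "fast_accel_count": "fa",
    "night_time_flag": "nt",
}
SUMMARY_FAMILY = "SM"


def to_epoch_seconds(timestamp):
    """Convert 'YYYY-MM-DD HH:MM:SS' to epoch seconds, assuming UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def to_hbase_cells(summary):
    """Map a summary record to {'family:qualifier': value-as-string} cells."""
    return {
        f"{SUMMARY_FAMILY}:{SHORT_NAMES[name]}": str(value)
        for name, value in summary.items()
        if name in SHORT_NAMES
    }


cells = to_hbase_cells({"miles_driven": 15, "hard_brake_count": 2,
                        "fast_accel_count": 5, "night_time_flag": "Y"})
```

Short qualifiers matter in HBase because the family and qualifier are stored with every cell, so long names are repeated on disk for each value.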
20. Batch Performance Metrics
[Chart: Avg Run Times for Hourly Cycles. Axes: Cycle Schedule Time, Run time in Mins, Data Size in GB. The 1 Hr Batch SLA is marked.]
[Chart: Avg Run Times for 4 hr Cycles. Axes: Cycle Schedule Time (0000, 0400, 0800, 1200, 1600, 2000), Run time in Mins, Data Size in GB.]
Series in both charts: Acquire, Standardize, Trip Second Details, SRE Trip Summary Hive, SRE Trip Summary Hbase, Audits, Size in GB
21. Data Access
SmartRide Web Page
Application Layer
Column Family and Row Key Design
Performance Metrics
25. Column Family & RowKey Design
RowKey – Pfx_pgmId_pdflg_timestamp
Example: 12_8798782_Tp_2015080912000000
Row keys are sorted lexicographically
Column Family – Summary Data
SM:miles,1500001245,'15',
SM:hb,1500001245,'2',
SM:fa,1500001245,'5',
SM:nt,1500001245,'Y'
Column Family – Trip-point Data
TP:Trip,1500001245,'{JSON BLOB}'
• A column family (CF) groups related columns according to their access pattern.
• Keys belonging to one customer are co-located in one region, so a filtered read for that customer is served by a single region server.
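One way to get both salting and per-customer co-location is to derive the prefix from the customer's own ID, so the salt spreads customers across regions while every row for one customer shares the same prefix. The sketch below assumes this approach; the slides do not say how the prefix is computed. The bucket count, the CRC32 hash, the reading of `pgmId` as a program ID, and the function names are all hypothetical, and `pdflg` is treated as an opaque flag.

```python
import zlib

NUM_SALT_BUCKETS = 16  # hypothetical; would normally match the region pre-splits


def salt_prefix(program_id, buckets=NUM_SALT_BUCKETS):
    """Deterministic two-digit salt derived from the program ID."""
    return f"{zlib.crc32(program_id.encode('utf-8')) % buckets:02d}"


def make_rowkey(program_id, flag, timestamp):
    """Build Pfx_pgmId_pdflg_timestamp, e.g. for flag 'Tp'."""
    return f"{salt_prefix(program_id)}_{program_id}_{flag}_{timestamp}"


def customer_scan_prefix(program_id):
    """Prefix that matches every row of one customer and no other customer."""
    return f"{salt_prefix(program_id)}_{program_id}_"


keys = sorted([
    make_rowkey("8798782", "Tp", "2015081012000000"),
    make_rowkey("8798782", "Tp", "2015080912000000"),
])
```

Because the timestamp is a fixed-width digit string, lexicographic order within one customer and flag is also chronological order, so a prefix scan returns trips oldest first.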
26. Performance Metrics
Scenarios – 1x, 2x, 3x concurrent users, Zookeeper node going
down, Datanode unavailable
Tools used – Initial test using custom program, LoadRunner for
final test, SiteScope for monitoring resource consumption
SLA for aggregates – 5 sec
# of concurrent users - 1200
28. Business Benefits
• Deeper Engagement with Members
Over 2 million website page views since the July
launch. To put in perspective, our vendor-hosted
website would receive 100,000 views in a 12 month
period.
Over 60K users have accessed the new site and 90%
of those are new users.
• Increase in bind ratios across all channels
• Improvement in loss ratios
• First enterprise "big data" implementation at Nationwide
29. Future scope – Personal and Commercial Fleet
Insights Give Nationwide Competitive Advantage
• Weather Data
• GPS Data
• Hourly Trip Data from Device
• Claims Data
• Other Public Records