1. Open Source Real Time BI using
Storm, Hadoop, Titan, Druid & D3
Anil Madan
Sr. Director Engineering, PayPal
© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
2. $1 in every $6 spent on e-commerce is spent through PayPal.*
5. PayPal Now Available in 203 Markets
10 new markets added in the second quarter,
making PayPal available to 80 million new internet users.
Paraguay
Côte d’Ivoire
Nigeria
Monaco
Belarus
Montenegro
Moldova
Macedonia
Cameroon
Zimbabwe
6. Business Problem
We need to better understand our customers…
Awareness: How do prospective customers learn about PayPal?
Acquisition: Where do prospects sign up for accounts?
Activation: How can we help them to complete their 1st payment?
Adoption: How can we help them use PayPal even more?
7. How we solved it…
Mobile, Direct/Home Page, Product Experiences, Search Engine Marketing, Transaction Emails
Tracking Servers
Metadata: Tracking Metadata Tool, Taxonomy, Tracking Event Service, Tag Catalog, Tracking Validation Service
Real Time Systems: Marketing, Segmentation, Experimentation
Big Data: Exploratory Analytics, Attribution, Predictive Analytics
8. Logical View
Metadata, Instrumentation, Collection, Processing, Analytics
Events: Server Side Events, Client Side Events, Page Performance Events
Collection Service
Sessionization
Metrics: Behavioral Metrics, Marketing Metrics, Performance Metrics, Real Time Event Metrics, Operational Metrics (OpenTSDB)
Stores: Pathing Store, Metrics Store (DRUID)
Reporting & Visualization
9. Metadata – Logical Entity Model
TEMPLATE, PAGE, COMPONENTS, LINK, TAGS
10. Metadata – Logical Event Model
Event types: Tracking Event, Impression Event, Page Impression Event, Client Page Impression Event, Server Page Impression Event, Component Impression Event, Ad Impression Event, Reaction Event, Click Event, Click-Through Event, Mouse-over Event, Entry Event, Exit Event, Outcome Event
11. Metadata - Self-Service Management Workflow…
12. DATA PIPELINE
Collection, Processing, Analysis & Visualization
Customers: Client Side, Server Side (HTTP)
Collection: REST Proxy
Processing: Spout, Bot flagging Bolt, Geo Enrichment Bolt, Sessionization, Aggregation
Data Stores: Druid, Apache, Titan
Reporting Consumers: Developers, Product Owners
Metadata Service
Other labels: Metadata, Performance Metrics, Tools, REST, Reporting
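The spout and bolt stages named on this slide can be pictured as a chain of small functions, each one enriching an event before passing it on. The sketch below is only an illustration in plain Python, not Storm code and not PayPal's implementation: the bot markers, the IP-prefix geo table, and every function name are hypothetical stand-ins.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]

# Hypothetical lookup data; a real pipeline would call bot and geo services.
BOT_MARKERS = ("bot", "crawler", "spider")
GEO_BY_IP_PREFIX = {"10.1.": "US", "10.2.": "DE"}


def spout(raw_events: Iterable[Event]) -> Iterator[Event]:
    """Emit events one at a time, the way a spout emits tuples."""
    for event in raw_events:
        yield dict(event)  # copy so the caller's data is not mutated


def bot_flagging_bolt(events: Iterable[Event]) -> Iterator[Event]:
    """Flag events whose user agent contains a known bot marker."""
    for event in events:
        agent = str(event.get("user_agent", "")).lower()
        event["is_bot"] = any(marker in agent for marker in BOT_MARKERS)
        yield event


def geo_enrichment_bolt(events: Iterable[Event]) -> Iterator[Event]:
    """Attach a country derived from the (hypothetical) IP prefix table."""
    for event in events:
        ip = str(event.get("ip", ""))
        event["country"] = next(
            (country for prefix, country in GEO_BY_IP_PREFIX.items()
             if ip.startswith(prefix)),
            "unknown",
        )
        yield event


def run_pipeline(raw_events: Iterable[Event]) -> List[Event]:
    """Chain the stages in the order shown on the slide."""
    return list(geo_enrichment_bolt(bot_flagging_bolt(spout(raw_events))))
```

Sessionization and aggregation would follow as further stages in the same chain.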
13. Druid Architecture
• Open-source
• Distributed
• Real-time
• Highly-Available Data store
• Column-oriented
• Approximate or Exact
14. Real Time Nodes
• Ingest data and buffer events in memory
• Incremental indexing
• Query data as soon as it is ingested
• Periodically persist collected events to disk
• Combine multiple disk indexes to create immutable ‘segments’
• Log-structured merge-tree
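The ingest, persist, and merge cycle in the bullets above can be sketched with a toy buffer. This is a simplified illustration, not Druid's code: the class name, the row threshold, and the use of in-process lists as a stand-in for on-disk indexes are all assumptions.

```python
from typing import Dict, List, Tuple


class RealTimeBuffer:
    """Toy model of a real-time node: buffer, persist, then merge."""

    def __init__(self, max_rows_in_memory: int = 3) -> None:
        self.max_rows = max_rows_in_memory
        self.in_memory: List[Dict] = []        # events not yet persisted
        self.persisted: List[List[Dict]] = []  # stand-in for disk indexes

    def ingest(self, event: Dict) -> None:
        """Buffer an event; spill to 'disk' when the buffer is full."""
        self.in_memory.append(event)
        if len(self.in_memory) >= self.max_rows:
            self.persist()

    def persist(self) -> None:
        """Write the current buffer out as one sorted index."""
        if self.in_memory:
            self.persisted.append(
                sorted(self.in_memory, key=lambda e: e["timestamp"]))
            self.in_memory = []

    def query(self) -> List[Dict]:
        """Events are visible immediately, persisted or not."""
        rows = [row for index in self.persisted for row in index]
        return rows + self.in_memory

    def merge_to_segment(self) -> Tuple[Dict, ...]:
        """Combine all indexes into one segment (a tuple, to signal immutability)."""
        self.persist()
        rows = sorted((row for index in self.persisted for row in index),
                      key=lambda e: e["timestamp"])
        self.persisted = []
        return tuple(rows)
```

The point of the sketch is that queries see both the in-memory buffer and the persisted indexes, which is how data becomes queryable as soon as it is ingested.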
16. Historical Nodes
• Load immutable read-optimized data from deep storage
• Memory mapped storage engine
• Caches segments
• Supports tiered storage
19. Metrics & Dimensions
Standard Metrics
  {
    "type": "doubleSum",
    "name": "pageviews",
    "fieldName": "PV"
  },
  {
    "type": "doubleSum",
    "name": "bounces",
    "fieldName": "bnc"
  },
  ....
Estimated Metrics (HyperLogLog)
  {
    "type": "hyperUnique",
    "name": "unique_visits",
    "fieldName": "user_session_guid"
  },
  {
    "type": "hyperUnique",
    "name": "unique_visitors",
    "fieldName": "user_guid"
  }
Dimensions
  … 2014/06/11/10",
  "filter": "part-",
  "parser": {
    "type": "string",
    "timestampSpec": {
      "column": "timestamp",
      "format": "auto"
    },
    "data": {
      "format": "json",
      "dimensions": [
        "timestamp",
        "USER_GUID",
        "USER_SESSION_GUID",
        "PAGE_GROUP",
        "PAGE_NAME",
        "PAGEGROUP_LINK_NAME",
        "PAGE_LINK_NAME",
        …
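To give a feel for how an estimated metric such as unique_visitors can be computed without storing every ID, here is a minimal HyperLogLog sketch. It is a textbook version written for illustration, not the implementation behind Druid's hyperUnique aggregator; the class name, the register count (2^12), and the use of SHA-1 as the hash are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                    # number of bits used to pick a register
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash of the value.
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                  # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)          # small-range correction
        return raw
```

Adding the same ID twice does not change the registers, which is why the structure suits distinct counts over user_guid or user_session_guid. With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%.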
20. Sessionization
Events
Visitor ID   Session ID   Timestamp          Event Payload
V1           S1           2014-10-16 05:12   E1
V2           S2           2014-10-16 05:14   E2
V1           S1           2014-10-16 05:15   E3
V1           S1           2014-10-16 05:20   E4
V2           S2           2014-10-16 05:21   E5
V1           S3           2014-10-16 05:25   E6
…            …            …                  …
VisitContainer
Visitor ID   Session ID   Payload                                                  Events
V1           S1           sf, mac, {flash, quicktime}, {ca, usa}, 480 secs, …      E1, E3, E4
V2           S2           ff, win, {acrobat, mediaplayer}, {wb, in}, 420 secs, …   E2, E5
V1           S3           sf, mac, {quicktime, java}, {on, ca}, 60 secs            E6
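The step from the Events table to the VisitContainer table amounts to grouping events by visitor and session and deriving per-session attributes. A minimal sketch, assuming session duration is simply last timestamp minus first timestamp (the function and field names are hypothetical):

```python
from collections import OrderedDict
from datetime import datetime

# The event rows from the slide: visitor, session, timestamp, payload.
EVENTS = [
    ("V1", "S1", "2014-10-16 05:12", "E1"),
    ("V2", "S2", "2014-10-16 05:14", "E2"),
    ("V1", "S1", "2014-10-16 05:15", "E3"),
    ("V1", "S1", "2014-10-16 05:20", "E4"),
    ("V2", "S2", "2014-10-16 05:21", "E5"),
    ("V1", "S3", "2014-10-16 05:25", "E6"),
]


def sessionize(events):
    """Group events per (visitor, session) and compute the session duration."""
    sessions = OrderedDict()
    for visitor, session, timestamp, payload in events:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        entry = sessions.setdefault(
            (visitor, session), {"events": [], "first": when, "last": when})
        entry["events"].append(payload)
        entry["first"] = min(entry["first"], when)
        entry["last"] = max(entry["last"], when)
    return {
        key: {
            "events": entry["events"],
            "duration_secs": int(
                (entry["last"] - entry["first"]).total_seconds()),
        }
        for key, entry in sessions.items()
    }


visits = sessionize(EVENTS)
```

Under this assumption S1 and S2 come out at 480 and 420 seconds, matching the slide. A single-event session such as S3 comes out at 0 seconds rather than the 60 secs shown, so the actual duration logic evidently differs for that case.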
21. Druid Storage – Columns & Dictionaries
Timestamp (Hr)   Session ID   Country   OS    User Agent   Page Name                                Page Name (encoded)
2014-10-16 05    S1           US        MAC   SF           Login, AccountOverview                   0, 1
2014-10-16 05    S2           DE        WIN   IE           Login, PaymentReview, AccountHistory     0, 2, 3
2014-10-16 05    S3           US        LNX   FF           Login, PaymentReview, Checkout           0, 2, 4
2014-10-16 05    S4           UK        LNX   FF           Login, Profile, Checkout                 0, 5, 4
2014-10-16 05    S5           DE        WIN   CR           Login, Profile                           0, 5
2014-10-16 05    S6           UK        MAC   SF           Login, AccountOverview, Checkout         0, 1, 4
Dictionary
Login             0
AccountOverview   1
PaymentReview     2
AccountHistory    3
Checkout          4
Profile           5
LZF
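The dictionary on this slide maps each distinct page name to a small integer, and the column then stores only the integers. A minimal sketch of that encoding follows. It assigns IDs in order of first appearance, which is an assumption made to reproduce the IDs on the slide and says nothing about how Druid orders its dictionaries internally.

```python
from typing import Dict, List, Tuple

# Page Name column from the slide, one list of values per session row.
PAGE_NAMES = [
    ["Login", "AccountOverview"],
    ["Login", "PaymentReview", "AccountHistory"],
    ["Login", "PaymentReview", "Checkout"],
    ["Login", "Profile", "Checkout"],
    ["Login", "Profile"],
    ["Login", "AccountOverview", "Checkout"],
]


def dictionary_encode(
        rows: List[List[str]]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Replace each string with an integer ID from a shared dictionary."""
    dictionary: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for values in rows:
        ids = []
        for value in values:
            if value not in dictionary:
                dictionary[value] = len(dictionary)  # next unused ID
            ids.append(dictionary[value])
        encoded.append(ids)
    return dictionary, encoded


dictionary, encoded = dictionary_encode(PAGE_NAMES)
```

Small integers compress well, and filters can be evaluated against the short dictionary instead of against every row.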
22. Druid Data Structure - Bitmap Indices
23. Herald – Self Service Analytics
24. Herald – Self Service Analytics
25. Druid Metrics
26. Pathing
27. Fallout Reports
28. Pathing
A->B->C->D->X->A->M and A->B->C->D->E
Visitor ID   Current Page   Next Page 1   Next Page 2   Prev Page 1   Prev Page 2
S1           A              B             C             null          null
S1           B              C             D             A             null
S1           C              D             X             B             A
S1           D              X             A             C             B
S1           X              A             M             D             C
S1           A              M             null          X             D
S1           M              null          null          A             X
S2           A              B             C             null          null
S2           B              C             D             A             null
S2           C              D             E             B             A
S2           D              E             null          C             B
S2           E              null          null          D             C
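Each row of the pathing table can be derived mechanically from a session's ordered page path by looking two steps ahead and two steps back. A minimal sketch, with a hypothetical function name and `None` standing in for the table's null:

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def pathing_rows(session_id: str, path: List[str], depth: int = 2) -> List[Row]:
    """Build (id, current, next 1..depth, prev 1..depth) rows for one path."""
    rows: List[Row] = []
    for i, page in enumerate(path):
        # Pages that follow the current one, padded with None at the end.
        next_pages = [path[i + k] if i + k < len(path) else None
                      for k in range(1, depth + 1)]
        # Pages that precede the current one, nearest first.
        prev_pages = [path[i - k] if i - k >= 0 else None
                      for k in range(1, depth + 1)]
        rows.append((session_id, page, *next_pages, *prev_pages))
    return rows


s1_rows = pathing_rows("S1", list("ABCDXAM"))
s2_rows = pathing_rows("S2", list("ABCDE"))
```

Storing the neighbours as ordinary columns is what lets next-page and previous-page questions be answered with plain group-by queries instead of joins.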
29. Pathing
Next Page
{
  "queryType": "groupBy",
  "dimensions": ["current_page", "dimensions like country, segmentation etc"],
  "aggregations": [
    { "type": "count", "name": "next_page_count", "fieldName": "next_page1, next_page2" }
  ],
  "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
}
Previous Page
{
  "queryType": "groupBy",
  "dimensions": ["current_page", "dimensions like country, segmentation etc"],
  "aggregations": [
    { "type": "count", "name": "prev_page_count", "fieldName": "prev_page1, prev_page2" }
  ],
  "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
}
30. Fallout
A->B->C->D->X->A->M
A->D->X->M
{
  "queryType": "search",
  "dimensions": ["current_page_path_count", "dimensions like country, segmentation etc"],
  "filter": { "type": "regex", "dimension": "next_page_path", "pattern": "^A.*D.*X.*M$" }
}
• Apply the filter pattern to the dictionary
• Figure out the values that match
• Take those bitmap indices
• OR the bitmap indices together
• Use the output bitmap as the filter
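The filter steps listed above can be sketched with Python integers standing in for bitmaps. This is an illustration of the idea only, not Druid's data structures: the path strings, function names, and the single-letter page encoding are hypothetical, and the pattern is written with `.*` wildcards so that intermediate pages are allowed between A, D, X and M.

```python
import re
from typing import Dict, List


def build_bitmap_index(column: List[str]) -> Dict[str, int]:
    """Map each distinct value to a bitmap; bit i is set if row i has it."""
    index: Dict[str, int] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def regex_filter(index: Dict[str, int], pattern: str) -> int:
    """Match the pattern against dictionary values, then OR their bitmaps."""
    matcher = re.compile(pattern)
    result = 0
    for value, bitmap in index.items():
        if matcher.search(value):
            result |= bitmap
    return result


def rows_from_bitmap(bitmap: int) -> List[int]:
    """List the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical path column: one concatenated page path per session row.
PATHS = ["ABCDXAM", "ABCDE", "ADXM", "ABCDXAM"]
index = build_bitmap_index(PATHS)
matching_rows = rows_from_bitmap(regex_filter(index, r"^A.*D.*X.*M$"))
```

The regex runs once per distinct value rather than once per row, and the resulting bitmap then selects the rows to aggregate.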
31. Herald Architecture
CLIENT, SERVER
Model View Controller, Directives, NVD3
32. Herald Deployment
SSO, Druid
33. Adhoc Graph Analytics
Name: Login_2014101611, Country: US, Count: 15
Name: AccountOverview_2014101611
Name: PaymentReview_2014101611
Name: Checkout_2014101611
Country: US, Count: 5
Country: US, Count: 5
Country: US, Count: 10
5, 8, 7, 6
34. Name: Login_2014101611, Country: US, Count: 15
Name: AccountOverview_2014101611
Name: PaymentReview_2014101611
Name: Checkout_2014101611
Country: US, Count: 5
6
Country: US, Count: 5
7
Country: US, Count: 10
5
8
gremlin> g.V('name', 'Login_2014101611').
as('x').
outE.inV.loop('x')
{it.loops < 4}
{it.object.getProperty('name') ==
'Checkout_2014101611'}.path
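The Gremlin query on this slide walks outgoing edges from the Login vertex for a bounded number of steps and reports paths that reach Checkout. A rough Python analogue is sketched below. The edge list is hypothetical, because the slide text does not preserve which vertices are connected, and unlike the Gremlin loop this sketch stops extending a path once it reaches the goal.

```python
from typing import Dict, List

HOUR = "2014101611"

# Hypothetical edges between the page vertices named on the slide.
GRAPH: Dict[str, List[str]] = {
    f"Login_{HOUR}": [f"AccountOverview_{HOUR}", f"PaymentReview_{HOUR}"],
    f"AccountOverview_{HOUR}": [f"PaymentReview_{HOUR}"],
    f"PaymentReview_{HOUR}": [f"Checkout_{HOUR}"],
    f"Checkout_{HOUR}": [],
}


def bounded_paths(graph: Dict[str, List[str]], start: str, goal: str,
                  max_hops: int = 3) -> List[List[str]]:
    """Return every path from start to goal that uses at most max_hops edges."""
    paths: List[List[str]] = []
    stack: List[List[str]] = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal and len(path) > 1:
            paths.append(path)       # reached the goal; do not extend further
            continue
        if len(path) - 1 >= max_hops:
            continue                 # hop budget used up
        for neighbour in graph.get(node, []):
            stack.append(path + [neighbour])
    return paths


checkout_paths = bounded_paths(GRAPH, f"Login_{HOUR}", f"Checkout_{HOUR}")
```

With the assumed edges, this finds one two-hop path through PaymentReview and one three-hop path through AccountOverview and PaymentReview.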
35. Summary
• Problem
• Understand our customer behavior
• Across disparate channels & experiences
• Solution
• Democratize data
• Consistent standardized metadata
• Disciplined instrumentation
• Distributed scalable backend for adhoc & interactive analytics
• Self-service BI through modern visualization tools
36. Questions?