As an ecommerce site with more than 800,000 different sellers, Etsy is particularly interested in understanding how shoppers find the items they seek. Part of this understanding involves attributing successful purchases to specific features on the site. This attribution model not only allows us to compare and refine Etsy’s features but also provides valuable signals for A/B testing, search quality, and recommenders. However, the path to a successful purchase of a handmade item often involves multiple features over the course of several visits.
This talk will discuss the challenges of funnel analysis at Etsy and the corresponding deficiencies of several widely used web analytics tools. We’ll then dive into our event sequence matching tool, which we’ve successfully applied to hundreds of millions of visits in a single Hadoop job and which is widely used across our big data stack. Finally, we’ll take a look at some of our applications of the tool and compare it to related work.
10. Funnels ++
• Funnels are more than just an optimization tool
• Use them to understand different pathways throughout our site
• Partition and compare these pathways by attributes
• A/B tests, categories, queries, cohorts
• Attribution model
Tuesday, March 12, 13
11. Attribution Models
• Tie conversions and successes to specific products and actions
• Use to gain understanding of our users’ interaction with Etsy
• Help us to measure gains in A/B testing
• Easily compare different varieties of the same product
• Attribution techniques for internal and external attribution
22. Segmenting Within Funnels
Old Algorithm
          Search     Clicked Listing    Purchased
Counts    50,000     20,000             15,000
New Hotness
          Search     Clicked Listing    Purchased
Counts    100,000    60,000             15,000
23. Segmenting Within Funnels
Old Algorithm
         Search    Clicked Listing    Purchased
Step     100%      50%                40%
Total    100%      50%                20%
New Hotness
         Search    Clicked Listing    Purchased
Step     100%      60%                25%
Total    100%      60%                15%
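The "Step" and "Total" rows can be derived from raw counts: a step rate divides each count by the previous step's count, and a total rate divides it by the first step's count. A minimal Python sketch, assuming that reading of the two rows; `funnel_rates` is a hypothetical helper name, and the input counts are the "New Hotness" counts from the previous slide:

```python
def funnel_rates(counts):
    """Return (step, total) conversion rates for an ordered list of funnel counts."""
    if not counts or counts[0] == 0:
        raise ValueError("funnel needs a non-zero first step")
    step = [1.0]   # rate relative to the previous step
    total = [1.0]  # rate relative to the first step
    for prev, cur in zip(counts, counts[1:]):
        step.append(cur / prev if prev else 0.0)
        total.append(cur / counts[0])
    return step, total


# Search, Clicked Listing, Purchased counts for "New Hotness".
step, total = funnel_rates([100_000, 60_000, 15_000])
print([f"{r:.0%}" for r in step])   # ['100%', '60%', '25%']
print([f"{r:.0%}" for r in total])  # ['100%', '60%', '15%']
```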
24. Segmenting Across Funnels
                 *       Clicked Listing    Purchased
Search           100%    50%                40%
Browse           100%    40%                30%
Home             100%    60%                36%
Activity Feed    100%    62%                28%
Taste Test       100%    47%                31%
Search Ads       100%    45%                38%
25. Democratized Funnels
How do we make this awesomeness available for everyone?
27. Awesome infrastructure but...
• Must be an engineer to write your own queries
• Engineering resources become the bottleneck
• Hard to scale as the company grows
28. So we want to:
• Allow the data engineers to focus on higher priority things.
• Allow people to answer their own questions.
36. It’s the “Etsy Way”
• Already have existing infrastructure
• Operationally stable
• The same tools everyone else is using means a better adoption rate across Engineering
37.
• Event Stream
• Code runs on every page view
• Simple matching system
• (ab)Used memcached as a temporary storage
• Rolled up to DB every min (near real time)
41.
• Funnels had to be set up ahead of time (no backfills)
• Reconciliation is hard
• Limited to events in our web clickstream (iOS/Android would be excluded)
• Scaling and Operational issues
• Difficult to maintain multiple stacks
42. Turns out that we don’t make product decisions in real time
http://mcfunley.com/whom-the-gods-would-destroy-they-first-give-real-time-analytics
45. Getting back on the elephant
• Able to carry over the User Interface from v1
• Standardized event sessionization
• Operationally supported infrastructure
• Nightly batch process
52. How do we get it?
select (clicks * 100.0 / visits) as "CTR"
from feature_funnel
where event_type = 'search'
and ab_test = 'sitewide'
and epoch_s = 1361318400
and group_name = 'ALL_GROUPINGS';
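To try a query of this shape without the production warehouse, the rollup table can be mocked in SQLite. This is only a sketch: the column names come from the query above, while the column types, the `ctr` helper, and the row values are assumptions made up for illustration. The expression multiplies by 100.0 before dividing so that integer columns are not truncated by integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table feature_funnel (
        event_type text, ab_test text, epoch_s integer,
        group_name text, clicks integer, visits integer
    )
    """
)
# Hypothetical rollup rows; the click and visit numbers are invented.
conn.executemany(
    "insert into feature_funnel values (?, ?, ?, ?, ?, ?)",
    [
        ("search", "sitewide", 1361318400, "ALL_GROUPINGS", 250, 1000),
        ("browse", "sitewide", 1361318400, "ALL_GROUPINGS", 90, 600),
    ],
)


def ctr(conn, event_type, ab_test, epoch_s, group_name="ALL_GROUPINGS"):
    """Return the click-through rate in percent, or None if no row matches."""
    row = conn.execute(
        """
        select clicks * 100.0 / visits as "CTR"
        from feature_funnel
        where event_type = ? and ab_test = ?
          and epoch_s = ? and group_name = ?
        """,
        (event_type, ab_test, epoch_s, group_name),
    ).fetchone()
    return None if row is None else row[0]


search_ctr = ctr(conn, "search", "sitewide", 1361318400)
print(search_ctr)  # 25.0
```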
54. Features
• Eliminate the need for an engineer to write the queries
• Robust segmentation
• Not be limited to visit sessions
• Run Ad Hoc queries
55. Build your own Funnels
show the builder ui here
65. Query
• Only discussed event types
• Funnel steps require additional constraints:
• Listing referred by search
• Added that listing to cart
66. Search
query          “dinosaur”
listing_ids    119469855, 90583707, ...
loc            “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
...            ...
97. query listing_id cart_type
98. Segmented Funnel Analysis
• Extract segmenting properties
• Compute indicators as before
• Group on segmenting properties and sum
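The three steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the visit and event shapes, the `ab_variant` segment key, the funnel step names, and the simple in-order indicator function, which stands in for whatever indicator computation the talk used.

```python
from collections import defaultdict

FUNNEL = ["search", "listing", "purchase"]  # hypothetical event type names


def indicators(events, funnel=FUNNEL):
    """One 0/1 flag per funnel step: did the visit reach that step, in order?"""
    flags = [0] * len(funnel)
    step = 0
    for event in events:
        if step < len(funnel) and event["type"] == funnel[step]:
            flags[step] = 1
            step += 1
    return flags


def segmented_funnel(visits, segment_key):
    totals = defaultdict(lambda: [0] * len(FUNNEL))
    for visit in visits:
        segment = visit[segment_key]          # extract the segmenting property
        flags = indicators(visit["events"])   # compute per-visit indicators
        # group on the segment and sum the indicators element-wise
        totals[segment] = [a + b for a, b in zip(totals[segment], flags)]
    return dict(totals)


visits = [
    {"ab_variant": "old", "events": [{"type": "search"}, {"type": "listing"}]},
    {"ab_variant": "new", "events": [{"type": "search"}, {"type": "listing"},
                                     {"type": "purchase"}]},
    {"ab_variant": "new", "events": [{"type": "listing"}, {"type": "search"}]},
]
print(segmented_funnel(visits, "ab_variant"))
# {'old': [1, 1, 0], 'new': [2, 1, 1]}
```

Because the indicators are computed per visit and only summed afterwards, the per-visit work could run independently for each visit before any grouping takes place.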
99. MapReduce
• Work is done map-side
• Common first step in our jobs
• Expensive computation limited to first round mappers
101. Components
• Predicate: matches/rejects events
• Query: tuple of predicates
• Match: tuple of events
102. Match Predicates
• Select an event based on:
• Full event sequence
• Prior matched events
• Current candidate
103. Match Predicate DSL
• Combine predicates with logical operators
• val Query = Seq(Search, Listing & Referred, Cart & AddedListing)
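One way to picture such a DSL is a small predicate wrapper whose logical operators build new predicates. The sketch below is a Python stand-in, not the Scala code from the talk: the `Predicate` class, the `of_type` helper, the event dictionaries, and the bodies of `Referred` and `AddedListing` are all assumptions. Each predicate receives the full event sequence, the previously matched events, and the current candidate, mirroring the three inputs listed on the "Match Predicates" slide.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Event = dict  # hypothetical event shape: {"type": ..., plus properties}


@dataclass(frozen=True)
class Predicate:
    # fn(full sequence, prior matched events, candidate event) -> bool
    fn: Callable[[Sequence[Event], Tuple[Event, ...], Event], bool]

    def __call__(self, seq, prior, candidate):
        return self.fn(seq, prior, candidate)

    def __and__(self, other):
        return Predicate(lambda s, p, c: self(s, p, c) and other(s, p, c))

    def __or__(self, other):
        return Predicate(lambda s, p, c: self(s, p, c) or other(s, p, c))

    def __invert__(self):
        return Predicate(lambda s, p, c: not self(s, p, c))


def of_type(name):
    return Predicate(lambda s, p, c: c["type"] == name)


Search = of_type("search")
Listing = of_type("listing")
Cart = of_type("cart")
# The candidate listing appears in the results of the matched search.
Referred = Predicate(
    lambda s, p, c: len(p) >= 1 and c.get("listing_id") in p[0].get("listing_ids", ())
)
# The candidate cart event adds the listing matched in the previous step.
AddedListing = Predicate(
    lambda s, p, c: len(p) >= 2 and c.get("listing_id") == p[1].get("listing_id")
)

Query = (Search, Listing & Referred, Cart & AddedListing)

search = {"type": "search", "query": "dinosaur", "listing_ids": [119469855, 90583707]}
listing = {"type": "listing", "listing_id": 90583707}
cart = {"type": "cart", "listing_id": 90583707}
seq = [search, listing, cart]
print(Query[1](seq, (search,), listing))       # True
print(Query[2](seq, (search, listing), cart))  # True
```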
104. Semantics
• Fixed number of events in match
• Arbitrary number of matches per sequence
• Collect and extend all partial matches
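These semantics can be illustrated with a flat list of partial matches. This is a simplified sketch rather than the talk's implementation: predicates are plain functions of (sequence, prior matches, candidate), and the event dictionaries and function names are hypothetical. Every partial match is kept after it is extended, so a later event can still extend the shorter version, which is how one sequence can yield several matches.

```python
def find_matches(seq, query):
    """Return every in-order tuple of events that satisfies the query."""
    partial = [()]   # matches in progress, starting with the empty match
    complete = []
    for candidate in seq:
        extended = []
        for prior in partial:
            # The next predicate to satisfy is determined by the match length.
            if query[len(prior)](seq, prior, candidate):
                grown = prior + (candidate,)
                if len(grown) == len(query):
                    complete.append(grown)   # fixed-length match is finished
                else:
                    extended.append(grown)
        partial += extended  # old partials stay available for later events
    return complete


def is_search(seq, prior, c):
    return c["type"] == "search"


def referred_listing(seq, prior, c):
    return c["type"] == "listing" and c["listing_id"] in prior[0]["listing_ids"]


def added_listing(seq, prior, c):
    return c["type"] == "cart" and c["listing_id"] == prior[1]["listing_id"]


QUERY = (is_search, referred_listing, added_listing)

s = {"type": "search", "listing_ids": [1, 2]}
l1 = {"type": "listing", "listing_id": 1}
l2 = {"type": "listing", "listing_id": 2}
c2 = {"type": "cart", "listing_id": 2}
c1 = {"type": "cart", "listing_id": 1}

found = find_matches([s, l1, l2, c2, c1], QUERY)
print(len(found))  # 2
```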
130. Match Tree
• Purely functional data structure
• Holds matched events and indices
• Match prefixes shared
131. Match Tree Algorithm
• Fold over sequence accumulating tree
• May extend any non-terminal node
• Each level in tree corresponds to predicate
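The following is one possible reading of the bullets above, sketched in Python with immutable nodes. The `Node` shape, the helper names, and the example events are assumptions, not the talk's code. Level d of the tree holds events accepted by predicate d, a fold over the sequence produces a new root for each event, untouched subtrees are reused as-is, and matches that start with the same events share those nodes.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Node:
    index: Optional[int] = None   # position of the matched event; None at the root
    event: Any = None
    children: Tuple["Node", ...] = ()


def extend(node, prior, depth, query, seq, i):
    """Return a node with seq[i] attached wherever the predicates allow."""
    if depth == len(query):       # terminal node: complete match, never extended
        return node
    children = tuple(
        extend(child, prior + (child.event,), depth + 1, query, seq, i)
        for child in node.children
    )
    if query[depth](seq, prior, seq[i]):
        children += (Node(i, seq[i]),)
    unchanged = len(children) == len(node.children) and all(
        a is b for a, b in zip(children, node.children)
    )
    return node if unchanged else Node(node.index, node.event, children)


def build_tree(seq, query):
    # Fold over event positions, accumulating a new (immutable) root each time.
    return reduce(lambda root, i: extend(root, (), 0, query, seq, i),
                  range(len(seq)), Node())


def matches(node, depth, query, path=()):
    """Yield the index tuple of every complete match stored in the tree."""
    if depth == len(query):
        yield path
    for child in node.children:
        yield from matches(child, depth + 1, query, path + (child.index,))


QUERY = (
    lambda seq, prior, c: c["type"] == "search",
    lambda seq, prior, c: c["type"] == "listing"
    and c["listing_id"] in prior[0]["listing_ids"],
    lambda seq, prior, c: c["type"] == "cart"
    and c["listing_id"] == prior[1]["listing_id"],
)
events = [
    {"type": "search", "listing_ids": [1, 2]},
    {"type": "listing", "listing_id": 1},
    {"type": "listing", "listing_id": 2},
    {"type": "cart", "listing_id": 2},
    {"type": "cart", "listing_id": 1},
]
tree = build_tree(events, QUERY)
print(list(matches(tree, 0, QUERY)))  # [(0, 1, 4), (0, 2, 3)]
```

Both matches hang off the single search node at index 0, which is the prefix sharing the slide refers to.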
132. Practicality
• Explodes, but
• Queries are constant length
• Sequences are bounded (visits)
• Predicates constrain growth
133. Summary
• Why funnels are interesting
• What we’ve built with them at Etsy
• Our approach to funnel analysis
134. Questions?
• Steve Mardenfeld
• Wil Stuckey @quiiver
• Matt Walker @data_daddy