Snowplow: open source game analytics powered by AWS
This is a presentation by Alex Dean and Yali Sassoon of Snowplow about open source game analytics powered by AWS. It was presented at the Game Developers Conference (GDC) in San Francisco, February 2017.
2. Hello! We’re Alex and Yali. We created Snowplow
• We cofounded Snowplow
• Open source event data pipeline built on AWS tech
• Collect granular, rich, event-level data across digital platforms
• Validate, enrich, model and deliver that data to the places it can be analysed and acted on
3. Wonder at what the data made possible drove us to create Snowplow
• Digital event data is rich, behavioral information on how millions of people do things (play, work, socialize, flirt, unwind, etc.), collected at scale
• Endless possibilities to ask and answer different questions, build intelligence and act on that intelligence
• Packaged solutions do a poor job of enabling companies to realise all the different possibilities presented by this data
• Lots of companies build their own event data pipelines to realise those possibilities. If we can build a standard pipeline, companies can focus on doing stuff with the data
5. Games companies are typically very analytically sophisticated
• At an (often early) stage they invest in an event data warehouse / data pipeline
• Analytics is often very specific to each game: packaged solutions can only get you so far
• Data sophistication is a competitive advantage
• Larger game studios typically have very large data teams (engineering, science and analysis) and significant analytics infrastructure that they’ve built
6. But you don’t need to build your own event data pipeline from scratch
• We have a tried and tested open-source stack that you can deploy directly to your own AWS account
• Built on top of AWS services incl. Kinesis, Lambda, Redshift, Elasticsearch, S3 and EMR
• Use your data engineers to build analyses specific to your game, not to re-build the pipe!
7. Building high quality event data pipelines is hard
• Data quality
• Schema evolution
• Enrichment
• Data modeling
10. Early work with games studios heavily influenced our thinking
• Flexible data schemas that evolve
• Event grammar: events vs entities
• Evolving data models: understanding sequences of play
12. Game analytics encompasses a lot
• Product analytics: use data to improve the game
• Customer acquisition analytics: sustainably drive user growth
• Game health analytics: monitor the game
• Data-driven applications within the game, e.g. player-matching
• Plenty more that is specific to your game
13. We distinguish between analytics on read vs analytics on write
Analytics on Read
• Decide on how you want to process the data at the point of query
• Prioritise having the flexibility to query the data in a rich / varied way
• De-prioritise query latency
• Example: product analytics
Analytics on Write
• Define in advance how the data will be queried
• Prioritise low latency
• De-prioritise query flexibility
• Example: game health monitoring
Different architectures are appropriate for the above two cases
14. With Snowplow, we meet both requirements via a Lambda Architecture
• Analytics on write: Kinesis + AWS Lambda / Spark Streaming
• Analytics on read: Redshift / Spark / Athena
16. Analytics on read example: A/B testing to drive product development
• Limitless possibilities for experiments
• Wide set of metrics that you might be looking to influence with each experiment
• Tracking the experiments should be easy
• All enabled by the flexibility to compute segments and metrics after the fact (at query time)
17. Delivering the A/B testing framework with Redshift and/or Spark on EMR
Process
• Product manager defines the A/B test in advance, incl. KPI and success threshold
• Rolling program of tests run each week
• Test history documented
Technology
• Event tracked to indicate that a user is assigned to a specific group and a particular experiment is run
• KPI can be measured after the fact
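As a sketch of how a KPI can be computed after the fact, the following PySpark job joins experiment-assignment events to purchase events at query time. The event names, columns and S3 path are illustrative, not the actual Snowplow enriched-event schema.

```python
# Compute conversion rate per experiment variant entirely at query time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ab-test-kpi").getOrCreate()
events = spark.read.parquet("s3://my-snowplow-archive/enriched/")  # hypothetical path

# One row per user per experiment: which variant they were assigned to.
assignments = (events
    .filter(F.col("event_name") == "experiment_assigned")
    .select("user_id", "experiment_id", "variant")
    .dropDuplicates(["user_id", "experiment_id"]))

# The KPI (here: purchase conversion) is derived from the raw event stream,
# so a new KPI only needs a new aggregation, not new tracking.
purchases = (events
    .filter(F.col("event_name") == "purchase")
    .groupBy("user_id")
    .agg(F.count(F.lit(1)).alias("purchases")))

kpi = (assignments
    .join(purchases, "user_id", "left")
    .groupBy("experiment_id", "variant")
    .agg(
        F.count(F.lit(1)).alias("users"),
        F.sum(F.when(F.col("purchases") > 0, 1).otherwise(0)).alias("converters"),
    )
    .withColumn("conversion_rate", F.col("converters") / F.col("users")))

kpi.show()
```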
19. Delivering level analytics with Redshift and/or Spark on EMR
Process
• Define key metrics to understand player engagement with each level
• Build out a data modeling process to compute level aggregations on the underlying event stream
• Extend over time: build out more sophisticated metrics as understanding of play evolves
Technology
• Attach level metadata to all events
• Aggregate the event stream in Redshift / Spark
• Recompute over historical data as new metrics are developed
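A minimal PySpark sketch of the level-aggregation data model, assuming each event carries level metadata and that illustrative events like level_started / level_completed are tracked; the paths and column names are hypothetical.

```python
# Aggregate the event stream into per-level engagement metrics.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("level-analytics").getOrCreate()
events = spark.read.parquet("s3://my-snowplow-archive/enriched/")  # hypothetical path

level_metrics = (events
    .filter(F.col("event_name").isin("level_started", "level_completed"))
    .groupBy("level_name")
    .agg(
        F.countDistinct(F.when(F.col("event_name") == "level_started",
                               F.col("user_id"))).alias("players_started"),
        F.countDistinct(F.when(F.col("event_name") == "level_completed",
                               F.col("user_id"))).alias("players_completed"),
    )
    .withColumn("completion_rate",
                F.col("players_completed") / F.col("players_started")))

# Because the raw event stream is retained, new metrics can be recomputed
# over the full history as the data model evolves.
level_metrics.write.mode("overwrite").parquet("s3://my-analytics/level_metrics/")
```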
20. AWS provides a rich and growing toolkit for analytics on read
• EMR, enabling Hadoop, Spark and Flink
• Athena
• Redshift
• Elasticsearch Service
22. Analytics on write example 1: Surface aggregate play data in the game
• https://next.codecombat.com/play/dungeon
23. Delivering aggregate play data into the game with Kinesis, Lambda and DynamoDB
Example: calculating the number of users live on each level right now. This elegantly handles computing complex metrics (count distincts) in real time.
Pipeline (each event looks like { event_name: e, level_name: l, user_name: u, timestamp: t }):
• Kinesis event stream → AWS Lambda computes player state → DynamoDB player state table (+ stream)
• DynamoDB stream of updates to player state → AWS Lambda computes level state → DynamoDB level state table
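A minimal sketch of the first Lambda stage in this flow, assuming events arrive on the Kinesis stream in the shape shown above. The DynamoDB table and field names are hypothetical; a second Lambda on the DynamoDB stream would then maintain the per-level counts.

```python
# Lambda triggered by the Kinesis event stream: maintain per-player state.
import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
player_state = dynamodb.Table("player_state")  # hypothetical table name

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers the event payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("event_name") != "level_started":
            continue
        # Keying by user means the downstream Lambda can derive a distinct
        # player count per level from this table's change stream.
        player_state.put_item(Item={
            "user_name": payload["user_name"],
            "level_name": payload["level_name"],
            "updated_at": payload["timestamp"],
        })
```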
24. Analytics on write example 2: Tiered support based on player LTV
Triage users based on expected LTV:
1. Standard user: minimise support cost
2. Silver user: personalised service
3. Platinum user: concierge service
25. Delivering tiered support using Kinesis, Lambda, DynamoDB and API Gateway
Example: computing customer lifetime value and serving it from a customer API.
Pipeline (each event looks like { event_name: e, user_name: u, transaction_value: v, timestamp: t }):
• Kinesis event stream → AWS Lambda computes player lifetime value → DynamoDB player state table (+ stream)
• API Gateway serves player state → support tooling triages the player's support tier
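A minimal sketch of this flow: one Lambda accumulates lifetime value from transaction events, and a second, behind API Gateway, serves the resulting support tier. Table names, field names and tier thresholds are hypothetical.

```python
# Two Lambdas: accumulate player LTV from the stream, then serve a tier.
import base64
import json
from decimal import Decimal
import boto3

player_state = boto3.resource("dynamodb").Table("player_state")  # hypothetical

def accumulate_ltv(event, context):
    """Triggered by the Kinesis event stream."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("event_name") != "transaction":
            continue
        # Atomic increment of the running lifetime value.
        player_state.update_item(
            Key={"user_name": payload["user_name"]},
            UpdateExpression="ADD lifetime_value :v",
            ExpressionAttributeValues={":v": Decimal(str(payload["transaction_value"]))},
        )

def serve_tier(event, context):
    """Triggered by API Gateway (proxy integration)."""
    user = event["pathParameters"]["user_name"]
    item = player_state.get_item(Key={"user_name": user}).get("Item", {})
    ltv = float(item.get("lifetime_value", 0))
    tier = "platinum" if ltv > 500 else "silver" if ltv > 50 else "standard"
    return {"statusCode": 200, "body": json.dumps({"user_name": user, "tier": tier})}
```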
26. AWS provides a rich and growing toolkit for analytics on write
Stream processing frameworks
• Spark Streaming on EMR
• Kinesis Client Library
Serverless event processing
• AWS Lambda
• Kinesis Analytics
28. 1. Keep your analytics stack independent from your game’s stack
• Evolve the game and analytics independently: helpful for larger teams; reduces fragility
• Best-of-breed components for analytics and for the game: there is limited overlap between the best tools for game engines and the best tools for event analytics
• Handle order-of-magnitude different scale requirements: game event volumes will dwarf active game data
29. 2. Develop your analytics on read first, then migrate them to on write
• Example: a customer acquisition model that sets bid prices for different user cohorts
• The model is developed, tested and trained on historical data in the data warehouse
• The model is then put live on real-time data, in-stream
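One way to picture this pattern is a single scoring function reused in both modes; the bid_price model, cohort names and numbers below are purely illustrative.

```python
# The same (toy) acquisition model, first backtested on read, then run on write.
def bid_price(cohort_ltv: float, margin: float = 0.3) -> float:
    """Illustrative model: bid a fixed share of expected cohort LTV."""
    return round(cohort_ltv * (1 - margin), 2)

# Analytics on read: develop and backtest against historical cohorts pulled
# from the warehouse (Redshift / Athena / Spark).
historical = [("us_ios_paid", 12.40), ("de_android_organic", 4.10)]
backtest = {cohort: bid_price(ltv) for cohort, ltv in historical}

# Analytics on write: once validated, the identical function is called per
# event inside the stream processor (Lambda / Spark Streaming) to set bids live.
def on_new_cohort_estimate(event: dict) -> float:
    return bid_price(event["expected_ltv"])
```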
30. 3. Have a formal framework for managing change
• Change is inevitable through the lifetime of the game:
  • The game evolves
  • Analysts and scientists ask new questions of the game
• The analytics team must agree on a framework to handle:
  • Updates to the in-game event and entity schemas (affects the developers)
  • Evolution of the event data modeling (affects the wider company)
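As an illustration of one such framework, Snowplow events can reference versioned, self-describing JSON schemas; the vendor, schema name and fields below are hypothetical, but the iglu: URI and MODEL-REVISION-ADDITION versioning are the conventions Snowplow uses.

```python
# A self-describing event payload with an explicit schema version.
level_completed_v1 = {
    "schema": "iglu:com.mygame/level_completed/jsonschema/1-0-0",
    "data": {"level_name": "dungeon_3", "duration_s": 84},
}

# A non-breaking change (new optional field) bumps the ADDITION digit, so old
# and new game clients can keep sending events while the downstream data
# model evolves at its own pace.
level_completed_v2 = {
    "schema": "iglu:com.mygame/level_completed/jsonschema/1-0-1",
    "data": {"level_name": "dungeon_3", "duration_s": 84, "stars": 2},
}
```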
32. Standardise on your event data pipeline
• Why re-invent the wheel?
• Deploy our tried and tested open-source stack directly in your AWS account
• Use your data engineers to build analyses specific to your game, not to re-build the pipe!
34. Thank you for attending #AmazonDevDay. Please take a moment to complete our survey for a chance to win the grand prize: bit.ly/DevDaySurvey
Q&A will be in a room on the third floor