© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jordan Young – Manager, Analytics Engineering, Glu Mobile
October 2015
GAM406
Glu Mobile
An Amazon Kinesis-centric data platform to process real-time
gaming events for 10+ million user devices
What to expect from the session
• Glu Mobile: Data requirements and challenges
• Architecture overview and decisions
• Amazon Kinesis: Producers, Streams, and the Amazon
Kinesis Connector Library
• Real Time: Storm and Amazon Kinesis Storm Spout
• Other challenges and insights
Glu Mobile basics
• A mobile gaming leader across genres
• 4 titles in top 100 grossing (US) (9/24/15)
• 4–6 million daily active users (DAU) (typical, 2015)
• 1 billion+ global installs (2010-)
What we collect
High, variable volume
• 700 million to 2+ billion
events per day
• 600 bytes per event
• Up to 1.2 TB per day
• Could scale up further with
successful game launch
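The volume figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming decimal terabytes (10^12 bytes) and a flat 600 bytes per event; the class and method names are illustrative and not part of Glu's code.

```java
// Illustrative arithmetic only; names are hypothetical.
final class DailyVolume {
    static final long SECONDS_PER_DAY = 86_400;

    /** Daily data volume in decimal terabytes (1 TB = 10^12 bytes). */
    static double terabytesPerDay(long eventsPerDay, long bytesPerEvent) {
        return eventsPerDay * (double) bytesPerEvent / 1e12;
    }

    /** Average events per second over a day, rounded down. */
    static long averageEventsPerSecond(long eventsPerDay) {
        return eventsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long lowDay = 700_000_000L;
        long highDay = 2_000_000_000L;
        System.out.println("TB/day at peak: " + terabytesPerDay(highDay, 600));
        System.out.println("avg events/sec (low day): " + averageEventsPerSecond(lowDay));
        System.out.println("avg events/sec (high day): " + averageEventsPerSecond(highDay));
    }
}
```

These are daily averages; peak traffic within a day would be higher, which matters for the shard sizing later in the deck.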
Multiple sources
• Client side SDKs
• Game servers, central
services servers
• Attribution partners
• Ad networks
• Third parties
Basic requirements
• Near zero data loss
• High levels of uptime
• Flexible data format — JSON with arbitrary fields
• Real-time aggregations
• Reasonably low latency for ad-hoc queries
(hourly batching OK)
Other requirements
• Not expensive
• Can be implemented with minimal engineering effort
• Requires minimal changes to existing games
Architecture: Past, Present,
and Why
First redesign
Next Step: Bring data collection in house
• Build our own analytics SDK
• Need a framework for collecting data from SDK
• Options:
• Build our own streaming and collection (Apache Kafka)
• Use a hosted service (Amazon Kinesis)
Amazon Kinesis: Producers,
Streams, and the Amazon
Kinesis Connector Library
What is Amazon Kinesis?
Why Amazon Kinesis?
• Minimal setup time
• Prebuilt applications (Amazon Kinesis Connector Library
[KCL], Amazon Kinesis Storm Spout)
• Extremely minimal maintenance
• Minimal hardware
• No significant price advantages either way (vs Kafka)
Producers
• Custom built client SDKs
• Native Android (Java), Native iOS (Obj-C) plug-ins
• Unity wrapper for Unity titles
• Built on top of AWS SDKs (for each platform)
• Implements our internal analytical schema / standards
• Additional server-side implementations
Producers (continued)
• Vanilla KinesisRecorder.submitAllRecords()
• No record batching
• No compression
• Records flushed every 30 seconds, or on certain events
• Client authentication using Amazon Cognito
• Server authentication using AWS Identity and Access
Management (IAM) profiles
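The flush policy described above (every 30 seconds, or immediately on certain events) can be sketched in plain Java. This is a simplified illustration, not the Glu SDK: the `EventBuffer` class, its `sender` callback (standing in for the SDK's submit call), and the explicit `nowMillis` parameter are all assumptions made so the example runs without the AWS SDK or a real timer. Each event stays its own record; nothing is aggregated or compressed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side buffer; names are illustrative.
final class EventBuffer {
    private final long flushIntervalMillis;
    private final Consumer<List<String>> sender; // stands in for the submit call
    private final List<String> pending = new ArrayList<>();
    private long lastFlushMillis;

    EventBuffer(long flushIntervalMillis, long nowMillis, Consumer<List<String>> sender) {
        this.flushIntervalMillis = flushIntervalMillis;
        this.lastFlushMillis = nowMillis;
        this.sender = sender;
    }

    /** Queue one event; flush if it is marked urgent or the interval has elapsed. */
    void record(String json, boolean flushNow, long nowMillis) {
        pending.add(json);
        if (flushNow || nowMillis - lastFlushMillis >= flushIntervalMillis) {
            flush(nowMillis);
        }
    }

    /** Hand all pending events to the sender and reset the interval. */
    void flush(long nowMillis) {
        if (!pending.isEmpty()) {
            sender.accept(new ArrayList<>(pending));
            pending.clear();
        }
        lastFlushMillis = nowMillis;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        EventBuffer buffer = new EventBuffer(30_000, 0,
                batch -> System.out.println("sending " + batch.size() + " records"));
        buffer.record("{\"event\":\"session_start\"}", false, 1_000);
        buffer.record("{\"event\":\"purchase\"}", true, 2_000); // urgent event forces a flush
    }
}
```

A real client would also flush from a background timer rather than only when a new event arrives; that part is left out to keep the sketch short.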
How many shards?
• Shard limits:
• 1,000 records per second
• 1 MB per sec writes
• 2 MB per sec read
• Our situation: 20,000 RPS, 600 bytes per message
• Need at least 20 shards to handle message count
• Only need 12 MB per sec write capacity
• 20 shards = 40 MB per sec read capacity
• Up to 3 apps OK (36 MB < 40 MB)
• Other considerations (peak load, excess capacity)
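The sizing logic above can be written out as a small calculation. This is a sketch under stated assumptions: the per-shard limits are the ones quoted on the slide, 1 MB is treated as 10^6 bytes, and the class and method names are illustrative.

```java
// Illustrative shard arithmetic; names are hypothetical.
final class ShardSizing {
    // Per-shard limits as quoted on the slide
    static final long MAX_RECORDS_PER_SEC = 1_000;
    static final long MAX_WRITE_BYTES_PER_SEC = 1_000_000; // 1 MB, decimal assumed
    static final long MAX_READ_BYTES_PER_SEC = 2_000_000;  // 2 MB, decimal assumed

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Fewest shards that satisfy both the record-count and the write-bandwidth limit. */
    static long minShards(long recordsPerSec, long bytesPerRecord) {
        long byCount = ceilDiv(recordsPerSec, MAX_RECORDS_PER_SEC);
        long byBytes = ceilDiv(recordsPerSec * bytesPerRecord, MAX_WRITE_BYTES_PER_SEC);
        return Math.max(byCount, byBytes);
    }

    /** Number of consumer apps that can each read the full stream within the read limit. */
    static long maxConsumerApps(long shards, long recordsPerSec, long bytesPerRecord) {
        long readCapacity = shards * MAX_READ_BYTES_PER_SEC;
        return readCapacity / (recordsPerSec * bytesPerRecord);
    }

    public static void main(String[] args) {
        long shards = minShards(20_000, 600);
        long apps = maxConsumerApps(shards, 20_000, 600);
        System.out.println("shards=" + shards + ", consumer apps=" + apps);
    }
}
```

With 600-byte messages the record-count limit is the binding constraint, not write bandwidth, which is why 20 shards are needed even though 12 MB per second would fit in 12.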
Consumers: Amazon Kinesis Connector Library
Consumers: KCL pipeline
public class S3Pipeline implements IKinesisConnectorPipeline<String, byte[]> {

    @Override
    public ITransformer<String, byte[]> getTransformer(KinesisConnectorConfiguration configuration) {
        return new GluJsonTransformer();
    }

    @Override
    public IFilter<String> getFilter(KinesisConnectorConfiguration configuration) {
        return new GluMessageFilter<String>();
    }

    @Override
    public IBuffer<String> getBuffer(KinesisConnectorConfiguration configuration) {
        return new BasicMemoryBuffer<String>(configuration);
    }

    @Override
    public IEmitter<byte[]> getEmitter(KinesisConnectorConfiguration configuration) {
        return new GluS3Emitter(configuration);
    }
}
Consumers: Transformer implementation ex.
@Override
public String toClass(Record record) {
    String json_str = "";
    try {
        json_str = new String(record.getData().array());
    } catch (Exception e) {
        return null;
    }
    // Only enrich payloads that look like a non-empty JSON object
    if (json_str.length() > 4 && json_str.startsWith("{") && json_str.endsWith("}")) {
        // Replace the closing brace with the two extra fields, then close the object again
        json_str = json_str.substring(0, json_str.length() - 1) + ",";
        json_str = json_str + "\"kin_seq_num\":\"" + record.getSequenceNumber() + "\",";
        json_str = json_str + "\"server_ts\":" + System.currentTimeMillis() + "}";
    }
    return json_str;
}

@Override
public byte[] fromClass(String record) {
    return record.getBytes();
}
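The enrichment step in toClass can be isolated into a pure function, which makes it easy to test without a Kinesis Record. The following is a sketch, not Glu's implementation: `JsonEnricher` and `enrich` are hypothetical names, the sequence number and timestamp are passed in as plain arguments, and, unlike the slide code, an empty object `{}` is also enriched.

```java
// Hypothetical helper that mirrors the transformer's string-level enrichment.
final class JsonEnricher {

    /** Append kin_seq_num and server_ts to a JSON object given as text. */
    static String enrich(String json, String sequenceNumber, long serverTsMillis) {
        if (json == null) {
            return null;
        }
        String trimmed = json.trim();
        // Anything that does not look like a JSON object is passed through untouched
        if (trimmed.length() < 2 || !trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            return json;
        }
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        String extra = "\"kin_seq_num\":\"" + sequenceNumber + "\","
                + "\"server_ts\":" + serverTsMillis;
        // Avoid a leading comma when the original object had no fields
        return body.isEmpty() ? "{" + extra + "}" : "{" + body + "," + extra + "}";
    }

    public static void main(String[] args) {
        System.out.println(enrich("{\"event\":\"login\"}", "42", System.currentTimeMillis()));
    }
}
```

String splicing like this avoids a full JSON parse per record, at the cost of trusting the producer to send well-formed objects.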
Real Time: Storm and the
Amazon Kinesis Storm Spout
Storm and real-time data
• Distributed, fault-tolerant
• Processes real-time data
• Views records as “tuples” which are passed through an
arbitrary DAG of nodes (a topology)
• Spouts: Emit tuples into the topology
• Bolts: Process tuples in the topology
• Read from anywhere, write to anywhere
Storm cluster architecture
Storm: Real-time aggregation
Implementing the Amazon Kinesis Storm Spout
// Define configuration parameters
final KinesisSpoutConfig config =
    new KinesisSpoutConfig(streamName, zookeeperEndpoint)
        .withZookeeperPrefix(zookeeperPrefix)
        .withKinesisRecordScheme(new DefaultKinesisRecordScheme())
        .withRecordRetryLimit(recordRetryLimit)
        .withInitialPositionInStream(initialPositionInStream) // LATEST or TRIM_HORIZON
        .withEmptyRecordListBackoffMillis(emptyRecordListBackoffMillis);

// Create the spout
final KinesisSpout spout = new KinesisSpout(
    config,
    new CustomCredentialsProviderChain(awsAccessKey, awsSecretKey),
    new ClientConfiguration());

// Set the spout in the topology and define its parallelism
builder.setSpout("kinesis_spout", spout, num_spout_executors);
Storm: Lessons
• Only extract necessary fields when deserializing
• Big instances with few workers (JVMs)
• Too many workers can reduce speeds
• Balance flexibility vs. speed
• Final state (throughput and hardware)
• Can handle up to ~42K RPS on (4) c4.2xlarge instances
• Using (2) m3.large for ZooKeeper, m3.xlarge for Nimbus
Glu’s new architecture
Challenges and Insights
Challenge: Shards, buffers, and file size
• Each shard is handled by its own buffer
• A KCL instance needs a buffer for each shard
• More shards per machine = more memory
• Avoid memory bottleneck by reducing buffer size
→ Creates more, smaller files
• Hadoop does not like this!
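The trade-off above can be made concrete with rough numbers. This is a back-of-the-envelope sketch only: it assumes one buffer per shard that flushes to one file purely on size, and the buffer sizes and per-shard throughput used below are hypothetical, not Glu's settings.

```java
// Illustrative trade-off between buffer memory and output file count; names are hypothetical.
final class BufferTradeoff {

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Worst-case buffer memory on one consumer machine: one full buffer per shard. */
    static long memoryBytes(int shardsOnInstance, long bufferBytes) {
        return shardsOnInstance * bufferBytes;
    }

    /** Files written per hour across the stream if every buffer flushes only when full. */
    static long filesPerHour(int totalShards, long bytesPerSecPerShard, long bufferBytes) {
        long bytesPerHourPerShard = bytesPerSecPerShard * 3_600;
        return totalShards * ceilDiv(bytesPerHourPerShard, bufferBytes);
    }

    public static void main(String[] args) {
        long perShardRate = 600_000; // 12 MB/s spread over 20 shards
        for (long buffer : new long[] {64_000_000L, 8_000_000L}) {
            System.out.println("buffer=" + buffer
                    + " memory(10 shards)=" + memoryBytes(10, buffer)
                    + " files/hour=" + filesPerHour(20, perShardRate, buffer));
        }
    }
}
```

Shrinking the buffer by a factor of eight cuts memory by the same factor but multiplies the file count by roughly eight, which is the small-files problem the next slides address.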
Solution: CombineFileInputFormat
Solution: CombineFileInputFormat (continued)
Challenge: No IP address on record
• Record sent to stream without IP
• Device doesn’t know its own IP
• Amazon Kinesis does not provide client IP
• But we rely on the IP address for geo lookup
• No geographic splits
• Big problem
Solution: Geo lookup service (v1)
Solution: Geo lookup service (v2)
Challenge: Scaling
• Can our system scale with minimal effort and impact?
• Stream
→ Can scale up / down with Amazon Kinesis Scaling Utils
• Consumers
→ Can add more machines, pegged to shard count rather
than records if memory bottlenecked
• Hadoop
→ Add more nodes, but enough extra room
• Storm?
Kinesis Scaling Utils
$ java -cp KinesisScalingUtils.jar-complete.jar \
    -Dstream-name=<stream name> -Dscaling-action=resize \
    -Dcount=<new shard count> ScalingClient
GitHub: Kinesis Scaling Utils
Scaling and Amazon Kinesis Storm Spout
• Assigning tasks to shards
• Required topology restart so that ZooKeeper could refresh the
shard list
• Solved in 1.1.1; now only requires “storm rebalance”
• Need to be sure that withEmptyRecordListBackoffMillis
setting is adequately low (defaults to 5 post 1.1.0)
• Loss of state
• Restarting / rebalancing causes tasks to lose their state.
• Breaks topology operations, which require state such as
unique counts and joins.
Redis to the rescue
• Redis is a scalable, in-memory key-value store
• Solution: Store long running state to Redis
• Count unique values using “sets”
• Perform joins using key-value hashes
• Easy deployment / management using Amazon ElastiCache
Redis unique counter: Local aggregator
@Override
public void execute(Tuple tuple) {
    if (!TupleHelpers.isTickTuple(tuple)) {
        // Assumption: the ID to count is the tuple's first field
        addItemToSet(listOfIds, tuple.getString(0));
    } else {
        // On each tick, push the locally collected IDs to Redis
        emit(listOfIds);
    }
    collector.ack(tuple);
}

private void addItemToSet(HashSet<String> listOfIds, String id) {
    listOfIds.add(id);
}

private void emit(HashSet<String> listOfIds) {
    Jedis jedis = redisPool.getResource();
    Pipeline redisPipeline = jedis.pipelined();
    String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
    int ttlSec = 60 * 60 * 24;
    for (String id : listOfIds) {
        redisPipeline.sadd(redisKeyName, id);
    }
    redisPipeline.sync();
    // Set the expiry only when the key has no TTL yet
    if (jedis.ttl(redisKeyName) == -1) {
        jedis.expire(redisKeyName, ttlSec);
    }
    jedis.close();
    listOfIds.clear();
}
Redis unique counter: Global aggregator
@Override
public void execute(Tuple tuple) {
    Jedis jedis = redisPool.getResource();
    String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
    Double unique_count = jedis.scard(redisKeyName).doubleValue();
    jedis.close();
    emitToWherever(redisKeyName, unique_count);
    collector.ack(tuple);
}

@Override
public Map<String, Object> getComponentConfiguration() {
    Map<String, Object> conf = new HashMap<String, Object>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
    return conf;
}
In closing
• Amazon Kinesis Connector Library makes basic
consumer applications simple
• Amazon Kinesis Storm Spout enables real-time
processing
• Optimize Hadoop file size with CombineFileInputFormat
• Geo Lookup service in lieu of Amazon Kinesis API
• Scale with Amazon Kinesis scaling utils and Storm
Spout 1.1.1
Thank you!
GAM406
Remember to complete
your evaluations!

(GAM406) Glu Mobile: Real-time Analytics Processing of 10 MM+ Devices

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jordan Young – Manager, Analytics Engineering, Glu Mobile October 2015 GAM406 Glu Mobile An Amazon Kinesis-centric data platform to process real-time gaming events for 10+ million user devices
  • 2. What to expect from the session • Glu Mobile: Data requirements and challenges • Architecture overview and decisions • Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library • Real Time: Storm and Amazon Kinesis Storm Spout • Other challenges and insights
  • 3. Glu mobile basics • A mobile gaming leader across genres • 4 titles in top 100 grossing (US) (9/24/15) • 4–6 million daily active users (DAU) (typical, 2015) • 1 billion+ global installs (2010-)
  • 4. What we collect High, variable volume • 700 million to 2+ billion events per day • 600 bytes per event • Up to 1.2 TB per day • Could scale up further with successful game launch Multiple sources • Client side SDKs • Game servers, central services servers • Attribution partners • Ad networks • Third parties
  • 5. Basic requirements • Near zero data loss • High levels of uptime • Flexible data format — JSON with arbitrary fields • Real-time aggregations • Reasonably low latency for ad-hoc queries (hourly batching OK)
  • 6. Other requirements • Not expensive • Can be implemented with minimal engineering effort • Requires minimal changes to existing games
  • 10. Next Step: Bring data collection in house • Build our own analytics SDK • Need a framework for collecting data from SDK • Options: • Build our own streaming and collection (Apache Kafka) • Use a hosted service (Amazon Kinesis)
  • 11. Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library
  • 12. What is Amazon Kinesis?
  • 13. Why Amazon Kinesis? • Minimal setup time • Prebuilt applications (Amazon Kinesis Connector Library [KCL], Amazon Kinesis Storm Spout) • Extremely minimal maintenance • Minimal hardware • No significant price advantages either way (vs Kafka)
  • 14. Producers • Custom built client SDKs • Native Android (Java), Native iOS (Obj-C) plug-ins • Unity wrapper for unity titles • Built on top of AWS SDKs (for each platform) • Implements our internal analytical schema / standards • Additional server-side implementations
  • 15. Producers (continued) • Vanilla KinesisRecord.submitAllRecords() • No record batching • No compression • Records flushed every 30 seconds, or on certain events • Client authentication using Amazon Cognito • Server authentication using AWS Identity and Access Management (IAM) profiles
  • 16. How many shards? • Shard limits: • 1,000 records per second • 1 MB per sec writes • 2 MB per sec read • Our situation: 20,000 RPS, 600 bytes per message • Need at least 20 shards to handle message count • Only need 12 MB per sec write capacity • 20 shards = 40 MB per sec read capacity • Up to 3 apps OK (36 MB < 40 MB) • Other considerations (peak load, excess capacity)
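The shard arithmetic on this slide can be written out as a small calculation. The sketch below uses only the per-shard limits and traffic figures quoted above; the class and method names are hypothetical, and it assumes 1 MB means 1,000,000 bytes and that every consumer application reads the full stream.

```java
// Hypothetical helper reproducing the slide's shard sizing arithmetic.
final class ShardSizing {
    // Per-shard limits as quoted on the slide (MB treated as 10^6 bytes).
    static final long MAX_RECORDS_PER_SEC = 1_000L;
    static final long MAX_WRITE_BYTES_PER_SEC = 1_000_000L;
    static final long MAX_READ_BYTES_PER_SEC = 2_000_000L;

    // Shards needed for writes: the larger of the record-count bound
    // and the byte-throughput bound.
    static long shardsForWrites(long recordsPerSec, long bytesPerRecord) {
        long byCount = ceilDiv(recordsPerSec, MAX_RECORDS_PER_SEC);
        long byBytes = ceilDiv(recordsPerSec * bytesPerRecord, MAX_WRITE_BYTES_PER_SEC);
        return Math.max(byCount, byBytes);
    }

    // How many applications can each read the whole stream within the
    // aggregate read capacity of the given number of shards.
    static long maxConsumerApps(long shards, long recordsPerSec, long bytesPerRecord) {
        long readCapacity = shards * MAX_READ_BYTES_PER_SEC;
        long bytesPerAppPerSec = recordsPerSec * bytesPerRecord;
        return readCapacity / bytesPerAppPerSec;
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

With the slide's numbers, `ShardSizing.shardsForWrites(20_000, 600)` is bound by record count (20 shards) rather than bytes (12 shards), and `ShardSizing.maxConsumerApps(20, 20_000, 600)` gives 3, matching the "36 MB < 40 MB" check. Peak load and spare capacity would still need to be added on top.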
  • 17. Consumers: Amazon Kinesis Connector Library
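  • 18. Consumers: KCL pipeline

      import com.amazonaws.services.kinesis.connectors.KinesisConnectorConfiguration;
      import com.amazonaws.services.kinesis.connectors.impl.BasicMemoryBuffer;
      import com.amazonaws.services.kinesis.connectors.interfaces.IBuffer;
      import com.amazonaws.services.kinesis.connectors.interfaces.IEmitter;
      import com.amazonaws.services.kinesis.connectors.interfaces.IFilter;
      import com.amazonaws.services.kinesis.connectors.interfaces.IKinesisConnectorPipeline;
      import com.amazonaws.services.kinesis.connectors.interfaces.ITransformer;

      // Wires together the four stages of a connector pipeline:
      // transform -> filter -> buffer -> emit (to Amazon S3).
      public class S3Pipeline implements IKinesisConnectorPipeline<String, byte[]> {

          @Override
          public ITransformer<String, byte[]> getTransformer(KinesisConnectorConfiguration configuration) {
              return new GluJsonTransformer();
          }

          @Override
          public IFilter<String> getFilter(KinesisConnectorConfiguration configuration) {
              return new GluMessageFilter<String>();
          }

          @Override
          public IBuffer<String> getBuffer(KinesisConnectorConfiguration configuration) {
              return new BasicMemoryBuffer<String>(configuration);
          }

          @Override
          public IEmitter<byte[]> getEmitter(KinesisConnectorConfiguration configuration) {
              return new GluS3Emitter(configuration);
          }
      }

  • 19. Consumers: Transformer implementation example

      // Requires: com.amazonaws.services.kinesis.model.Record, java.nio.charset.StandardCharsets

      @Override
      public String toClass(Record record) {
          String json_str;
          try {
              json_str = new String(record.getData().array(), StandardCharsets.UTF_8);
          } catch (Exception e) {
              return null; // unreadable payload
          }
          // Only enrich payloads that look like a non-trivial JSON object;
          // anything else is passed through unchanged.
          if (json_str.startsWith("{") && json_str.endsWith("}") && json_str.length() > 4) {
              // Drop the closing brace, append the metadata fields, then re-close.
              json_str = json_str.substring(0, json_str.length() - 1)
                      + ",\"kin_seq_num\":\"" + record.getSequenceNumber() + "\""
                      + ",\"server_ts\":" + System.currentTimeMillis() + "}";
          }
          return json_str;
      }

      @Override
      public byte[] fromClass(String record) {
          return record.getBytes(StandardCharsets.UTF_8);
      }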
  • 20. Real Time: Storm and the Amazon Kinesis Storm Spout
  • 21. Storm and real-time data • Distributed, fault-tolerant • Processes real-time data • Views records as “tuples” which are passed through an arbitrary DAG of nodes (a topology) • Spouts: Emit tuples into the topology • Bolts: Process tuples in the topology • Read from anywhere, write to anywhere
  • 24. Implementing the Amazon Kinesis Storm Spout

      // Define configuration parameters
      final KinesisSpoutConfig config =
              new KinesisSpoutConfig(streamName, zookeeperEndpoint)
                      .withZookeeperPrefix(zookeeperPrefix)
                      .withKinesisRecordScheme(new DefaultKinesisRecordScheme())
                      .withRecordRetryLimit(recordRetryLimit)
                      .withInitialPositionInStream(initialPositionInStream) // LATEST or TRIM_HORIZON
                      .withEmptyRecordListBackoffMillis(emptyRecordListBackoffMillis);

      // Create Spout
      final KinesisSpout spout = new KinesisSpout(
              config,
              new CustomCredentialsProviderChain(awsAccessKey, awsSecretKey),
              new ClientConfiguration());

      // Set Spout in Topology and define parallelism
      builder.setSpout("kinesis_spout", spout, num_spout_executors);
  • 25. Storm: Lessons • Only extract necessary fields when deserializing • Big instances with few workers (JVMs) • Too many workers can reduce speeds • Balance flexibility vs. speed • Final state (throughput and hardware) • Can handle up to ~42K RPS on (4) c4.2xlarge instances • Using (2) m3.large for ZooKeeper, m3.xlarge for Nimbus
  • 28. Challenge: Shards, buffers, and file size • Each shard is handled by its own buffer • A KCL instance needs a buffer for each shard • More shards per machine = more memory • Avoid memory bottleneck by reducing buffer size → creates more, smaller files • Hadoop does not like this!
  • 31. Challenge: No IP address on record • Record sent to stream without IP • Device doesn’t know its own IP • Amazon Kinesis does not provide client IP • But we rely on the IP address for geo lookup • Without it, no geographic splits • Big problem
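The trade-off on this slide (smaller buffers save memory but produce more, smaller files) can be made concrete with a rough calculation. The sketch below is illustrative only: the class name, method names, and the buffer sizes used in the example are assumptions, and it models a buffer that flushes purely on a byte limit, ignoring any record-count or time limits a real buffer may also apply.

```java
// Hypothetical model of per-shard buffers that flush to one file when full.
final class BufferTradeoff {
    // Worst-case memory held by full buffers on one consumer machine.
    static long bufferMemoryBytes(int shardsOnMachine, long bufferBytes) {
        return shardsOnMachine * bufferBytes;
    }

    // Files written per hour across the stream, one file per buffer flush.
    static long filesPerHour(int totalShards, long bytesPerSecPerShard, long bufferBytes) {
        long bytesPerHourPerShard = bytesPerSecPerShard * 3_600L;
        long flushesPerShard = (bytesPerHourPerShard + bufferBytes - 1) / bufferBytes; // round up
        return totalShards * flushesPerShard;
    }
}
```

Taking the earlier figures of 20 shards and 12 MB per second of writes (600,000 bytes per second per shard), an assumed 64 MB buffer costs up to 1.28 GB of memory on a machine handling all 20 shards and yields about 680 files per hour, while an assumed 8 MB buffer cuts memory to 160 MB but yields 5,400 files per hour.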
  • 32. Solution: Geo lookup service (v1)
  • 33. Solution: Geo lookup service (v2)
  • 34. Challenge: Scaling • Can our system scale with minimal effort and impact? • Stream • → Can scale up / down with Amazon Kinesis Scaling Utils • Consumers • → Can add more machines, pegged to shard count rather than record volume if memory-bottlenecked • Hadoop • → Can add more nodes, but there is enough extra room for now • Storm?
  • 35. Kinesis Scaling Utils
      $ java -cp KinesisScalingUtils.jar-complete.jar -Dstream-name=<stream name> -Dscaling-action=resize -Dcount=<new shard count> ScalingClient
      GitHub: Kinesis Scaling Utils
  • 36. Scaling and Amazon Kinesis Storm Spout • Assigning tasks to shards • Required a topology restart so that ZooKeeper could refresh the shard list • Solved in 1.1.1; now only requires “storm rebalance” • Need to be sure that the withEmptyRecordListBackoffMillis setting is adequately low (defaults to 5 ms after 1.1.0) • Loss of state • Restarting / rebalancing causes tasks to lose their state • This breaks topology operations that require state, such as unique counts and joins
  • 37. Redis to the rescue • Redis is a scalable, in-memory key-value store • Solution: Store long running state to Redis • Count unique values using “sets” • Perform joins using key-value hashes • Easy deployment / management using Amazon ElastiCache
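  • 38. Redis unique counter: Local aggregator

      @Override
      public void execute(Tuple tuple) {
          if (!TupleHelpers.isTickTuple(tuple)) {
              // Buffer IDs locally between tick tuples.
              // The slide reads tuple.getID(); the "id" field name here is an assumption.
              addItemToSet(listOfIds, tuple.getStringByField("id"));
          } else {
              // On each tick, push the buffered IDs to Redis.
              emit(listOfIds);
          }
          collector.ack(tuple);
      }

      private void addItemToSet(HashSet<String> listOfIds, String id) {
          listOfIds.add(id);
      }

      private void emit(HashSet<String> listOfIds) {
          String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
          int ttlSec = 60 * 60 * 24; // expire each daily key after 24 hours
          Jedis jedis = redisPool.getResource();
          try {
              // Pipeline the SADD calls so the whole batch is sent together.
              Pipeline redisPipeline = jedis.pipelined();
              for (String id : listOfIds) {
                  redisPipeline.sadd(redisKeyName, id);
              }
              redisPipeline.sync();
              // A TTL of -1 means the key exists but has no expiry set yet.
              if (jedis.ttl(redisKeyName) == -1) {
                  jedis.expire(redisKeyName, ttlSec);
              }
          } finally {
              jedis.close(); // always return the connection to the pool
          }
          listOfIds.clear();
      }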
  • 39. Redis unique counter: Global aggregator

      @Override
      public void execute(Tuple tuple) {
          Jedis jedis = redisPool.getResource();
          String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
          // SCARD returns the number of distinct IDs stored in the day's set.
          Double unique_count = jedis.scard(redisKeyName).doubleValue();
          jedis.close();
          emitToWherever(redisKeyName, unique_count);
          collector.ack(tuple);
      }

      @Override
      public Map<String, Object> getComponentConfiguration() {
          // Ask Storm to send this bolt a tick tuple every emitFrequencyInSeconds.
          Map<String, Object> conf = new HashMap<String, Object>();
          conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
          return conf;
      }
  • 40. In closing • Amazon Kinesis Connector Library makes basic consumer applications simple • Amazon Kinesis Storm Spout enables real-time processing • Optimize Hadoop file size with CombineFileInputFormat • Geo lookup service in lieu of client IP from the Amazon Kinesis API • Scale with Amazon Kinesis Scaling Utils and Storm Spout 1.1.1