Edge in the cloud
Salim Hemdani, VP, Experiences and Platforms (@shemdani)
500,000,000,000
1,000
100
25
13
What are these numbers?
Numbers: 500,000,000,000 records; 1,000 clients; 100 markets; 25 data sources; 13 terabytes per day
Agenda
Time for a change
Transition Service Agreement
Move from Atlas
Heavy on CAPEX
Managed by Atlas/MSFT networking teams
To be completed by October 2010; no interruption in SLA
Move away from PVM
Ad Serving Event Log: Request -> hash(key) mod R -> FS01, FS03, FS02 (diagram labels: 98101 98104 98115 98201 98203 98004 98007 98065)
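The hash(key) mod R routing on this slide can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the choice of CRC32 as the hash, the function names, and the reading of the five-digit labels as keys are all assumptions.

```python
import zlib
from typing import Dict, List

# Labels taken from the slide; treating the five-digit values as keys is an assumption.
FILE_SERVERS = ["FS01", "FS02", "FS03"]
KEYS = ["98101", "98104", "98115", "98201", "98203", "98004", "98007", "98065"]


def pick_shard(key: str, shards: List[str]) -> str:
    """Route a key to one shard with hash(key) mod R, where R = len(shards)."""
    # CRC32 is an assumed hash: it is stable across runs, unlike Python's
    # built-in hash() for strings, which is salted per process.
    h = zlib.crc32(key.encode("utf-8"))
    return shards[h % len(shards)]


def assign(keys: List[str], shards: List[str]) -> Dict[str, List[str]]:
    """Group every key under the shard it hashes to."""
    placement: Dict[str, List[str]] = {s: [] for s in shards}
    for k in keys:
        placement[pick_shard(k, shards)].append(k)
    return placement


if __name__ == "__main__":
    for server, keys in assign(KEYS, FILE_SERVERS).items():
        print(server, keys)
```

The same key always lands on the same server, which is the property that matters here. Changing R (adding or removing a server) moves most keys, a known cost of plain modulo partitioning.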
MapReduce (divide and conquer)
HDFS
Distributed processing
Language agnostic (any language)
Job tracker, Task tracker
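The divide-and-conquer idea behind MapReduce can be shown with a single-process sketch of the map, shuffle, and reduce phases. It only illustrates the data flow; it does not model HDFS or the job tracker and task tracker that schedule work across machines, and the record field name `zip` is hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit one (key, 1) pair per log record ('zip' is a hypothetical field)."""
    yield record["zip"], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


SAMPLE_LOG = [{"zip": "98101"}, {"zip": "98004"}, {"zip": "98101"}]

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))  # {'98101': 2, '98004': 1}
```

Because each mapper call looks at one record and each reducer call at one key, both phases can be spread over many machines without changing the functions themselves.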
AWS
Architecture diagram (labels): Data Sources (Aggregate Ad Serving data, Log Files, File Export, APIs, Internet, Client Provided Data); Talend Data Flow Manager; Direct Analytics Processing via EMR (Elastic MapReduce); Cloud Storage (S3, HBase/SDB); Edge Provisioning DB; OLAP Cache; ODBC; Web Application Layer; Presentation Layer
Name Brand Retailer Case Study
Business challenge:
Decreasing web marketing effectiveness
Monetization of their web assets
Drive a personalized message: a user recently purchased a home theater system and is now looking for sports games. Target Ad (1.7 million per day)

Razorfish - Amazon EMR usecase

  • 25. We import Atlas transaction-level data: 24 servers -> S3 file storage. Compress and upload 200+ GB of data per day (180 days = ½ trillion ICA records).
  • 26. We use EMR to process and segment (EMR, S3): a 100-machine cluster created on demand (3.5 billion records, 71 million unique cookies a day).
  • 27. Process and Cost: This all happens in about 8 hours every day and is fully automated (previously 2+ days). It also increased ROAS by 500% (to $74).
  • 28. Why AWS
      - Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays.
      - Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms.
      - Flexible: Hadoop with Cascading is flexible enough to allow "agile" implementation and unit testing of sophisticated algorithms.
      - Adaptable: Cascading simplifies the integration of Hadoop with external ad systems.
      - Scalable: AWS infrastructure helps reliably store and process huge (petabyte-scale) data sets.
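The "process and segment" step can be illustrated with a small sketch that groups events by cookie and applies one rule, modeled on the deck's example of a home theater purchase followed by interest in sports games. The deck does not show the actual segmentation logic (it ran on EMR with Cascading), so the record fields (`cookie`, `ts`, `action`, `category`) and the rule itself are assumptions.

```python
from collections import defaultdict


def group_by_cookie(events):
    """Collect each cookie's events, the same grouping a reduce step would see."""
    by_cookie = defaultdict(list)
    for event in events:
        by_cookie[event["cookie"]].append(event)
    return by_cookie


def in_segment(history):
    """True if a home theater purchase is later followed by a view of sports games."""
    bought = False
    for event in sorted(history, key=lambda e: e["ts"]):
        if event["action"] == "purchase" and event["category"] == "home theater":
            bought = True
        elif bought and event["action"] == "view" and event["category"] == "sports games":
            return True
    return False


def segment(events):
    """Return the sorted list of cookies that match the rule."""
    return sorted(c for c, h in group_by_cookie(events).items() if in_segment(h))


# Hypothetical sample data: only cookie "a" purchases first and views afterwards.
SAMPLE_EVENTS = [
    {"cookie": "a", "ts": 1, "action": "purchase", "category": "home theater"},
    {"cookie": "a", "ts": 2, "action": "view", "category": "sports games"},
    {"cookie": "b", "ts": 1, "action": "view", "category": "sports games"},
    {"cookie": "b", "ts": 2, "action": "purchase", "category": "home theater"},
    {"cookie": "c", "ts": 1, "action": "view", "category": "sports games"},
]

if __name__ == "__main__":
    print(segment(SAMPLE_EVENTS))  # ['a']
```

Since the rule only needs one cookie's history at a time, it maps naturally onto a reducer keyed by cookie.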

Editor's notes

  1. Return on advertising spend (ROAS)
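As a worked illustration of the ROAS figure, the sketch below uses the standard definition (revenue divided by advertising spend) and backs out the baseline implied by "increased ROAS by 500% (to $74)". Reading that phrase as a sixfold rise is an assumption; if the slide meant a fivefold rise, the baseline would be $14.80 instead. The function names are hypothetical.

```python
def roas(revenue, ad_spend):
    """Return on advertising spend: revenue earned per unit of ad spend."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return revenue / ad_spend


def baseline_before_increase(new_value, percent_increase):
    """Value before an increase of `percent_increase` percent produced `new_value`."""
    return new_value / (1 + percent_increase / 100)


if __name__ == "__main__":
    # Assumed reading: a 500% increase means new = old * (1 + 5).
    old = baseline_before_increase(74, 500)
    print(f"Implied earlier ROAS: ${old:.2f}")  # Implied earlier ROAS: $12.33
```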