Big Data & Analytics 
Use Cases in 
Mobile, E-commerce, Media and more 
Russell Nash 
AWS Solutions Architect
Product? 
Do we have a product? 
Can we ship? 
How to develop faster? 
Better? Cheaper? 
Market? 
Can we scale? 
What do people do & why? 
How do we optimize?
• 10 million guests 
• 550,000 properties listed 
• Massive growth on AWS 
• $776.4M from top investors 
• $10B valuation – more than Hyatt
“At Airbnb, we look into all possible ways to 
improve our product and user experience. Often 
times this involves lots of analytics behind the 
scene.” 
Henry Cai 蔡明航 
Software Engineer, Growth at Airbnb 
http://nerds.airbnb.com/redshift-performance-cost/
The best startups use AWS for analytics…
Agenda 
• Big Data Overview 
• MapReduce / Hadoop 
• Case Study: Yelp 
• Data Warehousing 
• Case Study: Foursquare 
• NoSQL 
• Case Study: AdRoll 
• Streaming 
• Case Study: Supercell
STREAMING 
Hadoop MPP NoSQL
KINESIS 
EMR Redshift DynamoDB
[Chart: Structure (High to Low) against Size (Small to Large), positioning Traditional Database, Hadoop, NoSQL and MPP DW]
Hadoop MPP NoSQL 
Structure 
Latency 
Interfaces
Background 
• 2004 – MapReduce 
• 2006 – Hadoop
Input 
File 
Functions Output 
Hadoop cluster 
1. Very Flexible 
2. Very Scalable 
3. Often Transient
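The "Input File, Functions, Output" flow can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle and reduce phases, not Hadoop itself; the names `run_mapreduce`, `word_mapper` and `count_reducer` and the two sample lines are hypothetical.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Apply mapper to every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase: emit (key, value) pairs
            groups[key].append(value)       # shuffle phase: group values by key
    # reduce phase: collapse each key's values into one result
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def count_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)

# Hypothetical two-line "input file".
input_file = ["big data on aws", "big data analytics"]
output = run_mapreduce(input_file, word_mapper, count_reducer)
print(output)
```

On a real cluster the map and reduce functions would run on many nodes at once, with the framework handling the grouping step between them.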
Big Data Verticals and Use cases 
• Media/Advertising: Targeted Advertising; Image and Video Processing 
• Oil & Gas: Seismic Analysis 
• Retail: Recommendations; Transactions Analysis 
• Life Sciences: Genome Analysis 
• Financial Services: Monte Carlo Simulations; Risk Analysis 
• Security: Anti-virus; Fraud Detection; Image Recognition 
• Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
Deployment Options 
On-premise 
Cloud 
Managed on Cloud
Amazon 
Elastic MapReduce 
Manageability 
Scalability 
Cost
Case Study
400 GB of logs per day 
~12 Terabytes per month
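The monthly figure follows from the daily rate. A quick check, assuming a 30-day month and decimal units (1 TB = 1,000 GB), neither of which the slide states:

```python
# Back-of-envelope check of the log volume quoted on the slide.
# Assumptions (not stated in the deck): 30-day month, 1 TB = 1,000 GB.
GB_PER_DAY = 400
DAYS_PER_MONTH = 30
GB_PER_TB = 1_000

tb_per_month = GB_PER_DAY * DAYS_PER_MONTH / GB_PER_TB
print(f"~{tb_per_month:.0f} TB per month")
```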
1) Load log file data for six months 
of user search history into Amazon 
S3 
Amazon S3 
Search ID Search Text Final Selection 
12423451 westen Westin 
14235235 wisten Westin 
54332232 westenn Westin 
Amazon S3 Amazon EMR 
Log Files 
2) Spin up a 200 node cluster 
Hadoop Cluster
3) 200 nodes simultaneously analyze this 
data looking for common misspellings 
… this takes a few hours 
Hadoop Cluster 
Amazon S3 Amazon EMR
Amazon S3 Amazon EMR 
4) New common misspellings and 
suggestions loaded back into S3 
Hadoop Cluster 
Log Files
Amazon S3 Amazon EMR 
5) When the job is done, the cluster is 
shut down. 
Log Files
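The five steps above can be sketched on one machine: split the log rows into chunks, let each "node" count misspellings in its chunk, then merge the partial counts. This is a toy stand-in for the EMR job, not the code used in the case study. The first three rows come from the slide; the last two rows and all function names are hypothetical.

```python
from collections import Counter

# (search_id, search_text, final_selection)
LOG_ROWS = [
    (12423451, "westen", "Westin"),
    (14235235, "wisten", "Westin"),
    (54332232, "westenn", "Westin"),
    (12423452, "westen", "Westin"),   # hypothetical extra row
    (14235236, "Westin", "Westin"),   # hypothetical row with correct spelling
]

def split_into_chunks(rows, n_workers):
    """Deal rows round-robin into n_workers chunks (stand-in for input splits)."""
    return [rows[i::n_workers] for i in range(n_workers)]

def count_misspellings(chunk):
    """Work of one node: count (typed text, selection) pairs where the two differ."""
    counts = Counter()
    for _search_id, text, selection in chunk:
        if text.lower() != selection.lower():
            counts[(text.lower(), selection)] += 1
    return counts

def merge(partials):
    """Combine the per-node counters into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

chunks = split_into_chunks(LOG_ROWS, n_workers=2)
suggestions = merge(count_misspellings(chunk) for chunk in chunks)
for (typed, selected), n in suggestions.most_common():
    print(f"{typed} -> {selected} ({n})")
```

Because each chunk is counted independently, adding nodes shortens the run without changing the merged result, which is what makes a large transient cluster practical for this kind of job.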
E-Commerce Case Study 
• Online Marketplace 
• EMR 
– Weblog analysis 
– Recommendations 
• Link logs with production database in EMR 
“Enables us to focus on developing our…analysis stack 
without worrying about the underlying infrastructure”
The Hadoop Ecosystem
Trends 
SQL on Hadoop 
Spark
Hadoop MPP NoSQL 
Structure: Any (Hadoop) 
Latency: Mins-Hours (Hadoop) 
Interfaces: Programming, SQL-Like, Tools (Hadoop)
Background 
MPP = Massively Parallel Processing 
SQL Databases for analytical workloads 
Performance 
Scalability 
Ease of Use 
Cost
1. SQL 
2. High Performance 
3. Broad Toolset
Deployment Options 
On-premise 
Cloud 
Managed on Cloud
Amazon Redshift 
Manageability 
Scalability 
Cost
Mobile Case Study 
• Location-based social app 
• 40 million users 
• 4.5 billion check-ins 
• Multiple terabytes of log data
Who is checking in? 
[Charts: check-ins by Gender (Female, Male) and by Age (0 to 80)]
When do people go to a place? 
Gorilla Coffee 
Gray's Papaya 
Amorino 
Thursday Friday Saturday Sunday
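A question like this maps onto a plain SQL aggregation. The sketch below uses Python's built-in `sqlite3` as a stand-in for a warehouse connection, so the date function is SQLite-specific and would differ on Redshift. The table name, column names, timestamps and counts are all invented for illustration; only the venue names come from the slide.

```python
import sqlite3

# Hypothetical check-ins: (venue, ISO timestamp).
CHECKINS = [
    ("Gorilla Coffee", "2014-05-01 08:15:00"),  # Thursday
    ("Gorilla Coffee", "2014-05-02 08:40:00"),  # Friday
    ("Gorilla Coffee", "2014-05-02 09:05:00"),  # Friday
    ("Gray's Papaya", "2014-05-03 23:30:00"),   # Saturday
    ("Amorino", "2014-05-04 15:00:00"),         # Sunday
    ("Amorino", "2014-05-04 16:20:00"),         # Sunday
]
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.execute("CREATE TABLE checkins (venue TEXT, created_at TEXT)")
conn.executemany("INSERT INTO checkins VALUES (?, ?)", CHECKINS)

# strftime('%w', ...) is SQLite's day-of-week function, with 0 = Sunday.
rows = conn.execute(
    """
    SELECT venue, strftime('%w', created_at) AS dow, COUNT(*) AS n
    FROM checkins
    GROUP BY venue, strftime('%w', created_at)
    ORDER BY venue, dow
    """
).fetchall()
conn.close()

by_venue_day = {(venue, DAY_NAMES[int(dow)]): n for venue, dow, n in rows}
for (venue, day), n in by_venue_day.items():
    print(f"{venue:15} {day:9} {n}")
```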
“Using Amazon Redshift has enabled the 
company to perform more agile analytics 
while saving costs.”
Media Case Study 
• Placeshifting and media streaming 
• Collect terabytes of event logs 
• Viewership, devices, etc. 
• Hadoop for transformation 
• Redshift for analysis 
“Redshift allows us to turn on a dime”
Performance Evaluation on 2B Rows 
Traditional 
SQL Database 
Amazon 
Redshift 
Aggregate by month 02:08:35 00:35:46 00:00:12
Hadoop MPP NoSQL 
Structure: Any (Hadoop), Full (MPP) 
Latency: Mins-Hours (Hadoop), Seconds-Minutes (MPP) 
Interfaces: Programming, SQL-Like, Tools (Hadoop); SQL, BI Tools (MPP)
Background 
Databases for webscale transactions 
Performance 
Flexibility
ID Age State 
123 20 CA 
345 25 WA 
678 40 FL 
Relational Table 
ID Attributes 
123 Age:20, State:CA 
345 Age:25, Country: Australia, Gender: F, Smoker: No 
678 Age:40 
Non-Relational Table
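The non-relational table can be mimicked with plain dictionaries: every item carries its own set of attributes, and nothing forces the items to share columns. This is only an in-memory illustration of the data model, not a DynamoDB client; `get_attribute` is a hypothetical helper.

```python
# Items copied from the slide's non-relational table, keyed by ID.
items = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}

def get_attribute(item_id, name, default=None):
    """Read one attribute; a missing attribute is simply absent, not NULL."""
    return items.get(item_id, {}).get(name, default)

# Adding an attribute to one item needs no schema change for the others.
items[678]["State"] = "FL"

all_attributes = sorted({name for attrs in items.values() for name in attrs})
print(all_attributes)
print(get_attribute(345, "Country"), get_attribute(123, "Country"))
```

A relational table would need a column for every attribute in `all_attributes`, mostly filled with empty values.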
Deployment Options 
On-premise 
Cloud 
Managed on Cloud
Amazon 
DynamoDB 
Manageability 
Scalability 
Cost
Low Latency
Ad-Tech Case Study
Pixel “fires” 
Serve ad? 
Ad served
If you can’t reply in 100ms… It doesn’t matter anymore! 
• Network: 40 ms 
• Buffer: 20 ms 
• Processing: 40 ms
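The three figures add up to the 100 ms deadline. A small sketch of that budget, assuming the slide's numbers are milliseconds; the `within_budget` helper is hypothetical:

```python
# Latency budget from the slide, assumed to be in milliseconds.
BUDGET_MS = {"network": 40, "buffer": 20, "processing": 40}
DEADLINE_MS = 100

total_ms = sum(BUDGET_MS.values())

def within_budget(network_ms, processing_ms):
    """True if a request leaves the safety buffer untouched."""
    return network_ms + processing_ms <= DEADLINE_MS - BUDGET_MS["buffer"]

print(total_ms)               # 40 + 20 + 40
print(within_budget(40, 35))  # 75 ms used, buffer intact
print(within_budget(40, 45))  # 85 ms used, eats into the buffer
```

With 40 ms going to the network, any data store lookup has to fit inside the 40 ms processing slice alongside the bidding logic.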
Snacks DynamoDB
Hadoop MPP NoSQL 
Structure: Any (Hadoop), Full (MPP), Semi (NoSQL) 
Latency: Mins-Hours (Hadoop), Seconds-Minutes (MPP), Sub-second (NoSQL) 
Interfaces: Programming, SQL-Like, Tools (Hadoop); SQL (MPP); Programming, Tools (NoSQL)
Streaming 
Analytics
Use Cases 
• Gaming analytics 
• Sensor networks analytics 
• Ad network analytics 
• Log centralization 
• Click stream analysis 
• Hardware and software appliance metrics 
• …more…
Amazon Kinesis 
• Data Sources → AWS Endpoint 
• Stream: Shard 1, Shard 2, … Shard N, across three Availability Zones 
• App.1 [Aggregate & De-Duplicate] 
• App.2 [Metric Extraction] 
• App.3 [Sliding Window Analysis] 
• App.4 [Machine Learning] 
• Outputs: S3, DynamoDB, Redshift, EMR
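One way to picture the shards: Kinesis hashes each record's partition key with MD5 into a 128-bit number, and each shard owns a range of that hash space. The sketch below is a local simulation and does not call the service. It assumes the shards split the range evenly, which a real stream need not do, and the key names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields 128-bit values

def shard_ranges(n_shards):
    """Split the hash space into n contiguous inclusive ranges (even split assumed)."""
    step = HASH_SPACE // n_shards
    ranges = []
    for i in range(n_shards):
        start = i * step
        end = HASH_SPACE - 1 if i == n_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges

def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hashed = int(digest, 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash fell outside every range")

ranges = shard_ranges(3)
for key in ["player-1", "player-2", "player-3"]:  # hypothetical partition keys
    print(key, "-> Shard", shard_for(key, ranges) + 1)
```

Records with the same partition key always land on the same shard, so an application reading that shard sees them in order.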
“Amazon Kinesis enables our business-critical analytics and dashboard 
applications to reliably get the data streams they need, without delays. Amazon 
Kinesis also offloads a lot of developer burden in building a real-time, streaming 
data ingestion platform, and enables Supercell to focus on delivering games that 
delight players worldwide.” 
Sami Yliharju, Supercell Services Lead
Big Data Tutorials 
aws.amazon.com/big-data 
Redshift Free Trial 
aws.amazon.com/redshift/free-trial

Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more

  • 1. Big Data & Analytics Use Cases in Mobile, E-commerce, Media and more Russell Nash AWS Solutions Architect
  • 2.
  • 3. Product? Do we have a product? Can we ship? How to develop faster? Better? Cheaper? Market? Can we scale? What do people do & why? How do we optimize?
  • 4.
  • 5. • 10 million guests • 550,000 properties listed • Massive growth on AWS • $776.4M from top investors • $10B valuation – more than Hyatt
  • 6. “At Airbnb, we look into all possible ways to improve our product and user experience. Often times this involves lots of analytics behind the http://nerds.airbnb.com/redshift-performance-cost/ scene.” Henry Cai 蔡明航 Software Engineer, Growth at Airbnb
  • 7. The best startups use AWS for analytics…
  • 8. Agenda • Big Data Overview • MapReduce / Hadoop • Case Study: Yelp • Data Warehousing • Case Study: Foursquare • NoSQL • Case Study: AdRoll • Streaming • Case Study: Supercell
  • 11. Structure High Low Large Size Small Traditional Database Hadoop NoSQL MPP DW
  • 12. Hadoop MPP NoSQL Structure Latency Interfaces
  • 13. Background • 2004 – Map Reduce • 2006 – Hadoop
  • 14. Input File Functions Output Hadoop cluster 1. Very Flexible 2. Very Scalable 3. Often Transient
  • 15. Big Data Verticals and Use cases Media/Advertising Targeted Advertising Image and Video Processing Oil & Gas Seismic Analysis Retail Recommendation s Transactions Analysis Life Sciences Genome Analysis Financial Services Monte Carlo Simulations Risk Analysis Security Anti-virus Fraud Detection Image Recognition Social Network/Gaming User Demographics Usage analysis In-game metrics
  • 16. Deployment Options On-premise Cloud Managed on Cloud
  • 17. Amazon Elastic MapReduce Manageability Scalability Cost
  • 19. 400 GB of logs per day ~12 Terabytes per month
  • 20.
  • 21. 1) Load log file data for six months of user search history into Amazon S3. Table columns: Search ID, Search Text, Final Selection. Rows: 12423451, westen, Westin; 14235235, wisten, Westin; 54332232, westenn, Westin.
  • 22. 2) Spin up a 200 node cluster. [Diagram labels: Amazon S3, Log Files, Amazon EMR, Hadoop Cluster]
  • 23. 3) 200 nodes simultaneously analyze this data looking for common misspellings … this takes a few hours. [Diagram labels: Amazon S3, Amazon EMR, Hadoop Cluster]
  • 24. 4) New common misspellings and suggestions loaded back into S3. [Diagram labels: Amazon S3, Log Files, Amazon EMR, Hadoop Cluster]
  • 25. 5) When the job is done, the cluster is shut down. [Diagram labels: Amazon S3, Log Files, Amazon EMR]
  • 26. E-Commerce Case Study • Online Marketplace • EMR – Weblog analysis – Recommendations • Link logs with production database in EMR “Enables us to focus on developing our…analysis stack without worrying about the underlying infrastructure”
  • 28. Trends SQL on Hadoop Spark
  • 29. Hadoop | MPP | NoSQL. Structure: Hadoop Any. Latency: Hadoop Mins-Hours. Interfaces: Hadoop Programming SQL-Like Tools
  • 30. Background MPP = Massively Parallel Processing SQL Databases for analytical workloads Performance Scalability Ease of Use Cost
  • 31. 1. SQL 2. High Performance 3. Broad Toolset
  • 32. Deployment Options On-premise Cloud Managed on Cloud
  • 33. Amazon Redshift Manageability Scalability Cost
  • 34. Mobile Case Study • Location based social app • 40 Million users • 4.5 Billion check-ins • Multi-terabytes of log data
  • 35. Who is checking in? [Charts: share of check-ins by Gender (Female, Male) and by Age]
  • 36. When do people go to a place? [Chart: Gorilla Coffee, Gray's Papaya and Amorino, Thursday through Sunday]
  • 37. “Using Amazon Redshift has enabled the company to perform more agile analytics while saving costs.”
  • 38. Media Case Study • Placeshifting and media streaming • Collect terabytes of event logs • Viewership, devices etc • Hadoop for transformation • Redshift for analysis “Redshift allows us to turn on a dime”
  • 39. Performance Evaluation on 2B Rows Traditional SQL Database Amazon Redshift Aggregate by month 02:08:35 00:35:46 00:00:12
  • 40. Hadoop | MPP | NoSQL. Structure: Hadoop Any, MPP Full. Latency: Hadoop Mins-Hours, MPP Seconds-Minutes. Interfaces: Hadoop Programming SQL-Like Tools, MPP SQL BI Tools
  • 41. Background Databases for webscale transactions Performance Flexibility
  • 42. Relational Table (columns ID, Age, State): 123, 20, CA; 345, 25, WA; 678, 40, FL. Non-Relational Table (columns ID, Attributes): 123: Age:20, State:CA; 345: Age:25, Country: Australia, Gender: F, Smoker: No; 678: Age:40
  • 43. Deployment Options On-premise Cloud Managed on Cloud
  • 44. Amazon DynamoDB Manageability Scalability Cost
  • 49. Pixel “fires” Serve ad? Ad served
  • 50. If you can’t reply in 100ms… It doesn’t matter anymore! Network 40 Buffer 20 Processing 40
  • 52. Hadoop | MPP | NoSQL. Structure: Hadoop Any, MPP Full, NoSQL Semi. Latency: Hadoop Mins-Hours, MPP Seconds-Minutes, NoSQL Sub-second. Interfaces: Programming SQL-Like Tools SQL Programming Tools
  • 54. Use Cases • Gaming analytics • Sensor networks analytics • Ad network analytics • Log centralization • Click stream analysis • Hardware and software appliance metrics • …more…
  • 55. [Architecture diagram. Inputs: Data Sources, AWS Endpoint. Stream: Amazon Kinesis with Shard 1, Shard 2 … Shard N across Availability Zones. Applications: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning]. Destinations: S3, DynamoDB, Redshift, EMR]
  • 56. “Amazon Kinesis enables our business-critical analytics and dashboard applications to reliably get the data streams they need, without delays. Amazon Kinesis also offloads a lot of developer burden in building a real-time, streaming data ingestion platform, and enables Supercell to focus on delivering games that delight players worldwide.” Sami Yliharju, Supercell Services Lead
  • 57. Big Data Tutorials aws.amazon.com/big-data Redshift Free Trial aws.amazon.com/redshift/free-trial
  • 58. Big Data & Analytics Use Cases in Mobile, e-commerce, media and more Russell Nash AWS Solutions Architect
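
The Yelp job on slides 21 to 25 (load search logs, count misspellings in parallel, write suggestions back) can be sketched as a tiny map/reduce in plain Python. This is only an illustration, not Yelp's code: the function names are hypothetical, the first three rows come from slide 21, and the fourth row is made up to show counting. On EMR the map and reduce phases would run across the cluster's nodes rather than in one process.

```python
from collections import Counter

# Rows in the (search_id, search_text, final_selection) layout from slide 21.
LOG_ROWS = [
    ("12423451", "westen", "Westin"),
    ("14235235", "wisten", "Westin"),
    ("54332232", "westenn", "Westin"),
    ("00000001", "westen", "Westin"),  # hypothetical extra row, not from the slides
]


def map_phase(rows):
    """Emit ((typed_text, selection), 1) when the typed text differs from the selection."""
    for _search_id, text, selection in rows:
        if text.lower() != selection.lower():
            yield (text.lower(), selection), 1


def reduce_phase(pairs):
    """Sum the counts for each (typed_text, selection) key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def suggestions(counts, min_count=1):
    """For each typed text, keep the selection seen most often."""
    best = {}
    for (text, selection), n in counts.items():
        if n >= min_count and n > best.get(text, ("", 0))[1]:
            best[text] = (selection, n)
    return {text: selection for text, (selection, _n) in best.items()}


COUNTS = reduce_phase(map_phase(LOG_ROWS))
SUGGESTIONS = suggestions(COUNTS)
```

Raising `min_count` would filter out one-off typos so that only common misspellings produce a suggestion.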

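The contrast on slide 42 between a relational table and a non-relational one can be shown with ordinary Python data structures. This is a rough sketch that assumes nothing about DynamoDB's actual API; the helper names `relational_get` and `item_get` are hypothetical, and the data is the data from the slide.

```python
# Relational table: every row has the same fixed columns.
RELATIONAL_COLUMNS = ("ID", "Age", "State")
RELATIONAL_ROWS = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Non-relational table: each item is keyed by ID and carries its own attributes.
NON_RELATIONAL_ITEMS = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}


def relational_get(row_id, column):
    """Look up one column of one row; unknown columns raise ValueError."""
    index = RELATIONAL_COLUMNS.index(column)
    for row in RELATIONAL_ROWS:
        if row[0] == row_id:
            return row[index]
    raise KeyError(row_id)


def item_get(item_id, attribute, default=None):
    """Look up one attribute of one item; a missing attribute returns the default."""
    return NON_RELATIONAL_ITEMS[item_id].get(attribute, default)
```

Adding a new attribute to a single item needs no schema change in the second form, which is the flexibility slide 41 refers to.
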
Editor's notes

  1. Put something in users' hands (it doesn't need to be code), and get feedback as soon as possible.
  2. Depending on your data structure, its size and its access patterns, you will need to pick the right solution. * S3 is ideal for large unstructured objects such as files, pictures, binary data, etc. * DynamoDB, or other NoSQL alternatives such as Cassandra, is ideal for small objects that you have to read or write at high speed. It is great for data powering web or mobile applications. * Amazon RDS (or other relational databases) is great for structured schemas and standard SQL access, but the size of data is typically limited to a single server. Of course it is possible to shard data across many RDS instances, but this requires substantial development and ops work. * HBase is ideal for analytics use cases; it is optimized for append-heavy, light-read workloads. So there is a variety of ways you can store your data in the cloud, based on the particular needs of your application.
  3. Hadoop and cloud marriage. Shared nothing.
  4. Yelp – Autocomplete, spelling suggestions. S&P Capital IQ – Recommendations for investors based on behaviour. Australian company – uses it to calculate which ad space it should buy.
  5. Let’s look at another company – Yelp.
  6. As you can see, this company is growing rapidly: with more than 50 million monthly visitors and 18 million reviews, it generates about 400 GB of data a day. That data needs to be processed and analyzed.
  7. The more searches you collect from your customers, the better recommendations you can provide. Using Hadoop on Amazon Elastic MapReduce, Yelp analyses customer search results to deliver features such as hotel or restaurant recommendations. Yelp processes all customer reviews with natural language processing technologies to provide customers with review highlights. From this example we can see that companies such as Yelp can use data generated by their customers on their web site to develop more innovative data products.
  8. By looking at typical queries, Yelp can list common suggestions for a query even before you finish typing. Both of these products are possible because Yelp analyses all the web logs from its websites.
  9. MapReduce – programming model for Hadoop; Flume – open source log collection tool; Mahout – machine learning project; Nutch – web search engine; Cascading – software abstraction layer; HBase – columnar NoSQL database; Cassandra – NoSQL database; Sqoop – data transfer between Hadoop and relational databases; Hive – SQL-like language for Hadoop; Chukwa – log collection.
  10. Approaching a 50/50 male/female split.
  11. You can see that some places are best for lunch during work hours, while others are dinner joints.
  12. Use Case – IMDb uses it for new applications, e.g. a movie rating system.
  13. Let’s look at another company – Yelp.
  14. Use Case – IMDb uses it for new applications, e.g. a movie rating system.
  15. Use Case – IMDb uses it for new applications, e.g. a movie rating system.
  16. [2 minutes] Kinesis is a new service that scales elastically for near real-time processing of streaming big data. The service stores large streams of data in durable, consistent storage, reliably, for near real-time processing by an elastically scalable fleet of data processing servers. "Large streams" means millions of records per second and GBs of data per second; "near real-time" means on the order of a few seconds. Streaming data processing has two layers: a storage layer and a processing layer. The storage layer needs to support specialized ordering and consistency semantics that enable fast, inexpensive, and replayable reads and writes of large streams of data. Kinesis is the storage layer. The processing layer is responsible for reading data from the storage layer, processing that data, and notifying the storage layer to delete data that is no longer needed. Kinesis also supports the processing layer: customers compile the Kinesis library into their data processing application, and Kinesis notifies the application (the Kinesis Worker) when there is new data to process. The Kinesis control plane works with Kinesis Workers to solve scalability and fault tolerance problems in the processing layer.
  17. Supercell is using Amazon Kinesis for real-time delivery of game insight data sent by hundreds of game engine servers.
  18. TALKING POINTS: AWS Training and Certification is an organization dedicated to expanding and deepening knowledge of AWS, as well as driving proliferation in the usage of AWS services. Our programs are designed for customers, partners and AWS employees. Over the past several months, we have rolled out several new courses, training labs, and certifications to our customers and partners. Go and visit the training team at the training booth to receive your 30% discount voucher for a certification exam.
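
The two layers described in note 16 (an ordered, replayable storage layer and a processing layer that reads and tracks its position) can be illustrated with an in-memory toy. This is not the Kinesis API: `ToyStream` and `ToyWorker` are hypothetical names, and the modulo routing is a simplification. Kinesis itself maps the MD5 hash of a partition key onto hash-key ranges owned by shards.

```python
import hashlib


class ToyStream:
    """Stand-in for the storage layer: per-shard ordered, replayable records."""

    def __init__(self, shard_count):
        self.shards = [[] for _ in range(shard_count)]

    def put(self, partition_key, data):
        """Append a record; the same partition key always lands on the same shard."""
        digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
        shard_id = int(digest, 16) % len(self.shards)  # simplified routing
        self.shards[shard_id].append(data)
        return shard_id, len(self.shards[shard_id]) - 1  # (shard, sequence number)

    def read(self, shard_id, start_seq):
        """Return (sequence number, data) pairs from start_seq onward."""
        return list(enumerate(self.shards[shard_id]))[start_seq:]


class ToyWorker:
    """Stand-in for the processing layer: reads each shard and checkpoints progress."""

    def __init__(self, stream):
        self.stream = stream
        self.checkpoints = [0] * len(stream.shards)
        self.seen = []

    def poll(self):
        """Process every record after the last checkpoint on every shard."""
        for shard_id in range(len(self.stream.shards)):
            start = self.checkpoints[shard_id]
            for seq, data in self.stream.read(shard_id, start):
                self.seen.append(data)
                self.checkpoints[shard_id] = seq + 1
```

Because records stay in the stream after being read, a second application could start from sequence number 0 and replay the same data, which is the property the note calls "replayable".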