SlideShare une entreprise Scribd logo
1  sur  47
AWS Big Data Platform
LA Big Data Day 2015
6/27/2015
GB TB
PB
IDC Digital Universe Study
projects 44 zettabytes (ZB)
by 2020
Source: IDC
ZB
EB
Hello Cloud.
(Amazon Web Services)
On demand Pay as you go
Uniform Available
Security
Scaling
Analytics
Database
Monitoring
Messaging
Workflow
DNS
Load Balancing
BackupCDN
On demand Pay as you go
Uniform Available
Compute
Storage
Enterprise Applications
Platform Services
Administration & Security
Core Services
Infrastructure
Analytics
App
Services
Deployment &
Management
Mobile
Services
Infrastructure Regions Points of PresenceAvailability Zones
Core Services
Storage
(Object, Block
and Archival)
Compute
(VMs, Auto-scaling
and Load Balancing)
Databases
(Relational, NoSQL, Caching)
Networking
(VPC, DX, DNS)
CDN
Access Control
Usage
Auditing
Monitoring and
Logs
Administration &
Security
Key Storage
Identity
Management
Platform
Services
Deployment & Management
One-click web app
deployment
Dev/ops resource
management
Resource Templates Push Notifications
Mobile Services
Mobile Analytics
Identity
Sync
App Services
Workflow
Transcoding
Email
Search
Queuing &
Notifications
App streaming
Analytics
Hadoop
Data Pipelines
Data
Warehouse
Real-time
Streaming Data
Enterprise
Applications
Virtual Desktops Collaboration and Sharing
AWS global footprint
US-WEST (N. California) EU-WEST (Ireland)
ASIA PAC
(Tokyo)
ASIA PAC
(Singapore)
US-WEST (Oregon)
SOUTH AMERICA
(Sao Paulo)
US-EAST (Virginia)
GOV CLOUD
ASIA-PAC
(Sydney)
EU-CENT (Frankfurt)
ASIA PAC
(Beijing)
Big Data Pipeline
Data Answers
Collect Process Analyze
Store
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
Object Storage
Streaming Ingest
NoSQL
RDBMS
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
Event
Driven
Framework
S3
Kinesis
DynamoDB
RDS (RDBMS)
AWS Lambda
KCL Apps
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
Store anything
Object storage
Scalable
99.999999999% durability
Amazon S3
Why is Amazon S3 good for Big Data?
• Unlimited number of objects
• Files from 1B to 5TB
• Encryption
• Versioning
• Lifecycle management
• Deep storage integration
Obj
ect
Cop
y
S3 Object
Copy
Region
DataCenters
Real-time processing
High throughput; elastic
Easy to use
S3, Redshift, DynamoDB Integrations
Amazon
Kinesis
Amazon Kinesis
Managed service for streaming ingest & processing
Sending Consuming
HTTP Post
AWS SDK
LOG4J
Flume
Fluentd
Get* APIs
Kinesis Client Library
+
Connector Library
Apache Storm
Amazon Elastic
MapReduce
AWS Mobile
SDK
• Streams are made of Shards
• Each Shard ingests data up to 1MB/sec,
and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by splitting or
merging Shards
• Replay data inside of 24Hr. Window
Kinesis Stream & Shards
Worker
Worker
Worker
Worker
Worker
Worker
Worker
Real-time event processing frameworks
Kinesis
Client
Library
AWS Lambda
Example: Amazon Kinesis + Apache Storm
Kinesis
Storm
Spout
Producer
Amazon
Kinesis
Apache Storm
ElastiCache
(Redis) Node.js Client
(D3)
http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-
Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
Hadoop MPP Warehouse
Machine Learning
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
EMR Redshift
Machine Learning
Hadoop/HDFS clusters
Hive, Spark, MapReduce
Easy to use; fully managed
On-demand and spot pricing
Amazon EMR
Easy to add and remove compute capacity on your cluster
Match compute
demands with
cluster sizing.
Resizable clusters
Spot Instances
for task nodes
Up to 90%
off Amazon EC2
on-demand
pricing
On-demand for
core nodes
Standard
Amazon EC2
pricing for
on-demand
capacity
Easy to use Spot Instances
Meet SLA at predictable cost Exceed SLA at lower cost
Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR
clusters with no data loss
• Point multiple Amazon EMR clusters at
same data in Amazon S3
EMR
EMR
Amazon
S3
create external table if not exists omniturelogs
( … )
row format delimited
fields terminated by 't'
lines terminated by 'n'
location 's3:// my-bucket/input/omniturelogs';
create external table if not exists omniturelogs
( … )
row format delimited
fields terminated by 't'
lines terminated by 'n'
location 's3:// my-bucket/input/omniturelogs';
or “hdfs://<namenode>/…” or “/<hdfs directory>”
Petabyte scale
Massively parallel
Relational data warehouse
Fully managed
Amazon
Redshift
Amazon Redshift
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
Amazon Redshift works with your
existing analysis tools
JDBC/ODBC
Amazon Redshift
Easy to use, managed ML
Based on tech used internally at
Amazon
Use data stored in the AWS cloud
Deploy models in seconds
Amazon
Machine
Learning
Amazon Machine Learning
• Binary classification
– Will customer buy this product or not buy
– Is this email Spam or not Spam
• Multiclass classification
– Is this movie a comedy, documentary, or thriller
• Regression/ forecasting
– How many units will sell?
– What price will this house sell at?
Build & Train
model
Evaluate and
optimize
Retrieve
predictions
1 2 3
Building smart applications with Amazon ML
1. Model Building: Ingest, Explore and understand your data
1. Model Building: Configuration & Training
2. Evaluate & Optimize
Real-time predictions
Synchronous, low-latency, high-throughput prediction generation
Request through service API or server or mobile SDKs
Best for interaction applications that deal with individual data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> ml.predict(
ml_model_id=’my_model',
predict_endpoint=’example_endpoint’,
record={’key1':’value1’, ’key2':’value2’})
{
'Prediction': {
'predictedValue': 13.284348,
'details': {
'Algorithm': 'SGD',
'PredictiveModelType': 'REGRESSION’
}
}
}
Batch predictions
Asynchronous, large-volume prediction generation
Request through service console or API
Best for applications that deal with batches of data records
>>> import boto
>>> ml = boto.connect_machinelearning()
>>> model = ml.create_batch_prediction(
batch_prediction_id = 'my_batch_prediction’
batch_prediction_data_source_id = ’my_datasource’
ml_model_id = ’my_model',
output_uri = 's3://examplebucket/output/’)
Query for predictions
with batch APIProcess data
Raw data Aggregated data Predictions
Your application
Sample Prediction Flow
EMR Amazon ML
S3 S3 S3
Putting it all together …
S3
Kinesis
EMR
Redshift
DynamoDB
Collect Process & Analyze Consume
Amazon Machine Learning
Cloud enables big data
processing
aws.amazon.comaws.amazon.com

Contenu connexe

Tendances

Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesm vaishnavi
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsAmazon Web Services
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Amazon Web Services
 
BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2Paulraj Pappaiah
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructuredatastack
 
Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...Amazon Web Services
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data ArchitecturesLynn Langit
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned Omid Vahdaty
 
Building a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSBuilding a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSArun Sirimalla
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Amazon Web Services
 
Which Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFWhich Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFAmazon Web Services
 
Which Database is Right for My Workload?
Which Database is Right for My Workload?Which Database is Right for My Workload?
Which Database is Right for My Workload?Amazon Web Services
 

Tendances (20)

Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Big data on aws
Big data on awsBig data on aws
Big data on aws
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics services
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
 
BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...Design for Scale - Building Real Time, High Performing Marketing Technology p...
Design for Scale - Building Real Time, High Performing Marketing Technology p...
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Data Collection and Storage
Data Collection and StorageData Collection and Storage
Data Collection and Storage
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
Building a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSBuilding a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWS
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301
 
Which Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFWhich Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SF
 
Which Database is Right for My Workload?
Which Database is Right for My Workload?Which Database is Right for My Workload?
Which Database is Right for My Workload?
 

En vedette

Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choiTajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choiData Con LA
 
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Data Con LA
 
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...Data Con LA
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Data Con LA
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaData Con LA
 
Inconsistencies in big data
Inconsistencies in big dataInconsistencies in big data
Inconsistencies in big dataminujoseph
 
Data science and good questions eric kostello
Data science and good questions eric kostelloData science and good questions eric kostello
Data science and good questions eric kostelloData Con LA
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinanceBig Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinanceData Con LA
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...Data Con LA
 
Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...
Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...
Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...Data Con LA
 

En vedette (19)

Datawarehousing Terminology
Datawarehousing TerminologyDatawarehousing Terminology
Datawarehousing Terminology
 
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choiTajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
 
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
 
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Inconsistencies in big data
Inconsistencies in big dataInconsistencies in big data
Inconsistencies in big data
 
Data science and good questions eric kostello
Data science and good questions eric kostelloData science and good questions eric kostello
Data science and good questions eric kostello
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinanceBig Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
 
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
 
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
 
Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...
Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...
Big Data Day LA 2016/ Big Data Track - Warner Bros. Digital Consumer Intellig...
 

Similaire à Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon

AWS re:Invent 2016 recap (part 1)
AWS re:Invent 2016 recap (part 1)AWS re:Invent 2016 recap (part 1)
AWS re:Invent 2016 recap (part 1)Julien SIMON
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapIan Massingham
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAdrian Hornsby
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku LepistoAWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku LepistoAmazon Web Services Korea
 
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAmazon Web Services
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Amazon Web Services
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakAmazon Web Services
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Amazon Web Services
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11Amazon Web Services
 
Introduction to the AWS Cloud from Digital Tuesday Meetup
Introduction to the AWS Cloud from Digital Tuesday MeetupIntroduction to the AWS Cloud from Digital Tuesday Meetup
Introduction to the AWS Cloud from Digital Tuesday MeetupIan Massingham
 
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014Ian Massingham
 
Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Amazon Web Services
 
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewAmazon Web Services
 
What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?Amazon Web Services
 
Aberdeen Oil & Gas Event - Enterprise Cloud Adoption Patterns
Aberdeen Oil & Gas Event - Enterprise Cloud Adoption PatternsAberdeen Oil & Gas Event - Enterprise Cloud Adoption Patterns
Aberdeen Oil & Gas Event - Enterprise Cloud Adoption PatternsAmazon Web Services
 

Similaire à Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon (20)

Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
AWS re:Invent 2016 recap (part 1)
AWS re:Invent 2016 recap (part 1)AWS re:Invent 2016 recap (part 1)
AWS re:Invent 2016 recap (part 1)
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:Cap
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:Cap
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku LepistoAWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
 
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
 
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta...
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam Elmalak
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11
 
Introduction to the AWS Cloud from Digital Tuesday Meetup
Introduction to the AWS Cloud from Digital Tuesday MeetupIntroduction to the AWS Cloud from Digital Tuesday Meetup
Introduction to the AWS Cloud from Digital Tuesday Meetup
 
What is Cloud Computing?
What is Cloud Computing?What is Cloud Computing?
What is Cloud Computing?
 
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
Opportunities that the Cloud Brings for Carriers @ Carriers World 2014
 
Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13
 
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
 
Aberdeen Oil & Gas Event - Enterprise Cloud Adoption Patterns
Aberdeen Oil & Gas Event - Enterprise Cloud Adoption PatternsAberdeen Oil & Gas Event - Enterprise Cloud Adoption Patterns
Aberdeen Oil & Gas Event - Enterprise Cloud Adoption Patterns
 

Plus de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Plus de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Dernier

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Dernier (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon

  • 1. AWS Big Data Platform LA Big Data Day 2015 6/27/2015
  • 2. GB TB PB IDC Digital Universe Study projects 44 zettabytes (ZB) by 2020 Source: IDC ZB EB
  • 4. On demand Pay as you go Uniform Available
  • 5.
  • 7. Enterprise Applications Platform Services Administration & Security Core Services Infrastructure Analytics App Services Deployment & Management Mobile Services
  • 8. Infrastructure Regions Points of PresenceAvailability Zones Core Services Storage (Object, Block and Archival) Compute (VMs, Auto-scaling and Load Balancing) Databases (Relational, NoSQL, Caching) Networking (VPC, DX, DNS) CDN Access Control Usage Auditing Monitoring and Logs Administration & Security Key Storage Identity Management Platform Services Deployment & Management One-click web app deployment Dev/ops resource management Resource Templates Push Notifications Mobile Services Mobile Analytics Identity Sync App Services Workflow Transcoding Email Search Queuing & Notifications App streaming Analytics Hadoop Data Pipelines Data Warehouse Real-time Streaming Data Enterprise Applications Virtual Desktops Collaboration and Sharing
  • 9. AWS global footprint US-WEST (N. California) EU-WEST (Ireland) ASIA PAC (Tokyo) ASIA PAC (Singapore) US-WEST (Oregon) SOUTH AMERICA (Sao Paulo) US-EAST (Virginia) GOV CLOUD ASIA-PAC (Sydney) EU-CENT (Frankfurt) ASIA PAC (Beijing)
  • 10. Big Data Pipeline Data Answers Collect Process Analyze Store
  • 11. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 12. Object Storage Streaming Ingest NoSQL RDBMS Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis Event Driven Framework
  • 13. S3 Kinesis DynamoDB RDS (RDBMS) AWS Lambda KCL Apps Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 15. Why is Amazon S3 good for Big Data? • Unlimited number of objects • Files from 1B to 5TB • Encryption • Versioning • Lifecycle management • Deep storage integration Obj ect Cop y S3 Object Copy Region DataCenters
  • 16. Real-time processing High throughput; elastic Easy to use S3, Redshift, DynamoDB Integrations Amazon Kinesis
  • 17. Amazon Kinesis Managed service for streaming ingest & processing Sending Consuming HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce AWS Mobile SDK
  • 18. • Streams are made of Shards • Each Shard ingests data up to 1MB/sec, and up to 1000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours • Scale Kinesis streams by splitting or merging Shards • Replay data inside of 24Hr. Window Kinesis Stream & Shards Worker Worker Worker Worker Worker Worker Worker
  • 19. Real-time event processing frameworks Kinesis Client Library AWS Lambda
  • 20. Example: Amazon Kinesis + Apache Storm Kinesis Storm Spout Producer Amazon Kinesis Apache Storm ElastiCache (Redis) Node.js Client (D3) http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a- Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
  • 21. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis Hadoop MPP Warehouse Machine Learning
  • 22. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis EMR Redshift Machine Learning
  • 23. Hadoop/HDFS clusters Hive, Spark, MapReduce Easy to use; fully managed On-demand and spot pricing Amazon EMR
  • 24. Easy to add and remove compute capacity on your cluster Match compute demands with cluster sizing. Resizable clusters
  • 25. Spot Instances for task nodes Up to 90% off Amazon EC2 on-demand pricing On-demand for core nodes Standard Amazon EC2 pricing for on-demand capacity Easy to use Spot Instances Meet SLA at predictable cost Exceed SLA at lower cost
  • 26. Amazon S3 as your persistent data store • Separate compute and storage • Resize and shut down Amazon EMR clusters with no data loss • Point multiple Amazon EMR clusters at same data in Amazon S3 EMR EMR Amazon S3
  • 27. create external table if not exists omniturelogs ( … ) row format delimited fields terminated by 't' lines terminated by 'n' location 's3:// my-bucket/input/omniturelogs';
  • 28. create external table if not exists omniturelogs ( … ) row format delimited fields terminated by 't' lines terminated by 'n' location 's3:// my-bucket/input/omniturelogs'; or “hdfs://<namenode>/…” or “/<hdfs directory>”
  • 29.
  • 30. Petabyte scale Massively parallel Relational data warehouse Fully managed Amazon Redshift
  • 31. Amazon Redshift • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 32.
  • 33. Amazon Redshift works with your existing analysis tools JDBC/ODBC Amazon Redshift
  • 34.
  • 35. Easy to use, managed ML Based on tech used internally at Amazon Use data stored in the AWS cloud Deploy models in seconds Amazon Machine Learning
  • 36. Amazon Machine Learning • Binary classification – Will customer buy this product or not buy – Is this email Spam or not Spam • Multiclass classification – Is this movie a comedy, documentary, or thriller • Regression/ forecasting – How many units will sell? – What price will this house sell at?
  • 37. Build & Train model Evaluate and optimize Retrieve predictions 1 2 3 Building smart applications with Amazon ML
  • 38. 1. Model Building: Ingest, Explore and understand your data
  • 39. 1. Model Building: Configuration & Training
  • 40. 2. Evaluate & Optimize
  • 41. Real-time predictions Synchronous, low-latency, high-throughput prediction generation Request through service API or server or mobile SDKs Best for interaction applications that deal with individual data records >>> import boto >>> ml = boto.connect_machinelearning() >>> ml.predict( ml_model_id=’my_model', predict_endpoint=’example_endpoint’, record={’key1':’value1’, ’key2':’value2’}) { 'Prediction': { 'predictedValue': 13.284348, 'details': { 'Algorithm': 'SGD', 'PredictiveModelType': 'REGRESSION’ } } }
  • 42. Batch predictions Asynchronous, large-volume prediction generation Request through service console or API Best for applications that deal with batches of data records >>> import boto >>> ml = boto.connect_machinelearning() >>> model = ml.create_batch_prediction( batch_prediction_id = 'my_batch_prediction’ batch_prediction_data_source_id = ’my_datasource’ ml_model_id = ’my_model', output_uri = 's3://examplebucket/output/’)
  • 43. Query for predictions with batch APIProcess data Raw data Aggregated data Predictions Your application Sample Prediction Flow EMR Amazon ML S3 S3 S3
  • 44. Putting it all together …
  • 45. S3 Kinesis EMR Redshift DynamoDB Collect Process & Analyze Consume Amazon Machine Learning
  • 46. Cloud enables big data processing