© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
NORDICS
Clarion Hotel Helsinki
March 21, 2018
Make your data fly - Building data platform in AWS
Kimmo Kantojärvi & Roope Parviainen
Today’s topics
● We are...
● Architectural evolution
● Making Data DevOps work
● How to cope with the data challenges
● Our experiences with a couple of the components/services, and some tips & tricks
○ EMR, Redshift, Airflow, visualization tools
We are...
Kimmo (@kimmokantojarvi)
● Coding architect
● 15 years in the data business
● AWS Certified Solutions Architect - Professional
● Ilves fan
Roope
● Data Architect #HandsDirty
● Professional love for data for 5 years
● Software Development × DW × data platforms × IoT
● AWS Certified Solutions Architect - Professional
We are a data- and customer-value-driven transformation company
▪ 96% of our 186 clients recommend us
▪ Over 2 million daily users in maintained services
▪ Extensive partner network in tech and insight
▪ Founded 1996
▪ 650 employees
▪ 6 cities
▪ 4 countries
▪ 76M turnover 2017
▪ 20% avg. profitable growth per annum
Offering
● Consulting and service design: We help our customers create new services by understanding their customers and managing the change.
● Data, analytics and AI: We build capabilities and intelligence that help develop and create new business opportunities.
● Digital services: We build and deliver new business and service technologies and infrastructure.
● DevOps and cloud services: We chase results and take care of our customers and their services.
Architectural evolution
It used to be so simple ;)
Source → ETL → DW → BI
Today the architecture is much more versatile and enabled by the cloud
What happened?
From → To
● On-premises → Cloud
● Few key technologies → Various specific technologies
● Closed solutions from big players → Open source
● Investments → Flexible cost structure
● Compute & storage combined → Separation of compute & storage
● Data pull/batch → Data push/stream
● Schema-on-write → Schema-on-read
● GUI → Code
● Long projects, big lead times → Agile methods, need to deliver fast
Various options to load & process data
● Traditional
○ SQL
○ ETL tools
○ Integration tools
● APIs
● AWS Services
○ Glue
○ EMR
○ Kinesis
○ IoT
○ EC2/Lambda
○ S3
● Processing/streaming engines
○ Spark
○ Flink
○ Storm
○ Presto/Hive
● Custom code
○ R, Python, etc.
○ Machine learning
Make sure your new systems are built to share data!
Offloading data processing with EMR (+ Spark)
● Suitable for processing large amounts of data and complex calculations
● Java, Scala, Python
● Combine SQL, Python generators and Spark DataFrames - win-win!
● Very cost-effective with spot instances
● Some learning curve (understanding configuration, behaviour and metrics)
● Not all EC2 instance types are available
● Ramp-up time ~10 min - not ideal for short tasks unless the cluster runs continuously
● Testing code locally is challenging (e.g. pytest + a Spark plugin)
[Diagram: EMR job flow with S3, DynamoDB and Redshift; inputs code.zip and job & environment configurations; data moved via copy/unload]
● 60 x c3.xlarge process 10B rows in 1 hour = €3.50
● 1000 SQL queries replaced with 1000 lines of Python & Spark
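The run figures above can be turned into unit costs with simple arithmetic. This is a sketch that assumes the €3.50 covers exactly 60 instance-hours and nothing else (no storage, data transfer or separate EMR fee breakdown). On that assumption the run comes to roughly €0.058 per instance-hour, about 2.8 million rows per second in total (roughly 46,000 per instance) and €0.35 per billion rows.

```python
# Figures from the slide: 60 x c3.xlarge, 10 billion rows, 1 hour, 3.5 EUR in total.
INSTANCES = 60
ROWS = 10_000_000_000
HOURS = 1.0
TOTAL_COST_EUR = 3.5


def emr_run_metrics(instances, rows, hours, total_cost):
    """Derive unit costs and throughput from a single EMR run (assumes cost = instance-hours only)."""
    instance_hours = instances * hours
    seconds = hours * 3600
    return {
        "instance_hours": instance_hours,
        "cost_per_instance_hour": total_cost / instance_hours,
        "rows_per_second": rows / seconds,
        "rows_per_second_per_instance": rows / seconds / instances,
        "cost_per_billion_rows": total_cost / (rows / 1e9),
    }


if __name__ == "__main__":
    for name, value in emr_run_metrics(INSTANCES, ROWS, HOURS, TOTAL_COST_EUR).items():
        print(f"{name}: {value:,.4f}")
```

The same function can be reused to compare alternative cluster sizes or instance types once their run time and cost are measured.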
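import boto3

# Fetch c3.xlarge Linux/UNIX spot price history for eu-west-1a, 1-21 March 2018
ec2_client = boto3.client('ec2', region_name='eu-west-1')
response = ec2_client.describe_spot_price_history(
    AvailabilityZone='eu-west-1a',
    StartTime='2018-03-01',
    EndTime='2018-03-21',
    InstanceTypes=['c3.xlarge'],
    ProductDescriptions=['Linux/UNIX'],
    MaxResults=100
)
# Each entry holds a timestamp and the hourly spot price (returned as a string)
for entry in response['SpotPriceHistory']:
    print(entry['Timestamp'], entry['SpotPrice'])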
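To decide whether spot instances are worth it for a given instance type, the response can be summarized. The sketch below assumes the documented response shape (a `SpotPriceHistory` list whose entries carry `SpotPrice` as a string); `summarize_spot_prices` and its `on_demand_price` parameter are hypothetical helpers, and the on-demand price has to be looked up separately. Larger result sets are paginated via `NextToken`, which this sketch does not handle.

```python
from statistics import mean


def summarize_spot_prices(response, on_demand_price=None):
    """Summarize a describe_spot_price_history response.

    SpotPrice values arrive as strings, so they are converted to float.
    on_demand_price is a hypothetical input used to estimate the saving.
    """
    prices = [float(entry["SpotPrice"]) for entry in response.get("SpotPriceHistory", [])]
    if not prices:
        raise ValueError("no spot prices in response")
    summary = {"min": min(prices), "mean": mean(prices), "max": max(prices)}
    if on_demand_price:
        # Fraction saved when paying the mean spot price instead of on-demand
        summary["mean_saving"] = 1 - summary["mean"] / on_demand_price
    return summary


if __name__ == "__main__":
    # Made-up prices in the same shape as the EC2 API response
    sample = {"SpotPriceHistory": [
        {"InstanceType": "c3.xlarge", "AvailabilityZone": "eu-west-1a", "SpotPrice": "0.040"},
        {"InstanceType": "c3.xlarge", "AvailabilityZone": "eu-west-1a", "SpotPrice": "0.050"},
        {"InstanceType": "c3.xlarge", "AvailabilityZone": "eu-west-1a", "SpotPrice": "0.060"},
    ]}
    print(summarize_spot_prices(sample, on_demand_price=0.2))
```

Passing the real `response` from the call above instead of `sample` gives the same summary for actual market data.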
So many data storage options nowadays
● File/object storage
○ S3
● Data warehouses
○ Redshift, Snowflake
● Traditional databases
○ RDS (MySQL, Postgres, MariaDB, MSSQL, Oracle)
● NoSQL databases
○ DynamoDB
○ MongoDB
○ Cassandra
● In-memory databases
○ Exasol
● GPU databases
○ MapD, BrytlytDB
● Time series databases
○ Kdb+, InfluxDB
● Caches
○ Redis, Memcached
Redshift performance requires planning & design
● Redshift is a cluster, and each node holds its own part of the data → data distribution affects query performance and data loading
● It is better to query a few wide tables than to join many narrow tables together
○ E.g. data vault modeling is a bit challenging from a query performance point of view
● Each table requires a minimum amount of storage → more nodes → higher minimum storage
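The minimum-storage point can be estimated with a rule of thumb: Redshift stores column data in 1 MB blocks, adds three hidden system columns to every table, and an EVEN- or KEY-distributed table occupies at least one block per column on each slice that holds its rows. The sketch below applies that rule; the slice counts per node type (2 for dc2.large, 16 for dc2.8xlarge) are assumptions to verify against `STV_SLICES` on your own cluster, and real tables can take more.

```python
BLOCK_SIZE_MB = 1      # Redshift stores each column in 1 MB blocks
SYSTEM_COLUMNS = 3     # hidden columns Redshift adds to every table


def min_table_size_mb(user_columns, nodes, slices_per_node):
    """Rough minimum footprint of an EVEN- or KEY-distributed table:
    every column (including system columns) takes at least one block per slice."""
    slices = nodes * slices_per_node
    return BLOCK_SIZE_MB * (user_columns + SYSTEM_COLUMNS) * slices


if __name__ == "__main__":
    # Same 5.12 TB capacity, two cluster layouts (slice counts are assumptions)
    print(min_table_size_mb(100, nodes=32, slices_per_node=2))   # 32 x dc2.large -> 6592 MB
    print(min_table_size_mb(100, nodes=2, slices_per_node=16))   # 2 x dc2.8xlarge -> 3296 MB
```

With many small, narrow tables (as in data vault), this per-table floor multiplied by the number of tables is what makes clusters with many slices fill up faster than the raw data volume suggests.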
In addition to data distribution, managing the query queue (WLM) setup is important
● At most 500 concurrent connections per cluster, but at most 50 query slots
● Each slot takes its own share of the memory: 50 slots → memory split into 1/50 parts
● WLM can be used to control long-running (maybe not so smart) queries made by users
○ E.g. after 5 min, move (hop) the query to a queue with fewer resources
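The queue setup described above is expressed as the cluster's `wlm_json_configuration` parameter. The sketch below builds one possible value for manual WLM: the group names (`etl`, `analysts`), memory percentages and slot counts are hypothetical. It uses a query monitoring rule with the `hop` action on `query_execution_time` (seconds); since a hopped query goes to the next queue matching the same user, a second, smaller `analysts` queue is included. Check the keys against the Redshift WLM documentation before applying anything like this.

```python
import json


def build_wlm_config(etl_slots, analyst_slots, hop_after_seconds):
    """Return a wlm_json_configuration string with four manual WLM queues.

    Hypothetical layout: an ETL queue, a main analyst queue with a query
    monitoring rule that hops long-running queries, a smaller analyst queue
    that receives the hopped queries, and the default queue (which must be last).
    """
    hop_rule = {
        "rule_name": "hop_long_queries",
        "predicate": [
            {"metric_name": "query_execution_time", "operator": ">", "value": hop_after_seconds}
        ],
        "action": "hop",
    }
    queues = [
        {"user_group": ["etl"], "query_concurrency": etl_slots, "memory_percent_to_use": 40},
        {"user_group": ["analysts"], "query_concurrency": analyst_slots,
         "memory_percent_to_use": 30, "rules": [hop_rule]},
        # Next queue matching the same user group: fewer slots and less memory
        {"user_group": ["analysts"], "query_concurrency": 2, "memory_percent_to_use": 10},
        # Default queue: no user or query groups
        {"query_concurrency": 5, "memory_percent_to_use": 20},
    ]
    total_slots = sum(q["query_concurrency"] for q in queues)
    if total_slots > 50:
        raise ValueError(f"{total_slots} query slots exceed the 50-slot limit")
    return json.dumps(queues)


def memory_per_slot_gb(queue_memory_gb, slots):
    """Each slot in a queue gets an equal share of that queue's memory."""
    return queue_memory_gb / slots


if __name__ == "__main__":
    print(build_wlm_config(etl_slots=10, analyst_slots=5, hop_after_seconds=300))
    # A queue with 100 GB and 50 slots: every slot gets 1/50 of the memory
    print(memory_per_slot_gb(100, 50))
```

Keeping slot counts low per queue is the lever here: fewer slots means each running query gets a larger memory share.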
In Spectrum we trust
● Store part of the data in S3 (e.g. Parquet + Snappy) and access it as external tables with SQL
● Separate Spectrum compute layer
● Read-only: you still need to process the data into S3 yourself, and Redshift can only unload to CSV at the moment
● Athena and Spectrum seem to be faster when querying a single table without joins
● VPC support not available yet
https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
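Exposing S3 data to Spectrum takes an external schema plus one external table per dataset. The sketch below only generates the DDL text; the schema name `spectrum`, catalog database `spectrumdb`, role ARN and bucket path are hypothetical placeholders, and running the statements against Redshift is left to whatever SQL client or load tool is in use.

```python
def external_schema_ddl(schema, catalog_database, iam_role_arn):
    """CREATE EXTERNAL SCHEMA backed by a data catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema, table, columns, s3_location):
    """CREATE EXTERNAL TABLE over Parquet files stored under an S3 prefix."""
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    # Hypothetical names: schema, catalog database, role ARN and bucket are placeholders
    print(external_schema_ddl("spectrum", "spectrumdb",
                              "arn:aws:iam::123456789012:role/MySpectrumRole"))
    print(external_table_ddl("spectrum", "sales",
                             [("sale_id", "BIGINT"), ("amount", "DECIMAL(12,2)"), ("sold_at", "TIMESTAMP")],
                             "s3://my-bucket/sales/"))
```

Generating this DDL from metadata fits the metadata-driven approach later in the deck: the column list can come from the same model that drives the loads.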
Spectrum related wish list
● VPC support
● Write/delete support, also to allow schema-on-write
● Redshift UNLOAD to Parquet/Avro
● Some control over compute, or over the cost structure
Redshift still requires some maintenance
● Tasks taken care of by AWS
○ Backups
○ Resizing
○ Node/disk replacement
○ Query caching
● Built-in maintenance processes that the user controls
○ Analyze → the query optimizer needs up-to-date table statistics
○ Vacuum → sorts data into the correct order and frees up storage from deleted data
○ Compression → optimizes table compression
● https://github.com/awslabs/amazon-redshift-utils
○ Great toolset for maintenance and reviewing system status
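Deciding when to run VACUUM and ANALYZE can be driven by the `SVV_TABLE_INFO` system view, whose `unsorted` and `stats_off` columns are percentages. The sketch below assumes rows fetched with `SELECT "schema", "table", unsorted, stats_off FROM svv_table_info;` as dicts; the 10% thresholds are hypothetical, and for production use the amazon-redshift-utils toolset listed above is the more complete option.

```python
def plan_maintenance(table_info, unsorted_threshold=10.0, stats_off_threshold=10.0):
    """Turn SVV_TABLE_INFO-style rows into VACUUM/ANALYZE statements.

    Each row is a dict with the keys schema, table, unsorted and stats_off
    (percentages; unsorted is NULL/None for tables without a sort key).
    """
    statements = []
    for row in table_info:
        name = f'{row["schema"]}.{row["table"]}'
        if (row.get("unsorted") or 0) > unsorted_threshold:
            statements.append(f"VACUUM FULL {name};")   # re-sort rows and reclaim deleted space
        if (row.get("stats_off") or 0) > stats_off_threshold:
            statements.append(f"ANALYZE {name};")       # refresh optimizer statistics
    return statements


if __name__ == "__main__":
    # Made-up rows in the shape of SVV_TABLE_INFO output
    rows = [
        {"schema": "public", "table": "sales", "unsorted": 25.0, "stats_off": 3.0},
        {"schema": "public", "table": "events", "unsorted": None, "stats_off": 40.0},
    ]
    for statement in plan_maintenance(rows):
        print(statement)
```

Running VACUUM before ANALYZE per table keeps the statistics aligned with the re-sorted data.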
Some other tips with Redshift
● With a 3-year all-upfront reservation, break-even comes after about 1 year, so the commitment is effectively only 1 year
○ 5.12 TB = 32 x dc2.large = 2 x dc2.8xlarge ≈ $90k/year (on demand)
○ All upfront, 3 years: $31k/year
● Publish directly from staging and model later → faster visible results for business users
● A lot of interesting development going on (especially Spectrum)
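The break-even claim follows from the two prices on the slide: paying 3 × $31k = $93k up front costs the same as about 12.4 months of on-demand usage at $90k/year. A minimal sketch of that arithmetic, ignoring discounting, price changes and partial utilization:

```python
def break_even_months(on_demand_per_year, reserved_per_year, term_years):
    """Months of on-demand usage that cost the same as the full reserved term paid up front."""
    upfront_total = reserved_per_year * term_years
    return upfront_total / on_demand_per_year * 12


if __name__ == "__main__":
    # Figures from the slide: about $90k/year on demand vs. $31k/year for 3 years all upfront
    months = break_even_months(90_000, 31_000, 3)
    print(f"Break-even after {months:.1f} months")
```

If the cluster runs for longer than the break-even point, the remaining reserved term is effectively free compared to on-demand pricing.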
Sharing your data
● APIs
● Integration tools
● BI tools
● AWS services
○ QuickSight
○ Athena
○ API Gateway
○ S3
Visualizing the data
● The first step in generating value from data is to visualize it
● A general-purpose BI/analytics tool does not (always) cope with e.g.
○ vast amounts of data
○ special visualization needs
● Right tool for the right purpose, “Mix and match”
○ Power BI/Birst/QuickSight and custom d3.js / trending tool / Grafana / Kibana
○ Multiple data sources
■ Virtualization of data sources
■ Data catalogs and understandability
Visualizing the data
VS.
Right tool for the right purpose
Fast and slow data - same but different
● Platforms have to be able to ingest both slow and fast data
○ Batches are simply not enough
○ Data streams & event-driven data loads
● Different endpoints / integrations (SFTP, HTTP REST, MQTT, data dumps)
● Different data pipelines and databases
○ Even for the same data, depending on usage needs
○ Orchestration of the whole becomes difficult
○ Parallelism when loading
Managing the data flow
● Open source
○ Airflow
○ Oozie
○ Luigi
○ Jenkins
● Traditional ETL & Integration tools
● AWS services
○ Batch
○ Step Functions
● Custom code
○ Lambda
Airflow
● Visualization and management of the whole data load
○ SQL
○ Command line
○ Python/Java/etc.
● Suitable for batch loading
● Loads can be generated programmatically based on metadata
● Parallel/multiple loads, managing parallelism
● Load history
● Logs available directly
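The idea of generating loads from metadata and managing their parallelism can be shown without Airflow itself. The sketch below uses Python's standard-library `graphlib` (3.9+) to group hypothetical loads into "waves": loads in the same wave have no dependencies on each other and could run in parallel. This is not Airflow's API; in Airflow each entry would become a task and the dependency lists its upstream relationships.

```python
from graphlib import TopologicalSorter

# Hypothetical load metadata: target table -> tables it depends on
LOAD_METADATA = {
    "stg_customer": [],
    "stg_order": [],
    "h_customer": ["stg_customer"],
    "h_order": ["stg_order"],
    "l_customer_order": ["stg_order", "h_customer", "h_order"],
    "s_customer": ["h_customer"],
}


def load_waves(metadata):
    """Group loads into waves; loads within a wave have no mutual dependencies
    and could run in parallel."""
    sorter = TopologicalSorter(metadata)
    sorter.prepare()                     # also raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)              # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(load_waves(LOAD_METADATA), start=1):
        print(number, wave)
```

Adding a new table to the platform then only means adding one metadata entry; the load order and parallelism follow automatically.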
Airflow
Things to consider
● Batch and streaming need to be handled separately
● Airflow has some flaws
○ The GUI is not always up to date
○ Scanning DAG statuses takes time
● If you have a lot of custom-code Lambdas running at different times, how do you manage parallelism and how do you monitor them?
Making Data DevOps work
Data DevOps
● Target: deployment processes similar to those in software projects
● This was not even possible earlier, because of poor support in traditional tools
● To be effective and scalable, it should be metadata-driven
○ Code generated based on metadata
● Need to focus on following good coding practices
● Version management for everything
○ Infrastructure as code
○ Recursive schema changes
○ Data load changes
○ Report changes?
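Version-managed schema changes can be generated from metadata: compare two versions of a table's column definitions and emit the DDL that moves from one to the other. The sketch below is hypothetical (function and table names are illustrative) and Redshift-flavoured, issuing one `ALTER TABLE ... ADD COLUMN` per column; column type changes are deliberately left for manual handling.

```python
def schema_migration(table, old_columns, new_columns):
    """Return ALTER TABLE statements turning old_columns into new_columns.

    Both arguments map column name -> SQL type, in column order.
    """
    statements = []
    for name, sql_type in new_columns.items():
        if name not in old_columns:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        elif old_columns[name] != sql_type:
            # Type changes usually need a copy/rename migration, so stop here
            raise ValueError(f"type change for {table}.{name} needs a manual migration")
    for name in old_columns:
        if name not in new_columns:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements


if __name__ == "__main__":
    # Two hypothetical versions of the same table's metadata
    v1 = {"customer_id": "VARCHAR(100)", "name": "VARCHAR(200)", "fax": "VARCHAR(20)"}
    v2 = {"customer_id": "VARCHAR(100)", "name": "VARCHAR(200)", "email": "VARCHAR(255)"}
    print("\n".join(schema_migration("s_customer", v1, v2)))
```

Because both metadata versions live in version control, the generated statements can be reviewed and deployed like any other code change.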
Data DevOps - Agile Data Engine
● Based on our previous experience/projects, now formalized and bundled as a product
● Enabled by AWS services, difficult to implement on-premises
● Design once, deploy to multiple runtime environments
● Functionality
○ Data modelling, Load Mapping, Data Vault Automation
○ Continuous Deployment Management
○ Metadata Driven ELT Execution and Concurrency Control
Data modeling and why data vault
● Data vault is a modeling and development method
● Hub = business entity, Satellite = all details, Link = join between entities
● Well-defined principles for development, naming conventions, etc.
[Diagram: example model with hubs H_CUSTOMER and H_ORDER, satellites S_CUSTOMER and S_ORDER, and link L_CUSTOMER_ORDER, connected by one-to-many (1 → *) relationships]
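One possible physical layout for the hubs, link and satellites in the example model is sketched below as generated DDL. The column conventions (CHAR(32) MD5 hash keys, `hashdiff`, and `load_dts` / `record_source` audit columns) are common data vault practice but are assumptions here, not necessarily what the authors' tooling generates. Note that Redshift treats PRIMARY KEY constraints as informational only and does not enforce them.

```python
AUDIT_COLUMNS = "  load_dts TIMESTAMP NOT NULL,\n  record_source VARCHAR(100) NOT NULL,\n"


def hub_ddl(entity, business_key):
    """Hub: one row per business key, identified by a hash key."""
    return (
        f"CREATE TABLE H_{entity.upper()} (\n"
        f"  {entity}_hk CHAR(32) NOT NULL,\n"
        f"  {business_key} VARCHAR(100) NOT NULL,\n"
        f"{AUDIT_COLUMNS}"
        f"  PRIMARY KEY ({entity}_hk)\n);"
    )


def link_ddl(*entities):
    """Link: one row per combination of related hub keys."""
    name = "_".join(entities)
    hub_keys = "".join(f"  {e}_hk CHAR(32) NOT NULL,\n" for e in entities)
    return (
        f"CREATE TABLE L_{name.upper()} (\n"
        f"  {name}_hk CHAR(32) NOT NULL,\n"
        f"{hub_keys}"
        f"{AUDIT_COLUMNS}"
        f"  PRIMARY KEY ({name}_hk)\n);"
    )


def satellite_ddl(entity, attributes):
    """Satellite: descriptive attributes, one row per change (hash key + load time)."""
    attribute_columns = "".join(f"  {col} {sql_type},\n" for col, sql_type in attributes.items())
    return (
        f"CREATE TABLE S_{entity.upper()} (\n"
        f"  {entity}_hk CHAR(32) NOT NULL,\n"
        "  hashdiff CHAR(32) NOT NULL,\n"
        f"{attribute_columns}"
        f"{AUDIT_COLUMNS}"
        f"  PRIMARY KEY ({entity}_hk, load_dts)\n);"
    )


if __name__ == "__main__":
    print(hub_ddl("customer", "customer_id"))
    print(link_ddl("customer", "order"))
    print(satellite_ddl("customer", {"name": "VARCHAR(200)", "city": "VARCHAR(100)"}))
```

Because every hub, link and satellite follows the same template, the whole model can be generated from a short list of entities and attributes.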
Data vault is one of the key enablers for increasing speed with a schema-on-write approach
● The data model is split into pieces, allowing loads in multiple steps/parts
● Data loads can be auto-generated
● Many-to-many links allow representing any business situation
● Built-in history of changes through the satellite structure
● A standard development model makes personnel changes easier
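Auto-generated loads and satellite history usually rest on two hashes: a hash key computed from the business key, and a "hashdiff" over the descriptive attributes, so a new satellite row is written only when something actually changed. The sketch below illustrates that pattern; the MD5 choice, the `||` delimiter and the trim/upper-case normalization are common conventions but assumptions here, and the function names are hypothetical.

```python
import hashlib


def hash_key(*business_keys):
    """MD5 hash key over trimmed, upper-cased business keys joined with '||'."""
    normalized = "||".join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hashdiff(attributes):
    """MD5 over all descriptive attributes in sorted column order."""
    payload = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def satellite_rows_to_insert(incoming, latest_hashdiffs):
    """Keep only records whose attributes differ from the latest satellite row.

    incoming: iterable of (business_key, attributes) pairs from staging
    latest_hashdiffs: dict mapping hub hash key -> hashdiff of its latest satellite row
    """
    rows = []
    for business_key, attributes in incoming:
        hk = hash_key(business_key)
        hd = hashdiff(attributes)
        if latest_hashdiffs.get(hk) != hd:   # new key or changed attributes
            rows.append({"hk": hk, "hashdiff": hd, **attributes})
    return rows


if __name__ == "__main__":
    latest = {hash_key("C1"): hashdiff({"name": "Alice", "city": "Helsinki"})}
    staged = [
        ("C1", {"city": "Helsinki", "name": "Alice"}),   # unchanged -> skipped
        ("C2", {"name": "Bob", "city": "Tampere"}),      # new key -> inserted
        ("C1", {"name": "Alice", "city": "Espoo"}),      # changed -> inserted
    ]
    for row in satellite_rows_to_insert(staged, latest):
        print(row)
```

Since hash keys depend only on business keys, hubs, links and satellites can be loaded independently and in parallel.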
How to survive the data challenges
#saddata
● Data you are forced to collect even though no one wants it as a customer, no one needs it in your business and no one can find or utilize it - Jarno Kartela, AWS Summit Stockholm, 2017
● So consider what data you are collecting: it all adds some maintenance overhead, and you need to keep GDPR in mind
Handling malicious data
● Typically not considered
● The source could be a third-party service or a system with poor data validation/handling
● Probably best to create a separate landing account and run security checks on the data before pushing it forward
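A security check in the landing account could start with simple content rules before anything is pushed forward. The sketch below screens records and quarantines suspicious ones; the rules (a hypothetical field-length limit, control characters, and values that look like spreadsheet formulas, a common "CSV injection" vector) are illustrative only and no substitute for proper malware scanning or schema validation.

```python
import re

FORMULA_PREFIXES = ("=", "+", "-", "@")
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
MAX_FIELD_LENGTH = 10_000  # hypothetical limit


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def field_problems(value):
    """Return a list of reasons why a single field value looks suspicious."""
    problems = []
    if len(value) > MAX_FIELD_LENGTH:
        problems.append("too long")
    if CONTROL_CHARS.search(value):
        problems.append("control characters")
    # Negative numbers such as "-12.5" are fine; "=HYPERLINK(...)" is not
    if value.startswith(FORMULA_PREFIXES) and not _is_number(value):
        problems.append("possible spreadsheet formula")
    return problems


def screen_records(records):
    """Split records into clean ones and quarantined ones with their issues."""
    clean, quarantined = [], []
    for record in records:
        issues = {}
        for key, value in record.items():
            found = field_problems(str(value))
            if found:
                issues[key] = found
        if issues:
            quarantined.append({"record": record, "issues": issues})
        else:
            clean.append(record)
    return clean, quarantined
```

Quarantined records can be kept in the landing account for inspection while clean ones continue down the pipeline.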
Simple tasks to secure data
● Encrypt
○ S3 buckets
○ RDS & Redshift
○ EBS volumes
● Just block access
○ Network ACL
○ Security groups
○ S3 bucket policies
● Set up notifications on changes
● Prevent opening up access
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEqualsIfExists": {
          "aws:SourceVpc": "vpc-abcdefg"
        },
        "NotIpAddressIfExists": {
          "aws:SourceIp": ["1.1.1.1/32"]
        }
      }
    }
  ]
}
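The Deny statement above fires only when both conditions hold, because conditions in one statement are ANDed, and an `...IfExists` operator evaluates to true when its context key is absent. So a request is allowed through (by this statement) if it comes from the listed VPC or from the listed IP range. The sketch below models just that condition logic for the example values; it is a simplification that ignores principals, other statements and how AWS actually evaluates policies.

```python
import ipaddress

ALLOWED_VPC = "vpc-abcdefg"
ALLOWED_NETWORKS = [ipaddress.ip_network("1.1.1.1/32")]


def is_denied(source_vpc=None, source_ip=None):
    """Simplified model of the Deny statement: it applies only when BOTH conditions hold.

    None means the context key is absent; ...IfExists operators evaluate to true then.
    """
    vpc_condition = source_vpc is None or source_vpc != ALLOWED_VPC
    ip_condition = source_ip is None or not any(
        ipaddress.ip_address(source_ip) in network for network in ALLOWED_NETWORKS
    )
    return vpc_condition and ip_condition


if __name__ == "__main__":
    print(is_denied(source_vpc="vpc-abcdefg"))   # via the VPC endpoint
    print(is_denied(source_ip="1.1.1.1"))        # from the allowed address
    print(is_denied(source_ip="8.8.8.8"))        # from anywhere else
```

Modelling the conditions like this is a quick way to reason about a policy before testing it with real requests.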
There is no single data platform to answer all your needs
● How do you remove customer data from Parquet files in S3 (as required by GDPR)?
● How do you manage access to S3, Redshift, Tableau, etc. in a centralized manner?
● No centralized metadata management (maybe Glue in the future)
Credits
Harri Kallio
Tero Honko
Thank you!
Questions?
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfakankshagupta7348026
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 

Dernier (20)

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdf
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 

Make your data fly - Building data platform in AWS

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. NORDICS Clarion Hotel Helsinki March 21, 2018
  • 2. Make your data fly - Building data platform in AWS
    Kimmo Kantojärvi & Roope Parviainen
  • 3. Today’s topics
    ● We are...
    ● Architectural evolution
    ● Making Data DevOps work
    ● How to cope with the data challenges
    ● Our experiences with a couple of the components/services, plus some tips & tricks
    ● EMR, Redshift, Airflow, visualization tools
  • 4. We are...
  • 5. Kimmo (@kimmokantojarvi)
    ● Coding architect
    ● 15 years in the data business
    ● AWS Certified Solutions Architect - Professional
    ● Ilves fan
    Roope
    ● Data Architect #HandsDirty
    ● 5 years of professional love for data
    ● Software Development × DW × data platforms × IoT
    ● AWS Certified Solutions Architect - Professional
  • 6. We are a data- and customer-value-driven transformation company
    ▪ 96 % of our 186 clients recommend us
    ▪ Over 2 million daily users in maintained services
    ▪ Extensive partner network in tech and insight
    ▪ Founded: 1996
    ▪ Employees: 650
    ▪ Cities: 6
    ▪ Countries: 4
    ▪ Turnover 2017: 76M
    ▪ Average profitable growth per annum: 20%
  • 7. We help our customers to create new services by understanding their customers and managing the change. We build capabilities and intelligence that help develop and create new business opportunities. We build and deliver new business and services technologies and infrastructure. We chase results and take care of our customers and their services.
    Offering
    ● Consulting and service design
    ● Data, analytics and AI
    ● Digital services
    ● DevOps and cloud services
  • 9. Architectural evolution
  • 10. It used to be so simple ;) Source → ETL → DW → BI
  • 11. Today the architecture is much more versatile, enabled by the cloud
  • 12. What happened?
    From → To
    ● On-premises → Cloud
    ● Few key technologies → Various specific technologies
    ● Closed solutions from big players → Open source
    ● Investments → Flexible cost structure
    ● Compute & storage combined → Separation of compute & storage
    ● Data pull/batch → Data push/stream
    ● Schema-on-write → Schema-on-read
    ● GUI → Code
    ● Long projects, big lead times → Agile methods, need to deliver fast
  • 14. Various options to load & process data
    ● Traditional
      ○ SQL
      ○ ETL tools
      ○ Integration tools
    ● APIs
    ● AWS Services
      ○ Glue
      ○ EMR
      ○ Kinesis
      ○ IoT
      ○ EC2/Lambda
      ○ S3
    ● Processing/streaming engines
      ○ Spark
      ○ Flink
      ○ Storm
      ○ Presto/Hive
    ● Custom code
      ○ R, Python, etc.
      ○ Machine learning
    Make sure your new systems are built to share data!
  • 15. Offloading data processing with EMR (+ Spark)
    ● Suitable for processing large amounts of data and complex calculations
    ● Java, Scala, Python
    ● Combine SQL, Python generators and Spark DataFrames - win-win!
    ● Very cost-effective with spot instances
    ● Some learning curve (understanding configuration, behaviour and metrics)
    ● Not all EC2 instance types are available
    ● Ramp-up time ~10 min: not ideal for short tasks unless the cluster runs continuously
    ● Testing code locally is challenging (e.g. pytest + a Spark plugin)
  • 16. [Diagram: EMR processing pipeline with code.zip, job & environment configurations, S3, DynamoDB, EMR and Redshift (data copy/unload)]
    ● 60 × c3.xlarge process 10B rows in 1 hour = €3.50
    ● 1000 SQL queries replaced with 1000 lines of Python & Spark
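The run on slide 16 can be broken down into unit costs with some simple arithmetic. The sketch below assumes that the €3.50 covers the whole 60-instance, 1-hour run; the slide does not say how it splits between EC2 spot and EMR fees. The function name `unit_economics` is hypothetical.

```python
# Figures from the slide: 60 x c3.xlarge processed 10B rows in 1 hour for EUR 3.50.
INSTANCES = 60
HOURS = 1.0
ROWS = 10_000_000_000
TOTAL_COST_EUR = 3.50


def unit_economics(instances, hours, rows, total_cost):
    """Derive per-instance-hour cost and throughput from a single batch run."""
    instance_hours = instances * hours
    return {
        "eur_per_instance_hour": total_cost / instance_hours,
        "eur_per_billion_rows": total_cost / (rows / 1e9),
        "rows_per_second": rows / (hours * 3600),
        "rows_per_instance_hour": rows / instance_hours,
    }


if __name__ == "__main__":
    for name, value in unit_economics(INSTANCES, HOURS, ROWS, TOTAL_COST_EUR).items():
        print(f"{name}: {value:,.4f}")
```

Under these assumptions, processing one billion rows costs about €0.35. That figure makes it easier to compare the run with the cost of keeping the same workload in SQL on the warehouse.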
  • 18. Spot price history for c3.xlarge (boto3):
      import boto3

      # The client region must match the availability zone queried below.
      ec2_client = boto3.client('ec2', region_name='eu-west-1')
      response = ec2_client.describe_spot_price_history(
          AvailabilityZone='eu-west-1a',
          StartTime='2018-03-01',
          EndTime='2018-03-21',
          InstanceTypes=['c3.xlarge'],
          ProductDescriptions=['Linux/UNIX'],
          MaxResults=100
      )
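Once you have a `describe_spot_price_history` response, you can reduce it to a per-zone summary before you choose instance types for EMR. The sketch below relies only on the response fields `SpotPriceHistory`, `AvailabilityZone` and `SpotPrice`, which are returned as strings. The helper name and the sample prices are hypothetical, and the sample stands in for a real API call. If there are more results than `MaxResults`, the response carries a `NextToken`, and boto3 also offers a paginator for this operation.

```python
from decimal import Decimal


def summarize_spot_prices(response):
    """Return min/avg/max spot price per availability zone.

    SpotPrice arrives as a string, so it is parsed with Decimal to avoid
    float rounding in the summary.
    """
    by_zone = {}
    for entry in response.get("SpotPriceHistory", []):
        by_zone.setdefault(entry["AvailabilityZone"], []).append(Decimal(entry["SpotPrice"]))
    return {
        zone: {"min": min(p), "avg": sum(p) / len(p), "max": max(p), "samples": len(p)}
        for zone, p in by_zone.items()
    }


# Hypothetical response shaped like describe_spot_price_history output.
sample = {
    "SpotPriceHistory": [
        {"AvailabilityZone": "eu-west-1a", "InstanceType": "c3.xlarge",
         "ProductDescription": "Linux/UNIX", "SpotPrice": "0.0510"},
        {"AvailabilityZone": "eu-west-1a", "InstanceType": "c3.xlarge",
         "ProductDescription": "Linux/UNIX", "SpotPrice": "0.0530"},
    ]
}

if __name__ == "__main__":
    print(summarize_spot_prices(sample))
```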
  • 20. So many data storage options nowadays
    ● File/object storage
      ○ S3
    ● Data warehouses
      ○ Redshift, Snowflake
    ● Traditional databases
      ○ RDS (MySQL, Postgres, MariaDB, MSSQL, Oracle)
    ● NoSQL databases
      ○ DynamoDB
      ○ MongoDB
      ○ Cassandra
    ● In-memory databases
      ○ Exasol
    ● GPU databases
      ○ MapD, BrytlytDB
    ● Time series databases
      ○ Kdb+, InfluxDB
    ● Caches
      ○ Redis, Memcached
  • 21. Redshift performance requires planning & design
    ● Redshift is a cluster in which each node holds its own part of the data → data distribution affects both query performance and data loading
    ● It is better to query a few wide tables than to join many narrow tables together
      ○ E.g. data vault modeling is somewhat challenging from a query performance point of view
    ● Each table has a minimum storage footprint that grows with cluster size: more nodes → higher minimum storage
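You can estimate why minimum storage grows with the cluster size. A common rule of thumb is that every column, including 3 hidden system columns, occupies at least one 1 MB block on each slice that holds data. This sketch assumes that rule, assumes all slices are populated, and ignores any extra blocks for sorted/unsorted regions, so treat the result as a rough lower bound. The slice counts used are 2 per dc2.large node and 16 per dc2.8xlarge node.

```python
BLOCK_MB = 1                # Redshift stores column data in 1 MB blocks
HIDDEN_SYSTEM_COLUMNS = 3   # system columns added to every table
SLICES_PER_NODE = {"dc2.large": 2, "dc2.8xlarge": 16}


def min_table_size_mb(user_columns, node_type, node_count):
    """Approximate lower bound: one block per column per (populated) slice."""
    slices = SLICES_PER_NODE[node_type] * node_count
    return (user_columns + HIDDEN_SYSTEM_COLUMNS) * slices * BLOCK_MB


if __name__ == "__main__":
    # Same 5.12 TB of raw storage, different slice counts.
    print(min_table_size_mb(20, "dc2.large", 32))    # 64 slices
    print(min_table_size_mb(20, "dc2.8xlarge", 2))   # 32 slices
```

Even a tiny 20-column table therefore takes on the order of a gigabyte on a 64-slice cluster. This is one reason many small, narrow tables can be costly in Redshift.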
  • 22. In addition to data distribution, the query queue (WLM) setup is important
    ● Max 500 concurrent connections per cluster, but only max 50 query slots
    ● Each slot takes its own share of the memory: 50 slots → memory split into 50 parts
    ● Can be used to control long-running (maybe not so smart) queries made by users
      ○ E.g. move (hop) a query to a queue with fewer resources after 5 min
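The "hop after 5 min" pattern from the slide can be expressed as a manual WLM configuration. The sketch below builds a value for the `wlm_json_configuration` cluster parameter. It uses a query monitoring rule on `query_execution_time` (in seconds) with the `hop` action, which moves the query to the next matching queue. Here the last queue acts as the default queue. The user group name "users", the slot and memory numbers, and the function names are hypothetical. Check the exact hop semantics against the WLM documentation before relying on them. The resulting string could then be applied to a parameter group, for example with boto3's Redshift `modify_cluster_parameter_group`.

```python
import json

MAX_TOTAL_SLOTS = 50  # query-slot limit quoted on the slide


def memory_per_slot_pct(memory_pct, slots):
    """Share of cluster memory that each slot in a queue receives."""
    return memory_pct / slots


def build_wlm_json(user_slots=10, user_memory_pct=70,
                   default_slots=3, default_memory_pct=30, hop_after_s=300):
    """Two manual queues: long-running queries in the hypothetical 'users'
    queue are hopped towards the smaller default queue after hop_after_s."""
    if user_slots + default_slots > MAX_TOTAL_SLOTS:
        raise ValueError("too many query slots")
    if user_memory_pct + default_memory_pct > 100:
        raise ValueError("memory percentages exceed 100")
    queues = [
        {
            "user_group": ["users"],
            "query_concurrency": user_slots,
            "memory_percent_to_use": user_memory_pct,
            "rules": [{
                "rule_name": "hop_long_user_queries",
                "predicate": [{"metric_name": "query_execution_time",
                               "operator": ">", "value": hop_after_s}],
                "action": "hop",
            }],
        },
        # The last queue is the default queue, with fewer resources.
        {"query_concurrency": default_slots, "memory_percent_to_use": default_memory_pct},
    ]
    return json.dumps(queues)


if __name__ == "__main__":
    print(memory_per_slot_pct(100, 50))  # 50 slots -> 2% of memory each
    print(build_wlm_json())
```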
  • 23. In Spectrum we trust
    ● Store part of the data in S3 (e.g. Parquet + Snappy) and access it as an external table with SQL
    ● Separate Spectrum compute layer
    ● Read-only: you still need to process the data into S3 yourself, and Redshift can only unload CSV at the moment
    ● Athena and Spectrum seem to be faster for single-table queries without joins
    ● VPC support not available yet
    https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
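To expose Parquet files in S3 as a Spectrum external table, you need a `CREATE EXTERNAL TABLE` statement in an external schema. The external schema is created beforehand with `CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG`. The sketch below generates that DDL from column metadata, which fits the metadata-driven approach discussed later in the deck. The schema, table, column and bucket names in the usage example are hypothetical.

```python
def external_table_ddl(schema, table, columns, s3_location, partitions=None):
    """Generate Redshift Spectrum DDL for Parquet data already stored in S3.

    columns / partitions: sequences of (name, sql_type) tuples.
    """
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)"
    if partitions:
        parts = ", ".join(f"{name} {sql_type}" for name, sql_type in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{s3_location}';"
    return ddl


if __name__ == "__main__":
    print(external_table_ddl(
        "spectrum", "events",
        [("event_id", "BIGINT"), ("payload", "VARCHAR(4096)")],
        "s3://my-bucket/events/",
        partitions=[("event_date", "DATE")],
    ))
```

Partitioning by a date column lets Spectrum skip S3 prefixes that a query does not need. In practice, new partitions also have to be registered, for example with `ALTER TABLE ... ADD PARTITION`.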
  • 24. Spectrum-related wish list
    ● VPC support
    ● Write/delete support, also to allow schema-on-write
    ● Redshift unload to Parquet/Avro
    ● Some control over compute or over the cost structure
  • 25. Redshift still requires some maintenance
    ● Tasks taken care of by AWS
      ○ Backups
      ○ Resizing
      ○ Node/disk replacement
      ○ Query caching
    ● Built-in maintenance processes controlled by the user
      ○ Analyze → the query optimizer needs up-to-date table statistics
      ○ Vacuum → re-sorts data into sort key order and frees up storage from deleted rows
      ○ Compression → optimizes table compression
    ● https://github.com/awslabs/amazon-redshift-utils
      ○ Great toolset for maintenance and reviewing system status
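A simple way to automate the user-controlled part is to read `SVV_TABLE_INFO` and emit `VACUUM` and `ANALYZE` only for tables that need them. In that view, `unsorted` is the percentage of unsorted rows and `stats_off` indicates how stale the statistics are. The 10% thresholds and the function name below are hypothetical choices, not AWS recommendations. The amazon-redshift-utils toolset linked above contains more complete scripts.

```python
# Query whose rows feed maintenance_statements() (execution not shown here).
TABLE_INFO_SQL = 'SELECT "schema", "table", unsorted, stats_off FROM svv_table_info;'


def maintenance_statements(table_info, unsorted_limit=10.0, stats_off_limit=10.0):
    """Return VACUUM/ANALYZE statements for tables above the thresholds.

    table_info: dicts with keys schema, table, unsorted, stats_off.
    unsorted is NULL (None) for tables without a sort key.
    """
    statements = []
    for row in table_info:
        name = f'"{row["schema"]}"."{row["table"]}"'
        if (row.get("unsorted") or 0) > unsorted_limit:
            statements.append(f"VACUUM FULL {name};")
        if (row.get("stats_off") or 0) > stats_off_limit:
            # Analyze after vacuum so the statistics describe the re-sorted table.
            statements.append(f"ANALYZE {name};")
    return statements


if __name__ == "__main__":
    rows = [
        {"schema": "public", "table": "orders", "unsorted": 35.0, "stats_off": 2.0},
        {"schema": "public", "table": "events", "unsorted": None, "stats_off": 50.0},
    ]
    print("\n".join(maintenance_statements(rows)))
```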
  • 26. Some other tips with Redshift
    ● With a 3-year all-upfront reservation, break-even comes after about 1 year, so the commitment is effectively only 1 year
      ○ 5.12 TB = 32 × dc2.large = 2 × dc2.8xlarge ≈ $90k/year on demand
      ○ All upfront for 3 years ≈ $31k/year
    ● Publish directly from staging and model later → faster visible results for business users
    ● A lot of interesting development going on (especially Spectrum)
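The break-even claim follows from the slide's own numbers. Paying 3 × $31k up front is about the same as one year of on-demand use at ~$90k/year. The sketch below checks this, along with the storage equivalence, using node sizes of 160 GB per dc2.large and 2,560 GB per dc2.8xlarge. The prices are the slide's approximate figures, so the result is approximate too.

```python
# Node storage in GB: dc2.large = 160 GB, dc2.8xlarge = 2,560 GB.
DC2_LARGE_GB = 160
DC2_8XLARGE_GB = 2560


def break_even_months(on_demand_per_year, upfront_total):
    """Months of on-demand running that cost as much as the full up-front payment."""
    return upfront_total / on_demand_per_year * 12


# Both configurations from the slide give the same 5.12 TB of storage.
small_nodes_gb = 32 * DC2_LARGE_GB
large_nodes_gb = 2 * DC2_8XLARGE_GB

# ~$90k/year on demand vs. 3 years x ~$31k/year paid up front.
months = break_even_months(on_demand_per_year=90_000, upfront_total=3 * 31_000)
print(f"{small_nodes_gb} GB = {large_nodes_gb} GB; break-even after {months:.1f} months")
# prints: 5120 GB = 5120 GB; break-even after 12.4 months
```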
  • 28. Sharing your data
    ● APIs
    ● Integration tools
    ● BI tools
    ● AWS services
      ○ QuickSight
      ○ Athena
      ○ API Gateway
      ○ S3
  • 29. Visualizing the data
    ● Visualizing data is the first step in generating value from it
    ● A general-purpose BI/analytics tool does not always cope with, e.g.,
      ○ vast amounts of data
      ○ special visualization needs
    ● Right tool for the right purpose, “mix and match”
      ○ Power BI/Birst/QuickSight and custom d3.js / trending tool / Grafana / Kibana
      ○ Multiple data sources
        ■ Virtualization of data sources
        ■ Data catalogs and understandability
  • 30. Visualizing the data VS.
  • 31. Right tool for the right purpose
  • 33. Fast and slow data - same but different
    ● Platforms have to be able to ingest both slow and fast data
      ○ Batches are simply not enough
      ○ Data streams & event-driven data loads
    ● Different endpoints / integrations (SFTP, HTTP REST, MQTT, data dumps)
    ● Different data pipelines and databases
      ○ Even for the same data, depending on usage needs
      ○ Orchestration of the whole becomes difficult
      ○ Parallelism when loading
  • 35. Managing the data flow
    ● Open source
      ○ Airflow
      ○ Oozie
      ○ Luigi
      ○ Jenkins
    ● Traditional ETL & integration tools
    ● AWS services
      ○ Batch
      ○ Step Functions
    ● Custom code
      ○ Lambda
  • 36. Airflow
    ● Visualization and management of the whole data load
      ○ SQL
      ○ Command line
      ○ Python/Java/etc.
    ● Suitable for batch loading
    ● Loads can be generated programmatically based on metadata
    ● Parallel/multiple loads, managing parallelism
    ● Load history
    ● Logs available directly
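"Loads generated programmatically based on metadata" comes down to turning a dependency table into an execution plan with a parallelism limit. The sketch below does not use Airflow's API. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) as a stand-in to group loads into waves that could run in parallel. In Airflow itself, a similar metadata loop would create operators and their dependencies, and parallelism would come from Airflow's own pool/concurrency settings. The table names and `LOAD_METADATA` are hypothetical; they echo the data vault tables on slide 45.

```python
from graphlib import TopologicalSorter

# Hypothetical load metadata: target table -> tables it depends on.
LOAD_METADATA = {
    "stg_customer": [],
    "stg_order": [],
    "h_customer": ["stg_customer"],
    "h_order": ["stg_order"],
    "l_customer_order": ["h_customer", "h_order"],
}


def plan_waves(metadata, max_parallel):
    """Group loads into waves; loads in the same wave have no mutual dependencies.

    A large ready set is split into chunks of at most max_parallel loads.
    """
    sorter = TopologicalSorter(metadata)
    sorter.prepare()  # also raises CycleError if the metadata contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        for i in range(0, len(ready), max_parallel):
            waves.append(ready[i:i + max_parallel])
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for n, wave in enumerate(plan_waves(LOAD_METADATA, max_parallel=2), start=1):
        print(n, wave)
```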
  • 37. Airflow
  • 38. Airflow
  • 39. Airflow
  • 40. Things to consider
    ● Batch and streaming need to be handled separately
    ● Airflow has some flaws
      ○ GUI is not always up-to-date
      ○ Scanning DAG statuses takes time
    ● If you have a lot of custom-code Lambdas running at different times, how do you manage parallelism and monitoring?
  • 41. Making Data DevOps work
  • 42. Data DevOps
    ● Goal: deployment processes similar to those in software projects
    ● This was not even possible earlier because of poor support in traditional tools
    ● To be effective and scalable, it should be metadata-driven
      ○ Code generated from metadata
    ● Focus on following good coding practices
    ● Version management for everything
      ○ Infrastructure as code
      ○ Recursive schema changes
      ○ Data load changes
      ○ Report changes?
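One way to put schema changes under version management is to keep each table's column metadata in version control. Deployment statements are then generated from the difference between two versions. The sketch below assumes metadata stored as simple name → type mappings and emits `ALTER TABLE ... ADD COLUMN` / `DROP COLUMN` statements. Type changes are only flagged for manual handling. The function name and the `s_customer` example are hypothetical, and this is not how Agile Data Engine or any specific tool works.

```python
def schema_migration(table, old_columns, new_columns):
    """Diff two versions of a table's column metadata (name -> SQL type).

    Returns (statements, manual): generated ALTER TABLE statements and a
    list of type changes that need a hand-written migration.
    """
    statements, manual = [], []
    for name, sql_type in new_columns.items():
        if name not in old_columns:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        elif old_columns[name] != sql_type:
            manual.append(f"{table}.{name}: {old_columns[name]} -> {sql_type}")
    for name in old_columns:
        if name not in new_columns:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements, manual


if __name__ == "__main__":
    v1 = {"id": "BIGINT", "name": "VARCHAR(100)", "legacy": "INT"}
    v2 = {"id": "BIGINT", "name": "VARCHAR(200)", "email": "VARCHAR(256)"}
    stmts, todo = schema_migration("s_customer", v1, v2)
    print("\n".join(stmts))
    print("manual:", todo)
```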
  • 43. Data DevOps - Agile Data Engine
    ● Based on our previous experience/projects, now formalized and bundled as a product
    ● Enabled by AWS services, difficult to implement on-premises
    ● Design once, deploy to multiple runtime environments
    ● Functionality
      ○ Data modelling, Load Mapping, Data Vault Automation
      ○ Continuous Deployment Management
      ○ Metadata Driven ELT Execution and Concurrency Control
  • 45. Data modeling and why data vault
    ● Data vault is a modeling and development method
    ● Hub = business entity, Satellite = all descriptive details, Link = relationship (join) between entities
    ● Well-defined principles for development, naming conventions, etc.
    [Diagram: H_CUSTOMER, S_CUSTOMER, H_ORDER, S_ORDER and L_CUSTOMER_ORDER connected by one-to-many (1 : *) relationships]
  • 46. Data vault is one of the key enablers for increasing speed with a schema-on-write approach
    ● The data model is split into pieces, allowing loads in multiple steps/parts
    ● Data loads can be auto-generated
    ● Many-to-many links allow representing any business situation
    ● History of changes is stored by design in the satellite structure
    ● A standard development model makes personnel changes easier
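Data vault loads can be auto-generated because every source row is split in the same mechanical way into hub, link and satellite rows. The sketch below assumes a widely used data vault convention that the slides do not state: hash keys are the MD5 of trimmed, upper-cased, delimiter-joined business keys. The source record layout, the `load_ts` / `record_source` columns and the function names are hypothetical. A loader would insert hub and link rows only for new keys. It would add a satellite row only when `hash_diff` differs from the latest stored one, which is how the satellite keeps the change history.

```python
import hashlib


def hash_key(*business_keys):
    """MD5 over trimmed, upper-cased, '||'-joined business keys."""
    joined = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def split_order(record, load_ts, source):
    """Split one hypothetical source order row into hub, link and satellite rows."""
    customer_hk = hash_key(record["customer_id"])
    order_hk = hash_key(record["order_id"])
    meta = {"load_ts": load_ts, "record_source": source}
    return {
        "H_CUSTOMER": {"customer_hk": customer_hk, "customer_id": record["customer_id"], **meta},
        "H_ORDER": {"order_hk": order_hk, "order_id": record["order_id"], **meta},
        "L_CUSTOMER_ORDER": {
            "customer_order_hk": hash_key(record["customer_id"], record["order_id"]),
            "customer_hk": customer_hk,
            "order_hk": order_hk,
            **meta,
        },
        # Descriptive attributes plus a hash diff used to detect changes.
        "S_ORDER": {
            "order_hk": order_hk,
            "hash_diff": hash_key(record["status"], record["amount"]),
            "status": record["status"],
            "amount": record["amount"],
            **meta,
        },
    }


if __name__ == "__main__":
    row = {"customer_id": "c-1", "order_id": "o-9", "status": "shipped", "amount": "10.00"}
    for table, values in split_order(row, "2018-03-21T00:00:00Z", "webshop").items():
        print(table, values)
```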
  • 47. How to survive the data challenges
  • 48. #saddata
    ● “Data you are forced to collect even though no one wants it as a customer, no one needs it in your business and no one can find or utilize it” - Jarno Kartela, AWS Summit Stockholm, 2017
    ● So consider carefully what data you collect: all of it adds some maintenance overhead, and GDPR needs to be kept in mind
  • 49. Handling malicious data
    ● Typically not considered
    ● The source could be a third-party service or system with poor data validation/handling
    ● Probably best to create a separate landing account and run security checks on the data before pushing it forward
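What a "security check in the landing account" should contain depends on the sources. The sketch below shows a few illustrative checks for incoming CSV files; it is not a complete security scan. The checks are: expected file type, a size limit, valid UTF-8, and cells that look like spreadsheet formulas (a known CSV injection vector). Negative numbers are not flagged. The limits and the function name are hypothetical. In the landing-account setup from the slide, a check like this might run on each new upload before the file is copied onward.

```python
import csv
import io

ALLOWED_SUFFIXES = (".csv",)
MAX_BYTES = 100 * 1024 * 1024          # hypothetical 100 MB limit
FORMULA_PREFIXES = ("=", "+", "-", "@")


def _looks_numeric(cell):
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_landing_file(name, payload):
    """Return a list of problems; an empty list means the file may move on."""
    problems = []
    if not name.lower().endswith(ALLOWED_SUFFIXES):
        problems.append("unexpected file type")
    if len(payload) > MAX_BYTES:
        problems.append("file too large")
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["not valid UTF-8"]
    for row_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        for cell in row:
            # Flag formula-like cells, but allow plain numbers such as -5 or +3.
            if cell.startswith(FORMULA_PREFIXES) and not _looks_numeric(cell):
                problems.append(f"row {row_no}: possible formula injection {cell[:20]!r}")
    return problems


if __name__ == "__main__":
    print(check_landing_file("orders.csv", b"id,amount\n1,-5\n2,=1+2\n"))
```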
  • 50. Simple tasks to secure data
    ● Encrypt
      ○ S3 buckets
      ○ RDS & Redshift
      ○ EBS volumes
    ● Just block access
      ○ Network ACLs
      ○ Security groups
      ○ S3 bucket policies
    ● Set up notifications on changes
    ● Prevent opening access, e.g. with a bucket policy that denies all requests coming neither from the VPC nor from the allowed IP address:
      {
        "Version": "2008-10-17",
        "Statement": [
          {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "*",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
              "StringNotEqualsIfExists": { "aws:SourceVpc": "vpc-abcdefg" },
              "NotIpAddressIfExists": { "aws:SourceIp": [ "1.1.1.1/32" ] }
            }
          }
        ]
      }
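"Set up notifications on changes" and "prevent opening access" can be combined: whenever a bucket policy changes, check it for statements that open the bucket to everyone. The sketch below flags `Allow` statements whose principal is `"*"` (or `{"AWS": "*"}`) and that have no `Condition`. It is a deliberately narrow heuristic, not a full policy evaluator. How the check is triggered (for example from a change notification) is left out, and the function name is hypothetical.

```python
import json


def find_open_statements(policy_json):
    """Return the Sids of Allow statements that grant access to everyone without conditions."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    flagged = []
    for stmt in statements:
        principal = stmt.get("Principal")
        everyone = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and everyone and not stmt.get("Condition"):
            flagged.append(stmt.get("Sid", "<no Sid>"))
    return flagged


if __name__ == "__main__":
    public = {"Version": "2012-10-17", "Statement": [{
        "Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
        "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-bucket/*"}]}
    print(find_open_statements(json.dumps(public)))
```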
  • 51. There is no single data platform that answers all your needs
    ● How do you remove customer data from Parquet files in S3 (as required by GDPR)?
    ● How do you manage access to S3, Redshift, Tableau, etc. in a centralized manner?
    ● No centralized metadata management (maybe Glue in the future)
  • 52. Credits
    ● Harri Kallio
    ● Tero Honko