© 2020 Snowflake Inc. All Rights Reserved
FROM DATA TO INSIGHTS:
DATA ENGINEERING
WITH SNOWFLAKE
ScaleUp 360° Smart Data
29. Sept. 2020
Harald Erb | harald.erb@snowflake.com
Sr. Solutions Engineer, Central Europe
ABOUT ME
Sr. Solutions Engineer
Central Europe
harald.erb@snowflake.com
linkedin.com/in/haralderb
Enthusiastic about Business Analytics & Data Management for 20+ years
> Consulting: Delivered large-scale Data Warehouse and BI projects as Developer, Information Analyst, Solution Architect, and Project Lead at Oracle D/A/CH
> Presales #2 at Snowflake in Central Europe with a focus on Modern Data Management & Analytics
> Worked with clients on Big Data & IoT solutions as Architect and Solutions Engineer at Oracle EMEA, Pentaho and Hitachi Vantara
AGENDA
> Snowflake Cloud Data Platform – for Data Engineering
> Solution Study: Let's Build Something!
> Session Takeaway
SNOWFLAKE FOR
DATA ENGINEERING
SNOWFLAKE CLOUD DATA PLATFORM
DATA SOURCES: OLTP databases, enterprise applications, third-party, web/log data, IoT
DATA CONSUMERS: data monetization, operational reporting, ad hoc analysis, real-time analytics
Today's topic
ONE PLATFORM, ONE COPY OF DATA, MANY WORKLOADS
> Rethink transformation with robust and integrated data pipelines
> Simplify and accelerate your data lake with one platform for all your data
> Develop apps with fast and scalable analytics that delight customers
> Deliver analytics at scale with a modern data warehouse
> Empower your ecosystem to securely collaborate across all data
> Simplify and accelerate machine learning and artificial intelligence
OVERCOMING DATA SILOS WITH SNOWFLAKE
Enterprise data in one place (as much as possible), organized (e.g. in logical Data Zones) and accessible for all users
Data Sources: Structured Data, Semi-Structured Data, Web APIs, IoT Data
Data Consumers: Data Visualization / Reporting, Data Science, Ad hoc Queries
Data Zones (spanning “Data Lake” and “Data Warehouse”):
> Landing Zone: transient, ELT processes, truncate/reload
> Raw: raw data, schemaless (JSON…), no transformations, matches source data
> Conformed: Raw + de-duplicated, data type standardization (dates)
> Reference: master data, manual mappings, business hierarchies
> Modeled: integrated, cleansed, modeled data (3NF, Data Vault, Dimensional Model)
> Work Area (Exploratory, AI / ML): persistent, user/team space, one or more databases
ELASTIC SERVICE, SUPPORT FOR MULTIPLE WORKLOADS
Snowflake Shared Data, Multi-Cluster Architecture: all data available in a central repository, major workloads isolated, performance on demand, and easy data access for everybody via SQL
Prod DB: Structured & Semi-structured Data at Petabyte-Scale (all encrypted, compressed)
Workloads, each on its own compute cluster:
> Continuous Loading (4TB/day) from S3, <5min SLA: Compute Cluster “Medium”
> Batch Data Loads & Transformations: Compute Cluster “Large”
> Customer Analytics & Segmentation: Compute Cluster “2X-Large”
> Interactive Dashboard (50% < 1s, 85% < 2s, 95% < 5s): Compute Cluster Auto Scale – “X-Large” x 5
Benefits:
> Deliver Reporting SLAs
> Add teams as needed, support agile development & a data-driven culture
> Always fresh data
> Complete more tasks within the same time frame
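The interactive-dashboard targets on this slide (50% of queries under 1 s, 85% under 2 s, 95% under 5 s) can be checked against measured query latencies. The following is a minimal Python sketch of such a check; the function names and the sample latencies are illustrative assumptions, not part of the original deck.

```python
# Targets taken from the slide: (required share of queries, time limit in seconds).
SLA_TARGETS = [(0.50, 1.0), (0.85, 2.0), (0.95, 5.0)]


def share_within(latencies, limit_seconds):
    """Return the fraction of latencies that are strictly below the limit."""
    if not latencies:
        return 0.0
    return sum(1 for t in latencies if t < limit_seconds) / len(latencies)


def check_sla(latencies, targets=SLA_TARGETS):
    """Return {limit_seconds: (achieved_share, target_met)} for every target."""
    report = {}
    for required_share, limit in targets:
        achieved = share_within(latencies, limit)
        report[limit] = (achieved, achieved >= required_share)
    return report


# Hypothetical sample of ten dashboard query latencies in seconds.
sample = [0.2, 0.4, 0.6, 0.8, 0.9, 1.5, 1.8, 1.9, 3.0, 4.0]
report = check_sla(sample)
```

With this sample, the 1 s and 5 s targets are met while the 2 s target is missed (80% instead of 85%), which is the kind of signal that would prompt scaling the dashboard cluster.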
SUPPORTING CAPABILITIES FOR DATA ENGINEERING
Today's topic
SOLUTION STUDY:
LET'S BUILD SOMETHING!
SOLUTION SCENARIO: INGESTING FUEL PRICE DATA FOR ANALYSIS
Source: tankerkoenig.de
SOLUTION
ARCHITECTURE
Today's topic
SCENARIO - Part #1: DATA INGESTION WITH SNOWPIPE
Key Steps
> Integrate with AWS S3 and connect Snowflake via External Stage
> Create a Pipe for Automatic Data Ingestion
> Test Snowpipe with new data
INTEGRATE AWS S3 WITH SNOWFLAKE VIA EXTERNAL STAGE
What is… a Storage Integration and (External) Stage?
> Storage Integration: a Snowflake object that stores a generated identity and access management (IAM) entity for external cloud storage (Amazon S3, Google Cloud Storage, or Microsoft Azure), along with an optional set of allowed or blocked storage locations
> (External) Stage: a Snowflake object that encapsulates all of the information required for staging files: the S3 bucket where the files are staged; the named storage integration object or S3 credentials for the bucket (if it is protected); and an encryption key (if the files in the bucket have been encrypted)
SF Admin Task, typically
not done by developers!
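The "allowed or blocked storage locations" of a storage integration can be pictured as prefix rules on bucket URLs. The sketch below is a conceptual Python model only, not Snowflake's implementation; the function name and the bucket paths are hypothetical.

```python
def location_permitted(url, allowed, blocked=()):
    """Return True if url is under an allowed prefix and under no blocked prefix.

    Conceptual model: blocked prefixes take precedence over allowed ones.
    """
    if any(url.startswith(prefix) for prefix in blocked):
        return False
    return any(url.startswith(prefix) for prefix in allowed)


# Hypothetical locations for the fuel price scenario.
ALLOWED = ["s3://fuel-demo/prices/"]
BLOCKED = ["s3://fuel-demo/prices/private/"]

ok = location_permitted("s3://fuel-demo/prices/2020/09/file.csv", ALLOWED, BLOCKED)
denied = location_permitted("s3://fuel-demo/prices/private/keys.csv", ALLOWED, BLOCKED)
outside = location_permitted("s3://other-bucket/file.csv", ALLOWED, BLOCKED)
```

A stage that points at a location outside the allowed set, or inside a blocked one, would be rejected under this model, which is why the setup is usually left to an administrator.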
IDENTIFY DATA TO BE LOADED FROM EXTERNAL STAGE
> List the content of an S3 bucket directly from Snowflake and navigate the subfolder structure.
> Identify, inspect and select files to be loaded using “*”, RegExp, etc.
> Compute statistics on the files to be loaded into Snowflake
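Selecting staged files by pattern and computing simple statistics on them can be sketched in plain Python. This is an illustration under stated assumptions: the file names and sizes are made up, and the pattern is treated as a regular expression that must match the whole file path.

```python
import re
from collections import namedtuple

# A staged file is modelled as a (name, size in bytes) pair.
StagedFile = namedtuple("StagedFile", "name size")


def select_files(files, pattern):
    """Keep the files whose full name matches the regular expression."""
    rx = re.compile(pattern)
    return [f for f in files if rx.fullmatch(f.name)]


def summarize(files):
    """Compute basic statistics on a list of staged files."""
    sizes = [f.size for f in files]
    return {
        "count": len(sizes),
        "total_bytes": sum(sizes),
        "max_bytes": max(sizes, default=0),
    }


# Hypothetical bucket listing.
listing = [
    StagedFile("prices/2020/09/2020-09-01-prices.csv", 1200),
    StagedFile("prices/2020/09/2020-09-02-prices.csv", 1800),
    StagedFile("prices/2020/08/2020-08-31-prices.csv", 900),
    StagedFile("stations/2020/09/2020-09-01-stations.csv", 400),
]

september = select_files(listing, r"prices/2020/09/.*\.csv")
stats = summarize(september)
```

Inspecting the count and total size before loading helps catch an overly broad pattern before any data reaches the target table.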
AUTOMATIC DATA INGESTION WITH SNOWPIPE
Bulk load command
Target table to be updated
What is… Snowpipe?
> Snowpipe loads data from files as soon as they are available in a stage. Data is loaded in micro-batches and becomes available to users within minutes, so there is no need to manually execute COPY statements on a schedule to load larger batches.
> Alternative: Clients can call public Snowpipe REST endpoints to load data and retrieve load history reports
Source location, external stage (e.g. S3 Bucket)
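The micro-batch behaviour can be modelled in a few lines of Python. This is a toy model, not the Snowpipe service: the class name, method names and file names are hypothetical, and it assumes the pipe remembers which files it has already loaded so that a repeated notification does not load them twice.

```python
class MiniPipe:
    """Toy model of a pipe that loads newly staged files into a target table."""

    def __init__(self):
        self.load_history = set()  # names of files already loaded
        self.target_table = []     # rows appended by the pipe

    def notify(self, staged_files):
        """Load every file not seen before; return the names that were loaded.

        staged_files maps a file name to the list of rows it contains.
        """
        loaded = []
        for name in sorted(staged_files):
            if name in self.load_history:
                continue  # already loaded, skip to avoid duplicates
            self.target_table.extend(staged_files[name])
            self.load_history.add(name)
            loaded.append(name)
        return loaded


pipe = MiniPipe()
first = pipe.notify({"prices-01.csv": [("station-a", 1.19)]})
second = pipe.notify({
    "prices-01.csv": [("station-a", 1.19)],  # same file announced again
    "prices-02.csv": [("station-a", 1.21)],
})
```

Each notification triggers a small load, so the table grows file by file instead of waiting for a scheduled bulk load.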
UPLOAD NEW DATA TO S3 & CHECK STATUS OF SNOWPIPE
VALIDATE RESULT IN SNOWSIGHT DASHBOARD
SCENARIO - Part #2: AUTOMATED RETRIEVAL + PROCESSING OF API DATA
Key Steps
> Integrate AWS Lambda Function
> Automate API Calls + store Payloads (JSON)
> Implement Change Data Capture
> Automate JSON flattening + Data Loading
INTEGRATE AWS LAMBDA WITH SNOWFLAKE VIA EXTERNAL FUNCTION
What is… an API Integration and External Function?
> API Integration (Preview Feature): a Snowflake object that stores information about an HTTPS proxy service, including the cloud platform provider (e.g. AWS), the type of proxy service (in case the provider offers more than one), and the identifier and access credentials
> External Function (Preview Feature): Snowflake does not call a remote service directly. Instead, Snowflake calls the remote service through a cloud provider’s native HTTPS proxy service, for example API Gateway on AWS
SF Admin Task, typically
not done by developers
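On the AWS side, the remote service behind the proxy can be a Lambda function. The sketch below assumes the row-batch JSON format documented for Snowflake external functions (a `data` array whose rows start with a row number) and an API Gateway proxy integration that passes the request body as a string. `fetch_prices` is a hypothetical stub standing in for the real HTTP call to the fuel price API.

```python
import json


def fetch_prices(station_id):
    """Hypothetical stand-in for the real HTTP request to the price API."""
    return {"station": station_id, "status": "stub"}


def lambda_handler(event, context):
    """Answer one batch of rows sent through the HTTPS proxy service."""
    try:
        rows = json.loads(event["body"])["data"]
        # Each input row is [row_number, argument]; echo the row number back
        # so that every result can be matched to its input row.
        results = [[row[0], fetch_prices(row[1])] for row in rows]
        return {"statusCode": 200, "body": json.dumps({"data": results})}
    except (KeyError, IndexError, TypeError, ValueError) as exc:
        # Malformed requests are reported as a client error.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
```

Keeping the handler free of Snowflake-specific dependencies means it can be exercised locally with a hand-built event before it is wired to the API integration.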
AUTOMATE API REQUESTS WITH TASK #1
Automation task with dedicated compute (TASK_WAREHOUSE), a schedule, and no dependencies
What is… a Task?
> User-defined tasks allow scheduled execution of SQL statements. Tasks run according to a specified execution configuration, using a set interval or a flexible schedule expressed in a subset of the familiar cron utility syntax.
> There is no event source that can trigger a task; instead, a task runs on a schedule, which can be defined when creating the task (using CREATE TASK) or later (using ALTER TASK)
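Because a task is driven by its schedule rather than by events, its run times are fully determined by a start time and an interval. A minimal Python sketch of an interval schedule follows; the function name, the five-minute interval and the start time are illustrative assumptions.

```python
from datetime import datetime, timedelta


def next_runs(start, interval_minutes, count):
    """Return the next `count` run times of a fixed-interval schedule."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    return [start + step * i for i in range(1, count + 1)]


# Hypothetical task resumed at 08:00 with a 5-minute interval.
runs = next_runs(datetime(2020, 9, 29, 8, 0), 5, 3)
```

An API-polling task on such a schedule calls the price service at 08:05, 08:10 and 08:15 regardless of whether anything else happened in between.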
API PAYLOAD RETRIEVED
Fuel price data of multiple gas stations
ACTIVATE CHANGE DATA CAPTURE WITH STREAMS
Source table where data record changes should be tracked
SQL query on a table stream to view which records have been added, changed, or deleted
What is… a Stream?
> An individual table stream tracks the changes made to rows in a source table. A table stream makes available a “change table” of what changed, at the row level, between two transactional points in time in a table.
> A stream itself does not contain any table data; it only stores the offset for the source table and returns CDC records by leveraging the table's versioning history.
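The offset idea can be modelled with an append-only change log and a stream that stores nothing but a position in it. This is a conceptual Python sketch, not how Snowflake implements streams; all class and method names are hypothetical.

```python
class VersionedTable:
    """Toy table that keeps an append-only log of row changes."""

    def __init__(self):
        self.changes = []  # list of (action, row) tuples

    def insert(self, row):
        self.changes.append(("INSERT", row))

    def delete(self, row):
        self.changes.append(("DELETE", row))

    @property
    def version(self):
        return len(self.changes)


class Stream:
    """Holds only an offset into the table's change log, never the data."""

    def __init__(self, table):
        self.table = table
        self.offset = table.version  # changes before creation are not tracked

    def has_data(self):
        return self.offset < self.table.version

    def peek(self):
        """Read pending change records without advancing the offset."""
        return self.table.changes[self.offset:]

    def consume(self):
        """Return pending change records and advance the offset."""
        records = self.peek()
        self.offset = self.table.version
        return records


table = VersionedTable()
table.insert({"id": 1})            # happens before the stream exists
stream = Stream(table)
table.insert({"id": 2})
table.delete({"id": 1})
pending = stream.peek()            # reading alone does not clear the stream
consumed = stream.consume()        # consuming advances the offset
```

Peeking twice returns the same records, while consuming once leaves the stream empty, which mirrors the difference between querying a stream and using it in a committed data-changing statement.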
AUTOMATE DELTA LOAD WITH STREAMS AND TASK #2
Task will only start if the table stream has new data records to process → saves compute resources!
Only CDC data records of interest will be processed and then cleared from the stream when committed
Lateral view and flatten table function used to split price data by gas station and store it as separate records in the target table REMOTE_FUEL_PRICES
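The flattening step and the "run only when there is new data" condition can be sketched together in Python. The payload shape used here (a top-level `prices` object keyed by station id) is an assumption about the API response, and the function names are hypothetical; only the target table name comes from the slide.

```python
def flatten_prices(payload):
    """Split one API payload into one record per gas station.

    Each record keeps the station's prices as a nested JSON-like object.
    """
    return [
        {"station_id": station_id, "prices": prices}
        for station_id, prices in payload.get("prices", {}).items()
    ]


def run_delta_load(pending_payloads, target_table):
    """Process pending payloads; do nothing if there is no new data."""
    if not pending_payloads:
        return 0  # nothing to do, no compute spent
    inserted = 0
    for payload in pending_payloads:
        rows = flatten_prices(payload)
        target_table.extend(rows)
        inserted += len(rows)
    pending_payloads.clear()  # consumed records are cleared after the load
    return inserted


# Hypothetical payload with two stations.
pending = [{
    "ok": True,
    "prices": {
        "station-a": {"status": "open", "e5": 1.25, "diesel": 1.05},
        "station-b": {"status": "closed"},
    },
}]
remote_fuel_prices = []
first_run = run_delta_load(pending, remote_fuel_prices)
second_run = run_delta_load(pending, remote_fuel_prices)
```

The second run finds nothing pending and returns immediately, which is the behaviour that saves compute when no new payload has arrived.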
STREAM CLEARED & PRICE DATA READY FOR ANALYSIS!
New fuel prices prepared and stored in target table REMOTE_FUEL_PRICES (still in JSON format)
Query of the table stream returns no rows because the stream was cleared after the successful INSERT into the target table (auto-committed)
SCENARIO - Part #3: DATA CONSOLIDATION + VISUALIZATION
Key Steps
> Consolidate Data for Analysis
> Query + visualize data for a given Gas Station in Germany
> Analyze Snowflake Consumption
COMBINING HISTORIC DATA WITH API DATA
Reading, formatting and joining JSON price data directly with master data
Putting it all together: historic data from the dimensional model combined with current price data
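The consolidation step amounts to a join of the JSON price records with station master data, followed by a union with the historic rows. A minimal Python sketch, with hypothetical field names, station names and sample values:

```python
def join_with_master(price_rows, stations):
    """Inner-join current JSON price records with station master data."""
    joined = []
    for row in price_rows:
        station = stations.get(row["station_id"])
        if station is None:
            continue  # no master data for this station, skip the record
        joined.append({
            "station_id": row["station_id"],
            "name": station["name"],
            "city": station["city"],
            "diesel": row["prices"].get("diesel"),
            "source": "api",
        })
    return joined


def consolidate(historic_rows, current_rows):
    """Union historic rows with current rows into one result set."""
    return list(historic_rows) + list(current_rows)


# Hypothetical sample data.
stations = {"station-a": {"name": "Example Station", "city": "Berlin"}}
current = [
    {"station_id": "station-a", "prices": {"diesel": 1.05}},
    {"station_id": "station-x", "prices": {"diesel": 1.10}},
]
historic = [{"station_id": "station-a", "name": "Example Station",
             "city": "Berlin", "diesel": 1.02, "source": "history"}]

combined = consolidate(historic, join_with_master(current, stations))
```

Tagging each row with its source keeps the historic and current parts distinguishable in the visualization.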
ANALYSIS & VISUALIZATION FOR A GIVEN GAS STATION
PAY AS YOU USE + BUILT-IN COST TRANSPARENCY
Snowflake Default Billing & Usage Dashboard
Snowpipe Usage History queried via SQL
SESSION TAKEAWAY
A COMPLETE AND EASY-TO-USE DATA PLATFORM
Snowflake supports Data Lake, Data Warehouse, and Data Engineering workloads
Data Sources: Structured Data, Semi-Structured Data, Web APIs, IoT Data
Consumers: Visualization / Reporting, Data Science, Ad hoc Queries
Layers: Data Sources, Stage, Data Lake, Warehouse, Aggregation, Semantic / Federated, Presentation / Consumers
Capabilities:
> JSON, AVRO (VARIANT)
> Hive Metastore Integration
> External Tables
> Parquet Load/Unload
> ANSI SQL
> Elastic Multi-Cluster Compute
> Data Vault, 3NF Modeling
> Dimensional Modeling
> ACID Transactional Consistency
> Secure Views / Data Masking
> Materialized Views
> Zero Copy Cloning
> SSO, LDAP, OAUTH, SCIM
> ODBC/JDBC
> Python/R/Spark Connector
> Web UI
> External Functions
> Data Sharing / Marketplace
> Streams (CDC) & Tasks (Scheduler)
> Time Travel
> Kafka-Connector / Snowpipe
> Stored Procs / UDFs
> Geospatial
> Information Schema
> End-to-End Security (RBAC, Encryption at Rest/in Motion)
SNOWFLAKE FOR DATA ENGINEERING
ALL DATA, ANY SPEED
BETTER PRICE & PERFORMANCE
NO SUPER POWERS REQUIRED
Structured & Semi-Structured Data
Batch & Continuous Data Ingestion
Partner Ecosystems
Dedicated Resources
Auto Scaling
SQL-based
Single Platform with Near-Zero Maintenance
Streams & Tasks
THANK YOU

Snowflake for Data Engineering

  • 1. © 2020 Snowflake Inc. All Rights Reserved FROM DATA TO INSIGHTS: DATA ENGINEERING MIT SNOWFLAKE ScaleUp 360° Smart Data 29. Sept. 2020 Harald Erb | harald.erb@snowflake.com Sr. Solutions Engineer, Central Europe
  • 2. © 2020 Snowflake Computing Inc. All Rights Reserved ABOUT ME Sr. Solutions Engineer Central Europe harald.erb@snowflake.com Llinkedin.com/in/haralderb Enthusiastic about Business Analytics & Data Management for 20+ years > Consulting: Delivered large-scale Data Warehouse and BI projects as Developer, Information Analyst, Solution Architect, Project Lead at Oracle D/A/CH > Presales #2 at Snowflake in Central Europe with focus on Modern Data Management & Analytics > Worked with clients on Big Data & IoT solutions as Architect and Solutions Engineer at Oracle EMEA, Pentaho and Hitachi Vantara
  • 3. © 2020 Snowflake Computing Inc. All Rights Reserved AGENDA > Snowflake Cloud Data Platform – for Data Engineering > Solution Study: Let‘s Build Something! > Session Takeaway
  • 4. © 2020 Snowflake Inc. All Rights Reserved. SNOWFLAKE FOR DATA ENGINEERING © 2020 Snowflake Computing Inc. All Rights Reserved
  • 5. © 2020 Snowflake Inc. All Rights Reserved SNOWFLAKE CLOUD DATA PLATFORM 5 OLTP DATABASES ENTERPRISE APPLICATIONS THIRD-PARTY WEB/LOG DATA IoT DATA MONETIZATION OPERATIONAL REPORTING AD HOC ANALYSIS REAL-TIME ANALYTICS DATA SOURCES DATA CONSUMERS Thema heute
  • 6. © 2020 Snowflake Inc. All Rights Reserved Rethink transformation with robust and integrated data pipelines Simplify and accelerate your data lake with one platform for all your data Develop apps with fast and scalable analytics that delight customers Deliver analytics at scale with a modern data warehouse Empower your ecosystem to securely collaborate across all data Simplify and accelerate machine learning and artificial intelligence ONE PLATFORM, ONE COPY OF DATA, MANY WORKLOADS 6
  • 7. © 2020 Snowflake Computing Inc. All Rights Reserved OVERCOMING DATA SILOS WITH SNOWFLAKE Data Sources Data Consumers Structured Data Semi-Structured Data Web APIs IoT Data Data Visualization / Reporting Data Science Ad hoc Queries Data Zones Enterprise data in one place (as much as possible), organized (e.g. in logical Data Zones) and accessible for all users Work Area (Exploratory, AI / ML) Persistent, user/team space, one or more Databases Landing Zone Transient, ELT processes, truncate/reload Raw Raw data, schema- less (JSON…): no transformations, matches source data Conformed Raw + de-duplicated, data type standardization (dates) Reference Master data, , manual mappings, Business hierarchies Modeled Integrated, cleansed, modeled data (3NF, Data Vault, Dimensional Model) “Data Lake" “Data Warehouse”
  • 8. ELASTIC SERVICE, SUPPORT FOR MULTIPLE WORKLOADS. Diagram labels: Continuous Loading (4TB/day), S3, <5min SLA; Compute Cluster “Medium”; Batch Data Loads & Transformations; Compute Cluster “Large”; Compute Cluster “2X-Large”; Customer Analytics & Segmentation; Interactive Dashboard (50% < 1s, 85% < 2s, 95% < 5s); Compute Cluster Auto Scale – “X-Large” x 5; Prod DB. Snowflake Shared Data, Multi-Cluster Architecture: all data available in a central repository, major workloads isolated, performance on demand, and easy data access for everybody via SQL. Benefit: Deliver reporting SLAs. Benefit: Add teams as needed, support agile development & a data-driven culture. Benefit: Always fresh data. Benefit: Complete more tasks within the same time frame. Structured & Semi-structured Data at Petabyte-Scale (all encrypted, compressed)
  • 9. SUPPORTING CAPABILITIES FOR DATA ENGINEERING. Callout: Today's topic
  • 10. SOLUTION STUDY: LET'S BUILD SOMETHING!
  • 11. SOLUTION SCENARIO: INGESTING FUEL PRICE DATA FOR ANALYSIS. Source: tankerkoenig.de
  • 12. SOLUTION ARCHITECTURE. Callout: Today's topic
  • 13. SCENARIO - Part #1: DATA INGESTION WITH SNOWPIPE. Key Steps > Integrate with AWS S3 and connect Snowflake via External Stage > Create a Pipe for Automatic Data Ingestion > Test Snowpipe with new data
  • 14. INTEGRATE AWS S3 WITH SNOWFLAKE VIA EXTERNAL STAGE. What is… a Storage Integration and (External) Stage? > Storage Integration: a Snowflake object that stores a generated identity and access management (IAM) entity for external cloud storage, along with an optional set of allowed or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure) > (External) Stage: a Snowflake object that encapsulates all of the information required for staging files: the S3 bucket where the files are staged; the named storage integration object or S3 credentials for the bucket (if it is protected); an encryption key (if the files in the bucket have been encrypted). Note: Snowflake admin task, typically not done by developers!
  • 15. IDENTIFY DATA TO BE LOADED FROM EXTERNAL STAGE: List the content of an S3 bucket directly from Snowflake and navigate the subfolder structure. Identify, inspect and select files to be loaded using “*”, RegExp, etc. Compute statistics on files to be loaded into Snowflake.
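The file selection step on this slide can be illustrated outside Snowflake. The following is a minimal Python sketch, not the SQL shown on the slide: it assumes a hypothetical stage listing of (file name, size) pairs, filters it with a regular expression, and computes simple statistics on the selected files. All file names, sizes, and function names are made up for illustration.

```python
import re

# Hypothetical stage listing: (file name, size in bytes). Names and sizes are invented.
STAGE_FILES = [
    ("prices/2020/09/2020-09-01-prices.csv", 1200),
    ("prices/2020/09/2020-09-02-prices.csv", 1500),
    ("stations/2020/09/2020-09-01-stations.csv", 800),
    ("prices/2020/09/readme.txt", 100),
]


def select_files(files, pattern):
    """Keep the entries whose complete name matches the regular expression."""
    regex = re.compile(pattern)
    return [(name, size) for name, size in files if regex.fullmatch(name)]


def summarize(files):
    """Compute basic statistics on the files that would be loaded."""
    sizes = [size for _, size in files]
    return {"file_count": len(sizes), "total_bytes": sum(sizes)}


# Select only the price files, then inspect how much data would be loaded.
selected = select_files(STAGE_FILES, r"prices/.*-prices\.csv")
summary = summarize(selected)
print(summary)
```

Matching against the complete name is a choice made for this sketch; it keeps the pattern from accidentally picking up files in other folders.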
  • 16. AUTOMATIC DATA INGESTION WITH SNOWPIPE. Callouts: Bulk load command; Target table to be updated; Source location, external stage (e.g. S3 bucket). What is… Snowpipe? > Snowpipe enables loading data from files as soon as they're available in a stage. Data can be loaded from files in micro-batches, making it available to users within minutes, rather than manually executing COPY statements on a schedule to load larger batches. > Alternative: Clients can call public Snowpipe REST endpoints to load data and retrieve load history reports
  • 17. UPLOAD NEW DATA TO S3 & CHECK STATUS OF SNOWPIPE
  • 18. VALIDATE RESULT IN SNOWSIGHT DASHBOARD
  • 19. SCENARIO - Part #2: AUTOMATED RETRIEVAL + PROCESSING OF API DATA. Key Steps > Integrate AWS Lambda Function > Automate API Calls + store Payloads (JSON) > Implement Change Data Capture > Automate JSON flattening + Data Loading
  • 20. INTEGRATE AWS LAMBDA WITH SNOWFLAKE VIA EXTERNAL FUNCTION. What is… an API Integration and External Function? > API Integration (Preview Feature): an object that stores information about an HTTPS proxy service, including: the cloud platform provider (e.g. Amazon AWS); the type of proxy service (in case the cloud platform provider offers more than one type of proxy service); identifier and access credentials > External Function (Preview Feature): Snowflake does not call a remote service directly. Instead, Snowflake calls the remote service through a cloud provider's native HTTPS proxy service, for example API Gateway on AWS. Note: Snowflake admin task, typically not done by developers
  • 21. AUTOMATE API REQUESTS WITH TASK #1. Callout: Automation task with dedicated compute (TASK_WAREHOUSE), a schedule and no dependencies. What is… a Task? > User-defined tasks allow scheduled execution of SQL statements. Tasks run according to a specified execution configuration, using any combination of a set interval and/or a flexible schedule using a subset of familiar cron utility syntax. > There is no event source that can trigger a task; instead, a task runs on a schedule, which can be defined when creating a task (using CREATE TASK) or later (using ALTER TASK)
  • 22. API PAYLOAD RETRIEVED: Fuel price data of multiple gas stations
  • 23. API PAYLOAD RETRIEVED: Fuel price data of multiple gas stations
  • 24. ACTIVATE CHANGE DATA CAPTURE WITH STREAMS. Callouts: Source table where data record changes should be tracked; SQL query on a table stream to view which records have been added, changed, or deleted. What is… a Stream? > An individual table stream tracks the changes made to rows in a source table. A table stream makes a “change table” available of what changed, at the row level, between two transactional points of time in a table. > A stream itself does not contain any table data; it only stores the offset for the source table and returns CDC records by leveraging the versioning history.
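The offset idea behind a table stream can be shown with a toy model. The Python sketch below is only a conceptual illustration, not Snowflake's implementation: `VersionedTable` and `TableStream` are hypothetical names, only inserts are modeled (no updates or deletes), and the "versioning history" is a plain list of rows tagged with a version number.

```python
class VersionedTable:
    """Toy append-only table that remembers which version added each row."""

    def __init__(self):
        self.rows = []      # list of (version, row) pairs
        self.version = 0    # increases with every insert

    def insert(self, row):
        self.version += 1
        self.rows.append((self.version, row))


class TableStream:
    """Holds only an offset into the table's history, never a copy of the data."""

    def __init__(self, table):
        self.table = table
        self.offset = table.version  # start tracking from the current version

    def changes(self):
        # Reading the stream does not move the offset.
        return [row for version, row in self.table.rows if version > self.offset]

    def consume(self):
        # Consuming the changes (in Snowflake: a committed DML statement)
        # advances the offset, so the same records are not returned again.
        pending = self.changes()
        self.offset = self.table.version
        return pending


prices = VersionedTable()
prices.insert({"station": "A", "diesel": 1.05})   # exists before the stream
stream = TableStream(prices)
prices.insert({"station": "B", "diesel": 1.07})   # tracked by the stream

first_read = stream.changes()
consumed = stream.consume()
after_consume = stream.changes()
print(first_read, after_consume)
```

Because the stream stores nothing but an offset, creating several streams on the same table costs almost nothing in this model; each one simply remembers its own position.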
  • 25. AUTOMATE DELTA LOAD WITH STREAMS AND TASK #2. Callouts: The task will only start if the table stream has new data records to process, which saves compute resources! Only CDC data records of interest are processed and then cleared from the stream when committed. A lateral view and the flatten table function are used to split price data by gas station and store it as separate records in the target table REMOTE_FUEL_PRICES.
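The delta load on this slide combines two ideas: skip the run when nothing is pending, and split one payload into one record per gas station. A minimal Python sketch of both follows. It is an analogue of the logic, not the task's SQL, and the payload shape (a `prices` object keyed by station id), the station ids, and the function names are assumptions made for illustration.

```python
import json


def flatten_prices(payload):
    """Split one JSON payload into one record per gas station."""
    doc = json.loads(payload)
    return [
        {"station_id": station_id, **price}
        for station_id, price in doc["prices"].items()
    ]


def run_delta_load(pending_payloads, target):
    """Mimic the task: do nothing without new data, else flatten, store, clear."""
    if not pending_payloads:
        return False  # nothing to process, so no compute is spent
    for payload in pending_payloads:
        target.extend(flatten_prices(payload))
    pending_payloads.clear()  # the processed change records are gone afterwards
    return True


# Hypothetical payload with two stations; ids and prices are invented.
sample = json.dumps({
    "prices": {
        "station-a": {"diesel": 1.05, "e5": 1.29},
        "station-b": {"diesel": 1.07, "e5": 1.31},
    }
})

pending = [sample]
remote_fuel_prices = []
first_run = run_delta_load(pending, remote_fuel_prices)
second_run = run_delta_load(pending, remote_fuel_prices)
print(first_run, second_run, remote_fuel_prices)
```

The second call returns without doing any work, which mirrors the point of the slide: a run that finds no pending change records costs nothing.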
  • 26. STREAM CLEARED & PRICE DATA READY FOR ANALYSIS! Callouts: New fuel prices prepared and stored in target table REMOTE_FUEL_PRICES (still in JSON format). A query on the table stream returns no rows because the stream was cleared after the successful INSERT into the target table (auto-committed).
  • 27. SCENARIO - Part #3: DATA CONSOLIDATION + VISUALIZATION. Key Steps > Consolidate Data for Analysis > Query + visualize data for a given Gas Station in Germany > Analyze Snowflake Consumption
  • 28. COMBINING HISTORIC DATA WITH API DATA. Callouts: Reading, formatting and joining JSON price data directly with master data. Putting it all together: historic data from the dimensional model combined with current price data.
  • 29. ANALYSIS & VISUALIZATION FOR A GIVEN GAS STATION
  • 30. PAY AS YOU USE + BUILT-IN COST TRANSPARENCY: Snowflake Default Billing & Usage Dashboard; Snowpipe Usage History queried via SQL
  • 31. SESSION TAKEAWAY
  • 32. A COMPLETE AND EASY-TO-USE DATA PLATFORM. Snowflake supports Data Lake, Data Warehouse, and Data Engineering workloads. Diagram labels: Structured Data Semi-Structured Data Web APIs IoT Data Visualization / Reporting Data Science Ad hoc Queries Data Sources Stage Presentation / Consumers JSON, AVRO (VARIANT) Hive Metastore Integration External Tables Parquet Load/Unload ANSI SQL Data Lake Warehouse Aggregation Semantic / Federated Elastic Multi-Cluster Compute Data Vault, 3NF Modeling ACID Transactional Consistency Secure Views / Data Masking Materialized Views Zero Copy Cloning SSO LDAP OAUTH SCIM ODBC/JDBC Python/R/Spark Connector End-to-End Security (RBAC, Encryption at Rest/in Motion) Web UI External Functions Data Sharing / Marketplace Streams (CDC) & Tasks (Scheduler) Time Travel Kafka-Connector / Snowpipe Stored Procs / UDFs Geospatial Dimensional Modeling Information Schema
  • 33. SNOWFLAKE FOR DATA ENGINEERING: ALL DATA, ANY SPEED; BETTER PRICE & PERFORMANCE; NO SUPER POWERS REQUIRED. Structured & Semi-Structured Data; Batch & Continuous Data Ingestion; Partner Ecosystems; Dedicated Resources; Auto Scaling; SQL-based; Single Platform with Near-Zero Maintenance; Streams & Tasks
  • 34. THANK YOU