• Snowflake is a cloud data warehouse provided as SaaS, with full support for ANSI SQL and for both structured and semi-structured data.
• Users can create tables and start querying data with little administration.
• Combines the traditional shared-disk and shared-nothing architectures to offer the best of both.
[Figure: Shared Nothing Architecture and Shared Disk Architecture]
• Snowflake offers unlimited storage scalability without refactoring. Multiple clusters can read or write shared data, and clusters can be resized instantly with no downtime.
• Full transactional consistency (ACID) across the entire system.
• Logical assets such as servers, buckets, etc. are managed centrally.
Snowflake is supported by 3 different layers
• Storage, Compute and Cloud Services
• Snowflake processes queries using the MPP (massively parallel processing) concept: each node stores part of the data locally, while a central data repository holds the data that is accessible to all compute nodes.
• Snowflake architecture consists of 3 layers
1. Data Storage
2. Query Processing
3. Cloud Services
Database Storage Layer:
• Snowflake organizes data into multiple micro-partitions that are internally optimized and compressed, and stores them in a columnar format. Data is kept in cloud storage and works as a shared-disk model, which simplifies data management.
• Compute nodes connect to the storage layer to fetch data for querying, since the storage layer is independent. Because Snowflake is provisioned on the cloud, storage is elastic and the user pays only per TB stored each month.
Query Layer:
• Snowflake uses virtual warehouses to run queries. A distinguishing feature of Snowflake is that it separates the query processing layer from disk storage.
Cloud Service Layer:
All activities such as authentication, security, metadata management of loaded data and query optimization are coordinated across this layer.
Benefits:
Cloud Services:
• Multi-tenant, transactional and secure
• Runs in the AWS cloud
• Millions of queries per day over petabytes of data
• Replicated for availability and scalability
• Focus on ease of use and service experience
• Collection of services such as access control, query optimizer and transaction manager
1. Extract data from Oracle to CSV using SQL*Plus
2. Data type conversion and other transformations
3. Stage files to S3
4. Finally, copy staged files to Snowflake tables
Step 1: Code:
-- Turn on the spool
spool spool_file.txt
select * from dba_table;
-- Turn off the spool
spool off
Note: The spool file will not be available until spooling is turned off.
#!/usr/bin/bash
# Spool the STUDENTS table to a pipe-delimited file
FILE="students.csv"
sqlplus -s user_name/password@oracle_db <<EOF
SET PAGESIZE 35000
SET COLSEP "|"
SET LINESIZE 230
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM STUDENTS;
SPOOL OFF
EXIT
EOF

#!/usr/bin/bash
# Spool the EMP table to a comma-delimited file
FILE="emp.csv"
sqlplus -s scott/tiger@XE <<EOF
SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM EMP;
SPOOL OFF
EXIT
EOF
Step 1.2:
• For an incremental load, we need to generate SQL with a proper condition to select only the records that were modified after the last data pull.
Query: select * from students where last_modified_time > last_pull_time and last_modified_time <= sys_time;
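The incremental condition above can be sketched as a small helper that builds the query with bind variables instead of pasting timestamps into the SQL text. This is a minimal sketch, not part of the original deck: the function name `build_incremental_query` and the `:name` bind style are assumptions chosen for illustration, and the column name `last_modified_time` is taken from the query above.

```python
from datetime import datetime


def build_incremental_query(table, last_pull_time, sys_time):
    """Return (sql, binds) selecting rows modified in (last_pull_time, sys_time]."""
    # Guard against anything that is not a plain identifier being used as a table name.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    sql = (
        f"select * from {table} "
        "where last_modified_time > :last_pull_time "
        "and last_modified_time <= :sys_time"
    )
    binds = {"last_pull_time": last_pull_time, "sys_time": sys_time}
    return sql, binds


# Hypothetical pull window: everything modified during 1 January 2020.
sql, binds = build_incremental_query(
    "students",
    last_pull_time=datetime(2020, 1, 1, 0, 0, 0),
    sys_time=datetime(2020, 1, 2, 0, 0, 0),
)
print(sql)
print(binds)
```

Using an open lower bound and a closed upper bound means that the next pull can start exactly at this run's `sys_time` without re-reading or skipping rows.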
Step 2: Below are the recommendations for data type conversion from Oracle to Snowflake.
[Table: Oracle to Snowflake data type conversions]
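As a rough illustration of what a type conversion step can look like in code, the sketch below maps a few Oracle column types to Snowflake types. The specific pairs are commonly used choices and are assumptions made for this example, not the mapping from the original slide; the names `ORACLE_TO_SNOWFLAKE` and `to_snowflake_type` are hypothetical. Oracle `DATE` is mapped to `TIMESTAMP_NTZ` here because an Oracle `DATE` also carries a time of day.

```python
import re

# Illustrative mapping only (an assumption, not an official or complete list).
ORACLE_TO_SNOWFLAKE = {
    "NUMBER": "NUMBER",
    "VARCHAR2": "VARCHAR",
    "NVARCHAR2": "VARCHAR",
    "CHAR": "CHAR",
    "CLOB": "VARCHAR",
    "DATE": "TIMESTAMP_NTZ",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "RAW": "BINARY",
    "BLOB": "BINARY",
}

# Types whose length or precision arguments carry over unchanged.
KEEP_ARGS = {"NUMBER", "VARCHAR2", "NVARCHAR2", "CHAR"}


def to_snowflake_type(oracle_type):
    """Translate a type such as 'VARCHAR2(50)' into its Snowflake counterpart."""
    match = re.fullmatch(r"\s*([A-Za-z0-9_]+)\s*(\(.*\))?\s*", oracle_type)
    if match is None:
        raise ValueError(f"cannot parse Oracle type: {oracle_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base not in ORACLE_TO_SNOWFLAKE:
        raise KeyError(f"no mapping for Oracle type {base}")
    target = ORACLE_TO_SNOWFLAKE[base]
    return target + (args if base in KEEP_ARGS else "")


for column_type in ["VARCHAR2(50)", "number(10,2)", "DATE", "BLOB"]:
    print(column_type, "->", to_snowflake_type(column_type))
```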
Step 3:
• To load data into Snowflake, the data needs to be uploaded to an S3 location (Step 1 explains how to extract Oracle data to flat files).
• We need a Snowflake instance that runs on AWS. This instance needs to be able to access the S3 files in AWS.
• This access can be either internal or external, and this process is called staging.
Create an internal stage:
create or replace stage my_oracle_stage
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);
Use the PUT command below to stage files to an internal Snowflake stage:
PUT file://path_to_your_file/your_filename @internal_stage_name
Upload a file items_data.csv in the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage:
put file:///tmp/oracle_data/data/items_data.csv @oracle_stage;
Ref :
https://docs.snowflake.net/manuals/sql-reference/sql/put.html
Step 3: (External staging options)
• Snowflake supports any accessible Amazon S3 or Microsoft Azure location as an external staging location. You can create a stage pointing to that location, and data can be loaded directly into the Snowflake table through that stage. There is no need to move the data to an internal stage.
• To create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required.
If data needs to be decrypted before loading into Snowflake, the proper keys must be provided.
create or replace stage oracle_ext_stage
url='s3://snowflake_oracle/data/load/files/'
credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c'
aws_secret_key='aii998nnrcd4kx5y6z')
encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');
Once data is extracted from Oracle, it can be uploaded to S3 using the direct upload option or using the AWS SDK in your favourite programming language. Python's boto3 is a popular choice in such circumstances. Once the data is in S3, an external stage can be created to point to that location.
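A minimal sketch of the upload step with boto3 might look like the following. It assumes the bucket and prefix from the external stage URL above and the file path from the PUT example; the helper names `s3_key_for` and `upload_extracts` are hypothetical. The only boto3 call relied on is the S3 client's `upload_file(Filename, Bucket, Key)`. So that the sketch runs without AWS access, the demo uses a small stand-in client with the same call shape.

```python
import os


def s3_key_for(local_path, prefix):
    """Build the object key: the stage prefix plus the file's base name."""
    return prefix.rstrip("/") + "/" + os.path.basename(local_path)


def upload_extracts(s3_client, bucket, prefix, local_paths):
    """Upload each extracted file and return the object keys that were written."""
    keys = []
    for path in local_paths:
        key = s3_key_for(path, prefix)
        # Same positional order as boto3's S3 client: Filename, Bucket, Key.
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys


class RecordingClient:
    """Stand-in that only records calls, so the example runs without AWS."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Filename, Bucket, Key))


# With boto3 installed and credentials configured, the client would instead be:
#   import boto3
#   client = boto3.client("s3")
client = RecordingClient()
uploaded = upload_extracts(
    client,
    bucket="snowflake_oracle",
    prefix="data/load/files/",
    local_paths=["/tmp/oracle_data/data/items_data.csv"],
)
print(uploaded)
```

Passing the client in as a parameter keeps the upload logic testable and leaves credential handling to the usual AWS configuration chain rather than to the script.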
Step 4: Copy staged files to Snowflake table
• So far, we have extracted data from Oracle, uploaded it to an S3 location and created an external Snowflake stage pointing to that location. The next step is to copy the data into the table. The command used to do this is COPY INTO. Note: To execute the COPY INTO command, compute resources in Snowflake virtual warehouses are required, and your Snowflake credits will be consumed.
• To load from a named internal stage
copy into oracle_table
from @oracle_stage;
• Loading from the external stage. Only one file is specified.
copy into my_ext_stage_table
from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
• A copy directly from an external location without creating a stage
copy into oracle_table
from s3://mybucket/oracle_snow/data/files
credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
file_format = (format_name = csv_format);
Files can be specified using patterns:
copy into oracle_pattern_table
from @oracle_stage
file_format = (type = 'CSV')
pattern='.*/.*/.*[.]csv[.]gz';
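To see which staged files a pattern like the one above would select, the regular expression can be tried locally. The sketch below uses Python's `re` module as a stand-in for Snowflake's matching and assumes the pattern has to match the whole file path; the sample paths are made up for illustration.

```python
import re

# Same regular expression as in the COPY INTO example.
PATTERN = r".*/.*/.*[.]csv[.]gz"


def matches(path):
    """True when the whole path matches the pattern (not just a substring)."""
    return re.fullmatch(PATTERN, path) is not None


candidates = [
    "2019/12/items_data.csv.gz",  # two directory levels, gzipped CSV
    "items_data.csv.gz",          # no directory levels
    "2019/12/items_data.csv",     # not gzipped
    "2019/12/items_datacsvxgz",   # '[.]' requires literal dots
]
for path in candidates:
    print(path, matches(path))
```

Only the first path matches: the pattern requires at least two `/` separators and a literal `.csv.gz` suffix.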
Step 5: Update Snowflake table
The basic idea is to load the incrementally extracted data into an intermediate or temporary table and modify records in the final table with data from the intermediate table. The three methods mentioned below are generally used for this.
1. Update the rows in the target table with new data (with the same keys). Then insert new rows from the intermediate or landing table which are not in the final table.
UPDATE oracle_target_table t SET t.value = s.value
FROM landing_delta_table s
WHERE t.id = s.id;
INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_delta_table
WHERE id NOT IN (SELECT id FROM oracle_target_table);
2. Delete rows from the target table which are also in the landing table. Then insert all rows from the landing table into the final table. Now the final table will have the latest data without duplicates.
DELETE FROM oracle_target_table
WHERE id IN (SELECT id FROM landing_table);
INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_table;
3. MERGE statement: a standard SQL statement which combines inserts and updates. It is used to apply changes in the landing table to the target table with one SQL statement.
MERGE INTO oracle_target_table t1
USING landing_delta_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);
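The first two methods can be tried locally to confirm that they leave the target table in the same state. The sketch below uses Python's built-in `sqlite3` as a stand-in for Snowflake, which is an assumption made purely so the example runs anywhere: the UPDATE is written with a correlated subquery instead of Snowflake's `UPDATE ... FROM`, both methods read from `landing_delta_table`, and the sample rows are made up.

```python
import sqlite3


def fresh_db():
    """In-memory database with a target table and a landing (delta) table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE oracle_target_table (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE landing_delta_table (id INTEGER PRIMARY KEY, value TEXT);
        INSERT INTO oracle_target_table VALUES (1, 'old-a'), (2, 'old-b');
        INSERT INTO landing_delta_table VALUES (2, 'new-b'), (3, 'new-c');
    """)
    return conn


def update_then_insert(conn):
    """Method 1: update matching keys, then insert keys missing from the target."""
    conn.execute("""
        UPDATE oracle_target_table
        SET value = (SELECT s.value FROM landing_delta_table s
                     WHERE s.id = oracle_target_table.id)
        WHERE id IN (SELECT id FROM landing_delta_table)
    """)
    conn.execute("""
        INSERT INTO oracle_target_table (id, value)
        SELECT id, value FROM landing_delta_table
        WHERE id NOT IN (SELECT id FROM oracle_target_table)
    """)


def delete_then_insert(conn):
    """Method 2: delete keys present in the landing table, then insert all landing rows."""
    conn.execute("""
        DELETE FROM oracle_target_table
        WHERE id IN (SELECT id FROM landing_delta_table)
    """)
    conn.execute("""
        INSERT INTO oracle_target_table (id, value)
        SELECT id, value FROM landing_delta_table
    """)


def final_rows(method):
    """Apply one method to a fresh database and return the target rows."""
    conn = fresh_db()
    method(conn)
    rows = conn.execute(
        "SELECT id, value FROM oracle_target_table ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


rows_method_1 = final_rows(update_then_insert)
rows_method_2 = final_rows(delete_then_insert)
print(rows_method_1)
print(rows_method_2)
```

Row 1 is untouched, row 2 picks up the new value and row 3 is added in both cases, which is also what the MERGE statement is meant to achieve in a single statement.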
This method works when you have a comfortable project timeline and a pool of experienced engineering resources that can build and maintain the pipeline. However, the method mentioned above comes with a lot of coding and maintenance overhead.
Ref :
https://hevodata.com/blog/oracle-to-snowflake-etl/
Q&A
https://www.analytics.today/blog/top-10-reasons-snowflake-rocks
https://www.g2.com/reports/grid-report-for-data-warehouse-fall-2019?featured=snowflake&secure%5Bgated_consumer%5D=0043e810-90c1-4257-a24a-f7a3b7e6b1c3&secure%5Btoken%5D=04647245837d1e63f5d46e942153e0beed97b18b25f466db19d0c54901467747&utm_campaign=gate-768549


An overview of Snowflake

  • 2. Snowflake is a cloud data warehouse delivered as SaaS, with full ANSI SQL support for both structured and semi-structured data. Users can create tables and start querying data with little administration. It combines the traditional shared-disk and shared-nothing architectures to offer the best of both. (Diagram labels: Shared-Nothing Architecture, Shared-Disk Architecture.)
  • 3. Snowflake provides unlimited storage scalability without refactoring, and multiple clusters can read or write shared data. Clusters can be resized instantly, with no downtime. Full ACID transactional consistency holds across the entire system. Logical assets such as servers and buckets are managed centrally.
  • 4. Snowflake is built on three layers: storage, compute, and cloud services.
  • 5. Snowflake processes queries using the MPP (massively parallel processing) concept: each node stores part of the data locally, while a central data repository holds the data that is accessible to all compute nodes.
       The Snowflake architecture consists of three layers:
         1. Data Storage
         2. Query Processing
         3. Cloud Services
       Database Storage Layer:
         Snowflake organizes data into multiple micro-partitions that are internally optimized and compressed, and stores them in a columnar format. The data resides in cloud storage and works as a shared-disk model, which simplifies data management.
         Compute nodes connect to the storage layer to fetch data for querying, because the storage layer is independent. Since Snowflake is provisioned in the cloud, storage is elastic and the user pays only per TB per month.
  • 6. Query Layer:
         Snowflake uses virtual warehouses to run queries. Its distinguishing feature is that it separates the query processing layer from disk storage.
       Cloud Services Layer:
         All activities such as authentication, security, metadata management of loaded data, and query optimization are coordinated in this layer.
       Benefits:
  • 7. Cloud Services:
         Multi-tenant, transactional, and secure.
         Runs in the AWS cloud.
         Millions of queries per day over petabytes of data.
         Replicated for availability and scalability.
         Focused on ease of use and service experience.
         A collection of services such as access control, query optimizer, and transaction manager.
  • 8. 1. Extract data from Oracle to CSV using SQL*Plus.
       2. Data type conversion and other transformations.
       3. Stage the files to S3.
       4. Finally, copy the staged files to Snowflake tables.
       Step 1: Code:
           -- Turn on the spool
           spool spool_file.txt
           select * from dba_table;
           spool off
       Note: the spool file will not be available until spooling is turned off.
           #!/usr/bin/bash
           FILE="students.csv"
           sqlplus -s user_name/password@oracle_db <<EOF
           SET PAGESIZE 35000
           SET COLSEP "|"
           SET LINESIZE 230
           SET FEEDBACK OFF
           SPOOL $FILE
           SELECT * FROM STUDENTS;
           SPOOL OFF
           EXIT
           EOF
  • 9.     #!/usr/bin/bash
           FILE="emp.csv"
           sqlplus -s scott/tiger@XE <<EOF
           SET PAGESIZE 50000
           SET COLSEP ","
           SET LINESIZE 200
           SET FEEDBACK OFF
           SPOOL $FILE
           SELECT * FROM EMP;
           SPOOL OFF
           EXIT
           EOF
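SQL*Plus pads each column to a fixed width when COLSEP is used, so spooled rows usually carry extra spaces and a dashed underline below the header. The following is a minimal sketch of one way to tidy such output into CSV before staging. It assumes a pipe separator and data that never contains the separator itself; `clean_spool` and the sample rows are hypothetical, not part of the original slides.

```python
import csv
import io


def clean_spool(spool_text: str, colsep: str = "|") -> str:
    """Convert padded, separator-delimited spool text into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in spool_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between pages
        fields = [field.strip() for field in line.split(colsep)]
        # Skip the header underline (fields made only of dashes, or empty).
        if all(set(field) <= {"-"} for field in fields):
            continue
        writer.writerow(fields)
    return out.getvalue()


# Hypothetical spool output with padded columns.
sample = "ID   |NAME      \n-----|----------\n    1|Alice     \n    2|Bob, Jr.  \n"
print(clean_spool(sample))
```

The csv module quotes any value that contains a comma, which keeps the file loadable with a comma field delimiter.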
  • 10. Step 1.2:
         For an incremental load, generate SQL with a condition that selects only the records modified after the last data pull.
         Query:
             select * from students where last_modified_time > last_pull_time and last_modified_time <= sys_time;
        Step 2: Below are the recommendations for data type conversion from Oracle to Snowflake.
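The incremental window in Step 1.2 can be sketched with bound parameters, so that each run passes the previous pull time and the current cut-off. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for Oracle, and the `fetch_changes` helper, the parameter names, and the sample rows are assumptions.

```python
import sqlite3

# Half-open window: strictly after the last pull, up to and including the cut-off.
INCREMENTAL_SQL = (
    "SELECT id, name, last_modified_time FROM students "
    "WHERE last_modified_time > :last_pull_time "
    "AND last_modified_time <= :upper_bound "
    "ORDER BY id"
)


def fetch_changes(conn, last_pull_time, upper_bound):
    """Return rows modified after last_pull_time and no later than upper_bound."""
    params = {"last_pull_time": last_pull_time, "upper_bound": upper_bound}
    return conn.execute(INCREMENTAL_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, last_modified_time TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        (1, "Ann", "2024-01-01 09:00:00"),
        (2, "Raj", "2024-01-02 10:30:00"),
        (3, "Lee", "2024-01-03 08:15:00"),
    ],
)
rows = fetch_changes(conn, "2024-01-01 09:00:00", "2024-01-02 23:59:59")
print(rows)
```

Keeping the lower bound exclusive and the upper bound inclusive means consecutive pulls neither overlap nor leave gaps, provided the next run reuses this run's upper bound as its last pull time.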
  • 11. Step 3:
         To load data into Snowflake, the data needs to be uploaded to an S3 location (Step 1 explains extracting Oracle data to flat files).
         We need a Snowflake instance that runs on AWS. This instance needs to be able to access the S3 files in AWS.
         This access can be either internal or external, and the process is called staging.
        Create an internal stage:
            create or replace stage my_oracle_stage
              copy_options = (on_error = 'skip_file')
              file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);
        Use the PUT command below to stage files to an internal Snowflake stage:
            PUT file://path_to_your_file/your_filename @internal_stage_name
        Upload a file items_data.csv in the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage:
            put file:///tmp/oracle_data/data/items_data.csv @oracle_stage;
        Ref: https://docs.snowflake.net/manuals/sql-reference/sql/put.html
  • 12. Step 3: (External staging options)
         Snowflake supports any accessible Amazon S3 or Microsoft Azure location as an external staging location. You can create a stage pointing to that location, and data can be loaded directly into the Snowflake table through that stage. There is no need to move the data to an internal stage.
         To create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required. If data needs to be decrypted before loading into Snowflake, the proper keys must be provided.
            create or replace stage oracle_ext_stage
              url = 's3://snowflake_oracle/data/load/files/'
              credentials = (aws_key_id = '1d318jnsonmb5#dgd4rrb3c' aws_secret_key = 'aii998nnrcd4kx5y6z')
              encryption = (master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');
        Once data is extracted from Oracle, it can be uploaded to S3 using the direct upload option or the AWS SDK in your favourite programming language. Python's boto3 is a popular choice for this. Once the data is in S3, an external stage can be created pointing to that location.
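As a rough sketch of the boto3 route mentioned above, the snippet below uploads one extracted file under the prefix that the external stage points to. The helper names `s3_key_for` and `upload_extract` are hypothetical, AWS credentials are assumed to come from boto3's default configuration, and boto3 is imported lazily because it is a third-party package.

```python
from pathlib import PurePosixPath


def s3_key_for(local_path: str, prefix: str) -> str:
    """Build the object key: the stage prefix plus the local file name."""
    file_name = PurePosixPath(local_path).name
    return f"{prefix.strip('/')}/{file_name}"


def upload_extract(local_path: str, bucket: str, prefix: str) -> str:
    """Upload one extracted file to S3 and return its s3:// URL."""
    import boto3  # third-party; imported here so the key helper works without it

    key = s3_key_for(local_path, prefix)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


print(s3_key_for("/tmp/oracle_data/data/items_data.csv", "data/load/files/"))
```

Calling `upload_extract("/tmp/oracle_data/data/items_data.csv", "snowflake_oracle", "data/load/files/")` would place the file under the URL used by `oracle_ext_stage`, assuming that bucket exists and the credentials allow writes.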
  • 13. Step 4: Copy staged files to the Snowflake table
         So far, data has been extracted from Oracle, uploaded to an S3 location, and an external Snowflake stage has been created pointing to that location. The next step is to copy the data into the table. The command used for this is COPY INTO.
         Note: executing COPY INTO requires compute resources in Snowflake virtual warehouses, and your Snowflake credits will be consumed.
        • To load from a named internal stage:
            copy into oracle_table from @oracle_stage;
        • To load from the external stage (only one file is specified):
            copy into my_ext_stage_table from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
        • To copy directly from an external location without creating a stage:
            copy into oracle_table
              from s3://mybucket/oracle_snow/data/files
              credentials = (aws_key_id = '$AWS_ACCESS_KEY_ID' aws_secret_key = '$AWS_SECRET_ACCESS_KEY')
              encryption = (master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
              file_format = (format_name = csv_format);
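Besides naming a single file, COPY INTO can select staged files with a regular expression through its PATTERN option. The sketch below uses Python's re module as a stand-in to show which paths a pattern such as '.*/.*/.*[.]csv[.]gz' would pick. The file names are made up, and treating the pattern as a match against the whole path is an assumption about Snowflake's behaviour that should be checked against its documentation.

```python
import re

# Two directory levels (at least), then a gzip-compressed CSV file.
PATTERN = re.compile(r".*/.*/.*[.]csv[.]gz")

# Hypothetical staged file paths.
staged_files = [
    "2024/01/items.csv.gz",
    "2024/items.csv.gz",
    "2024/01/items.csv",
    "a/b/c/d.csv.gz",
]

# Keep only the paths that the whole pattern matches.
matches = [path for path in staged_files if PATTERN.fullmatch(path)]
print(matches)
```

Writing the dots as `[.]` avoids backslash escaping inside the SQL string literal while still matching a literal dot.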
  • 14. Files can be specified using patterns:
            copy into oracle_pattern_table
              from @oracle_stage
              file_format = (type = 'CSV')
              pattern = '.*/.*/.*[.]csv[.]gz';
        Step 4: Update the Snowflake table
        The basic idea is to load the incrementally extracted data into an intermediate or temporary table and then modify records in the final table using the data in the intermediate table. The three methods below are generally used for this.
        1. Update the rows in the target table with new data (with the same keys). Then insert the rows from the intermediate or landing table that are not yet in the final table.
            UPDATE oracle_target_table t
              SET value = s.value
              FROM landing_delta_table s
              WHERE t.id = s.id;
            INSERT INTO oracle_target_table (id, value)
              SELECT id, value FROM landing_delta_table
              WHERE id NOT IN (SELECT id FROM oracle_target_table);
        2. Delete the rows from the target table that are also in the landing table. Then insert all rows from the landing table into the final table. The final table will then have the latest data without duplicates.
            DELETE FROM oracle_target_table f
              WHERE f.id IN (SELECT id FROM landing_table);
            INSERT INTO oracle_target_table (id, value)
              SELECT id, value FROM landing_table;
  • 16. 3. MERGE statement: the standard SQL merge statement, which combines inserts and updates. It applies the changes in the landing table to the target table with one SQL statement.
            MERGE INTO oracle_target_table t1
              USING landing_delta_table t2
              ON t1.id = t2.id
              WHEN MATCHED THEN UPDATE SET value = t2.value
              WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);
        This method works when you have a comfortable project timeline and a pool of experienced engineering resources that can build and maintain the pipeline. However, it comes with a lot of coding and maintenance overhead.
        Ref: https://hevodata.com/blog/oracle-to-snowflake-etl/
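The first two update methods above can be checked side by side on a small data set. The sketch below is an illustration only: it runs on Python's built-in sqlite3 rather than Snowflake, borrows the table names from the slides, uses made-up rows, and rewrites the UPDATE with a correlated subquery because that form works on older SQLite versions. MERGE is left out because SQLite does not support it.

```python
import sqlite3


def make_db():
    """Create a target table and a landing table with overlapping keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE oracle_target_table (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE landing_delta_table (id INTEGER PRIMARY KEY, value TEXT);
        INSERT INTO oracle_target_table VALUES (1, 'old-1'), (2, 'old-2');
        INSERT INTO landing_delta_table VALUES (2, 'new-2'), (3, 'new-3');
        """
    )
    return conn


def update_then_insert(conn):
    """Method 1: update rows with matching keys, then insert the new keys."""
    conn.execute(
        """
        UPDATE oracle_target_table
        SET value = (SELECT s.value FROM landing_delta_table s
                     WHERE s.id = oracle_target_table.id)
        WHERE id IN (SELECT id FROM landing_delta_table)
        """
    )
    conn.execute(
        """
        INSERT INTO oracle_target_table (id, value)
        SELECT id, value FROM landing_delta_table
        WHERE id NOT IN (SELECT id FROM oracle_target_table)
        """
    )


def delete_then_insert(conn):
    """Method 2: delete rows with matching keys, then insert every landing row."""
    conn.execute(
        "DELETE FROM oracle_target_table "
        "WHERE id IN (SELECT id FROM landing_delta_table)"
    )
    conn.execute(
        "INSERT INTO oracle_target_table (id, value) "
        "SELECT id, value FROM landing_delta_table"
    )


def snapshot(conn):
    """Return the target table contents in key order."""
    return conn.execute(
        "SELECT id, value FROM oracle_target_table ORDER BY id"
    ).fetchall()


results = []
for method in (update_then_insert, delete_then_insert):
    conn = make_db()
    method(conn)
    results.append(snapshot(conn))
print(results[0])
```

Both methods leave the target with the untouched row, the updated row, and the new row, which is also the outcome the MERGE statement is meant to produce in a single pass.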
  • 23. Q&A