Snowflake is a cloud data warehouse provided as SaaS with full support for ANSI SQL; it handles both structured and semi-structured data.
It enables users to create tables and start querying data with minimal administration.
It combines the traditional shared-disk and shared-nothing architectures to offer the best of both.
(Diagram: shared-nothing architecture vs. shared-disk architecture)
Snowflake provides unlimited storage scalability without refactoring; multiple clusters can read or write shared data, and clusters can be resized instantly with no downtime.
It offers full ACID transactional consistency across the entire system.
Logical assets such as servers, buckets, etc. are managed centrally.
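As a minimal illustration of the instant resizing mentioned above (the warehouse name etl_wh is an assumption, not from the source):
-- create a small virtual warehouse for loading work
create warehouse if not exists etl_wh warehouse_size = 'XSMALL' auto_suspend = 300;
-- resize instantly; queries already running are not interrupted
alter warehouse etl_wh set warehouse_size = 'MEDIUM';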
Snowflake processes queries using the MPP (massively parallel processing) concept, in which each compute node stores part of the data locally, while also using a central data repository for data that is accessible to all compute nodes.
The Snowflake architecture consists of three layers:
1. Data Storage
2. Query Processing
3. Cloud Services.
Database Storage Layer:
Snowflake organizes data into multiple micro-partitions that are internally optimized and compressed. Data is stored in a columnar format in cloud storage, which works as a shared-disk model and simplifies data management.
Compute nodes connect to the storage layer to fetch data for querying, since the storage layer is independent of compute. Because Snowflake is provisioned on the cloud, storage is elastic and users pay only per TB per month.
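As a sketch of the per-TB billing point (assumes access to the shared SNOWFLAKE database; not part of the original text):
-- daily storage in terabytes, the basis of per-TB-per-month billing
select usage_date,
       storage_bytes / power(1024, 4) as storage_tb
from snowflake.account_usage.storage_usage
order by usage_date desc
limit 7;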
Query Processing Layer:
Snowflake uses virtual warehouses to run queries. A distinguishing feature of Snowflake is that the query processing layer is separated from the disk storage layer.
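A small sketch of that separation (the warehouse name reuses the hypothetical etl_wh above; oracle_table is the target table used later in this document): the same stored data can be queried from any warehouse, and a warehouse can be suspended without affecting storage.
use warehouse etl_wh;              -- compute for this session
select count(*) from oracle_table; -- reads data from the storage layer
alter warehouse etl_wh suspend;    -- compute stops; the stored data is untouched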
Cloud Services Layer:
All activities such as authentication, security, metadata management of loaded data, and query optimization are coordinated by this layer.
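For example, metadata kept by this layer, such as query history, can be queried directly (a minimal sketch; the columns shown are a subset):
-- query history metadata is maintained by the cloud services layer
select query_text, total_elapsed_time
from table(information_schema.query_history())
order by start_time desc
limit 10;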
Benefits of the Cloud Services layer:
• Multi-tenant, transactional and secure
• Runs in the AWS cloud
• Serves millions of queries per day over petabytes of data
• Replicated for availability and scalability
• Focused on ease of use and service experience
• A collection of services such as access control, the query optimizer and the transaction manager
Steps to move data from Oracle to Snowflake:
1. Extract data from Oracle to CSV using SQL*Plus.
2. Perform data type conversion and other transformations.
3. Stage the files to S3.
4. Finally, copy the staged files into Snowflake tables.
Step 1: Code:
-- Turn on the spool
spool spool_file.txt
select * from dba_table;
spool off
Note: The spool file will not be available until spooling is turned off.
#!/usr/bin/bash
FILE="students.csv"
sqlplus -s user_name/password@oracle_db <<EOF
SET PAGESIZE 35000
SET COLSEP "|"
SET LINESIZE 230
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM STUDENTS;
SPOOL OFF
EXIT
EOF

For example, against the sample SCOTT schema:
#!/usr/bin/bash
FILE="emp.csv"
sqlplus -s scott/tiger@XE <<EOF
SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM EMP;
SPOOL OFF
EXIT
EOF
Step 1.2:
For incremental loads, we need to generate SQL with a proper condition to select only the records that were modified after the last data pull.
Query: select * from students where last_modified_time > last_pull_time and last_modified_time <= sys_time;
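In the SQL*Plus script, this condition can be parameterised with a substitution variable (a sketch; the variable name and date format are assumptions):
-- &last_pull_time is supplied by the wrapper script, e.g. read from a control table or file
select *
from students
where last_modified_time > to_date('&last_pull_time', 'YYYY-MM-DD HH24:MI:SS')
  and last_modified_time <= sysdate;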
Step 2: Convert data types and apply any other required transformations, following the usual recommendations for mapping Oracle types to Snowflake types.
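A minimal sketch of handling such conversions in the extraction query itself (column names and formats are assumptions, not from the source); casting dates and numbers to explicit text formats keeps the CSV unambiguous for Snowflake to parse:
select emp_id,                                                   -- Oracle NUMBER   -> Snowflake NUMBER
       emp_name,                                                 -- Oracle VARCHAR2 -> Snowflake VARCHAR
       to_char(hire_date, 'YYYY-MM-DD HH24:MI:SS') as hire_date, -- Oracle DATE     -> Snowflake TIMESTAMP_NTZ
       to_char(salary, 'FM999999990.00') as salary               -- fixed-format text -> Snowflake NUMBER(12,2)
from emp;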
Step 3:
To load data into Snowflake, the data needs to be uploaded to an S3 location (Step 1 explains extracting from Oracle to flat files).
We need a Snowflake instance that runs on AWS. This instance needs the ability to access the S3 files in AWS.
This access can be through either an internal or an external stage, and this process is called staging.
Create an internal stage:
create or replace stage my_oracle_stage
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);
Use the PUT command below to stage files to an internal Snowflake stage:
PUT file://path_to_your_file/your_filename @internal_stage_name;
To upload the file items_data.csv from the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage:
put file:///tmp/oracle_data/data/items_data.csv @oracle_stage;
Ref: https://docs.snowflake.net/manuals/sql-reference/sql/put.html
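After the PUT, the upload can be verified with LIST (using the stage name from the example above):
-- shows the staged files (gzip-compressed by default) with size and MD5
list @oracle_stage;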
Step 3 (external staging option):
Snowflake supports any accessible Amazon S3 bucket or Microsoft Azure container as an external staging location. You can create a stage pointing to that location, and data can be loaded directly into a Snowflake table through that stage; there is no need to move the data to an internal stage first.
To create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required.
If the data needs to be decrypted before loading into Snowflake, the proper keys must be provided.
create or replace stage oracle_ext_stage
url='s3://snowflake_oracle/data/load/files/'
credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c' aws_secret_key='aii998nnrcd4kx5y6z')
encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');
Once data is extracted from Oracle, it can be uploaded to S3 using the direct upload option or using the AWS SDK in your favourite programming language. Python's boto3 is a popular choice in such circumstances. Once the data is in S3, an external stage can be created to point to that location.
Step 4: Copy staged files to the Snowflake table.
We have extracted data from Oracle, uploaded it to an S3 location and created an external Snowflake stage pointing to that location. The next step is to copy the data into the table. The command used to do this is COPY INTO. Note: to execute the COPY INTO command, compute resources in a Snowflake virtual warehouse are required, and your Snowflake credits will be utilized.
• To load from a named internal stage:
copy into oracle_table
from @oracle_stage;
• To load from the external stage (only one file is specified):
copy into my_ext_stage_table
from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
• To copy directly from an external location without creating a stage (the named file format csv_format is sketched after these examples):
copy into oracle_table
from 's3://mybucket/oracle_snow/data/files'
credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
file_format = (format_name = csv_format);
• Files can be specified using patterns:
copy into oracle_pattern_table
from @oracle_stage
file_format = (type = 'TSV')
pattern='.*/.*/.*[.]csv[.]gz';
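The direct-copy example above references a named file format csv_format. A minimal sketch of how it might be defined (the options are assumptions, matching the delimiter and header settings used elsewhere in this document):
create or replace file format csv_format
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  null_if = ('', 'NULL');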
Step 5: Update the Snowflake table.
The basic idea is to load the incrementally extracted data into an intermediate or temporary table and then modify the records in the final table with the data in the intermediate table. The three methods below are generally used for this.
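A minimal sketch of that intermediate step (the table definition and stage path are assumptions consistent with the examples below): load the delta file into a temporary landing table first, then apply one of the three methods.
-- the temporary table lives only for the session that loads and applies the delta
create temporary table landing_delta_table (id number, value varchar);
copy into landing_delta_table
  from @oracle_ext_stage/incremental/
  file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);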
1. Update the rows in the target table with the new data (same keys), then insert the new rows from the intermediate or landing table that are not yet in the final table:
UPDATE oracle_target_table
SET value = s.value
FROM landing_delta_table s
WHERE oracle_target_table.id = s.id;
INSERT INTO oracle_target_table (id, value)
SELECT id, value
FROM landing_delta_table
WHERE id NOT IN (SELECT id FROM oracle_target_table);
2. Delete the rows from the target table that are also in the landing table, then insert all rows from the landing table into the final table. The final table will then have the latest data without duplicates:
DELETE FROM oracle_target_table WHERE id IN (SELECT id FROM landing_table);
INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_table;
3. MERGE statement – a standard SQL merge statement that combines inserts and updates. It is used to apply the changes in the landing table to the target table with one SQL statement:
MERGE INTO oracle_target_table t1
USING landing_delta_table t2
ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);
This method works when you have a comfortable project timeline and a pool of experienced engineering resources that can build and maintain the pipeline. However, the approach described above comes with considerable coding and maintenance overhead.
Ref: https://hevodata.com/blog/oracle-to-snowflake-etl/