An Overview of Snowflake

  1. Snowflake is a cloud data warehouse delivered as SaaS, with full ANSI SQL support for both structured and semi-structured data.
• Lets users create tables and start querying data with minimal administration.
• Combines traditional shared-disk and shared-nothing architectures to offer the best of both.
[Diagram: shared-nothing architecture vs. shared-disk architecture]
  2. Snowflake provides:
• Virtually unlimited storage scalability without refactoring; multiple clusters can read and write shared data.
• Instant cluster resizing, with no downtime involved.
• Full ACID transactional consistency across the entire system.
• Central management of logical assets such as servers, buckets, etc.
  3. Snowflake is built on three layers: the storage layer, the compute layer, and the cloud services layer.
  4. Snowflake processes queries using the MPP (massively parallel processing) concept: each node works on part of the data stored locally, while a central data repository keeps the data accessible to all compute nodes.
The Snowflake architecture consists of three layers: 1. Database Storage, 2. Query Processing, 3. Cloud Services.
Database Storage layer:
• Snowflake organizes data into many micro-partitions that are internally optimized and compressed, stored in a columnar format.
• Data resides in cloud storage and behaves like a shared-disk model, which simplifies data management.
• Compute nodes connect to the storage layer to fetch data for querying, since the storage layer is independent. Because Snowflake is provisioned in the cloud, storage is elastic, and users pay only per TB per month.
  5. Query Processing layer:
• Snowflake uses virtual warehouses to run queries. A distinguishing feature of Snowflake is that the query processing layer is separated from disk storage.
Cloud Services layer:
• Activities such as authentication, security, metadata management for loaded data, and query optimization are coordinated across this layer.
Benefits:
  6. Cloud Services:
• Multi-tenant, transactional, and secure.
• Runs in the AWS cloud.
• Handles millions of queries per day over petabytes of data.
• Replicated for availability and scalability.
• Focused on ease of use and service experience.
• A collection of services such as access control, the query optimizer, and the transaction manager.
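To make the separation of these layers concrete, the following is a minimal Snowflake SQL sketch; the warehouse, database, and table names are hypothetical and not part of the original deck. The virtual warehouse (compute) is created, resized, or suspended independently of the data held in the storage layer.

-- Hypothetical names throughout; a sketch, not the deck's own example.
create warehouse if not exists demo_wh
  warehouse_size = 'XSMALL'
  auto_suspend   = 60        -- suspend compute after 60 seconds of inactivity
  auto_resume    = true;

create database if not exists demo_db;

create table if not exists demo_db.public.students (
  id    number,
  name  varchar,
  score number(5,2)
);

-- Queries run on the virtual warehouse; the data stays in the storage layer.
use warehouse demo_wh;
select * from demo_db.public.students;

-- Resizing touches only compute; no data movement or downtime is involved.
alter warehouse demo_wh set warehouse_size = 'SMALL';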
  7. Migration steps from Oracle to Snowflake:
1. Extract data from Oracle to CSV using SQL*Plus.
2. Data type conversion and other transformations.
3. Stage the files to S3.
4. Copy the staged files into Snowflake tables.
Step 1: Code:

-- Turn on the spool
spool file.txt
select * from dba_table;
spool off

Note: the spool file is not available until spooling is turned off.
  8. A shell script can wrap SQL*Plus to spool a table to CSV (the second script is the same pattern with a different delimiter and table):

#!/usr/bin/bash
FILE="students.csv"
sqlplus -s user_name/password@oracle_db <<EOF
SET PAGESIZE 35000
SET COLSEP "|"
SET LINESIZE 230
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM STUDENTS;
SPOOL OFF
EXIT
EOF

#!/usr/bin/bash
FILE="emp.csv"
sqlplus -s scott/tiger@XE <<EOF
SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM EMP;
SPOOL OFF
EXIT
EOF
  9. Step 1.2:
• For an incremental load, generate SQL with a condition that selects only the records modified after the last data pull.
Query:

select * from students
where last_modified_time > last_pull_time
  and last_modified_time <= sys_time;

Step 2: Below are the recommendations for data type conversion when moving from Oracle to Snowflake (see the sketch after this slide).
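The deck's conversion table itself is not reproduced here; the following is a sketch of commonly recommended Oracle-to-Snowflake mappings applied to a hypothetical students table (column names and sizes are assumptions).

-- Hypothetical target DDL illustrating common mappings:
--   NUMBER      -> NUMBER
--   VARCHAR2    -> VARCHAR
--   DATE        -> TIMESTAMP_NTZ (Oracle DATE carries a time component)
--   CLOB        -> VARCHAR
--   BLOB / RAW  -> BINARY
create or replace table students (
  id                 number(10,0),
  name               varchar(200),
  enrolled_on        timestamp_ntz,
  notes              varchar,
  photo              binary,
  last_modified_time timestamp_ntz   -- used by the incremental extraction query
);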
  10. Step 3:
• To load data into Snowflake, the data first needs to be uploaded to an S3 location (the earlier steps cover extracting Oracle data to flat files).
• A Snowflake instance running on AWS is required, and it must be able to access the files in S3. This access can be either internal or external, and the process is called staging.
Create an internal stage:

create or replace stage my_oracle_stage
  copy_options = (on_error='skip_file')
  file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);

Use the PUT command to stage files to an internal Snowflake stage:

PUT file://path_to_your_file/your_filename @internal_stage_name;

Upload a file items_data.csv from the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage:

put file:///tmp/oracle_data/data/items_data.csv @oracle_stage;

Ref: https://docs.snowflake.net/manuals/sql-reference/sql/put.html
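Before running COPY INTO, the stage contents can be checked; a short sketch assuming the oracle_stage name above (note that PUT gzip-compresses files by default, which LIST will show).

-- Confirm the uploaded file is present in the internal stage.
list @oracle_stage;

-- Optionally peek at the first columns of the staged rows (default CSV parsing).
select $1, $2 from @oracle_stage limit 10;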
  11. Step 3 (external staging option):
• Snowflake supports any accessible Amazon S3 or Microsoft Azure location as an external staging area. You can create a stage pointing to that location and load data directly into a Snowflake table through it; there is no need to move the data to an internal stage.
• To create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required. If data needs to be decrypted before loading into Snowflake, the proper keys must be provided.

create or replace stage oracle_ext_stage
  url='s3://snowflake_oracle/data/load/files/'
  credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c' aws_secret_key='aii998nnrcd4kx5y6z')
  encryption=(master_key='eSxX0jzskjl22bNaaaDuOaO8=');

Once data is extracted from Oracle, it can be uploaded to S3 using the direct upload option or an AWS SDK in your favourite programming language; Python's boto3 is a popular choice in such circumstances. Once the data is in S3, an external stage can be created to point to that location.
  12. Step 4: Copy staged files to the Snowflake table.
• With data extracted from Oracle, uploaded to an S3 location, and an external Snowflake stage pointing to that location, the next step is to copy the data into the table. The command used for this is COPY INTO.
Note: executing the COPY INTO command requires compute resources in a Snowflake virtual warehouse, so Snowflake credits will be consumed.
• Load from a named internal stage:

copy into oracle_table from @oracle_stage;

• Load from the external stage (only one file is specified):

copy into my_ext_stage_table from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;

• Copy directly from an external location without creating a stage:

copy into oracle_table
  from 's3://mybucket/oracle_snow/data/files'
  credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
  encryption=(master_key='eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
  file_format=(format_name=csv_format);
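Since COPY INTO consumes warehouse credits, a cautious pattern is to validate the staged files first; a sketch assuming the oracle_table and oracle_stage names used above, using Snowflake's VALIDATION_MODE option.

-- Parse the staged files and return any errors without loading rows.
copy into oracle_table
  from @oracle_stage
  validation_mode = RETURN_ERRORS;

-- If the validation run is clean, run the real load.
copy into oracle_table from @oracle_stage;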
  13. Files can also be specified using patterns:

copy into oracle_pattern_table
  from @oracle_stage
  file_format = (type = 'TSV')
  pattern='.*/.*/.*[.]csv[.]gz';

Step 4: Update the Snowflake table.
The basic idea is to load the incrementally extracted data into an intermediate (landing or temporary) table, then modify the records in the final table using the data in the intermediate table (see the sketch after this slide). The three methods below are generally used for this.
1. Update the rows in the target table with new data (same keys), then insert the rows from the intermediate or landing table that are not yet in the final table:

UPDATE oracle_target_table t
SET value = s.value
FROM landing_delta_table s
WHERE t.id = s.id;

INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_delta_table
WHERE id NOT IN (SELECT id FROM oracle_target_table);

2. Delete the rows from the target table that are also in the landing table, then insert all rows from the landing table into the final table. The final table then holds the latest data without duplicates:

DELETE FROM oracle_target_table f
WHERE f.id IN (SELECT id FROM landing_table);

INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_table;
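The UPDATE/INSERT and DELETE/INSERT methods above, and the MERGE on the next slide, all assume the incremental extract has already been loaded into a landing table; a minimal sketch using the hypothetical landing_delta_table name from those examples (method 2 refers to it simply as landing_table).

-- Create a temporary landing table with the same shape as the target
-- and load the staged incremental extract into it.
create or replace temporary table landing_delta_table like oracle_target_table;

copy into landing_delta_table
  from @oracle_stage
  file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);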
  14. 3. MERGE statement: a standard SQL statement that combines inserts and updates. It applies the changes in the landing table to the target table in a single SQL statement:

MERGE INTO oracle_target_table t1
USING landing_delta_table t2
  ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);

This approach works when you have a comfortable project timeline and a pool of experienced engineers who can build and maintain the pipeline; however, it comes with a lot of coding and maintenance overhead.
Ref: https://hevodata.com/blog/oracle-to-snowflake-etl/
  15. Q&A
https://www.analytics.today/blog/top-10-reasons-snowflake-rocks
https://www.g2.com/reports/grid-report-for-data-warehouse-fall-2019?featured=snowflake&secure%5Bgated_consumer%5D=0043e810-90c1-4257-a24a-f7a3b7e6b1c3&secure%5Btoken%5D=04647245837d1e63f5d46e942153e0beed97b18b25f466db19d0c54901467747&utm_campaign=gate-768549
  16. Q&A