Snowflake is a cloud-based data warehouse that runs entirely on cloud infrastructure such as AWS or Azure. It uses a hybrid of shared-disk and shared-nothing architectures: data is stored in an optimized columnar format in cloud storage, while queries are executed by virtual warehouses, which are independent compute clusters. Snowflake provides a SQL interface and connectors to load, query, and analyze data without any hardware or software to manage.
Snowflake is an analytic data warehouse provided as software-as-a-service (SaaS). It uses a unique architecture designed for the cloud, combining a central shared-disk data repository with shared-nothing compute. Snowflake's architecture consists of three layers - database storage, query processing, and cloud services - which are deployed and managed entirely on cloud platforms like AWS and Azure. Snowflake offers different editions, such as Standard, Premier, Enterprise, and Enterprise for Sensitive Data, that provide additional features, support, and security capabilities.
Snowflake's data platform is a fully managed cloud data warehouse that provides data storage, processing, and analytics capabilities. It uses a hybrid of shared-disk and shared-nothing designs to offer both scalability and performance. Snowflake handles all software installation, updates, and management, so users only need to manage their data and queries.
This whitepaper discusses best practices for using Tableau with Snowflake. Snowflake is a cloud-based data-warehouse-as-a-service that provides near-limitless scalability with no hardware or software to install and maintain. Tableau is business intelligence software that allows users to visualize, explore, and analyze data. The whitepaper provides an overview of Snowflake and Tableau and covers topics such as connecting Tableau to Snowflake, working with different data types and structures in Snowflake, implementing security, and optimizing performance.
Snowflake is a cloud data platform that allows users to store, process and analyze data across cloud services like AWS, GCP and Azure. Some key features include:
- It uses a shared data architecture that decouples storage, compute, and management services, allowing them to scale independently. Data is stored in a centralized storage layer and processed using virtual warehouses.
- Snowflake supports both structured and semi-structured data, and users can query it using standard SQL. It also supports stored procedures, user-defined functions, and external integrations.
- The platform offers different editions with features for security, governance, compute resource management, and data integration. It also has tools and connectors for data warehousing, data science, and data sharing workflows.
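As a sketch of how those pieces fit together, the Snowflake SQL below creates a virtual warehouse (independent compute) and queries semi-structured JSON with standard SQL. The warehouse, table, and column names are hypothetical, chosen for illustration only:

```sql
-- Compute is provisioned independently of storage (names are illustrative).
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60      -- suspend after 60 idle seconds
  AUTO_RESUME    = TRUE;

-- A VARIANT column stores semi-structured data such as JSON.
CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT);

-- Path notation and LATERAL FLATTEN query JSON with ordinary SQL.
SELECT
    payload:user.id::STRING AS user_id,
    f.value:name::STRING    AS item_name
FROM raw_events,
     LATERAL FLATTEN(input => payload:items) f;
```

Because the warehouse is decoupled from storage, several such warehouses can query the same `raw_events` table concurrently without contending for compute.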
The document discusses Azure Data Factory (ADF) and its capabilities for cloud-first data integration and transformation. ADF allows orchestrating data movement and transforming data at scale across hybrid and multi-cloud environments using a visual, code-free interface. It provides serverless scalability with no infrastructure to manage, along with the ability to lift and shift existing SQL Server Integration Services packages and run them in Azure.
The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data-warehouse-as-a-service: a Massively Parallel Processing (MPP) solution for big data with true enterprise-class features. The SQL DW service is built for data warehouse workloads from a few hundred gigabytes to petabytes of data, with features like disaggregated compute and storage that let customers scale the service to match their needs. In this presentation, we take an in-depth look at implementing a SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via PolyBase, allowing for a true SQL experience across structured and unstructured data.
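The PolyBase integration mentioned above can be sketched in T-SQL as an external table over flat files in Hadoop-compatible storage; the storage account, container, and table names here are hypothetical:

```sql
-- Hypothetical names; assumes Hadoop/Blob storage reachable from SQL DW.
CREATE EXTERNAL DATA SOURCE hadoop_src
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://logs@mystorageaccount.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT csv_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE ext_clicks (
    user_id INT,
    url     NVARCHAR(400)
)
WITH (LOCATION = '/clicks/', DATA_SOURCE = hadoop_src, FILE_FORMAT = csv_format);

-- Unstructured flat files now join to warehouse tables with plain T-SQL.
SELECT c.url, COUNT(*) AS hits
FROM ext_clicks c
GROUP BY c.url;
```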
The document provides an overview of the Databricks platform, which offers a unified environment for data engineering, analytics, and AI. It describes how Databricks addresses the complexity of managing data across siloed systems by providing a single "data lakehouse" platform where all data and analytics workloads can be run. Key features highlighted include Delta Lake for ACID transactions on data lakes, auto loader for streaming data ingestion, notebooks for interactive coding, and governance tools to securely share and catalog data and models.
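A minimal Spark SQL sketch of the Delta Lake capability mentioned above: an upsert that runs as a single ACID transaction, something plain data-lake files cannot guarantee. The table names are hypothetical:

```sql
-- Spark SQL sketch; table names are hypothetical.
CREATE TABLE IF NOT EXISTS events (
    id     BIGINT,
    ts     TIMESTAMP,
    status STRING
) USING DELTA;

-- ACID MERGE: staged rows are upserted atomically; readers never see
-- a half-applied batch.
MERGE INTO events AS t
USING events_staging AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN INSERT (id, ts, status) VALUES (s.id, s.ts, s.status);
```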
Bridging to a hybrid cloud data services architecture - IBM Analytics
Enterprises are increasingly evolving their data infrastructures into entirely cloud-facing environments. Interfacing private and public cloud data assets is a hallmark of initiatives such as logical data warehouses, data lakes, and online transactional data hubs. These projects may involve deploying two or more of the following cloud-based data platforms into a hybrid architecture: Apache Hadoop, data warehouses, graph databases, NoSQL databases, multi-workload SQL databases, open source databases, data refineries, and predictive analytics.
Data application developers, data scientists, and analytics professionals are driving their organizations’ efforts to bridge their data to the cloud. Several questions are of keen interest to those leading this evolution of data and analytics initiatives into more holistic cloud-facing environments:
• What is a hybrid cloud data services architecture?
• What are the chief applications and benefits of a hybrid cloud data services architecture?
• What are the best practices for bridging a logical data warehouse to the cloud?
• What are the best practices for bridging advanced analytics and data lakes to the cloud?
• What are the best practices for bridging an enterprise database hub to the cloud?
• What are the first steps to take for bridging private data assets to the cloud?
• How can you measure ROI from bridging private data to public cloud data services?
• Which case studies illustrate the value of bridging private data to the cloud?
Sign up now for a free 3-month trial of IBM Analytics for Apache Spark and IBM Cloudant, IBM dashDB or IBM DB2 on Cloud.
http://ibm.co/ibm-cloudant-trial
http://ibm.co/ibm-dashdb-trial
http://ibm.co/ibm-db2-trial
http://ibm.co/ibm-spark-trial
Tobiasz Janusz Koprowski presented a beginner's guide to tips and tricks for using Windows Azure SQL Database. The presentation covered key Azure SQL Database concepts such as database tiers, performance levels measured in Database Transaction Units (DTUs), data migration options, and compatibility with on-premises SQL Server versions. It provided an overview of supported and unsupported features between SQL Azure and different SQL Server versions. The presentation aimed to help attendees understand how to plan, configure, and manage databases on the Azure SQL Database platform.
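The tier and DTU concepts above translate into short T-SQL statements; as a sketch, with a hypothetical database name and target tier:

```sql
-- T-SQL sketch; the database name and target tier are hypothetical.
-- A DTU-based service objective bundles compute, memory, and I/O.
ALTER DATABASE SalesDb MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

-- Check the currently assigned service objective:
SELECT DATABASEPROPERTYEX('SalesDb', 'ServiceObjective') AS current_slo;
```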
Serverless SQL is a serverless analytics platform that allows users to analyze data stored in object storage without having to manage infrastructure. Key features include seamless elasticity, pay-per-query consumption, and the ability to analyze data directly in object storage without moving it. The platform includes serverless storage, data ingest, data transformation, analytics, and automation capabilities. It aims to create a sharing economy for analytics by giving users such as developers, data engineers, and analysts flexible access to data and analytics.
Delivering rapid-fire Analytics with Snowflake and Tableau - Harald Erb
Until recently, advancements in data warehousing and analytics were largely incremental. Small innovations in database design would herald a new data warehouse every 2-3 years, which would quickly become overwhelmed by rapidly increasing data volumes. Knowledge workers struggled to access those databases with development-intensive BI tools designed for reporting rather than exploration and sharing. Both databases and BI tools were strained in locally hosted environments that were inflexible to growth or change.
Snowflake and Tableau represent a fundamentally different approach. Snowflake’s multi-cluster shared data architecture was designed for the cloud and for handling orders-of-magnitude larger data volumes at blazing speed. Tableau was made to foster an interactive approach to analytics, freeing knowledge workers to use the speed of Snowflake to their greatest advantage.
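The multi-cluster architecture mentioned above is configured in plain SQL. A hedged sketch, with a hypothetical warehouse name (multi-cluster warehouses require Snowflake's Enterprise edition or higher):

```sql
-- Hypothetical warehouse for a pool of concurrent Tableau users.
CREATE WAREHOUSE tableau_wh
  WAREHOUSE_SIZE    = 'LARGE'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4         -- add clusters as concurrency grows
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 300       -- seconds of inactivity before suspending
  AUTO_RESUME       = TRUE;
```

With this in place, Snowflake starts additional clusters automatically when queries queue and retires them when demand drops, so dashboard users share data without sharing compute.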
Building cloud native data microservice - Nilanjan Roy
Spring Cloud Data Flow is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export that simplifies development of big data applications. It applies a microservice pattern to data processing and provides the typical benefits: scalability, isolation, agility, and continuous deployment. It uses Spring Cloud Stream for event-driven microservices and Spring Cloud Task for short-lived batch processes. Developers can also register and deploy their own applications with Spring Cloud Data Flow.
Designing and Implementing a cloud-hosted SaaS for data movement and Sharing ... - Haripds Shrestha
SlapOS (Simple Language for Accounting and Provisioning Operating System) aims to hide the complexity of IT infrastructure and software deployment from users. Through a software-as-a-service (SaaS) solution, users can automatically request and install data movement and sharing tools such as Stork and BitDew without any intervention from a system administrator.
Microsoft released SQL Azure more than two years ago - that's enough time for testing (I hope!). So, are you ready to move your data to the Cloud? If you’re considering running a business (i.e. a production environment) in the Cloud, you need to think about backup methods, a backup plan for your data, and, eventually, restoring with Red Gate Cloud Services (and not only). In this session, you’ll see the differences, functionality, restrictions, and opportunities in SQL Azure versus on-premises SQL Server 2008/2008 R2/2012. We’ll consider topics such as how to be prepared for backup and restore, and which parts of a cloud environment are most important: keys, triggers, indexes, prices, security, service level agreements, etc.
Azure provides several data-related services for storing, processing, and analyzing data in the cloud at scale. Key services include Azure SQL Database for relational data, Azure DocumentDB for NoSQL data, Azure SQL Data Warehouse for analytics, Azure Data Lake Store for big data storage, and Azure Storage for binary data. These services provide scalability, high availability, and manageability. Azure SQL Database provides fully managed SQL databases with options for single databases, elastic pools, and geo-replication. Azure SQL Data Warehouse enables petabyte-scale analytics with massively parallel processing.
Demystifying Data Warehouse as a Service (DWaaS) - Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
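The elasticity the talk highlights can be sketched with Snowflake's warehouse commands; the warehouse name is hypothetical:

```sql
-- Grow compute for a heavy month-end workload, shrink it afterwards.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XLARGE';
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Pause to stop paying for compute entirely; resume on demand.
ALTER WAREHOUSE reporting_wh SUSPEND;
ALTER WAREHOUSE reporting_wh RESUME;
```

No data is moved by any of these statements; storage and compute are billed and scaled separately, which is the core of the DWaaS economics described above.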
Building a Turbo-fast Data Warehousing Platform with Databricks - Databricks
Traditionally, data warehouse platforms have been perceived as cost prohibitive, challenging to maintain and complex to scale. The combination of Apache Spark and Spark SQL – running on AWS – provides a fast, simple, and scalable way to build a new generation of data warehouses that revolutionizes how data scientists and engineers analyze their data sets.
In this webinar you will learn how Databricks - a fully managed Spark platform hosted on AWS - integrates with a variety of AWS services, including Amazon S3, Kinesis, and VPC. We’ll also show you how to build your own data warehousing platform in a very short amount of time and how to integrate it with other tools such as Spark’s machine learning library and Spark Streaming for real-time processing of your data.
At last, Oracle Cloud Services are commercially available.
For deployment, pricing, and any consulting on Oracle Cloud Services, feel free to contact us. All our contact details are in the last slide of the presentation.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
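The serverless on-demand model can be sketched with Synapse's `OPENROWSET` over files in a data lake; the storage account and path below are hypothetical:

```sql
-- Serverless SQL pool query over Parquet files; no cluster to provision,
-- billing is per data processed. Account and path are hypothetical.
SELECT TOP 10 r.*
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/sales/year=2020/*.parquet',
    FORMAT = 'PARQUET'
) AS r;
```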
What's new in Autonomous Database in 2022 - Sandesh Rao
This session covers the new features and happenings in the Autonomous Database world and will help answer questions DBAs and developers have about the Autonomous Database, from provisioning to backups, troubleshooting, tips and tricks, security, and HA. It is a good introduction for on-prem DBAs who want to learn how this works quickly without spending too much time on it. It addresses questions such as: what does the free tier cover, how do backups work and are they automated, how do I manage the database, how do I scale up and down, how do I secure the environment, how do I use mTLS, and how do I use tools like SQL Developer and SQL Data Modeler - plus performance tuning - all in a quick 45-minute session that might otherwise take weeks of reading documentation or several presentations.
The Basics of Getting Started With Microsoft Azure - Microsoft Azure
The document describes various capabilities provided by Microsoft Azure including hosting virtual machines and web applications, mobile backend services, cloud services, storage options, SQL databases, media services, integration services, identity and access management, virtual networking, and infrastructure as a service. It provides details on virtual machine sizes, disks, networking, security, backups, and cross-premise connectivity in Azure.
Microsoft Sentinel provides cloud-native SIEM and SOAR capabilities powered by AI and automation. It can integrate with various components like servers, cloud servers, network devices, firewalls, and security solutions to provide global visibility of IT security. The implementation includes event analysis, automation of incident response, and creation of dashboards and reports. It also provides log retention, data integrity, fault tolerance, and integration with third-party services and APIs.
The document proposes a data platform modernization project for ABC Corp to migrate its on-premises data warehouse to AWS. Key aspects include setting up a scalable data lake using AWS services like S3, Glue, and Redshift. A lakehouse architecture is proposed with data ingestion, storage, processing, and consumption layers. The solution will improve resiliency, support real-time analytics, and enable AI/ML workloads. A two-year action plan is outlined along with the technology stack, solution components, quality assurance approach, and resource planning.
Topics covered: Azure Identity (AD, ADFS 2.0, AAD, AD B2C, OAuth, OpenID, PingID, AD custom policies); Azure PaaS (Azure Functions, serverless computing, Azure Cosmos DB, webhooks, API Apps, Logic Apps, Kudu, Azure Websites); Azure Functions, Lambda functions, event functions, serverless architecture, implementing Azure Functions for a GitHub comment feature, why Azure Functions; Azure Virtual Machines, Azure Cloud Services, Azure Web Apps and WebJobs, Service Fabric, consumption plans, billing model, benefits of Azure Functions, what serverless means, breaking bigger solutions into smaller Azure Functions, microservices, use cases, Function Apps, storing unstructured data in Cosmos DB via Azure Functions, Cosmos DB, custom Azure Functions, IoT, DocumentDB; setting up a Jenkins build server and automatically triggering builds from Visual Studio Online; Azure App Service, App Service Environments, Azure Stack, managing Azure App Services, Azure PowerShell, Azure CLI, REST APIs, Azure Portal, templates, Kudu console access, running Git commands in the Kudu console, locking Azure resources, configuring custom domains, adding extensions to Azure Web Apps/Websites, App Service deployment options; data services in Azure: Azure SQL, Azure SQL Server, Azure SQL Database vs. SQL Server in an Azure VM, SQL tiers, DTUs (Database Transaction Units), planning and provisioning Azure SQL databases, migrating SQL databases, SQL Server transactional replication, the Deploy Database to Microsoft Azure Database wizard, DAC packages, SQL compatibility issues, migrating SQL with downtime, the Data Migration Assistant (DMA), database snapshots, migrating SQL without downtime, recommendations for best performance during the SQL import process, transactional replication, T-SQL, and a hands-on task to implement everything learned so far.
Similaire à ME_Snowflake_Introduction_for new students.pptx (20)
Best Digital Marketing Strategy Build Your Online Presence 2024.pptxpavankumarpayexelsol
This presentation provides a comprehensive guide to the best digital marketing strategies for 2024, focusing on enhancing your online presence. Key topics include understanding and targeting your audience, building a user-friendly and mobile-responsive website, leveraging the power of social media platforms, optimizing content for search engines, and using email marketing to foster direct engagement. By adopting these strategies, you can increase brand visibility, drive traffic, generate leads, and ultimately boost sales, ensuring your business thrives in the competitive digital landscape.
2. What is Snowflake?
Snowflake is an analytic data warehouse provided as
Software-as-a-Service (SaaS). Snowflake provides a data
warehouse that is faster, easier to use, and more
flexible than other traditional data warehouses.
The Snowflake data warehouse is not built on an existing
database or on a big data software platform such as
Hadoop. Instead, it uses a new SQL database engine with
a unique architecture designed for the cloud.
3. Key Concepts and Architecture
Data Warehouse as a Cloud Service:
The Snowflake data warehouse is a true SaaS offering:
• There is no hardware (virtual or physical) for you to
select, install, configure, or manage.
• There is no software for you to install, configure, or
manage.
• Ongoing maintenance, management, and tuning are
handled by Snowflake.
Snowflake runs completely on cloud infrastructure: all
components of the Snowflake service run on public cloud
infrastructure.
Snowflake uses virtual compute instances for its compute
needs and a storage service for storing data. Snowflake
cannot be run on private (on-premises) cloud infrastructure.
4. Snowflake Architecture
Snowflake's architecture is a hybrid of the shared-disk and
shared-nothing database architectures.
Shared-disk database architecture:
Snowflake uses a central repository for data storage that is accessible from all the
compute nodes in the data warehouse.
Shared-nothing architecture:
Snowflake processes queries using massively parallel processing (MPP) compute
clusters, where each node in the compute cluster stores a portion of the entire data
set locally.
5. Shared-Disk Database Architecture
In this architecture, each node can access the data stored in the central
repository.
[Diagram: multiple nodes in the data warehouse, all accessing a central repository for data storage]
6. Shared-nothing Database Architecture
• Snowflake processes queries using MPP compute clusters, where
each node in the cluster stores a portion of the entire data set locally.
[Diagram: an MPP cluster of compute nodes, each holding its own slice of the data]
8. Database Layer
When you load data into Snowflake, Snowflake reorganizes that data
into its internal optimized, compressed, columnar format.
Snowflake stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored: the
organization, file size, structure, compression, metadata, statistics, and
other aspects of data storage are handled by Snowflake. The data
objects stored by Snowflake are not directly visible or accessible by
customers; they are accessible only through SQL query operations run
using Snowflake.
9. Query Processing Layer
• Query execution is performed in the processing layer. Snowflake
processes queries using “virtual warehouses”. Each virtual warehouse
is an MPP compute cluster composed of multiple compute nodes
allocated by Snowflake from a cloud provider.
• Each virtual warehouse is an independent compute cluster that does
not share compute resources with other virtual warehouses. As a
result, each virtual warehouse has no impact on the performance of
other virtual warehouses.
10. Cloud Services
The cloud services layer is a collection of services that coordinate activities
across Snowflake. These services tie together all of the different components
of Snowflake in order to process user requests, from login to query dispatch.
The cloud services layer also runs on compute instances provisioned by
Snowflake from the cloud provider.
Among the services in this layer:
• Authentication
• Infrastructure management
• Metadata management
• Query parsing and optimization
• Access control
11. Connecting to Snowflake
Snowflake supports multiple ways of connecting to the service:
• A web-based user interface from which all aspects of managing and using
Snowflake can be accessed.
• Command line clients (e.g. SnowSQL) which can also access all aspects of
managing and using Snowflake.
• ODBC and JDBC drivers that can be used by other applications (e.g. Tableau)
to connect to Snowflake.
• Native connectors (e.g. Python) that can be used to develop applications for
connecting to Snowflake.
• Third-party connectors that can be used to connect applications such as ETL
tools (e.g. Informatica) and BI tools to Snowflake.
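As a minimal sketch of the connector route above (account name, user, and password are hypothetical placeholders; a real session needs the snowflake-connector-python package and valid credentials), opening a session and running a query might look like this:

```python
# Minimal sketch of querying Snowflake via the Python connector.
# Account, user, and password are hypothetical placeholders; the real
# connector is installed with `pip install snowflake-connector-python`.

VERSION_QUERY = "SELECT CURRENT_VERSION()"

def fetch_version(connect=None):
    """Open a session, run a one-row query, and return the single value.

    `connect` defaults to snowflake.connector.connect; it is injectable
    so the control flow can be exercised without a live account.
    """
    if connect is None:
        import snowflake.connector  # assumption: package is installed
        connect = snowflake.connector.connect
    conn = connect(account="xy12345", user="ANALYST", password="********")
    try:
        cur = conn.cursor()
        cur.execute(VERSION_QUERY)   # any SQL statement works here
        return cur.fetchone()[0]
    finally:
        conn.close()                 # always release the session
```

The same open-execute-close pattern applies to the JDBC and ODBC drivers: obtain a connection, get a cursor or statement, execute SQL, and close the session.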
12. Cloud Platform
Snowflake provides its data warehouse as Software-as-a-Service (SaaS)
that runs completely on a cloud platform.
This means that all three layers of Snowflake's architecture (Database,
Query Processing, Cloud Services) are deployed and managed entirely
on the selected cloud platform.
A Snowflake account can be hosted on the following cloud platforms:
• Amazon Web Services(AWS)
• Microsoft Azure
14. Data Loading
Snowflake supports loading data from files staged in any of the following
locations:
• Internal (Snowflake) stage
• Amazon S3
• Microsoft Azure Blob Storage
Snowflake supports both bulk data loading and continuous data
loading (Snowpipe). Likewise, Snowflake supports unloading data from
tables into any of the above staging locations.
15. Region and Cloud Platform in Account Names
The hostname for a Snowflake account starts with a unique account name and ends
with snowflakecomputing.com.
However, depending on the cloud platform (AWS or Azure) and the Snowflake Region
where your account is hosted, the full account name may
require additional segments, as shown in the following diagram:
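Since the diagram did not survive extraction, here is a small sketch of how those segments compose. The account name, region, and platform values are illustrative only; the exact segment rules depend on when and where the account was provisioned.

```python
def account_hostname(account, region=None, platform=None):
    """Compose a Snowflake account hostname from its optional segments.

    Some deployments need only the account name; others add a region
    segment, and Azure-hosted accounts add a platform segment as well.
    The values used below are illustrative, not an exhaustive list.
    """
    parts = [account]
    if region:
        parts.append(region)     # e.g. "eu-west-1" or "east-us-2"
    if platform:
        parts.append(platform)   # e.g. "azure"
    return ".".join(parts) + ".snowflakecomputing.com"

# Illustrative results:
#   account_hostname("xy12345")
#       -> "xy12345.snowflakecomputing.com"
#   account_hostname("xy12345", "eu-west-1")
#       -> "xy12345.eu-west-1.snowflakecomputing.com"
#   account_hostname("xy12345", "east-us-2", "azure")
#       -> "xy12345.east-us-2.azure.snowflakecomputing.com"
```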
16. Enterprise Edition
Enterprise Edition provides all the features and services of Premier Edition, with the
following additional features designed specifically for the needs of large-scale
enterprises and organizations:
17. Overview of Key Features
This topic lists the notable/significant features supported in the current
release. Note that it does not list every feature provided by Snowflake.
• Security and Data Protection
• Standard and Extended SQL Support
• Tools and Interfaces
• Connectivity
• Data Import and Export
• Data Sharing
18. Security and Data Protection
• Choose the level of security you require for your Snowflake account, based on your Snowflake Edition.
• Choose the geographical location where your data is stored, based on your Snowflake Region.
• User authentication through standard user/password credentials.
• Enhanced authentication:
• Multi-factor authentication (MFA).
• Federated authentication and single sign-on (SSO) — requires Snowflake Enterprise Edition.
• All communication between clients and the server protected through TLS.
• Deployment inside a cloud platform VPC.
• Isolation of data via Amazon S3 policy controls.
• Support for PHI data (in compliance with HIPAA regulations) — requires Snowflake Enterprise for Sensitive Data (ESD).
• Automatic data encryption by Snowflake using Snowflake-managed keys.
• Object-level access control.
• Snowflake Time Travel (1 day standard for all accounts; additional days, up to 90, allowed with Snowflake Enterprise) for:
• Querying historical data in tables.
• Restoring and cloning historical data in databases, schemas, and tables.
• Snowflake Fail-safe (7 days standard for all accounts) for disaster recovery of historical data.
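To make the Time Travel bullets concrete, here is a hedged sketch of the kind of SQL involved. The table name and the one-hour offset are hypothetical; the statements are assembled as plain strings for illustration.

```python
# Illustrative Time Travel statements (table name and offset are hypothetical).
TABLE = "ORDERS"
SECONDS_AGO = 3600  # look back one hour

# Query the table as it existed one hour ago.
historical_query = f"SELECT * FROM {TABLE} AT(OFFSET => -{SECONDS_AGO})"

# Clone the table as of that same point in time.
clone_stmt = (
    f"CREATE TABLE {TABLE}_RESTORED CLONE {TABLE} "
    f"AT(OFFSET => -{SECONDS_AGO})"
)

# Recover a dropped table while it is still inside the retention period.
undrop_stmt = f"UNDROP TABLE {TABLE}"
```

Once the Time Travel retention period ends, historical data moves into Fail-safe, where it is recoverable only by Snowflake, not by these SQL statements.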
19. Standard and Extended SQL Support
• Most DDL and DML defined in SQL:1999, including:
• Database and schema DDL.
• Table and view DDL.
• Standard DML such as UPDATE, DELETE, and INSERT.
• DML for bulk data loading/unloading.
• Core data types.
• SET operations.
• CAST functions.
• Advanced DML such as multi-table INSERT, MERGE, and multi-merge.
• Transactions.
• Temporary and transient tables for transitory data.
• Lateral views.
• Statistical aggregate functions.
• Analytical aggregates (Group by cube, rollup, and grouping sets).
• Parts of the SQL:2003 analytic extensions:
• Windowing functions.
• Grouping sets.
• Scalar and tabular user-defined functions (UDFs), with support for both SQL and JavaScript.
• Information Schema for querying object and account metadata, as well as query and warehouse usage history data.
20. Tools and Interfaces
• Web-based GUI for account and general management, monitoring of
resources and system usage, and querying data.
• SnowSQL (Python-based command line client).
• Virtual warehouse management from the GUI or command line,
including creating, resizing (with zero downtime), suspending, and
dropping warehouses.
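The same lifecycle operations can be issued as plain SQL from any client. A hedged sketch follows; the warehouse name, size values, and auto-suspend setting are hypothetical choices, not recommendations.

```python
# Illustrative warehouse-lifecycle statements (names and sizes hypothetical).
WAREHOUSE = "REPORTING_WH"

# Create a small warehouse that suspends itself after 60 idle seconds.
create_stmt = (
    f"CREATE WAREHOUSE {WAREHOUSE} "
    "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
)

# Resize with zero downtime; running queries finish on the old size.
resize_stmt = f"ALTER WAREHOUSE {WAREHOUSE} SET WAREHOUSE_SIZE = 'MEDIUM'"

# Suspend to stop credit consumption, or drop it entirely.
suspend_stmt = f"ALTER WAREHOUSE {WAREHOUSE} SUSPEND"
drop_stmt = f"DROP WAREHOUSE {WAREHOUSE}"
```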
21. Connectivity
• Broad ecosystem of supported 3rd-party partners and technologies.
• Support for using free trials to connect to selected partners.
• Extensive set of client connectors and drivers provided by Snowflake:
• Python connector
• Spark connector
• Node.js driver
• Go Snowflake driver
• .NET driver
• JDBC client driver
• ODBC client driver
• dplyr-snowflakedb (open-source dplyr package extension maintained on GitHub)
22. Data Import and Export
• Support for bulk loading and unloading data into/out of tables,
including:
• Load any data that uses a supported character encoding.
• Load data from compressed files.
• Load most flat, delimited data files (CSV, TSV, etc.).
• Load data files in JSON, Avro, ORC, Parquet, and XML format.
• Load from S3 data sources and local files using Snowflake web interface or
command line client.
• Support for continuous bulk loading data from files:
• Use Snowpipe to load data in micro-batches from internal stages (i.e. within
Snowflake) or external stages (i.e. in S3 or Azure).
23. Data Sharing (Not used in WILEY)
• Support for sharing data with other Snowflake accounts:
• Provide data to other accounts to consume.
• Consume data provided by other accounts.
24. Overview of the Data Lifecycle
Snowflake provides support for all standard SELECT, DDL, and DML operations across
the lifecycle of data in the system, from organizing and storing data to querying and
working with data, as well as removing data from the system.
• Lifecycle Diagram
• Organizing Data
• Storing Data
• Querying Data
• Working with Data
• Removing Data
25. Lifecycle Diagram
All user data in Snowflake is logically represented as tables that can be queried and
modified through a standard SQL interface. Each table belongs to a schema, which in
turn belongs to a database.
26. Organizing Data
You can organize your data into databases, schemas, and tables. Snowflake does not
limit the number of databases you can create, the number of schemas you can create
within a database, or the number of tables you can create in a schema.
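A short sketch of the corresponding DDL, showing the database > schema > table hierarchy. All object names and column definitions here are hypothetical.

```python
# Hypothetical object names illustrating the database > schema > table hierarchy.
statements = [
    "CREATE DATABASE SALES_DB",
    "CREATE SCHEMA SALES_DB.RAW",
    (
        "CREATE TABLE SALES_DB.RAW.ORDERS ("
        "ORDER_ID NUMBER, ORDER_DATE DATE, AMOUNT NUMBER(10,2))"
    ),
]
# Executed in order, each statement creates the container the next one
# lives inside; fully-qualified names (db.schema.table) make the
# hierarchy explicit.
```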
27. Storing Data
You can insert data directly into tables. Snowflake also provides DML for loading data
into Snowflake tables from external, formatted files.
28. Querying Data
Once the data is stored in a table, you can issue SELECT statements to query it.
29. Working With Data
Once the data is stored in a table, all standard DML operations can be performed on
it. Snowflake also supports DDL actions such as cloning entire databases, schemas,
and tables.
30. Removing Data
In addition to using the DML command DELETE to remove data from a table, you can
truncate or drop an entire table. You can also drop entire schemas and databases.
31. Continuous Data Protection
Continuous Data Protection (CDP) encompasses a comprehensive set of features
that help protect data stored in Snowflake against human error, malicious acts, and
software or hardware failure. At every stage within the data lifecycle, Snowflake
enables your data to be accessible and recoverable in the event of accidental or
intentional modification, removal, or corruption.
32. Connecting to Snowflake
These topics provide an overview of the Snowflake-provided and 3rd-party tools and technologies that
form the ecosystem for connecting to Snowflake. They also provide detailed installation and usage
instructions for using the Snowflake-provided clients, connectors, and drivers.
1. Overview of the Ecosystem
2. Snowflake Partner Connect
3. SnowSQL(CLI Client)
4. Snowflake connector for Python
5. Snowflake connector for Spark
6. Node.js Driver
7. Go Snowflake Driver
8. .NET Driver
9. JDBC Driver
10. ODBC Driver
11. Client Considerations
33. Overview of Ecosystem
Snowflake works with a wide array of industry-leading tools and technologies, enabling you to access
Snowflake through an extensive network of connectors, drivers, programming languages, and utilities,
including:
• Snowflake-provided client software: SnowSQL (CLI), Python, Node.js, JDBC, ODBC, etc.
• Certified partners who have developed cloud-based and on-premises solutions for connecting to
Snowflake through our drivers and connectors.
• Other 3rd-party tools and technologies that are known to work with Snowflake.
34. Data Integration
Commonly referred to as ETL, data integration encompasses the following primary
operations:
Extract: Exporting data from a specified source.
Transform: Modifying the source data as needed, using rules, merges, lookup tables,
or other conversion methods to match the target.
Load: Importing the transformed data into a target database.
More recent usage references the term ELT, emphasizing that the transformation
part of the process does not necessarily need to be performed before loading,
particularly in systems such as Snowflake that support transformation during or after
loading.
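The load-first ELT pattern described above can be sketched as two statements: land the raw data, then transform it inside Snowflake. The stage, table, and column names are hypothetical.

```python
# Hedged ELT sketch: load raw data first, then transform inside Snowflake.
# Stage, table, and column names are hypothetical.

# Step 1 (Load): copy staged files into a raw landing table as-is.
load_stmt = "COPY INTO RAW_ORDERS FROM @my_stage/orders/"

# Step 2 (Transform): cleanse and reshape with plain SQL after loading,
# using the warehouse's own compute rather than an external ETL server.
transform_stmt = (
    "INSERT INTO CLEAN_ORDERS "
    "SELECT ORDER_ID, TRY_TO_DATE(ORDER_DATE), AMOUNT "
    "FROM RAW_ORDERS WHERE AMOUNT IS NOT NULL"
)
```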
35. Business Intelligence(BI)
Business intelligence (BI) tools enable analyzing, discovering, and
reporting on data to help executives and managers make more informed
business decisions. A key component of any BI tool is the ability to
deliver data visualization through dashboards, charts, and other
graphical output.
Business intelligence also sometimes overlaps with technologies such
as data integration/transformation and advanced analytics; however,
we’ve chosen to list these technologies separately in their own
categories.
36. Advanced Analytics
Also referred to as data science, machine learning (ML), artificial
intelligence (AI), and “Big Data”, advanced analytics covers a broad
category of vendors, tools, and technologies that provide advanced
capabilities for statistical and predictive modeling.
These tools and technologies often share some overlapping features
and functionality with BI tools; however, they are less focused on
analyzing/reporting on past data. Instead, they focus on examining large
data sets to discover patterns and uncover useful business information
that can be used to predict future trends.
37. SnowSQL (CLI Client)
SnowSQL is the next-generation command line client for connecting to Snowflake to
execute SQL queries and perform all DDL and DML operations, including loading
data into and unloading data out of database tables.
Snowflake provides platform-specific versions of SnowSQL for download for the
following platforms:
38. Loading Data into Snowflake
This topic describes concepts related to loading data into Snowflake
tables.
• Bulk Loading Using COPY
• Data Loading Process
• Tasks for Loading Data
• Continuous Loading Using Snowpipe
39. Bulk Loading Using COPY
This section describes bulk data loading into Snowflake tables using the COPY INTO
<table> command. The information is similar regardless of whether you are loading from data files on your local
file system, in Amazon S3 buckets, or in Microsoft Azure containers.
Data Loading Process:
• Upload (i.e. stage) one or more data files into either an internal stage (i.e. within Snowflake) or an
external location:
Internal: Use the PUT command to stage the files.
External: Currently, Amazon S3 and Microsoft Azure are the only services supported
for staging external data. Snowflake assumes the files have already been
staged in one of these locations. If they haven't been staged already, use
the upload interfaces/utilities provided by the service that hosts the
location.
40. • Use the COPY INTO <table> command to load the contents of the
staged file(s) into a Snowflake database table.
This step requires a running virtual warehouse that is also the current
warehouse (i.e. in use) for the session. The warehouse provides the
compute resources to perform the actual insertion of rows into the
table.
41. Tasks for Loading Data
• Bulk Loading from a Local File System:
This set of topics describes how to use the COPY command to bulk load from a local
file system into tables.
As illustrated in the diagram below, loading data from a local file system is performed
in two separate steps:
Step 1: Upload (i.e. stage) one or more data files to a Snowflake stage (a named
internal stage or a table/user stage) using the PUT command.
Step 2: Use the COPY INTO <table> command to load the contents of the staged
file(s) into a Snowflake database table.
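The two steps above can be sketched as the pair of statements a SnowSQL session would run. The local file path, the table stage, and the file-format options are all hypothetical.

```python
# Hypothetical file path, stage, and table names for the two-step load.

# Step 1: PUT uploads a local file to the table's own stage (@%ORDERS).
put_stmt = "PUT file:///tmp/orders.csv @%ORDERS"

# Step 2: COPY INTO loads the staged file into the table; a running
# virtual warehouse must be current in the session for this step.
copy_stmt = (
    "COPY INTO ORDERS FROM @%ORDERS "
    "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
)
```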