SQL Server 2019 Big Data Clusters
Rozalina Zaharieva & Dimitar Zahariev
SQL Server Big Data Cluster Layout
[Diagram: a Kubernetes cluster of nodes with persistent storage, in which each component runs as a Kubernetes pod. Control plane: the controller and the SQL Server master instance. Compute plane: compute pools of SQL compute nodes. Storage plane: a data pool of SQL data nodes with their own storage, and a storage pool of HDFS data nodes that each run Spark and SQL Server, which reads directly from HDFS. Also shown: IoT data, external data sources, and client workloads (analytics, custom apps, BI).]
Architecture dissection
• Kubernetes (K8s) concepts
• SQL Server 2019 big data cluster (BDC) components
Kubernetes concepts
What is Kubernetes and what does it do?
• Kubernetes is a container orchestrator that is responsible for:
  • Running a cluster of hosts
  • Scheduling containers to run on different hosts
  • Facilitating communication between containers
  • Providing and controlling access to and from the outside world
  • Tracking and optimizing resource usage
• Similar solutions:
  • Docker Swarm, Mesos Marathon, Amazon ECS, HashiCorp Nomad
K8s architecture overview
[Diagram: three master nodes, each with a scheduler, controller, api-server, and key-value store, managing worker nodes Node 1 ... Node K. Every worker node runs kube-proxy and the kubelet and hosts a set of pods (Pod 1 ... Pod N on Node 1, Pod 1 ... Pod M on Node K).]
Master Nodes
• Responsible for managing the cluster
• Typically more than one is installed
• In HA mode, one master node is the leader
• Can be reached via the CLI (kubectl), APIs, or the Dashboard
Components of each master node:
• Scheduler: schedules the work on different nodes
• Controller: takes care of 1) control loops and 2) the desired state
• api-server: 1) performs administrative tasks and 2) stores the cluster state
• Key-value store: etcd is used, and it can be 1) part of the master or 2) installed externally
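To make the scheduler's role more concrete, here is a minimal Python sketch of placing pods on nodes. It is a deliberately simplified illustration and not how the real Kubernetes scheduler is implemented: the `Node` class, the single CPU resource, and the "most free capacity wins" rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node with one schedulable resource (CPU millicores)."""
    name: str
    cpu_capacity: int
    pods: dict = field(default_factory=dict)  # pod name -> requested CPU

    @property
    def cpu_free(self) -> int:
        return self.cpu_capacity - sum(self.pods.values())


def schedule(pod_name: str, cpu_request: int, nodes: list) -> str:
    """Place a pod on the node with the most free CPU; raise if no node fits."""
    # Filter step: keep only nodes that can hold the pod.
    candidates = [n for n in nodes if n.cpu_free >= cpu_request]
    if not candidates:
        raise RuntimeError(f"no node can fit pod {pod_name!r}")
    # Score step: prefer the node with the most free capacity.
    best = max(candidates, key=lambda n: n.cpu_free)
    best.pods[pod_name] = cpu_request
    return best.name


cluster = [Node("node1", 2000), Node("node2", 1000)]
for pod, cpu in [("pod-a", 800), ("pod-b", 800), ("pod-c", 800)]:
    print(pod, "->", schedule(pod, cpu, cluster))
```

The filter-then-score split is a common way to structure such a decision: it keeps the hard constraint (does the pod fit?) apart from the preference (which fitting node is best?).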
(Worker) Nodes
• Initially called Minions
• Container runtime
  • containerd, rkt, lxd
• Kubelet
  • Communicates with master
  • Uses CRI shims
• kube-proxy
  • Network proxy
[Diagram: a node running kube-proxy, the kubelet, and a container runtime that hosts Pod 1 and Pod 2.]
Pods (1)
• Smallest unit of scheduling
• Contains one or more containers
• Containers share the pod environment
• Scheduled on nodes
• Created via manifest files
[Diagram: a pod holding a main container and supporting containers that share one environment (net, mount, ...).]
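As an illustration of "created via manifest files", the sketch below builds a pod manifest with one main and one supporting container and prints it as JSON. The pod name, container names, and images are placeholders chosen for this example; only the `apiVersion`, `kind`, `metadata`, and `spec.containers` fields come from the Kubernetes Pod schema.

```python
import json


def make_pod_manifest(name: str, containers: dict) -> dict:
    """Build a Pod manifest; `containers` maps container name -> image."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {"name": c_name, "image": image}
                for c_name, image in containers.items()
            ]
        },
    }


# Hypothetical pod: a main container plus one supporting container.
manifest = make_pod_manifest(
    "demo-pod",
    {"main": "nginx:1.25", "log-shipper": "busybox:1.36"},
)
print(json.dumps(manifest, indent=2))
```

Manifests are more often written in YAML, but kubectl also accepts JSON, so the printed document could be saved to a file and passed to `kubectl apply -f`.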
Pods (2)
• Each pod has a unique IP address
• Inter-pod communication goes over a pod network
• Intra-pod communication goes over localhost and a port
[Diagram: Pod 1 (10.10.20.20) and Pod 2 (10.10.20.21) connected by the pod network; containers inside a pod talk to each other over localhost.]
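The intra-pod case can be imitated on a single machine. In the sketch below, two threads stand in for two containers of the same pod: because such containers share one network environment, the "supporting container" reaches the "main container" simply through localhost and a port. The function names and the upper-casing behaviour are invented for the example.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Stand-in for the main container: accept one connection, reply in upper case."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def call_over_localhost(message: bytes) -> bytes:
    """Stand-in for a supporting container calling the main one via localhost:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


print(call_over_localhost(b"hello from the sidecar"))
```

Between pods the same client code would use the other pod's IP address (or a service name) instead of 127.0.0.1.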
Replication Controllers
• Higher-level workload
• Looks after a pod or a set of pods
• Scales pods up/down
• Sets the desired state
[Diagram: a replication controller wrapping a pod.]
Deployments
• Even higher-level workload
• Simplifies updates and rollbacks
• Declarative and imperative approach
• Self-documenting
• Suitable for versioning
[Diagram: a deployment wrapping a replica set, which in turn wraps a pod.]
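The "desired state" idea behind replication controllers and deployments can be sketched as a small reconciliation function. This is a toy model under stated assumptions, not Kubernetes code: pods are plain strings, and `reconcile` is a hypothetical helper that only compares the number of running pods with the desired replica count.

```python
import itertools

_pod_ids = itertools.count(1)  # source of unique pod names for the toy model


def reconcile(desired_replicas: int, running: list) -> list:
    """One control-loop pass: return the pod list after converging to the desired count."""
    pods = list(running)
    while len(pods) < desired_replicas:  # scale up: start the missing pods
        pods.append(f"pod-{next(_pod_ids)}")
    while len(pods) > desired_replicas:  # scale down: stop the surplus pods
        pods.pop()
    return pods


pods = reconcile(3, [])    # initial rollout of three replicas
pods.remove(pods[0])       # simulate a pod crash
pods = reconcile(3, pods)  # the controller replaces the lost pod
print(pods)
pods = reconcile(1, pods)  # desired state changed: scale down
print(pods)
```

The caller never says "start a pod" or "stop a pod"; it only states how many replicas it wants, which is the declarative approach the slides refer to.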
Services (1)
• Provide a reliable network endpoint
  • IP address
  • DNS name
  • Port
• Expose pods to the outside world
  • NodePort (cluster-wide port)
  • LoadBalancer (cloud-based)
• Use an Endpoints object to track pods
[Diagram: a service (IP = 10.10.10.1, DNS = demo-svc, Port = 32000) whose Endpoints object lists the IPs of Pod A, Pod B, ...; Pod A (10.10.20.21) runs on Node 1 and Pod B (10.10.20.22) runs on Node 2.]
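A NodePort service like the one in the diagram could be described with a manifest along the following lines. The name `demo-svc` and node port 32000 are taken from the slide; the selector label, `port`, and `targetPort` values are assumptions added to make the example complete. The small check reflects the default NodePort range of 30000-32767.

```python
import json

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "demo-svc"},      # also used as the DNS name
    "spec": {
        "type": "NodePort",                # expose the service on every node
        "selector": {"app": "myapp"},      # assumed pod label
        "ports": [
            {
                "port": 80,                # service port (assumed)
                "targetPort": 8080,        # container port (assumed)
                "nodePort": 32000,         # cluster-wide port from the slide
            }
        ],
    },
}


def check_node_ports(svc: dict, low: int = 30000, high: int = 32767) -> None:
    """Raise ValueError if a nodePort falls outside the default NodePort range."""
    for port in svc["spec"]["ports"]:
        node_port = port.get("nodePort")
        if node_port is not None and not low <= node_port <= high:
            raise ValueError(f"nodePort {node_port} is outside {low}-{high}")


check_node_ports(service)
print(json.dumps(service, indent=2))
```

The cluster IP (10.10.10.1 in the diagram) is not part of the manifest here because Kubernetes normally assigns it when the service is created.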
Services (2)
• Services use label selectors to do their magic
[Diagram, in four steps: (1) the service selects version=v01, app=myapp and two pods carry those labels; (2) two pods labelled version=v02, app=myapp are added while the service still selects version=v01; (3) the service selector is switched to version=v02, app=myapp; (4) the version=v01 pods are removed, leaving the two version=v02 pods.]
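The selector mechanics are simple enough to sketch in a few lines. The code below models equality-based label selection only; the pod names are made up, while the `app` and `version` labels follow the slides. `matches` and `endpoints` are illustrative helpers, not Kubernetes APIs.

```python
def matches(selector: dict, labels: dict) -> bool:
    """An equality-based selector matches when every key/value pair is in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict, pods: dict) -> list:
    """Return the names of the pods a service with this selector would track."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


pods = {
    "pod-1": {"app": "myapp", "version": "v01"},
    "pod-2": {"app": "myapp", "version": "v01"},
    "pod-3": {"app": "myapp", "version": "v02"},
    "pod-4": {"app": "myapp", "version": "v02"},
}

print(endpoints({"app": "myapp", "version": "v01"}, pods))  # old version only
print(endpoints({"app": "myapp", "version": "v02"}, pods))  # after switching the selector
print(endpoints({"app": "myapp"}, pods))                    # both versions
```

Switching the `version` value in the selector is all it takes to move the service from the old pods to the new ones, which is the update sequence the slides walk through.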
SQL Server 2019 big data cluster (BDC) components
SQL Server 2019 big data cluster: base node configuration
Applies to nodes across all planes. Services:
• kubelet – K8s local agent
• kube-proxy – network config and forwarding
• supervisord – process monitor and control
• fluentd – node logging
• flanneld – software-defined network
• collectd – OS and application data collection
• SQL Big Data watchdog – config sync, watchdog, data collector (DMVs, etc.)
[Diagram: a Kubernetes node running watchdog, kubelet, kube-proxy, supervisord, fluentd, flanneld, and collectd.]
Control Plane
External endpoints:
• Kubernetes (REST)
• Aris Control Service (REST)
• Knox Gateway (REST gateway for Hadoop APIs)
• SQL Server Master (TDS gateway for data marts and SQL Master Service)
Services:
• etcd
• Kubernetes Master Services Controller
• SQL Master instance
• SQL Big Data Admin Portal
• Knox Gateway
• HDFS Name Service
• YARN Master
• Hive Metastore
• InfluxDB (metrics store)
• Livy (REST interface for Spark)
• Spark Driver
[Diagram: three Kubernetes nodes, each running the base node services plus etcd. The first also hosts the K8s Master service, Spark driver, SQL Big Data Admin portal, InfluxDB, and Grafana; the second hosts the Controller, Proxy, SQL Master, HDFS Name Node, and Kibana; the third hosts Livy, Knox, Elasticsearch, Hive Metastore, and YARN Master.]
Controller
• External REST/HTTPS endpoint
• Bootstrap and build-out
• Manage capacity
• Configure high availability and recover from failure (AGs)
• Security (authN, authZ, certificate rotation)
• Lifecycle (upgrade/downgrade/rollback)
• Configuration management
• Monitoring – capacity, health, metrics, logs
• Troubleshooting – performance, failures
• Cluster Admin Portal
[Diagram: the controller service (build-out, upgrade/rollback, add/remove capacity, central authZ/authN, Cluster Admin Portal, troubleshooting) backed by the controller metadata.]
SQL Master Instance
• TDS endpoint into the cluster
• High-value data
• OLTP server
• Data connectors
• Machine learning & extensibility
• Scalable query engine
[Diagram: the master instance availability group with one primary and two readable secondaries.]
Compute plane
• Hosts one or more SQL compute pools
• A compute pool is a group of instances that forms a data, security, and resource boundary.
• A compute pool processes complex distributed queries against the data plane.
• Local storage is used for shuffling data if necessary.
[Diagram: four compute pool nodes, each running the base node services and a SQL engine.]
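Why a distributed query may need to shuffle data can be shown with a small sketch. The code below is a generic illustration of a hash shuffle followed by per-node aggregation, not the actual SQL Server compute pool implementation; the function names, the CRC32-based partitioning, and the sample rows are assumptions made for the example.

```python
import zlib
from collections import Counter


def node_for(key, n_nodes: int) -> int:
    """Stable hash partitioning: the same key always maps to the same node."""
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes


def shuffle(rows: list, key: str, n_nodes: int) -> list:
    """Redistribute rows so that all rows sharing a key value land on one node."""
    partitions = [[] for _ in range(n_nodes)]
    for row in rows:
        partitions[node_for(row[key], n_nodes)].append(row)
    return partitions


def distributed_sum(rows: list, key: str, value: str, n_nodes: int) -> dict:
    """GROUP BY key, SUM(value): shuffle by key, aggregate per node, then merge."""
    total = Counter()
    for partition in shuffle(rows, key, n_nodes):
        local = Counter()
        for row in partition:
            local[row[key]] += row[value]
        total.update(local)  # key sets are disjoint across nodes after the shuffle
    return dict(total)


readings = [
    {"device": "a", "reading": 1},
    {"device": "b", "reading": 2},
    {"device": "a", "reading": 3},
]
print(distributed_sum(readings, "device", "reading", n_nodes=4))
```

Because each key ends up on exactly one node, every node can finish its groups independently, and the final merge is a plain union of the per-node results.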
Data plane
Storage pool:
• Data ingestion through Spark (batch and streaming)
• Data storage in HDFS
• Data access through HDFS and SQL endpoints; the SQL engine reads files in HDFS directly
Data pool:
• Partitioned, in-memory cache for external data
• Scale-out data storage for append-only data sets
• Data ingestion through Spark
• Provides persistent SQL Server storage for the cluster
[Diagram: storage pool nodes run the base node services, a SQL engine, HDFS, and Spark; data pool nodes run the base node services and a SQL engine.]
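The "partitioned, scale-out, append-only" character of the data pool can be pictured with a toy model. The slides do not say how rows are distributed, so the round-robin placement below is an assumption, and the `DataPool` class is purely illustrative.

```python
class DataPool:
    """Toy model of a data pool: appended rows are spread round-robin across shards."""

    def __init__(self, n_shards: int) -> None:
        self.shards = [[] for _ in range(n_shards)]
        self._next = 0  # index of the shard that receives the next row

    def append(self, row: dict) -> None:
        """Append-only ingestion: rows are added, never updated in place."""
        self.shards[self._next].append(row)
        self._next = (self._next + 1) % len(self.shards)

    def scan(self):
        """Read every shard; a real engine could scan the shards in parallel."""
        for shard in self.shards:
            yield from shard


pool = DataPool(n_shards=2)
for i in range(5):
    pool.append({"id": i})
print([len(shard) for shard in pool.shards])
print(sorted(row["id"] for row in pool.scan()))
```

Spreading rows evenly keeps every shard about the same size, so adding shards increases both storage capacity and the number of nodes that can work on a scan.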
Installation, configurations and tools
Installation methods:
• Cloud - a platform such as Azure Kubernetes Service (AKS)
• On-premises - VMs, bare metal
• Localhost - using minikube (for training and testing only)
Configurations:
• All-in-one single node and various multi-node options
Tools:
• mssqlctl, kubectl, Azure Data Studio, SQL Server 2019 extension, Azure CLI (for AKS), mssql-cli, sqlcmd, curl
Demonstrations
Powered by

More Related Content

What's hot

Upgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaUpgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaAmit Banerjee
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017Travis Wright
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseEric Bragas
 
AliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAlibaba Cloud
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...HostedbyConfluent
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Lucas Jellema
 
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Databricks
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond RelationalLynn Langit
 
Netflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraNetflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraRoopa Tangirala
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro sessionTravis Wright
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorCask Data
 
Organizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George WaltersOrganizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George WaltersGeorge Walters
 
Big Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data IntegrationBig Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data IntegrationAlibaba Cloud
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentPeter Haase
 
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinBuilding a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinLynn Langit
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Databaserockplace
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise WideDatabricks
 
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with EaseBenchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with EaseLynn Langit
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platformgiventocode
 

What's hot (20)

Upgrade your SQL Server like a Ninja
Upgrade your SQL Server like a NinjaUpgrade your SQL Server like a Ninja
Upgrade your SQL Server like a Ninja
 
SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017SQL Server 2017 Deep Dive - @Ignite 2017
SQL Server 2017 Deep Dive - @Ignite 2017
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
AliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core Features
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more wi...
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
Netflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to CassandraNetflix's Big Leap from Oracle to Cassandra
Netflix's Big Leap from Oracle to Cassandra
 
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018  SQL Server 2019 big data clusters - intro sessionMicrosoft ignite 2018  SQL Server 2019 big data clusters - intro session
Microsoft ignite 2018 SQL Server 2019 big data clusters - intro session
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
Organizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George WaltersOrganizational compliance and security SQL 2012-2019 by George Walters
Organizational compliance and security SQL 2012-2019 by George Walters
 
Big Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data IntegrationBig Data Quickstart Series 3: Perform Data Integration
Big Data Quickstart Series 3: Perform Data Integration
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinBuilding a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
 
Spark
SparkSpark
Spark
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
 
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with EaseBenchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
Benchmarking Aerospike on the Google Cloud - NoSQL Speed with Ease
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
 

Similar to SQL Server 2019 Big Data Clusters Architecture

The roadmap for sql server 2019
The roadmap for sql server 2019The roadmap for sql server 2019
The roadmap for sql server 2019Javier Villegas
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and dockerBob Ward
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Eran Gampel
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBob Ward
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionTravis Wright
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 
Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...ALI ANWAR, OCP®
 
Deploying windows containers with kubernetes
Deploying windows containers with kubernetesDeploying windows containers with kubernetes
Deploying windows containers with kubernetesBen Hall
 
Dockercon2015_paypal
Dockercon2015_paypalDockercon2015_paypal
Dockercon2015_paypalahunnargikar
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMEconfluent
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosBrian Benz
 
TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3TranscendComputing
 
Best Practice SharePoint Architecture
Best Practice SharePoint ArchitectureBest Practice SharePoint Architecture
Best Practice SharePoint ArchitectureMichael Noel
 
Kubernetes for Docker Developers
Kubernetes for Docker DevelopersKubernetes for Docker Developers
Kubernetes for Docker DevelopersRed Hat Developers
 
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 PreviewCloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 PreviewChip Childers
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerDavinder Kohli
 
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on AzureGlobal Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on AzureKarim Vaes
 
SQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxSQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxQuyVo27
 
Enabling Microservices Frameworks to Solve Business Problems
Enabling Microservices Frameworks to Solve  Business ProblemsEnabling Microservices Frameworks to Solve  Business Problems
Enabling Microservices Frameworks to Solve Business ProblemsKen Owens
 

Similar to SQL Server 2019 Big Data Clusters Architecture (20)

The roadmap for sql server 2019
The roadmap for sql server 2019The roadmap for sql server 2019
The roadmap for sql server 2019
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk Dragonflow Austin Summit Talk
Dragonflow Austin Summit Talk
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and docker
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 
Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...Migrate or modernize your database applications using Azure SQL Database Mana...
Migrate or modernize your database applications using Azure SQL Database Mana...
 
Deploying windows containers with kubernetes
Deploying windows containers with kubernetesDeploying windows containers with kubernetes
Deploying windows containers with kubernetes
 
Dockercon2015_paypal
Dockercon2015_paypalDockercon2015_paypal
Dockercon2015_paypal
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment Scenarios
 
TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3TopStack Product Architecture 2013-Q3
TopStack Product Architecture 2013-Q3
 
Best Practice SharePoint Architecture
Best Practice SharePoint ArchitectureBest Practice SharePoint Architecture
Best Practice SharePoint Architecture
 
Kubernetes for Docker Developers
Kubernetes for Docker DevelopersKubernetes for Docker Developers
Kubernetes for Docker Developers
 
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 PreviewCloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
CloudStack DC Meetup - Apache CloudStack Overview and 4.1/4.2 Preview
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, Docker
 
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on AzureGlobal Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
Global Azure Bootcamp 2017 - Why I love S2D for MSSQL on Azure
 
SQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptxSQL Server 2019 Modern Data Platform.pptx
SQL Server 2019 Modern Data Platform.pptx
 
Enabling Microservices Frameworks to Solve Business Problems
Enabling Microservices Frameworks to Solve  Business ProblemsEnabling Microservices Frameworks to Solve  Business Problems
Enabling Microservices Frameworks to Solve Business Problems
 

More from Ivan Donev

Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19Ivan Donev
 
Tips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and RestoreTips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and RestoreIvan Donev
 
Get the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMsGet the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMsIvan Donev
 
Develop your database with Visual Studio
Develop your database with Visual StudioDevelop your database with Visual Studio
Develop your database with Visual StudioIvan Donev
 
Windows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMsWindows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMsIvan Donev
 
Building your first AS solution
Building your first AS solutionBuilding your first AS solution
Building your first AS solutionIvan Donev
 
Sql server consolidation and virtualization
Sql server consolidation and virtualizationSql server consolidation and virtualization
Sql server consolidation and virtualizationIvan Donev
 
Self-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerViewSelf-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerViewIvan Donev
 
Is "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database worldIs "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database worldIvan Donev
 

More from Ivan Donev (9)

Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19Power bi - enterprise cloud reporting platform Azure Bootcamp 19
Power bi - enterprise cloud reporting platform Azure Bootcamp 19
 
Tips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and RestoreTips and tricks to optimiza SQL Server Backup and Restore
Tips and tricks to optimiza SQL Server Backup and Restore
 
Get the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMsGet the most out of your Windows Azure VMs
Get the most out of your Windows Azure VMs
 
Develop your database with Visual Studio
Develop your database with Visual StudioDevelop your database with Visual Studio
Develop your database with Visual Studio
 
Windows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMsWindows Azure Bootcamp - Microsoft BI in Azure VMs
Windows Azure Bootcamp - Microsoft BI in Azure VMs
 
Building your first AS solution
Building your first AS solutionBuilding your first AS solution
Building your first AS solution
 
Sql server consolidation and virtualization
Sql server consolidation and virtualizationSql server consolidation and virtualization
Sql server consolidation and virtualization
 
Self-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerViewSelf-service BI with PowerPivot and PowerView
Self-service BI with PowerPivot and PowerView
 
Is "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database worldIs "the bigger the beter" valid in the database world
Is "the bigger the beter" valid in the database world
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
SQL Server 2019 Big Data Clusters Architecture

  • 1. Powered by SQL Server 2019 Big Data Clusters Rozalina Zaharieva & Dimitar Zahariev
  • 2. SQL Server Big Data Cluster Layout
      [Diagram: cluster layout on Kubernetes nodes with persistent storage; each component runs in a Kubernetes pod. Control plane: Controller. SQL Server master instance. Compute plane: compute pools of SQL compute nodes. Storage plane: a data pool of SQL data nodes with storage, and a storage pool of HDFS data nodes running Spark and SQL Server, read directly from HDFS. Inputs: IoT data and external data sources. Consumers: analytics, custom apps, BI.]
  • 3. Architecture dissection
      - Kubernetes (K8s) concepts
      - SQL Server 2019 big data cluster (BDC) components
  • 5. What is Kubernetes and what does it do?
      - Kubernetes is a container orchestrator. It is responsible for:
          - Running a cluster of hosts
          - Scheduling containers to run on different hosts
          - Facilitating communication between the containers
          - Providing and controlling access to and from the outside world
          - Tracking and optimizing resource usage
      - Similar solutions: Docker Swarm, Mesos Marathon, Amazon ECS, HashiCorp Nomad
  • 6. K8s architecture overview
      [Diagram: three master nodes, each with Scheduler, Controller, api-server, and Key-Value Store; worker nodes Node1 to NodeK, each running kube-proxy, Kubelet, and a set of pods]
  • 7. Master Nodes
      - Responsible for managing the cluster
      - Typically more than one is installed
      - In HA mode, one master node is the leader
      - Can be reached via the CLI (kubectl), APIs, or the Dashboard
      [Diagram: three master nodes, each with Scheduler, Controller, api-server, and Key-Value Store. Callouts: schedules the work on different nodes; takes care of (1) control loops and (2) desired state; performs (1) administrative tasks and (2) stores cluster state; etcd is used and can be (1) part of the master or (2) installed externally.]
  • 8. (Worker) Nodes
      - Initially called Minions
      - Container runtime
          - containerd, rkt, lxd
      - Kubelet
          - Communicates with the master
          - Uses CRI shims
      - kube-proxy
          - Network proxy
      [Diagram: node running kube-proxy, Kubelet, a container runtime, Pod 1, and Pod 2]
  • 9. Pods (1)
      - Smallest unit of scheduling
      - Contains one or more containers
      - Containers share the pod environment
      - Scheduled on nodes
      - Created via manifest files
      [Diagram: pod with a main container and supporting containers sharing one environment (net, mount, ...)]
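The slide notes that pods are created via manifest files. As a minimal sketch, a manifest for a pod with one main and one supporting container might be built as below. Kubernetes accepts manifests in JSON as well as YAML, so the standard library is enough here. The helper `build_pod_manifest`, the pod name, the labels, and the container images are hypothetical placeholders, not taken from the deck.

```python
import json


def build_pod_manifest(name, labels, containers):
    """Return a Pod manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Every container listed here shares the pod's environment.
            "containers": [
                {"name": c_name, "image": image} for c_name, image in containers
            ]
        },
    }


# Assumed example values: one main container plus one supporting container.
manifest = build_pod_manifest(
    name="demo-pod",
    labels={"app": "myapp", "version": "v01"},
    containers=[("main", "nginx:1.25"), ("log-shipper", "busybox:1.36")],
)

print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and passed to `kubectl apply -f`.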
  • 10. Pods (2)
      - Each pod has a unique IP address
      - Inter-pod communication is via a pod network
      - Intra-pod communication is via localhost and port
      [Diagram: Pod 1 (10.10.20.20) and Pod 2 (10.10.20.21) connected by the pod network; containers inside a pod communicate over localhost]
  • 11. Replication Controllers
      - Higher-level workload
      - Looks after a pod or set of pods
      - Scales pods up and down
      - Sets the desired state
      [Diagram: Replication Controller wrapping a Pod]
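The "desired state" idea on this slide can be illustrated with a toy control loop: compare the desired replica count with the pods actually running and emit the actions that close the gap. This is only a sketch of the concept, not how Kubernetes implements it; the function `reconcile` and the `pod-N` naming scheme are assumptions made for the example.

```python
import itertools


def reconcile(desired_replicas, running_pods):
    """Return the actions that move the current state to the desired state."""
    running = list(running_pods)
    diff = desired_replicas - len(running)
    if diff > 0:
        # Scale up: pick names that are not already in use (assumed scheme).
        taken = set(running)
        fresh = (f"pod-{i}" for i in itertools.count() if f"pod-{i}" not in taken)
        return [("create", next(fresh)) for _ in range(diff)]
    if diff < 0:
        # Scale down: remove the surplus pods from the end of the list.
        return [("delete", name) for name in running[diff:]]
    return []  # Already at the desired state.


pods = ["pod-0"]
for action, name in reconcile(3, pods):
    if action == "create":
        pods.append(name)
    else:
        pods.remove(name)
print(pods)  # ['pod-0', 'pod-1', 'pod-2']
```

Running `reconcile` again on the updated list returns no actions, which is the point of a control loop: it keeps acting only while the observed state differs from the desired one.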
  • 12. Deployments
      - Even higher-level workload
      - Simplifies updates and rollbacks
      - Declarative and imperative approach
      - Self-documenting
      - Suitable for versioning
      [Diagram: Deployment wrapping a ReplicaSet, which wraps a Pod]
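To make "simplifies updates and rollbacks" concrete, here is a toy model of a revision history. It is not the Kubernetes API, and the real system tracks considerably more state. The class `ToyDeployment` and the image tags are hypothetical.

```python
class ToyDeployment:
    """Toy model of a deployment's revision history (not the Kubernetes API)."""

    def __init__(self, image):
        self.history = [image]  # Oldest revision first.

    @property
    def current(self):
        return self.history[-1]

    def update(self, image):
        """Roll out a new image; a changed image is recorded as a new revision."""
        if image != self.current:
            self.history.append(image)

    def rollback(self):
        """Return to the previous revision, if there is one."""
        if len(self.history) > 1:
            self.history.pop()
        return self.current


deployment = ToyDeployment("myapp:v01")
deployment.update("myapp:v02")
print(deployment.current)     # myapp:v02
print(deployment.rollback())  # myapp:v01
```

Because every change is recorded as a revision, undoing a bad update means selecting an earlier entry rather than rebuilding the previous configuration by hand.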
  • 13. Services (1)
      - Provide a reliable network endpoint
          - IP address
          - DNS name
          - Port
      - Expose pods to the outside world
          - NodePort (cluster-wide port)
          - LoadBalancer (cloud-based)
      - Use an Endpoints object to track pods
      [Diagram: Service (IP = 10.10.10.1, DNS = demo-svc, Port = 32000) with an Endpoints list (Pod A IP, Pod B IP, ...) pointing at Pod A (10.10.20.21) on Node 1 and Pod B (10.10.20.22) on Node 2]
  • 14. Services (2)
      - Services use label selectors to do their magic
      [Diagram: Service with selector version=v01, app=myapp matching two pods labeled version=v01, app=myapp]
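The label-selector "magic" is equality matching: a pod belongs to a service when every key/value pair in the selector also appears in the pod's labels. A minimal sketch follows, using the labels from this slide and the pod IPs from the previous one. The function `matches` and the third pod (`pod-c`, version v02) are assumptions added for contrast.

```python
def matches(selector, labels):
    """True when every selector key/value pair is present in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


pods = [
    {"name": "pod-a", "ip": "10.10.20.21", "labels": {"app": "myapp", "version": "v01"}},
    {"name": "pod-b", "ip": "10.10.20.22", "labels": {"app": "myapp", "version": "v01"}},
    # Hypothetical pod that carries a different version label.
    {"name": "pod-c", "ip": "10.10.20.23", "labels": {"app": "myapp", "version": "v02"}},
]

selector = {"app": "myapp", "version": "v01"}

# The IPs of the matching pods are what the service would route to.
endpoints = [pod["ip"] for pod in pods if matches(selector, pod["labels"])]
print(endpoints)  # ['10.10.20.21', '10.10.20.22']
```

Relabeling `pod-c` to `version=v01` would add it to the endpoint list with no change to the service itself.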
  • 18. SQL Server 2019 big data cluster (BDC) components
  • 20. Base node configuration
      Applies to nodes across all planes.
      - Services:
          - kubelet: K8s local agent
          - kube-proxy: network configuration and forwarding
          - supervisord: process monitor and control
          - fluentd: node logging
          - flanneld: software-defined network
          - collectd: OS and application data collection
      - SQL Big Data watchdog: config sync, watchdog, data collector (DMVs, etc.)
      [Diagram: Kubernetes node running watchdog, kubelet, kube-proxy, supervisord, fluentd, flanneld, and collectd]
  • 21. Control plane
      - External endpoints:
          - Kubernetes (REST)
          - Aris Control Service (REST)
          - Knox Gateway (REST gateway for Hadoop APIs)
          - SQL Server Master (TDS gateway for data marts and SQL Master Service)
      - Services:
          - etcd
          - Kubernetes Master Services Controller
          - SQL Master instance
          - SQL Big Data Admin Portal
          - Knox Gateway
          - HDFS Name Service
          - YARN Master
          - Hive Metastore
          - InfluxDB (metrics store)
          - Livy (REST interface for Spark)
          - Spark Driver
      [Diagram: three Kubernetes nodes, each running base node services plus etcd. Node 1 adds K8s Master service, Spark driver, SQL Big Data Admin portal, InfluxDB, Grafana. Node 2 adds Controller, Proxy, SQL Master, HDFS Name Node, Kibana. Node 3 adds Livy, Knox, Elasticsearch, Hive Metastore, YARN Master.]
  • 22. Controller
      - External REST/HTTPS endpoint
      - Bootstrap and build-out
      - Manage capacity
      - Configure high availability and recover from failure (AGs)
      - Security (authN, authZ, certificate rotation)
      - Lifecycle (upgrade/downgrade/rollback)
      - Configuration management
      - Monitoring: capacity, health, metrics, logs
      - Troubleshooting: performance, failures
      - Cluster Admin Portal
      [Diagram: Controller service with Buildout, Upgrade/Rollback, Add/Remove capacity, Central AuthZ/AuthN, Cluster Admin Portal, Troubleshooting, and Controller Metadata]
  • 23. SQL Master Instance
      - TDS endpoint into the cluster
      - High-value data
      - OLTP server
      - Data connectors
      - Machine learning & extensibility
      - Scalable query engine
      [Diagram: master instance availability group with one primary and two readable secondaries]
  • 24. Compute plane
      - Hosts one or more SQL compute pools
      - A compute pool is a group of instances that forms a data, security, and resource boundary.
      - A compute pool processes complex distributed queries against the data plane.
      - Local storage is used for shuffling data if necessary.
      [Diagram: four compute pool nodes, each with base node services and a SQL Engine]
  • 25. Data plane
      - Storage pool:
          - Data ingestion through Spark (batch and streaming)
          - Data storage in HDFS
          - Data access through HDFS and SQL endpoints; the SQL engine reads files in HDFS directly
      - Data pool:
          - Partitioned, in-memory cache for external data
          - Scale-out data storage for append-only data sets
          - Data ingestion through Spark
          - Provides persistent SQL Server storage for the cluster
      [Diagram: storage pool nodes, each with base node services, SQL Engine, HDFS, and Spark; data pool node with base node services and SQL Engine]
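The data pool is described as partitioned, scale-out storage. The deck does not say how rows are placed, so the sketch below assumes a simple round-robin distribution across data pool nodes and then combines per-node partial results, in the spirit of a distributed query. The function names and the sample rows are hypothetical.

```python
def distribute_round_robin(rows, node_count):
    """Spread rows across node_count shards in round-robin order (assumed policy)."""
    shards = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        shards[index % node_count].append(row)
    return shards


def distributed_sum(shards):
    """Each node sums its own shard; the partial sums are then combined."""
    partial_sums = [sum(shard) for shard in shards]
    return sum(partial_sums)


readings = [10, 20, 30, 40, 50]  # Hypothetical append-only data set.
shards = distribute_round_robin(readings, node_count=2)
print(shards)                   # [[10, 30, 50], [20, 40]]
print(distributed_sum(shards))  # 150
```

Appending rows only ever adds to the end of a shard, which is one reason this layout fits append-only data sets well.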
  • 26. Installation, configurations and tools
      - Installation methods:
          - Cloud: a platform such as Azure Kubernetes Service (AKS)
          - On-premises: VMs, bare metal
          - Localhost: using minikube (for training and testing only)
      - Configurations:
          - All-in-one single node and different multi-node options
      - Tools:
          - mssqlctl, kubectl, Azure Data Studio, SQL Server 2019 extension, Azure CLI (for AKS), mssql-cli, sqlcmd, curl