SlideShare une entreprise Scribd logo
1  sur  19
What is AWS Glue?
AWS Glue is a serverless data integration service that makes it easier to discover,
prepare, move, and integrate data from multiple sources for analytics, machine
learning (ML), and application development.
AWS Glue is an event-driven, serverless computing platform provided
by Amazon as a part of Amazon Web Services. It is a computing service that runs
code in response to events and automatically manages the computing resources
required by that code. It was introduced in August 2017.
The primary purpose of Glue is to scan other services in the same Virtual Private
Cloud (or equivalent accessible network element even if not provided by AWS),
particularly S3. The jobs are billed according to compute time, with a minimum
count of 1 minute. Glue discovers the source data to store associated meta-data
(e.g. the table's schema of field names, types lengths) in the AWS Glue Data
Catalog (which is then accessible via AWS console or APIs).
What is ETL?
Extract — The script will read all the usage data from the S3 bucket to a single data
frame (you can think of a data frame in Pandas)
Transform — Let’s say that the original data contains 10 different logs per second on
average. The analytics team wants the data to be aggregated per each 1 minute with
a specific logic.
Load — Write the processed data back to another S3 bucket for the analytics team.
How AWS Glue work?
AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives.
For example, you can configure AWS Glue to initiate your ETL jobs to run as soon as
new data becomes available in Amazon Simple Storage Service (S3).
How AWS Glue work?
Choose your preferred data integration engine in AWS Glue to support your users
and workloads.
How AWS Glue work?
You can use the Data Catalog to quickly discover and search multiple AWS datasets
without moving the data. Once the data is cataloged, it is immediately available for
search and query using Amazon Athena, Amazon EMR, and Amazon Redshift
Spectrum.
How AWS Glue work?
AWS Glue Studio makes it easier to visually create, run, and monitor AWS Glue ETL
jobs. You can build ETL jobs that move and transform data using a drag-and-drop
editor, and AWS Glue automatically generates the code.
Use case where AWS Glue fits?
Simplify ETL pipeline development
Remove infrastructure management with automatic provisioning and worker
management, and consolidate all your data integration needs into a single service.
Discover data efficiently
Quickly identify data across multiple AWS datasets, and then make it instantly
available for querying and transforming.
Interactively explore, experiment on, and process data
Using AWS Glue interactive sessions, data engineers can interactively explore and
prepare data using the integrated development environment (IDE) or notebook of
their choice.
Support various processing frameworks and workloads
More easily support various data processing frameworks, such as ETL and ELT, and
various workloads, including batch, micro-batch, and streaming.
What is AWS Glue Studio?
AWS Glue Studio is a graphical interface that makes it easy to create, run, and
monitor data integration jobs in AWS Glue. You can visually compose data
transformation workflows and seamlessly run them on the Apache Spark–based
serverless ETL engine in AWS Glue. For more information, see What is AWS Glue
Studio.
With AWS Glue Studio, you can create and manage jobs that gather, transform, and
clean data. You can also use AWS Glue Studio to troubleshoot and edit job scripts.
AWS Glue features
AWS Glue features fall into three major categories:
•Discover and organize data
•Transform, prepare, and clean data for analysis
•Build and monitor data pipelines
How to Access AWS Glue ?
You can create, view, and manage your AWS Glue jobs using the following interfaces:
•AWS Glue console – Provides a web interface for you to create, view, and manage
your AWS Glue jobs.
•AWS Glue Studio – Provides a graphical interface for you to create and edit your
AWS Glue jobs visually.
•AWS Glue section of the AWS CLI Reference – Provides AWS CLI commands that
you can use with AWS Glue.
•AWS Glue API – Provides a complete API reference for developers.
AWS Glue console
You use the AWS Glue console to define and orchestrate your ETL workflow. The
console calls several API operations in the AWS Glue Data Catalog and AWS Glue
Jobs system to perform the following tasks:
•Define AWS Glue objects such as jobs, tables, crawlers, and connections.
•Schedule when crawlers run.
•Define events or schedules for job triggers.
•Search and filter lists of AWS Glue objects.
•Edit transformation scripts.
AWS Glue
Studio
AWS Glue Studio is a new
graphical interface that makes it
easy to create, run, and monitor
extract, transform, and load
(ETL) jobs in AWS Glue. You
can visually compose data
transformation workflows and
seamlessly run them on AWS
Glue’s Apache Spark-based
serverless ETL engine.
Streaming ETL in AWS Glue
AWS Glue enables you to perform ETL operations on streaming data using
continuously-running jobs. AWS Glue streaming ETL is built on the Apache Spark
Structured Streaming engine, and can ingest streams from Amazon Kinesis Data
Streams, Apache Kafka, and Amazon Managed Streaming for Apache Kafka
(Amazon MSK). Streaming ETL can clean and transform streaming data and load it
into Amazon S3 or JDBC data stores. Use Streaming ETL in AWS Glue to process
event data like IoT streams, clickstreams, and network logs.
The AWS Glue jobs system
The AWS Glue Jobs system provides managed infrastructure to orchestrate your ETL
workflow. You can create jobs in AWS Glue that automate the scripts you use to
extract, transform, and transfer data to different locations. Jobs can be scheduled and
chained, or they can be triggered by events such as the arrival of new data.
Serverless ETL jobs run in isolation
AWS Glue runs your ETL jobs in an Apache Spark serverless environment. AWS
Glue runs these jobs on virtual resources that it provisions and manages in its own
service account.
AWS Glue is designed to do the following:
•Segregate customer data.
•Protect customer data in transit and at rest.
•Access customer data only as needed in response to customer requests, using
temporary, scoped-down credentials, or with a customer's consent to IAM roles in
their account.
Data sources and destinations
AWS Glue allows you to read and write data from multiple systems and databases
including:
•Amazon S3
•Amazon DynamoDB
•Amazon Redshift
•Amazon Relational Database Service (Amazon RDS)
•Third-party JDBC-accessible databases
•MongoDB and Amazon DocumentDB (with MongoDB compatibility)
•Other marketplace connectors and Apache Spark plugins
Data streams
AWS Glue can stream data from the following systems:
•Amazon Kinesis Data Streams
•Apache Kafka
AWS Glue is available in several AWS Regions.
Components of AWS Glue
•Data catalog: The data catalog holds the metadata and the structure of the data.
•Database: It is used to create or access the database for the sources and targets.
•Table: Create one or more tables in the database that can be used by the source
and target.
•Crawler and Classifier: A crawler is used to retrieve data from the source using
built-in or custom classifiers. It creates/uses metadata tables that are pre-defined in
the data catalog.
•Job: A job is business logic that carries out an ETL task. Internally, Apache Spark
with python or scala language writes this business logic.
•Trigger: A trigger starts the ETL job execution on-demand or at a specific time.
•Development endpoint: It creates a development environment where the ETL job
script can be tested, developed, and debugged.
Summary
AWS Glue makes it easy to integrate data across your architecture. It integrates with
AWS analytics services and Amazon S3 data lakes. AWS Glue has integration
interfaces and job-authoring tools that are easy to use for all users, from developers
to business users, with tailored solutions for varied technical skill sets.
With the ability to scale on demand, AWS Glue helps you focus on high-value
activities that maximize the value of your data. It scales for any data size, and
supports all data types and schema variances. To increase agility and optimize costs,
AWS Glue provides built-in high availability and pay-as-you-go billing.
AWS Glue consolidates major data integration capabilities into a single service.
These include data discovery, modern ETL, cleansing, transforming, and centralized
cataloging. It's also serverless, which means there's no infrastructure to manage.
With flexible support for all workloads like ETL, ELT, and streaming in one service,
AWS Glue supports users across various workloads and types of users.
THANK YOU
Like the Video and Subscribe the Channel

Contenu connexe

Tendances

Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Building Data Pipelines on AWS
Building Data Pipelines on AWSBuilding Data Pipelines on AWS
Building Data Pipelines on AWSrudolf eremyan
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
Serverless data and analytics on AWS for operations
Serverless data and analytics on AWS for operations Serverless data and analytics on AWS for operations
Serverless data and analytics on AWS for operations CloudHesive
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksLace Lofranco
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Web Services
 
Transparency and Control with AWS CloudTrail and AWS Config
Transparency and Control with AWS CloudTrail and AWS ConfigTransparency and Control with AWS CloudTrail and AWS Config
Transparency and Control with AWS CloudTrail and AWS ConfigAmazon Web Services
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL PipelinesAmazon Web Services
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data FactoryHARIHARAN R
 
Amazon Redshift의 이해와 활용 (김용우) - AWS DB Day
Amazon Redshift의 이해와 활용 (김용우) - AWS DB DayAmazon Redshift의 이해와 활용 (김용우) - AWS DB Day
Amazon Redshift의 이해와 활용 (김용우) - AWS DB DayAmazon Web Services Korea
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 

Tendances (20)

Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Building Data Pipelines on AWS
Building Data Pipelines on AWSBuilding Data Pipelines on AWS
Building Data Pipelines on AWS
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Serverless data and analytics on AWS for operations
Serverless data and analytics on AWS for operations Serverless data and analytics on AWS for operations
Serverless data and analytics on AWS for operations
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure Databricks
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
 
Transparency and Control with AWS CloudTrail and AWS Config
Transparency and Control with AWS CloudTrail and AWS ConfigTransparency and Control with AWS CloudTrail and AWS Config
Transparency and Control with AWS CloudTrail and AWS Config
 
AWS Tagging Strategy
AWS Tagging StrategyAWS Tagging Strategy
AWS Tagging Strategy
 
Building Serverless ETL Pipelines
Building Serverless ETL PipelinesBuilding Serverless ETL Pipelines
Building Serverless ETL Pipelines
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Amazon Redshift의 이해와 활용 (김용우) - AWS DB Day
Amazon Redshift의 이해와 활용 (김용우) - AWS DB DayAmazon Redshift의 이해와 활용 (김용우) - AWS DB Day
Amazon Redshift의 이해와 활용 (김용우) - AWS DB Day
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 

Similaire à What is AWS Glue

Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech TalksTackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech TalksAmazon Web Services
 
AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18Neal Davis
 
Serverless Data Lake on AWS
Serverless Data Lake on AWSServerless Data Lake on AWS
Serverless Data Lake on AWSThanh Nguyen
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the CloudRoss McNeely
 
Serverless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data AnalyticsServerless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data AnalyticsKristana Kane
 
AWS Data Engineer Training | AWS Data Engineering Online Training
AWS Data Engineer Training | AWS Data Engineering Online TrainingAWS Data Engineer Training | AWS Data Engineering Online Training
AWS Data Engineer Training | AWS Data Engineering Online Trainingakhilariga
 
Deep dive into cloud security - Jaimin Gohel & Virendra Rathore
Deep dive into cloud security - Jaimin Gohel & Virendra RathoreDeep dive into cloud security - Jaimin Gohel & Virendra Rathore
Deep dive into cloud security - Jaimin Gohel & Virendra RathoreNSConclave
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
 
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)Ankit Rathi
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
AWS Cloud Practitioner.PDF
AWS Cloud Practitioner.PDFAWS Cloud Practitioner.PDF
AWS Cloud Practitioner.PDFssuser82123d
 
Azure fundamental -Introduction
Azure fundamental -IntroductionAzure fundamental -Introduction
Azure fundamental -IntroductionManishK55
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Big data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsBig data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsMarek Kuczynski
 

Similaire à What is AWS Glue (20)

Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech TalksTackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks
 
AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18
 
Reinvent recap
Reinvent recapReinvent recap
Reinvent recap
 
AWS Big Data Landscape
AWS Big Data LandscapeAWS Big Data Landscape
AWS Big Data Landscape
 
Serverless Data Lake on AWS
Serverless Data Lake on AWSServerless Data Lake on AWS
Serverless Data Lake on AWS
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
Serverless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data AnalyticsServerless Big Data Architectures: Serverless Data Analytics
Serverless Big Data Architectures: Serverless Data Analytics
 
AWS Data Engineer Training | AWS Data Engineering Online Training
AWS Data Engineer Training | AWS Data Engineering Online TrainingAWS Data Engineer Training | AWS Data Engineering Online Training
AWS Data Engineer Training | AWS Data Engineering Online Training
 
Deep dive into cloud security - Jaimin Gohel & Virendra Rathore
Deep dive into cloud security - Jaimin Gohel & Virendra RathoreDeep dive into cloud security - Jaimin Gohel & Virendra Rathore
Deep dive into cloud security - Jaimin Gohel & Virendra Rathore
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)
Data & AI Platforms — Open Source Vs Managed Services (AWS vs Azure vs GCP)
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
UCT AWS_IOT
UCT AWS_IOTUCT AWS_IOT
UCT AWS_IOT
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
AWS Cloud Practitioner.PDF
AWS Cloud Practitioner.PDFAWS Cloud Practitioner.PDF
AWS Cloud Practitioner.PDF
 
Azure fundamental -Introduction
Azure fundamental -IntroductionAzure fundamental -Introduction
Azure fundamental -Introduction
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
UNIT -IV.docx
UNIT -IV.docxUNIT -IV.docx
UNIT -IV.docx
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Big data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsBig data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The Netherlands
 

Plus de jeetendra mandal

Eventual consistency vs Strong consistency what is the difference
Eventual consistency vs Strong consistency what is the differenceEventual consistency vs Strong consistency what is the difference
Eventual consistency vs Strong consistency what is the differencejeetendra mandal
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Differencejeetendra mandal
 
Difference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data LakeDifference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data Lakejeetendra mandal
 
Difference between Client Polling vs Server Push vs Websocket vs Long Polling
Difference between Client Polling vs Server Push vs Websocket vs Long PollingDifference between Client Polling vs Server Push vs Websocket vs Long Polling
Difference between Client Polling vs Server Push vs Websocket vs Long Pollingjeetendra mandal
 
Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...
Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...
Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...jeetendra mandal
 
Difference Program vs Process vs Thread
Difference Program vs Process vs ThreadDifference Program vs Process vs Thread
Difference Program vs Process vs Threadjeetendra mandal
 
Carrier Advice for a JAVA Developer How to Become a Java Programmer
Carrier Advice for a JAVA Developer How to Become a Java ProgrammerCarrier Advice for a JAVA Developer How to Become a Java Programmer
Carrier Advice for a JAVA Developer How to Become a Java Programmerjeetendra mandal
 
How to become a Software Tester Carrier Path for Software Quality Tester
How to become a Software Tester Carrier Path for Software Quality TesterHow to become a Software Tester Carrier Path for Software Quality Tester
How to become a Software Tester Carrier Path for Software Quality Testerjeetendra mandal
 
How to become a Software Engineer Carrier Path for Software Developer
How to become a Software Engineer Carrier Path for Software DeveloperHow to become a Software Engineer Carrier Path for Software Developer
How to become a Software Engineer Carrier Path for Software Developerjeetendra mandal
 
Microservice Architecture Software Architecture Microservice Design Pattern
Microservice Architecture Software Architecture Microservice Design PatternMicroservice Architecture Software Architecture Microservice Design Pattern
Microservice Architecture Software Architecture Microservice Design Patternjeetendra mandal
 
Event Driven Software Architecture Pattern
Event Driven Software Architecture PatternEvent Driven Software Architecture Pattern
Event Driven Software Architecture Patternjeetendra mandal
 
Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...
Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...
Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...jeetendra mandal
 
Observability vs APM vs Monitoring Comparison
Observability vs APM vs  Monitoring ComparisonObservability vs APM vs  Monitoring Comparison
Observability vs APM vs Monitoring Comparisonjeetendra mandal
 
Disaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the differenceDisaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the differencejeetendra mandal
 
What is Spinnaker? Spinnaker tutorial
What is Spinnaker? Spinnaker tutorialWhat is Spinnaker? Spinnaker tutorial
What is Spinnaker? Spinnaker tutorialjeetendra mandal
 
Difference between Github vs Gitlab vs Bitbucket
Difference between Github vs Gitlab vs BitbucketDifference between Github vs Gitlab vs Bitbucket
Difference between Github vs Gitlab vs Bitbucketjeetendra mandal
 

Plus de jeetendra mandal (20)

what is OSI model
what is OSI modelwhat is OSI model
what is OSI model
 
What is AWS Cloud Watch
What is AWS Cloud WatchWhat is AWS Cloud Watch
What is AWS Cloud Watch
 
What is AWS Fargate
What is AWS FargateWhat is AWS Fargate
What is AWS Fargate
 
Eventual consistency vs Strong consistency what is the difference
Eventual consistency vs Strong consistency what is the differenceEventual consistency vs Strong consistency what is the difference
Eventual consistency vs Strong consistency what is the difference
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Difference
 
Difference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data LakeDifference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data Lake
 
Difference between Client Polling vs Server Push vs Websocket vs Long Polling
Difference between Client Polling vs Server Push vs Websocket vs Long PollingDifference between Client Polling vs Server Push vs Websocket vs Long Polling
Difference between Client Polling vs Server Push vs Websocket vs Long Polling
 
Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...
Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...
Difference between TLS 1.2 vs TLS 1.3 and tutorial of TLS2 and TLS2 version c...
 
Difference Program vs Process vs Thread
Difference Program vs Process vs ThreadDifference Program vs Process vs Thread
Difference Program vs Process vs Thread
 
Carrier Advice for a JAVA Developer How to Become a Java Programmer
Carrier Advice for a JAVA Developer How to Become a Java ProgrammerCarrier Advice for a JAVA Developer How to Become a Java Programmer
Carrier Advice for a JAVA Developer How to Become a Java Programmer
 
How to become a Software Tester Carrier Path for Software Quality Tester
How to become a Software Tester Carrier Path for Software Quality TesterHow to become a Software Tester Carrier Path for Software Quality Tester
How to become a Software Tester Carrier Path for Software Quality Tester
 
How to become a Software Engineer Carrier Path for Software Developer
How to become a Software Engineer Carrier Path for Software DeveloperHow to become a Software Engineer Carrier Path for Software Developer
How to become a Software Engineer Carrier Path for Software Developer
 
Events vs Notifications
Events vs NotificationsEvents vs Notifications
Events vs Notifications
 
Microservice Architecture Software Architecture Microservice Design Pattern
Microservice Architecture Software Architecture Microservice Design PatternMicroservice Architecture Software Architecture Microservice Design Pattern
Microservice Architecture Software Architecture Microservice Design Pattern
 
Event Driven Software Architecture Pattern
Event Driven Software Architecture PatternEvent Driven Software Architecture Pattern
Event Driven Software Architecture Pattern
 
Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...
Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...
Top 5 Software Architecture Pattern Event Driven SOA Microservice Serverless ...
 
Observability vs APM vs Monitoring Comparison
Observability vs APM vs  Monitoring ComparisonObservability vs APM vs  Monitoring Comparison
Observability vs APM vs Monitoring Comparison
 
Disaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the differenceDisaster Recovery vs Data Backup what is the difference
Disaster Recovery vs Data Backup what is the difference
 
What is Spinnaker? Spinnaker tutorial
What is Spinnaker? Spinnaker tutorialWhat is Spinnaker? Spinnaker tutorial
What is Spinnaker? Spinnaker tutorial
 
Difference between Github vs Gitlab vs Bitbucket
Difference between Github vs Gitlab vs BitbucketDifference between Github vs Gitlab vs Bitbucket
Difference between Github vs Gitlab vs Bitbucket
 

Dernier

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 

Dernier (20)

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 

What is AWS Glue

  • 1.
  • 2. What is AWS Glue? AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. AWS Glue is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. It was introduced in August 2017. The primary purpose of Glue is to scan other services in the same Virtual Private Cloud (or equivalent accessible network element even if not provided by AWS), particularly S3. The jobs are billed according to compute time, with a minimum count of 1 minute. Glue discovers the source data to store associated meta-data (e.g. the table's schema of field names, types lengths) in the AWS Glue Data Catalog (which is then accessible via AWS console or APIs).
  • 3. What is ETL? Extract — The script will read all the usage data from the S3 bucket to a single data frame (you can think of a data frame in Pandas) Transform — Let’s say that the original data contains 10 different logs per second on average. The analytics team wants the data to be aggregated per each 1 minute with a specific logic. Load — Write the processed data back to another S3 bucket for the analytics team.
  • 4. How AWS Glue work? AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives. For example, you can configure AWS Glue to initiate your ETL jobs to run as soon as new data becomes available in Amazon Simple Storage Service (S3).
  • 5. How AWS Glue work? Choose your preferred data integration engine in AWS Glue to support your users and workloads.
  • 6. How AWS Glue work? You can use the Data Catalog to quickly discover and search multiple AWS datasets without moving the data. Once the data is cataloged, it is immediately available for search and query using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
  • 7. How AWS Glue work? AWS Glue Studio makes it easier to visually create, run, and monitor AWS Glue ETL jobs. You can build ETL jobs that move and transform data using a drag-and-drop editor, and AWS Glue automatically generates the code.
  • 8. Use case where AWS Glue fits? Simplify ETL pipeline development Remove infrastructure management with automatic provisioning and worker management, and consolidate all your data integration needs into a single service. Discover data efficiently Quickly identify data across multiple AWS datasets, and then make it instantly available for querying and transforming. Interactively explore, experiment on, and process data Using AWS Glue interactive sessions, data engineers can interactively explore and prepare data using the integrated development environment (IDE) or notebook of their choice. Support various processing frameworks and workloads More easily support various data processing frameworks, such as ETL and ELT, and various workloads, including batch, micro-batch, and streaming.
  • 9. What is AWS Glue Studio? AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor data integration jobs in AWS Glue. You can visually compose data transformation workflows and seamlessly run them on the Apache Spark–based serverless ETL engine in AWS Glue. For more information, see What is AWS Glue Studio. With AWS Glue Studio, you can create and manage jobs that gather, transform, and clean data. You can also use AWS Glue Studio to troubleshoot and edit job scripts. AWS Glue features AWS Glue features fall into three major categories: •Discover and organize data •Transform, prepare, and clean data for analysis •Build and monitor data pipelines
  • 10. How to Access AWS Glue ? You can create, view, and manage your AWS Glue jobs using the following interfaces: •AWS Glue console – Provides a web interface for you to create, view, and manage your AWS Glue jobs. •AWS Glue Studio – Provides a graphical interface for you to create and edit your AWS Glue jobs visually. •AWS Glue section of the AWS CLI Reference – Provides AWS CLI commands that you can use with AWS Glue. •AWS Glue API – Provides a complete API reference for developers.
  • 11. AWS Glue console You use the AWS Glue console to define and orchestrate your ETL workflow. The console calls several API operations in the AWS Glue Data Catalog and AWS Glue Jobs system to perform the following tasks: •Define AWS Glue objects such as jobs, tables, crawlers, and connections. •Schedule when crawlers run. •Define events or schedules for job triggers. •Search and filter lists of AWS Glue objects. •Edit transformation scripts.
  • 12. AWS Glue Studio AWS Glue Studio is a new graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. You can visually compose data transformation workflows and seamlessly run them on AWS Glue’s Apache Spark-based serverless ETL engine.
  • 13. Streaming ETL in AWS Glue AWS Glue enables you to perform ETL operations on streaming data using continuously-running jobs. AWS Glue streaming ETL is built on the Apache Spark Structured Streaming engine, and can ingest streams from Amazon Kinesis Data Streams, Apache Kafka, and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Streaming ETL can clean and transform streaming data and load it into Amazon S3 or JDBC data stores. Use Streaming ETL in AWS Glue to process event data like IoT streams, clickstreams, and network logs.
  • 14. The AWS Glue jobs system The AWS Glue Jobs system provides managed infrastructure to orchestrate your ETL workflow. You can create jobs in AWS Glue that automate the scripts you use to extract, transform, and transfer data to different locations. Jobs can be scheduled and chained, or they can be triggered by events such as the arrival of new data.
  • 15. Serverless ETL jobs run in isolation AWS Glue runs your ETL jobs in an Apache Spark serverless environment. AWS Glue runs these jobs on virtual resources that it provisions and manages in its own service account. AWS Glue is designed to do the following: •Segregate customer data. •Protect customer data in transit and at rest. •Access customer data only as needed in response to customer requests, using temporary, scoped-down credentials, or with a customer's consent to IAM roles in their account.
  • 16. Data sources and destinations AWS Glue allows you to read and write data from multiple systems and databases including: •Amazon S3 •Amazon DynamoDB •Amazon Redshift •Amazon Relational Database Service (Amazon RDS) •Third-party JDBC-accessible databases •MongoDB and Amazon DocumentDB (with MongoDB compatibility) •Other marketplace connectors and Apache Spark plugins Data streams AWS Glue can stream data from the following systems: •Amazon Kinesis Data Streams •Apache Kafka AWS Glue is available in several AWS Regions.
  • 17. Components of AWS Glue •Data catalog: The data catalog holds the metadata and the structure of the data. •Database: It is used to create or access the database for the sources and targets. •Table: Create one or more tables in the database that can be used by the source and target. •Crawler and Classifier: A crawler is used to retrieve data from the source using built-in or custom classifiers. It creates/uses metadata tables that are pre-defined in the data catalog. •Job: A job is business logic that carries out an ETL task. Internally, Apache Spark with python or scala language writes this business logic. •Trigger: A trigger starts the ETL job execution on-demand or at a specific time. •Development endpoint: It creates a development environment where the ETL job script can be tested, developed, and debugged.
  • 18. Summary AWS Glue makes it easy to integrate data across your architecture. It integrates with AWS analytics services and Amazon S3 data lakes. AWS Glue has integration interfaces and job-authoring tools that are easy to use for all users, from developers to business users, with tailored solutions for varied technical skill sets. With the ability to scale on demand, AWS Glue helps you focus on high-value activities that maximize the value of your data. It scales for any data size, and supports all data types and schema variances. To increase agility and optimize costs, AWS Glue provides built-in high availability and pay-as-you-go billing. AWS Glue consolidates major data integration capabilities into a single service. These include data discovery, modern ETL, cleansing, transforming, and centralized cataloging. It's also serverless, which means there's no infrastructure to manage. With flexible support for all workloads like ETL, ELT, and streaming in one service, AWS Glue supports users across various workloads and types of users.
  • 19. THANK YOU Like the Video and Subscribe the Channel