SlideShare une entreprise Scribd logo
1  sur  27
Télécharger pour lire hors ligne
Automating Federal Aviation
Administration’s (FAA) System Wide
Information Management (SWIM)
Data Ingestion and Analysis
Dr. Mehdi Hashemipour, Data Scientist, Bureau of Transportation Statistics
Marcelo Zambrana, Cloud Solutions Architect, Microsoft
Sheila Stewart, Solutions Architect, Databricks
Agenda
Mehdi Hashemipour, PhD
SWIM Overview
Marcelo Zambrana
Automating Infrastructure Architecture
Sheila Stewart
SWIM Data Processing
Objectives and Benefits
Objectives:
Using FAA flight data to build a Commercial Flight Database to
validate airline data and support the BTS mandate to measure and
report aviation system performance.
Potential Benefits:
▪ Enable timely estimates of enplanements and on-time
performance
▪ Provide a point of validation for airline-submitted data
▪ Expand BTS’s analytical capabilities and breadth of reporting
▪ Support special aviation studies
▪ Provide source of data to aviation dashboards and other
statistical products
▪ Serve as the aviation component of the Transportation Disruption
and Disaster System
The System Wide Information Management (SWIM)
SWIM service provides a single interface point to multiple data
services including airport, flight, aeronautical and weather data.
STDD Stream :
Access to data from over 200 airports.
Data from over 400 individual systems.
Potential BTS Use Cases for SWIM Data
• Airport Time Delays
• Ground Stop History, Status and Impact
• On-time Performance estimate by causes
• System Passenger Loading
• Airline Data Quality Assurance Check
• OAG Replacement
Airport/Airline
Performance
• Freight Aircraft Location On Ground
• Air Cargo Patterns and Seasonality
• Multi-modal Cargo Movement
AirCargoTraffic
• Planned vs. Actual Flight Path Analysis
• Actual flight path deviations from the “norm”
• Fuel Cost and Ticket Price Correlation
• Financial Impact of Delays
EconomicImpactof
Delays/Diversions/
Cancellations
• Gate availability
• Flight pattern interruption
• Late Arriving flight pattern
• Morning Flight Delay Impact
Operational Impact of
Delays/Diversions/
Cancellations
• Re-direct diverted passengers
• Passenger Impact of Cancellations
Passenger Impact of
Delays/Diversions/
Cancellations
Data Lake
BTS Conceptual SWIM Architecture
BTS
SWIM
MSG
Service
SWIM
Data
Msg
Service
XML MSG
Processing
Economic
Impact
Weather
Impact
Ground
Movement
ITWS
Kafka
…
TFM
Flight
TFM
Flow
TBFM
FDPS
ITWS
ITWS
FAA SWIM Data Service Bureau of Transportation Statistics (BTS)
Temp Raw
XML File
Storage
Performance
Air Cargo
Traffic
DOT Virtual
Machine
FAA NW
Gateway
Mapping &
Animation
…
Data AnalyticsXML Message Handling Data Transformation
and Storage
DOT Cloud Computing Environment
Data
Analyst
Data Lake
8
Infrastructure Architecture
Initial Goals
▪ Automate as much as possible
▪ Infrastructure.
▪ Server Configuration.
▪ Databricks resources.
▪ Multiple Ingestion Sources
▪ SWIM offers multiple types of sources.
▪ On-prem data sources.
▪ Security and Scalability
▪ Internal traffic only.
▪ Multiple environments.
Infrastructure Architecture
Automating the Environment
▪ Initial Networking
▪ Security
▪ VMs
▪ Storage
▪ Databricks Workspace
▪ Software Requirements
▪ Solace Connector
▪ SWIM Access configuration
▪ Kafka Configuration
Kafka ClusterInfrastructure
▪ Cluster Creation
▪ Libraries Configuration
▪ Notebooks
▪ Secrets
Databricks Cluster
Terraform
▪ Infrastructure as a Code
▪ Helps to automate infrastructure management.
▪ Understanding infrastructure changes before they
are applied.
▪ Allows to build, change and version infrastructure.
▪ Multi-cloud
▪ Common language for different providers.
▪ Feature rich
▪ Module Registry.
▪ Providers.
▪ Workspaces.
▪ Variables.
# Project Structure
├── LICENSE
├── README.md
├── main.tf
├── networking.tf
├── outputs.tf
├── security.tf
├── storage.tf
├── variables.tf
├── vm.tf
└── workspace.tf
# Common Commands
terraform fmt
terraform init
terraform validate
terraform plan
terraform apply
Configuration Management Ansible/Chef
▪ Consistency
▪ No more snowflake servers.
▪ Version Control of all configurations.
▪ Replicated environments.
▪ Scalability
▪ Add more SWIM source configurations.
▪ Easy to deploy new environments.
▪ Documentation
▪ Building-up knowledge.
▪ Change History.
Databricks CLI
▪ Easy Interface to Databricks
Platform
▪ Open source.
▪ Built on top of Databricks REST API.
▪ Allows you to interact with: workspace, clusters, fs,
groups, jobs, runs, libraries, and secrets.
▪ Supports multiple profiles.
▪ Experimental
▪ Still under active development.
# Create Databricks Cluster
databricks clusters create --json-file config/cluster.json
# Import Libraries
databricks libraries install --cluster-id CLUSTER_ID --maven-coordinates
com.databricks:spark-xml_2.11:0.9.0
# Import Notebooks
databricks workspace import -l PYTHON -f DBC Notebooks/TFMS.dbc
/Users/USER/tfms
#Secret Management
## Create secret scope
databricks secrets create-scope --scope swim --initial-manage-principal users
## Create new secret
databricks secrets put --scope bts-swim --key bts-swim-sp --string-value my-value
databricks secrets put --scope bts-swim --key bts-swim-sp --binary-file config/SP.txt
databricks secrets put --scope bts-swim --key bts-swim-sp
GitHub – GitHub Actions
▪ Automate from code to Cloud
▪ Workflow Automation
▪ Any OS, any language, and any
cloud.
CI/CD Architecture
Lessons Learned
▪ Infrastructure and Configuration as a Code
▪ Initial setup takes time.
▪ Test, fail and improve faster.
▪ Learning curve.
▪ Version Control
▪ Easy to review changes.
▪ Helps on-boarding new developers.
▪ Security
▪ Internal Network only.
▪ Limited access.
https://github.com/Chambras/SparkSummit2020
SWIM Data Processing
Future State Architecture
SWIM Data Lake Architecture with Streaming SWIM into Databricks
21
Predictive Analysis and
Advanced Analytics
Bronze
Oracle
Sybase
Adhoc & Graph AnalysisSpark ETL
Silver Gold Summary/Platinum -
optional
Enrichment
OperationsSWIM DataLake
Tableau
Dashboards and Apps
Data Stores
Streaming
SWIM-TFMS
Azure Data Lake Storage
Batch
Raw XML data:,
Staging Batch
data
Parsed XML
data, Schema
Validation with
spark-xml
Joined and
Aggregated data
Potential Further
Aggregations
SWIM-Other topics
Streaming
Ingress Data ETL, Stream, and Store Data Build JIT Data Warehouse Analytics and BI
Streaming SWIM Data to Databricks
RUNTIME
DEMONSTRATION
Lessons Learned
▪ spark-xml is improving
▪ Need to investigate new features for mitigating
complex nested XML schemas
▪ XML Schema Validation
▪ Copying schema to executors mitigates File I/O
latencies by making use of memory for fast validation
▪ XML Schema Inference
▪ Batch processing of XML data at hourly or daily
periodicity based on SLAs mitigates allows for more
accurate inference
Next Steps
▪ Validate SWIM data against
data provided by airlines
▪ Deeper dive into predictive
modeling to gather insights on
flight delays and passengers
affected
▪ Open up data pipeline to more
SWIM data feeds
Contact Us
marcelo.zambrana@microsoft.com
@ch4mbr4s
sheila.stewart@databricks.com
m.hashemipour@dot.gov
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis

Contenu connexe

Tendances

Azure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDBAzure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDBNicholas Vossburg
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationAmazon Web Services
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®confluent
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Vectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookVectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookDatabricks
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafkaMuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafkaRoyston Lobo
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoDimko Zhluktenko
 
Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...
Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...
Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...Databricks
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 

Tendances (20)

Azure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDBAzure Database Services for MySQL PostgreSQL and MariaDB
Azure Database Services for MySQL PostgreSQL and MariaDB
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake FormationSecure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
Secure, Build and Deduplicate Your Data Lake Data with Amazon Lake Formation
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Vectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookVectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at Facebook
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafkaMuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
MuleSoft Online Meetup - MuleSoft integration with snowflake and kafka
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...
Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...
Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-So...
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 

Similaire à Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis

Oracle E-Business Suite On Oracle Cloud
Oracle E-Business Suite On Oracle CloudOracle E-Business Suite On Oracle Cloud
Oracle E-Business Suite On Oracle Cloudpasalapudi
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery ToolsAntonio Rolle
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...Amazon Web Services
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalAvere Systems
 
Working Experience_V5.0
Working Experience_V5.0Working Experience_V5.0
Working Experience_V5.0Danny Lai
 
Microsoft Azure Traffic Manager
Microsoft Azure Traffic ManagerMicrosoft Azure Traffic Manager
Microsoft Azure Traffic ManagerIdo Katz
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud PlatformonMárton Kodok
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionTorsten Steinbach
 
Super-NetOps Source of Truth
Super-NetOps Source of TruthSuper-NetOps Source of Truth
Super-NetOps Source of TruthJoel W. King
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsDigitalOcean
 
Mastering the move
Mastering the moveMastering the move
Mastering the moveTrivadis
 
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWSAmazon Web Services
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...
Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...
Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...Mesosphere Inc.
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Matt Stubbs
 

Similaire à Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis (20)

Oracle E-Business Suite On Oracle Cloud
Oracle E-Business Suite On Oracle CloudOracle E-Business Suite On Oracle Cloud
Oracle E-Business Suite On Oracle Cloud
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...
Analyzing and processing streaming data with Amazon EMR - ADB204 - New York A...
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute final
 
Working Experience_V5.0
Working Experience_V5.0Working Experience_V5.0
Working Experience_V5.0
 
Microsoft Azure Traffic Manager
Microsoft Azure Traffic ManagerMicrosoft Azure Traffic Manager
Microsoft Azure Traffic Manager
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
Datapower Steven Cawn
Datapower Steven CawnDatapower Steven Cawn
Datapower Steven Cawn
 
IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query Introduction
 
Super-NetOps Source of Truth
Super-NetOps Source of TruthSuper-NetOps Source of Truth
Super-NetOps Source of Truth
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
Mastering the move
Mastering the moveMastering the move
Mastering the move
 
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...
Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...
Best Practices for Managing Kubernetes and Stateful Services: Mesosphere & Sy...
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Dernier

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Dernier (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis

  • 1.
  • 2. Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis Dr. Mehdi Hashemipour, Data Scientist, Bureau of Transportation Statistics Marcelo Zambrana, Cloud Solutions Architect, Microsoft Sheila Stewart, Solutions Architect, Databricks
  • 3. Agenda Mehdi Hashemipour, PhD SWIM Overview Marcelo Zambrana Automating Infrastructure Architecture Sheila Stewart SWIM Data Processing
  • 4. Objectives and Benefits Objectives: Using FAA flight data to build a Commercial Flight Database to validate airline data and support the BTS mandate to measure and report aviation system performance. Potential Benefits: ▪ Enable timely estimates of enplanements and on-time performance ▪ Provide a point of validation for airline-submitted data ▪ Expand BTS’s analytical capabilities and breadth of reporting ▪ Support special aviation studies ▪ Provide source of data to aviation dashboards and other statistical products ▪ Serve as the aviation component of the Transportation Disruption and Disaster System
  • 5. The System Wide Information Management (SWIM) SWIM service provides a single interface point to multiple data services including airport, flight, aeronautical and weather data. STDD Stream : Access to data from over 200 airports. Data from over 400 individual systems.
  • 6.
  • 7. Potential BTS Use Cases for SWIM Data • Airport Time Delays • Ground Stop History, Status and Impact • On-time Performance estimate by causes • System Passenger Loading • Airline Data Quality Assurance Check • OAG Replacement Airport/Airline Performance • Freight Aircraft Location On Ground • Air Cargo Patterns and Seasonality • Multi-modal Cargo Movement AirCargoTraffic • Planned vs. Actual Flight Path Analysis • Actual flight path deviations from the “norm” • Fuel Cost and Ticket Price Correlation • Financial Impact of Delays EconomicImpactof Delays/Diversions/ Cancellations • Gate availability • Flight pattern interruption • Late Arriving flight pattern • Morning Flight Delay Impact Operational Impact of Delays/Diversions/ Cancellations • Re-direct diverted passengers • Passenger Impact of Cancellations Passenger Impact of Delays/Diversions/ Cancellations
  • 8. Data Lake BTS Conceptual SWIM Architecture BTS SWIM MSG Service SWIM Data Msg Service XML MSG Processing Economic Impact Weather Impact Ground Movement ITWS Kafka … TFM Flight TFM Flow TBFM FDPS ITWS ITWS FAA SWIM Data Service Bureau of Transportation Statistics (BTS) Temp Raw XML File Storage Performance Air Cargo Traffic DOT Virtual Machine FAA NW Gateway Mapping & Animation … Data AnalyticsXML Message Handling Data Transformation and Storage DOT Cloud Computing Environment Data Analyst Data Lake 8
  • 10. Initial Goals ▪ Automate as much as possible ▪ Infrastructure. ▪ Server Configuration. ▪ Databricks resources. ▪ Multiple Ingestion Sources ▪ SWIM offers multiple types of sources. ▪ On-prem data sources. ▪ Security and Scalability ▪ Internal traffic only. ▪ Multiple environments.
  • 12. Automating the Environment ▪ Initial Networking ▪ Security ▪ VMs ▪ Storage ▪ Databricks Workspace ▪ Software Requirements ▪ Solace Connector ▪ SWIM Access configuration ▪ Kafka Configuration Kafka ClusterInfrastructure ▪ Cluster Creation ▪ Libraries Configuration ▪ Notebooks ▪ Secrets Databricks Cluster
  • 13. Terraform ▪ Infrastructure as a Code ▪ Helps to automate infrastructure management. ▪ Understanding infrastructure changes before they are applied. ▪ Allows to build, change and version infrastructure. ▪ Multi-cloud ▪ Common language for different providers. ▪ Feature rich ▪ Module Registry. ▪ Providers. ▪ Workspaces. ▪ Variables. # Project Structure ├── LICENSE ├── README.md ├── main.tf ├── networking.tf ├── outputs.tf ├── security.tf ├── storage.tf ├── variables.tf ├── vm.tf └── workspace.tf # Common Commands terraform fmt terraform init terraform validate terraform plan terraform apply
  • 14. Configuration Management Ansible/Chef ▪ Consistency ▪ No more snowflake servers. ▪ Version Control of all configurations. ▪ Replicated environments. ▪ Scalability ▪ Add more SWIM source configurations. ▪ Easy to deploy new environments. ▪ Documentation ▪ Building-up knowledge. ▪ Change History.
  • 15. Databricks CLI ▪ Easy Interface to Databricks Platform ▪ Open source. ▪ Built on top of Databricks REST API. ▪ Allows you to interact with: workspace, clusters, fs, groups, jobs, runs, libraries, and secrets. ▪ Supports multiple profiles. ▪ Experimental ▪ Still under active development. # Create Databricks Cluster databricks clusters create --json-file config/cluster.json # Import Libraries databricks libraries install --cluster-id CLUSTER_ID --maven-coordinates com.databricks:spark-xml_2.11:0.9.0 # Import Notebooks databricks workspace import -l PYTHON -f DBC Notebooks/TFMS.dbc /Users/USER/tfms #Secret Management ## Create secret scope databricks secrets create-scope --scope swim --initial-manage-principal users ## Create new secret databricks secrets put --scope bts-swim --key bts-swim-sp --string-value my-value databricks secrets put --scope bts-swim --key bts-swim-sp --binary-file config/SP.txt databricks secrets put --scope bts-swim --key bts-swim-sp
  • 16. GitHub – GitHub Actions ▪ Automate from code to Cloud ▪ Workflow Automation ▪ Any OS, any language, and any cloud.
  • 18. Lessons Learned ▪ Infrastructure and Configuration as a Code ▪ Initial setup takes time. ▪ Test, fail and improve faster. ▪ Learning curve. ▪ Version Control ▪ Easy to review changes. ▪ Helps on-boarding new developers. ▪ Security ▪ Internal Network only. ▪ Limited access.
  • 21. Future State Architecture SWIM Data Lake Architecture with Streaming SWIM into Databricks 21 Predictive Analysis and Advanced Analytics Bronze Oracle Sybase Adhoc & Graph AnalysisSpark ETL Silver Gold Summary/Platinum - optional Enrichment OperationsSWIM DataLake Tableau Dashboards and Apps Data Stores Streaming SWIM-TFMS Azure Data Lake Storage Batch Raw XML data:, Staging Batch data Parsed XML data, Schema Validation with spark-xml Joined and Aggregated data Potential Further Aggregations SWIM-Other topics Streaming Ingress Data ETL, Stream, and Store Data Build JIT Data Warehouse Analytics and BI Streaming SWIM Data to Databricks RUNTIME
  • 23. Lessons Learned ▪ spark-xml is improving ▪ Need to investigate new features for mitigating complex nested XML schemas ▪ XML Schema Validation ▪ Copying schema to executors mitigates File I/O latencies by making use of memory for fast validation ▪ XML Schema Inference ▪ Batch processing of XML data at hourly or daily periodicity based on SLAs mitigates allows for more accurate inference
  • 24. Next Steps ▪ Validate SWIM data against data provided by airlines ▪ Deeper dive into predictive modeling to gather insights on flight delays and passengers affected ▪ Open up data pipeline to more SWIM data feeds
  • 26. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.