The System Wide Information Management (SWIM) Program is a National Airspace System (NAS)-wide information system that supports Next Generation Air Transportation System (NextGen) goals. SWIM serves as NextGen's digital data-sharing backbone, meeting its data-sharing requirements.
Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis
Dr. Mehdi Hashemipour, Data Scientist, Bureau of Transportation Statistics
Marcelo Zambrana, Cloud Solutions Architect, Microsoft
Sheila Stewart, Solutions Architect, Databricks
4. Objectives and Benefits
Objectives:
Use FAA flight data to build a Commercial Flight Database that validates airline data and supports the BTS mandate to measure and report aviation system performance.
Potential Benefits:
▪ Enable timely estimates of enplanements and on-time performance
▪ Provide a point of validation for airline-submitted data
▪ Expand BTS’s analytical capabilities and breadth of reporting
▪ Support special aviation studies
▪ Provide a source of data for aviation dashboards and other statistical products
▪ Serve as the aviation component of the Transportation Disruption and Disaster System
5. The System Wide Information Management (SWIM)
The SWIM service provides a single interface point to multiple data services, including airport, flight, aeronautical, and weather data.
STDD stream:
▪ Access to data from over 200 airports.
▪ Data from over 400 individual systems.
7. Potential BTS Use Cases for SWIM Data
Airport/Airline Performance:
• Airport Time Delays
• Ground Stop History, Status and Impact
• On-time Performance estimate by causes
• System Passenger Loading
• Airline Data Quality Assurance Check
• OAG Replacement
Air Cargo Traffic:
• Freight Aircraft Location On Ground
• Air Cargo Patterns and Seasonality
• Multi-modal Cargo Movement
Economic Impact of Delays/Diversions/Cancellations:
• Planned vs. Actual Flight Path Analysis
• Actual flight path deviations from the “norm”
• Fuel Cost and Ticket Price Correlation
• Financial Impact of Delays
Operational Impact of Delays/Diversions/Cancellations:
• Gate availability
• Flight pattern interruption
• Late Arriving flight pattern
• Morning Flight Delay Impact
Passenger Impact of Delays/Diversions/Cancellations:
• Re-direct diverted passengers
• Passenger Impact of Cancellations
8. BTS Conceptual SWIM Architecture
[Figure: BTS conceptual SWIM architecture. Labels recovered from the diagram:]
▪ FAA SWIM Data Service: SWIM message service with TFM Flight, TFM Flow, TBFM, FDPS, and ITWS feeds, reached through the FAA NW Gateway
▪ Bureau of Transportation Statistics (BTS), DOT Cloud Computing Environment:
  ▪ XML Message Handling: DOT virtual machine, BTS SWIM message service, Kafka, temporary raw XML file storage
  ▪ Data Transformation and Storage: XML message processing, Data Lake
  ▪ Data Analytics: Performance, Air Cargo Traffic, Economic Impact, Weather Impact, Ground Movement, Mapping & Animation, …
▪ Data Analyst
13. Terraform
▪ Infrastructure as Code
  ▪ Helps automate infrastructure management.
  ▪ Shows infrastructure changes before they are applied.
  ▪ Lets you build, change, and version infrastructure.
▪ Multi-cloud
  ▪ Common language for different providers.
▪ Feature rich
  ▪ Module Registry.
  ▪ Providers.
  ▪ Workspaces.
  ▪ Variables.
# Project Structure
├── LICENSE
├── README.md
├── main.tf
├── networking.tf
├── outputs.tf
├── security.tf
├── storage.tf
├── variables.tf
├── vm.tf
└── workspace.tf
# Common Commands
terraform fmt
terraform init
terraform validate
terraform plan
terraform apply
14. Configuration Management Ansible/Chef
▪ Consistency
  ▪ No more snowflake servers.
  ▪ Version control of all configurations.
  ▪ Replicated environments.
▪ Scalability
  ▪ Add more SWIM source configurations.
  ▪ Easy to deploy new environments.
▪ Documentation
  ▪ Builds up knowledge.
  ▪ Change history.
15. Databricks CLI
▪ Easy Interface to Databricks Platform
  ▪ Open source.
  ▪ Built on top of Databricks REST API.
  ▪ Allows you to interact with: workspace, clusters, fs, groups, jobs, runs, libraries, and secrets.
  ▪ Supports multiple profiles.
▪ Experimental
  ▪ Still under active development.
# Create Databricks Cluster
databricks clusters create --json-file config/cluster.json
# Import Libraries
databricks libraries install --cluster-id CLUSTER_ID --maven-coordinates com.databricks:spark-xml_2.11:0.9.0
# Import Notebooks
databricks workspace import -l PYTHON -f DBC Notebooks/TFMS.dbc /Users/USER/tfms
# Secret Management
## Create secret scope (the scope name must match the one used by the put commands below)
databricks secrets create-scope --scope bts-swim --initial-manage-principal users
## Create new secret (three alternatives: inline value, from a file, or interactively in an editor)
databricks secrets put --scope bts-swim --key bts-swim-sp --string-value my-value
databricks secrets put --scope bts-swim --key bts-swim-sp --binary-file config/SP.txt
databricks secrets put --scope bts-swim --key bts-swim-sp
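The cluster creation command above reads its definition from config/cluster.json, which the slides do not show. The following is a minimal sketch of how such a file could be generated. The field names are standard Clusters REST API fields, but every value (cluster name, runtime version, node type, worker count, auto-termination) is an assumption chosen for illustration, not the configuration used in this project.

```python
import json


def build_cluster_spec(name: str, num_workers: int) -> dict:
    """Return a minimal cluster definition (all values are illustrative assumptions)."""
    return {
        "cluster_name": name,
        # Assumed Scala 2.11 runtime, to match the spark-xml_2.11 library above.
        "spark_version": "5.5.x-scala2.11",
        # Assumed Azure VM size; use one that is available in your workspace.
        "node_type_id": "Standard_DS3_v2",
        "num_workers": num_workers,
        # Shut the cluster down after an hour of inactivity.
        "autotermination_minutes": 60,
    }


def render_cluster_json(spec: dict) -> str:
    """Serialize the spec in the form expected by --json-file."""
    return json.dumps(spec, indent=2)


# Hypothetical cluster name and size.
print(render_cluster_json(build_cluster_spec("bts-swim", 2)))
```

Redirecting the printed output to config/cluster.json would produce a file that can be passed to `databricks clusters create --json-file`.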
16. GitHub – GitHub Actions
▪ Automate from code to Cloud
▪ Workflow Automation
▪ Any OS, any language, and any cloud.
21. Future State Architecture
SWIM Data Lake Architecture with Streaming SWIM into Databricks
[Figure: streaming SWIM data to Databricks. Labels recovered from the diagram:]
▪ Ingress Data
  ▪ Streaming: SWIM-TFMS, SWIM-Other topics
  ▪ Batch
▪ ETL, Stream, and Store Data (Spark ETL, Databricks Runtime, Azure Data Lake Storage)
  ▪ Bronze: raw XML data, staging batch data
  ▪ Silver: parsed XML data, schema validation with spark-xml
  ▪ Gold: joined and aggregated data
  ▪ Summary/Platinum (optional): potential further aggregations
▪ Build JIT Data Warehouse
▪ Analytics and BI
  ▪ Predictive Analysis and Advanced Analytics
  ▪ Ad hoc & Graph Analysis
  ▪ Tableau
  ▪ Dashboards and Apps
▪ Other labels: SWIM Data Lake, Enrichment, Operations, Data Stores, Oracle, Sybase
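The Bronze, Silver, and Gold layers named in the architecture can be illustrated with a small sketch. It uses only the Python standard library as a stand-in for Spark and spark-xml, and the message format, field names, and values are all made up for the example; real SWIM messages are far more deeply nested.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Bronze: raw XML messages stored exactly as received (hypothetical format).
BRONZE = [
    "<flight><id>F1</id><airport>JFK</airport><delayMinutes>12</delayMinutes></flight>",
    "<flight><id>F2</id><airport>ATL</airport><delayMinutes>0</delayMinutes></flight>",
    "<flight><id>F3</id><airport>JFK</airport><delayMinutes>30</delayMinutes></flight>",
    "<flight><id>BROKEN",  # malformed: stays in bronze, never reaches silver
]


def to_silver(raw_messages):
    """Parse raw XML into flat records, skipping messages that fail to parse."""
    records = []
    for raw in raw_messages:
        try:
            root = ET.fromstring(raw)
        except ET.ParseError:
            continue
        records.append({
            "id": root.findtext("id"),
            "airport": root.findtext("airport"),
            "delay_minutes": int(root.findtext("delayMinutes", default="0")),
        })
    return records


def to_gold(records):
    """Aggregate silver records into total delay minutes per airport."""
    totals = Counter()
    for rec in records:
        totals[rec["airport"]] += rec["delay_minutes"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(len(silver), gold)
```

Keeping the malformed message in the raw layer while excluding it from the parsed layer is the point of the split: the raw data can be reprocessed later if parsing improves.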
23. Lessons Learned
▪ spark-xml is improving
  ▪ Need to investigate new features for handling complex nested XML schemas
▪ XML Schema Validation
  ▪ Copying the schema to executors mitigates file I/O latencies by making use of memory for fast validation
▪ XML Schema Inference
  ▪ Batch processing of XML data at an hourly or daily periodicity, based on SLAs, allows for more accurate inference
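The schema inference lesson can be illustrated with a toy example: a schema inferred from one message can be too narrow, while a larger batch exposes optional fields and wider types. The sketch below uses only the Python standard library and a simplified widening rule. It is not spark-xml's actual inference algorithm, and the message format and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Widening order: a field seen as both long and string must be stored as string.
WIDTH = {"long": 0, "double": 1, "string": 2}


def infer_type(text):
    """Guess the narrowest type that can hold a text value."""
    for name, cast in (("long", int), ("double", float)):
        try:
            cast(text)
            return name
        except (TypeError, ValueError):
            continue
    return "string"


def infer_schema(messages):
    """Union the fields of every message, widening types on conflict."""
    schema = {}
    for raw in messages:
        for child in ET.fromstring(raw):
            seen = infer_type(child.text)
            current = schema.get(child.tag, seen)
            schema[child.tag] = max(current, seen, key=WIDTH.get)
    return schema


# Hypothetical messages: the second one has a non-numeric gate and an extra field.
batch = [
    "<m><gate>12</gate></m>",
    "<m><gate>B7</gate><delay>4.5</delay></m>",
]
print(infer_schema(batch[:1]))  # schema from a single message
print(infer_schema(batch))      # schema from the whole batch
```

With only the first message, `gate` looks numeric and `delay` is missing entirely; the full batch corrects both, which is the effect that hourly or daily batching aims for.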
24. Next Steps
▪ Validate SWIM data against data provided by airlines
▪ Deeper dive into predictive modeling to gather insights on flight delays and passengers affected
▪ Open up data pipeline to more SWIM data feeds