Shivani Kumar, Navin Lalwani, Rohan Nanda, Han Ni, Ying Zhu

Table of Contents
○ Introduction
○ Business Question
○ Description of the Data
○ Exploratory Plots and Tables
○ Unsupervised and Supervised Analytics Models
○ Recommendations and Conclusion
○ Possible next steps

Introduction
Air travel cancellation has always been a universal problem. As economic ties between countries grow, it can cause
serious problems for frequent travellers, especially long-distance travellers such as international students and
business people. Our group members come from different parts of the world, so this question is of key interest to
us. We therefore based our project on data from the Bureau of Transportation Statistics of the United States,
hoping to generate insights into air travel cancellation that would be useful to the frequent travellers mentioned
above.
Flight cancellations cause a series of problems for various stakeholders in the travel industry: customers'
schedules are disrupted, airports become crowded, and demand for hotel rooms rockets when a large number of
flights are cancelled on the same day because of severe weather. With our insights, travellers can plan ahead,
airlines and airports can work to reduce cancellations based on our findings, and hotels can plan their marketing
and sales around cancellation patterns.

Business Question
Flights can be cancelled for a variety of reasons. The most common causes are:
1. Weather
2. Natural Disasters
3. Mechanical Errors
4. Monopoly Routes
5. Aircraft Size
Our team set out to identify the factors that lead to a flight cancellation. After selecting the datasets for this
project and performing an initial analysis of them, we decided to focus on the following domains:
1. Segments: by Origin Airport ID and Destination Airport ID pair
2. Airports: by Origin Airport ID
3. Airlines: by Airline ID
In our Business Intelligence and Data Mining class we learned to analyze data with decision tree and regression
models, so we tried both models on the factors above and, at this initial stage, chose the model with the smallest
average squared error (ASE).
*To work with the two datasets, we first used SQL to combine them before conducting the analysis in SAS
Enterprise Miner.

Description of the Data
After careful review, we chose two datasets:
(1) T100 Domestic Airline Segment Data
(2) Airline On-Time Performance Data
Both datasets come from the Bureau of Transportation Statistics (BTS) of the Research and Innovative Technology
Administration (RITA). The first dataset has more than 70k rows and contains domestic market data reported by
U.S. air carriers, including carrier, origin, destination, and service class for enplaned passengers, freight and
mail, when both origin and destination airports are located within the boundaries of the United States and its
territories.[1] Each month, every certificated U.S. air carrier reports its traffic information to the Office of
Airline Information using a standardized form named T-100; this dataset summarizes T-100 data from 1993 to 2013.
The Airline On-Time Performance dataset has more than a million rows. It is collected by the Office of Airline
Information, BTS, and contains on-time arrival data for non-stop domestic flights by major air carriers. It also
provides additional items such as departure and arrival delays, origin and destination airports, flight numbers,
scheduled and actual departure and arrival times, cancelled or diverted flights, taxi-out and taxi-in times, air
time, and non-stop distance.[2]
Variables Available
These two datasets have sufficient volume and variables to analyze the relationship between air traffic patterns
and externalities, which we define here as airports and airlines.
(1) T100 Domestic Airline Segment Data
This dataset provides the key measures related to flight cancellations, listed below:
Variables     Definition
DepScheduled  Departures Scheduled
DepPerformed  Departures Performed
Payload       Available Payload (pounds)
Seats         Available Seats
Passengers    Non-Stop Segment Passengers Transported
Freight       Non-Stop Segment Freight Transported (pounds)
Mail          Non-Stop Segment Mail Transported (pounds)
Distance      Distance between airports (miles)
LoadFactor    Load Factor: Ratio of Passenger Miles to Available Seat Miles
RampTime      Ramp to Ramp Time (minutes)
AirTime       Airborne Time (minutes)

[1] Source: http://www.transtats.bts.gov/Fields.asp?Table_ID=259
[2] Source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236

(2) Airline On-Time Performance Data
This dataset provides delay measures broken down by cause. The key measures are listed below:
Variables          Definition
CarrierDelay       Carrier Delay, in Minutes
WeatherDelay       Weather Delay, in Minutes
NASDelay           National Air System Delay, in Minutes
SecurityDelay      Security Delay, in Minutes
LateAircraftDelay  Late Aircraft Delay, in Minutes

Analysis Methodology:
1. Consolidated the data for the months of May, June and July
The first dataset contains T-100 data from 1993 to 2013, more than 10 million records. To keep the information
relevant and manageable, we consolidated the data from May 2013 to July 2013, which yielded 70,000+ records.
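Step 1 amounts to a date filter on the raw T-100 extract. The report does not show its SQL, so the following is only a minimal sketch using Python's built-in sqlite3 module; the table name t100_segment, its trimmed column list and the rows are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical, heavily trimmed T-100 schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE t100_segment (
           year INTEGER,
           month INTEGER,
           origin_airport_id INTEGER,
           dest_airport_id INTEGER,
           departures_scheduled INTEGER,
           departures_performed INTEGER
       )"""
)
# Made-up rows; only three fall inside May-July 2013.
conn.executemany(
    "INSERT INTO t100_segment VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2012, 6, 101, 202, 30, 30),  # wrong year
        (2013, 4, 101, 202, 31, 31),  # April: outside the window
        (2013, 5, 101, 202, 31, 30),
        (2013, 6, 101, 202, 30, 28),
        (2013, 7, 202, 101, 31, 31),
        (2013, 8, 202, 101, 31, 31),  # August: outside the window
    ],
)

# Consolidation step: keep May, June and July 2013 only.
summer_2013 = conn.execute(
    """SELECT * FROM t100_segment
       WHERE year = 2013 AND month BETWEEN 5 AND 7
       ORDER BY month"""
).fetchall()
print(len(summer_2013))  # 3
```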
2. Clean and construct new variables
a) Generated variables: Flights_Cancelled, Flights_Adhoc, Adhoc?, Cancellation?
The original first dataset has no explicit cancellation count, but it does contain Flights_Scheduled and
Flights_Performed. We subtracted Flights_Performed from Flights_Scheduled to get the number of flights with
unexpected changes, covering both cancellations and ad hoc flights. If the difference is positive (more flights
scheduled than performed), we stored it in a new variable named "Flights_Cancelled"; if it is negative (more
flights performed than scheduled), we stored its absolute value in another new variable named "Flights_Adhoc".
We also created binary variables, named "Cancellation?" and "Adhoc?", to flag the occurrence of a cancellation
or an ad hoc flight.
Variables          Definition
Flights_Cancelled  Number of flights cancelled (Scheduled - Performed)
Flights_Adhoc      Number of flights performed ad hoc, without being scheduled (Performed - Scheduled)
Adhoc?             Binary variable flagging ad hoc flights
Cancellation?      Binary variable flagging cancellations
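The derivation in step 2a can be sketched per record as follows, treating a positive Scheduled - Performed difference as cancellations and a negative one as ad hoc flights, in line with the table's definition of Flights_Cancelled. The record type and its field names are hypothetical stand-ins for the T-100 columns.

```python
from dataclasses import dataclass


@dataclass
class SegmentRecord:
    # Hypothetical field names mirroring DepScheduled / DepPerformed.
    departures_scheduled: int
    departures_performed: int


def derive_change_variables(rec: SegmentRecord) -> dict:
    """Split the scheduled-vs-performed difference into cancellations and ad hoc flights."""
    diff = rec.departures_scheduled - rec.departures_performed
    flights_cancelled = max(diff, 0)  # more flights scheduled than flown
    flights_adhoc = max(-diff, 0)     # more flights flown than scheduled
    return {
        "Flights_Cancelled": flights_cancelled,
        "Flights_Adhoc": flights_adhoc,
        "Cancellation?": int(flights_cancelled > 0),
        "Adhoc?": int(flights_adhoc > 0),
    }


print(derive_change_variables(SegmentRecord(31, 28)))
# {'Flights_Cancelled': 3, 'Flights_Adhoc': 0, 'Cancellation?': 1, 'Adhoc?': 0}
```

Because a single record's difference is either positive or negative, the two flags can never both be 1 for the same row.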

b) Converted sums to averages for: Passengers, Seats, Payload, Freight, Mail, Ramp_to_Ramp, AirTime
Several vital indicators that could be externalities affecting cancellation rates are reported as totals over all
flights performed, so they grow with the number of flights. To remove this bias, we divided each total by the
number of flights performed (total / departures performed) and stored the results in new variables.

Variables         Definition
Avg_Passengers    Avg_Passengers=Passengers/Departures Performed
Avg_Seats         Avg_Seats=Seats/Departures Performed
Avg_Payload       Avg_Payload=Payload/Departures Performed
Avg_Freight       Avg_Freight=Freight/Departures Performed
Avg_Mail          Avg_Mail=Mail/Departures Performed
Avg_Ramp_to_Ramp  Avg_Ramp_to_Ramp=Ramp_to_Ramp/Departures Performed
Avg_AirTime       Avg_AirTime=AirTime/Departures Performed
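Step 2b is a per-row division. A minimal sketch with hypothetical names and made-up totals; the report does not say how rows with zero departures performed were handled, so returning None for them is our assumption.

```python
def per_departure_averages(totals: dict, departures_performed: int) -> dict:
    """Avg_X = X / Departures Performed for every total X in `totals`."""
    if departures_performed == 0:
        # No flights flown: the average is undefined (assumed handling, not from the report).
        return {f"Avg_{name}": None for name in totals}
    return {f"Avg_{name}": value / departures_performed for name, value in totals.items()}


totals = {"Passengers": 4200, "Seats": 6000, "Freight": 1200, "Mail": 300}
print(per_departure_averages(totals, 30))
# {'Avg_Passengers': 140.0, 'Avg_Seats': 200.0, 'Avg_Freight': 40.0, 'Avg_Mail': 10.0}
```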

3. Analyzed each dataset individually
The two datasets we are interested in relate to flight cancellations and delays respectively. They have different
primary keys, and their underlying calculation logic differs, so we decided not to merge them and analyzed them
individually.

Exploratory Plots and Tables
We explored both datasets to find relationships between variables, and used Tableau to look for interesting
patterns related to flight cancellations.
Interesting Relationships
Using scatter plots from the data exploration menu in SAS, we found some interesting relationships between key
variables in our dataset.
a) Departures Performed:

We plotted the variable "departures_performed" against the variable "Airline_ID", colored by "Flight_Cancelled":
blue indicates that a flight was not cancelled and red that it was cancelled. The graph shows a very high density
of red points for departures exceeding 150; in other words, airlines with a high number of departures also had
flight cancellations.

The departures_performed variable was noted for further investigation.
b) Number of Passengers:

We plotted the variable "Total Passengers" against the variable "Airport_ID", colored by "Flight_Cancelled":
blue indicates that a flight was not cancelled and red that it was cancelled. The number of red points increases
above the 2,500-passenger mark; in other words, airports that handled more passengers also had flight
cancellations.

The total_passengers variable was noted for further investigation.
c) Distance

We plotted the variable "Distance from Origin" against the variable "Dest_Airport_ID", colored by
"Flight_Cancelled": blue indicates that a flight was not cancelled and red that it was cancelled. Distances between
500 and 750 miles show a higher density of red points, suggesting that shorter flights see more cancellations.
The distance variable was noted for further investigation.

Using Tableau, we looked for interesting facts about key variables.
a) Monthly Distribution of cancellations:

The charts show that June and July are the months with the most flight delays and cancellations. The number of
diverted flights also increases in June and July.

b) Geographic distribution of flight delays

The three graphs above show that:
1. Georgia had the most flights delayed due to weather.
2. Texas had the most flights delayed due to security checks.
3. Thursday sees the most flight delays.

Unsupervised and Supervised Analytics Models
For this project, we used k-means clustering as our unsupervised model, and tried decision tree and regression
models for each of the three domains: airports, airlines and segments.
Unsupervised Learning Model
In the segments domain, a k-means cluster analysis produced 46 clusters of segments. We were primarily
interested in grouping segments by the departures performed and the total flights cancelled in each segment.

We identified 5 major clusters. Departures performed ranged from 6 to 864 across these clusters, and flights
cancelled per segment ranged from 0 to 75. The five clusters, in decreasing order of size, are:
● The largest cluster consisted of segments averaging approximately 9 departures and 0.05 flight
cancellations.
● The next cluster consisted of segments averaging approximately 55 departures and 0.21 flight
cancellations.
● The next cluster consisted of segments averaging approximately 37.4 departures and 3 flight
cancellations.
● The next cluster consisted of segments averaging approximately 119 departures and 0.39 flight
cancellations.
● The next cluster consisted of segments averaging approximately 88 departures and 2.2 flight
cancellations.
We were not able to identify a significant trend with this model, so we continued with predictive modelling.
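The report does not show how the SAS Enterprise Miner clustering was configured, so the following only illustrates the k-means idea on the two features the team focused on: departures performed and flights cancelled per segment. It is a minimal sketch of plain Lloyd's algorithm with random initial centroids; the function name and the tiny dataset are made up. It does not standardize the features, although with departures spanning 6 to 864 and cancellations 0 to 75, standardizing first is usually advisable, since otherwise departures dominate the distance.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (departures_performed, flights_cancelled)


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # k input points as starting centroids
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


# Made-up segments: a low-traffic group and a high-traffic group.
segments = [(9, 0), (10, 0), (8, 0), (120, 1), (118, 0), (121, 1)]
centroids, labels = kmeans(segments, k=2)
print(labels)
```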
Supervised Learning Models
The two models we looked at were:
1. Regression
2. Decision Tree
We based our final analysis on whichever of these two models had the lower average squared error.
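Since every model comparison in this report rests on average squared error (ASE), a minimal sketch of the metric may help. It assumes a 0/1 cancellation target and predicted cancellation probabilities, and computes the mean of (y - p)^2; the function names are hypothetical, and SAS Enterprise Miner's own computation may differ in detail. The usage line feeds in the validation ASE values reported in this document for the Airline ID domain.

```python
from typing import Dict, Sequence


def average_squared_error(actual: Sequence[int], predicted_prob: Sequence[float]) -> float:
    """Mean of (y - p)^2, with y the 0/1 cancellation flag and p the predicted probability."""
    if len(actual) != len(predicted_prob) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted_prob)) / len(actual)


def pick_lowest_ase(validation_ase: Dict[str, float]) -> str:
    """Return the name of the model with the smallest validation ASE."""
    return min(validation_ase, key=validation_ase.get)


print(average_squared_error([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.6]))
print(pick_lowest_ase({"stepwise regression": 0.100689, "decision tree": 0.078363}))  # decision tree
```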
Regression Analysis
We conducted regression analysis to determine the significant factors that influence flight cancellations,
performing backward, forward and stepwise regression. The diagram below shows the regression process flow:

The following actions were performed on the data:
1. Data Partition: The data was partitioned into training and validation sets, to fit the model and to guard
against overfitting the training data.
2. Impute: Missing values were imputed.
3. Regression Snapshots:
Stepwise Regression (With Airline ID as Target):

The ASE for Validation (Stepwise): 0.100689

We also looked at the other selection methods (backward and forward) and chose stepwise because it had the
lowest average squared error.
Output of the stepwise Regression, depicting all significant variables:

Stepwise Regression (With Origin Airport ID as Target):

The ASE for this model was 0.112633.
Similarly, the segment-level regression model had an ASE of 0.090134.
These regression errors were higher than the decision tree's in every domain (0.100689 vs 0.078363 for airlines,
0.112633 vs 0.0987131 for airports, 0.090134 vs 0.081963 for segments), so we rejected the regression model and
based our analysis on the decision tree.
Decision Tree Analysis
Decision trees are a simple but powerful form of multiple-variable analysis. They can supplement, complement, or
substitute for traditional statistical forms of analysis. To identify the important variables in this study, we
applied the decision tree model in SAS Enterprise Miner. Using cross-validation, we found the most important
variables for our target and analyzed them further to provide business suggestions on the factors that affect
flight cancellations.
A) Based on Airline ID domain

Experiment Methodology:
1. Import the following dataset:
T-100 Segment data for the months of May, June and July (84,232 rows).
2. Edit the variables and assign a role and level to each:

Variable                                      Role     Level
Airline ID                                    ID       Nominal
Aircraft Config                               Input    Interval
Aircraft Group                                Input    Interval
Aircraft Categorization                       Input    Nominal
Departures Performed                          Input    Interval
Class                                         Input    Nominal
Average Freight                               Input    Interval
Average Airtime                               Input    Interval
Average Total Time on Ground (both airports)  Input    Interval
Average Mail                                  Input    Interval
Average Passengers                            Input    Interval
Average Payload                               Input    Interval
Average Ramp to Ramp                          Input    Interval
Distance                                      Input    Interval
Month                                         Input    Interval
Flight Cancelled                              Target   Nominal

The other variables, which are not important for this analysis, were rejected.
3. Data Partition
70% of the data was used for training and 30% for validation; all other settings were left at their defaults.
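SAS Enterprise Miner's Data Partition node performed this step. As an illustration only, the following does a 70/30 split in plain Python, stratified by the cancellation flag so that both partitions keep a similar cancellation rate. Stratification is an assumption about the settings used, and the function name and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, valid_idx), splitting each target class separately."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)  # 70% of this class goes to training
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


labels = [0] * 90 + [1] * 10  # made-up data: 10% cancellations
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))  # 70 30
```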
4. Transformation
Variable transformations can be used to stabilize variance, remove nonlinearity, improve additivity, and counter
non-normality. The following variables were transformed to address these irregularities:
Variable                 Method
Average Ramp to Ramp     Log
Average Payload          Log
Average Passengers       Log
Average Airtime          Log
Aircraft Categorisation  Dummy Indicator
Class                    Dummy Indicator

After transformation, the skewness of these variables was reduced considerably, as seen in the figures below:
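A rough sketch of step 4 in standard-library Python: log-transforming a skewed variable, comparing its skewness before and after, and building dummy indicators for a nominal variable such as Class. It uses log(1 + x) instead of log(x) so that zero values (for example, no mail carried) stay defined; whether SAS applied such an offset is not stated in the report. The function names and sample values are made up.

```python
import math
from typing import Dict, List, Sequence


def skewness(xs: Sequence[float]) -> float:
    """Population (moment) skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log_transform(xs: Sequence[float]) -> List[float]:
    # log(1 + x) keeps zero values defined (an assumption, see above).
    return [math.log1p(x) for x in xs]


def dummy_indicators(values: Sequence[str], prefix: str = "Class") -> List[Dict[str, int]]:
    """One 0/1 indicator per observed level (e.g. Class_F, Class_G) for each row."""
    levels = sorted(set(values))
    return [{f"{prefix}_{lvl}": int(v == lvl) for lvl in levels} for v in values]


avg_payload = [1, 2, 3, 5, 8, 13, 100, 400]  # made-up, right-skewed values
print(round(skewness(avg_payload), 2), round(skewness(log_transform(avg_payload)), 2))
print(dummy_indicators(["F", "G", "F"]))
# [{'Class_F': 1, 'Class_G': 0}, {'Class_F': 0, 'Class_G': 1}, {'Class_F': 1, 'Class_G': 0}]
```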

5. Decision Tree Analysis
The decision tree was fitted using cross-validation; all other settings were left at their defaults.
6. Results
The ASE for Validation data is: 0.078363

Decision Tree:

We also looked at the various important variables for this dataset:

The subtree assessment plot shows that the tree was pruned to 45 leaves.
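SAS Enterprise Miner grew and pruned the tree, and the report does not state its split criterion. To show how a tree arrives at a threshold such as "departures performed more than 70", the following searches a single interval input for the cut point that best separates cancelled from non-cancelled rows. It uses Gini impurity as a simple stand-in criterion (SAS may have used a different one); the data and function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(x: Sequence[float], y: Sequence[int]) -> Tuple[Optional[float], float]:
    """Try each midpoint between sorted distinct x values and return the split
    'x <= t' with the lowest weighted Gini impurity (None if no split helps)."""
    pairs = list(zip(x, y))
    values = sorted(set(x))
    best_t, best_score = None, gini(list(y))  # impurity with no split at all
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


departures = [5, 12, 20, 35, 48, 75, 80, 95, 110, 130]  # made-up
cancelled = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
print(best_threshold(departures, cancelled))
```

A full tree repeats this search over every input inside each resulting node, then prunes back to the subtree with the best validation assessment, which is what the 45-leaf result above reflects.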

7. Outcomes

For a given airline, if :
● the number of departures performed is more than approximately 3,
● the average number of passengers travelling is less than approximately 3
then there is a 99.6% probability that a flight of that airline will not be cancelled.

For a given airline, if:
● the average payload is less than 10,
● the Class is F,
● the number of departures performed is less than 49,
then there is an 82.4% probability that the flight will be cancelled.

For a given airline, if:
● the number of departures performed is more than 70,
● the average payload is more than 9 pounds,
● the average total time on ground is more than 18 minutes,
then there is an 83.3% probability that the flight will be cancelled.
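To make the three airline-level outcomes concrete, they can be written as simple if-then rules. This is a minimal sketch: the function and parameter names are hypothetical and the thresholds are the approximate values reported above. Because the report lists only part of each tree path, the rules can overlap; checking them in the order listed is our assumption, not necessarily how the pruned tree routes a flight.

```python
from typing import Optional


def airline_cancellation_probability(departures: float, avg_passengers: float,
                                     avg_payload: float, service_class: str,
                                     avg_ground_minutes: float) -> Optional[float]:
    """P(cancelled) for the three reported airline leaves, or None for unreported leaves."""
    # Leaf 1: more than ~3 departures and fewer than ~3 passengers: 99.6% NOT cancelled.
    if departures > 3 and avg_passengers < 3:
        return 1 - 0.996
    # Leaf 2: payload under 10, Class F, fewer than 49 departures: 82.4% cancelled.
    if avg_payload < 10 and service_class == "F" and departures < 49:
        return 0.824
    # Leaf 3: over 70 departures, payload over 9, more than 18 minutes on the ground: 83.3%.
    if departures > 70 and avg_payload > 9 and avg_ground_minutes > 18:
        return 0.833
    return None  # the report does not describe this leaf


print(airline_cancellation_probability(40, 50, 5, "F", 10))   # 0.824
print(airline_cancellation_probability(80, 50, 20, "G", 25))  # 0.833
```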

B) Based on Airport ID
Changing the ID variable to Origin Airport ID and keeping the other settings the same, we obtained the following
results:

The ASE for Validation data is 0.0987131

The decision tree:

We see that the same set of variables were important for this analysis as well:

The subtree assessment plot with the average square errors:

Outcomes

For a given airport, if:
● the number of departures performed is more than 42,
● the average payload is less than 10 pounds,
● the average mail carried is more than 1,
then it is almost certain (100%) that the flight will not be cancelled.

For a given airport, if:
● the number of departures performed is more than 70,
● the flights belong to Class F,
● the average payload is less than 10 pounds and Aircraft Config is less than 2,
then there is an 83.6% probability that the flight will be cancelled.

C) Based on Segments (Origin Airport ID and Destination Airport ID pairs)

Experiment Methodology:
1. Import the following dataset:
T-100 Segment data for the months of May, June and July (84,232 rows).
2. Edit the variables and assign a role and level to each:
Variable                                      Role     Level
Origin_Airport_ID                             ID       Nominal
Dest_Airport_ID                               ID       Nominal
flightAdHoc?                                  Input    Binary
Aircraft Config                               Input    Interval
Aircraft Group                                Input    Interval
Aircraft Categorization                       Input    Nominal
Departures Performed                          Input    Interval
Class                                         Input    Nominal
Average Freight                               Input    Interval
Average Airtime                               Input    Interval
Average Total Time on Ground (both airports)  Input    Interval
Average Mail                                  Input    Interval
Average Passengers                            Input    Interval
Average Payload                               Input    Interval
Distance                                      Input    Interval
Month                                         Input    Interval
Flight Cancelled?                             Target   Nominal

The other variables, which are not important for this analysis, were rejected.
3. Data Partition
70% of the data was used for training and 30% for validation; all other settings were left at their defaults.
4. Transformation
Variable                 Method
Average Payload          Log
Average Passengers       Log
Average Airtime          Log
Aircraft Categorisation  Dummy Indicator
Class                    Dummy Indicator

After transformation, the skewness of these variables was reduced considerably, as seen in the figures shown
earlier for the airline-based analysis.
5. Decision Tree Analysis
The decision tree was fitted using cross-validation; all other settings were left at their defaults.
6. Results
The ASE for Validation data is: 0.081963

Decision Tree:

We also looked at the various important variables for this dataset:

The subtree assessment plot shows that the tree was pruned to 36 leaves.

7. Outcomes
For a given segment, if:
● the number of departures performed is more than approximately 70,
● the average allotted payload is less than approximately 9 pounds,
then there is an 88% probability that flights in that segment will be cancelled.

For a given segment, if:
● the number of departures performed is more than approximately 70,
● the average allotted payload is more than approximately 9 pounds,
● the average total time on ground at both the source and destination airports is greater than
approximately 19 minutes,
then there is an 83.3% probability that flights in that segment will be cancelled.

For a given segment, if:
● the number of departures performed is greater than 2 and less than approximately 10,
● the flights took off without being scheduled (ad hoc),
then there is a 94.7% probability that flights in that segment will be cancelled.

Recommendations and Conclusion
Important Variables Venn Analysis
We performed a Venn analysis of the important variables in each of the three domains and plotted them, considering
those that were important in arriving at our recommendations.

● Departures Performed and Avg. Payload are the most important variables in our analysis for all three
domains. They are the decisive variables for cancellations of segments, airlines and airports.
● Airlines and Segments share avg. total time on ground at both source and destination as an important
variable. This is counter-intuitive: one would expect it to appear as a decisive variable for airports.
● Airlines and airports share the Class variable.
● FlightAdHoc, Avg. Passengers, and Aircraft Config with Avg. Mail are important for segments, airlines and
airports respectively.
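The Venn analysis is a set-overlap exercise. The sets below are reconstructed from the bullet points above (the configuration variable is taken to be Aircraft Config, which appears in the airport-level outcomes); the variable labels are our shorthand, not SAS output.

```python
segments = {"Departures Performed", "Avg. Payload", "Avg. Total Time on Ground", "FlightAdHoc"}
airlines = {"Departures Performed", "Avg. Payload", "Avg. Total Time on Ground", "Class",
            "Avg. Passengers"}
airports = {"Departures Performed", "Avg. Payload", "Class", "Aircraft Config", "Avg. Mail"}

common_to_all = segments & airlines & airports
print(sorted(common_to_all))                      # ['Avg. Payload', 'Departures Performed']
print(sorted((segments & airlines) - airports))   # ['Avg. Total Time on Ground']
print(sorted((airlines & airports) - segments))   # ['Class']
print(sorted(airports - segments - airlines))     # ['Aircraft Config', 'Avg. Mail']
```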

Findings and Recommendations
Segments
Findings:
● Segments whose flights carry very little payload on average (less than about 9 pounds) but fly frequently
are likely to be cancelled. Segments whose flights carry higher payloads and fly frequently, but spend more
than about 19 minutes on the ground at both the source and destination airports, are also likely to be
cancelled.
● Segments with few departures whose flights take off without being scheduled see few or no cancellations.
Recommendations:
● The airport should pilot a program that redirects traffic from a few congested segments to runways that
handle non-scheduled flights. The results would show whether the priority given to non-scheduled aircraft
is causing cancellations.
● A new runway should be opened to speed up ground handling and reduce the time that higher-payload aircraft
spend on the ground at both source and destination.
● The airport is accommodating flights on non-congested segments, even flights that are not scheduled.
However, flights on congested, heavy-traffic segments are being cancelled: both those carrying few or no
passengers and those carrying passengers and cargo that spend a long time on the ground at both source and
destination.

Airlines:
Findings:
● Small flights (fewer than three passengers on average) that fly often (more than 3 departures) have very
little chance of being cancelled.
● Flights that fly often with little payload (less than 9 pounds) tend to be cancelled more often. They also
spend a considerable amount of time at the airports (more than 18 minutes).
Recommendations:
● The last recommendation for segments also applies to the airlines domain: airline ground crews should
ensure quick ground handling at the airport for higher-payload aircraft at both source and destination.
● The payload analysis for segments is consistent with our finding for aircraft carrying fewer passengers.
Just as low-payload, high-departure segment flights were being cancelled, the same holds for airlines.
Airline ground staff should be alert when these flights are scheduled to arrive at and depart from
airports, to make sure that handling is fast.
Airports:
Findings:
● For airports with frequent departures (more than 70), relatively low payload (10 pounds or less), Class F
service, and mail being loaded onto the aircraft, these flights are very likely to be cancelled.
Recommendations:
● Because these cancellations affect a large population, airports should study their Scheduled
Passenger/Cargo service flights to understand why these flights are cancelled so frequently. Our findings
suggest that handling time, in terms of baggage and mail loading, drives cancellations along with the other
important variables. In short, ground handling at the airports is taking too long.

Possible next steps
According to the Wall Street Journal, ticket changes caused by illness, family emergencies and rescheduled business
meetings are big business for airlines.[3] At some airlines, the resulting change fees and penalties that
passengers ended up paying added up to $2 billion a year, even more than total baggage fees. If airlines delved
further into seasonal customer data to find cancellation patterns on the passengers' side, and adjusted change fees
and penalties accordingly, they could generate higher revenue from that finding.

[3] Source: http://online.wsj.com/news/articles/SB10001424052970204563304574318212311819146

Analytics Best Practice for the Travel IndustryAnalytics Best Practice for the Travel Industry
Analytics Best Practice for the Travel Industry
 
Travel Industry Overview
Travel Industry OverviewTravel Industry Overview
Travel Industry Overview
 
Visual Analytics for Large Scale Travel Models
Visual Analytics for Large Scale Travel Models Visual Analytics for Large Scale Travel Models
Visual Analytics for Large Scale Travel Models
 
make mytrip--service-marketing
make mytrip--service-marketingmake mytrip--service-marketing
make mytrip--service-marketing
 
Makemytrip
MakemytripMakemytrip
Makemytrip
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 

Similaire à Air Travel Analytics in SAS

big data slides.pptx
big data slides.pptxbig data slides.pptx
big data slides.pptxBSwethaBindu
 
Airline Economics - Planning And Key Performance Indicators Practical Guide F...
Airline Economics - Planning And Key Performance Indicators Practical Guide F...Airline Economics - Planning And Key Performance Indicators Practical Guide F...
Airline Economics - Planning And Key Performance Indicators Practical Guide F...Amy Cernava
 
Predicting flight cancellation likelihood
Predicting flight cancellation likelihoodPredicting flight cancellation likelihood
Predicting flight cancellation likelihoodAashish Jain
 
Airfare Analysis of Domestic Airlines in U.S.
Airfare Analysis of Domestic Airlines in U.S.Airfare Analysis of Domestic Airlines in U.S.
Airfare Analysis of Domestic Airlines in U.S.ABHISHEKDAHALE
 
PRESENTATION ON CHALLENGE lab_084627 (1).pptx
PRESENTATION ON CHALLENGE lab_084627 (1).pptxPRESENTATION ON CHALLENGE lab_084627 (1).pptx
PRESENTATION ON CHALLENGE lab_084627 (1).pptxMUSAIDRIS15
 
Random Forest Ensemble learning algorithm for Engineering Analytics Project
Random Forest Ensemble learning algorithm for Engineering Analytics ProjectRandom Forest Ensemble learning algorithm for Engineering Analytics Project
Random Forest Ensemble learning algorithm for Engineering Analytics ProjectSaurabh Kale
 
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docx
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docxAVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docx
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docxrock73
 
A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...
A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...
A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...CSCJournals
 
Application of Data Science in the Airline industry
Application of Data Science in the Airline industryApplication of Data Science in the Airline industry
Application of Data Science in the Airline industryEshaNair4
 
Predicting 2016 Airlines Performance
Predicting 2016   Airlines Performance Predicting 2016   Airlines Performance
Predicting 2016 Airlines Performance Mohammed Awad
 
ITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdfITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdfmustafe39
 
srs for railway reservation system
 srs for railway reservation system srs for railway reservation system
srs for railway reservation systemkhushi kalaria
 
Aviation articles - Aircraft Evaluation and selection
Aviation articles - Aircraft Evaluation and selectionAviation articles - Aircraft Evaluation and selection
Aviation articles - Aircraft Evaluation and selectionMohammed Hadi
 
Is Low Cost Carrier Profitable - Norwegian article - Issue No. 1
Is Low Cost Carrier Profitable  - Norwegian article - Issue No. 1Is Low Cost Carrier Profitable  - Norwegian article - Issue No. 1
Is Low Cost Carrier Profitable - Norwegian article - Issue No. 1Mohammed Awad
 
SAS Programming and Data Analysis Portfolio - BTReilly
SAS Programming and Data Analysis Portfolio - BTReillySAS Programming and Data Analysis Portfolio - BTReilly
SAS Programming and Data Analysis Portfolio - BTReillyBrian Reilly
 
software testing micro projectnnnn(1)22.pptx
software testing micro projectnnnn(1)22.pptxsoftware testing micro projectnnnn(1)22.pptx
software testing micro projectnnnn(1)22.pptx40NehaPagariya
 
Sample reports_2015
Sample reports_2015Sample reports_2015
Sample reports_2015Alexis Cohen
 
Flight data analysis using apache pig--------------Final Year Project
Flight data analysis using apache pig--------------Final Year ProjectFlight data analysis using apache pig--------------Final Year Project
Flight data analysis using apache pig--------------Final Year ProjectSanjib Mitra
 
CDG - Paris Charles De Gaulle Airport
CDG - Paris Charles De Gaulle AirportCDG - Paris Charles De Gaulle Airport
CDG - Paris Charles De Gaulle AirportMohammed Awad
 

Similaire à Air Travel Analytics in SAS (20)

big data slides.pptx
big data slides.pptxbig data slides.pptx
big data slides.pptx
 
Airline Economics - Planning And Key Performance Indicators Practical Guide F...
Airline Economics - Planning And Key Performance Indicators Practical Guide F...Airline Economics - Planning And Key Performance Indicators Practical Guide F...
Airline Economics - Planning And Key Performance Indicators Practical Guide F...
 
Predicting flight cancellation likelihood
Predicting flight cancellation likelihoodPredicting flight cancellation likelihood
Predicting flight cancellation likelihood
 
Airfare Analysis of Domestic Airlines in U.S.
Airfare Analysis of Domestic Airlines in U.S.Airfare Analysis of Domestic Airlines in U.S.
Airfare Analysis of Domestic Airlines in U.S.
 
PRESENTATION ON CHALLENGE lab_084627 (1).pptx
PRESENTATION ON CHALLENGE lab_084627 (1).pptxPRESENTATION ON CHALLENGE lab_084627 (1).pptx
PRESENTATION ON CHALLENGE lab_084627 (1).pptx
 
Random Forest Ensemble learning algorithm for Engineering Analytics Project
Random Forest Ensemble learning algorithm for Engineering Analytics ProjectRandom Forest Ensemble learning algorithm for Engineering Analytics Project
Random Forest Ensemble learning algorithm for Engineering Analytics Project
 
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docx
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docxAVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docx
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docx
 
A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...
A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...
A Novel Approach To The Weight and Balance Calculation for The De Haviland Ca...
 
Application of Data Science in the Airline industry
Application of Data Science in the Airline industryApplication of Data Science in the Airline industry
Application of Data Science in the Airline industry
 
Predicting 2016 Airlines Performance
Predicting 2016   Airlines Performance Predicting 2016   Airlines Performance
Predicting 2016 Airlines Performance
 
ITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdfITA-software-travel-complexity.pdf
ITA-software-travel-complexity.pdf
 
srs for railway reservation system
 srs for railway reservation system srs for railway reservation system
srs for railway reservation system
 
Aviation articles - Aircraft Evaluation and selection
Aviation articles - Aircraft Evaluation and selectionAviation articles - Aircraft Evaluation and selection
Aviation articles - Aircraft Evaluation and selection
 
Is Low Cost Carrier Profitable - Norwegian article - Issue No. 1
Is Low Cost Carrier Profitable  - Norwegian article - Issue No. 1Is Low Cost Carrier Profitable  - Norwegian article - Issue No. 1
Is Low Cost Carrier Profitable - Norwegian article - Issue No. 1
 
SAS Programming and Data Analysis Portfolio - BTReilly
SAS Programming and Data Analysis Portfolio - BTReillySAS Programming and Data Analysis Portfolio - BTReilly
SAS Programming and Data Analysis Portfolio - BTReilly
 
software testing micro projectnnnn(1)22.pptx
software testing micro projectnnnn(1)22.pptxsoftware testing micro projectnnnn(1)22.pptx
software testing micro projectnnnn(1)22.pptx
 
Taking the lead
Taking the leadTaking the lead
Taking the lead
 
Sample reports_2015
Sample reports_2015Sample reports_2015
Sample reports_2015
 
Flight data analysis using apache pig--------------Final Year Project
Flight data analysis using apache pig--------------Final Year ProjectFlight data analysis using apache pig--------------Final Year Project
Flight data analysis using apache pig--------------Final Year Project
 
CDG - Paris Charles De Gaulle Airport
CDG - Paris Charles De Gaulle AirportCDG - Paris Charles De Gaulle Airport
CDG - Paris Charles De Gaulle Airport
 

Dernier

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Dernier (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Air Travel Analytics in SAS

Shivani Kumar, Navin Lalwani, Rohan Nanda, Han Ni, Ying Zhu
Table of Contents
○ Introduction
○ Business Question
○ Description of the Data
○ Exploratory Plots and Tables
○ Unsupervised and Supervised Analytics Models
○ Recommendations and Conclusion
○ Possible next steps
Introduction
Air travel cancellation has always been a universal problem. As economic ties between countries grow, cancellations can cause serious problems for frequent travellers, especially long-distance travellers such as international students and business travellers. Our group members come from different parts of the world, so this question is of key interest to us. We therefore based our project on data from the U.S. Bureau of Transportation Statistics. Our goal was to generate insights into air travel cancellations that would be useful to the frequent travellers mentioned above.

Flight cancellations can cause a series of problems for various stakeholders in the tourism industry. Customers' schedules are disrupted, airports become crowded, and demand for hotel rooms soars if a large number of flights are cancelled on the same day because of severe weather. With our insights, travellers can plan ahead, and airlines and airports can work to reduce cancellations. Hotels can also plan their marketing and sales around flight cancellation patterns.
Business Question
Flight cancellations can happen for a variety of reasons. The most common causes are:
1. Weather
2. Natural disasters
3. Mechanical errors
4. Monopoly routes
5. Aircraft size
Our team is interested in identifying the factors that lead to flight cancellations. After selecting our datasets and completing an initial analysis, we decided to focus on the following domains:
1. Segments: by origin airport ID and destination airport ID pair
2. Airports: by origin airport ID
3. Airlines: by airline ID
In our Business Intelligence and Data Mining class, we learned to analyze data with decision tree and regression models. We therefore decided to try both models on the factors above. At the initial stage of our analysis, we chose the model with the smallest average squared error (ASE).
*To work with the two datasets, we first used SQL to combine them before conducting the analysis in SAS Enterprise Miner.
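Model selection here rests on average squared error. Below is a minimal sketch of the textbook definition: ASE is the mean of the squared differences between observed target values and model predictions on validation data. The sketch also shows how to pick the candidate with the smallest ASE. It is an illustration only. SAS Enterprise Miner computes ASE internally, and its handling of class targets may differ in detail. The function names, model names, and data below are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def average_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of squared differences between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pick_lowest_ase(
    actual: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the (model name, ASE) pair with the smallest validation ASE."""
    scores = {name: average_squared_error(actual, pred) for name, pred in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical validation targets: 1 = a cancellation occurred, 0 = none.
    observed = [1, 0, 0, 1, 0]
    candidate_predictions = {
        "regression": [0.8, 0.2, 0.1, 0.6, 0.3],
        "decision_tree": [1.0, 0.0, 0.5, 0.5, 0.0],
    }
    print(pick_lowest_ase(observed, candidate_predictions))
```

Because errors are squared, a model that is confidently wrong on a few rows is penalized more heavily than one that is slightly off everywhere.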
Description of the Data
After careful review, we chose two datasets: (1) T-100 Domestic Airline Segment Data and (2) Airline On-Time Performance Data. Both come from the Bureau of Transportation Statistics (BTS) of the Research and Innovative Technology Administration (RITA).

The first dataset has more than 70,000 rows. It contains domestic market data reported by U.S. air carriers, covering carrier, origin, destination, and service class for enplaned passengers, freight, and mail. Records are included when both the origin and destination airports are located within the boundaries of the United States and its territories (source: http://www.transtats.bts.gov/Fields.asp?Table_ID=259). Each month, every certificated U.S. air carrier reports its traffic information to the Office of Airline Information on a normalized form named T-100. This dataset summarizes T-100 data from 1993 to 2013.

The Airline On-Time Performance dataset has more than a million rows. It is collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS), and contains on-time arrival data for non-stop domestic flights by major air carriers. It also provides additional items such as:
○ departure and arrival delays
○ origin and destination airports
○ flight numbers
○ scheduled and actual departure and arrival times
○ cancelled or diverted flights
○ taxi-out and taxi-in times, air time, and non-stop distance
(source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236)

Variables Available
These two datasets provide enough data volume and variables to analyze the relationship between air traffic patterns and externalities, defined here as airports and airlines.

(1) T-100 Domestic Airline Segment Data
This dataset supplied key insights into the factors that result in flight cancellations. Its key measures are listed below:

Variable        Definition
DepScheduled    Departures Scheduled
DepPerformed    Departures Performed
Payload         Available Payload (pounds)
Seats           Available Seats
Passengers      Non-Stop Segment Passengers Transported
Freight         Non-Stop Segment Freight Transported (pounds)
Mail            Non-Stop Segment Mail Transported (pounds)
Distance        Distance between airports (miles)
LoadFactor      Load Factor: ratio of passenger-miles to available seat-miles
RampTime        Ramp-to-Ramp Time (minutes)
AirTime         Airborne Time (minutes)

(2) Airline On-Time Performance Data
This dataset supplied the factors that affect delays and the causes of different types of delays. Its key measures are listed below:

Variable            Definition
CarrierDelay        Carrier Delay, in minutes
WeatherDelay        Weather Delay, in minutes
NASDelay            National Air System Delay, in minutes
SecurityDelay       Security Delay, in minutes
LateAircraftDelay   Late Aircraft Delay, in minutes

Analysis Methodology
1) Consolidated the data for May, June, and July
The first dataset contains T-100 data from 1993 to 2013, more than 10 million records in total. To keep the analysis focused and manageable, we consolidated the data from May 2013 to July 2013, which gave us 70,000+ records.

2) Cleaned the data and constructed new variables
a) Generated variables: Flights_Cancelled, Flights_Adhoc, Adhoc?, Cancellation?
The original first dataset has no explicit cancellation count, but it contains Flights_Scheduled and Flights_Performed. We subtracted Flights_Performed from Flights_Scheduled to get the number of flights with unexpected changes, which covers both cancellations and ad hoc flights.
○ If the difference is positive (more flights scheduled than performed), we stored it in a new variable named "Flights_Cancelled".
○ If the difference is negative (more flights performed than scheduled), we stored its absolute value in another new variable named "Flights_Adhoc".
We also created two binary variables, "Cancellation?" and "Adhoc?", to flag the occurrence of cancellations and ad hoc flights.

Variable            Definition
Flights_Cancelled   Number of flights cancelled (Scheduled - Performed, when positive)
Flights_Adhoc       Number of flights that took off ad hoc (Performed - Scheduled, when positive)
Adhoc?              Binary variable flagging ad hoc flights
Cancellation?       Binary variable flagging cancellations

b) Converted sums to averages for: Passengers, Seats, Payload, Freight, Mail, Ramp_to_Ramp, AirTime
Several key indicators that could be externalities affecting cancellation rates are recorded as totals summed over all flights. As a result, they scale with the number of flights actually performed. To remove this bias, we divided each total by the number of departures performed and stored the result in a new variable:

Variable            Definition
Avg_Passengers      Passengers / Departures Performed
Avg_Seats           Seats / Departures Performed
Avg_Freight         Freight / Departures Performed
Avg_Mail            Mail / Departures Performed
Avg_Ramp_to_Ramp    Ramp_to_Ramp / Departures Performed
Avg_AirTime         AirTime / Departures Performed

3) Analyzed each dataset individually
The two datasets we are interested in relate to flight cancellations and to delays. They have different primary keys, and their internal calculation logic differs. Therefore, we decided not to merge them and analyzed them individually.

Exploratory Plots and Tables
We explored both datasets to find relationships between variables. We also used Tableau to look for interesting patterns related to flight cancellations.

Interesting Relationships
Using scatter plots in the SAS data exploration menu, we found some interesting relationships between key variables in our dataset.

a) Departures Performed: We plotted "departures_performed" against "Airline_ID", colored by "Flight_Cancelled". Blue indicates that a flight was not cancelled, and red indicates that a flight was cancelled. The graph shows a high density of red points for departures exceeding 150. In other words, airlines with more departures also had flight cancellations.
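The variable construction in methodology step 2, together with the departures-performed observation, can be checked outside SAS. The minimal sketch below follows the definition table: Flights_Cancelled is Scheduled - Performed when positive, and Flights_Adhoc is Performed - Scheduled when positive. It then compares the share of records with any cancellation above and below the 150-departure mark read off the scatter plot. This is not the authors' SAS code. The `SegmentRecord` type and its field names are hypothetical stand-ins for the BTS columns. Returning 0.0 for averages when no departures were performed is our own assumption, because the source does not say how that case was handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SegmentRecord:
    """One T-100 style row; field names are hypothetical stand-ins for the BTS columns."""
    flights_scheduled: int
    flights_performed: int
    passengers: float
    seats: float
    air_time: float


def per_departure(total: float, performed: int) -> float:
    """Turn a total into an average per performed departure (assumed 0.0 if none performed)."""
    return total / performed if performed > 0 else 0.0


def derive_variables(rec: SegmentRecord) -> Dict[str, float]:
    """Build cancellation/ad hoc counts, their binary flags, and per-departure averages."""
    diff = rec.flights_scheduled - rec.flights_performed  # Scheduled - Performed
    cancelled = max(diff, 0)  # more scheduled than performed: cancellations
    adhoc = max(-diff, 0)     # more performed than scheduled: ad hoc flights
    return {
        "Flights_Cancelled": cancelled,
        "Flights_Adhoc": adhoc,
        "Cancellation?": 1 if cancelled > 0 else 0,
        "Adhoc?": 1 if adhoc > 0 else 0,
        "Avg_Passengers": per_departure(rec.passengers, rec.flights_performed),
        "Avg_Seats": per_departure(rec.seats, rec.flights_performed),
        "Avg_AirTime": per_departure(rec.air_time, rec.flights_performed),
    }


def cancellation_share_by_volume(
    records: Sequence[SegmentRecord], threshold: int = 150
) -> Dict[str, float]:
    """Fraction of records with any cancellation, split at a departures-performed threshold."""
    flags: Dict[str, List[int]] = {"<= threshold": [], "> threshold": []}
    for rec in records:
        key = "> threshold" if rec.flights_performed > threshold else "<= threshold"
        flags[key].append(int(derive_variables(rec)["Cancellation?"]))
    return {k: (sum(v) / len(v) if v else float("nan")) for k, v in flags.items()}


if __name__ == "__main__":
    # Hypothetical rows, not taken from the BTS data.
    sample = [
        SegmentRecord(10, 8, 800.0, 1000.0, 960.0),
        SegmentRecord(200, 200, 30000.0, 36000.0, 24000.0),
        SegmentRecord(180, 175, 25000.0, 30000.0, 21000.0),
    ]
    print(derive_variables(sample[0]))
    print(cancellation_share_by_volume(sample))
```

A grouped share like this only describes the data; it does not show that high departure volume causes cancellations.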
The departures_performed variable was noted for further investigation.

b) Number of Passengers: We plotted "Total Passengers" against "Airport_ID", colored by "Flight_Cancelled". Blue indicates that a flight was not cancelled, and red indicates that a flight was cancelled. The number of red points increases above the 2,500-passenger mark. In other words, airports that handled more passengers also had flight cancellations. The total_passengers variable was noted for further investigation.

c) Distance
We plotted "Distance from Origin" against "Dest_Airport_ID", colored by "Flight_Cancelled". Blue indicates that a flight was not cancelled, and red indicates that a flight was cancelled. Distances between 500 and 750 miles show a higher density of red points, which suggests that shorter-distance flights see more cancellations. The distance variable was noted for further investigation.

Using Tableau, we also looked for interesting facts about key variables.

a) Monthly distribution of cancellations: The charts show that June and July are the months with the most flight delays and cancellations. The number of diverted flights also increases in June and July.
b) Geographic distribution of flight delays: The three graphs show that:
1. Georgia had the most flights delayed due to weather.
2. Texas had the most flights delayed due to security checks.
3. Thursday sees the most flight delays.
Unsupervised and Supervised Analytics Models
For this project, we used k-means clustering as our unsupervised model. We also tried decision tree and regression models for each of the three domains: airports, airlines, and segments.

Unsupervised Learning Model
We ran a k-means cluster analysis in the segments domain, which produced 46 clusters of segments. We were primarily interested in grouping segments by the departures performed and the total flights cancelled in each segment, and we identified 5 major clusters. Departures performed across the clusters ranged from 6 to 864, and flights cancelled per segment ranged from 0 to 75. The five clusters, in decreasing order of frequency, are:
● The largest cluster consisted of segments averaging approximately 9 departures and 0.05 flight cancellations.
● The second consisted of segments averaging approximately 55 departures and 0.21 flight cancellations.
● The third consisted of segments averaging approximately 37.4 departures and 3 flight cancellations.
● The fourth consisted of segments averaging approximately 119 departures and 0.39 flight cancellations.
● The fifth consisted of segments averaging approximately 88 departures and 2.2 flight cancellations.

We were not able to identify a significant trend with this model, so we continued with predictive modelling.

Supervised Learning Models

The two models that we looked at were:
1. Regression
2. Decision Tree

We based our final analysis on whichever of these two models had the lower average squared error (ASE).

Regression Analysis

We conducted regression analysis to determine the significant factors that influence flight cancellations, using backward, forward and stepwise selection. The diagram below shows the regression process flow. The following actions were performed on the data:
1. Data Partition: The data was partitioned into training and validation sets, so that models are fitted on the training data and checked on the validation data to prevent overfitting.
2. Impute: Missing values were imputed.
3. Regression Snapshots:

Stepwise Regression (with Airline ID as Target):
The ASE for Validation (Stepwise): 0.100689
We also looked at the regressions for the other selection methods (backward and forward) and chose stepwise selection because it had the lowest average squared error.

Output of the stepwise regression, showing all significant variables:

Stepwise Regression (with Origin Airport ID as Target):
The ASE for this model was 0.112633.

Similarly, the segment-wise regression model gave an ASE of 0.090134.
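Stepwise selection alternates between adding and removing variables. The sketch below is a simplified stand-in, not the SAS procedure: SAS typically decides entry and removal with significance levels, whereas this version uses a caller-supplied error function (for example, validation ASE) as the criterion. The feature names and the `toy_ase` scoring function are hypothetical.

```python
def stepwise_select(candidates, score):
    """Greedy stepwise selection.

    `score(features)` returns an error to minimise (e.g. validation ASE).
    Each round tries adding every unused feature and dropping every used one,
    applies the single change that lowers the error most, and stops when no
    change helps.
    """
    selected = frozenset()
    best = score(selected)
    while True:
        moves = [selected | {f} for f in candidates if f not in selected]
        moves += [selected - {f} for f in selected]
        if not moves:
            break
        trial = min(moves, key=score)
        trial_score = score(trial)
        if trial_score >= best:
            break
        selected, best = trial, trial_score
    return set(selected), best

# Hypothetical error surface: two useful variables lower the error,
# every other variable adds a small overfitting penalty.
useful = {"departures": 0.03, "payload": 0.02}

def toy_ase(features):
    return 0.15 - sum(useful.get(f, -0.005) for f in features)

chosen, err = stepwise_select(["departures", "payload", "month", "distance"], toy_ase)
print(sorted(chosen), round(err, 3))  # ['departures', 'payload'] 0.1
```

Because every accepted step strictly lowers the error, the loop always terminates.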
The validation ASEs of the regression models were higher than those of the decision trees in all three domains (for example, 0.100689 for the stepwise airline regression versus 0.078363 for the airline decision tree), so we rejected the regression model and based our analysis on the decision tree.

Decision Tree Analysis

Decision trees are a simple but powerful form of multiple-variable analysis. They can supplement, complement, or substitute for traditional statistical forms of analysis. To assess which variables matter most in this study, we fitted decision tree models in SAS. Using cross-validation, we identified the most important variables for our target and analyzed them further to make business recommendations about the factors that affect flight cancellations.

A) Based on the Airline ID Domain

Experiment Methodology:
1. Import the following dataset: T-100 Segment data for the months of May, June and July (84,232 rows).
2. Edit the variables and assign a role and level to each:

Variable                             Role    Level
Airline ID                           ID      Nominal
Aircraft Config                      Input   Interval
Aircraft Group                       Input   Interval
Aircraft Categorization              Input   Nominal
Departure Performed                  Input   Interval
Class                                Input   Nominal
Average Freight                      Input   Interval
Average Airtime                      Input   Interval
Average Total Time at ground on bot  Input   Interval
Average Mail                         Input   Interval
Average Passengers                   Input   Interval
Average Payload                      Input   Interval
Average Ramp to Ramp                 Input   Interval
Distance                             Input   Interval
Month                                Input   Interval
Flight Cancelled                     Target  Nominal

The other variables, which are not important for this analysis, were rejected.
3. Data Partition: 70% of the data was used for training and 30% for validation; all other settings were left at their defaults.
4. Transformation: Variable transformations can be used to stabilize variance, remove nonlinearity, improve additivity, and counter non-normality. The following variables were transformed to address these irregularities:

Variable                  Method
Average Ramp to Ramp      Log
Average Payload           Log
Average Passengers        Log
Average Airtime           Log
Aircraft Categorization   Dummy Indicator
Class                     Dummy Indicator

After transformation, the skewness of these variables was considerably reduced, as seen in the accompanying figures.
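The log and dummy-indicator transformations in the table can be sketched in plain Python as below. The `log1p` offset (log of 1 + x, so zero values stay defined) is an assumption, since the exact SAS settings are not listed, and the payload values and class codes are hypothetical.

```python
import math

def skewness(values):
    """Skewness from population moments: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_transform(values):
    """log(1 + x); the +1 offset (an assumption) keeps zeros defined."""
    return [math.log1p(v) for v in values]

def dummy_indicators(values):
    """One 0/1 indicator column per category level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical right-skewed payload values and class codes.
payload = [0, 1, 2, 3, 5, 8, 20, 150, 900]
print(skewness(payload), skewness(log_transform(payload)))  # skewness drops after the log
print(dummy_indicators(["F", "G", "F", "L"]))
```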
5. Decision Tree Analysis: The tree was fitted with cross-validation; all other settings were left at their defaults.
6. Results: The ASE for the validation data is 0.078363.
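Model choice throughout this study rests on validation ASE. Assuming ASE here is the mean squared difference between the 0/1 cancellation outcome and the predicted cancellation probability, it can be computed and compared as follows. The `reported` dictionary holds the validation ASEs reported for the three domains in this study, while the four-flight example is hypothetical.

```python
def ase(outcomes, probabilities):
    """Average squared error between 0/1 outcomes (1 = cancelled) and
    predicted cancellation probabilities; expects non-empty, equal-length lists."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    return sum((y - p) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)

# Hypothetical four-flight example.
print(ase([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.6]))  # approximately 0.055

# Validation ASEs reported in this study.
reported = {
    "airline": {"regression": 0.100689, "decision tree": 0.078363},
    "airport": {"regression": 0.112633, "decision tree": 0.0987131},
    "segment": {"regression": 0.090134, "decision tree": 0.081963},
}
for domain, scores in reported.items():
    print(domain, "->", min(scores, key=scores.get))  # decision tree in every domain
```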
Decision Tree:

We also looked at the most important variables for this dataset:

The subtree assessment plot showed that the tree was pruned to 45 leaves.
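SAS used cross-validation to decide how far to prune the tree. The following is not SAS's pruning algorithm; it is a minimal sketch of the same principle on a one-split tree, choosing the split threshold with the lowest k-fold cross-validated ASE. The `(departures, cancelled)` pairs and the candidate thresholds are hypothetical.

```python
def leaf_probs(train, threshold):
    """Fit a one-split tree: P(cancelled) on each side of `threshold`."""
    overall = sum(y for _, y in train) / len(train)
    left = [y for x, y in train if x <= threshold]
    right = [y for x, y in train if x > threshold]
    # An empty leaf falls back to the overall training rate.
    p_left = sum(left) / len(left) if left else overall
    p_right = sum(right) / len(right) if right else overall
    return p_left, p_right

def cv_ase(data, threshold, k=5):
    """Mean held-out ASE of the one-split tree over k interleaved folds."""
    fold_errors = []
    for fold in range(k):
        train = [row for i, row in enumerate(data) if i % k != fold]
        held_out = [row for i, row in enumerate(data) if i % k == fold]
        p_left, p_right = leaf_probs(train, threshold)
        squared = [(y - (p_left if x <= threshold else p_right)) ** 2 for x, y in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k

def choose_threshold(data, candidates, k=5):
    """Pick the candidate threshold with the lowest cross-validated ASE."""
    return min(candidates, key=lambda t: cv_ase(data, t, k))

# Hypothetical (departures performed, cancelled) pairs.
data = [(d, 1 if d > 10 else 0) for d in range(1, 21)]
print(choose_threshold(data, [5, 10, 15]))  # 10
```

A full tree repeats this idea at every node and across subtree sizes, which is what the subtree assessment plot summarizes.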
7. Outcomes

For a given airline, if:
● the number of departures performed is more than approximately 3, and
● the average number of passengers is less than approximately 3,
then there is a 99.6% probability that a flight of that airline will not be cancelled.
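The percentages attached to each rule are leaf probabilities: among the training rows that satisfy every condition on the path to a leaf, the share that were cancelled (or, for the rule above, not cancelled). A minimal sketch, with hypothetical field names and toy rows:

```python
def leaf_probability(rows, in_leaf):
    """Share of cancelled rows among the rows that satisfy `in_leaf`;
    returns None when no row reaches the leaf."""
    reached = [r for r in rows if in_leaf(r)]
    if not reached:
        return None
    return sum(r["cancelled"] for r in reached) / len(reached)

def frequent_small_flights(r):
    """Hypothetical encoding of the first airline rule: more than ~3
    departures and fewer than ~3 average passengers."""
    return r["departures"] > 3 and r["avg_passengers"] < 3

# Hypothetical toy rows (not T-100 data).
rows = [
    {"departures": 10, "avg_passengers": 1, "cancelled": 0},
    {"departures": 5, "avg_passengers": 2, "cancelled": 0},
    {"departures": 8, "avg_passengers": 2.5, "cancelled": 1},
    {"departures": 2, "avg_passengers": 1, "cancelled": 1},    # fails: too few departures
    {"departures": 20, "avg_passengers": 50, "cancelled": 0},  # fails: too many passengers
]

p_cancel = leaf_probability(rows, frequent_small_flights)
print(f"P(cancelled) = {p_cancel:.3f}, P(not cancelled) = {1 - p_cancel:.3f}")
```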
For a given airline, if:
● the average payload is less than 10,
● the Class is F, and
● fewer than 49 departures are performed,
then there is an 82.4% probability that the flight will be cancelled.

For a given airline, if:
● more than 70 departures are performed,
● the average payload is more than 9 pounds, and
● the average total time on the ground is more than 18 minutes,
then there is an 83.3% probability that the flight will be cancelled.

B) Based on Airport ID

Changing the ID variable to Origin Airport ID and keeping the other configurations the same, we obtained the following results:

The ASE for Validation data is 0.0987131.
The decision tree:

We see that the same set of variables was important for this analysis as well:

The subtree assessment plot with the average squared errors:
Outcomes

For a given airport, if:
● more than 42 departures are performed,
● the average payload is less than 10 pounds, and
● the average mail carried is more than 1,
then it is very unlikely that the flight will be cancelled (100% probability of not being cancelled).

For a particular airport ID, if:
● more than 70 departures are performed,
● the flights belong to Class F,
● the average payload is less than 10 pounds, and
● the Aircraft Config is less than 2,
then there is an 83.6% probability that the flight will be cancelled.
C) Based on Segments (Origin Airport ID and Destination Airport ID Pairs)

Experiment Methodology:
1. Import the following dataset: T-100 Segment data for the months of May, June and July (84,232 rows).
2. Edit the variables and assign a role and level to each:

Variable                             Role    Level
Origin_Airport_ID                    ID      Nominal
Dest_Airport_ID                      ID      Nominal
flightAdHoc?                         Input   Binary
Aircraft Config                      Input   Interval
Aircraft Group                       Input   Interval
Aircraft Categorization              Input   Nominal
Departure Performed                  Input   Interval
Class                                Input   Nominal
Average Freight                      Input   Interval
Average Airtime                      Input   Interval
Average Total Time at ground on bot  Input   Interval
Average Mail                         Input   Interval
Average Passengers                   Input   Interval
Average Payload                      Input   Interval
Distance                             Input   Interval
Month                                Input   Interval
Flight Cancelled?                    Target  Nominal

The other variables, which are not important for this analysis, were rejected.
3. Data Partition: 70% of the data was used for training and 30% for validation; all other settings were left at their defaults.
4. Transformation:

Variable                  Method
Average Payload           Log
Average Passengers        Log
Average Airtime           Log
Aircraft Categorization   Dummy Indicator
Class                     Dummy Indicator

After transformation, the skewness of these variables was considerably reduced, as in the figures shown for the airline-based analysis.
5. Decision Tree Analysis: The tree was fitted with cross-validation; all other settings were left at their defaults.
6. Results: The ASE for the validation data is 0.081963.
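The 70/30 data partition, together with the missing-value imputation used in the regression flow, can be sketched as below. The deck does not say which imputation method was used, so a training-set mean fill is assumed here; the field names, the seed and the toy rows are hypothetical.

```python
import random

def partition(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def impute_mean(train, valid, column):
    """Replace None in `column` with the mean of the observed training values.

    Using only training rows keeps validation data out of the fill value.
    Assumes at least one observed training value."""
    observed = [r[column] for r in train if r[column] is not None]
    fill = sum(observed) / len(observed)
    for r in train + valid:
        if r[column] is None:
            r[column] = fill
    return fill

# Hypothetical segment rows with some missing payload values.
rows = [{"segment": i, "avg_payload": None if i % 4 == 0 else float(i)} for i in range(10)]
train, valid = partition(rows)
impute_mean(train, valid, "avg_payload")
print(len(train), len(valid))  # 7 3
```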
Decision Tree:

We also looked at the most important variables for this dataset:

The subtree assessment plot showed that the tree was pruned to 36 leaves.
7. Outcomes

For a given segment, if:
● more than approximately 70 departures are performed, and
● the average allotted payload is less than approximately 9 pounds,
then there is an 88% probability that flights in that segment will be cancelled.
For a given segment, if:
● more than approximately 70 departures are performed,
● the average allotted payload is more than approximately 9 pounds, and
● the average total time on the ground at both the source and destination airports is greater than approximately 19 minutes,
then there is an 83.3% probability that flights in that segment will be cancelled.

For a given segment, if:
● the number of departures performed is greater than 2 and less than approximately 10, and
● the flights took off without a schedule,
then there is a 94.7% probability that flights in that segment will be cancelled.
Recommendations and Conclusion

Important Variables: Venn Analysis

We performed a Venn analysis of the important variables in each of the three domains and plotted them, considering those that were important in arriving at our recommendations.
● Departures Performed and Avg. Payload are the most important variables in our analysis for all three domains. They are the deciding variables for cancellations of segments, airlines and airports.
● Airlines and segments share avg. total time on the ground at both source and destination as an important variable. This is counter-intuitive: one would expect it to appear as a deciding variable for airports.
● Airlines and airports share the aircraft_class variable.
● FlightAdHoc, Avg. Passengers, and Aircraft Config with Avg. Mail are important for segments, airlines and airports respectively.

Findings and Recommendations

Segments

Findings:
● Segments whose flights carry very little payload on average (< 8 pounds) but fly frequently are likely to see cancellations. Segments whose flights carry higher payloads and fly frequently, but spend more than 18 minutes on the ground at both the source and destination airports, are also likely to see cancellations.
● Segments whose flights have few departures and take off without a schedule see few or no cancellations.

Recommendations:
● The airport should pilot a program to redirect some congested segments' traffic to runways that handle non-scheduled flights. Based on the results, it can determine whether priority given to non-scheduled aircraft was causing cancellations.
● A new runway should be opened to speed up ground handling and reduce the average time that higher-payload aircraft spend on the ground at both source and destination.
● The airport is accommodating flights in non-congested segments, even flights that are not scheduled. However, flights in congested, heavy-traffic segments are being cancelled: both those with few or no passengers and those carrying passengers and cargo that spend a long time on the ground at both source and destination.

Airlines

Findings:
● Small flights (carrying three or fewer passengers) that fly often (more than 3 departures) have very little chance of being cancelled.
● Flights that fly often with little payload (less than 9 pounds) tend to be cancelled more often. They also spend a considerable amount of time at the airports (18 minutes).

Recommendations:
● The last recommendation for segments also applies to the airlines domain. Airline ground crews should ensure quick ground handling at the airport for higher-payload aircraft at both source and destination.
● The payload analysis for segments agrees with our finding for aircraft with fewer passengers: just as segment flights with low payload but many departures were being cancelled, the same holds true for airlines. Airline ground staff should be alert when these flights are scheduled to arrive and depart, to make sure that handling time is short.

Airports

Findings:
● For airports with frequent departures (more than 70), relatively low payload (10 pounds or less), flights belonging to Class F, and average mail being loaded onto the aircraft, it is very likely that these flights will be cancelled.

Recommendations:
● As these cancellations affect a large population, airports should study their scheduled passenger/cargo service flights to understand why these flights are frequently cancelled. Our findings suggest that handling time, in terms of baggage and mail loading, drives cancellations along with the other important variables.

In conclusion, ground handling at the airports is taking too long.
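The Venn analysis of important variables at the start of this section reduces to set operations. The sets below are read off the bullet points (the Venn figure itself is not reproduced), the variable names are shortened, and the airport-specific configuration variable is assumed to be Aircraft Config:

```python
# Important variables per domain, read off the bullet points (names shortened).
segments = {"departures_performed", "avg_payload", "avg_ground_time", "flight_ad_hoc"}
airlines = {"departures_performed", "avg_payload", "avg_ground_time", "aircraft_class", "avg_passengers"}
airports = {"departures_performed", "avg_payload", "aircraft_class", "aircraft_config", "avg_mail"}

all_three = segments & airlines & airports
airlines_and_segments_only = (airlines & segments) - airports
airlines_and_airports_only = (airlines & airports) - segments
unique = {
    "segments": segments - airlines - airports,
    "airlines": airlines - segments - airports,
    "airports": airports - segments - airlines,
}

print(sorted(all_three))                   # ['avg_payload', 'departures_performed']
print(sorted(airlines_and_segments_only))  # ['avg_ground_time']
print(sorted(airlines_and_airports_only))  # ['aircraft_class']
for domain, only in unique.items():
    print(domain, sorted(only))
```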
Possible next steps

According to the Wall Street Journal, illness, family emergencies, and rescheduled business meetings are big business for airline companies.³ At some airlines, the resulting change fees and penalties paid by passengers added up to $2 billion a year, which is even more than total baggage fees. If airlines study seasonal customer data to find cancellation patterns on the passengers' side and adjust change fees and penalties accordingly, they may be able to generate higher revenue.

³ Source: http://online.wsj.com/news/articles/SB10001424052970204563304574318212311819146