Big Data
Airlines Project
ZIYAD SALEH
What is Big Data?
Big data is a broad term for data sets so large or complex that they are
difficult to process using traditional data processing applications.
Big data is measured in terabytes (1 TB = 1024 GB) of data to be processed
and analyzed. Terabytes of new data are generated daily, so the speed of
analyzing this huge flow of data is itself a challenge.
Big data can be described by the 4 Vs: Volume, Velocity,
Variety and Veracity.
Small Data Vs. Big Data
Map Reduce
Map Reduce model
Project Scope
The scope is limited to:
1. Installing and configuring the Hadoop Map/Reduce
platform.
2. Analyzing a big data sample of U.S. domestic flight
performance and delays over 5 years to identify:
   1. The top carriers experiencing delays.
   2. The top airports and states with departure delays.
   3. State delays, plotted on a thematic map of the USA.
Source of Data for the project
Datasets will be collected from:
U.S. Department of Transportation's (DOT) – Statistical Computing
Size of Data
The dataset will be between 500 GB and 1 TB in size and will
cover 5 years of flight statistics.
Field Name         Description
Year               Year of the scheduled flight.
Month              Month of the scheduled flight (1–12).
Day                Day of the month (1–31).
DepTime            Actual departure time of the flight.
CRSDepTime         Scheduled departure time.
ArrTime            Actual arrival time, in HH/MM format.
CRSArrTime         Scheduled arrival time.
FlightNum          Flight number.
ArrDelay           Arrival delay, in minutes.
DepDelay           Departure delay, in minutes.
CarrierDelay       Delay (in minutes) caused by factors within the control of the carrier.
WeatherDelay       Delay (in minutes) caused by extreme weather conditions.
NASDelay           Delay (in minutes) within the control of the National Airspace System (NAS).
SecurityDelay      Delay (in minutes) caused by security reasons.
LateAircraftDelay  Delay (in minutes) due to the same aircraft arriving late at a previous airport.
Table 1: Airline Dataset Dictionary.
Data Pre-Processing, Processing and Analytics
Data pre-processing:
Data will be cleansed and some artifacts will be filtered out as
necessary. Many fields in the airline data set will be discarded because
they are irrelevant to the subject of delay that we are concerned with.
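The pre-processing step described above could look roughly like the following Java sketch. It is not the project's actual code. It assumes the input is comma-separated with a header row, that missing values are marked `NA`, and that the carrier code lives in a column named `UniqueCarrier` (a hypothetical name; Table 1 does not list a carrier field). The class name `DelayFieldFilter` and the `KEEP` list are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelayFieldFilter {
    // Fields kept for the delay analysis. "UniqueCarrier" is an assumed
    // column name; the others follow Table 1.
    static final List<String> KEEP =
            Arrays.asList("Year", "Month", "UniqueCarrier", "ArrDelay", "DepDelay");

    /** Returns the kept fields of one CSV row, or null if any of them is missing. */
    static String project(String[] header, String row) {
        String[] cells = row.split(",", -1);
        List<String> names = Arrays.asList(header);
        List<String> out = new ArrayList<>();
        for (String field : KEEP) {
            int i = names.indexOf(field);
            if (i < 0 || i >= cells.length) {
                return null; // column absent from this row
            }
            String value = cells[i].trim();
            if (value.isEmpty() || value.equals("NA")) {
                return null; // assumed missing-value marker: drop the row
            }
            out.add(value);
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        String[] header =
                "Year,Month,DayofMonth,UniqueCarrier,FlightNum,ArrDelay,DepDelay".split(",");
        System.out.println(project(header, "2008,1,3,WN,335,-14,8")); // row is kept
        System.out.println(project(header, "2008,1,3,WN,336,NA,NA")); // row is dropped
    }
}
```

Looking columns up by header name rather than by position keeps the sketch independent of the exact column order in the source files.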
Data Processing and Analytics:
Data will be processed with Java programs running on Map/Reduce to
reduce the size of the data and produce smaller, organized
datasets.
Next, the resulting datasets will be analyzed using additional tools
such as R.
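To make the Map/Reduce processing step concrete, here is a minimal sketch of the three phases (map, shuffle, reduce) in plain Java collections. It deliberately does not use the Hadoop API, so it runs without a cluster, and it is not the project's mapper or reducer. The input format `carrier,arrDelay` and the choice to count early arrivals as zero delay are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CarrierDelaySketch {
    /** Map phase: one cleaned record "carrier,arrDelay" becomes a (carrier, delay) pair. */
    static Map.Entry<String, Integer> map(String record) {
        String[] f = record.split(",");
        int delay = Integer.parseInt(f[1].trim());
        // Assumption: negative values (early arrivals) count as zero delay.
        return new SimpleEntry<>(f[0].trim(), Math.max(delay, 0));
    }

    /** Shuffle phase: group emitted values by key, as the framework would do. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** Reduce phase: sum all delay minutes for one carrier. */
    static long reduce(List<Integer> delays) {
        long total = 0;
        for (int d : delays) {
            total += d;
        }
        return total;
    }

    /** Runs the three phases in sequence over an in-memory list of records. */
    static Map<String, Long> run(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String r : records) {
            pairs.add(map(r));
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            totals.put(g.getKey(), reduce(g.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("WN,10", "AA,25", "WN,-5", "AA,5", "DL,0");
        System.out.println(run(records)); // prints {AA=30, DL=0, WN=10}
    }
}
```

On a real cluster the grouping is done by the framework between the mapper and reducer tasks; the `shuffle` method here only stands in for that step.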
Data Storage
Data will be stored across multiple HDFS storage
nodes, with a total size between
500 GB and 1 TB.
[Figure: airlines big data stored in HDFS]
Target Analysis:
Across 5 years of flight information for all US domestic
airlines:
1. Which carriers have the most aggregated
delay in their flights?
2. Which states have the most delays?
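Both questions reduce to ranking keys by an aggregated total. The sketch below shows one way to do that in Java once per-carrier (or per-state) totals exist. The class name `TopDelayRanking`, the method `topN`, and the sample figures are all hypothetical and are not results from the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopDelayRanking {
    /** Returns the n keys with the largest totals, highest first. */
    static List<String> topN(Map<String, Long> totals, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        // Sort by total, descending; ties are broken by key so the result is deterministic.
        entries.sort((a, b) -> {
            int byTotal = Long.compare(b.getValue(), a.getValue());
            return byTotal != 0 ? byTotal : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up totals in minutes, for illustration only.
        Map<String, Long> totals = new LinkedHashMap<>();
        totals.put("WN", 120L);
        totals.put("AA", 300L);
        totals.put("DL", 300L);
        totals.put("UA", 45L);
        System.out.println(topN(totals, 2)); // prints [AA, DL]
    }
}
```

The same method applies to states if the aggregation is keyed by state code instead of carrier code.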
Design
Airlines Project Workflow and Design
[Figure: cluster workflow diagram. Labels: Master Node (Name Node, Job Tracker); Node 1 to Node 4; Airlines Big Data; Task; Java Code; Mapper; Reducer; Reducer Node; HDFS; Top Airlines.]
Implementation
Software and Tools
1. CentOS Linux operating system
2. Apache Hadoop
3. Cloudera CDH 5.3 virtual machine
4. Oracle VM VirtualBox Manager
5. Eclipse IDE
6. Java (Oracle JDK)
7. Maven
8. Microsoft Excel and Access 2010
9. The R statistical tool
Mapper:
Reducer:
R:
Findings
US Airlines Delay (Per Carrier)
[Chart: vertical axis from 0 to 1.2; carriers WN, AA, OO, MQ, US, DL, UA, XE, NW, CO, EV, 9E, FL, YV, OH, B6, AS, F9, HA, AQ, PI, HP, EA, PS, TW; series: ArrivalOnTime, ArrivalDelays, DepartureOnTime, DepartureDelays, Cancellations, Diversions.]
Thematic Map of US Airlines Delay (Per State)
Conclusion
Conclusion:
• Big Data is the large amount of continuously generated data that cannot be processed and
analyzed using traditional data management tools.
• Big data is a new and rapidly growing field that is reshaping the future. Demand for big
data scientists is already high and is expected to continue in the coming years.
• Hadoop is an open source framework for storing and processing large datasets using clusters
of commodity hardware.
• Big Data analytics is attracting both businesses and policy makers, who want to leverage this
new phenomenon for more informed decisions and planning for the future.
• Big Data now, Normal Data tomorrow.
Big Data Tutorials
Online Big Data Tutorials:
1. Udemy: https://www.udemy.com/course/subscribe/?courseId=336982&dtcode=lGCe31035ujY
2. Udacity: https://www.udacity.com/courses#!/data-science
3. EMC: https://education.emc.com/guest/campaign/data_science.aspx
4. Coursera: https://www.coursera.org/course/datasci
5. Caltech's Learning from Data: http://work.caltech.edu/telecourse.html
6. MIT OpenCourseWare: http://ocw.mit.edu/courses/sloan-school-of-management/15-062-data-mining-spring-2003/index.htm
7. Stanford's OpenClassroom: http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
8. Big Data University: https://bigdatauniversity.com/curriculum-map/
Thank You
Ziyad Saleh
O Allah, teach us what benefits us, benefit us with what You have taught us,
and increase us in knowledge.
Big Data Airline Project at UAEU
