SlideShare une entreprise Scribd logo
1  sur  29
- BASED ON HISTORICAL TRIP OBSERVATION
CAB TRAVEL TIME PREDICTI N
PROBLEM DEFINITION
• New Booking - Time
of Arrival
• Shortest Route
(Distance/Time)
• Taxi-Passenger
Demand Distribution
Opportunity
• Accurate Travel Time
Prediction
• Increased Fuel
Efficiency
• Optimized Utilization
of Cabs
Value
Our project involves developing a model for predicting the travel time for a particular cab along
different routes in a rectangular map of Porto. The output is based on several factors, like –
• day of the week,
• time of the day and other features
derived from Floating Car Data.
FLOATING CAR DATA ESTIMATION
 Advantages
 Most Inexpensive Data
 Generally very accurate (GPS)
 Disadvantages
 FCD is usually sampled infrequently
 High Density of data is required to get
meaningful travel time predictions.
Floating car data are position fixes of vehicles traversing city streets throughout the day.
 Trajectories of Single Vehicles,
 Location and Time of Lane Changes,
 Traffic Density (vehicles per kilometer),
 Traffic Flow (vehicles per hour),
 Vehicle Speed
 Length and Position of Traffic Jams
 Use
DATA DESCRIPTION
 The dataset contained 9 features and a total of 17,10,760 observations of various parameters
of taxi movement on Porto, Portugal.
 Features :
1. TRIP_ID: (String) It contains an unique identifier for each trip.
2. CALL_TYPE: (Char) It identifies the way used to demand this service.
3. ORIGIN_CALL: (Integer) It contains an unique identifier for each phone number which was used to
demand, at least, one service. It identifies the trip’s customer if CALL_TYPE=’A’. Otherwise, it assumes
a NULL value.
4. ORIGIN_STAND: (Integer): It contains an unique identifier for the taxi stand.
5. TAXI_ID: (Integer): It contains an unique identifier for the taxi driver that performed each trip.
DATA DESCRIPTION
 The dataset contained 9 features and a total of 17,10,760 observations of various parameters
of taxi movement on Porto, Portugal.
 Features :
6. TIMESTAMP: (Integer) Unix Timestamp (in seconds). It identifies the trip’s start.
7. DAYTYPE: (Char) It identifies the type of the day in which the trip starts. (Holiday, Day before Holiday,
Weekday)
8. MISSING_DATA: (Boolean) It is FALSE when the GPS data stream is complete and TRUE whenever one
(or more) locations are missing.
9. POLYLINE: (String): It contains a list of GPS coordinates (i.e. WGS84 format) mapped as a string. Each
pair of coordinates is of the form [LONGITUDE, LATITUDE].
 For Prediction of Time we have considered Timestamp and Polyline.
 The rows having missing value are not considered in our calculations.
PREDICTION ARCHITECTURE
 Brief overview of the various processing steps required
Motion
Detection
Grid
Matching
Directed
Graph
Supervised
Learning
Prediction
PREDICTION ARCHITECTURE
 Motion Detection
 Since the incoming streams of traffic data
come from taxi cabs, delivery vehicles, or
other commercial fleet vehicles, there will
always be a certain amount of ambiguity
between a slowdown in traffic and a
commercial stop by the vehicle. (i.e. for a
taxi customer, or a delivery vehicle dropping
off packages). Therefore, any further
processing must clean out all unwanted
stops that are in the GPS logs. The most
common and intuitive technique, and that
which is used in this project is to track the
number of consecutive GPS points that are
less than a specified distance Dmax from
each other.
PREDICTION ARCHITECTURE
 Grid Matching
 Grid Matching is a widely studied problem
in transportation research is perhaps the
most computationally difficult and
important subcomponent.
 Input: {x1 , y1 , t1}
 Output: {i1 , t1}
PREDICTION ARCHITECTURE
 Several distance measure:
Manhattan Distance, Block Distance, Minkowski distance, Haversine Distance.
 For our calculation we have used Haversine distance which is defined as:
 For any two points on a sphere, the haversine of the central angle between them is given by
 Where hav is the haversine function:
d : the distance between the two points (along a great circle of the sphere; see spherical distance),
r : the radius of the sphere,
∅1, ∅2: latitude of point 1 and latitude of point 2
𝜆1, 𝜆2 : longitude of point 1 and longitude of point 2
PREDICTION ARCHITECTURE
 Directed Graph
 After Grid Matching of polyline covered by
each training observation, we get a series of
nodes and edges. Thus a directed graph is
constructed using the grid points as nodes
and velocity or time as edges.
 Input: {i1 , t1}
 Process:
 e1,2 = t2-t1
 e1,2 = (Haversine distance) / (t2-t1)
 Output: Sparse Matrix M = {ei,j} for all i , j
PREDICTION ARCHITECTURE
 Supervised Learning
 Decision Tree Classifier, Decision Tree
Regression, Random Forest Regression,
Historical Mean
 Process:
 The whole day is categorized in x units.
 The timestamp of each edge and the
time/velocity is fed into the system.
 The output gives us a predicted value for a
corresponding timestamp value.
 Output: Sparse Matrix M = {ei,j} for all i, j at
that timestamp.
PREDICTION ARCHITECTURE
 Decision Tree Classification:
 Classification trees work just like regression trees, only they try to predict a discrete
category (the class), rather than a numerical value. The response variable Y is categorical,
so we can use information theory to measure how much we learn about it from knowing
the value of another discrete variable A:
I[Y | A] = 𝑎 P A = a ∗ I[Y | A = a]
Where, I[Y | A = a] = H[Y ] − H[Y | A = a]
 The definitions of entropy H[Y ] and conditional entropy H[Y |A = a].
H[Y | A = a] = - 𝑘 𝑝 𝑎𝑘 log2 𝑝 𝑎𝑘
PREDICTION ARCHITECTURE
 Decision Tree Regression:
 If the target is a continuous value, then for node m, representing a region Rm with Nm
criterion, a common criterion is to minimize the Mean Square Error:
mc =
1
𝑁 𝑚
𝑖 ∈ 𝑁 𝑚
𝑦𝑖
H(x) =
1
𝑁 𝑚
𝑖 ∈ 𝑁 𝑚
(𝑦𝑖−𝑚 𝑐)2
 Complexity : O(nfeatures . nsamples . log(nsamples))
PREDICTION ARCHITECTURE
 Random Forest Regression:
 Here, each tree in the ensemble is built from a sample drawn with replacement (i.e., a
bootstrap sample) from the training set.
 Disadvantage: In addition, when splitting a node during the construction of the tree, the
split that is chosen is no longer the best split among all features. Instead, the split that is
picked is the best split among a random subset of the features. As a result of this
randomness, the bias of the forest usually slightly increases (with respect to the bias of a
single non-random tree)
 Advantage: But, due to averaging, its variance also decreases, usually more than
compensating for the increase in bias, hence yielding an overall better model.
PREDICTION ARCHITECTURE
 Gradient Tree Boosting:
 Gradient Tree Boosting or Gradient Boosted Regression Trees (GBRT) is a generalization of
boosting to arbitrary differentiable loss functions.
𝐹𝑚(𝑥) = 𝐹 𝑚−1(𝑥) + ℎ(𝑥) = y
 Find h, such that residual 𝑦 − 𝐹 𝑚−1 𝑥 is minimized.
𝐹𝑚 𝑥 = 𝐹 𝑚−1 𝑥 + 𝛾 𝑚 𝑖=1
𝑛
𝛻𝐹 𝐿 𝑦𝑖, 𝐹 𝑚−1 𝑥𝑖
 Loss Functions: -
 Regression: LS, LAS, HUBER, QUANTILE
 Classification: Binomial Deviance, Multinomial Deviance, Exponential Loss.
DATA INFERENCE
 The Demand distribution of taxi along
the daytime is shown here.
1. Highest peak is at the range of 9
to 10 A.M. (Office Hours)
2. A steady distribution thereafter
till 6 P.M.
3. Can be utilized to model the
value of charge/Km during the
course of the day.
4. Can be used to optimize the
model of resource utilization
and thereby profit generation.
DATA INFERENCE
Individual trip time during daytime is
shown here.
 Travelling time of maximum trips
are within half an hour.
DATA INFERENCE
 The trip time shows a cluster of
points in the time domain of 26000
to 82000 corresponding to 7 A.M.
and 11 P.M. respectively.
 Most of the trip lengths are within
10 km.
 This is in conformity with the
previous plot of travelling time
where most of the travelling time
are within half an hour.
DATA INFERENCE
DATA INFERENCE
 Top 5 important road segments of the city are
 [[41.162292, -8.60787], [41.162031, -8.60571]]
 [[41.16258, -8.61003], [41.162292, -8.60787]]
 [[41.163291, -8.6157], [41.163021, -8.613639]]
 [[41.163525, -8.617644], [41.163291, -8.6157]]
 [[41.163768, -8.619678], [41.163525, -8.617644]]
 0.0772321809805613, 0.07708869873451546, 0.07441287173742973, 0.07212227668575326,
0.06932519386480848
Rank Segment Centrality
1 [41.162292, -8.60787], [41.162031, -
8.60571]
0.0772321809805613
2 [41.16258, -8.61003], [41.162292, -8.60787] 0.0770886987345154
6
3 [41.163291, -8.6157], [41.163021, -
8.613639]
0.0744128717374297
3
4 [41.163525, -8.617644], [41.163291, -
8.6157]
0.0721222766857532
6
DATA INFERENCE
Rank Geo-Location Centrality
1 [41.162031, -8.60571] 0.08277015940786502
2 [41.162292, -8.60787] 0.07877967724696255
3 [41.16258, -8.61003] 0.07775467831692078
4 [41.163021, -
8.613639]
0.07690847152899795
5 [41.163291, -8.6157] 0.07608514260606453
PERFORMANCE EVALUATION
 Mean Absolute Percentage Error
 The MAPE (Mean Absolute Percent Error) measures
the size of the error in percentage terms. It is
calculated as the average of the unsigned
percentage error.
 Mean Absolute Error:
 The MAE measures the average magnitude of the
errors in a set of forecasts, without considering
their direction. It measures accuracy for continuous
variables.
 Mean Percentage Error:
 MPE is the computed average of percentage errors
by which forecasts of a model differ from actual
values of the quantity being forecast.
HYPOTHESIS TESTING (P-VALUE CALCULATION)
 The Percentage Error is estimated using the Null Hypothesis:
 H0 : PE <= 0
 H1 : PE > 0
 Here, the 𝑃𝐸 = mean Percentage Error (
∆𝑡
𝑡
) for the selected trips
 t =
𝑃𝐸
𝑆
𝑛
 t > ta for a = critical value (one-sided) implies we reject H0 in favor of H1
PERFORMANCE EVALUATION
MAPE MAE MPE P-Value
DT(R) 0.3713 252.00s 0.3556 0.999781858
DT(C) 0.3729 253.33s 0.3365 0.999182999
RF(C) 0.3692 253.33s 0.3933 0.999430984
Historical
Mean
0.5899 274.24s -0.3664 0.043007171
PERFORMANCE EVALUATION
Blue Original Path 345s
Red Historical Mean 467.51
s
Green Decision Tree / Random
Forest
240s
PERFORMANCE EVALUATION
Decision Tree
FUTURE VALUE ADDITION
 Apply an incremental ARIMA model to each segment for a specific time of the day (Data
needed)
 Use Map Matching Technique to position the standard points on the road accurately.
 Real Time Traffic Map (travelling route management, traffic signal optimization etc.)
 Optimization of our approach in order to reduce model error.
 To make this prediction real time and make better visualization of prediction.
 Implement it locally.
 Consolidate the Features in an Application
REFERENCES
 http://arxiv.org/pdf/1012.4249.pdf
 http://arxiv.org/pdf/1509.05257.pdf
 https://en.wikipedia.org/wiki/Haversine_formula
 http://toc.proceedings.com/04373webtoc.pdf
 http://www.ivt.ethz.ch/oev/ped2012/vpl/publications/reports/ab379.pdf
 https://www.kaggle.com/c/pkdd-15-taxi-trip-time-prediction-ii
 http://www.galitshmueli.com/data-mining-project/predicting-customer-cancellation-cab-bookings-
yourcabscom
 http://scikit-learn.org/stable/modules/ensemble.html
 https://www.scribblemaps.com/create/#id=srin&lat=41.1619705&lng=-
8.60570389999998&z=14&t=road
THANK YOU

Contenu connexe

Tendances

A multiplicative time series model
A multiplicative time series modelA multiplicative time series model
A multiplicative time series modelMohammed Awad
 
Coordinate systems (and transformations) and vector calculus
Coordinate systems (and transformations) and vector calculus Coordinate systems (and transformations) and vector calculus
Coordinate systems (and transformations) and vector calculus garghanish
 
Geometric transformation 2d chapter 5
Geometric transformation 2d   chapter 5Geometric transformation 2d   chapter 5
Geometric transformation 2d chapter 5geethawilliam
 
3D Graphics : Computer Graphics Fundamentals
3D Graphics : Computer Graphics Fundamentals3D Graphics : Computer Graphics Fundamentals
3D Graphics : Computer Graphics FundamentalsMuhammed Afsal Villan
 
TRBAM2020 Public Transit posters - University of Twente.
TRBAM2020 Public Transit posters - University of Twente.TRBAM2020 Public Transit posters - University of Twente.
TRBAM2020 Public Transit posters - University of Twente.Konstantinos Gkiotsalitis
 
Spatial Transformation
Spatial TransformationSpatial Transformation
Spatial TransformationEhsan Hamzei
 
Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...
Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...
Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...Universitas Pembangunan Panca Budi
 
Fundamentals of electromagnetics
Fundamentals of electromagneticsFundamentals of electromagnetics
Fundamentals of electromagneticsRichu Jose Cyriac
 
A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...
A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...
A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...vijayakrishna rowthu
 
Big Data Analysis with Signal Processing on Graphs
Big Data Analysis with Signal Processing on GraphsBig Data Analysis with Signal Processing on Graphs
Big Data Analysis with Signal Processing on GraphsMohamed Seif
 
Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...
Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...
Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...GreenYourMove
 
Measuring Axle Weight of Moving Vehicle Based on Particle Swarm Optimization
Measuring Axle Weight of Moving Vehicle Based on Particle Swarm OptimizationMeasuring Axle Weight of Moving Vehicle Based on Particle Swarm Optimization
Measuring Axle Weight of Moving Vehicle Based on Particle Swarm OptimizationIJRES Journal
 
A New Method For Solving Kinematics Model Of An RA-02
A New Method For Solving Kinematics Model Of An RA-02A New Method For Solving Kinematics Model Of An RA-02
A New Method For Solving Kinematics Model Of An RA-02IJERA Editor
 
Top school in ghaziabad
Top school in ghaziabadTop school in ghaziabad
Top school in ghaziabadEdhole.com
 

Tendances (20)

A multiplicative time series model
A multiplicative time series modelA multiplicative time series model
A multiplicative time series model
 
Coordinate systems (and transformations) and vector calculus
Coordinate systems (and transformations) and vector calculus Coordinate systems (and transformations) and vector calculus
Coordinate systems (and transformations) and vector calculus
 
Geometric transformation 2d chapter 5
Geometric transformation 2d   chapter 5Geometric transformation 2d   chapter 5
Geometric transformation 2d chapter 5
 
3D Graphics : Computer Graphics Fundamentals
3D Graphics : Computer Graphics Fundamentals3D Graphics : Computer Graphics Fundamentals
3D Graphics : Computer Graphics Fundamentals
 
TRBAM2020 Public Transit posters - University of Twente.
TRBAM2020 Public Transit posters - University of Twente.TRBAM2020 Public Transit posters - University of Twente.
TRBAM2020 Public Transit posters - University of Twente.
 
Ih2616101615
Ih2616101615Ih2616101615
Ih2616101615
 
Spatial Transformation
Spatial TransformationSpatial Transformation
Spatial Transformation
 
report
reportreport
report
 
Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...
Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...
Heuristic Function Influence to the Global Optimum Value in Shortest Path Pro...
 
Fundamentals of electromagnetics
Fundamentals of electromagneticsFundamentals of electromagnetics
Fundamentals of electromagnetics
 
A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...
A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...
A Unified PDE model for image multi-phase segmentation and grey-scale inpaint...
 
Big Data Analysis with Signal Processing on Graphs
Big Data Analysis with Signal Processing on GraphsBig Data Analysis with Signal Processing on Graphs
Big Data Analysis with Signal Processing on Graphs
 
Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...
Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...
Presentation of GreenYourMove's hybrid approach in the 3rd Conference on Sust...
 
Measuring Axle Weight of Moving Vehicle Based on Particle Swarm Optimization
Measuring Axle Weight of Moving Vehicle Based on Particle Swarm OptimizationMeasuring Axle Weight of Moving Vehicle Based on Particle Swarm Optimization
Measuring Axle Weight of Moving Vehicle Based on Particle Swarm Optimization
 
Distortion Correction Scheme for Multiresolution Camera Images
Distortion Correction Scheme for Multiresolution Camera ImagesDistortion Correction Scheme for Multiresolution Camera Images
Distortion Correction Scheme for Multiresolution Camera Images
 
A New Method For Solving Kinematics Model Of An RA-02
A New Method For Solving Kinematics Model Of An RA-02A New Method For Solving Kinematics Model Of An RA-02
A New Method For Solving Kinematics Model Of An RA-02
 
30120130405026
3012013040502630120130405026
30120130405026
 
Curves and surfaces
Curves and surfacesCurves and surfaces
Curves and surfaces
 
Top school in ghaziabad
Top school in ghaziabadTop school in ghaziabad
Top school in ghaziabad
 
Shading in OpenGL
Shading in OpenGLShading in OpenGL
Shading in OpenGL
 

Similaire à Cab travel time prediction using ensemble models

A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentKelly Taylor
 
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...csandit
 
Development of Methodology for Determining Earth Work Volume Using Combined S...
Development of Methodology for Determining Earth Work Volume Using Combined S...Development of Methodology for Determining Earth Work Volume Using Combined S...
Development of Methodology for Determining Earth Work Volume Using Combined S...IJMER
 
Channel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewChannel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewIJARIIT
 
Supply chain logistics : vehicle routing and scheduling
Supply chain logistics : vehicle  routing and  schedulingSupply chain logistics : vehicle  routing and  scheduling
Supply chain logistics : vehicle routing and schedulingRetigence Technologies
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONcsandit
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONcscpconf
 
ESCC 2016, July 10-16, Athens, Greece
ESCC 2016, July 10-16, Athens, GreeceESCC 2016, July 10-16, Athens, Greece
ESCC 2016, July 10-16, Athens, GreeceLIFE GreenYourMove
 
Comparitive analysis of doa and beamforming algorithms for smart antenna systems
Comparitive analysis of doa and beamforming algorithms for smart antenna systemsComparitive analysis of doa and beamforming algorithms for smart antenna systems
Comparitive analysis of doa and beamforming algorithms for smart antenna systemseSAT Journals
 
Central moments of traffic delay at a signalized intersection
Central moments of traffic delay at a signalized intersectionCentral moments of traffic delay at a signalized intersection
Central moments of traffic delay at a signalized intersectionAlexander Decker
 
Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...
Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...
Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...GreenYourMove
 
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...sipij
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...ArchiLab 7
 
DETECTION OF MOVING OBJECT
DETECTION OF MOVING OBJECTDETECTION OF MOVING OBJECT
DETECTION OF MOVING OBJECTAM Publications
 
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELDFINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELDijesajournal
 
Real time implementation of unscented kalman filter for target tracking
Real time implementation of unscented kalman filter for target trackingReal time implementation of unscented kalman filter for target tracking
Real time implementation of unscented kalman filter for target trackingIAEME Publication
 

Similaire à Cab travel time prediction using ensemble models (20)

A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic Assignment
 
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
Hybrid Ant Colony Optimization for Real-World Delivery Problems Based on Real...
 
Development of Methodology for Determining Earth Work Volume Using Combined S...
Development of Methodology for Determining Earth Work Volume Using Combined S...Development of Methodology for Determining Earth Work Volume Using Combined S...
Development of Methodology for Determining Earth Work Volume Using Combined S...
 
Channel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewChannel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a review
 
Where Next
Where NextWhere Next
Where Next
 
Supply chain logistics : vehicle routing and scheduling
Supply chain logistics : vehicle  routing and  schedulingSupply chain logistics : vehicle  routing and  scheduling
Supply chain logistics : vehicle routing and scheduling
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
 
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTIONHOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
HOL, GDCT AND LDCT FOR PEDESTRIAN DETECTION
 
ESCC 2016, July 10-16, Athens, Greece
ESCC 2016, July 10-16, Athens, GreeceESCC 2016, July 10-16, Athens, Greece
ESCC 2016, July 10-16, Athens, Greece
 
Comparitive analysis of doa and beamforming algorithms for smart antenna systems
Comparitive analysis of doa and beamforming algorithms for smart antenna systemsComparitive analysis of doa and beamforming algorithms for smart antenna systems
Comparitive analysis of doa and beamforming algorithms for smart antenna systems
 
Intelligent Traffic Management
Intelligent Traffic ManagementIntelligent Traffic Management
Intelligent Traffic Management
 
Central moments of traffic delay at a signalized intersection
Central moments of traffic delay at a signalized intersectionCentral moments of traffic delay at a signalized intersection
Central moments of traffic delay at a signalized intersection
 
Presentation escc 2016
Presentation escc 2016Presentation escc 2016
Presentation escc 2016
 
Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...
Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...
Presentation of GreenYourMove's hybrid approach in 3rd International Conferen...
 
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
 
DETECTION OF MOVING OBJECT
DETECTION OF MOVING OBJECTDETECTION OF MOVING OBJECT
DETECTION OF MOVING OBJECT
 
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELDFINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
 
Real time implementation of unscented kalman filter for target tracking
Real time implementation of unscented kalman filter for target trackingReal time implementation of unscented kalman filter for target tracking
Real time implementation of unscented kalman filter for target tracking
 
How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS How to Decide the Best Fuzzy Model in ANFIS
How to Decide the Best Fuzzy Model in ANFIS
 

Plus de Ayan Sengupta

Elliptic Curve Cryptography: Arithmetic behind
Elliptic Curve Cryptography: Arithmetic behindElliptic Curve Cryptography: Arithmetic behind
Elliptic Curve Cryptography: Arithmetic behindAyan Sengupta
 
Pricing of Apple iPhone
Pricing of Apple iPhonePricing of Apple iPhone
Pricing of Apple iPhoneAyan Sengupta
 
Applications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency TradingApplications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency TradingAyan Sengupta
 
Case Study on Housing.com
Case Study on Housing.comCase Study on Housing.com
Case Study on Housing.comAyan Sengupta
 
Nike Stock Pitch: Analysis and Valuation
Nike Stock Pitch: Analysis and ValuationNike Stock Pitch: Analysis and Valuation
Nike Stock Pitch: Analysis and ValuationAyan Sengupta
 
2-Approximation Algorithm of Semi-Matching Problem
2-Approximation Algorithm of Semi-Matching Problem2-Approximation Algorithm of Semi-Matching Problem
2-Approximation Algorithm of Semi-Matching ProblemAyan Sengupta
 
Existence and Uniqueness of Algebraic Closure
Existence and Uniqueness of Algebraic ClosureExistence and Uniqueness of Algebraic Closure
Existence and Uniqueness of Algebraic ClosureAyan Sengupta
 

Plus de Ayan Sengupta (7)

Elliptic Curve Cryptography: Arithmetic behind
Elliptic Curve Cryptography: Arithmetic behindElliptic Curve Cryptography: Arithmetic behind
Elliptic Curve Cryptography: Arithmetic behind
 
Pricing of Apple iPhone
Pricing of Apple iPhonePricing of Apple iPhone
Pricing of Apple iPhone
 
Applications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency TradingApplications of Machine Learning in High Frequency Trading
Applications of Machine Learning in High Frequency Trading
 
Case Study on Housing.com
Case Study on Housing.comCase Study on Housing.com
Case Study on Housing.com
 
Nike Stock Pitch: Analysis and Valuation
Nike Stock Pitch: Analysis and ValuationNike Stock Pitch: Analysis and Valuation
Nike Stock Pitch: Analysis and Valuation
 
2-Approximation Algorithm of Semi-Matching Problem
2-Approximation Algorithm of Semi-Matching Problem2-Approximation Algorithm of Semi-Matching Problem
2-Approximation Algorithm of Semi-Matching Problem
 
Existence and Uniqueness of Algebraic Closure
Existence and Uniqueness of Algebraic ClosureExistence and Uniqueness of Algebraic Closure
Existence and Uniqueness of Algebraic Closure
 

Dernier

原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 

Dernier (20)

E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

Cab travel time prediction using ensemble models

  • 1. - BASED ON HISTORICAL TRIP OBSERVATION CAB TRAVEL TIME PREDICTI N
  • 2. PROBLEM DEFINITION • New Booking - Time of Arrival • Shortest Route (Distance/Time) • Taxi-Passenger Demand Distribution Opportunity • Accurate Travel Time Prediction • Increased Fuel Efficiency • Optimized Utilization of Cabs Value Our project involves developing a model for predicting the travel time for a particular cab along different routes in a rectangular map of Porto. The output is based on several factors, like – • day of the week, • time of the day and other features derived from Floating Car Data.
  • 3. FLOATING CAR DATA ESTIMATION  Advantages  Most Inexpensive Data  Generally very accurate (GPS)  Disadvantages  FCD is usually sampled infrequently  High Density of data is required to get meaningful travel time predictions. Floating car data are position fixes of vehicles traversing city streets throughout the day.  Trajectories of Single Vehicles,  Location and Time of Lane Changes,  Traffic Density (vehicles per kilometer),  Traffic Flow (vehicles per hour),  Vehicle Speed  Length and Position of Traffic Jams  Use
  • 4. DATA DESCRIPTION  The dataset contained 9 features and a total of 17,10,760 observations of various parameters of taxi movement on Porto, Portugal.  Features : 1. TRIP_ID: (String) It contains an unique identifier for each trip. 2. CALL_TYPE: (Char) It identifies the way used to demand this service. 3. ORIGIN_CALL: (Integer) It contains an unique identifier for each phone number which was used to demand, at least, one service. It identifies the trip’s customer if CALL_TYPE=’A’. Otherwise, it assumes a NULL value. 4. ORIGIN_STAND: (Integer): It contains an unique identifier for the taxi stand. 5. TAXI_ID: (Integer): It contains an unique identifier for the taxi driver that performed each trip.
  • 5. DATA DESCRIPTION  The dataset contained 9 features and a total of 17,10,760 observations of various parameters of taxi movement on Porto, Portugal.  Features : 6. TIMESTAMP: (Integer) Unix Timestamp (in seconds). It identifies the trip’s start. 7. DAYTYPE: (Char) It identifies the type of the day in which the trip starts. (Holiday, Day before Holiday, Weekday) 8. MISSING_DATA: (Boolean) It is FALSE when the GPS data stream is complete and TRUE whenever one (or more) locations are missing. 9. POLYLINE: (String): It contains a list of GPS coordinates (i.e. WGS84 format) mapped as a string. Each pair of coordinates is of the form [LONGITUDE, LATITUDE].  For Prediction of Time we have considered Timestamp and Polyline.  The rows having missing value are not considered in our calculations.
  • 6. PREDICTION ARCHITECTURE  Brief overview of the various processing steps required Motion Detection Grid Matching Directed Graph Supervised Learning Prediction
  • 7. PREDICTION ARCHITECTURE  Motion Detection  Since the incoming streams of traffic data come from taxi cabs, delivery vehicles, or other commercial fleet vehicles, there will always be a certain amount of ambiguity between a slowdown in traffic and a commercial stop by the vehicle. (i.e. for a taxi customer, or a delivery vehicle dropping off packages). Therefore, any further processing must clean out all unwanted stops that are in the GPS logs. The most common and intuitive technique, and that which is used in this project is to track the number of consecutive GPS points that are less than a specified distance Dmax from each other.
  • 8. PREDICTION ARCHITECTURE  Grid Matching  Grid Matching is a widely studied problem in transportation research is perhaps the most computationally difficult and important subcomponent.  Input: {x1 , y1 , t1}  Output: {i1 , t1}
  • 9. PREDICTION ARCHITECTURE  Several distance measure: Manhattan Distance, Block Distance, Minkowski distance, Haversine Distance.  For our calculation we have used Haversine distance which is defined as:  For any two points on a sphere, the haversine of the central angle between them is given by  Where hav is the haversine function: d : the distance between the two points (along a great circle of the sphere; see spherical distance), r : the radius of the sphere, ∅1, ∅2: latitude of point 1 and latitude of point 2 𝜆1, 𝜆2 : longitude of point 1 and longitude of point 2
  • 10. PREDICTION ARCHITECTURE  Directed Graph  After Grid Matching of polyline covered by each training observation, we get a series of nodes and edges. Thus a directed graph is constructed using the grid points as nodes and velocity or time as edges.  Input: {i1 , t1}  Process:  e1,2 = t2-t1  e1,2 = (Haversine distance) / (t2-t1)  Output: Sparse Matrix M = {ei,j} for all i , j
  • 11. PREDICTION ARCHITECTURE  Supervised Learning  Decision Tree Classifier, Decision Tree Regression, Random Forest Regression, Historical Mean  Process:  The whole day is categorized in x units.  The timestamp of each edge and the time/velocity is fed into the system.  The output gives us a predicted value for a corresponding timestamp value.  Output: Sparse Matrix M = {ei,j} for all i, j at that timestamp.
  • 12. PREDICTION ARCHITECTURE  Decision Tree Classification:  Classification trees work just like regression trees, only they try to predict a discrete category (the class), rather than a numerical value. The response variable Y is categorical, so we can use information theory to measure how much we learn about it from knowing the value of another discrete variable A: I[Y | A] = 𝑎 P A = a ∗ I[Y | A = a] Where, I[Y | A = a] = H[Y ] − H[Y | A = a]  The definitions of entropy H[Y ] and conditional entropy H[Y |A = a]. H[Y | A = a] = - 𝑘 𝑝 𝑎𝑘 log2 𝑝 𝑎𝑘
  • 13. PREDICTION ARCHITECTURE  Decision Tree Regression:  If the target is a continuous value, then for node m, representing a region Rm with Nm criterion, a common criterion is to minimize the Mean Square Error: mc = 1 𝑁 𝑚 𝑖 ∈ 𝑁 𝑚 𝑦𝑖 H(x) = 1 𝑁 𝑚 𝑖 ∈ 𝑁 𝑚 (𝑦𝑖−𝑚 𝑐)2  Complexity : O(nfeatures . nsamples . log(nsamples))
  • 14. PREDICTION ARCHITECTURE  Random Forest Regression:  Here, each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set.  Disadvantage: In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually slightly increases (with respect to the bias of a single non-random tree)  Advantage: But, due to averaging, its variance also decreases, usually more than compensating for the increase in bias, hence yielding an overall better model.
  • 15. PREDICTION ARCHITECTURE  Gradient Tree Boosting:  Gradient Tree Boosting or Gradient Boosted Regression Trees (GBRT) is a generalization of boosting to arbitrary differentiable loss functions. 𝐹𝑚(𝑥) = 𝐹 𝑚−1(𝑥) + ℎ(𝑥) = y  Find h, such that residual 𝑦 − 𝐹 𝑚−1 𝑥 is minimized. 𝐹𝑚 𝑥 = 𝐹 𝑚−1 𝑥 + 𝛾 𝑚 𝑖=1 𝑛 𝛻𝐹 𝐿 𝑦𝑖, 𝐹 𝑚−1 𝑥𝑖  Loss Functions: -  Regression: LS, LAS, HUBER, QUANTILE  Classification: Binomial Deviance, Multinomial Deviance, Exponential Loss.
  • 16. DATA INFERENCE  The Demand distribution of taxi along the daytime is shown here. 1. Highest peak is at the range of 9 to 10 A.M. (Office Hours) 2. A steady distribution thereafter till 6 P.M. 3. Can be utilized to model the value of charge/Km during the course of the day. 4. Can be used to optimize the model of resource utilization and thereby profit generation.
  • 17. DATA INFERENCE Individual trip time during daytime is shown here.  Travelling time of maximum trips are within half an hour.
  • 18. DATA INFERENCE  The trip time shows a cluster of points in the time domain of 26000 to 82000 corresponding to 7 A.M. and 11 P.M. respectively.  Most of the trip lengths are within 10 km.  This is in conformity with the previous plot of travelling time where most of the travelling time are within half an hour.
  • 20. DATA INFERENCE  Top 5 important road segments of the city are  [[41.162292, -8.60787], [41.162031, -8.60571]]  [[41.16258, -8.61003], [41.162292, -8.60787]]  [[41.163291, -8.6157], [41.163021, -8.613639]]  [[41.163525, -8.617644], [41.163291, -8.6157]]  [[41.163768, -8.619678], [41.163525, -8.617644]]  0.0772321809805613, 0.07708869873451546, 0.07441287173742973, 0.07212227668575326, 0.06932519386480848 Rank Segment Centrality 1 [41.162292, -8.60787], [41.162031, - 8.60571] 0.0772321809805613 2 [41.16258, -8.61003], [41.162292, -8.60787] 0.0770886987345154 6 3 [41.163291, -8.6157], [41.163021, - 8.613639] 0.0744128717374297 3 4 [41.163525, -8.617644], [41.163291, - 8.6157] 0.0721222766857532 6
  • 21. DATA INFERENCE Rank Geo-Location Centrality 1 [41.162031, -8.60571] 0.08277015940786502 2 [41.162292, -8.60787] 0.07877967724696255 3 [41.16258, -8.61003] 0.07775467831692078 4 [41.163021, - 8.613639] 0.07690847152899795 5 [41.163291, -8.6157] 0.07608514260606453
  • 22. PERFORMANCE EVALUATION  Mean Absolute Percentage Error  The MAPE (Mean Absolute Percent Error) measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage error.  Mean Absolute Error:  The MAE measures the average magnitude of the errors in a set of forecasts, without considering their direction. It measures accuracy for continuous variables.  Mean Percentage Error:  MPE is the computed average of percentage errors by which forecasts of a model differ from actual values of the quantity being forecast.
  • 23. HYPOTHESIS TESTING (P-VALUE CALCULATION)  The Percentage Error is estimated using the Null Hypothesis:  H0 : PE <= 0  H1 : PE > 0  Here, the 𝑃𝐸 = mean Percentage Error ( ∆𝑡 𝑡 ) for the selected trips  t = 𝑃𝐸 𝑆 𝑛  t > ta for a = critical value (one-sided) implies we reject H0 in favor of H1
  • 24. PERFORMANCE EVALUATION MAPE MAE MPE P-Value DT(R) 0.3713 252.00s 0.3556 0.999781858 DT(C) 0.3729 253.33s 0.3365 0.999182999 RF(C) 0.3692 253.33s 0.3933 0.999430984 Historical Mean 0.5899 274.24s -0.3664 0.043007171
  • 25. PERFORMANCE EVALUATION Blue Original Path 345s Red Historical Mean 467.51 s Green Decision Tree / Random Forest 240s
  • 27. FUTURE VALUE ADDITION  Apply an incremental ARIMA model to each segment for a specific time of the day (Data needed)  Use Map Matching Technique to position the standard points on the road accurately.  Real Time Traffic Map (travelling route management, traffic signal optimization etc.)  Optimization of our approach in order to reduce model error.  To make this prediction real time and make better visualization of prediction.  Implement it locally.  Consolidate the Features in an Application
  • 28. REFERENCES  http://arxiv.org/pdf/1012.4249.pdf  http://arxiv.org/pdf/1509.05257.pdf  https://en.wikipedia.org/wiki/Haversine_formula  http://toc.proceedings.com/04373webtoc.pdf  http://www.ivt.ethz.ch/oev/ped2012/vpl/publications/reports/ab379.pdf  https://www.kaggle.com/c/pkdd-15-taxi-trip-time-prediction-ii  http://www.galitshmueli.com/data-mining-project/predicting-customer-cancellation-cab-bookings- yourcabscom  http://scikit-learn.org/stable/modules/ensemble.html  https://www.scribblemaps.com/create/#id=srin&lat=41.1619705&lng=- 8.60570389999998&z=14&t=road