Public transit data is largely available in an open data format, GTFS. Using FME, this presentation will describe how to efficiently synthesize these extremely large datasets from cities all over the world into meaningful information used to run TransitDatabase.com.
2. About Me
• BCIT GIS Student, Advanced Diploma, 2016
• University of Victoria, BA Geography, GIS/Geomatics Concentration, 2013
• Workplace Practicum, Jan-May, 2016
3. GTFS Datasets – What are they?
Quite simply: a common format and specification for public transportation schedules and their associated geographic information
4. GTFS Structure
• Collection of 6-13 CSV files (.txt), packaged into a single ZIP file
• Each CSV is a table of relevant components of a transit system’s scheduling, stops, routes and related attributes
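To make the structure concrete, here is a minimal Python sketch (standard library only, not the FME workflow behind TransitDatabase.com) that opens a GTFS ZIP and loads every .txt table into a list of row dictionaries keyed by column name. The function name `read_gtfs_zip` is hypothetical; it assumes the files are UTF-8 encoded, as the specification requires, and tolerates a leading byte-order mark.

```python
import csv
import io
import zipfile


def read_gtfs_zip(path):
    """Return {table_name: [row_dict, ...]} for every .txt file in a GTFS ZIP.

    `path` may be a filename or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue  # skip READMEs, licences and other non-table files
            # utf-8-sig decodes UTF-8 and silently drops a BOM if one is present
            with io.TextIOWrapper(zf.open(name), encoding="utf-8-sig", newline="") as text:
                table = name.rsplit("/", 1)[-1][: -len(".txt")]  # "gtfs/stops.txt" -> "stops"
                tables[table] = list(csv.DictReader(text))
    return tables
```

Because every table comes back as plain dictionaries, the same loader works for the 6 required files and for whatever optional files a given agency happens to include.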
6. Required Components
• routes.txt
• Transit routes. A route is a group of trips that are displayed to riders as a single service.
• Toronto has 197 routes across all modes of transit
• stops.txt
• Individual locations where vehicles pick up or drop off passengers.
• Chicago has 11,449 transit stops in the CTA
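Figures like these come straight out of the two files. The Python sketch below (hypothetical helper names, standard library only) tallies routes by their `route_type` code and extracts stop coordinates. The mode labels cover only the basic route_type values from the GTFS reference; extended codes are lumped under "other".

```python
import csv
import io
from collections import Counter

# Basic route_type codes from the GTFS reference; anything else maps to "other"
ROUTE_TYPES = {
    "0": "tram/light rail", "1": "subway/metro", "2": "rail", "3": "bus",
    "4": "ferry", "5": "cable tram", "6": "aerial lift", "7": "funicular",
    "11": "trolleybus", "12": "monorail",
}


def routes_by_mode(routes_csv):
    """Count routes per mode from the text of a routes.txt file."""
    rows = csv.DictReader(io.StringIO(routes_csv))
    return Counter(ROUTE_TYPES.get(r["route_type"].strip(), "other") for r in rows)


def stop_coordinates(stops_csv):
    """Map stop_id -> (lat, lon) floats from the text of a stops.txt file."""
    coords = {}
    for r in csv.DictReader(io.StringIO(stops_csv)):
        # Coordinates may be blank for some location types (e.g. generic nodes)
        if r.get("stop_lat") and r.get("stop_lon"):
            coords[r["stop_id"]] = (float(r["stop_lat"]), float(r["stop_lon"]))
    return coords
```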
7. Required Components
• trips.txt
• Trips for each route. A trip is a sequence of two or more stops that occurs at a specific time.
• NYC subways make 19,425 trips per week
• stop_times.txt
• Times that a vehicle arrives at and departs from individual stops for each trip.
• TransLink buses, trains and boats make 2.4 million scheduled stops per week
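stop_times.txt is usually the largest file in a feed, and two details trip people up: the stop order is defined by `stop_sequence`, and times are measured from the start of the service day, so hours can exceed 23 (e.g. `25:10:00`) for trips that run past midnight. A minimal sketch under those assumptions, with hypothetical function names:

```python
import csv
import io
from collections import defaultdict


def gtfs_seconds(hms):
    """Convert a GTFS HH:MM:SS time to seconds after the start of the service day.

    Hours may exceed 23 for trips that continue past midnight.
    """
    h, m, s = (int(part) for part in hms.strip().split(":"))
    return h * 3600 + m * 60 + s


def trip_durations(stop_times_csv):
    """Return {trip_id: (number_of_stops, duration_in_minutes)} from stop_times.txt text."""
    by_trip = defaultdict(list)
    for r in csv.DictReader(io.StringIO(stop_times_csv)):
        by_trip[r["trip_id"]].append(r)
    result = {}
    for trip_id, rows in by_trip.items():
        # Order by stop_sequence rather than trusting the row order in the file
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        # Intermediate stops may have blank times; first and last stops carry them
        start = gtfs_seconds(rows[0]["departure_time"])
        end = gtfs_seconds(rows[-1]["arrival_time"])
        result[trip_id] = (len(rows), (end - start) / 60)
    return result
```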
8. Required Components
• agency.txt
• One or more transit agencies that provide the data in this feed.
• Vancouver has three: TransLink/CMBC, WCE and BCRTC
• calendar.txt
• Dates for service IDs using a weekly schedule. Specifies when service starts and ends, as well as the days of the week on which service is available.
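Combining trips.txt with calendar.txt is one way to arrive at "trips per week" style figures: each trip counts once for every weekday its `service_id` is active. This is a hedged sketch, not necessarily how the numbers on these slides were produced; it deliberately ignores calendar_dates.txt exceptions (holidays, added service), and the function name is hypothetical.

```python
import csv
import io
from collections import Counter

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def weekly_trip_count(trips_csv, calendar_csv):
    """Estimate trips in a typical week from trips.txt and calendar.txt text.

    Service IDs that appear only in calendar_dates.txt contribute 0 here.
    """
    days_per_service = {}
    for r in csv.DictReader(io.StringIO(calendar_csv)):
        # Each weekday column is "1" (service runs) or "0" (no service)
        days_per_service[r["service_id"]] = sum(r[d].strip() == "1" for d in WEEKDAYS)
    trips_per_service = Counter(
        r["service_id"] for r in csv.DictReader(io.StringIO(trips_csv)))
    return sum(n * days_per_service.get(sid, 0)
               for sid, n in trips_per_service.items())
```

Multiplying by stop_times rows per trip instead of 1 gives the analogous "scheduled stops per week" figure.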
9. Common Optional Components
• shapes.txt
• Rules for drawing lines on a map to represent a transit organization's routes.
• TransLink services travel 1,229 different route paths per week on 246 unique routes, e.g. 25 Brentwood Stn, 25 UBC, 25 Granville, 25 BCIT
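Each path variant like the route 25 examples typically points to its own `shape_id` through trips.txt, and each shape is just an ordered list of points. A minimal sketch (hypothetical function name) that turns shapes.txt into one GeoJSON LineString per shape:

```python
import csv
import io
import json
from collections import defaultdict


def shapes_to_geojson(shapes_csv):
    """Build a GeoJSON FeatureCollection with one LineString per shape_id."""
    points = defaultdict(list)
    for r in csv.DictReader(io.StringIO(shapes_csv)):
        points[r["shape_id"]].append(
            (int(r["shape_pt_sequence"]), float(r["shape_pt_lon"]), float(r["shape_pt_lat"])))
    features = []
    for shape_id, pts in points.items():
        pts.sort()  # tuples sort by shape_pt_sequence first
        features.append({
            "type": "Feature",
            "properties": {"shape_id": shape_id},
            # GeoJSON (RFC 7946) positions are [longitude, latitude]
            "geometry": {"type": "LineString",
                         "coordinates": [[lon, lat] for _, lon, lat in pts]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```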
11. What's the problem?
• Format can be difficult for the average person to access and interpret
• Especially when looking for transit data from multiple operators
• Route Maps as KML
• Tabular Schedules in Excel
• Stop locations in GeoJSON
• Service Extent in Shapefile
• Learning Opportunity!
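As one example of turning raw GTFS into the kind of output listed above, a "service extent" can be approximated as the convex hull of all stop locations. The sketch below is an assumption, not necessarily how TransitDatabase.com defines its extent: it uses Andrew's monotone chain algorithm, treats longitude/latitude as planar coordinates (a rough approximation), and emits a GeoJSON Polygon; writing it to a Shapefile is left to a GIS tool such as FME. Function names are hypothetical.

```python
import json


def _cross(o, a, b):
    """Z component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def service_extent(stop_lonlats):
    """Convex hull of (lon, lat) stop points as a GeoJSON Polygon string."""
    pts = sorted(set(stop_lonlats))
    if len(pts) < 3:
        raise ValueError("need at least three distinct stops for a polygon")
    lower, upper = [], []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # counter-clockwise, endpoints not duplicated
    if len(hull) < 3:
        raise ValueError("stops are collinear; no polygon extent")
    ring = [list(p) for p in hull] + [list(hull[0])]  # GeoJSON rings must be closed
    return json.dumps({"type": "Polygon", "coordinates": [ring]})
```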
12. How can FME Help?
• FME 2016 introduced some shiny new features that made my life a lot easier, saved time and cut my workspace complexity in half
• Direct GTFS Format Support
• MapboxStyler
• FeatureWriter
• AttributeManager❤
13. Project Considerations
• Needed to be able to serve data quickly
• ‘Student’ budget
• Processing files can be expensive and time-consuming if done on the fly, so pre-processing is essential
• Historical record accessibility
14. TransitDatabase.com
• Web portal to view and download desired information
• Multiple datasets
• Multiple formats (GeoJSON, SHP, KML, XLS, etc.)
• Multiple versions of the GTFS datasets
Still quite a work in progress, but the shell of the site is up and running now
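To illustrate the multi-format idea, here is a sketch of a single output: stops serialized as a KML document with Python's built-in ElementTree. On the site this is FME's job; the function name and the `(stop_id, name, lat, lon)` input layout are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # emit KML as the default namespace (no prefix)


def stops_to_kml(stops):
    """Serialize (stop_id, name, lat, lon) tuples as a KML document of Placemarks."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for stop_id, name, lat, lon in stops:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark", id=str(stop_id))
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML coordinates are "lon,lat[,alt]": longitude comes first
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

The same stop tuples could feed GeoJSON, Shapefile or spreadsheet writers; only the serialization step changes per format.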
18. Overall Process
• Download GTFS from Transit Agency
• Check to see if new file found
• Run Main GTFS Workspace
• FTP resulting output files to web server
This process runs overnight, every night, using FME Cloud
Runtime for one new GTFS file typically ranges from 2 to 20 minutes (including FTP uploads)
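The "check to see if new file found" step can be as simple as hashing each downloaded ZIP and comparing it with the hash recorded on the previous night. This is a hedged sketch, not the actual FME Cloud setup: the JSON state file and all function names are hypothetical, and a real job might instead compare HTTP headers or the feed_info.txt version.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large GTFS ZIPs are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_new_feed(feed_id, digest, seen):
    """True if `digest` differs from the last one recorded for feed_id; records it."""
    if seen.get(feed_id) == digest:
        return False  # unchanged since last run: skip the workspace and FTP upload
    seen[feed_id] = digest
    return True


def load_seen(state_file):
    """Read {feed_id: digest} from a JSON file, or start empty on the first run."""
    p = Path(state_file)
    return json.loads(p.read_text()) if p.exists() else {}


def save_seen(state_file, seen):
    Path(state_file).write_text(json.dumps(seen, indent=2))
```

A nightly run would call `load_seen`, then `is_new_feed(feed_id, file_digest(zip_path), seen)` for each downloaded feed, trigger the workspace only when it returns True, and finish with `save_seen`.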
24. Next Steps?
Website
• Better way of implementing the maps: GeoJSON files and Mapbox are limited to simple styling
FME Workflows
• Integrate with GTFS data feeds to only run when updates are found
• Many, many more outputs and formats
-regular interaction
GTFS started out as a side project of a Google employee, who worked together with TriMet in Portland, OR, to create an interchange format for TriMet's internal data. Portland became the first city to be featured in the first version of Google's “Transit Trip Planner”.
The format was released as the Google Transit Feed Specification.
It was later renamed the General Transit Feed Specification to emphasize community involvement in the project.
This boils down to 6 major components.
stops.txt includes lat/long for each stop.
Additional optional tables include:
• Fare attributes and rules: zones, prices
• Frequencies, for routes that don’t operate on a set schedule (e.g. the subway in Toronto comes every 2 minutes M-F, 6:30am to 9:00am)
• Transfer/connection rules
• General feed metadata: versions, effective/expiry details, publisher
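A frequencies.txt row stores a time window and a headway instead of individual departures. Using the Toronto example from these notes (`start_time` 06:30:00, `end_time` 09:00:00, `headway_secs` 120), the sketch below expands one row into departure times. It follows the schedule-based reading used for `exact_times=1` (departures at start_time + k × headway while before end_time); for the default frequency-based service the result is only an approximation. Function names are hypothetical.

```python
def gtfs_seconds(hms):
    """GTFS HH:MM:SS (hours may exceed 23) to seconds after the service day starts."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Seconds back to zero-padded HH:MM:SS."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def expand_frequency(start_time, end_time, headway_secs):
    """List departures for one frequencies.txt row; end_time itself is excluded."""
    start, end = gtfs_seconds(start_time), gtfs_seconds(end_time)
    return [to_hms(t) for t in range(start, end, int(headway_secs))]
```

For the Toronto window this yields 75 departures, from 06:30:00 to 08:58:00 (150 minutes at one train every 2 minutes).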
Usage of these varies from system to system, and some operators also add files outside of the specification.
The basic gist is that there is a lot of information in a lot of files, but the files share fundamental commonalities across different sources. Putting all these pieces together, you can start to look at the bigger picture of an area's transit infrastructure.
Learning project
4-5 years ago, CRD project competition
Simple web hosting package, PostgreSQL database hosted on AWS
PHP5, a little JavaScript
Plug Owen
Everything is really quite simple and straightforward: lots of manhandling attribute data and styling outputs for various formats.
Still needs performance tweaking, e.g. removing unnecessary returned fields and attributes.
This is the brain of the operation: querying a SQLite database for the relevant information. Probably 10x faster than using various combinations of FeatureMergers.
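A minimal illustration of the "query instead of merge" idea, using Python's built-in `sqlite3` module. It assumes the GTFS tables have already been parsed into rows; the table and column names follow the GTFS files, but the function and query are hypothetical and not the workspace's actual SQL. The 10x figure above is the author's estimate and is not measured here.

```python
import sqlite3


def trips_per_route(routes, trips):
    """Count trips per route with one SQL join instead of chained feature merges.

    routes: iterable of (route_id, route_short_name)
    trips:  iterable of (trip_id, route_id)
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT)")
    con.execute("CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT)")
    # An index on the join key keeps lookups fast on feeds with many trips
    con.execute("CREATE INDEX trips_route ON trips (route_id)")
    con.executemany("INSERT INTO routes VALUES (?, ?)", routes)
    con.executemany("INSERT INTO trips VALUES (?, ?)", trips)
    rows = con.execute(
        """
        SELECT r.route_short_name, COUNT(t.trip_id)
        FROM routes AS r
        LEFT JOIN trips AS t ON t.route_id = r.route_id
        GROUP BY r.route_id
        ORDER BY COUNT(t.trip_id) DESC, r.route_short_name
        """
    ).fetchall()
    con.close()
    return rows
```

The LEFT JOIN keeps routes with no trips (count 0), which an inner join would silently drop.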