Mongo Analytics – 
Learn aggregation by example 
Exploratory Analytics and 
Visualization using Flight Data 
www.jsonstudio.com
Analyzing Flight Data 
• JSON data imported from CSV downloaded from www.transtats.bts.gov 
• Sample document for a flight: 
{ 
"_id": { "$oid": "534205f61c479f6149a92709" }, 
"YEAR": 2013, "QUARTER": 1, 
"MONTH": 1, 
"DAY_OF_MONTH": 18, 
"DAY_OF_WEEK": 5, 
"FL_DATE": "2013-01-18", 
"UNIQUE_CARRIER": "DL", 
"AIRLINE_ID": 19790, 
"CARRIER": "DL", 
"TAIL_NUM": "N325US", 
"FL_NUM": 1497, 
"ORIGIN_AIRPORT_ID": 14100, 
"ORIGIN_AIRPORT_SEQ_ID": 1410002, 
"ORIGIN_CITY_MARKET_ID": 34100, 
"ORIGIN": "PHL", 
"ORIGIN_CITY_NAME": "Philadelphia, PA", 
"ORIGIN_STATE_ABR": "PA", 
"ORIGIN_STATE_FIPS": 42, 
"DEST_AIRPORT_ID": 13487, 
"DEST_AIRPORT_SEQ_ID": 1348702, 
"DEST_CITY_MARKET_ID": 31650, 
"DEST": "MSP", 
"DEST_CITY_NAME": "Minneapolis, MN", 
"DEST_STATE_ABR": "MN", 
"DEST_STATE_FIPS": 27, 
"DEST_STATE_NM": "Minnesota", 
"DEST_WAC": 63, 
"CRS_DEP_TIME": 805, 
"DEP_TIME": 758, 
"DEP_DELAY": -7, 
"DEP_DELAY_NEW": 0, 
"DEP_DEL15": 0, 
"DEP_DELAY_GROUP": -1, 
"DEP_TIME_BLK": "0800-0859", 
"TAXI_OUT": 24, 
"WHEELS_OFF": 822, 
"WHEELS_ON": 958, 
"TAXI_IN": 4, 
"CRS_ARR_TIME": 1015, 
"ARR_TIME": 1002, 
"ARR_DELAY": -13, 
"ARR_DELAY_NEW": 0, 
"ARR_DEL15": 0, 
"ARR_DELAY_GROUP": -1, 
"ARR_TIME_BLK": "1000-1059", 
"CANCELLED": 0, 
"CANCELLATION_CODE": "", 
"DIVERTED": 0, 
"CRS_ELAPSED_TIME": 190, 
"ACTUAL_ELAPSED_TIME": 184, 
"AIR_TIME": 156, 
"FLIGHTS": 1, 
"DISTANCE": 980, 
"DISTANCE_GROUP": 4, 
"CARRIER_DELAY": "", 
"WEATHER_DELAY": "", 
"NAS_DELAY": "", 
"SECURITY_DELAY": "", 
"LATE_AIRCRAFT_DELAY": "", 
"FIRST_DEP_TIME": "", 
"TOTAL_ADD_GTIME": "", 
"LONGEST_ADD_GTIME": "", 
"": "" 
} 
• We will build aggregation pipelines and visualize data using JSON Studio (www.jsonstudio.com) 
• You will fall madly in love with the Aggregation Framework and its capabilities
MongoDB aggregation steps/stages 
• Grouping 
• Matching/filtering 
• Projection 
• Sorting 
• Unwind 
• Limit, skip 
• Added in 2.6 
– Out 
– Redact
Who are the largest carriers?
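The pipeline behind this slide is not reproduced in the text. As a minimal sketch, assuming "largest" means "most flights", one way to answer the question is to group by CARRIER, add 1 per flight, and sort descending. The pipeline below is written as a Python list in the form a driver such as PyMongo accepts; the helper only mimics its result on in-memory dicts so the example runs without a server. The sample flights are made up.

```python
from collections import Counter

# Assumed pipeline (not taken from the slides): count flights per carrier,
# largest first.
largest_carriers_pipeline = [
    {"$group": {"_id": {"CARRIER": "$CARRIER"}, "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

def count_by_carrier(flights):
    """In-memory stand-in for the pipeline above."""
    counts = Counter(f["CARRIER"] for f in flights)
    # most_common() returns carriers ordered by descending count.
    return [{"_id": {"CARRIER": c}, "count": n} for c, n in counts.most_common()]

# Made-up sample documents, for illustration only.
sample_flights = [
    {"CARRIER": "DL"}, {"CARRIER": "DL"}, {"CARRIER": "9E"},
    {"CARRIER": "DL"}, {"CARRIER": "9E"}, {"CARRIER": "AA"},
]
carrier_counts = count_by_carrier(sample_flights)
```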
Some Carrier Stats 
{ 
"$group": { 
"_id": { 
"CARRIER": "$CARRIER" 
}, 
"_avg_DEP_DELAY": { 
"$avg": "$DEP_DELAY" 
}, 
"_avg_ARR_DELAY": { 
"$avg": "$ARR_DELAY" 
}, 
"_avg_DISTANCE_GROUP": { 
"$avg": "$DISTANCE_GROUP" 
}, 
"_avg_TAXI_IN": { 
"$avg": "$TAXI_IN" 
}, 
"_avg_TAXI_OUT": { 
"$avg": "$TAXI_OUT" 
} 
} 
} 
{ 
"_id": { 
"CARRIER": "9E" 
}, 
"_avg_DEP_DELAY": 8.45451754385965, 
"_avg_ARR_DELAY": 3.3237368838726744, 
"_avg_DISTANCE_GROUP": 2.2188688815622624, 
"_avg_TAXI_IN": 7.082464246424642, 
"_avg_TAXI_OUT": 20.558167120639663 
}
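One detail worth knowing when averaging this data: the sample flight document stores missing values as empty strings (for example CARRIER_DELAY), and MongoDB's $avg ignores non-numeric values instead of counting them as zero. The sketch below mimics that behaviour in plain Python; the function name and the delay values are made up for illustration.

```python
def mongo_like_avg(values):
    """Average only the numeric values, as $avg does; None if there are none."""
    nums = [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return sum(nums) / len(nums) if nums else None

# Hypothetical DEP_DELAY values; "" stands for a missing measurement.
dep_delays = [-7, 12, "", 4, ""]
avg_dep_delay = mongo_like_avg(dep_delays)  # averages -7, 12 and 4 only
```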
Which airports have the most cancellations?
Which carriers are most at fault for cancellations?
Arrival delays by distance
Delays by distance by carrier
Delays by distance by carrier – long haul only
Words of caution (courtesy of David Weisman)
What to do? 
“Touch” the data – e.g. Histograms
Words of caution (courtesy of David Weisman)
Order Does Matter 
http://docs.mongodb.org/manual/core/aggregation-pipeline-optimization/
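The slide content itself is not in the text, so the following is only one illustration of why stage order matters: sorting and then limiting returns different documents than limiting and then sorting. The two helpers are plain-Python stand-ins for $sort and $limit, and the flights are made up.

```python
def sort_by_arr_delay_desc(docs):
    """Stand-in for { $sort: { ARR_DELAY: -1 } }."""
    return sorted(docs, key=lambda d: d["ARR_DELAY"], reverse=True)

def limit(docs, n):
    """Stand-in for { $limit: n }."""
    return docs[:n]

# Made-up flights, in arbitrary storage order.
flights = [
    {"FL_NUM": 1, "ARR_DELAY": 5},
    {"FL_NUM": 2, "ARR_DELAY": 40},
    {"FL_NUM": 3, "ARR_DELAY": -3},
    {"FL_NUM": 4, "ARR_DELAY": 90},
]

# $sort then $limit: the two most delayed flights overall.
sort_then_limit = limit(sort_by_arr_delay_desc(flights), 2)
# $limit then $sort: the first two stored flights, sorted afterwards.
limit_then_sort = sort_by_arr_delay_desc(limit(flights, 2))
```

Besides changing results, order also affects cost: filtering early leaves fewer documents for later stages to process.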
An example for $unwind 
Count how many airports each carrier lands in 
{ 
"_id": { 
"$oid": "5383623b7bfb8767e2e9ca1f" 
}, 
"iata": "00M", 
"airport": "Thigpen ", 
"city": "Bay Springs", 
"state": "MS", 
"country": "USA", 
"lat": 31.95376472, 
"long": -89.23450472, 
"carriers": [ 
"AA", 
"UA", 
"DL", 
"BA" 
] 
} 
… 
[ 
{ 
"_id": { 
"$oid": "5383623b7bfb8767e2e9ca1f" 
}, 
"iata": "00M", 
"airport": "Thigpen ", 
"city": "Bay Springs", 
"state": "MS", 
"country": "USA", 
"lat": 31.95376472, 
"long": -89.23450472, 
"carriers": "AA" 
}, 
{ 
"_id": { 
"$oid": "542217ffc026b858b47a6640" 
}, 
"iata": "00M", 
"airport": "Thigpen ", 
"city": "Bay Springs", 
"state": "MS", 
"country": "USA", 
"lat": 31.95376472, 
"long": -89.23450472, 
"carriers": "UA" 
} 
… 
] 
[ 
{ 
"_id": { 
"carriers": "BA" 
}, 
"count": 10 
}, 
{ 
"_id": { 
"carriers": "DL" 
}, 
"count": 10 
} 
… 
] 
airports2 → $unwind → $group
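The two steps can be sketched in plain Python. The stage bodies in the pipeline below are inferred from the output shape shown above, not copied from the slide, and the helpers are in-memory stand-ins that assume "carriers" is always an array. The second airport is hypothetical.

```python
from collections import Counter

# Assumed pipeline: one document per (airport, carrier), then count per carrier.
airports_per_carrier_pipeline = [
    {"$unwind": "$carriers"},
    {"$group": {"_id": {"carriers": "$carriers"}, "count": {"$sum": 1}}},
]

def unwind(docs, field):
    """Emit one copy of each document per element of its array field."""
    for doc in docs:
        for item in doc.get(field, []):
            yield {**doc, field: item}

def count_by(docs, field):
    """Group on a field and count the documents in each group."""
    counts = Counter(d[field] for d in docs)
    return [{"_id": {field: k}, "count": n} for k, n in sorted(counts.items())]

airports = [
    {"iata": "00M", "carriers": ["AA", "UA", "DL", "BA"]},
    {"iata": "XXX", "carriers": ["AA", "DL"]},  # hypothetical airport
]
unwound = list(unwind(airports, "carriers"))
airports_per_carrier = count_by(unwound, "carriers")
```

Each unwound document keeps every other field of its parent, which is why the grouping stage can still see "iata" or "state" if it needs them.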
Hub airports – try1
Hub airports – try2
Hub airports – try 3 
{ $group: { _id: { ORIGIN: "$ORIGIN", CARRIER: "$CARRIER" }, count: { $sum: 1 } } }, 
{ $project: { airport: "$_id.ORIGIN", carrier: "$_id.CARRIER", "count": 1 } }, 
{ $match: { "count": { $gte: "$$hub_threshold" } } }, 
{ $group: { 
_id: { airport: "$airport" }, 
airlines: { $sum: 1 }, 
flights: { $sum: "$count" }, 
avg_airline: { $avg: "$count" }, 
max_airline: { $max: "$count" } } }, 
{ $project: { 
"airlines": 1, 
"flights": 1, 
"avg_airline": 1, 
"max_airline": 1, 
"avg_no_max": { $divide: [ { $subtract: [ "$flights", "$max_airline" ] }, "$airlines" ] } } }, 
{ $sort: { "flights": -1 } }
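To make the arithmetic concrete, here is a plain-Python sketch of what the $match, second $group, $project and $sort stages compute, starting from per-(airport, carrier) counts. The function name and the counts are made up; hub_threshold is passed as an ordinary parameter in place of the "$$hub_threshold" placeholder. As in the pipeline, avg_no_max divides by the full number of airlines.

```python
from collections import defaultdict

def hub_stats(pair_counts, hub_threshold):
    """Summarise airlines per airport, keeping pairs at or above the threshold."""
    per_airport = defaultdict(list)
    for row in pair_counts:
        if row["count"] >= hub_threshold:          # the $match stage
            per_airport[row["airport"]].append(row["count"])
    stats = []
    for airport, counts in per_airport.items():    # the second $group stage
        flights = sum(counts)
        stats.append({
            "_id": {"airport": airport},
            "airlines": len(counts),
            "flights": flights,
            "avg_airline": flights / len(counts),
            "max_airline": max(counts),
            # the $project stage: flights outside the biggest airline,
            # divided by the number of airlines
            "avg_no_max": (flights - max(counts)) / len(counts),
        })
    return sorted(stats, key=lambda s: s["flights"], reverse=True)  # $sort

# Made-up counts per (airport, carrier) pair.
pair_counts = [
    {"airport": "ATL", "carrier": "DL", "count": 900},
    {"airport": "ATL", "carrier": "FL", "count": 100},
    {"airport": "ATL", "carrier": "AA", "count": 5},
    {"airport": "PHL", "carrier": "US", "count": 300},
    {"airport": "PHL", "carrier": "DL", "count": 100},
]
hubs = hub_stats(pair_counts, hub_threshold=50)
```

One possible reading of the output: when max_airline is far above avg_no_max, a single carrier dominates the airport.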
Hub airports
From-to Insensitive 
{ $group: { _id: { UNIQUE_CARRIER: "$UNIQUE_CARRIER", ORIGIN: "$ORIGIN", 
DEST: "$DEST" }, count: { $sum: 1 } } }, 
{ $match: { "count": { $gt: "$$count_threshold" } } }, 
{ $project: { _id_UNIQUE_CARRIER: "$_id.UNIQUE_CARRIER", "count": 1, 
rroute: { 
$cond: [ 
{ $lt: [ { $cmp: [ "$_id.ORIGIN", "$_id.DEST" ] }, 0 ] }, 
{ $concat: [ "$_id.ORIGIN", "$_id.DEST" ] }, 
{ $concat: [ "$_id.DEST", "$_id.ORIGIN" ] } 
] } } 
}, 
{ $group: { _id: { _id_UNIQUE_CARRIER: "$_id_UNIQUE_CARRIER", rroute: "$rroute" }, 
_sum_count: { $sum: "$count" } } }
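The $cond / $cmp / $concat combination builds a direction-independent route key: the two airport codes are concatenated with the alphabetically smaller one first, so PHL to MSP and MSP to PHL land in the same group. A plain-Python sketch of that logic, with hypothetical function names and made-up counts:

```python
from collections import Counter

def route_key(origin, dest):
    """Same key for A->B and B->A: the smaller code comes first."""
    return origin + dest if origin < dest else dest + origin

def merge_directions(rows):
    """Sum counts per (carrier, direction-independent route)."""
    totals = Counter()
    for row in rows:
        key = (row["UNIQUE_CARRIER"], route_key(row["ORIGIN"], row["DEST"]))
        totals[key] += row["count"]
    return dict(totals)

# Made-up per-direction counts, as the first $group stage would emit them.
directed = [
    {"UNIQUE_CARRIER": "DL", "ORIGIN": "PHL", "DEST": "MSP", "count": 40},
    {"UNIQUE_CARRIER": "DL", "ORIGIN": "MSP", "DEST": "PHL", "count": 38},
]
undirected = merge_directions(directed)
```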
Hub visualization (using routes – from/to, $$count=1, origin treemap)
Using “R” for Advanced Analytics 
• Using a MongoDB driver for “R” 
• Using the JSON Studio Gateway (including using aggregation output) 
install.packages("jSonarR") 
library('jSonarR') 
con2 <- jSonarR::new.SonarConnection('https://localhost:8443', 'localhost', 'flights', port=47017, username="ron", pwd="<pwd>") 
nyc_by_day <- jSonarR::sonarAgg(con2, 'delays_by_day', 'NYCFlights', 
colClasses=c(X_avg_AirTime='numeric', X_avg_ArrDelay='numeric',X_avg_DepDelay='numeric')) 
lm.out = lm(nyc_by_day$X_sum_ArrDelay ~ nyc_by_day$X_sum_AirTime) 
Recommendation engine example: jsonstudio.com
NYC Flights – Quiz Questions 
• Of the three airports, who has the most flights? 
– Nyc1 
• Who has the most cancellations and highest cancellation ratio? 
– Nyc2 
• Taxi in/out times? 
– Nyc3 
• What about delays? 
– Nyc4 
• How do delays differ by month? 
– Nyc5 + nyc5 
– (summer vs. winter / bubble size vs. y-axis) 
• What about weather delays only? Which months are worse? Are the three airports equivalent? 
– Nyc7 + nyc7 
• Where can I fly to if I work for Boeing and am very loyal (and on which aircraft)? 
– Nyc8 + map
www.jsonstudio.com 
(download – presentation and eval copy) 
Discount code: MUGTX* 
(* Good for 1 month after event) 
ron@jsonar.com