Analyzing Air Quality Measurements in Macedonia
with Apache Drill
Author: Marjan Sterjev
Apache Drill (https://drill.apache.org/) is a schema-free SQL engine for analyzing big data coming from
disparate data sources in various data formats. Drill can query data stored in HBase, Hive,
HDFS, S3, MongoDB, etc.
Its engine is especially powerful for analyzing JSON data records:
https://drill.apache.org/docs/json-data-model/
Two of the key constructs when dealing with JSON records are the functions KVGEN and FLATTEN,
which are described in detail at the link above.
This article sketches a procedure for collecting publicly available air
quality measurement data and analyzing that data with Drill.
In particular, the air quality measurement data for Macedonia is available at the following address:
http://airquality.moepp.gov.mk
The data is available for various periods, regions and stations:
http://airquality.moepp.gov.mk/?page_id=4
Collect air quality measurement data
We will collect PM10 air quality data for a period of one week from the air quality
measurement stations in the Western Region (Bitola1, Bitola2, Lazaropole, Kicevo, Tetovo), the Eastern
Region (Veles1, Veles2, Kocani, Kavadarci, Kumanovo) and 3 stations in Skopje (Centar, Karpos and
Lisice). You can obtain the data by copying it from a browser developer plugin (Firebug, for example), or you
can use curl:
curl -o air_measurement_east.json "http://airquality.moepp.gov.mk/graphs/site/pages/MakeGraph.php?graph=StationLineGraph&station=EasternRegion&parameter=PM10&endDate=2015-12-04&timeMode=Week"
curl -o air_measurement_west.json "http://airquality.moepp.gov.mk/graphs/site/pages/MakeGraph.php?graph=StationLineGraph&station=WesternRegion&parameter=PM10&endDate=2015-12-04&timeMode=Week"
curl -o air_measurement_centar.json "http://airquality.moepp.gov.mk/graphs/site/pages/MakeGraph.php?graph=StationLineGraph&station=Centar&parameter=PM10&endDate=2015-12-04&timeMode=Week"
curl -o air_measurement_karpos.json "http://airquality.moepp.gov.mk/graphs/site/pages/MakeGraph.php?graph=StationLineGraph&station=Karpos&parameter=PM10&endDate=2015-12-04&timeMode=Week"
curl -o air_measurement_lisice.json "http://airquality.moepp.gov.mk/graphs/site/pages/MakeGraph.php?graph=StationLineGraph&station=Lisice&parameter=PM10&endDate=2015-12-04&timeMode=Week"
The data retrieved has the following format:
{"parameter":"PM10","measurements":{
"20151128 11":
{"Bitola1":"13.26","Bitola2":"47.42","Kicevo":"31.52","Lazaropole":"","Tetovo":"106.59"},
"20151128 12":
{"Bitola1":"8.42","Bitola2":"47.12","Kicevo":"45.28","Lazaropole":"","Tetovo":"106.59"},
Looking at the data format, we can see that:
1. The structure is nested: individual measurements are located at the third level of the JSON
structure.
2. The field names are themselves data (timestamps and station names), i.e. the structure has no fixed schema.
We need to create a view over this data that will allow us to run standard SQL queries for data
analysis.
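To make the target shape concrete, here is a minimal Python sketch of the flattening that the view has to perform. It is only an illustration of the idea, not Drill code: the function name flatten_measurements is hypothetical, and the sample is the fragment shown above with closing braces added so that it parses.

```python
import json

# Fragment from the article, with closing braces added so it is valid JSON.
SAMPLE = """
{"parameter": "PM10", "measurements": {
  "20151128 11": {"Bitola1": "13.26", "Bitola2": "47.42", "Kicevo": "31.52", "Lazaropole": "", "Tetovo": "106.59"},
  "20151128 12": {"Bitola1": "8.42", "Bitola2": "47.12", "Kicevo": "45.28", "Lazaropole": "", "Tetovo": "106.59"}
}}
"""


def flatten_measurements(doc):
    """Turn the nested map into flat (date_hour, station, raw_value) rows."""
    rows = []
    # First level of keys: the timestamp strings.
    for date_hour, stations in doc["measurements"].items():
        # Second level of keys: the station names.
        for station, raw_value in stations.items():
            rows.append((date_hour, station, raw_value))
    return rows


rows = flatten_measurements(json.loads(SAMPLE))
for row in rows[:3]:
    print(row)
```

Each row now has fixed column positions (timestamp, station, value), which is the shape that ordinary SQL grouping and filtering expects.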
Download and install Apache Drill
Download Drill from https://drill.apache.org/ and unpack the bundle into a folder of your choice.
Start Apache Drill
If you are a Windows user, navigate to the bin directory located in the Drill installation folder and start
the engine in embedded mode:
sqlline -u "jdbc:drill:zk=local"
Linux users can run the following command from the same bin directory:
./drill-embedded
Once Drill is started, you can access its web console at:
http://localhost:8047
Create Views
Navigate to the Query tab in the Drill UI:
http://localhost:8047/query
For each of the air quality data files collected:
• air_measurement_west.json
• air_measurement_east.json
• air_measurement_centar.json
• air_measurement_karpos.json
• air_measurement_lisice.json
we will create a corresponding view (replace <<replace_it>> with west, east, centar, karpos or lisice):
CREATE OR REPLACE VIEW
  dfs.tmp.air_measurement_<<replace_it>>
AS
SELECT
  TO_TIMESTAMP(dmt1.date_hour, 'YYYYMMdd HH') AS `timestamp`,
  dmt1.station_measurement.key AS station,
  CAST(CONCAT('0', dmt1.station_measurement.`value`) AS FLOAT) AS measure
FROM
(
  SELECT
    dmt.dm.key AS date_hour,
    FLATTEN(KVGEN(dmt.dm.`value`)) AS `station_measurement`
  FROM
  (
    SELECT FLATTEN(KVGEN(aq.measurements)) dm
    FROM dfs.`C:/ml/air_measurement_<<replace_it>>.json` aq
  ) dmt
) dmt1
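The two conversions in the outer SELECT can be mirrored in a few lines of Python. This is a sketch under the assumption that values are either empty strings or non-negative decimals, as in the sample above; the helper names to_timestamp and to_measure are hypothetical.

```python
from datetime import datetime


def to_timestamp(date_hour):
    # Counterpart of TO_TIMESTAMP(date_hour, 'YYYYMMdd HH'):
    # the key "20151128 11" means 28 November 2015, 11:00.
    return datetime.strptime(date_hour, "%Y%m%d %H")


def to_measure(raw_value):
    # Counterpart of CAST(CONCAT('0', value) AS FLOAT):
    # prefixing '0' turns an empty string into "0", so the cast cannot fail,
    # and leaves ordinary values unchanged ("013.26" parses as 13.26).
    return float("0" + raw_value)


print(to_timestamp("20151128 11"))
print(to_measure("13.26"), to_measure(""))
```

One consequence of this trick is that missing readings (such as those for Lazaropole in the sample) are stored as 0 rather than as NULL, so they pull a station's average down instead of being skipped.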
Once the per-file views are created, we will create the final union view that combines all of the
data:
CREATE OR REPLACE VIEW
dfs.tmp.air_measurement
AS
SELECT * FROM dfs.tmp.air_measurement_west
UNION ALL
SELECT * FROM dfs.tmp.air_measurement_east
UNION ALL
SELECT * FROM dfs.tmp.air_measurement_centar
UNION ALL
SELECT * FROM dfs.tmp.air_measurement_karpos
UNION ALL
SELECT * FROM dfs.tmp.air_measurement_lisice
Note that the created views are persistent and they will survive Apache Drill restarts.
Analyze Data
With the final view created, we have the full SQL tool set available for air quality measurement data
analysis. For example, we can group the data per station and find the average measurement for the
data collected:
SELECT
station, AVG(measure) as avg_measure
FROM
dfs.tmp.air_measurement
GROUP BY
station
ORDER BY
avg_measure DESC
The result is:
Table 1. Average air quality measurement per station
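For readers who want to check the semantics of the GROUP BY query outside Drill, the same aggregation can be sketched in Python. The function name average_per_station is hypothetical, and the four rows are just the Bitola1 and Tetovo readings from the two sample hours shown earlier, not the full data set behind Table 1.

```python
from collections import defaultdict

# (station, measure) pairs taken from the two sample hours in the article.
rows = [
    ("Bitola1", 13.26),
    ("Tetovo", 106.59),
    ("Bitola1", 8.42),
    ("Tetovo", 106.59),
]


def average_per_station(rows):
    """Equivalent of GROUP BY station with AVG(measure), sorted descending."""
    totals = defaultdict(lambda: [0.0, 0])  # station -> [sum, count]
    for station, measure in rows:
        totals[station][0] += measure
        totals[station][1] += 1
    averages = {s: total / count for s, (total, count) in totals.items()}
    # ORDER BY avg_measure DESC
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


result = average_per_station(rows)
for station, avg_measure in result:
    print(station, round(avg_measure, 2))
```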
We can query and filter temporal data as well.
The following query computes the average measurement per station early in the morning (before 08:00):
SELECT
station, AVG(measure) avg_measure
FROM
dfs.tmp.air_measurement
WHERE
EXTRACT(hour FROM `timestamp`) <8
GROUP BY
station
ORDER BY
avg_measure DESC
Table 2. Average air quality measurement per station early in the morning
The following query computes the average measurement per station in the evening (from 18:00 onwards):
SELECT
station, AVG(measure) avg_measure
FROM
dfs.tmp.air_measurement
WHERE
EXTRACT(hour FROM `timestamp`) >=18
GROUP BY
station
ORDER BY
avg_measure DESC
Table 3. Average air quality measurement per station in the evening
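The morning and evening queries differ only in the predicate applied to the hour of the timestamp. A small Python sketch of that filter-then-average pattern follows; the function name average_where_hour is hypothetical and the readings are made up for illustration, they are not measured values.

```python
from datetime import datetime
from statistics import mean

# Made-up readings for illustration only: (timestamp, station, measure).
rows = [
    (datetime(2015, 11, 28, 6), "Bitola1", 10.0),
    (datetime(2015, 11, 28, 7), "Bitola1", 20.0),
    (datetime(2015, 11, 28, 12), "Bitola1", 30.0),
    (datetime(2015, 11, 28, 19), "Bitola1", 60.0),
    (datetime(2015, 11, 28, 20), "Bitola1", 80.0),
]


def average_where_hour(rows, predicate):
    """Average per station over rows whose hour satisfies the predicate."""
    by_station = {}
    for ts, station, measure in rows:
        # Counterpart of WHERE EXTRACT(hour FROM `timestamp`) ...
        if predicate(ts.hour):
            by_station.setdefault(station, []).append(measure)
    return {station: mean(values) for station, values in by_station.items()}


morning = average_where_hour(rows, lambda hour: hour < 8)
evening = average_where_hour(rows, lambda hour: hour >= 18)
print(morning, evening)
```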
From these tables we can conclude, for example, that the air quality in Bitola is worse in the
evenings than in the mornings.
The data sets used in this “toy” demonstration were small. However, Drill is designed to work with
very large data sets, and you can apply your existing SQL knowledge to those large data sets as well.