NoizCrowd: A Crowd-Based Data Gathering and Management System for Noise Level Data
1. NoizCrowd:
A Crowd-Based Data Gathering and
Management System for Noise Level Data
Mariusz Wisniewski, Gianluca
Demartini, Apostolos Malatras, and
Philippe Cudré-Mauroux
University of Fribourg, Switzerland
2. Motivation - Big Data
• Large datasets are necessary to enable analytics
and support decision making
– Meteorological station / car traffic
• Setting up a large-scale sensing infrastructure is
costly and time-consuming
• Create a large amount of valuable data
– Crowdsourcing
– Data generation models
– Smartphones as sensors
– Big Data analytics
Gianluca Demartini 2
3. NoizCrowd
• A crowd-sensing approach to big data generation
using commodity sensors
• Crowd-source noise levels in a geographic region
• Noise propagation models to generate data
• Array data management techniques to scale
• Results accessible via a visual interface
• Support decisions (e.g., where to live)
4. Outline
• Related approaches
• NoizCrowd Architecture Overview
– Data Gathering
– Storage
– Modeling
– Export and Visualization
• Data Models
• Performance Evaluation
5. Related Work
• Participatory Sensing vs Sensor Networks
– Low cost / High cost
– Mobile phones / Sensors
– Distributed / Centralized management
– Privacy, data quality
• Applications: Environment, vehicle routing
6. Related Work
• Noise Mapping Apps
– NoiseTube: open source, widespread usage
– NoiseMap: control over data
– SoundSense: machine learning to classify sounds
• NoizCrowd
– Data in RDF linkable to other datasets
(linkeddata.org)
– Scalable storage: generate data by interpolation
8. Data Gathering
• By means of Crowd-sourcing
– GPS: location
– Microphone: noise level
– Internet connection: send data to server
• Microphone Calibration
– Sound level meter
– Sharing conversion table for smartphone models
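A shared conversion table could be applied roughly as follows. This is a minimal sketch, assuming each table holds (raw reading, sound level meter reading) pairs and that values in between are linearly interpolated. The model name, the table values, and the function `calibrate` are made up for illustration and do not come from the slides.

```python
from bisect import bisect_right

# Hypothetical conversion tables: phone model -> sorted (raw_db, reference_db) pairs.
# The numbers are illustrative only, not measured calibration data.
CONVERSION_TABLES = {
    "example-phone-a": [(30.0, 35.0), (60.0, 62.0), (90.0, 88.0)],
}


def calibrate(model, raw_db):
    """Convert a raw microphone reading to a calibrated dB value."""
    table = CONVERSION_TABLES[model]
    raws = [raw for raw, _ in table]
    # Clamp readings that fall outside the calibrated range.
    if raw_db <= raws[0]:
        return table[0][1]
    if raw_db >= raws[-1]:
        return table[-1][1]
    # Linear interpolation between the two surrounding table entries.
    i = bisect_right(raws, raw_db)
    (r0, c0), (r1, c1) = table[i - 1], table[i]
    return c0 + (c1 - c0) * (raw_db - r0) / (r1 - r0)
```

For instance, with the made-up table above, a raw reading of 45 lies halfway between the first two entries and maps to 48.5.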
9. Data Storage
• App sends median and peak dB values over
a few seconds
• Spatio-temporal data: non-relational storage
system (SciDB)
– Durable storage
– Retrieve data to build models
– Export data for visualization
• Multi-dimensional array (space and time)
• Distributed storage
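The two steps above (summarizing a short window, then addressing a space-time array cell) can be sketched in plain Python. This is not SciDB code: the functions `summarize_window` and `array_coordinates`, the grid resolution `cells_per_degree`, and the slot length `slot_s` are assumptions chosen for illustration.

```python
import math
import statistics


def summarize_window(samples_db):
    """Reduce a few seconds of dB readings to the (median, peak) pair."""
    if not samples_db:
        raise ValueError("window must contain at least one sample")
    return statistics.median(samples_db), max(samples_db)


def array_coordinates(lat, lon, timestamp_s, cells_per_degree=10_000, slot_s=60):
    """Map a reading to integer (x, y, t) coordinates of a space-time array.

    cells_per_degree and slot_s are illustrative resolutions, not values
    taken from the slides.
    """
    x = math.floor(lon * cells_per_degree)
    y = math.floor(lat * cells_per_degree)
    t = int(timestamp_s // slot_s)
    return x, y, t
```

Readings that share the same (x, y, t) coordinates land in the same array cell, which is where overlapping data has to be reconciled.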
10. Noise Modeling
• Data from the crowd is noisy, skewed, and sparse
• Raw data is not shown to the end users
• Models to deal with
– Overlapping data
– Missing data
11. Data Export and Visualization
• Data from SciDB is
– converted to RDF
– stored in dipLODocus[RDF]
– available via SPARQL
• Visualization
– Overlay noise level on a map
– Additional chart for time evolution
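As an illustration of the conversion step, one array cell could be serialized as N-Triples along these lines. This is a sketch under stated assumptions: the `http://example.org/noizcrowd#` namespace, the property names `intervalStart` and `medianDb`, and the function `cell_to_ntriples` are hypothetical, and the slides do not show the actual vocabulary. Only the W3C Basic Geo and XSD namespaces are real.

```python
EX = "http://example.org/noizcrowd#"              # hypothetical namespace
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary
XSD = "http://www.w3.org/2001/XMLSchema#"


def cell_to_ntriples(cell_id, lat, lon, start_iso, median_db):
    """Serialize one array cell as a list of N-Triples lines."""
    subject = f"<{EX}observation/{cell_id}>"
    return [
        f'{subject} <{GEO}lat> "{lat}"^^<{XSD}double> .',
        f'{subject} <{GEO}long> "{lon}"^^<{XSD}double> .',
        f'{subject} <{EX}intervalStart> "{start_iso}"^^<{XSD}dateTime> .',
        f'{subject} <{EX}medianDb> "{median_db}"^^<{XSD}double> .',
    ]


# A SPARQL query a client might send, using the same hypothetical vocabulary.
EXAMPLE_QUERY = """
PREFIX ex: <http://example.org/noizcrowd#>
SELECT ?obs ?db WHERE {
  ?obs ex:medianDb ?db .
  FILTER(?db > 70)
}
"""
```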
13. Data Models
• Spatial Interpolation
– In the same time interval, data from different
locations
– Needs to be computationally simple (large volume)
– Bi-dimensional range queries in space (SciDB)
– K-nearest neighbor interpolation
– Computed in parallel
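A k-nearest-neighbor interpolation over readings from one time interval might look like the sketch below. The slides do not say how the neighbors are combined, so the unweighted mean is an assumption (inverse-distance weighting would be an equally plausible choice). The function name `knn_interpolate` and the flat (x, y) coordinates in meters are also illustrative.

```python
import math


def knn_interpolate(readings, x, y, k=3):
    """Estimate the noise level at (x, y) from the k nearest readings.

    readings: list of (x, y, db) tuples from the same time interval.
    Assumption: neighbors are combined with an unweighted mean.
    """
    if not readings:
        raise ValueError("no readings available for this interval")
    nearest = sorted(readings, key=lambda r: math.hypot(r[0] - x, r[1] - y))[:k]
    return sum(r[2] for r in nearest) / len(nearest)


readings = [(0, 0, 50.0), (10, 0, 60.0), (0, 10, 70.0), (100, 100, 90.0)]
estimate = knn_interpolate(readings, 1, 1, k=3)  # the far reading is ignored
```

Each query point is independent of the others, which is what makes the computation easy to run in parallel.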
14. Data Models
• Temporal interpolation
– Short ranges (minutes): like spatial interpolation, in 3D
– Long ranges, look for patterns and infer
• E.g., every Monday at 11 am we measure 50 dB, and one
Monday measurement is missing
• E.g., the same measurement (50 dB) in the same area 2 h ago
and now
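The long-range case can be sketched as a lookup of past readings taken in the same weekly slot. This is a simplified illustration, assuming a slot is defined by weekday and hour and that the missing value is filled with the mean of the matching readings. The function `infer_from_weekly_pattern` is a hypothetical name.

```python
from datetime import datetime
from statistics import mean


def infer_from_weekly_pattern(history, when):
    """Fill a missing reading from past readings in the same weekly slot.

    history: list of (datetime, db) tuples for one area.
    Returns None when no reading shares the weekday and hour of `when`.
    """
    same_slot = [
        db for ts, db in history
        if ts.weekday() == when.weekday() and ts.hour == when.hour
    ]
    return mean(same_slot) if same_slot else None


history = [
    (datetime(2013, 3, 4, 11, 0), 50.0),    # a Monday, 11 am
    (datetime(2013, 3, 11, 11, 30), 50.0),  # the following Monday
    (datetime(2013, 3, 12, 11, 0), 70.0),   # a Tuesday, ignored below
]
missing_monday = infer_from_weekly_pattern(history, datetime(2013, 3, 18, 11, 0))
```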
15. Noise Propagation Models
• We adopt an existing model that takes into
account:
– Sound power
– Distance from source
– Directivity
– Atmospheric absorption
– Excess attenuation (we use meteorological conditions)
• Difficult to measure with a smartphone
• Treated as constant in a given region (using GPS info)
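The slides do not name the adopted model. A common textbook form for a point source in a free field combines the same five ingredients: received level = source power level + directivity index, minus spherical spreading (20 log10 of the distance plus about 11 dB), minus atmospheric absorption, minus excess attenuation. The sketch below implements that form. The function and parameter names are assumptions, and absorption is modeled as a constant rate per meter for simplicity.

```python
import math


def received_level_db(source_power_db, distance_m, directivity_db=0.0,
                      absorption_db_per_m=0.0, excess_attenuation_db=0.0):
    """Sound pressure level at distance_m from a point source (free field).

    Textbook form, not necessarily the model used by NoizCrowd:
    Lp = Lw + DI - (20*log10(r) + 10*log10(4*pi)) - a*r - A_excess
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    spreading = 20 * math.log10(distance_m) + 10 * math.log10(4 * math.pi)
    absorption = absorption_db_per_m * distance_m
    return (source_power_db + directivity_db - spreading
            - absorption - excess_attenuation_db)
```

With no absorption or excess attenuation, doubling the distance lowers the level by about 6 dB, which is the usual sanity check for spherical spreading.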
16. Materialization of Models
• Data from models
– Is computationally expensive to generate
– Can be very large, since the models can cover any region
• We do late materialization
– At query time
– Only for the specific request
– Cached and indexed for future requests
– Incremental updates of views, if possible
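Late materialization with caching can be sketched as a small wrapper around the model. This is a minimal in-memory illustration: the class `LateMaterializer`, its methods, and the (region, time slot) cache key are assumptions, and simple invalidation stands in for the incremental view updates mentioned above.

```python
class LateMaterializer:
    """Compute model output at query time and cache it for later requests."""

    def __init__(self, model):
        self._model = model      # callable: (region, time_slot) -> data
        self._cache = {}         # (region, time_slot) -> materialized data
        self.computations = 0    # how many times the model actually ran

    def query(self, region, time_slot):
        key = (region, time_slot)
        if key not in self._cache:
            # Materialize only what this specific request needs.
            self.computations += 1
            self._cache[key] = self._model(region, time_slot)
        return self._cache[key]

    def invalidate(self, region, time_slot):
        """Drop a cached entry, e.g. after new readings arrive for it."""
        self._cache.pop((region, time_slot), None)


store = LateMaterializer(lambda region, slot: 50.0)  # placeholder model
first = store.query("fribourg-center", 11)
second = store.query("fribourg-center", 11)          # served from the cache
```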
17. Performance Evaluation (1)
• 30 outdoor deployments
– 2, 3, or 4 smartphones
– Multiple noise sources
– Urban setting, flat area of 50x50 meters
• Professional-grade noise level meter as gold
standard measurement
• 85% of interpolated data within ±6 dB error
• 63% of interpolated data within ±4 dB error
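A figure such as "85% within ±6 dB" is the share of interpolated values whose absolute difference from the reference meter stays under a tolerance. The sketch below shows that computation. The function name `fraction_within` is hypothetical and the sample values are made up, not the data from the deployments.

```python
def fraction_within(interpolated_db, reference_db, tolerance_db):
    """Share of interpolated values within tolerance_db of the reference."""
    if not interpolated_db or len(interpolated_db) != len(reference_db):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1 for est, ref in zip(interpolated_db, reference_db)
        if abs(est - ref) <= tolerance_db
    )
    return hits / len(interpolated_db)


# Illustrative values only; absolute errors are 2, 5, 1, and 10 dB.
interpolated = [50.0, 55.0, 60.0, 70.0]
reference = [52.0, 60.0, 61.0, 80.0]
within_4 = fraction_within(interpolated, reference, 4.0)
within_6 = fraction_within(interpolated, reference, 6.0)
```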
19. Performance Evaluation (3)
• Error in the estimated sound level of the source
– 16% with 3 measurements
– 10% with 4 measurements
– 9% with 5 measurements
• Source location
– 3m error on average
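The slides do not describe how the source level and location are estimated. One simple way to do it, shown here purely as an assumption, is a grid search: for each candidate position, compute the source power each measurement would imply under spherical spreading, and keep the position where those implied powers agree best. The function `estimate_source`, the 1 m grid step, and the free-field model are all illustrative choices.

```python
import math

SPREADING_CONST_DB = 10 * math.log10(4 * math.pi)  # about 11 dB


def estimate_source(measurements, area_m=50.0, step_m=1.0, min_dist_m=0.5):
    """Grid-search a source position and power from (x, y, level_db) readings.

    Assumes a single point source and free-field spherical spreading.
    Returns (x, y, source_power_db) for the best candidate on the grid.
    """
    if len(measurements) < 3:
        raise ValueError("need at least three measurements")
    best = None
    steps = int(area_m / step_m) + 1
    for i in range(steps):
        for j in range(steps):
            cx, cy = i * step_m, j * step_m
            # Source power implied by each reading if the source were here.
            implied = []
            for x, y, level in measurements:
                d = max(math.hypot(x - cx, y - cy), min_dist_m)
                implied.append(level + 20 * math.log10(d) + SPREADING_CONST_DB)
            mean_power = sum(implied) / len(implied)
            spread = sum((p - mean_power) ** 2 for p in implied)
            if best is None or spread < best[0]:
                best = (spread, cx, cy, mean_power)
    return best[1], best[2], best[3]
```

With noise-free synthetic readings the search recovers the grid point where the source sits. Real smartphone readings would leave a residual error.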
20. NoizCrowd - Conclusions
• Large scale data is key for decision making
• Crowd-source noise level data using mobiles
– Scale-out using an array backend
– Generate missing data and visualize
• Next steps
– Android app
– Data recording as background feature
– Additional materialization strategies
http://exascale.info