SlideShare une entreprise Scribd logo
1  sur  94
1 
Intelligent Data Processing for the Internet of 
Things 
Payam Barnaghi 
Institute for Communication Systems (ICS) 
University of Surrey 
Guildford, United Kingdom 
International “IoT 360″ Summer School 
October 29th – November 1st, 2014 – Rome, Italy
2 
Key characteristics of IoT devices 
−Often inexpensive sensors (actuators) equipped with a radio 
transceiver for various applications, typically low data rate ~ 
10-250 kbps. 
−Deployed in large numbers 
−The sensors should coordinate to perform the desired task. 
−The acquired information (periodic or event-based) is 
reported back to the information processing centre (or some 
cases in-network processing is required) 
−Solutions are often application-dependent. 
2
3 
Beyond conventional sensors 
− Human as a sensor (citizen sensors) 
− e.g. tweeting real world data and/or events 
− Software sensors 
− e.g. Software agents/services generating/representing 
data 
Road block, A3 
Road block, A3 
Suggest a different route
Internet of Things: The story so far 
P. Barnaghi, A. Sheth, “The Internet of Things: The story so far”, IEEE IoT Newsletter, September 2014.
5 
The benefits of data processing in IoT 
− Turn 12 terabytes of Tweets created each day into sentiment 
analysis related to different events/occurrences or relate them to 
products and services. 
− Convert (billions of) smart meter readings to better predict and 
balance power consumption. 
− Analyze thousands of traffic, pollution, weather, congestion, public 
transport and event sensory data to provide better traffic and 
smart city management. 
− Monitor patients, elderly care and much more… 
− Requires: real-time, reliable, efficient (for low power and resource 
limited nodes), and scalable solutions. 
Partially adapted from: What is Bog Data?, IBM
Not just Volume… 
… but also Data Dynamicity: 
How can we efficiently deal with: 
- Large amounts of (heterogeneous/distributed) data? 
- Both static and dynamic data? 
- In a re-usable, modular, flexible way? 
- Integrate different types of data 
- Provide hypothesis and create more context-aware solutions 
Adapted from: M. Hauswirth. A. Mileo, Insight, National University of Ireland, Galway.
Data Volume 
AnyThing 
AnyPlace AnyTime 
Security, Reliability, 
Trust and Privacy 
Societal Impacts, Economic Values 
and Viability 
Services and Applications 
Networking and 
Communication
Data is not what we want or is it?
What we need are insights 
and actionable-information
Diffusion of innovation 
image source: Wikipedia 
IoT
Problem #1 
Data: We seem to have lots of it… 
Real World Data: it is always difficult to get 
(silos, format, privacy, business interests or 
lack of interest!...)
Problem #2 
Data: interoperability and metadata 
frameworks… 
Real World Data: there are solutions for 
service based (RESTful) access, meta-data/ 
semantic representation frameworks 
(e.g. W3C SSN, HyperCat,…) but none of 
them are still widely adapted.
Problem #3 
Data: quality, reliability… 
Real World Data: data can be noisy, crowed 
source data can be inaccurate, 
contradictory, delay in accessing/processing 
the data…
Problem #4 
Data: having too much data and using 
analytics tools alone won’t solve the 
problem… 
Real World Data: in addition to the HPC 
issues, we need new methods/solutions that 
can provide real-time analysis of dynamic, 
variable quality and multi-modal streams…
Problem #5 
Data: abstraction, discovering the 
associations… 
Real World Data: co-occurrence vs. 
causation; we need hypothesis, background 
knowledge,… 
After all data is not what we are really 
after…
We need more linked open data
Sometimes it’s even better if we have: 
(near) real-time 
linked open data 
Streams
or even better than that if we have: 
(near) real-time 
linked open data 
Streams 
+ 
meta-data (semantic annotations) 
+ 
Adaptable and scalable analytics tools 
+ 
Sufficient background knowledge
IoT Data 
19 
?
Current focus on Big Data 
− Emphasis on power of data and data mining 
solutions 
− Technology solutions to handle large volumes of 
data; e.g. Hadoop, NoSQL, Graph Databases, … 
− Trying to find patterns and trends from large 
volumes of data…
Myths About Big Data 
− Big Data is only about massive data volume 
− Big Data means Hadoop 
− Big Data means unstructured data 
− If we have enough data we can draw conclusions 
(enough here often means massive amounts) 
− NoSQL means No SQL 
− It is about increasing computational power and 
taking more data and running data mining 
algorithms. 
21 
Some of the items are adapted from: Brain Gentile, http://mashable.com/2012/06/19/big-data-myths/
What happens if we only focus on data 
− Number of burgers consumed per day. 
− Number of cats outside. 
− Number of people checking their facebook 
account. 
− What insight would you draw? 
22
It is also important to note what type of 
problems we expect to solve.
Back to the future 
24
Future cities: a view from 1998 
Source LAT Times, http://documents.latimes.com/la-2013/ 25
Source: http://robertluisrabello.com/denial/traffic-in-la/#gallery[default]/0/ 26 
Source: wikipedia
27
101 Smart City Use-case Scenarios 
http://www.ict-citypulse.eu/page/content/smart-city-use-cases-and-requirements
29 
Data alone is not enough 
− Domain knowledge 
− Machine interpretable meta-data 
− Delivery, sharing and representation services 
− Query, discovery, aggregation services 
− Publish, subscribe, notification, and access 
interfaces/services 
− More open solutions for innovation and citizen participation 
− Efficient feedback and control mechanisms 
− Social network and social system analysis 
− In cities, interactions with people and social systems is the 
key.
30 
We need an Integrated Approach
31 
Processing steps
32 
Storing, handling and processing 
the data 
Image courtesy: IEEE Spectrum
33 
Technical Challenges 
− Discovery: finding appropriate device and data sources 
− Access: Availability and (open) access to data resources 
and data 
− Search: querying for data 
− Integration: dealing with heterogeneous devices, networks 
and data 
− Large-scale data mining, adaptable learning and efficient 
computing and processing 
− Interpretation: translating data to knowledge that can be 
used by people and applications 
− Scalability: dealing with large numbers of devices and a 
myriad of data and the computational complexity of 
interpreting the data.
34 
IoT Data Access 
− Publish/Subscribe (long-term/short-term) 
− Ad-hoc query 
− The typical types of data request for sensory data: 
− Query based on 
− ID (resource/service) – for known resources 
− Location 
− Type 
− Time – requests for freshness data or historical data; 
− One of the above + a range [+ Unit of Measurement] 
− Type/Location/Time + A combination of Quality of Information 
attributes 
− An entity of interest (a feature of an entity on interest) 
− Complex Data Types (e.g. pollution data could be a combination of 
different types)
35 
IoT Data in the Cloud 
Image courtesy: http://images.mathrubhumi.com 
http://www.anacostiaws.org/userfiles/image/Blog-Photos/river2.jpg
Comparing IoT data streams with 
conventional multimedia streams 
Source: P. Barnaghi, W. Wang, L. Dong, C. Wang, "A Linked-data Model for Semantic Sensor Streams", in the Proc. of 
IEEE International Conference on Internet of Things (iThings 2013), August 2013.
37 
Describing IoT Data: An example 
Time 
Location 
Type 
Value 
Link to QoI metadata 
UTC 
#GeoHash 
#Hash 
[DataType, Value] 
URI 
Ontology for 
common 
types
38 
Observation and Measurement Value 
GeoHash 
UTC time 
Standard XSD data type 
UTC time (in Java) : The time indicated is returned represented as the distance, measured in milliseconds, 
of that time from the epoch (00:00:00 GMT on January 1, 1970).
39 
GeoHashing 
− For example Guildford: lat: 51.235401 and 
long: -0.574600 can be hashed as: gcpe6zjeffgp 
− It can be used as: 
− A unique identifier 
− represent point data as hash string 
− It uses Base 32 encoding and bit interleaving 
− It’s used for geo-tagging (and is a symmetric technique) 
− Place close to each other will have similar prefix (string similarity) 
− Limitations: 
− We could have Geohash codes with no common prefix 
− Edge case (locations close to each other but on opposite sides of the 
Equator) 
− A meridian point (line of longitude)
40 
GeoHash Example 
Sample locations on a Google Map and their 
equivalent geohash strings; 
- close locations have similar prefixes
IoT Data Processing 
WSN 
WSN 
WSN 
WSN 
WSN 
Network-enabled 
Devices 
Network-enabled 
Devices 
Network 
services/storage 
and processing 
units Data/service 
access at 
application level 
Data collections 
and processing 
within the 
networks 
Data Discovery 
Service/ 
Resource 
Discovery
42 
In-network processing 
− Mobile Ad-hoc Networks can be seen as a set of nodes that 
deliver bits from one end to the other; 
− WSNs, on the other end, are expected to provide 
information, not necessarily original bits 
− Gives additional options 
− e.g., manipulate or process the data in the network 
− Main example: aggregation 
− Applying aggregation functions to a obtain an average value of 
measurement data 
− Typical functions: minimum, maximum, average, sum, … 
− Not amenable functions: median 
Source: Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks 
Holger Karl, Andreas Willig, chapter 3, Wiley, 2005 .
43 
In-network processing 
− Depending on application, more sophisticated processing of 
data can take place within the network 
− Example edge detection: locally exchange raw data with 
neighboring nodes, compute edges, only communicate edge 
description to far away data sinks 
− Example tracking/angle detection of signal source: Conceive of 
sensor nodes as a distributed microphone array, use it to 
compute the angle of a single source, only communicate this 
angle, not all the raw data 
− Exploit temporal and spatial correlation 
− Observed signals might vary only slowly in time; so no need to 
transmit all data at full rate all the time 
− Signals of neighboring nodes are often quite similar; only try 
to transmit differences (details a bit complicated, see later) 
Source: Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks 
Holger Karl, Andreas Willig, chapter 3, Wiley, 2005 .
Data Aggregation 
− Computing a smaller representation of a number of data items (or 
messages) that is extracted from all the individual data items. 
− For example computing min/max or mean of sensor data. 
− More advance aggregation solutions could use approximation 
techniques to transform high-dimensionality data to lower-dimensionality 
abstractions/representations. 
− The aggregated data can be smaller in size, represent 
patterns/abstractions; so in multi-hop networks, nodes can 
receive data form other node and aggregate them before 
forwarding them to a sink or gateway. 
− Or the aggregation can happen on a sink/gateway node.
Aggregation example 
− Reduce number of transmitted bits/packets by applying an 
aggregation function in the network 
1 
1 
3 
1 
1 
6 
1 
1 
1 
1 
1 
1 
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor 
Networks, chapter 3, Wiley, 2005 .
Efficacy of an aggregation mechanism 
− Accuracy: difference between the resulting value or representation 
and the original data 
− Some solutions can be lossless or lossly depending on the 
applied techniques. 
− Completeness: the percentage of all the data items that are 
included in the computation of the aggregated data. 
− Latency: delay time to compute and report the aggregated data 
− Computation foot-print; complexity; 
− Overhead: the main advantage of the aggregation is reducing the 
size of the data representation; 
− Aggregation functions can trade-off between accuracy, latency and 
overhead; 
− Aggregation should happen close to the source.
Publish/Subscribe 
− Achieved by publish/subscribe paradigm 
− Idea: Entities can publish data under certain names 
− Entities can subscribe to updates of such named data 
− Conceptually: Implemented by a software bus 
− Software bus stores subscriptions, published data; names used 
as filters; subscribers notified when values of named data 
changes 
Publisher 1 Publisher 2 
Software bus 
Subscriber 1 Subscriber 2 Subscriber 3 
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor 
Networks, chapter 12, Wiley, 2005 .
MQTT Pub/Sub Protocol 
− MQ Telemetry Transport (MQTT) is a lightweight broker-based 
publish/subscribe messaging protocol. 
− MQTT is designed to be open, simple, lightweight and easy to 
implement. 
− These characteristics make MQTT ideal for use in constrained 
environments, for example in IoT. 
−Where the network is expensive, has low bandwidth or is 
unreliable 
−When run on an embedded device with limited processor or 
memory resources; 
− A small transport overhead (the fixed-length header is just 2 
bytes), and protocol exchanges minimised to reduce network 
traffic 
− MQTT was developed by Andy Stanford-Clark of IBM, and Arlen 
Nipper of Cirrus Link Solutions. 
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
MQTT 
− It supports publish/subscribe message pattern to provide one-to-many 
message distribution and decoupling of applications 
− A messaging transport that is agnostic to the content of the payload 
− The use of TCP/IP to provide basic network connectivity 
− Three qualities of service for message delivery: 
− "At most once", where messages are delivered according to the best 
efforts of the underlying TCP/IP network. Message loss or duplication 
can occur. 
− This level could be used, for example, with ambient sensor data 
where it does not matter if an individual reading is lost as the next 
one will be published soon after. 
− "At least once", where messages are assured to arrive but duplicates 
may occur. 
− "Exactly once", where message are assured to arrive exactly once. This 
level could be used, for example, with billing systems where duplicate 
or lost messages could lead to incorrect charges being applied. 
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
MQTT Message Format 
− The message header for each MQTT command message contains a 
fixed header. 
− Some messages also require a variable header and a payload. 
− The format for each part of the message header: 
— DUP: Duplicate delivery 
— QoS: Quality of Service 
— RETAIN: RETAIN flag 
—This flag is only used on PUBLISH messages. When a client 
sends a PUBLISH to a server, if the Retain flag is set (1), the 
server should hold on to the message after it has been delivered 
to the current subscribers. 
—This allows new subscribers to instantly receive data with the 
retained, or Last Known Good, value. 
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
Sensor Data as time-series data 
− The sensor data (or IoT data in general) can be seen as time-series 
data. 
− A sensor stream refers to a source that provide sensor data over 
time. 
− The data can be sampled/collected at a rate (can be also variable) 
and is sent as a series of values. 
− Over time, there will be a large number of data items collected. 
− Using time-series processing techniques can help to reduce the 
size of the data that is communicated; 
− Let’s remember, communication can consume more energy 
than communication;
Sensor Data as time-series data 
− Different representation method that introduced for time-series 
data can be applied. 
− The goal is to reduce the dimensionality (and size) of the data, to 
find patterns, detect anomalies, to query similar data; 
− Dimensionality reduction techniques transform a data series with 
n items to a representation with w items where w < n. 
− This functions are often lossy in comparison with solutions like 
normal compression that preserve all the data. 
− One of these techniques is called Symbolic Aggregation 
Approximation (SAX). 
− SAX was originally proposed for symbolic representation of time-series 
data; it can be also used for symbolic representation of 
time-series sensor measurements. 
− The computational foot-print of SAX is low; so it can be also used 
as a an in-network processing technique.
53 
In-network processing 
Using Symbolic Aggregate Approximation (SAX) 
SAX Pattern (blue) with word length of 20 and a vocabulary of 10 symbols 
over the original sensor time-series data (green) 
Source: P. Barnaghi, F. Ganz, C. Henson, A. Sheth, "Computing Perception from Sensor Data", 
in Proc. of the IEEE Sensors 2012, Oct. 2012. 
fggfffhfffffgjhghfff 
jfhiggfffhfffffgjhgi 
fggfffhfffffgjhghfff
Symbolic Aggregate Approximation 
(SAX) 
− SAX transforms time-series data into symbolic string 
representations. 
− Symbolic Aggregate approXimation was proposed by Jessica Lin et 
al at the University of California –Riverside; 
− http://www.cs.ucr.edu/~eamonn/SAX.htm . 
− It extends Piecewise Aggregate Approximation (PAA) symbolic 
representation approach. 
− SAX algorithm is interesting for in-network processing in WSN 
because of its simplicity and low computational complexity. 
− SAX provides reasonable sensitivity and selectivity in representing 
the data. 
− The use of a symbolic representation makes it possible to use 
several other algorithms and techniques to process/utilise SAX 
representations such as hashing, pattern matching, suffix trees 
etc.
Processing Steps in SAX 
− SAX transforms a time-series X of length n into the string of 
arbitrary length, where typically, using an alphabet A of size a > 
2. 
− The SAX algorithm has two main steps: 
− Transforming the original time-series into a PAA representation 
− Converting the PAA intermediate representation into a string 
during. 
− The string representations can be used for pattern matching, 
distance measurements, outlier detection, etc.
Piecewise Aggregate Approximation 
− In PAA, to reduce the time series from n dimensions to w 
dimensions, the data is divided into w equal sized “frames.” 
− The mean value of the data falling within a frame is calculated and 
a vector of these values becomes the data-reduced 
representation. 
− Before applying PAA, each time series to have a needs to be 
normalised to achieve a mean of zero and a standard deviation of 
one. 
− The reason is to avoid comparing time series with different 
offsets and amplitudes; 
Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In 
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
SAX- normalisation before PAA 
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 
Mean (μ): μ= (2+3+4.5+7.6+4+2+2+2+3+1)/10= 3.11 
Standard deviation (σ): 
(2-3.11)2 = 1.2321 
(3-3.11)2 = 0.0121 
(4.5-3.11)2 = 1.9321 
(7.6-3.11)2 = 20.1601 
(4-3.11)2 = 0.7921 
(2-3.11)2 = 1.2321 
(2-3.11)2 = 1.2321 
(2-3.11)2 = 1.2321 
(3-3.11)2 = 0.0121 
(1-3.11)2 = 4.4521 
1.2321+0.0121+ 1.9321+ 20.1601+ 
0.7921+ 1.2321+ 1.2321+ 1.2321+ 
1.2321+ 0.0121+4.4521 = 33.5211 
σ = √ (33.5211/10) = 1.83087683911
Normalisation 
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 
Normalised: zi = (ci – μ)/ σ 
σ = 1.83087683911 
μ = 3.11 
z1 = (2- 3.11)/1.83087683911 = -0.606 
z2 = (3-3.11)/ 1.83087683911= -0.600 
z3 = (4.5-3.11)/ 1.83087683911= 2.452 
z4 = (7.6-3.11)/ 1.83087683911= -0.600 
z5 = (4-3.11)/ 1.83087683911= 0.486 
z6 = (2-3.11)/ 1.83087683911= -0.606 
z7 = (2-3.11)/ 1.83087683911= -0.606 
z8 = (2-3.11)/ 1.83087683911= -0.606 
z9 = (3-3.11)/ 1.83087683911= -0.600 
z10 = (1-3.11)/ 1.83087683911= -1.152 
Normalised Timeseries (z): -0.606, -0.600, 2.452, -0.600, 
0.486, -0.606, -0.606, -0.606 , -0.600, -1.152
PAA calculation 
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 
Normalised Timeseries (z): -0.606, -0.600, 2.452, -0.600, 
0.486, -0.606, -0.606, -0.606 , -0.600, -1.152 
PAA (w=5): -0.603, 0.926, -0.06, -0.606, 0.273
PAA to SAX Conversion 
− Conversion of the PAA representation of a time-series into 
SAX is based on producing symbols that correspond to the 
time-series features with equal probability. 
− The SAX developers have shown that time-series which are 
normalised (zero mean and standard deviation of 1) follow 
a Normal distribution (Gaussian distribution). 
− The SAX method introduces breakpoints that divides the 
PAA representation to equal sections and assigns an 
alphabet for each section. 
− For defining breakpoints, Normal inverse cumulative 
distribution function
Breakpoints in SAX 
− “Breakpoints: breakpoints are a sorted list of numbers B = 
β 1,…, β a-1 such that the area under a N(0,1) Gaussian curve 
from βi to βi+1 = 1/a”. 
Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In 
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
Alphabet representation in SAX 
− Let’s assume that we will have 4 symbols alphabet: a,b,c,d 
− As shown in the table in the previous slide, the cut lines for 
this alphabet (also shown as the thin red lines on the plot 
below) will be { -0.67, 0, 0.67 } 
Source: JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
SAX Represetantion 
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 
Normalised Timeseries (z): -0.606, -0.600, 2.452, 
-0.600, 0.486, -0.606, -0.606, -0.606 , -0.600, 
-1.152 
PAA (w=5): -0.603, 0.926, -0.06, -0.606, 0.273 
Cut off ranges: {-0.67, 0, 0.67} 
Alphabet: a ,b ,c, d 
SAX representation: bdbbc
Features of the SAX technique 
− SAX divides a time series data into equal segments and then 
creates a string representation for each segment. 
− The SAX patterns create the lower-level abstractions that are used 
to create the higher-level interpretation of the underlying data. 
− The string representation of the SAX mechanism enables to 
compare the patterns using a specific type of string similarity 
function.
High-level 
information/ 
knowledge 
65 
A sample data processing framework 
Temporal data 
(extracted from 
descriptions) 
Day-time 
Night-time 
High-level abstractions 
Domain knowledge 
fggfffhfffffgjhghfff dddfffffffffffddd cccddddccccdddccc dddcdcdcdcddasddd aaaacccaaaaaaaacccc 
Raw sensor data stream Raw sensor data stream Raw sensor data stream 
PIR Sensor Light Sensor Temperature 
Sensor 
Attendance Phone Hot 
Temperature 
Cold 
Temperature Bright 
Office room 
BA0121 
On going 
meeting 
Window has 
been left open 
…. 
Spatial data 
(extracted from 
descriptions) 
Thematic data 
(low level 
abstractions) 
Intelligent Processing 
Observations 
SAX Patterns 
Raw sensor data 
(or Annotated data) 
… 
…. 
Intelligent 
Processing/ 
Reasoning
Correlation analysis 
Slides: Maria Bermudez-Edo
Correlations: linear and nonlinear 
67
Co-occurrence analysis 
68
Correlation analysis 
69
Challenges 
− Complexity 
− Co-occurrences vs. Causation 
70
Data Discovery 
Slides: Syed Amir Hoseinitabatabaei
A discovery 
method in the IoT 
time 
location 
type 
Query formulating 
[#location || ##ttyyppee || ttiimmee]] 
Discovery ID 
Discovery/ 
DHT Server 
Data repository 
(archived data) 
#location 
#type 
#location 
#type 
Data hypercube 
#location 
#type 
Gateway 
Core network 
Logical Connection 
Network Connection 
Data
An example: a discovery method in the IoT 
S. A. Hoseinitabatabaei, P. Barnaghi, C. Wang, R. Tafazolli, L. Dong, "A Distributed Data Discovery Mechanism for the Internet of 73 
Things", 2014.
An example: a discovery method in the IoT 
S. A. Hoseinitabatabaei, P. Barnaghi, C. Wang, R. Tafazolli, L. Dong, "A Distributed Data Discovery Mechanism for the Internet of 74 
Things", 2014.
Adaptive learning 
Slides: Daniel Puschmann, Masound Hassanpour
Adaptable and dynamic learning 
methods 
http://kat.ee.surrey.ac.uk/
Equilibrium in transient and non-uniform 
world 
A D 
B C 
Image source for equilibrium diagram: John D. Hey, The University of York.
Social data analysis: A case study 
Slides: Pramod Anantharam, Kno.e.sis Centre, 
Wright State University
Social data analysis: A case study 
− Are people talking about city infrastructure on 
twitter? 
− Can we extract city infrastructure related events 
from twitter? 
− How can we leverage event and location 
knowledge bases for event extraction? 
− How well can we extract city events? 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Do people talk about city 
infrastructure on Twitter? 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University. 80
Some challenges in extracting 
events from Tweets 
− No well accepted definition of “events related to a 
city”. 
− Tweets are short (140 characters) and its 
informal nature make it hard to analyze 
− Entity, location, time, and type of the event 
− Multiple reports of the same event and sparse 
report of some events (biased sample) 
− Numbers don’t necessarily indicate intensity 
− Validation of the solution is hard due to the open 
domain nature of the problem 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University. 81
Event extraction techniques 
− N-grams + Regression 
− Text analysis to extract uni- and bi-grams (event markers) 
− Feature selection to select best possible event markers 
− Apply regression to predict P(Y|X) where Y is the target (rainfall) 
and X is the input (event marker). 
− Clustering 
− Create event clusters incrementally over time 
− Identify clusters of interest based on its relevance (manual 
inspection) 
− Granularity remains at the tweet/cluster level (tweets are 
assigned to clusters of interest) 
− Sequence Labeling (CRFs) 
− Text analysis to extract features such as named entities, POS1 
tagging 
− Each event indicator is modeled as a mixture of event types that 
are latent variables 
− Each type corresponds to a distribution over named entities n 
(labels assigned to event types by manual inspection) and other 
features 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
City event extraction 
− Event extraction should be open domain (no a 
priori event types) with event metadata. 
− Incorporate background knowledge related to city 
related events e.g., 511.org hierarchy, SCRIBE 
ontology, city location names. 
− Assess the intensity of an event using content 
and network cues. 
− Robust to noise, informal nature, and variability 
of data. 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
N-grams + Regression 
− Open domain: works best when there is a reference 
corpus to extract n-grams 
− Event metadata: cannot distinguish between entities 
and hence hard to extract event metadata 
− Background knowledge: incorporating domain 
vocabulary (e.g., subsumption) is not natural 
− Event intensity: regression maps the event indicators to 
some quantified values 
− Robustness: quite robust if there is a reference corpus 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Clustering 
− Open domain: works well for domains with no a priori 
knowledge of events (may need human inspection) 
− Background knowledge: incorporating domain 
vocabulary is not natural 
− Event intensity: not captured 
− Robustness: quite robust for twitter data with enough 
data for each cluster 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Sequence Labeling (CRFs) 
− Open domain: works well for domains with no a priori 
knowledge of events (may need human inspection) 
− Event metadata: event metadata extraction is captured 
naturally with the named entities 
− Background knowledge: incorporating domain 
vocabulary is quite natural 
− Event intensity: part-of-speech tag may indirectly 
capture intensity 
− Robustness: with a deeper model for capturing context, 
quite robust for twitter data 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
City event extraction solution 
architecture 
POS 
Tagging 
Tweets from a city POS 
City Infrastructure 
Tagging 
Hybrid NER+ 
Event term 
extraction 
Hybrid NER+ 
Event term 
extraction 
Impact 
Assessment 
Impact 
Assessment 
Temporal 
Estimation 
Temporal 
Estimation 
Event 
Event 
OOSSMM L Looccaatitoionnss SSCCRRIBIBEE o onntotolologgyy Aggregation 
Aggregation 
GGeeoohhaasshhiningg 
551111.o.orrgg h hieierraarrcchhyy 
City Event Annotation City Event Extraction 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
City event extraction - Key insights 
a) Space: events reported within a grid (gi ∈G 
where G is a set of all grids in a city)at a certain 
time are most likely reporting the same event 
b) Time: events reported within a time Δt in a grid 
gi are most likely to be reporting the same event 
c) Theme: events with similar entities within a grid 
gi and time Δt are most likely reporting the 
same event 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
0.6 miles 
Max-lat 
Min-lat 
Min-long 
37.7545166015625, -122.40966796875 
Max-long 
0.38 miles 
37.7490234375, -122.40966796875 
37.7545166015625, -122.420654296875 
37.7490234375, -122.420654296875 
4 
37.74933, -122.4106711 
Hierarchical spatial structure of geohash for 
representing locations with variable precision. 
Here the location string is 5H34 
0 1 2 3 4 5 6 
7 8 9 B C D E 
F G H I J K L 
0 1 
2 3 4 
5 6 7 
8 9 
0 1 2 3 4 
5 6 7 
0 1 2 
3 4 5 
6 7 8 
City event extraction – Geohashing
Implmentation 
− City event annotation 
− Automated creation of training data 
− Annotation task (our CRF model vs. baseline CRF model) 
− City event extraction 
− Use aggregation algorithm for event extraction 
− Extracted events AND ground truth 
− Dataset (Aug – Nov 2013) ~ 8 GB of data on disk 
− Over 8 million tweets 
− Over 162 million sensor data points 
− 311 active events and 170 scheduled events 
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Evaluation – extracted events & ground truth
92 
Conclusions 
− A primary goal of interconnecting devices and 
collecting/processing data from them is to create situation 
awareness and enable applications, machines, and human 
users to better understand their surrounding environments. 
− The understanding of a situation, or context, potentially 
enables services and applications to make intelligent 
decisions and to respond to the dynamics of their 
environments. 
− Dynamicity, energy efficiency, multi-modality, 
heterogeneity and volume are among the key challenges. 
− We need to design adaptable and scalable solutions that 
combine knowledge and data engineering (semantics), 
background knowledge (ontologies) and machine learning 
techniques.
Acknowledgements 
− Some parts of the content are adapted from: 
− Holger Karl, Andreas Willig, Protocols and Architectures for 
Wireless Sensor Networks, Protocols and Architectures for 
Wireless Sensor Networks, chapters 3 and 12, Wiley, 2005 . 
− Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 
2003. A symbolic representation of time series, with 
implications for streaming algorithms. In Proceedings of the 
8th ACM SIGMOD workshop on Research issues in data mining 
and knowledge discovery (DMKD '03). ACM, New York, NY, 
USA, 2-11. 
− JMOTIF Time series mining, 
http://code.google.com/p/jmotif/wiki/SAX
Q&A 
− Payam Barnaghi, University of 
Surrey/EU FP7 CityPulse Project 
http://www.ict-citypulse.eu/ 
@pbarnaghi 
p.barnaghi@surrey.ac.uk

Contenu connexe

Tendances

Internet of Things and Data Analytics for Smart Cities
Internet of Things and Data Analytics for Smart CitiesInternet of Things and Data Analytics for Smart Cities
Internet of Things and Data Analytics for Smart CitiesPayamBarnaghi
 
Large-scale data analytics for smart cities
Large-scale data analytics for smart citiesLarge-scale data analytics for smart cities
Large-scale data analytics for smart citiesPayamBarnaghi
 
Internet of Things: The story so far
Internet of Things: The story so farInternet of Things: The story so far
Internet of Things: The story so farPayamBarnaghi
 
Working with real world data
Working with real world dataWorking with real world data
Working with real world dataPayamBarnaghi
 
The Future is Cyber-Healthcare
The Future is Cyber-Healthcare The Future is Cyber-Healthcare
The Future is Cyber-Healthcare PayamBarnaghi
 
How to make cities "smarter"?
How to make cities "smarter"?How to make cities "smarter"?
How to make cities "smarter"?PayamBarnaghi
 
Internet of Things and Large-scale Data Analytics
Internet of Things and Large-scale Data Analytics Internet of Things and Large-scale Data Analytics
Internet of Things and Large-scale Data Analytics PayamBarnaghi
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesPayamBarnaghi
 
Opportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data AnalyticsOpportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data AnalyticsPayamBarnaghi
 
How to make data more usable on the Internet of Things
How to make data more usable on the Internet of ThingsHow to make data more usable on the Internet of Things
How to make data more usable on the Internet of ThingsPayamBarnaghi
 
CityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applications CityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applications PayamBarnaghi
 
What makes smart cities “Smart”?
What makes smart cities “Smart”? What makes smart cities “Smart”?
What makes smart cities “Smart”? PayamBarnaghi
 
CityPulse: Large-scale data analytics for smart cities
CityPulse: Large-scale data analytics for smart cities CityPulse: Large-scale data analytics for smart cities
CityPulse: Large-scale data analytics for smart cities PayamBarnaghi
 
Smart Cities: How are they different?
Smart Cities: How are they different? Smart Cities: How are they different?
Smart Cities: How are they different? PayamBarnaghi
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPayamBarnaghi
 
CityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applicationsCityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applicationsPayamBarnaghi
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things PayamBarnaghi
 
Internet of Things and Data Analytics for Smart Cities and eHealth
Internet of Things and Data Analytics for Smart Cities and eHealthInternet of Things and Data Analytics for Smart Cities and eHealth
Internet of Things and Data Analytics for Smart Cities and eHealthPayamBarnaghi
 
Semantic Technologies for the Internet of Things: Challenges and Opportunities
Semantic Technologies for the Internet of Things: Challenges and Opportunities Semantic Technologies for the Internet of Things: Challenges and Opportunities
Semantic Technologies for the Internet of Things: Challenges and Opportunities PayamBarnaghi
 

Tendances (20)

Internet of Things and Data Analytics for Smart Cities
Internet of Things and Data Analytics for Smart CitiesInternet of Things and Data Analytics for Smart Cities
Internet of Things and Data Analytics for Smart Cities
 
Large-scale data analytics for smart cities
Large-scale data analytics for smart citiesLarge-scale data analytics for smart cities
Large-scale data analytics for smart cities
 
Internet of Things: The story so far
Internet of Things: The story so farInternet of Things: The story so far
Internet of Things: The story so far
 
Working with real world data
Working with real world dataWorking with real world data
Working with real world data
 
The Future is Cyber-Healthcare
The Future is Cyber-Healthcare The Future is Cyber-Healthcare
The Future is Cyber-Healthcare
 
How to make cities "smarter"?
How to make cities "smarter"?How to make cities "smarter"?
How to make cities "smarter"?
 
Internet of Things and Large-scale Data Analytics
Internet of Things and Large-scale Data Analytics Internet of Things and Large-scale Data Analytics
Internet of Things and Large-scale Data Analytics
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart cities
 
Opportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data AnalyticsOpportunities and Challenges of Large-scale IoT Data Analytics
Opportunities and Challenges of Large-scale IoT Data Analytics
 
How to make data more usable on the Internet of Things
How to make data more usable on the Internet of ThingsHow to make data more usable on the Internet of Things
How to make data more usable on the Internet of Things
 
CityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applications CityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applications
 
What makes smart cities “Smart”?
What makes smart cities “Smart”? What makes smart cities “Smart”?
What makes smart cities “Smart”?
 
CityPulse: Large-scale data analytics for smart cities
CityPulse: Large-scale data analytics for smart cities CityPulse: Large-scale data analytics for smart cities
CityPulse: Large-scale data analytics for smart cities
 
Smart Cities: How are they different?
Smart Cities: How are they different? Smart Cities: How are they different?
Smart Cities: How are they different?
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City Applications
 
CityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applicationsCityPulse: Large-scale data analysis for smart city applications
CityPulse: Large-scale data analysis for smart city applications
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
 
Internet of Things and Data Analytics for Smart Cities and eHealth
Internet of Things and Data Analytics for Smart Cities and eHealthInternet of Things and Data Analytics for Smart Cities and eHealth
Internet of Things and Data Analytics for Smart Cities and eHealth
 
Semantic Technologies for the Internet of Things: Challenges and Opportunities
Semantic Technologies for the Internet of Things: Challenges and Opportunities Semantic Technologies for the Internet of Things: Challenges and Opportunities
Semantic Technologies for the Internet of Things: Challenges and Opportunities
 

En vedette

Multi-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksMulti-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksPayamBarnaghi
 
Semantic Sensor Service Networks
Semantic Sensor Service NetworksSemantic Sensor Service Networks
Semantic Sensor Service NetworksPayamBarnaghi
 
Data Modeling and Knowledge Engineering for the Internet of Things
Data Modeling and Knowledge Engineering for the Internet of ThingsData Modeling and Knowledge Engineering for the Internet of Things
Data Modeling and Knowledge Engineering for the Internet of ThingsPayamBarnaghi
 
Spatial Data on the Web
Spatial Data on the WebSpatial Data on the Web
Spatial Data on the WebPayamBarnaghi
 
A Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and ProcessingA Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and ProcessingPayamBarnaghi
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
IoT-Lite: A Lightweight Semantic Model for the Internet of Things
IoT-Lite:  A Lightweight Semantic Model for the Internet of ThingsIoT-Lite:  A Lightweight Semantic Model for the Internet of Things
IoT-Lite: A Lightweight Semantic Model for the Internet of ThingsPayamBarnaghi
 
Internet of Things: Concepts and Technologies
Internet of Things: Concepts and TechnologiesInternet of Things: Concepts and Technologies
Internet of Things: Concepts and TechnologiesPayamBarnaghi
 
MQTT - MQ Telemetry Transport for Message Queueing
MQTT - MQ Telemetry Transport for Message QueueingMQTT - MQ Telemetry Transport for Message Queueing
MQTT - MQ Telemetry Transport for Message QueueingPeter R. Egli
 

En vedette (9)

Multi-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor NetworksMulti-resolution Data Communication in Wireless Sensor Networks
Multi-resolution Data Communication in Wireless Sensor Networks
 
Semantic Sensor Service Networks
Semantic Sensor Service NetworksSemantic Sensor Service Networks
Semantic Sensor Service Networks
 
Data Modeling and Knowledge Engineering for the Internet of Things
Data Modeling and Knowledge Engineering for the Internet of ThingsData Modeling and Knowledge Engineering for the Internet of Things
Data Modeling and Knowledge Engineering for the Internet of Things
 
Spatial Data on the Web
Spatial Data on the WebSpatial Data on the Web
Spatial Data on the Web
 
A Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and ProcessingA Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
A Knowledge-based Approach for Real-Time IoT Stream Annotation and Processing
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
IoT-Lite: A Lightweight Semantic Model for the Internet of Things
IoT-Lite:  A Lightweight Semantic Model for the Internet of ThingsIoT-Lite:  A Lightweight Semantic Model for the Internet of Things
IoT-Lite: A Lightweight Semantic Model for the Internet of Things
 
Internet of Things: Concepts and Technologies
Internet of Things: Concepts and TechnologiesInternet of Things: Concepts and Technologies
Internet of Things: Concepts and Technologies
 
MQTT - MQ Telemetry Transport for Message Queueing
MQTT - MQ Telemetry Transport for Message QueueingMQTT - MQ Telemetry Transport for Message Queueing
MQTT - MQ Telemetry Transport for Message Queueing
 

Similaire à Intelligent Data Processing for the Internet of Things

Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Artificial Intelligence Institute at UofSC
 
Discovering Things and Things’ data/services
Discovering Things and  Things’ data/servicesDiscovering Things and  Things’ data/services
Discovering Things and Things’ data/servicesPayamBarnaghi
 
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital ForensicsBig Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital ForensicsSherinMariamReji05
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesCityPulse Project
 
1_IoT_Fundamentals.ppt
1_IoT_Fundamentals.ppt1_IoT_Fundamentals.ppt
1_IoT_Fundamentals.pptHodaKHajeh1
 
General introduction to IoTCrawler
General introduction to IoTCrawlerGeneral introduction to IoTCrawler
General introduction to IoTCrawlerIoTCrawler
 
Internet of Things & Big Data
Internet of Things & Big DataInternet of Things & Big Data
Internet of Things & Big DataArun Rajput
 
Data Modelling and Knowledge Engineering for the Internet of Things
Data Modelling and Knowledge Engineering for the Internet of ThingsData Modelling and Knowledge Engineering for the Internet of Things
Data Modelling and Knowledge Engineering for the Internet of ThingsCory Andrew Henson
 
Internet of things (IOT) connects physical to digital
Internet of things (IOT) connects physical to digitalInternet of things (IOT) connects physical to digital
Internet of things (IOT) connects physical to digitalEslam Nader
 
Week 8 - Module 19 - PPT- Internet of Things for Libraries.pdf
Week 8 - Module 19 - PPT- Internet of Things for Libraries.pdfWeek 8 - Module 19 - PPT- Internet of Things for Libraries.pdf
Week 8 - Module 19 - PPT- Internet of Things for Libraries.pdfMohamedAli899919
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_publicAttila Barta
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
GK NU CS 101 Session 1B (1).ppt
GK NU CS 101 Session 1B (1).pptGK NU CS 101 Session 1B (1).ppt
GK NU CS 101 Session 1B (1).pptPiyushRanjan269184
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...Edward Curry
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
SuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-finalSuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-finalstelligence
 

Similaire à Intelligent Data Processing for the Internet of Things (20)

Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
 
Discovering Things and Things’ data/services
Discovering Things and  Things’ data/servicesDiscovering Things and  Things’ data/services
Discovering Things and Things’ data/services
 
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital ForensicsBig Data in Distributed Analytics,Cybersecurity And Digital Forensics
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart cities
 
1_IoT_Fundamentals.ppt
1_IoT_Fundamentals.ppt1_IoT_Fundamentals.ppt
1_IoT_Fundamentals.ppt
 
General introduction to IoTCrawler
General introduction to IoTCrawlerGeneral introduction to IoTCrawler
General introduction to IoTCrawler
 
Internet of Things & Big Data
Internet of Things & Big DataInternet of Things & Big Data
Internet of Things & Big Data
 
Data Modelling and Knowledge Engineering for the Internet of Things
Data Modelling and Knowledge Engineering for the Internet of ThingsData Modelling and Knowledge Engineering for the Internet of Things
Data Modelling and Knowledge Engineering for the Internet of Things
 
Internet of things (IOT) connects physical to digital
Internet of things (IOT) connects physical to digitalInternet of things (IOT) connects physical to digital
Internet of things (IOT) connects physical to digital
 
Week 8 - Module 19 - PPT- Internet of Things for Libraries.pdf
Week 8 - Module 19 - PPT- Internet of Things for Libraries.pdfWeek 8 - Module 19 - PPT- Internet of Things for Libraries.pdf
Week 8 - Module 19 - PPT- Internet of Things for Libraries.pdf
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
GK NU CS 101 Session 1B (1).ppt
GK NU CS 101 Session 1B (1).pptGK NU CS 101 Session 1B (1).ppt
GK NU CS 101 Session 1B (1).ppt
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
 
8_iot.pdf
8_iot.pdf8_iot.pdf
8_iot.pdf
 
SuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-finalSuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-final
 

Plus de PayamBarnaghi

Academic Research: A Survival Guide
Academic Research: A Survival GuideAcademic Research: A Survival Guide
Academic Research: A Survival GuidePayamBarnaghi
 
Reproducibility in machine learning
Reproducibility in machine learningReproducibility in machine learning
Reproducibility in machine learningPayamBarnaghi
 
Search, Discovery and Analysis of Sensory Data Streams
Search, Discovery and Analysis of Sensory Data StreamsSearch, Discovery and Analysis of Sensory Data Streams
Search, Discovery and Analysis of Sensory Data StreamsPayamBarnaghi
 
Internet Search: the past, present and the future
Internet Search: the past, present and the futureInternet Search: the past, present and the future
Internet Search: the past, present and the futurePayamBarnaghi
 
Scientific and Academic Research: A Survival Guide 
Scientific and Academic Research:  A Survival Guide Scientific and Academic Research:  A Survival Guide 
Scientific and Academic Research: A Survival Guide PayamBarnaghi
 
Lecture 8: IoT System Models and Applications
Lecture 8: IoT System Models and ApplicationsLecture 8: IoT System Models and Applications
Lecture 8: IoT System Models and ApplicationsPayamBarnaghi
 
Lecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and InteroperabilityLecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and InteroperabilityPayamBarnaghi
 
Lecture 6: IoT Data Processing
Lecture 6: IoT Data Processing Lecture 6: IoT Data Processing
Lecture 6: IoT Data Processing PayamBarnaghi
 
Lecture 5: Software platforms and services
Lecture 5: Software platforms and services Lecture 5: Software platforms and services
Lecture 5: Software platforms and services PayamBarnaghi
 
Internet of Things for healthcare: data integration and security/privacy issu...
Internet of Things for healthcare: data integration and security/privacy issu...Internet of Things for healthcare: data integration and security/privacy issu...
Internet of Things for healthcare: data integration and security/privacy issu...PayamBarnaghi
 
Scientific and Academic Research: A Survival Guide 
Scientific and Academic Research:  A Survival Guide Scientific and Academic Research:  A Survival Guide 
Scientific and Academic Research: A Survival Guide PayamBarnaghi
 
Semantic Technolgies for the Internet of Things
Semantic Technolgies for the Internet of ThingsSemantic Technolgies for the Internet of Things
Semantic Technolgies for the Internet of ThingsPayamBarnaghi
 

Plus de PayamBarnaghi (12)

Academic Research: A Survival Guide
Academic Research: A Survival GuideAcademic Research: A Survival Guide
Academic Research: A Survival Guide
 
Reproducibility in machine learning
Reproducibility in machine learningReproducibility in machine learning
Reproducibility in machine learning
 
Search, Discovery and Analysis of Sensory Data Streams
Search, Discovery and Analysis of Sensory Data StreamsSearch, Discovery and Analysis of Sensory Data Streams
Search, Discovery and Analysis of Sensory Data Streams
 
Internet Search: the past, present and the future
Internet Search: the past, present and the futureInternet Search: the past, present and the future
Internet Search: the past, present and the future
 
Scientific and Academic Research: A Survival Guide 
Scientific and Academic Research:  A Survival Guide Scientific and Academic Research:  A Survival Guide 
Scientific and Academic Research: A Survival Guide 
 
Lecture 8: IoT System Models and Applications
Lecture 8: IoT System Models and ApplicationsLecture 8: IoT System Models and Applications
Lecture 8: IoT System Models and Applications
 
Lecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and InteroperabilityLecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and Interoperability
 
Lecture 6: IoT Data Processing
Lecture 6: IoT Data Processing Lecture 6: IoT Data Processing
Lecture 6: IoT Data Processing
 
Lecture 5: Software platforms and services
Lecture 5: Software platforms and services Lecture 5: Software platforms and services
Lecture 5: Software platforms and services
 
Internet of Things for healthcare: data integration and security/privacy issu...
Internet of Things for healthcare: data integration and security/privacy issu...Internet of Things for healthcare: data integration and security/privacy issu...
Internet of Things for healthcare: data integration and security/privacy issu...
 
Scientific and Academic Research: A Survival Guide 
Scientific and Academic Research:  A Survival Guide Scientific and Academic Research:  A Survival Guide 
Scientific and Academic Research: A Survival Guide 
 
Semantic Technolgies for the Internet of Things
Semantic Technolgies for the Internet of ThingsSemantic Technolgies for the Internet of Things
Semantic Technolgies for the Internet of Things
 

Dernier

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 

Dernier (20)

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 

Intelligent Data Processing for the Internet of Things

  • 1. 1 Intelligent Data Processing for the Internet of Things Payam Barnaghi Institute for Communication Systems (ICS) University of Surrey Guildford, United Kingdom International “IoT 360″ Summer School October 29th – November 1st, 2014 – Rome, Italy
  • 2. 2 Key characteristics of IoT devices −Often inexpensive sensors (actuators) equipped with a radio transceiver for various applications, typically low data rate ~ 10-250 kbps. −Deployed in large numbers −The sensors should coordinate to perform the desired task. −The acquired information (periodic or event-based) is reported back to the information processing centre (or some cases in-network processing is required) −Solutions are often application-dependent. 2
  • 3. 3 Beyond conventional sensors − Human as a sensor (citizen sensors) − e.g. tweeting real world data and/or events − Software sensors − e.g. Software agents/services generating/representing data Road block, A3 Road block, A3 Suggest a different route
  • 4. Internet of Things: The story so far P. Barnaghi, A. Sheth, “The Internet of Things: The story so far”, IEEE IoT Newsletter, September 2014.
  • 5. 5 The benefits of data processing in IoT − Turn 12 terabytes of Tweets created each day into sentiment analysis related to different events/occurrences or relate them to products and services. − Convert (billions of) smart meter readings to better predict and balance power consumption. − Analyze thousands of traffic, pollution, weather, congestion, public transport and event sensory data to provide better traffic and smart city management. − Monitor patients, elderly care and much more… − Requires: real-time, reliable, efficient (for low power and resource limited nodes), and scalable solutions. Partially adapted from: What is Bog Data?, IBM
  • 6. Not just Volume… … but also Data Dynamicity: How can we efficiently deal with: - Large amounts of (heterogeneous/distributed) data? - Both static and dynamic data? - In a re-usable, modular, flexible way? - Integrate different types of data - Provide hypothesis and create more context-aware solutions Adapted from: M. Hauswirth. A. Mileo, Insight, National University of Ireland, Galway.
  • 7. Data Volume AnyThing AnyPlace AnyTime Security, Reliability, Trust and Privacy Societal Impacts, Economic Values and Viability Services and Applications Networking and Communication
  • 8. Data is not what we want or is it?
  • 9. What we need are insights and actionable-information
  • 10. Diffusion of innovation image source: Wikipedia IoT
  • 11. Problem #1 Data: We seem to have lots of it… Real World Data: it is always difficult to get (silos, format, privacy, business interests or lack of interest!...)
  • 12. Problem #2 Data: interoperability and metadata frameworks… Real World Data: there are solutions for service based (RESTful) access, meta-data/ semantic representation frameworks (e.g. W3C SSN, HyperCat,…) but none of them are still widely adapted.
  • 13. Problem #3 Data: quality, reliability… Real World Data: data can be noisy, crowed source data can be inaccurate, contradictory, delay in accessing/processing the data…
  • 14. Problem #4 Data: having too much data and using analytics tools alone won’t solve the problem… Real World Data: in addition to the HPC issues, we need new methods/solutions that can provide real-time analysis of dynamic, variable quality and multi-modal streams…
  • 15. Problem #5 Data: abstraction, discovering the associations… Real World Data: co-occurrence vs. causation; we need hypothesis, background knowledge,… After all data is not what we are really after…
  • 16. We need more linked open data
  • 17. Sometimes it’s even better if we have: (near) real-time linked open data Streams
  • 18. or even better than that if we have: (near) real-time linked open data Streams + meta-data (semantic annotations) + Adaptable and scalable analytics tools + Sufficient background knowledge
  • 20. Current focus on Big Data − Emphasis on power of data and data mining solutions − Technology solutions to handle large volumes of data; e.g. Hadoop, NoSQL, Graph Databases, … − Trying to find patterns and trends from large volumes of data…
  • 21. Myths About Big Data − Big Data is only about massive data volume − Big Data means Hadoop − Big Data means unstructured data − If we have enough data we can draw conclusions (enough here often means massive amounts) − NoSQL means No SQL − It is about increasing computational power and taking more data and running data mining algorithms. 21 Some of the items are adapted from: Brain Gentile, http://mashable.com/2012/06/19/big-data-myths/
  • 22. What happens if we only focus on data − Number of burgers consumed per day. − Number of cats outside. − Number of people checking their facebook account. − What insight would you draw? 22
  • 23. It is also important to note what type of problems we expect to solve.
  • 24. Back to the future 24
  • 25. Future cities: a view from 1998 Source LAT Times, http://documents.latimes.com/la-2013/ 25
  • 27. 27
  • 28. 101 Smart City Use-case Scenarios http://www.ict-citypulse.eu/page/content/smart-city-use-cases-and-requirements
  • 29. 29 Data alone is not enough − Domain knowledge − Machine interpretable meta-data − Delivery, sharing and representation services − Query, discovery, aggregation services − Publish, subscribe, notification, and access interfaces/services − More open solutions for innovation and citizen participation − Efficient feedback and control mechanisms − Social network and social system analysis − In cities, interactions with people and social systems is the key.
  • 30. 30 We need an Integrated Approach
  • 32. 32 Storing, handling and processing the data Image courtesy: IEEE Spectrum
  • 33. 33 Technical Challenges − Discovery: finding appropriate device and data sources − Access: Availability and (open) access to data resources and data − Search: querying for data − Integration: dealing with heterogeneous devices, networks and data − Large-scale data mining, adaptable learning and efficient computing and processing − Interpretation: translating data to knowledge that can be used by people and applications − Scalability: dealing with large numbers of devices and a myriad of data and the computational complexity of interpreting the data.
  • 34. 34 IoT Data Access − Publish/Subscribe (long-term/short-term) − Ad-hoc query − The typical types of data request for sensory data: − Query based on − ID (resource/service) – for known resources − Location − Type − Time – requests for freshness data or historical data; − One of the above + a range [+ Unit of Measurement] − Type/Location/Time + A combination of Quality of Information attributes − An entity of interest (a feature of an entity on interest) − Complex Data Types (e.g. pollution data could be a combination of different types)
  • 35. 35 IoT Data in the Cloud Image courtesy: http://images.mathrubhumi.com http://www.anacostiaws.org/userfiles/image/Blog-Photos/river2.jpg
  • 36. Comparing IoT data streams with conventional multimedia streams Source: P. Barnaghi, W. Wang, L. Dong, C. Wang, "A Linked-data Model for Semantic Sensor Streams", in the Proc. of IEEE International Conference on Internet of Things (iThings 2013), August 2013.
  • 37. 37 Describing IoT Data: An example Time Location Type Value Link to QoI metadata UTC #GeoHash #Hash [DataType, Value] URI Ontology for common types
  • 38. 38 Observation and Measurement Value GeoHash UTC time Standard XSD data type UTC time (in Java) : The time indicated is returned represented as the distance, measured in milliseconds, of that time from the epoch (00:00:00 GMT on January 1, 1970).
  • 39. 39 GeoHashing − For example Guildford: lat: 51.235401 and long: -0.574600 can be hashed as: gcpe6zjeffgp − It can be used as: − A unique identifier − represent point data as hash string − It uses Base 32 encoding and bit interleaving − It’s used for geo-tagging (and is a symmetric technique) − Place close to each other will have similar prefix (string similarity) − Limitations: − We could have Geohash codes with no common prefix − Edge case (locations close to each other but on opposite sides of the Equator) − A meridian point (line of longitude)
  • 40. 40 GeoHash Example Sample locations on a Google Map and their equivalent geohash strings; - close locations have similar prefixes
  • 41. IoT Data Processing WSN WSN WSN WSN WSN Network-enabled Devices Network-enabled Devices Network services/storage and processing units Data/service access at application level Data collections and processing within the networks Data Discovery Service/ Resource Discovery
  • 42. 42 In-network processing − Mobile Ad-hoc Networks can be seen as a set of nodes that deliver bits from one end to the other; − WSNs, on the other end, are expected to provide information, not necessarily original bits − Gives additional options − e.g., manipulate or process the data in the network − Main example: aggregation − Applying aggregation functions to a obtain an average value of measurement data − Typical functions: minimum, maximum, average, sum, … − Not amenable functions: median Source: Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks Holger Karl, Andreas Willig, chapter 3, Wiley, 2005 .
  • 43. 43 In-network processing − Depending on application, more sophisticated processing of data can take place within the network − Example edge detection: locally exchange raw data with neighboring nodes, compute edges, only communicate edge description to far away data sinks − Example tracking/angle detection of signal source: Conceive of sensor nodes as a distributed microphone array, use it to compute the angle of a single source, only communicate this angle, not all the raw data − Exploit temporal and spatial correlation − Observed signals might vary only slowly in time; so no need to transmit all data at full rate all the time − Signals of neighboring nodes are often quite similar; only try to transmit differences (details a bit complicated, see later) Source: Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks Holger Karl, Andreas Willig, chapter 3, Wiley, 2005 .
  • 44. Data Aggregation − Computing a smaller representation of a number of data items (or messages) that is extracted from all the individual data items. − For example computing min/max or mean of sensor data. − More advance aggregation solutions could use approximation techniques to transform high-dimensionality data to lower-dimensionality abstractions/representations. − The aggregated data can be smaller in size, represent patterns/abstractions; so in multi-hop networks, nodes can receive data form other node and aggregate them before forwarding them to a sink or gateway. − Or the aggregation can happen on a sink/gateway node.
  • 45. Aggregation example − Reduce number of transmitted bits/packets by applying an aggregation function in the network 1 1 3 1 1 6 1 1 1 1 1 1 Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005 .
  • 46. Efficacy of an aggregation mechanism − Accuracy: difference between the resulting value or representation and the original data − Some solutions can be lossless or lossly depending on the applied techniques. − Completeness: the percentage of all the data items that are included in the computation of the aggregated data. − Latency: delay time to compute and report the aggregated data − Computation foot-print; complexity; − Overhead: the main advantage of the aggregation is reducing the size of the data representation; − Aggregation functions can trade-off between accuracy, latency and overhead; − Aggregation should happen close to the source.
  • 47. Publish/Subscribe − Achieved by publish/subscribe paradigm − Idea: Entities can publish data under certain names − Entities can subscribe to updates of such named data − Conceptually: Implemented by a software bus − Software bus stores subscriptions, published data; names used as filters; subscribers notified when values of named data changes Publisher 1 Publisher 2 Software bus Subscriber 1 Subscriber 2 Subscriber 3 Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks, chapter 12, Wiley, 2005 .
  • 48. MQTT Pub/Sub Protocol − MQ Telemetry Transport (MQTT) is a lightweight broker-based publish/subscribe messaging protocol. − MQTT is designed to be open, simple, lightweight and easy to implement. − These characteristics make MQTT ideal for use in constrained environments, for example in IoT. −Where the network is expensive, has low bandwidth or is unreliable −When run on an embedded device with limited processor or memory resources; − A small transport overhead (the fixed-length header is just 2 bytes), and protocol exchanges minimised to reduce network traffic − MQTT was developed by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions. Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
  • 49. MQTT − It supports publish/subscribe message pattern to provide one-to-many message distribution and decoupling of applications − A messaging transport that is agnostic to the content of the payload − The use of TCP/IP to provide basic network connectivity − Three qualities of service for message delivery: − "At most once", where messages are delivered according to the best efforts of the underlying TCP/IP network. Message loss or duplication can occur. − This level could be used, for example, with ambient sensor data where it does not matter if an individual reading is lost as the next one will be published soon after. − "At least once", where messages are assured to arrive but duplicates may occur. − "Exactly once", where message are assured to arrive exactly once. This level could be used, for example, with billing systems where duplicate or lost messages could lead to incorrect charges being applied. Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
  • 50. MQTT Message Format − The message header for each MQTT command message contains a fixed header. − Some messages also require a variable header and a payload. − The format for each part of the message header: — DUP: Duplicate delivery — QoS: Quality of Service — RETAIN: RETAIN flag —This flag is only used on PUBLISH messages. When a client sends a PUBLISH to a server, if the Retain flag is set (1), the server should hold on to the message after it has been delivered to the current subscribers. —This allows new subscribers to instantly receive data with the retained, or Last Known Good, value. Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
  • 51. Sensor Data as time-series data − The sensor data (or IoT data in general) can be seen as time-series data. − A sensor stream refers to a source that provide sensor data over time. − The data can be sampled/collected at a rate (can be also variable) and is sent as a series of values. − Over time, there will be a large number of data items collected. − Using time-series processing techniques can help to reduce the size of the data that is communicated; − Let’s remember, communication can consume more energy than communication;
  • 52. Sensor Data as time-series data − Different representation method that introduced for time-series data can be applied. − The goal is to reduce the dimensionality (and size) of the data, to find patterns, detect anomalies, to query similar data; − Dimensionality reduction techniques transform a data series with n items to a representation with w items where w < n. − This functions are often lossy in comparison with solutions like normal compression that preserve all the data. − One of these techniques is called Symbolic Aggregation Approximation (SAX). − SAX was originally proposed for symbolic representation of time-series data; it can be also used for symbolic representation of time-series sensor measurements. − The computational foot-print of SAX is low; so it can be also used as a an in-network processing technique.
  • 53. 53 In-network processing Using Symbolic Aggregate Approximation (SAX) SAX Pattern (blue) with word length of 20 and a vocabulary of 10 symbols over the original sensor time-series data (green) Source: P. Barnaghi, F. Ganz, C. Henson, A. Sheth, "Computing Perception from Sensor Data", in Proc. of the IEEE Sensors 2012, Oct. 2012. fggfffhfffffgjhghfff jfhiggfffhfffffgjhgi fggfffhfffffgjhghfff
  • 54. Symbolic Aggregate Approximation (SAX) − SAX transforms time-series data into symbolic string representations. − Symbolic Aggregate approXimation was proposed by Jessica Lin et al at the University of California –Riverside; − http://www.cs.ucr.edu/~eamonn/SAX.htm . − It extends Piecewise Aggregate Approximation (PAA) symbolic representation approach. − SAX algorithm is interesting for in-network processing in WSN because of its simplicity and low computational complexity. − SAX provides reasonable sensitivity and selectivity in representing the data. − The use of a symbolic representation makes it possible to use several other algorithms and techniques to process/utilise SAX representations such as hashing, pattern matching, suffix trees etc.
  • 55. Processing Steps in SAX − SAX transforms a time-series X of length n into the string of arbitrary length, where typically, using an alphabet A of size a > 2. − The SAX algorithm has two main steps: − Transforming the original time-series into a PAA representation − Converting the PAA intermediate representation into a string during. − The string representations can be used for pattern matching, distance measurements, outlier detection, etc.
  • 56. Piecewise Aggregate Approximation − In PAA, to reduce the time series from n dimensions to w dimensions, the data is divided into w equal sized “frames.” − The mean value of the data falling within a frame is calculated and a vector of these values becomes the data-reduced representation. − Before applying PAA, each time series to have a needs to be normalised to achieve a mean of zero and a standard deviation of one. − The reason is to avoid comparing time series with different offsets and amplitudes; Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
  • 57. SAX- normalisation before PAA Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 Mean (μ): μ= (2+3+4.5+7.6+4+2+2+2+3+1)/10= 3.11 Standard deviation (σ): (2-3.11)2 = 1.2321 (3-3.11)2 = 0.0121 (4.5-3.11)2 = 1.9321 (7.6-3.11)2 = 20.1601 (4-3.11)2 = 0.7921 (2-3.11)2 = 1.2321 (2-3.11)2 = 1.2321 (2-3.11)2 = 1.2321 (3-3.11)2 = 0.0121 (1-3.11)2 = 4.4521 1.2321+0.0121+ 1.9321+ 20.1601+ 0.7921+ 1.2321+ 1.2321+ 1.2321+ 1.2321+ 0.0121+4.4521 = 33.5211 σ = √ (33.5211/10) = 1.83087683911
  • 58. Normalisation Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 Normalised: zi = (ci – μ)/ σ σ = 1.83087683911 μ = 3.11 z1 = (2- 3.11)/1.83087683911 = -0.606 z2 = (3-3.11)/ 1.83087683911= -0.600 z3 = (4.5-3.11)/ 1.83087683911= 2.452 z4 = (7.6-3.11)/ 1.83087683911= -0.600 z5 = (4-3.11)/ 1.83087683911= 0.486 z6 = (2-3.11)/ 1.83087683911= -0.606 z7 = (2-3.11)/ 1.83087683911= -0.606 z8 = (2-3.11)/ 1.83087683911= -0.606 z9 = (3-3.11)/ 1.83087683911= -0.600 z10 = (1-3.11)/ 1.83087683911= -1.152 Normalised Timeseries (z): -0.606, -0.600, 2.452, -0.600, 0.486, -0.606, -0.606, -0.606 , -0.600, -1.152
  • 59. PAA calculation Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 Normalised Timeseries (z): -0.606, -0.600, 2.452, -0.600, 0.486, -0.606, -0.606, -0.606 , -0.600, -1.152 PAA (w=5): -0.603, 0.926, -0.06, -0.606, 0.273
  • 60. PAA to SAX Conversion − Conversion of the PAA representation of a time-series into SAX is based on producing symbols that correspond to the time-series features with equal probability. − The SAX developers have shown that time-series which are normalised (zero mean and standard deviation of 1) follow a Normal distribution (Gaussian distribution). − The SAX method introduces breakpoints that divides the PAA representation to equal sections and assigns an alphabet for each section. − For defining breakpoints, Normal inverse cumulative distribution function
  • 61. Breakpoints in SAX − “Breakpoints: breakpoints are a sorted list of numbers B = β 1,…, β a-1 such that the area under a N(0,1) Gaussian curve from βi to βi+1 = 1/a”. Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
  • 62. Alphabet representation in SAX − Let’s assume that we will have 4 symbols alphabet: a,b,c,d − As shown in the table in the previous slide, the cut lines for this alphabet (also shown as the thin red lines on the plot below) will be { -0.67, 0, 0.67 } Source: JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
  • 63. SAX Represetantion Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1 Normalised Timeseries (z): -0.606, -0.600, 2.452, -0.600, 0.486, -0.606, -0.606, -0.606 , -0.600, -1.152 PAA (w=5): -0.603, 0.926, -0.06, -0.606, 0.273 Cut off ranges: {-0.67, 0, 0.67} Alphabet: a ,b ,c, d SAX representation: bdbbc
  • 64. Features of the SAX technique − SAX divides a time series data into equal segments and then creates a string representation for each segment. − The SAX patterns create the lower-level abstractions that are used to create the higher-level interpretation of the underlying data. − The string representation of the SAX mechanism enables to compare the patterns using a specific type of string similarity function.
  • 65. High-level information/ knowledge 65 A sample data processing framework Temporal data (extracted from descriptions) Day-time Night-time High-level abstractions Domain knowledge fggfffhfffffgjhghfff dddfffffffffffddd cccddddccccdddccc dddcdcdcdcddasddd aaaacccaaaaaaaacccc Raw sensor data stream Raw sensor data stream Raw sensor data stream PIR Sensor Light Sensor Temperature Sensor Attendance Phone Hot Temperature Cold Temperature Bright Office room BA0121 On going meeting Window has been left open …. Spatial data (extracted from descriptions) Thematic data (low level abstractions) Intelligent Processing Observations SAX Patterns Raw sensor data (or Annotated data) … …. Intelligent Processing/ Reasoning
  • 66. Correlation analysis Slides: Maria Bermudez-Edo
  • 67. Correlations: linear and nonlinear 67
  • 70. Challenges − Complexity − Co-occurrences vs. Causation 70
  • 71. Data Discovery Slides: Syed Amir Hoseinitabatabaei
  • 72. A discovery method in the IoT time location type Query formulating [#location || ##ttyyppee || ttiimmee]] Discovery ID Discovery/ DHT Server Data repository (archived data) #location #type #location #type Data hypercube #location #type Gateway Core network Logical Connection Network Connection Data
  • 73. An example: a discovery method in the IoT S. A. Hoseinitabatabaei, P. Barnaghi, C. Wang, R. Tafazolli, L. Dong, "A Distributed Data Discovery Mechanism for the Internet of 73 Things", 2014.
  • 74. An example: a discovery method in the IoT S. A. Hoseinitabatabaei, P. Barnaghi, C. Wang, R. Tafazolli, L. Dong, "A Distributed Data Discovery Mechanism for the Internet of 74 Things", 2014.
  • 75. Adaptive learning Slides: Daniel Puschmann, Masound Hassanpour
  • 76. Adaptable and dynamic learning methods http://kat.ee.surrey.ac.uk/
  • 77. Equilibrium in transient and non-uniform world A D B C Image source for equilibrium diagram: John D. Hey, The University of York.
  • 78. Social data analysis: A case study Slides: Pramod Anantharam, Kno.e.sis Centre, Wright State University
  • 79. Social data analysis: A case study − Are people talking about city infrastructure on twitter? − Can we extract city infrastructure related events from twitter? − How can we leverage event and location knowledge bases for event extraction? − How well can we extract city events? Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 80. Do people talk about city infrastructure on Twitter? Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University. 80
  • 81. Some challenges in extracting events from Tweets − No well accepted definition of “events related to a city”. − Tweets are short (140 characters) and its informal nature make it hard to analyze − Entity, location, time, and type of the event − Multiple reports of the same event and sparse report of some events (biased sample) − Numbers don’t necessarily indicate intensity − Validation of the solution is hard due to the open domain nature of the problem Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University. 81
  • 82. Event extraction techniques − N-grams + Regression − Text analysis to extract uni- and bi-grams (event markers) − Feature selection to select best possible event markers − Apply regression to predict P(Y|X) where Y is the target (rainfall) and X is the input (event marker). − Clustering − Create event clusters incrementally over time − Identify clusters of interest based on its relevance (manual inspection) − Granularity remains at the tweet/cluster level (tweets are assigned to clusters of interest) − Sequence Labeling (CRFs) − Text analysis to extract features such as named entities, POS1 tagging − Each event indicator is modeled as a mixture of event types that are latent variables − Each type corresponds to a distribution over named entities n (labels assigned to event types by manual inspection) and other features Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 83. City event extraction − Event extraction should be open domain (no a priori event types) with event metadata. − Incorporate background knowledge related to city related events e.g., 511.org hierarchy, SCRIBE ontology, city location names. − Assess the intensity of an event using content and network cues. − Robust to noise, informal nature, and variability of data. Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 84. N-grams + Regression − Open domain: works best when there is a reference corpus to extract n-grams − Event metadata: cannot distinguish between entities and hence hard to extract event metadata − Background knowledge: incorporating domain vocabulary (e.g., subsumption) is not natural − Event intensity: regression maps the event indicators to some quantified values − Robustness: quite robust if there is a reference corpus Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 85. Clustering − Open domain: works well for domains with no a priori knowledge of events (may need human inspection) − Background knowledge: incorporating domain vocabulary is not natural − Event intensity: not captured − Robustness: quite robust for twitter data with enough data for each cluster Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 86. Sequence Labeling (CRFs) − Open domain: works well for domains with no a priori knowledge of events (may need human inspection) − Event metadata: event metadata extraction is captured naturally with the named entities − Background knowledge: incorporating domain vocabulary is quite natural − Event intensity: part-of-speech tag may indirectly capture intensity − Robustness: with a deeper model for capturing context, quite robust for twitter data Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 87. City event extraction solution architecture POS Tagging Tweets from a city POS City Infrastructure Tagging Hybrid NER+ Event term extraction Hybrid NER+ Event term extraction Impact Assessment Impact Assessment Temporal Estimation Temporal Estimation Event Event OOSSMM L Looccaatitoionnss SSCCRRIBIBEE o onntotolologgyy Aggregation Aggregation GGeeoohhaasshhiningg 551111.o.orrgg h hieierraarrcchhyy City Event Annotation City Event Extraction Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 88. City event extraction - Key insights a) Space: events reported within a grid (gi ∈G where G is a set of all grids in a city)at a certain time are most likely reporting the same event b) Time: events reported within a time Δt in a grid gi are most likely to be reporting the same event c) Theme: events with similar entities within a grid gi and time Δt are most likely reporting the same event Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 89. 0.6 miles Max-lat Min-lat Min-long 37.7545166015625, -122.40966796875 Max-long 0.38 miles 37.7490234375, -122.40966796875 37.7545166015625, -122.420654296875 37.7490234375, -122.420654296875 4 37.74933, -122.4106711 Hierarchical spatial structure of geohash for representing locations with variable precision. Here the location string is 5H34 0 1 2 3 4 5 6 7 8 9 B C D E F G H I J K L 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 8 City event extraction – Geohashing
  • 90. Implmentation − City event annotation − Automated creation of training data − Annotation task (our CRF model vs. baseline CRF model) − City event extraction − Use aggregation algorithm for event extraction − Extracted events AND ground truth − Dataset (Aug – Nov 2013) ~ 8 GB of data on disk − Over 8 million tweets − Over 162 million sensor data points − 311 active events and 170 scheduled events Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 91. Evaluation – extracted events & ground truth
  • 92. 92 Conclusions − A primary goal of interconnecting devices and collecting/processing data from them is to create situation awareness and enable applications, machines, and human users to better understand their surrounding environments. − The understanding of a situation, or context, potentially enables services and applications to make intelligent decisions and to respond to the dynamics of their environments. − Dynamicity, energy efficiency, multi-modality, heterogeneity and volume are among the key challenges. − We need to design adaptable and scalable solutions that combine knowledge and data engineering (semantics), background knowledge (ontologies) and machine learning techniques.
  • 93. Acknowledgements − Some parts of the content are adapted from: − Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, Protocols and Architectures for Wireless Sensor Networks, chapters 3 and 12, Wiley, 2005 . − Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11. − JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
  • 94. Q&A − Payam Barnaghi, University of Surrey/EU FP7 CityPulse Project http://www.ict-citypulse.eu/ @pbarnaghi p.barnaghi@surrey.ac.uk

Notes de l'éditeur

  1. Publishing and upload the data into the Cloud platforms is not enough; The IoT data needs to be pre-processed and processed; the aim is to extract meaningful knowledge from the data and to use this knowledge and information for control and monitoring and/or for decision making (automated or human controlled).
  2. Classification is another method – but it is not feasible for the open domain problem we are interested in. CRFs – Conditional Random Fields Sequence Labeling Consider deeper features such as named entities, POS tags Captures context of mention of entities within a tweet Allows natural incorporation of domain knowledge
  3. Intensity vs. density =&amp;gt; tweets, page-rank analogous to importance of event, importance of events Importance of event based on location + time Contrasting density Noise Flood - people affected vs. people conversing
  4. N-grams + Regression Cannot provide event metadata extraction May need reference corpus for creating n-grams Needs good quality tweets if no reference corpus Clustering Does not capture event metadata Too coarse grained (tweet level) May not be able to identify location, time etc. Sequence Labeling (CRFs)
  5. can be captured as a latent variable conditioned on the occurrence of some entities
  6. Distance computed using the formula: dlon = lon2 - lon1 dlat = lat2 - lat1 a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2 c = 2 * atan2( sqrt(a), sqrt(1-a) ) d = R * c (where R is the radius of the Earth) Found the box for the tweet! 37.7545166015625, -122.420654296875 37.7545166015625, -122.40966796875 37.7490234375, -122.40966796875 37.7490234375, -122.420654296875