Search, Discovery and Analysis of Sensory Data Streams
1. Search, Discovery and Analysis
of Sensory Data Streams
1
Payam Barnaghi
Centre for Vision, Speech and Signal Processing (CVSSP), University of
Surrey
Care Technology & Research Centre, The UK Dementia Research
Institute (DRI)
SAW2019: 1st International Workshop on Sensors and Actuators on
the Web
2. 46 years ago on the 5th of November (submission
day)
2
Source: https://www.cs.princeton.edu/courses/archive/fall06/cos561/papers/cerf74.pdf
• A 32 bit IP address was
used of which the first 8
bits signified the network
and the remaining 24 bits
designated the host on
that network.
• The assumption was that
256 networks would be
sufficient for the
foreseeable future…
• Obviously this was before
LANs (Ethernet was
under development at
Xerox PARC at that time).
5. And there came Google!
55
Google says that the web has now over 30
trillion unique individual pages. It is
probably not even that relevant anymore;
lots of resources are dynamic…
17. Sensor Data Flow on the Web
17
P. Barnaghi, A. Sheth, “On Searching the Internet of Things: Requirements and Challenges”, IEEE Intelligent Systems, 2016.
28. The Crawling Challenge
− Uniform policy: re-visiting all pages in the collection with
the same frequency, regardless of their rates of change.
− Proportional policy: re-visiting more often the pages that
change more frequently. The visiting frequency is directly
proportional to the (estimated) change frequency.
28
Cho, Junghoo; Garcia-Molina, Hector (2003). "Effective page refresh policies for Web
crawlers". ACM Transactions on Database Systems. 28 (4): 390–426.
29. Web Crawling
− Cho and Garcia-Molina proved the surprising result that,
in terms of average freshness, the uniform policy
outperforms the proportional policy in both a simulated
Web and a real Web crawl.
− Allocating too many new crawls to rapidly changing
pages at the expense of less frequently updating pages.
− A proportional policy allocates more resources to
crawling frequently updating pages, but experiences less
overall freshness time from them.
29
Source: Wikipedia
30. Crawling and the Freshness Issue
− To improve freshness, the crawler should penalise the
elements that change too often.
− The optimal re-visiting policy is neither the uniform policy
nor the proportional policy.
− The optimal method for keeping average freshness high
includes ignoring the pages that change too often, and
the optimal for keeping average age low is to use access
frequencies that monotonically (and sub-linearly)
increase with the rate of change of each page.
30
Junghoo Cho; Hector Garcia-Molina (2003). "Estimating frequency of change". ACM
Transactions on Internet Technology. 3 (3): 256–290.
Source: Wikipedia
33. But the data is often multidimensional and
multivariate
33Credit: Shirin Enshaeifar, CR&T Centre, UK Dementia Research Institute/CVSSP, Uni of Surrey
34. Creating patterns from streaming data
34(Gonzalez-Vidal, Barnaghi, Skarmeta, IEEE TKDE, 2018)
39. Some of the Research Challenges
− Provenance monitoring and fact checking algorithms
and tools
− Dealing with noisy, incomplete and dynamic data.
− Handling and processing large data streams, search and
identification of patterns.
− Crawling, search and query of changing data
− Multi-modal information analysis and continual and
adaptive learning algorithms
− Security, privacy, trust and accessibility
− Solutions to keep (and make) the Web a safe, open,
inclusive and collaborative environment.
39
43. How stable are the models that you learn from
your data?
43
Credits: Roonak Rezvani, CR&T Centre, UK Dementia Research Institute/CVSSP, Uni of Surrey
44. Dynamicity and machine learning issue
44
Noise and missing data Pattern and change representation
Continual and adaptive learning Network and Causation analysis
47. References
− S. Enshaeifar et. al, "Health management and pattern analysis of daily living activities
of people with Dementia using in-home sensors and machine learning techniques",
PLoS ONE 13(5): e0195605, 2018.
− A. González Vidal, P. Barnaghi, A. F. Skarmeta, "BEATS: Blocks of Eigenvalues
Algorithm for Time series Segmentation", IEEE Transactions on Knowledge and Data
Engineering (TKDE), 2018.
− Y. Fathy, P. Barnaghi, R. Tafazolli, "An Online Adaptive Algorithm for Change
Detection in Streaming Sensory Data", IEEE Systems Journal, 2018.
− Y. Fathy, P. Barnaghi, R. Tafazolli, "Large-Scale Indexing, Discovery and Ranking for
the Internet of Things (IoT)", ACM Computing Surveys, 2017.
− S. A. Hosieni Tabatabaei, Y. Fathy, P. Barnaghi, C. Wang, R. Tafazolli, "A Novel
Indexing Method for Scalable IoT Source Lookup", IEEE Internet of Things Journal,
2018.
− Y. Fathy, P. Barnaghi, R. Tafazolli, "Distributed Spatial Indexing for the Internet of
Things Data Management", Proc. of IFIP/IEEE International Symposium on
Integrated Network Management, Lisbon, Portugal, May 2017.
47