The presentation was given at the SOCM'16 workshop at the WWW16 conference. It corresponds to the research study titled "Observlets: Empowering Analytical Observations on Web Observatory".
2. Outline
• Understanding Web Observatory
– Resources and End-users
• Issues for developing data analytic applications
• Defining Observlets
• Observlets on Web Observatory
• A use-case of Observlets
12-11-2016 SOCM Workshop 2016 2
3. Web Observatory
• A global catalogue for sharing distributed datasets and analytic applications
• A web observatory node includes applications of computational social science, models of the evolution of social machines, and big data analytics
• Datasets on a web observatory may include quantitative or qualitative data, real-time data, multimedia content, open data, archives, and e-Science resources
• It aims to support understanding of web evolution through observation and experimentation, and to support user engagement with analytic resources
4. Web Observatory: Resources
[Figure: three web observatory nodes, each consisting of a WO Portal, WO Datastores and WO Apps, with links to other observatories. Example datastores: EPrints repository, harvested news articles, patient records. Example apps: harvesters, visualizations, analytic applications.]
5. Web Observatory: Users
[Figure: example user groups: healthcare experts, meteorologists, computer scientists.]
• End-users on a web observatory include individuals, public and private organizations, and agencies
• Domain experts with limited technical skills, e.g. social scientists, medical experts
• Technical experts, including computer scientists and web scientists
6. The Gap
• Data processing on the web observatory is challenging:
– Data is generated from diverse sources in a variety of formats
– Data is owned and shared among different administrative domains
– Data may need to be filtered based on temporal and spatial dimensions
– Complex statistical aggregations are required to study the datasets
• Domain experts have limited technical skills and may not know which data transformations are possible
• Technical users duplicate effort by building similar analyses for different datasets, which hinders the building of richer, more insightful applications
• Need to enable users to develop and re-use analytic applications
7. Observlets
• Formal design patterns for data transformations on the web observatory
• Provide abstract definitions for intermediate steps of data analysis
• Support re-use of analytic applications and avoid rebuilding applications from scratch
• Support sharing of application modules and aggregations
9. Observlets (2): Data Harmonization
[Figure: Registered “Asthma” datasets (MongoDB, MySQL, Excel) → input + metadata → Data Harmonization observlet → output format: relational → Application (data analytic application: asthma conditions in a given geographical area).]
10. Observlets (3): Spatio-Temporal Filters
[Figure: Registered datasets about “floods” in India → Spatio-temporal filters (query within a time window (OR|AND) location attributes) → subset of the original dataset → Application (data analytic application that compares disaster response and analyses micro-climate for floods in different states of India during 2014-15).]
11. Observlets (4): Aggregation
[Figure: Registered datasets about “income and education” of people in Delhi → Aggregation observlet (schematic definitions of statistical formulae and pseudo-code; apply the selected aggregation for analysis) → Application (analyze income trends w.r.t. education statistics of the people of “Delhi”).]
12. Observlets (5): Visualization
[Figure: Dataset/aggregated data → Visualization observlet (schematic definitions and pseudo-code of visualizations) → Visualization → Application.]
Editor's notes
- Users on the Web generate value and realize benefits through various applications by consuming and generating content and by engaging in socio-economic relations with other users.
- Social media platforms (Facebook, Twitter), open encyclopaedias (Wikipedia) and forums (Stack Overflow, Quora) generate an enormous volume of data about end-user activities.
Governments are increasingly publishing their data on the web.
The web observatory catalogues these and other datasets.
It is a global catalogue for sharing datasets and analytic applications across geographically distributed locations.
Various applications, datasets and users engage with the web observatory.
- A major goal of the web observatory is to support users through these datasets and applications, particularly applications that are close to the end-users' understanding.
A web observatory's WO Portal hosts datasets in the form of repositories such as EPrints and HBase. It also contains data that is proprietary to individuals or organizations, as well as open datasets available through the web. The applications, on the other hand, comprise harvesters, visualizations and analytic applications. These resources (data and applications) are interconnected and may use the applications and datasets on other web observatory nodes.
- A large amount of data is generated on the web, belonging to a number of disciplines. Along with web scientists and computer scientists, several domain experts wish to perform complex analyses on this data.
Each web observatory node has an observlet inventory. The inventory catalogues observlets imported from other web observatory nodes and those contributed by users registered at that web observatory. Each observlet is uniquely identifiable by its URI. Observlets can be registered at any web observatory node and discovered at other nodes through APIs.
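The inventory described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `Observlet` and `ObservletInventory`, the methods `register`, `discover` and `import_from`, and the `example.org` URI are all hypothetical and are not taken from the web observatory's actual APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observlet:
    """A reusable analytic building block, identified by its URI (hypothetical model)."""
    uri: str
    kind: str        # e.g. "harmonizer", "filter", "aggregation", "visualization"
    run: Callable    # the transformation this observlet wraps


@dataclass
class ObservletInventory:
    """Catalogue of observlets held by one web observatory node (hypothetical model)."""
    node: str
    _items: Dict[str, Observlet] = field(default_factory=dict)

    def register(self, observlet: Observlet) -> None:
        # URIs must stay unique within an inventory.
        if observlet.uri in self._items:
            raise ValueError(f"already registered: {observlet.uri}")
        self._items[observlet.uri] = observlet

    def discover(self, kind: str) -> List[str]:
        # Stand-in for a discovery API: list the URIs of observlets of one kind.
        return sorted(u for u, o in self._items.items() if o.kind == kind)

    def import_from(self, other: "ObservletInventory", uri: str) -> Observlet:
        # Copy an observlet catalogued at another node into this inventory.
        observlet = other._items[uri]
        self.register(observlet)
        return observlet


# Usage: a node in India imports an aggregation observlet registered in the UK.
uk = ObservletInventory("uk-node")
india = ObservletInventory("india-node")
uk.register(Observlet("http://example.org/observlets/mean", "aggregation",
                      lambda xs: sum(xs) / len(xs)))
mean_observlet = india.import_from(uk, "http://example.org/observlets/mean")
result = mean_observlet.run([2, 4, 6])
```

Keying the inventory by URI keeps each observlet uniquely identifiable, which is the property the notes rely on for discovery across nodes.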
- In the following slides we will talk about these observlets, which form a conceptual layer between applications and datasets.
- The data harmonizer observlet converts data from one format to another as required by an application. We aim to first test it with datasets in MongoDB, RDF and SQL formats.
For example, there may be asthma datasets in a number of formats, for instance MongoDB, MySQL and tabular format.
But the application that correlates asthma conditions with a geographical region may take only relational input. The data harmonizer design pattern provides pseudo-code for converting tabular and NoSQL data to relational format using the metadata of the input dataset.
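A minimal sketch of the harmonization idea, assuming the metadata is a simple mapping from source field paths to target column names. The functions `flatten` and `harmonize`, the field names and the sample documents are hypothetical illustrations, not the pseudo-code shipped with the pattern.

```python
from typing import Any, Dict, Iterable, List, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested fields, e.g. {"loc": {"city": "X"}} -> {"loc.city": "X"}."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def harmonize(docs: Iterable[Dict[str, Any]],
              metadata: Dict[str, str]) -> Tuple[List[str], List[tuple]]:
    """Convert document-style records into relational (columns, rows).

    `metadata` maps each source field path to a target column name
    (an assumed metadata shape). Missing fields become None (SQL NULL).
    """
    columns = list(metadata.values())
    rows = []
    for doc in docs:
        flat = flatten(doc)
        rows.append(tuple(flat.get(path) for path in metadata))
    return columns, rows


# Illustrative, made-up documents in a MongoDB-like shape.
docs = [
    {"patient": {"id": 1}, "condition": "asthma", "loc": {"city": "Delhi"}},
    {"patient": {"id": 2}, "condition": "asthma"},  # no location recorded
]
metadata = {"patient.id": "patient_id", "condition": "condition", "loc.city": "city"}
columns, rows = harmonize(docs, metadata)
```

The resulting `columns` and `rows` could then be loaded into any relational store; that step is left out to keep the sketch independent of a particular database.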
Data on the web is usually time-stamped and geo-tagged. For a given analysis a user may not need all the data; he or she may wish to analyse only the data about, say, floods, for disaster response during a given period. The spatial and temporal observlets let users narrow down to the relevant data by querying a subset based on temporal and spatial parameter values.
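The filter can be sketched as a query that combines a time window with a location attribute using AND or OR, as on the slide. The function name `spatio_temporal_filter`, the record fields `state` and `date`, and the sample records are assumptions made for illustration only.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional


def spatio_temporal_filter(records: Iterable[Dict],
                           start: date,
                           end: date,
                           location: Optional[str] = None,
                           mode: str = "AND") -> List[Dict]:
    """Return the subset of records inside [start, end] AND/OR at `location`.

    Each record is assumed to carry a "date" (datetime.date) and a "state" field.
    With no location given, only the time window is applied.
    """
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    selected = []
    for rec in records:
        in_window = start <= rec["date"] <= end
        if location is None:
            keep = in_window
        else:
            in_place = rec["state"] == location
            keep = (in_window and in_place) if mode == "AND" else (in_window or in_place)
        if keep:
            selected.append(rec)
    return selected


# Illustrative, made-up records.
records = [
    {"state": "Assam", "date": date(2014, 8, 12), "event": "flood"},
    {"state": "Bihar", "date": date(2015, 7, 3), "event": "flood"},
    {"state": "Assam", "date": date(2012, 6, 30), "event": "flood"},
]
subset = spatio_temporal_filter(records, date(2014, 1, 1), date(2015, 12, 31),
                                location="Assam")
```

Here `subset` holds only the Assam record from 2014; switching to `mode="OR"` would also keep records that match just one of the two conditions.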
- Defining and coding statistical formulae for data analytics is often complex for domain experts, who need support for writing the relevant code. Therefore, the aggregation observlets allow users to select the aggregation to be applied and provide pseudo-code for it. A user can build a new aggregation by combining existing formulae.
- For example, to analyse education vs. income statistics, a user may wish to compute correlation and standard deviation w.r.t. an increase or decrease in the rate and level of education. The user may define these measures using the aggregation observlet.
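As a rough sketch of selecting and combining aggregations, the snippet below keeps named formulae in a dictionary and builds a new one (coefficient of variation) as the ratio of two existing ones. The names `AGGREGATIONS`, `combine` and `pearson` are hypothetical, and the education and income values are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Callable, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Catalogue of existing single-variable aggregations a user can select from.
AGGREGATIONS = {"mean": mean, "stdev": pstdev}


def combine(numerator: str, denominator: str) -> Callable[[Sequence[float]], float]:
    """Build a new aggregation as the ratio of two existing ones."""
    num, den = AGGREGATIONS[numerator], AGGREGATIONS[denominator]
    return lambda xs: num(xs) / den(xs)


# New aggregation defined from existing formulae: stdev / mean.
AGGREGATIONS["coeff_of_variation"] = combine("stdev", "mean")

# Illustrative, made-up values (not real survey data).
years_of_education = [8, 10, 12, 15, 17]
monthly_income = [12, 15, 20, 32, 41]

r = pearson(years_of_education, monthly_income)
cv = AGGREGATIONS["coeff_of_variation"](monthly_income)
```

Registering the combined formula under its own name means it can be selected later in the same way as the built-in ones.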
Visualizations and their features, such as the ability to zoom in on an interesting pattern in the data, are important for large-scale analytics. The visualization observlet allows users to combine existing visualization libraries with their own analysis to provide an in-depth understanding of a dataset.
A web observatory can bring together a diverse group of researchers to collaborate on research into urban and natural disasters and help society respond to these events.
As seen in the figure, we have two web observatory nodes, one located in the UK and one in India. These catalogue datasets about “floods” from their respective regions, along with observlets for data aggregation and visualization.
A user at either observatory may perform complex analyses by importing observlets from the other observatory, defining his or her own observlets and adding them to the observlet inventory.
Observlets may be defined such as “meta-mongo”, which allows a user to convert any dataset into its MongoDB equivalent, or “anova”, which allows a user to define the ANOVA statistic for different datasets.
In our view these are a basic set of observlets; users can define, share and collaborate on their own observlets to support application development.
In the future we would like to extend the definitions of observlets to the data processing life-cycle, to enable users to visualize the risk and complex data transformations associated with a dataset on the web observatory.