Watch this webinar on YouTube: https://youtu.be/MwG0yhrctDs
Slides for the latest update on our Big Data Europe pilot in Societal Challenge 1: Health, Demographic Change and Wellbeing.
Last year we successfully completed the first phase of this pilot, replicating the functionality of the Open PHACTS Discovery Platform on the BDE infrastructure. The Open PHACTS Discovery Platform brings together pharmacological data resources in an integrated, interoperable infrastructure, and has been developed to reduce barriers to drug discovery for industry, academia, and small businesses.
Learn more about the progress we’ve made, and what’s coming next.
1. General overview of the Big Data Europe project and Societal Challenges it addresses (Ronald Siebes, VU Amsterdam)
2. The Big Data Europe infrastructure, generic components that are being developed, and their flexibility for different applications (Hajira Jabeen, University of Bonn)
3. Latest details of the current state of the Open PHACTS architecture in BDE, and ongoing work (Nick Lynch, CTO, Open PHACTS Foundation)
BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe
1. BIG DATA EUROPE
H2020 CSA (2015-17)
SC1 – HEALTH CHALLENGE WEBINAR
Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
April 4th 2017
Kiera McNeice, Ronald Siebes, Hajira Jabeen and Nick Lynch
7. SC2: Food & Agriculture
5 Apr 2017, www.big-data-europe.eu
Partners:
FAO, the largest autonomous agency within the
United Nations system and one of the main
players in the agricultural information
community.
Big Data Focus area: Large-scale distributed agricultural data integration
Selected Key Data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural
Bibliography Network (ABN), AGROVOC, AquaMaps, FishBase
Semantic Web Company (SWC) is a technology provider headquartered in
Vienna (Austria). SWC helps organizations from all industrial sectors
worldwide improve their information management. Its core product extracts
meaning from big data using linked data technologies.
Agroknow is a company that captures, organizes and adds value to the
rich information available in agricultural and food sciences, in order to
make it universally accessible, useful and meaningful.
8. SC2: Food & Agriculture
Pilot focus area:
Viticulture
(from the Latin word for vine)
is the science, production,
and study of grapes.
It deals with the series of
events that occur in the vineyard.
9. SC2: Food & Agriculture
Pilot 2: Support advanced crop
data discovery, processing,
combining and visualization from
distributed and heterogeneous
data repositories
Reasons:
Vine and wine sector: an emerging market in the EU
Sustainability and biodiversity challenges: local varieties are being lost
Exploitation of new grapevine varieties and clones for climate change adaptation
Quality and health status of viticultural products
Contribution to human health (antioxidants, prevention of heart disease, etc.)
Wide variety of heterogeneous (and big) data from various information sources
11. SC3: Energy
Partners:
A public entity supervised by the Ministry of Environment,
Energy and Climate Change in Greece, founded in
September 1987, active in the fields of Renewable
Energy Sources (RES), Rational Use of Energy (RUE) and
Energy Saving (ES).
Big Data Focus area: Real-time turbine monitoring stream processing and analytics
Selected Key Data assets: European Energy Exchange Data, smart meter sensor data,
gas/fuels market/price data, consumption statistics, stratigraphic model data (geology,
geophysics)
NCSR "Demokritos", the largest multidisciplinary research
centre in Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight institutes.
13. SC3: Energy
Pilot 3: Operation, maintenance
and production forecasting for
wind turbines on real-time sensor
data.
Reasons:
Current technology cannot handle the full amount of available valuable data
Economic benefit of predicting output and preventing damage (if a part can be predicted to fail, damage to other parts can be prevented)
Large continuous stream of sensor data, ideal for testing our platform
15. SC4: Transport
Partners: The Fraunhofer Society is a German research organization with 67
institutes spread throughout Germany, each focusing on different
fields of applied science.
Big Data Focus area: Streaming sensor network & geo-spatial data integration
Selected Key Data assets: GTFS data, OSM/LinkedGeoData, MobilityMaps, Transport
sensor data, ROSATTE Road safety attributes, European Road Data Infrastructure -
EuroRoadS
The Centre for Research and Technology-Hellas (CERTH),
founded in 2000, is one of the leading research
centres in Greece. CERTH includes the Hellenic Institute of
Transport (HIT): Land, Sea and Air Transportation as well
as Sustainable Mobility services.
ERTICO - ITS Europe is a partnership of around 100 companies
and institutions involved in the production of Intelligent Transport
Systems (ITS).
17. SC4: Transport
Pilot 4: Multisource data collection
for the provision of accurate info-
mobility and advanced transport
planning service in Thessaloniki,
Greece
Reasons:
Congestion is a major problem in Europe, especially in urban areas
Using real-time probe data to provide accurate info-mobility services and advanced transport planning leads to better decisions
Using mobility data from multiple sources presents significant challenges, because the datasets differ in both content and spatio-temporal terms, and because the data must be collected and processed in real time
19. SC5: Climate
Partners:
A public entity supervised by the Ministry of Environment,
Energy and Climate Change in Greece, founded in
September 1987, active in the fields of Renewable
Energy Sources (RES), Rational Use of Energy (RUE) and
Energy Saving (ES).
Big Data Focus area: Enormous simulation time. Extremely complicated computing model.
Selected Key Data assets: European Grid Infrastructure (EGI). Access to several data centres
hosted at CNRS-Lyon, NCSR-D Athens, INFN-Milan, NIKhEF-Amsterdam.
NCSR "Demokritos", the largest multidisciplinary research
centre in Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight institutes.
21. SC5: Climate
Pilot 5: Downscaling and retrieval
process on (raw) climate data via
user-defined parameters (e.g.
geographical areas, time period,
physical variables, computational
grids, time steps)
Reasons:
Providing climate model data serves an important objective: assessing the potential impacts of climate change on well-being, to support adaptation, prevention and mitigation measures and other policy decisions
This awareness has led to the availability of huge datasets
Downscaling is a computationally intensive process
23. SC6: Social Sciences
Partners:
CESSDA provides large-scale, integrated and sustainable
data services to the social sciences. CESSDA is organised
as a limited company under Norwegian law, owned and
financed by the individual EU member states’ ministries of
research or delegated institutions.
Big Data Focus area: Statistical and research data linking & integration
Selected Key Data assets: Federated social sciences data catalogs, statistical data from public
data portals and statistical offices (e.g. Eurostat, UNESCO, World Bank)
NCSR "Demokritos", the largest multidisciplinary research
centre in Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight institutes.
25. SC6: Social Sciences
Pilot 6: Citizens' budget
at municipal level
Reasons:
The budget is the most important public policy document
Budget execution affects everyday life
Citizens are more involved at city level
A platform that integrates heterogeneous budget data (many municipalities have their own data formats) and generates infographics would benefit citizens, the research community and policy makers
27. SC7: Security
Partners:
The Centre supports the decision making of the European
Union in the field of the Common Foreign and Security
Policy (CFSP), by providing products and services
resulting from the exploitation of relevant space assets
and collateral data, including satellite imagery and
aerial imagery, and related services.
NCSR "Demokritos", the largest multidisciplinary research
centre in Greece, hosts significant scientific research,
technological development and educational activities,
coordinated by eight institutes.
28. SC7: Security
Big Data Focus area: Image data analysis
Selected Key Data assets: Earth Observation data (e.g. Very High Resolution Satellite
Imagery acquired from commercial providers and governmental systems) and collateral data
for supporting CFSP/CSDP missions and operations
30. SC7: Security
Pilot 7: Ingestion of remote
sensing images and social
sensing data to detect and verify
man-made changes on the Earth
surface for security applications
Reasons:
Evacuation route planning
Monitoring of critical infrastructures
Border security
Satellite image data is huge and computationally intensive to compare
Smart ‘focus’ algorithms are needed to prioritize the analysis jobs
31. Big Data Europe Integrator Platform
Dr Hajira Jabeen, University of Bonn
SC1 Webinar
32. Platform Goals
◎Open source
◎Simple to get started with Big Data
◎Support a variety of use cases
◎Embrace emerging Big Data technologies
◎Simple integration with custom components
46. Platform development
◎ High-level picture
o docker-compose.yml describes the pipeline topology
◎ BDE-provided components
o extend the template image with your code
◎ New components
o build a Docker image for your component
o this acts like your own little virtual machine for the component
◎ Sharing
o publish the topology as a git repository
o publish new components on Docker Hub
48. Platform installation
◎Manual installation guide
◎Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎Screencasts
49. Development
◎Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extended with custom algorithms/data
◎Published components
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
50. Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
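Because a stack is just component configuration plus topology, the Compose description can also be generated programmatically. The sketch below is a minimal illustration under stated assumptions, not a verified BDE configuration: the two-component Spark stack, the image names `bde2020/spark-master` and `bde2020/spark-worker`, the ports, and the `SPARK_MASTER` variable are all assumptions, and the output is JSON (which the standard library can write and which YAML parsers generally accept) rather than hand-written YAML.

```python
import json


def service(image, ports=None, environment=None, depends_on=None):
    """Build one Compose service entry: the configuration of a single component."""
    entry = {"image": image}
    if ports:
        entry["ports"] = list(ports)
    if environment:
        entry["environment"] = dict(environment)
    if depends_on:
        # Topology: which components this one relies on.
        entry["depends_on"] = list(depends_on)
    return entry


def build_stack():
    """Assemble a hypothetical two-component Spark stack.

    Image names, ports and the SPARK_MASTER variable are assumptions
    for illustration only.
    """
    return {
        "version": "2",
        "services": {
            "spark-master": service(
                "bde2020/spark-master", ports=["8080:8080", "7077:7077"]
            ),
            "spark-worker": service(
                "bde2020/spark-worker",
                environment={"SPARK_MASTER": "spark://spark-master:7077"},
                depends_on=["spark-master"],
            ),
        },
    }


if __name__ == "__main__":
    # Print the stack description; save it to a file and pass it to Compose with -f.
    print(json.dumps(build_stack(), indent=2))
```

Generating the file this way keeps component configuration (images, ports, environment) and topology (`depends_on`) in one structure that can be versioned and shared as a git repository.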
51. Enhancing the Component
◎ Orchestrator required for initialization process
(init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
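The ordering problem that an initialization orchestrator such as init_daemon has to solve, namely starting components only after the components they depend on, can be illustrated with a topological sort. This is a minimal sketch, not init_daemon's actual implementation: it assumes dependencies are given as a mapping in the spirit of Compose's `depends_on`, uses hypothetical component names, and relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def startup_order(depends_on):
    """Return an order in which every component follows all of its dependencies.

    depends_on maps a component name to the names it depends on.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


def startup_waves(depends_on):
    """Group components into waves; components in one wave can start in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already started
        waves.append(ready)
        sorter.done(*ready)  # mark them started so dependents become ready
    return waves


if __name__ == "__main__":
    # Hypothetical stack: a custom algorithm on top of Spark and HDFS.
    stack = {
        "namenode": [],
        "datanode": ["namenode"],
        "spark-master": ["namenode"],
        "spark-worker": ["spark-master", "datanode"],
        "my-algorithm": ["spark-worker"],
    }
    print(startup_waves(stack))
```

A step that needs manual intervention could be modelled as a wave that waits for an external signal before calling `done`; that extension is left out here.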
52. User Interfaces
◎Target: Facilitate use of the platform
o User interface adaptation
◎Available interfaces
o Workflow UIs
❖ Workflow Builder
❖ Workflow Monitor
o Swarm UI
o Integrator UI
58. Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data
by adding meaning to it!
59. Semantic Data Lake (Ontario)
◎Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎Data Lake
o Add a Semantic layer on top of the source
datasets
o The data is semantically lifted using existing
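The idea of semantic lifting, mapping raw, schema-less records onto vocabulary terms so they become RDF, can be shown in a few lines. This is a minimal sketch, not Ontario's implementation: the base URI, the `example.org` vocabulary, the CSV columns and the column-to-predicate mapping are all hypothetical, and N-Triples are written by hand to stay within the standard library (a production pipeline would use an RDF library and reuse established ontologies).

```python
import csv
import io

# Hypothetical URIs for illustration only.
BASE = "http://example.org/compound/"
VOCAB = "http://example.org/vocab/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# The "semantic layer": raw column names mapped to predicates (hypothetical).
COLUMN_TO_PREDICATE = {
    "name": RDFS_LABEL,
    "formula": VOCAB + "formula",
}


def literal(value):
    """Render an N-Triples string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def lift(csv_text):
    """Lift raw CSV rows into N-Triples lines using COLUMN_TO_PREDICATE.

    Assumes every row has a URI-safe 'id' column.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{VOCAB}Compound> .")
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = row.get(column)
            if value:  # raw data is often incomplete: skip empty cells
                triples.append(f"{subject} <{predicate}> {literal(value)} .")
    return triples


if __name__ == "__main__":
    raw = "id,name,formula\nD001,Aspirin,C9H8O4\nD002,Caffeine,\n"
    print("\n".join(lift(raw)))
```

The raw file stays untouched in the lake; only the mapping adds meaning, so the same data can be lifted again when the vocabulary changes.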
64. BDE vs Hadoop distributions
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File system | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | No | No | No | No | Yes
High availability | Single-failure recovery (YARN) | Single-failure recovery (YARN) | Self-healing, multiple-failure recovery | Single-failure recovery (YARN) | Multiple-failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | Yes | Yes | Yes | Yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom
65. BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets
o Communities
o Research institutions
◎Bridges scientists and open data
◎Multi-tier research efforts towards Smart
Data
66. Stian Soiland-Reyes, University of Manchester
Nick Lynch, CTO Open PHACTS Foundation
4 Apr 2017
68. Summary
• Update on Docker and Open PHACTS
• Learnings & transition to AWS
• Next Steps & Future Releases
89. Open PHACTS Next Steps
• Data refresh planned for API 2.2:
–Phase 1: ChEMBL, WikiPathways, UniProt + chemistry
refreshed (RDF and linksets)
–Phases 2 & 3: Remaining data sources
–Build data refresh processes
• Wider Architecture Review
• Science and Open PHACTS Webinar
–Science and Open PHACTS: Workflow tools for Life
Science Research
–https://register.gotowebinar.com/register/2550359383420450817
90. Open PHACTS
• Custom Data Staging:
–Different licensing options to cover Annotated
SureChEMBL for members/non-members
• Microservices?
–Part of Architecture review to discuss future
services/API
–Interested in others' experiences with this
• Workflow
–BioExcel Workflow blocks in development
–See bio.tools