Ce diaporama a bien été signalé.
Le téléchargement de votre SlideShare est en cours. ×

SC4 Workshop 2: Soren Auer BDE project Overview

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Chargement dans…3
×

Consultez-les par la suite

1 sur 35 Publicité

Plus De Contenu Connexe

Diaporamas pour vous (20)

Similaire à SC4 Workshop 2: Soren Auer BDE project Overview (20)

Publicité

Plus par BigData_Europe (20)

Plus récents (20)

Publicité

SC4 Workshop 2: Soren Auer BDE project Overview

  1. 1. BIG DATA EUROPE H2020 CSA (2015-17) 22.9.2016 Integrating Big Data, Software & Communities for Addressing Europe’s Societal ChallengesSC4 Workshop
  2. 2. Big Data in Marketing 10-oct.-16www.big-data-europe.eu
  3. 3. Big Data in Intelligence 10-oct.-16www.big-data-europe.eu
  4. 4. Big Data Europe (CSA: 2015-17)  Show societal value of Big Data: 7 Domains  Lower barrier for using big data technologies o Required effort and resources o Limited data science skills  Help establishing cross- lingual/organizational/domain Data Value Chains 10-oct.-16
  5. 5. Consortium
  6. 6. Stakeholder Engagement & Activities M12M6 M18 M24 M30
  7. 7. ARCHITECTURE & COMPONENTS Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
  8. 8. The three Big Data „V“ Variety is often neglected Quelle: Gesellschaft für Informatik
  9. 9. Adding a Semantic Layer to Data Lakes Manufacturing Marketing Sales SupportAccounting Semantic Data Lake • central place for model, schema and data historization • Combination of Scale Out (cost reduction) and semantics (increased control & flexibility) • grows incrementally (pay-as-you-go) Inbound Data Sources Outbound and Consumption Inbound Raw Data Store Data Lake (order of magnitude cheaper scalable data store) Knowledge Graph for Relationship Definition and Meta Data Frontend to Access Relationship and KPI Definition / Documentation Frontend to Access (ad hoc) Reports Outbound Data Delivery to Target Systems JSON-LD CSVW R2RMLXML2RDF © eccenca.com https://www.eccenca.com/en/products-corporate-memory.html
  10. 10. Current State of Platform Architecture
  11. 11. BDE SANSA Stack  Distributed Machine Learning (ML) algorithms that work out of the box on RDF data and make use of its structure / semantics  Examples: o Tensor Factorization for e.g. KB completion o Spatiotemporal analytics o Anomaly prediction o Clustering o Association rules o Decision trees 10-oct.-16www.big-data-europe.eu
  12. 12. Platform components Search/indexing Data processing Apache Solr Apache Spark Data acquisition Apache Flink Apache Flume Semantic Components Message passing Strabon Apache Kafka Sextant Data storage GeoTriples Hue Silk Apache Cassandra SEMAGROW ScyllaDB LIMES Apache Hive 4Store Postgis OpenLink Virtuoso 10-oct.-16www.big-data-europe.eu
  13. 13. BDE vs Hadoop distributions Hortonworks Cloudera MapR Bigtop BDE File System HDFS HDFS NFS HDFS HDFS Installation Native Native Native Native lightweight virtualization Plug & play components (no rigid schema) no no no no yes High Availability Single failure recovery (yarn) Single failure recovery (yarn) Self healing, mult. failure rec. Single failure recovery (yarn) Multiple Failure recovery Cost Commercial Commercial Commercial Free Free Scaling Freemium Freemium Freemium Free Free Addition of custom components Not easy No No No Yes Integration testing yes yes yes yes -- Operating systems Linux Linux Linux Linux All Management tool Ambari Cloudera manager MapR Control system - Docker swarm UI+ Custom 10-oct.-16www.big-data-europe.eu
  14. 14. 7 SC-PILOT INSTANTIATIONS Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
  15. 15. Pilots: Overview  SC1: Health & Pharm.  SC2: Food & Agr.  SC3: Energy  SC4: Transport 10-oct.-16www.big-data-europe.eu  SC5: Climate  SC6: Social Sciences  SC7: Security
  16. 16. SC1: Life Sciences & Health 10-oct.-16www.big-data-europe.eu SC1: Life Sciences & Health
  17. 17. SC1: Life Sciences & Health 10-oct.-16www.big-data-europe.eu Big Data Focus area: Large-scale heterogeneous pharma-research data linking & integration Selected Key Data assets: ACD Labs / ChemSpider, ChEBI, ChEMBL, ConceptWiki, DrugBank, ENZYME, Gene Ontology, GO Annotation, SwissProt, WikiPathways
  18. 18. SC1: Life Sciences & Health 10-oct.-16www.big-data-europe.eu Pilot 1: Replicate Open PHACTS functionality on the BDE infrastructure using Open Source solutions Reasons: • Deployment possible in-house • Apply to other domains (e.g. Agriculture) • Using extra BDE functionalities (e.g. logging, analysis)
  19. 19. SC2: Food & Agriculture 10-oct.-16www.big-data-europe.eu SC2: Food & Agriculture
  20. 20. SC2: Food & Agriculture 10-oct.-16www.big-data-europe.eu AGINFRA Big Data Focus area: Large-scale distributed agricultural data integration Selected Key Data assets: INFOODS, AQUASTAT Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AgroVoc, AquaMaps, Fishbase
  21. 21. SC2: Food & Agriculture 10-oct.-16www.big-data-europe.eu Pilot focus area: Viticulture (from the Latin word for vine) is the science, production, and study of grapes. It deals with the series of events that occur in the vineyard.
  22. 22. SC3: Energy 10-oct.-16www.big-data-europe.eu SC3: Energy
  23. 23. SC3: Energy 10-oct.-16www.big-data-europe.eu Pilot focus area: System monitoring in energy production units. Big Data Focus area: Real-time turbine monitoring stream processing and analytics Selected Key Data assets: European Energy Exchange Data, smart meter sensor data, gas/fuels market/price data, consumption statistics, stratigraphic model data (geology, geophysics)
  24. 24. SC4: Transport 10-oct.-16www.big-data-europe.eu The Fraunhofer Society is a German research organization with 67 institutes spread throughout Germany, each focusing on different fields of applied science. The Centre for Research and Technology-Hellas (CERTH) founded in 2000 is one of the leading research centres in Greece. CERTH includes the Hellenic Institute of Transport (HIT): Land, Sea and Air Transportation as well as Sustainable Mobility services ERTICO - ITS Europe is a partnership of around 100 companies and institutions involved in the production of Intelligent Transport Systems (ITS). IAIS
  25. 25. SC4 Pilot Focus Area 10-oct.-16www.big-data-europe.eu Info mobility based on Mobility Pattern IdentificationPilot 4: Multisource data collection for the provision of accurate info-mobility and advanced transport planning service in Thessaloniki, Greece
  26. 26. SC4: Twitter data in Thessaloniki
  27. 27. SC4: Floating Car Data www.big-data-europe.eu Real time traffic conditions information based on a combination of traffic modeling and real time measurements (traffic flow and speed) >1.200 vehicles (one taxi fleet) • Circulating 16-24 hours/day • Pulse each 100m or 10s • 500-2.500 pulses /minute Speeds along a 2km stretch
  28. 28. SC5: Climate 10-oct.-16www.big-data-europe.eu SC5: Climate
  29. 29. SC5: Climate 10-oct.-16www.big-data-europe.eu Pilot focus area: Supporting data-intensive climate research Big Data Focus area: Enormous simulation time. Extremely complicated computing model. Selected Key Data assets: European Grid Infrastructure (EGI). Access to several data centres hosted at CNRS-Lyon, NCSR-D Athens, INFN-Milan, NIKhEF-Amsterdam.
  30. 30. SC6: Social Sciences 10-oct.-16www.big-data-europe.eu SC6: Social Sciences
  31. 31. SC6: Social Sciences 10-oct.-16www.big-data-europe.eu Pilot focus area: Citizens budget spending on municipal level Big Data Focus area: Statistical and research data linking & integration Selected Key Data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. EuroStats, UNESCO,
  32. 32. SC7: Security 10-oct.-16www.big-data-europe.eu SC7: Security
  33. 33. SC7: Security 10-oct.-16www.big-data-europe.eu Pilot focus area: Getting insight in man-made surface changes triggered by automatic detection, news, or social media information Big Data Focus area: Image data analysis Selected Key Data assets: Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations
  34. 34. SC7: Security 10-oct.-16www.big-data-europe.eu Pilot 7: Ingestion of remote sensing images and social sensing data to detect and verify man-made changes on the Earth surface for security applications Evacuation route planning Monitoring of critical infrastructures Border security Satellite image data is HUGE and computational intensive to compare Smart ‘focus’ algorithms are needed to prioritize the analysis jobs Reasons:
  35. 35. WEB: www.big-data-europe.eu EMAIL: info@big-data-europe.eu PROJECT COORDINATION Prof. Sören Auer, auer © cs.uni-bonn · de (Fraunhofer IAIS) > Dr. Simon Scerri (Deputy), scerri © cs.uni-bonn · de (Fraunhofer IAIS) EIS Department/Group, Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany Fraunhofer IAIS: Leads Fraunhofer Big Data Alliance Questions & Contacts www.big-data-europe.eu 10-oct.-16 #BigDataEurope

Notes de l'éditeur

  • Project obecjtives:
    Addressing each of the Societal Challenge domains (7), we have a domain representative for each & a pilot instantiation of the BDE platform for each in progress

    One of the challenges to Big Data opportunities is the lack of skills (data science) – our aim is to provide out of the box technology with not a lot of training required to use and apply

    BDE technology can be applied in multiple domains and in different phases within Data Value Chains, working with different data providers and addressing multiple objectives (as opposed to current solutions, which tend to be very specific to one data source or domain, and address one objective.
  • 9/16 partners: Sole or joint domain representatives of 7 SC domains (COORDINATION ROLE)
    Other 7/16 partners: technical support (SUPPORT ROLE)
    Fraunhofer coordinates the project
  • Centered around the 7 SC communitiess, we follow a number of iterations each having five steps:

    1) Engaging with stakeholders for 2) collecting feedback and translating to requirements and identify 3) domain-specific data assets. Prototypes are deployed as 7 Pilots use-cases (4) and evaluated with the community (5) – currently we are in the first iteration at step 4 (Pilot conceptualisation and deployment).
  • http://www.gi.de/nc/service/informatiklexikon/detailansicht/article/big-data.html
  • Data Lake is a storage repository for big data scale raw data in original data formats.
    late binding approach to schema: “Let us decide, when we need it.”
    scale out architecture on commodity infrastructure, mostly with HFS/Hadoop/Spark, which gives a huge cost advantage – about factor 10 compared to data warehouses.
    Semantic Data Lake = Data Lake + Knowledge Graph
    management of structure (vocabularies/schemas, KPIs trees, metadata, …) on top of the Data Lake is performed in a knowledge graph - a complex data fabric representing all kinds of things and how they relate to each other.
    A knowledge graph is unique regarding flexibility, multiple views and metadata capabilities.
    Based on the Resource Description Framework (RDF) standard and Linked Data principles.

×