Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals by Aad Versteden, TenForce

Talk at the third Big Data Europe SC6 workshop, held on 11.9.2017 in Amsterdam and co-located with the SEMANTiCS 2017 conference: The Big Data Europe Platform: Apps, challenges, goals, by Aad Versteden, TenForce.

  1. Big Data Europe
     Apps, challenges, goals
     Ir. Aad Versteden, TenForce
     SC6 workshop
  2. Platform Goals
     “Your data has value, why don’t you unlock it?”
  3. Platform Goals
     ◎ What is Big Data?
        o Volume
        o Velocity
        o Variety
        o Veracity
  4. Key actors
  5. Platform Goals
     ◎ Easy to
        o Install
        o Develop
        o Deploy
        o Integrate
  6. Societal Challenges
     Different domains with pilot cases validating the platform
  7. Societal Challenges
     ◎ Health
     ◎ Food
     ◎ Energy
     ◎ Transport
     ◎ Climate
     ◎ Social Sciences
     ◎ Security
  8. SC4: Transport
     ◎ Show and predict traffic jams
     ◎ ~ taxi fleet shares GPS data
     ◎ Big Data?
        o Velocity
        o [Volume]
  9. SC4: Transport
  10. SC3: Energy
      ◎ Preventative maintenance by vibration analysis
      ◎ Big Data?
         o High Volume (batch)
         o High Velocity (live)
  11. SC7: Security
      ◎ Detect change in human constructions, link to news events
      ◎ Big Data?
         o Volume
  12. SC7: Security
  13. SC1: Health
      ◎ Can we use open source to answer Pharma questions?
      ◎ Large semantic graph, complex questions
      ◎ Big Data?
         o Variety
  14. SC2: Food
      ◎ Mine viticulture research & share semantic information
      ◎ Big Data?
         o Variety
  15. SC2: Food
  16. SC5: Climate
      ◎ Where did an airborne risk come from?
      ◎ Precalculate emission spots with common weather patterns
      ◎ Big Data?
         o Volume
  17. SC5: Climate
  18. SC6: Social Sciences
      Martin will tell you later :-)
  19. Platform architecture
  20. Key actors
  21. Platform Architecture
  22. Platform Architecture
  23. Platform Architecture
  24. Platform Architecture
      ◎ Support Layer: Init Daemon, GUIs, Monitor
      ◎ App Layer: Traffic Forecast, Satellite Image Analysis
      ◎ Platform Layer: Spark, Flink, Semantic Layer (Ontario, SANSA, Semagrow), Kafka, Real-time Stream Monitoring, ...
      ◎ Resource Management Layer (Swarm)
      ◎ Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)
      ◎ Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, ..., RDF Store
  25. Supported Frameworks
      ◎ Search/indexing
         o Apache Solr
      ◎ Data acquisition
         o Apache Flume
      ◎ Message passing
         o Apache Kafka
      ◎ Data storage
         o Hue
         o Apache Cassandra
         o ScyllaDB
         o Apache Hive
         o Postgis
      ◎ Data processing
         o Apache Spark
         o Apache Flink
      ◎ Semantic Components
         o Strabon
         o Sextant
         o GeoTriples
         o Silk
         o SEMAGROW
         o LIMES
         o 4Store
         o OpenLink Virtuoso
  26. Platform Architecture
  27. Making Big Data Accessible
      How do we make it easy?
  28. Platform Goals
      ◎ Easy to
         o Install
         o Develop
         o Deploy
         o Integrate
  29. Actors
      ◎ Install stack
      ◎ Develop
      ◎ Deploy
      ◎ Monitor results
  30. Platform installation
      ◎ Manual installation guide
      ◎ Using Docker Machine
         o On local machine (VirtualBox)
         o In cloud (AWS, DigitalOcean, Azure)
         o Bare metal
      ◎ Screencasts
  31. BDI Stack Lifecycle
  32. BDI Stack Lifecycle
      Developing Custom Applications
  33. Platform development
      ◎ High level picture
         o docker-compose.yml describes pipeline topology
      ◎ BDE provided components
         o extend template image with your code
      ◎ New components
         o build a Docker image for your component
         o this is your own little Virtual Machine for your component
      ◎ Sharing
         o publish topology as git repository
         o publish new components on docker hub
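
To make the "docker-compose.yml describes pipeline topology" idea concrete, here is a minimal sketch of such a topology as a Python dictionary, printed as JSON for brevity (a real docker-compose.yml would normally be written in YAML). The service names, image names and the `./my-app` build directory are hypothetical placeholders and are not taken from the talk; only the keys `version`, `services`, `image`, `build` and `depends_on` are standard Compose keys.

```python
import json


def build_topology():
    """Return a Compose-style description of a small pipeline.

    All service and image names are hypothetical placeholders.
    """
    return {
        "version": "3",
        "services": {
            # A provided component used as-is.
            "spark-master": {"image": "example/spark-master"},
            # A second provided component that needs the master first.
            "spark-worker": {
                "image": "example/spark-worker",
                "depends_on": ["spark-master"],
            },
            # A custom component: a template image extended with your code,
            # built from a local directory that holds its Dockerfile.
            "my-app": {
                "build": "./my-app",
                "depends_on": ["spark-worker"],
            },
        },
    }


def undefined_dependencies(topology):
    """List (service, dependency) pairs whose dependency is not declared."""
    services = topology["services"]
    return [
        (name, dep)
        for name, spec in services.items()
        for dep in spec.get("depends_on", [])
        if dep not in services
    ]


if __name__ == "__main__":
    topology = build_topology()
    print(json.dumps(topology, indent=2))
    print("undefined dependencies:", undefined_dependencies(topology))
```

Keeping the whole topology in one declarative structure is what makes it shareable as a git repository: anyone who can pull the referenced images can bring up the same pipeline.
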
  34. Development
      ◎ Base Docker images
         o Serve as a template for a (Big Data) technology
         o Easily extendable with custom algorithms/data
      ◎ Published components
         o Image repositories on GitHub
         o Automated builds on DockerHub
         o Documentation on BDE Wiki
  35. BDI Stack Lifecycle
      Docker Images
  36. BDI Stack Lifecycle
      BDI Stack (workflow) builder
  37. BDI Stack Lifecycle
      Custom Components
      *Init Daemon
      *Integrator UI
  38. Enhancing the Component
      ◎ Orchestrator required for initialization process (init_daemon)
         o Components may depend on each other
         o Components may require manual intervention
      ◎ User Interface Integration
         o Standard Interfaces from components
         o Combine and align the interfaces
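
The ordering problem that an initialization orchestrator has to solve (components that depend on each other, some of which wait for a person) can be illustrated with a topological sort. This is only a sketch of the idea and does not reproduce the init daemon's actual interface; the component names and the `MANUAL_STEPS` set are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical components mapped to the components they must wait for.
DEPENDENCIES = {
    "hdfs": set(),
    "spark-master": {"hdfs"},
    "spark-worker": {"spark-master"},
    "my-app": {"spark-worker"},
}

# Hypothetical steps that a person has to confirm before startup continues.
MANUAL_STEPS = {"hdfs"}


def startup_order(dependencies):
    """Return the components in an order that respects every dependency."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


def startup_plan(dependencies, manual_steps):
    """Pair each component with whether it waits for manual confirmation."""
    return [(name, name in manual_steps) for name in startup_order(dependencies)]


if __name__ == "__main__":
    for name, needs_confirmation in startup_plan(DEPENDENCIES, MANUAL_STEPS):
        suffix = " (wait for manual confirmation)" if needs_confirmation else ""
        print(f"start {name}{suffix}")
```
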
  39. BDI Stack Lifecycle
      Deploy BDE Platform/Stack to the Cluster
  40. Deploying a Big Data Stack
      ◎ Stack
         o collection of communicating components
         o to solve a specific problem
      ◎ Described in Docker Compose
         o Component configuration
         o Application topology
  41. BDI Stack Lifecycle
      Stack/Cluster Monitor
  42. User Interfaces
      ◎ Make it easy to use
      ◎ Available interfaces
         o Stack Builder
         o Swarm UI
         o Workflow Builder
         o BDI Integrator
  43. BDE Workflow Builder
  44. BDE Workflow Monitor
  45. Swarm UI
  46. Swarm UI
  47. Integrator UI
  48. Beyond the state of the art ...
      Smart Big Data
      Increase the value of Big Data by adding meaning to it!
  49. Semantic Data Lake (Ontario)
      ◎ Data Swamp
         o Repository of data in its raw format
         o Structured, semi-structured, unstructured
         o Schema-less
      ◎ Data Lake
         o Add a Semantic layer on top of the source datasets
         o The data is semantically lifted using existing ontology terms
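
As a rough illustration of what "semantically lifted using existing ontology terms" can mean, the sketch below maps the fields of a raw record onto existing vocabulary terms (schema.org and the W3C WGS84 geo vocabulary) and emits subject-predicate-object triples. It is not Ontario's implementation; the mapping, the sample record and the `http://example.org/` subject IRI are made up for the example.

```python
# Hypothetical mapping from raw field names to existing ontology terms.
FIELD_TO_TERM = {
    "name": "http://schema.org/name",
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "lon": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}


def lift(record, subject_iri, field_to_term=FIELD_TO_TERM):
    """Turn one raw record into (subject, predicate, object) triples.

    Fields without a mapping are skipped; the raw record stays untouched.
    """
    return [
        (subject_iri, field_to_term[field], value)
        for field, value in record.items()
        if field in field_to_term
    ]


# Made-up raw record; "internal_id" has no ontology term and is not lifted.
raw = {"name": "Station A", "lat": 52.37, "lon": 4.89, "internal_id": 17}
triples = lift(raw, "http://example.org/station/a")

if __name__ == "__main__":
    for triple in triples:
        print(triple)
```

The raw data stays in its original format; the semantic layer is an additional view on top of it, which is the distinction the slide draws between a data swamp and a data lake.
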
  50. SANSA Stack
  51. Check it out
      https://github.com/big-data-europe
      aad.versteden@tenforce.com
      @impulsater
      https://github.com/madnificent
  52. [slide without text]
  53. BDE vs Hadoop distributions

      |                                          | Hortonworks                    | Cloudera                       | MapR                             | Bigtop                         | BDE                        |
      |------------------------------------------|--------------------------------|--------------------------------|----------------------------------|--------------------------------|----------------------------|
      | File System                              | HDFS                           | HDFS                           | NFS                              | HDFS                           | HDFS                       |
      | Installation                             | Native                         | Native                         | Native                           | Native                         | lightweight virtualization |
      | Plug & play components (no rigid schema) | no                             | no                             | no                               | no                             | yes                        |
      | High Availability                        | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple Failure recovery  |
      | Cost                                     | Commercial                     | Commercial                     | Commercial                       | Free                           | Free                       |
      | Scaling                                  | Freemium                       | Freemium                       | Freemium                         | Free                           | Free                       |
      | Addition of custom components            | Not easy                       | No                             | No                               | No                             | Yes                        |
      | Integration testing                      | yes                            | yes                            | yes                              | yes                            | --                         |
      | Operating systems                        | Linux                          | Linux                          | Linux                            | Linux                          | All                        |
      | Management tool                          | Ambari                         | Cloudera manager               | MapR Control system              | -                              | Docker swarm UI + Custom   |
  54. BDE vs Hadoop distributions
      ◎ BDE is not built on top of existing distributions
      ◎ Targets
         o Communities
         o Research institutions
      ◎ Bridges scientists and open data
      ◎ Multi Tier research efforts towards Smart Data