New trends in big data: in-memory analytics, streaming computing and distributed machine learning
New trends in Big Data:
In-memory analytics, streaming computing and distributed machine learning

The ability to understand data is a central ingredient in solving many real-world problems such as customer churn, cost optimization and fraud detection. Data processing is usually arranged as a pipeline of several steps, such as data extraction, data preparation, training and scoring. Big data technologies can be used to parallelise the whole pipeline, providing resilient, scalable and distributed data models.
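
The pipeline steps named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy churn records, the threshold "model" and all function names are hypothetical and do not come from the talk, and a thread pool stands in for a real distributed engine.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy data: "customer_id,monthly_logins,churned" lines in two partitions.
RAW_PARTITIONS = [
    ["c1,20,0", "c2,2,1", "c3,15,0"],
    ["c4,1,1", "c5,30,0", "c6,3,1"],
]

def extract(partition):
    """Data extraction: split raw lines into fields."""
    return [line.split(",") for line in partition]

def prepare(rows):
    """Data preparation: cast fields to typed (id, logins, churned) tuples."""
    return [(cid, int(logins), bool(int(churned))) for cid, logins, churned in rows]

def train(rows):
    """Training: pick a login threshold halfway between the two class means."""
    churned = [logins for _, logins, label in rows if label]
    retained = [logins for _, logins, label in rows if not label]
    return (mean(churned) + mean(retained)) / 2

def score(rows, threshold):
    """Scoring: flag customers whose logins fall below the threshold."""
    return {cid: logins < threshold for cid, logins, _ in rows}

def run_pipeline(partitions):
    # Extraction and preparation are independent per partition,
    # so they can run in parallel before the results are merged.
    with ThreadPoolExecutor() as pool:
        prepared = list(pool.map(lambda p: prepare(extract(p)), partitions))
    rows = [row for part in prepared for row in part]
    threshold = train(rows)
    return threshold, score(rows, threshold)

threshold, scores = run_pipeline(RAW_PARTITIONS)
```

Only the per-partition steps are parallelised here; training runs on the merged rows, which is the simplest arrangement rather than the only one.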

Since the introduction of Hadoop 10 years ago, memory and SSD technologies have become cheaper and more accessible. What used to be done on disk can now be done on solid-state technology. Apache Spark uses in-memory distributed data structures to accelerate data analytics.
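
The benefit of keeping data in memory shows up when the same dataset is analysed more than once. The sketch below is a single-process illustration, not Spark's API: `CachedDataset` and `slow_source` are hypothetical names, and the generator merely stands in for a slow disk read.

```python
class CachedDataset:
    """Loads records once from a slow source and keeps them in memory."""

    def __init__(self, loader):
        self._loader = loader   # callable standing in for a disk read
        self._records = None    # in-memory copy, filled on first access
        self.loads = 0          # number of times the slow source was read

    def records(self):
        if self._records is None:
            self._records = list(self._loader())
            self.loads += 1
        return self._records

def slow_source():
    # Hypothetical stand-in for reading transaction amounts from disk.
    yield from [12.5, 250.0, 7.25, 99.0]

dataset = CachedDataset(slow_source)
total = sum(dataset.records())     # first pass loads from the source
largest = max(dataset.records())   # second pass reuses the in-memory copy
```

Both analytics passes run, but the source is read only once; every later pass works on the in-memory copy.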

Originally, Big Data, and Hadoop in particular, was designed to operate on large chunks of data (terabytes and petabytes). However, it does not perform well when processing single events or providing low-latency, actionable results. Streaming computing attempts to provide a single data processing paradigm that works well for both "fast" and "big" data.
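
One way to picture a single paradigm for both kinds of data is a processing function that consumes events one at a time and does not care whether they come from a stored batch or a live feed. The following is a minimal sketch under that assumption; `running_totals`, `live_source` and the sample events are hypothetical and not taken from the talk.

```python
from collections import defaultdict

def running_totals(events):
    """Yield the updated per-key total after every (key, amount) event."""
    totals = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
        yield key, totals[key]

# "Big" data: a complete, already-stored batch of events.
batch = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
batch_result = dict(running_totals(batch))  # keeps the final total per key

# "Fast" data: events arriving one at a time from a hypothetical live source.
def live_source():
    yield ("alice", 10.0)
    yield ("bob", 5.0)
    yield ("alice", 2.5)

stream_updates = list(running_totals(live_source()))  # one update per event
```

The batch caller keeps only the final totals, while the streaming caller can act on each intermediate update as soon as it is produced.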

Published in: Data & Analytics


  1. Trends in Big Data. Natalino Busa, Data Platform Architect at Ing
  2. Play with your phones
  3. Re-think Big Data: Hadoop has turned 10
  4. Memory is eating Big Data: Amazon is delivering instances with 2 TB RAM; Facebook, Microsoft: 90% workload below the 100 GB; Machine Learning algorithms fit on a single node
  5. 250 MB hard disk drive from 1979. I like Big Data and I cannot lie.
  6. Disk -> RAM; Hadoop -> Spark; Map-Reduce -> Data Flow Graphs; HDFS -> Storage, MPPs, NoSQL
  7. Wheel mill. Stream like a boss
  8. Streaming and Real-Time Analytics
  9. Batch -> Event-Driven; ETL -> Streaming; Hive -> Flink, Akka, Spark
  10. Stream Centric Architectures
  11. Spark - RDDs; Streaming, SQL, MLlib, Graphx; Analytics, Statistics, Data Science, Model Training; HDFS, NoSQL, SQL; Data Sources; Map-Reduce, HDFS, KAFKA; Spark: Unified Distributed Computing: SQL + Machine Learning + Graph Analytics; Hive
  12. Virtual resources. Big Data Applications, Assemble!
  13. Clusters -> Resources; Orchestrated -> Isolated; Static -> Disposable; YARN, MESOS, CoreOS, Kubernetes; Application-oriented Infrastructure
  14. Elastic: Docker, Mesos, Yarn, Kubernetes; Data Processing: Flink, Spark, Akka; Indexing: Elastic Search, Deep Learning; APIs and microservices: Akka, Python, Java; Data storage: SQL, NoSQL, HDFS, Streaming; MESOS, YARN; Spark; Streaming, SQL, MLlib, Graphx; DBs, ES, C*; Application Oriented Architectures
  15. That’s all folks! Natalino Busa, Data Platform Architect at Ing, @natbusa
