Hadoop Summit
San Jose, California
June 28th 2016
Analysis of Major Trends in
Big Data Analytics
Slim Baltagi
Director, Enterprise Architecture
Capital One Financial Corporation
Welcome!
About me:
• I’m currently director of Enterprise Architecture at Capital One: a
top 10 US financial corporation based in McLean, VA.
• I have over 20 years of IT experience.
• I have over 7 years of Big Data experience: Engineer, Architect,
Evangelist, Blogger, Thought Leader, Speaker, Organizer of Apache
Flink meetups in many countries, Creator and maintainer of the Big
Data Knowledge Base: http://SparkBigData.com with over 7,000
categorized web resources about Hadoop, Spark, Flink, …
Thanks: This talk won the community vote of the ‘Future
of Apache Hadoop’ track. Thanks to all of you who voted
for this talk, are attending it now, or are reading these slides.
Disclaimer: This is a vendor-independent talk that
expresses my own opinions. I am not endorsing nor
promoting any product or vendor mentioned in this talk.
Agenda
1. Portability between Big Data Execution
Engines
2. Emergence of stream analytics
3. In-Memory analytics
4. Rapid Application Development of Big Data
applications
5. Open sourcing Machine Learning systems by
tech giants
6. Hybrid Cloud Computing
What is a typical Big Data Analytics Stack:
Hadoop, Spark, Flink, …?
1. Portability between Big Data Execution Engines
If you have an existing Big Data application based on
MapReduce and you want to benefit from a different
execution engine such as Tez, Spark or Flink, you might
need to:
• Reuse some of your existing code such as mapper and
reducer functions.
• Leverage a ‘compatibility layer’ to run your existing
Big Data application on the new engine. Example:
Hadoop Compatibility Layer from Flink
• Switch to a different engine if the tool you used
supports it. Example: Hive/Pig on Tez, Hive/Pig on
Spark, Sqoop on Spark, Cascading on Flink.
• Rewrite your Big Data application!
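As a rough illustration of the first option, reusing mapper and reducer functions, here is a minimal Python sketch. It assumes the functions are written as plain, engine-independent callables. The names `mapper`, `reducer` and `run_local` are hypothetical and do not come from any engine's API; `run_local` only stands in for whatever engine would call them.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit (word, 1) pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reducer(word, counts):
    """Combine all counts emitted for one word."""
    return word, sum(counts)


def run_local(lines, map_fn, reduce_fn):
    """Hypothetical stand-in for an engine: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_fn(line) for line in lines):
        groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


# The same two functions could be handed to a different driver unchanged.
word_counts = run_local(["big data", "Big Data analytics"], mapper, reducer)
print(word_counts)
```

Keeping the business logic in small functions like these is what makes a later engine switch cheaper: only the driver code changes.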
1. Portability between Big Data Execution Engines
Apache Beam (unified Batch and Stream processing) is
a new Apache incubator project based on years of
experience developing Big Data infrastructure
(MapReduce, FlumeJava, MillWheel) within Google
http://beam.incubator.apache.org/
Apache Beam provides a unified API for Batch and
Stream processing and also multiple runners.
Beam programs become portable across multiple
runtime environments, both proprietary (e.g., Google
Cloud Dataflow) and open-source (e.g., Flink, Spark).
Apache Beam web resources:
http://sparkbigdata.com/component/tags/tag/67
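The idea of one pipeline definition running on several runners can be sketched in plain Python. This is not the Apache Beam API: `Pipeline`, `batch_runner` and `stream_runner` are hypothetical names, and the sketch only covers element-wise steps, whereas Beam also covers windowing, grouping and more.

```python
class Pipeline:
    """Hypothetical engine-neutral pipeline: an ordered list of element-wise steps."""

    def __init__(self):
        self.steps = []  # list of (kind, function) pairs

    def map(self, fn):
        self.steps.append(("map", fn))
        return self

    def filter(self, fn):
        self.steps.append(("filter", fn))
        return self


def _apply(pipeline, element):
    """Run one element through all steps; return (keep, value)."""
    for kind, fn in pipeline.steps:
        if kind == "map":
            element = fn(element)
        elif not fn(element):
            return False, None
    return True, element


def batch_runner(pipeline, data):
    """Process a bounded collection and return all results at once."""
    results = []
    for element in data:
        keep, value = _apply(pipeline, element)
        if keep:
            results.append(value)
    return results


def stream_runner(pipeline, source):
    """Process a possibly unbounded iterator, yielding results as they are ready."""
    for element in source:
        keep, value = _apply(pipeline, element)
        if keep:
            yield value


# One definition, two execution styles.
pipeline = Pipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
batch_result = batch_runner(pipeline, [1, 2, 3, 4])
stream_result = list(stream_runner(pipeline, iter([1, 2, 3, 4])))
```

The pipeline object knows nothing about how it is executed, which is the property that makes a program portable across runtimes.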
2. Emergence of stream analytics
Stonebraker et al. predicted in 2005 that stream
processing would become increasingly important
and attributed this to the ‘sensorization of the real
world’: everything of material significance on the
planet gets ‘sensor-tagged’ and reports its state or
location in real time. http://cs.brown.edu/~ugur/8rulesSigRec.pdf
I think stream processing is becoming important not
only because of this sensorization of the real world but
also because of the following factors:
1. Data streams
2. Technology
3. Business
4. Consumers
2. Emergence of stream analytics
[Diagram: four factors behind the emergence of stream analytics: 1. Data Streams, 2. Technology, 3. Business, 4. Consumers]
2. Emergence of stream analytics
1 Data Streams
• Real-world data is available as a series of events that
are continuously produced by a variety of
applications and disparate systems inside and
outside the enterprise.
• Examples:
  • Sensor network data
  • Web logs
  • Database transactions
  • System logs
  • Tweets and social media data
  • Click streams
  • Mobile app data
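To make "a series of events that are continuously produced" concrete, here is a minimal Python sketch that counts click events per fixed (tumbling) time window as they arrive. The function name `tumbling_window_counts` and the sample click data are made up for illustration. The sketch assumes events arrive in timestamp order, whereas real stream processors also deal with late and out-of-order events.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter) for each non-empty window of an ordered stream.

    Each event is a (timestamp_seconds, key) pair.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts


# Hypothetical click stream: (seconds since start, page visited).
clicks = [(0, "home"), (3, "cart"), (7, "home"), (12, "home"), (14, "cart"), (21, "home")]
windows = list(tumbling_window_counts(iter(clicks), window_seconds=10))
for start, counts in windows:
    print(start, dict(counts))
```

Because the function is a generator, each window's result is available as soon as that window closes, without waiting for the whole data set.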
2. Emergence of stream analytics
2 Technology
Simplified data architecture with Apache Kafka as a
major innovation and backbone of stream
architectures.
Rapidly maturing open source stream analytics tools:
Apache Flink, Apache Apex, Spark Streaming, Kafka Streams,
Apache Samza, Apache Storm, Apache Gearpump, Heron, …
Cloud services for stream processing: Google Cloud
Dataflow, Microsoft’s Azure Stream Analytics, Amazon Kinesis
Streams, IBM InfoSphere Streams, …
Vendors innovating in this space: Confluent, Data
Artisans, Databricks, MapR, Hortonworks, StreamSets, …
More mobile devices than human beings!
2. Emergence of stream analytics
3 Business
Challenges:
Lag between data creation and actionable insights.
Infrastructure is idle most of the time.
Web and mobile application growth, new types/sources
of data.
Organizations need to shift from a reactive approach
to a more proactive approach to interactions with
customers, suppliers and employees.
2. Emergence of stream analytics
3 Business
Opportunities:
Embracing stream analytics helps organizations with
faster time to insight, competitive advantages and
operational efficiency in a wide range of verticals.
With stream analytics, new startups are challenging, or will
challenge, established companies. Example: Pay-As-You-Go
insurance or Usage-Based Auto Insurance.
Speed is said to have become the new currency of
business.
2. Emergence of stream analytics
4 Consumers
Consumers expect everything to be online and
immediately accessible through mobile
applications.
Mobile, always-on consumers increasingly demand
instant responses from enterprise applications, in the
way they are used to from the mobile applications of
social networks such as Twitter, Facebook, LinkedIn …
The younger generation, who grew up with video gaming
and are accustomed to real-time interaction, are now
themselves a growing class of consumers.
2. Emergence of stream analytics
• Financial services
• Telecommunications
• Online gaming systems
• Security & Intelligence
• Advertisement serving
• Sensor Networks
• Social Media
• Healthcare
• Oil & Gas
• Retail & eCommerce
• Transportation and logistics
2. Emergence of stream analytics
End-to-end stream analytics solution architecture
[Diagram: three stages, Sourcing & Integration, Analytics & Processing, and Serving & Consuming. Labeled components: Apps, Sensors, Devices, Other Sources, Event Collector & Broker, Stream Processor, Advanced Analytics & Machine Learning, Data Lake, Real-Time Notifications, Real-Time Decisions, Dashboards, Business Applications (e.g. Enterprise Command Center), Personal Mobile Applications, Business System Backend.]
3. In-Memory Analytics
While In-Memory Analytics are not new, they are the focus
of renewed attention thanks to:
• the availability of new memory that could easily fit
most active data sets
• the maturing or newly available in-memory open source
tools in many categories such as:
  • Memory-centric distributed File System
  • Columnar data format
  • Key Value data stores
  • IMDG: In-Memory Data Grids
  • Distributed Cache
  • Very Large Hashmaps
In the next couple of slides, I will share a few examples.
3. In-Memory Analytics
Alluxio http://alluxio.org (formerly known as Tachyon) is
an open source, memory-speed virtual distributed
storage system. Examples of its usage patterns:
• Accelerating Big Data Analytics workloads by
prefetching views and creating caches on demand.
• Sharing data between applications by writing to
Alluxio’s in-memory data store and reading it back at
far greater speed.
RocksDB https://github.com/facebook/rocksdb/ is an open
source library from Facebook that provides an
embeddable, persistent key-value store. It is suited for
fast storage of data in RAM and on flash drives. It is used
as a state backend by Samza, Flink, Kafka Streams, …
3. In-Memory Analytics
Apache Arrow (http://arrow.apache.org/) for columnar
in-memory analytics.
• Apache Arrow enables execution engines to take
advantage of the latest SIMD (Single Instruction, Multiple
Data) operations included in modern processors, for
native vectorized optimization of analytical data
processing.
• Columnar layout of data also allows for a better use of
CPU caches by placing all data relevant to a column
operation in as compact a format as possible.
• An advantage of Apache Arrow is that systems using it
as a common memory format have no overhead for
cross-system data communication and can also share
functionality.
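The difference between a row layout and a columnar layout can be sketched with Python's standard `array` module. This is not the Arrow format or the Arrow API. It only illustrates the layout idea, and the trade records below are made-up sample data.

```python
from array import array

# Row-oriented layout: one record per trade; the fields of a record sit together.
rows = [
    {"symbol": "AAA", "price": 10.0, "quantity": 5},
    {"symbol": "BBB", "price": 20.5, "quantity": 2},
    {"symbol": "CCC", "price": 7.25, "quantity": 8},
]

# Column-oriented layout: one contiguous, typed buffer per column.
columns = {
    "price": array("d", (row["price"] for row in rows)),        # 64-bit floats
    "quantity": array("q", (row["quantity"] for row in rows)),  # 64-bit integers
}

# A column operation scans only the buffer it needs and skips the other fields.
total_quantity = sum(columns["quantity"])
average_price = sum(columns["price"]) / len(columns["price"])
print(total_quantity, average_price)
```

Each column here is a compact block of same-typed values, which is the property that makes CPU caches and vectorized instructions effective in a real columnar engine.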
4. Rapid Application Development of Big
Data applications
[Diagram: four enablers of Rapid Application Development of Big Data Analytics: 1. APIs, 2. Notebooks/Shells, 3. GUIs, 4. Microservices]
4. Rapid Application Development of Big
Data applications
1 APIs
• Apache Spark and Apache Flink provide high-level,
easy-to-use APIs compared to Hadoop MapReduce.
• Apache Beam is a new open source project from
Google that attempts to unify data processing
frameworks with a core API, allowing easy portability
between execution engines.
• Use the Apache Beam unified API for batch and streaming
and then run on a local runner, Apache Spark, Apache
Flink, …
• The biggest advantage is in developer productivity and
ease of migration between processing engines.
4. Rapid Application Development of Big
Data applications
2 Shells or Notebooks
Shells:
• REPL (Read-Eval-Print Loop) interpreter
• Interactive queries
• Explore data quickly
• Sketch out your ideas in the shell to make sure you’ve
got your code right before deploying it to a cluster.
Notebooks:
• Web-based interactive computation environment
• Collaborative data analytics and visualization tool
• Combines rich text, executable code, plots and rich
media
• Exploratory data science
• Saving and replaying of written code
4. Rapid Application Development of Big
Data applications
2 Shells or Notebooks: Apache Zeppelin
4. Rapid Application Development of Big
Data applications
3 GUIs
• Apache NiFi
4. Rapid Application Development of Big
Data applications
4 Microservices:
• Microservices are an important trend in building larger
systems by:
  • decomposing their functions into relatively simple,
  single-purpose services
  • that communicate asynchronously via Apache
  Kafka, a message-passing technology that avoids
  unwanted dependencies between these services.
• This streaming architectural style provides agility,
as microservices can be built and maintained by
small, cross-functional teams.
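The decoupling described above can be sketched in a single Python process, with an `asyncio.Queue` standing in for a Kafka topic. This is only an analogy and uses no Kafka client. The service names `order_service` and `billing_service` and the sample orders are hypothetical.

```python
import asyncio


async def order_service(topic, orders):
    """Producer: publishes events and never calls the consumer directly."""
    for order in orders:
        await topic.put(order)
    await topic.put(None)  # sentinel marking the end of this demo stream


async def billing_service(topic, invoices):
    """Consumer: reacts to events at its own pace."""
    while True:
        order = await topic.get()
        if order is None:
            break
        amount = order["qty"] * order["unit_price"]
        invoices.append({"order_id": order["id"], "amount": amount})


async def main():
    topic = asyncio.Queue()  # in-process stand-in for a Kafka topic
    invoices = []
    orders = [
        {"id": 1, "qty": 2, "unit_price": 5.0},
        {"id": 2, "qty": 1, "unit_price": 12.5},
    ]
    await asyncio.gather(order_service(topic, orders), billing_service(topic, invoices))
    return invoices


invoices = asyncio.run(main())
print(invoices)
```

Neither service imports or calls the other; they share only the topic and the event shape, which is what lets separate teams evolve them independently.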
5. Open sourcing Machine Learning systems
by tech giants
[Diagram: open sourcing of machine learning systems by tech giants: 1. Facebook (Torch), 2. IBM (SystemML), 3. Google (TensorFlow), 4. Microsoft (DMTK), 5. Yahoo (CaffeOnSpark), 6. Amazon (DSSTNE)]
5. Open sourcing Machine Learning systems
by tech giants
1 Torch http://torch.ch/ is an open source
Machine Learning library which provides a
wide range of deep learning algorithms.
Facebook donated its optimized deep learning modules to
the Torch project on January 16, 2015.
2 Apache SystemML http://systemml.apache.org/
is a distributed and declarative machine learning platform.
It was created in 2010 by IBM and donated as an open
source Apache project on November 2nd, 2015.
3 TensorFlow is an open source machine learning library
created by Google. https://www.tensorflow.org It was released
under the Apache 2.0 open source license on November 9th,
2015.
5. Open sourcing Machine Learning
systems by tech giants
4 DMTK (Distributed Machine Learning Toolkit) allows
models to be trained on multiple nodes at once.
http://www.dmtk.io/ DMTK was open sourced
by Microsoft on November 12, 2015.
5 CaffeOnSpark https://github.com/yahoo/CaffeOnSpark is an
open source machine learning library created by Yahoo. It
was open sourced on February 24th, 2016.
6 DSSTNE (Deep Scalable Sparse Tensor Network
Engine), pronounced “Destiny”, is an Amazon-developed library for
building Deep Learning (DL) Machine Learning (ML)
models. It was open sourced on May 11th, 2016.
https://github.com/amznlabs/amazon-dsstne
5. Open sourcing Machine Learning
systems by tech giants
Wider adoption of Machine Learning tools is expected
among companies beyond these tech giants, in a
similar way that MapReduce and Hadoop helped make
“Big Data” a part of just about every company’s strategy!
These tech giants are not keeping their machine
learning systems for internal use only; they are
racing to open source them, attract users and
committers and advance the entire industry.
This, combined with deployment on commodity clusters,
will accelerate such adoption. As a result, we will see
new machine learning use cases, especially ones building on
deep learning, that will transform multiple industries.
6. Hybrid Cloud Computing
Cloud is becoming mainstream and the software stack is
adapting.
Big Data applications will eventually all move to the
cloud to benefit from agility, elasticity and on-demand
computing!
Meanwhile, companies need to advance their strategy
for hybrid integration between cloud and on-premise
deployments.
Deployment of Big Data applications in a hybrid
model: on-premise and on the cloud
6. Hybrid Cloud Computing
The following are a few patterns for such hybrid
integration:
1. Replicating data from SaaS apps to existing on-
premise databases to be used by other on-premise
applications such as analytics ones.
2. Integrating SaaS applications themselves with on-
premise applications.
3. Hybrid Data Warehousing with the Cloud: move data
from on-premise data warehouse to the cloud.
4. Real-Time analytics on streaming data: depending on
your use case, you might keep your stream analytics
infrastructure directly accessible on-premise for low
latency.
Key Takeaways
1. Adopt Apache Beam for easier development and
portability between Big Data Execution Engines
2. Adopt stream analytics for faster time to insight,
competitive advantages and operational efficiency
3. Accelerate your Big Data applications with In-Memory
open source tools
4. Adopt Rapid Application Development of Big Data
applications: APIs, Notebooks, GUIs, Microservices…
5. Make Machine Learning part of your strategy or
passively watch your industry be completely
transformed!
6. Advance your strategy for hybrid integration
between cloud and on-premise deployments.
Thanks!
To all of you for attending!
Any questions?
Let’s keep in touch!
• sbaltagi@gmail.com
• @SlimBaltagi
• https://www.linkedin.com/in/slimbaltagi

Contenu connexe

Tendances

Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Anna Shymchenko
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Kolja Manuel Rödel
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Big Data Spain
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataMohammed Guller
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark DataWorks Summit/Hadoop Summit
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloudJeff Hung
 

Tendances (20)

Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
LinkedIn2
LinkedIn2LinkedIn2
LinkedIn2
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark High Performance Spatial-Temporal Trajectory Analysis with Spark
High Performance Spatial-Temporal Trajectory Analysis with Spark
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
 

En vedette

Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamVerverica
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: AmadeusFlink Forward
 
Flink Case Study: OKKAM
Flink Case Study: OKKAMFlink Case Study: OKKAM
Flink Case Study: OKKAMFlink Forward
 
Flink Case Study: Capital One
Flink Case Study: Capital OneFlink Case Study: Capital One
Flink Case Study: Capital OneFlink Forward
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Carol Smith
 

En vedette (16)

Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: Amadeus
 
Flink Case Study: OKKAM
Flink Case Study: OKKAMFlink Case Study: OKKAM
Flink Case Study: OKKAM
 
Flink Case Study: Capital One
Flink Case Study: Capital OneFlink Case Study: Capital One
Flink Case Study: Capital One
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
 

Similaire à Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit

Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksDataWorks Summit/Hadoop Summit
 
Overview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksOverview of Apache Fink: the 4 G of Big Data Analytics Frameworks
Overview of Apache Fink: the 4 G of Big Data Analytics FrameworksSlim Baltagi
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksSlim Baltagi
 
ALT-F1.BE : The Accelerator (Google Cloud Platform)
ALT-F1.BE : The Accelerator (Google Cloud Platform)ALT-F1.BE : The Accelerator (Google Cloud Platform)
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit

  • 1. Hadoop Summit, San Jose, California, June 28th 2016. Analysis of Major Trends in Big Data Analytics. Slim Baltagi, Director, Enterprise Architecture, Capital One Financial Corporation
  • 2. Welcome! About me: • I’m currently director of Enterprise Architecture at Capital One: a top 10 US financial corporation based in McLean, VA. • I have over 20 years of IT experience. • I have over 7 years of Big Data experience: Engineer, Architect, Evangelist, Blogger, Thought Leader, Speaker, Organizer of Apache Flink meetups in many countries, Creator and maintainer of the Big Data Knowledge Base: http://SparkBigData.com with over 7,000 categorized web resources about Hadoop, Spark, Flink, … Thanks: This talk won the community vote of the ‘Future of Apache Hadoop’ track. Thanks to all of you who: voted for this talk, attending this talk now, reading these slides. Disclaimer: This is a vendor-independent talk that expresses my own opinions. I am not endorsing nor promoting any product or vendor mentioned in this talk.2
  • 3. Agenda 1. Portability between Big Data Execution Engines 2. Emergence of stream analytics 3. In-Memory analytics 4. Rapid Application Development of Big Data applications 5. Open sourcing Machine Learning systems by tech giants 6. Hybrid Cloud Computing
  • 4. What is a typical Big Data Analytics Stack: Hadoop, Spark, Flink, …?
  • 5. 1. Portability between Big Data Execution Engines. If you have an existing Big Data application based on MapReduce and you want to benefit from a different execution engine such as Tez, Spark or Flink, you might need to:
      • Reuse some of your existing code, such as mapper and reducer functions.
      • Leverage a ‘compatibility layer’ to run your existing Big Data application on the new engine. Example: Hadoop Compatibility Layer from Flink.
      • Switch to a different engine if the tool you use supports it. Examples: Hive/Pig on Tez, Hive/Pig on Spark, Sqoop on Spark, Cascading on Flink.
      • Rewrite your Big Data application!
  • 6. 1. Portability between Big Data Execution Engines
      • Apache Beam (unified Batch and Stream processing) is a new Apache incubator project based on years of experience developing Big Data infrastructure (MapReduce, FlumeJava, MillWheel) within Google: http://beam.incubator.apache.org/
      • Apache Beam provides a unified API for Batch and Stream processing, together with multiple runners.
      • Beam programs become portable across multiple runtime environments, both proprietary (e.g., Google Cloud Dataflow) and open source (e.g., Flink, Spark).
      • Apache Beam web resources: http://sparkbigdata.com/component/tags/tag/67
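The portability idea in this section can be illustrated with a small sketch: the pipeline is described once as data, and separate "runners" decide how to execute it. This is a toy model in plain Python, not the Apache Beam API; the names `word_count_pipeline`, `apply_step`, `batch_runner` and `streaming_runner` are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# A pipeline is an ordered list of (kind, function) steps.
# Hypothetical toy model; this is NOT the Apache Beam API.
Step = Tuple[str, Callable]


def word_count_pipeline() -> List[Step]:
    """Engine-independent description of the processing logic."""
    return [
        ("flat_map", lambda line: line.lower().split()),
        ("filter", lambda word: word.isalpha()),
    ]


def apply_step(kind: str, fn: Callable, items: list) -> list:
    """Applies one step to a list of items."""
    if kind == "flat_map":
        return [out for item in items for out in fn(item)]
    if kind == "filter":
        return [item for item in items if fn(item)]
    raise ValueError(f"unknown step kind: {kind}")


def batch_runner(steps: List[Step], lines: Iterable[str]) -> Counter:
    """Runs each step over the whole data set before moving to the next."""
    data = list(lines)
    for kind, fn in steps:
        data = apply_step(kind, fn, data)
    return Counter(data)


def streaming_runner(steps: List[Step], lines: Iterable[str]) -> Counter:
    """Pushes one input element at a time through all steps."""
    counts: Counter = Counter()
    for line in lines:
        items = [line]
        for kind, fn in steps:
            items = apply_step(kind, fn, items)
        counts.update(items)
    return counts
```

Both runners accept the same pipeline definition and produce the same counts, which is the property a unified API aims for. A real engine would add distribution, windowing and fault tolerance, none of which is modeled here.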
  • 7. Agenda 1. Portability between Big Data Execution Engines 2. Emergence of stream analytics 3. In-Memory analytics 4. Rapid Application Development of Big Data applications 5. Open sourcing Machine Learning systems by tech giants 6. Hybrid Cloud Computing
  • 8. 2. Emergence of stream analytics. Stonebraker et al. predicted in 2005 that stream processing would become increasingly important, and attributed this to the ‘sensorization of the real world: everything of material significance on the planet get ‘sensor-tagged’ and report its state or location in real time’ (http://cs.brown.edu/~ugur/8rulesSigRec.pdf). I think stream processing is becoming important not only because of this sensorization of the real world but also because of the following factors: 1. Data streams 2. Technology 3. Business 4. Consumers
  • 9. 2. Emergence of stream analytics [Diagram: four factors behind the emergence of stream analytics: 1. Data Streams, 2. Technology, 3. Business, 4. Consumers]
  • 10. 2. Emergence of stream analytics: 1. Data Streams
      • Real-world data is available as a series of events that are continuously produced by a variety of applications and disparate systems inside and outside the enterprise.
      • Examples: sensor network data, web logs, database transactions, system logs, tweets and social media data, click streams, mobile app data.
  • 11. 2. Emergence of stream analytics: 2. Technology
      • Simplified data architecture, with Apache Kafka as a major innovation and the backbone of stream architectures.
      • Rapidly maturing open source stream analytics tools: Apache Flink, Apache Apex, Spark Streaming, Kafka Streams, Apache Samza, Apache Storm, Apache Gearpump, Heron, …
      • Cloud services for stream processing: Google Cloud Dataflow, Microsoft’s Azure Stream Analytics, Amazon Kinesis Streams, IBM InfoSphere Streams, …
      • Vendors innovating in this space: Confluent, Data Artisans, Databricks, MapR, Hortonworks, StreamSets, …
      • More mobile devices than human beings!
  • 12. 2. Emergence of stream analytics: 3. Business. Challenges:
      • Lag between data creation and actionable insights.
      • Infrastructure is idle most of the time.
      • Web and mobile application growth, with new types and sources of data.
      • Organizations need to shift from a reactive approach to a more proactive approach in their interactions with customers, suppliers and employees.
  • 13. 2. Emergence of stream analytics: 3. Business. Opportunities:
      • Embracing stream analytics helps organizations achieve faster time to insight, competitive advantages and operational efficiency in a wide range of verticals.
      • With stream analytics, new startups are challenging, or will challenge, established companies. Example: Pay-As-You-Go insurance or Usage-Based Auto Insurance.
      • Speed is said to have become the new currency of business.
  • 14. 2. Emergence of stream analytics: 4. Consumers
      • Consumers expect everything to be online and immediately accessible through mobile applications.
      • Mobile, always-on consumers increasingly demand instant responses from enterprise applications, as they are used to in mobile applications from social networks such as Twitter, Facebook, LinkedIn, …
      • The younger generation, who grew up with video gaming and are accustomed to real-time interaction, are now themselves a growing class of consumers.
  • 15. 2. Emergence of stream analytics
      • Financial services
      • Telecommunications
      • Online gaming systems
      • Security & Intelligence
      • Advertisement serving
      • Sensor Networks
      • Social Media
      • Healthcare
      • Oil & Gas
      • Retail & eCommerce
      • Transportation and logistics
  • 16. 2. Emergence of stream analytics: End-to-end stream analytics solution architecture [Diagram with three stages: Sourcing & Integration, Analytics & Processing, Serving & Consuming. Components shown: Apps, Sensors, Devices, Other Sources, Event Collector & Broker, Stream Processor, Data Lake, Advanced Analytics & Machine Learning, Real-Time Notifications, Real-Time Decisions, Business System Backend, Dashboards, Business Applications (e.g. Enterprise Command Center), Personal Mobile Applications]
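To make the stream processor's role in this section more concrete, here is a minimal sketch of one common streaming operation: counting events per key in fixed-size (tumbling) time windows. It is plain Python under simplifying assumptions (events arrive as an in-memory iterable and there is no fault tolerance or late-data handling); the function name and the sample events are hypothetical and do not come from any of the tools listed above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (event time in seconds, event key); hypothetical event shape.
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: float
) -> Dict[Tuple[int, str], int]:
    """Counts events per (window index, key) for fixed-size windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Window 0 covers [0, window_seconds), window 1 the next interval, ...
        window_index = int(timestamp // window_seconds)
        counts[(window_index, key)] += 1
    return dict(counts)


# Illustrative data only.
sample_events = [
    (0.5, "click"),
    (3.0, "click"),
    (9.9, "login"),
    (10.0, "click"),
    (12.5, "click"),
]
window_counts = tumbling_window_counts(sample_events, window_seconds=10)
```

Because each event is assigned to a window from its own timestamp, the counts can be updated as events arrive instead of waiting for a complete data set, which is what shortens the lag between data creation and insight.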
  • 17. Agenda 1. Portability between Big Data Execution Engines 2. Emergence of stream analytics 3. In-Memory analytics 4. Rapid Application Development of Big Data applications 5. Open sourcing Machine Learning systems by tech giants 6. Hybrid Cloud Computing
  • 18. 3. In-Memory Analytics. While In-Memory Analytics are not new, they are the focus of renewed attention thanks to:
      • the availability of new memory that can easily fit most active data sets
      • the maturing or newly available in-memory open source tools in many categories, such as: memory-centric distributed file systems, columnar data formats, key-value data stores, IMDG (In-Memory Data Grids), distributed caches, very large hashmaps
    In the next couple of slides, I will share a few examples.
  • 19. 3. In-Memory Analytics
      • Alluxio (http://alluxio.org, formerly known as Tachyon) is an open source, memory-speed, virtual distributed storage system. Examples of its usage patterns:
          • Accelerating Big Data Analytics workloads by prefetching views and creating caches on demand.
          • Sharing data between applications by writing to Alluxio’s in-memory data store and reading it back at far greater speed.
      • RocksDB (https://github.com/facebook/rocksdb/) is an open source library from Facebook that provides an embeddable, persistent key-value store. It is suited for fast storage of data on RAM and flash drives. It is used as a state backend by Samza, Flink, Kafka Streams, …
  • 20. 3. In-Memory Analytics. Apache Arrow (http://arrow.apache.org/) for columnar in-memory analytics.
      • Apache Arrow enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.
      • A columnar layout of data also allows for better use of CPU caches by placing all data relevant to a column operation in as compact a format as possible.
      • An advantage of Apache Arrow is that systems using it as a common memory format have no overhead for cross-system data communication and can also share functionality.
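The difference between a row layout and a columnar layout can be sketched with the Python standard library alone. This is an illustration of the general idea, not the Apache Arrow format or API; the record fields and the `to_columns` helper are hypothetical.

```python
from array import array

# Row layout: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "amount": 9.5, "country": "US"},
    {"user_id": 2, "amount": 20.25, "country": "FR"},
    {"user_id": 3, "amount": 0.25, "country": "US"},
]


def to_columns(records):
    """Pivots records into one contiguous sequence per field."""
    return {
        # 'q' = signed 64-bit integers, 'd' = 64-bit floats, stored contiguously.
        "user_id": array("q", (r["user_id"] for r in records)),
        "amount": array("d", (r["amount"] for r in records)),
        "country": [r["country"] for r in records],
    }


columns = to_columns(rows)

# A column operation on the row layout has to visit every record...
total_from_rows = sum(r["amount"] for r in rows)
# ...while the columnar layout scans one compact buffer of doubles.
total_from_columns = sum(columns["amount"])
```

Python itself does not vectorize this loop, so the sketch shows only the layout: in the columnar form, the values needed for an aggregate sit next to each other in memory, which is the arrangement that CPU caches and SIMD instructions can exploit in native engines.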
  • 21. Agenda 1. Portability between Big Data Execution Engines 2. Emergence of stream analytics frameworks 3. In-Memory analytics 4. Rapid Application Development of Big Data applications 5. Open sourcing Machine Learning systems by tech giants 6. Deployment of Big Data applications in a hybrid model: on-premise and on the cloud
  • 22. 4. Rapid Application Development of Big Data applications [Diagram: four enablers of rapid application development of Big Data analytics: 1. APIs, 2. Notebooks/Shells, 3. GUIs, 4. Microservices]
  • 23. 4. Rapid Application Development of Big Data applications: 1. APIs
      • Apache Spark and Apache Flink provide high-level, easy-to-use APIs compared to Hadoop MapReduce.
      • Apache Beam is a new open source project from Google that attempts to unify data processing frameworks with a core API, allowing easy portability between execution engines.
      • Use the Apache Beam unified API for batch and streaming, and then run on a local runner, Apache Spark, Apache Flink, …
      • The biggest advantage is in developer productivity and ease of migration between processing engines.
  • 24. 4. Rapid Application Development of Big Data applications: 2. Shells or Notebooks
      • REPL (Read Evaluate Print Loop) interpreter
      • Interactive queries
      • Explore data quickly
      • Sketch out your ideas in the shell to make sure you’ve got your code right before deploying it to a cluster.
      • Web-based interactive computation environment
      • Collaborative data analytics and visualization tool
      • Combines rich text, execution code, plots and rich media
      • Exploratory data science
      • Saving and replaying of written code
  • 25. 4. Rapid Application Development of Big Data applications: 2. Shells or Notebooks: Apache Zeppelin
  • 26. 4. Rapid Application Development of Big Data applications: 3. GUIs: Apache NiFi
  • 27. 4. Rapid Application Development of Big Data applications: 4. Microservices
      • Microservices are an important trend in building larger systems by decomposing their functions into relatively simple, single-purpose services that communicate asynchronously via Apache Kafka, a message-passing technology that avoids unwanted dependencies between these services.
      • This streaming architectural style provides agility, as microservices can be built and maintained by small, cross-functional teams.
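The asynchronous, message-based decoupling described in this section can be sketched with two tiny services that share nothing but a queue. Here `asyncio.Queue` stands in for a Kafka topic purely for illustration; the service names, message fields and `None` end-of-stream marker are hypothetical, and a real deployment would use a Kafka client with separate processes.

```python
import asyncio
from typing import List, Optional


async def order_service(topic: asyncio.Queue, orders: List[dict]) -> None:
    """Publishes orders without knowing who will consume them."""
    for order in orders:
        await topic.put(order)
    await topic.put(None)  # hypothetical end-of-stream marker for this demo


async def billing_service(topic: asyncio.Queue, invoices: List[dict]) -> None:
    """Consumes orders at its own pace and produces invoices."""
    while True:
        order: Optional[dict] = await topic.get()
        if order is None:
            break
        invoices.append(
            {"order_id": order["id"], "total": order["qty"] * order["unit_price"]}
        )


async def main() -> List[dict]:
    topic: asyncio.Queue = asyncio.Queue()  # stand-in for a message topic
    invoices: List[dict] = []
    orders = [
        {"id": 1, "qty": 2, "unit_price": 5},
        {"id": 2, "qty": 1, "unit_price": 7},
    ]
    # Both services run concurrently and interact only through the queue.
    await asyncio.gather(
        order_service(topic, orders),
        billing_service(topic, invoices),
    )
    return invoices


def run_demo() -> List[dict]:
    return asyncio.run(main())
```

Neither service calls the other directly, so either one could be replaced or scaled independently as long as the message shape stays the same.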
  • 28. Agenda 1. Portability between Big Data Execution Engines 2. Emergence of stream analytics frameworks 3. In-Memory analytics 4. Rapid Application Development of Big Data applications 5. Open sourcing Machine Learning systems by tech giants 6. Hybrid Cloud Computing
  • 29. 5. Open sourcing Machine Learning systems by tech giants [Diagram: 1. Facebook Torch, 2. IBM SystemML, 3. Google TensorFlow, 4. Microsoft DMTK, 5. Yahoo CaffeOnSpark, 6. Amazon DSSTNE]
  • 30. 5. Open sourcing Machine Learning systems by tech giants
      1. Torch (http://torch.ch/) is an open source Machine Learning library that provides a wide range of deep learning algorithms. Facebook donated its optimized deep learning modules to the Torch project on January 16, 2015.
      2. Apache SystemML (http://systemml.apache.org/) is a distributed and declarative machine learning platform. It was created in 2010 by IBM and donated as an open source Apache project on November 2nd, 2015.
      3. TensorFlow (https://www.tensorflow.org) is an open source machine learning library created by Google. It was released under the Apache 2.0 open source license on November 9th, 2015.
  • 31. 5. Open sourcing Machine Learning systems by tech giants
      4. DMTK (Distributed Machine Learning Toolkit, http://www.dmtk.io/) allows models to be trained on multiple nodes at once. DMTK was open sourced by Microsoft on November 12, 2015.
      5. CaffeOnSpark (https://github.com/yahoo/CaffeOnSpark) is an open source machine learning library created by Yahoo. It was open sourced on February 24th, 2016.
      6. DSSTNE (Deep Scalable Sparse Tensor Network Engine, “Destiny”, https://github.com/amznlabs/amazon-dsstne) is an Amazon-developed library for building Deep Learning (DL) Machine Learning (ML) models. It was open sourced on May 11th, 2016.
  • 32. 5. Open sourcing Machine Learning systems by tech giants
      • Machine Learning tools are expected to see wider adoption by companies beyond these tech giants, in a similar way that MapReduce and Hadoop helped make “Big Data” a part of just about every company’s strategy!
      • These tech giants are not keeping their machine learning systems for internal use only: they are racing to open source them, attract users and committers, and advance the entire industry.
      • This, combined with deployment on commodity clusters, will accelerate such adoption. As a result, we will see new machine learning use cases, especially ones building on deep learning, that will transform multiple industries.
  • 33. Agenda 1. Portability between Big Data Execution Engines 2. Emergence of stream analytics frameworks 3. In-Memory analytics 4. Rapid Application Development of Big Data applications 5. Open sourcing Machine Learning systems by tech giants 6. Hybrid Cloud Computing
  • 34. 6. Hybrid Cloud Computing Cloud is becoming mainstream and software stack is adapting. Big Data applications will eventually all move to the cloud to benefit from agility, elasticity and on-demand computing! Meanwhile, companies need to advance their strategy for hybrid integration between cloud and on-premise deployments. Deployment of Big Data applications in a hybrid model: on-premise and on the cloud 34
  • 35. 6. Hybrid Cloud Computing The following are a few patterns for such hybrid integration: 1. Replicating data from SaaS apps to existing on- premise databases to be used by other on-premise applications such as analytics ones. 2. Integrating SaaS applications themselves with on- premise applications. 3. Hybrid Data Warehousing with the Cloud: move data from on-premise data warehouse to the cloud. 4. Real-Time analytics on streaming data: depending on your use case, you might keep your stream analytics infrastructure directly accessible on-premise for low latency.
  • 36. Key Takeaways 1. Adopt Apache Beam for easier development and portability between Big Data Execution Engines 2. Adopt stream analytics for faster time to insight, competitive advantages and operational efficiency 3. Accelerate your Big Data applications with In-Memory open source tools 4. Adopt Rapid Application Development of Big Data applications: APIs, Notebooks, GUIs, Microservices… 5. Make Machine Learning part of your strategy or passively watch your industry be completely transformed! 6. How to advance your strategy for hybrid integration between cloud and on-premise deployments?
  • 37. Thanks! To all of you for attending! Any questions? Let’s keep in touch! • sbaltagi@gmail.com • @SlimBaltagi • https://www.linkedin.com/in/slimbaltagi 37