Case Studies on Big-Data Processing and Data Streaming
By: Amir Sedighi
LinkedIn: http://linkedin.com/in/amirsedighi
Twitter: @amirsedighi
JUG - A.Sedighi - 2015 2 / 48
Background
● BS and MS degrees in Software Engineering
● Senior Software Engineer
– 20+ Years of Programming Experience
● Cross-platform Software Development
– 4+ Years of Big-Data Processing and Machine-Learning Experience
● Log Management and Forensics
● Big-Data Visualization
● Data Warehouse using Big-Data Technologies
● Recommender Systems
● Analytical Real-Time Search Engines
● Integrating Fedora Digital Library with HDFS
● Next Generation Event Processing
● Online Resume
– http://linkedin.com/in/amirsedighi
Outline
● An Introduction to Big-Data Processing
● Big-Data Processing and Data Streaming
– Data Processing
1. Terabyte-Scale Data Warehouse
2. Analytical Real-Time Search Solution and BI
3. Scalable Recommender System
4. Integrating Fedora Digital Library with HDFS
– Stream and Event Processing
1. Super-Fast, Scalable Log Management, Forensics and BI
2. Super-Fast, Scalable Fraud Detection
What Is Big-Data?
● "Every two days, humanity creates as much information as it did up to 2003." - Eric Schmidt
Big-Data Characteristics
● Volume
● Variety
● Velocity
You're a Part of It Every Day
● We have the ability to store anything
● Companies and people are generating data like never before in history
– Social Networks
– Online Web Portals
– Log Writers - Our Digital Footprint!
You're a Part of It Every Day
● Big-Data is whatever people do in the digital world: the footprint (logs) of what people, companies, devices and services do, as well as traditional tabular data stores.
Even as a Manager, You're Still a Part of It
● "Over half of the business leaders today realize they don't have access to the insights they need to do their job." - IBM
Vertical or Horizontal?
Scale Up or Scale Out
Linear Scalability
Big-Data Processing Solutions
Q: How Can We Scale Linearly on Commodity Machines?
A: MapReduce
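The MapReduce idea can be shown without a cluster. The following is a minimal, single-process sketch in plain Java (9 or later) of the three phases (map, shuffle, reduce) applied to word counting. The class `MiniMapReduce` and its methods are hypothetical and are not Hadoop's API; on a real cluster, Hadoop runs many map and reduce tasks in parallel on different machines, which is what lets throughput grow with the number of nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of MapReduce word counting (hypothetical class, not Hadoop's API).
class MiniMapReduce {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key (Hadoop does this across the network).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum all values for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map -> shuffle -> reduce over the input lines; result is sorted by word.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line)); // each call could run on a different machine
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data is big", "data streams")));
    }
}
```

Running `main` prints `{big=2, data=2, is=1, streams=1}`. Because each `map` call depends only on its own line and each `reduce` call only on one key's values, both phases can be spread across machines; only the shuffle has to move data between them.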
Q: How to store big data on commodity machines?
A: Distributed File System
Replication → Fault Tolerance
Replication → Data Locality → Utilization
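To see why replication yields both fault tolerance and data locality, here is a simplified Java sketch of HDFS-style storage: a file is split into fixed-size blocks and each block is copied to several distinct nodes. Block size and replication factor are parameters (the HDFS defaults in Hadoop 2.x are 128 MB and 3); the round-robin placement and all names are illustrative assumptions, since real HDFS placement is rack-aware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of HDFS-style block splitting and replica placement (not the HDFS API).
class BlockPlacement {

    // Number of fixed-size blocks needed for a file (the last block may be partial).
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Assigns each block to `replication` distinct nodes, round-robin over the node list.
    static List<List<String>> place(long blocks, List<String> nodes, int replication) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("replication exceeds number of nodes");
        }
        List<List<String>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(nodes.get((int) ((b + r) % nodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    // Data survives as long as every block still has at least one replica on a live node.
    static boolean allBlocksAvailable(List<List<String>> placement, Set<String> failedNodes) {
        for (List<String> replicas : placement) {
            boolean alive = false;
            for (String node : replicas) {
                if (!failedNodes.contains(node)) {
                    alive = true;
                    break;
                }
            }
            if (!alive) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;                      // 128 MB
        long blocks = blockCount(1024L * 1024 * 1024, blockSize); // 1 GB file
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        List<List<String>> placement = place(blocks, nodes, 3);
        System.out.println(blocks + " blocks, first block on " + placement.get(0));
        System.out.println("survives one node failure: " + allBlocksAvailable(placement, Set.of("node2")));
    }
}
```

Running `main` prints `8 blocks, first block on [node1, node2, node3]` and then `survives one node failure: true`. With three replicas spread over four nodes, any two nodes can fail without losing a block, and a scheduler has three candidate nodes on which it can run a task next to each block's data.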
Big-Data Processing: Most Popular Technologies
● Apache Hadoop Ecosystem
● NoSQL Databases
– HBase
– Cassandra
– MongoDB
– Neo4j
● Apache Lucene-Based Search
– Elasticsearch
– Solr
● Java
1. Terabyte-Scale Data Warehouse
DW Solution
● SQL
● ETL
– RDBMS
– NoSQL
– File System
● REST API
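The ETL step is what lets heterogeneous sources end up behind a single SQL interface. Below is a minimal Java (9+) sketch of the transform step under assumed source schemas: two hypothetical sources name the same fields differently, and each gets a column mapping into one unified record type. The source names, columns, and classes are illustrative assumptions, not the schemas of the system described here; extraction (done by Sqoop in this stack) and loading are reduced to in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical ETL transform: rows from sources with different schemas -> one unified schema.
class EtlSketch {

    // Unified target record in the warehouse (hypothetical schema).
    static final class Employee {
        final String id;
        final String name;
        final String source;

        Employee(String id, String name, String source) {
            this.id = id;
            this.name = name;
            this.source = source;
        }

        @Override
        public String toString() {
            return source + "/" + id + "/" + name;
        }
    }

    // Per-source column mapping: target field -> source column (assumed source schemas).
    static final Map<String, Map<String, String>> MAPPINGS = Map.of(
        "oracle_hrms", Map.of("id", "EMP_NO", "name", "FULL_NAME"),
        "mssql_sales", Map.of("id", "EmployeeID", "name", "EmployeeName"));

    // Transform: rename columns, normalize values, and record where the row came from.
    static Employee transform(String source, Map<String, String> row) {
        Map<String, String> mapping = MAPPINGS.get(source);
        if (mapping == null) {
            throw new IllegalArgumentException("unknown source: " + source);
        }
        String id = row.get(mapping.get("id")).trim();
        String name = row.get(mapping.get("name")).trim().toUpperCase();
        return new Employee(id, name, source);
    }

    // Load: append transformed rows to the target (a list stands in for a Hive table on HDFS).
    static List<Employee> load(String source, List<Map<String, String>> rows, List<Employee> target) {
        for (Map<String, String> row : rows) {
            target.add(transform(source, row));
        }
        return target;
    }

    public static void main(String[] args) {
        List<Employee> warehouse = new ArrayList<>();
        load("oracle_hrms", List.of(Map.of("EMP_NO", " 42 ", "FULL_NAME", "Alice")), warehouse);
        load("mssql_sales", List.of(Map.of("EmployeeID", "7", "EmployeeName", "bob")), warehouse);
        warehouse.forEach(System.out::println);
    }
}
```

`main` prints `oracle_hrms/42/ALICE` and `mssql_sales/7/BOB`: two differently shaped source rows end up in one schema that a SQL engine such as Hive could then query.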
REST Admin Panel
Features
● Extensible Data Warehousing Capacity
● Building Very Large Integrated Databases from Different Technologies and Schemas
– DB2, Oracle, MS-SQL …
– Different Schemas Such as HRMS, Banking, Sales …
– Making Small, Dense Integrated RDBMSs
● SQL Interface
● Linear Scalability
Main Technologies and Frameworks
● Apache Hadoop
– Sqoop
– YARN/HDFS
– Hive or Drill or Impala
● Microservices Architecture
– Java 1.7
– Spring Boot
2. Analytical Real-Time Scalable Search Solution and BI
Terabyte-Scale Real-Time Search
● Indexing Incoming Data on the Fly
● Highly Scalable and Reliable
● Simple or Complex Queries
● REST API
● Schema Agnostic
● Customizable GUI and BI
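Indexing on the fly and fast search both rest on the inverted index that Lucene (and therefore Elasticsearch) builds. The following is a deliberately simplified, in-memory Java (9+) sketch of the idea: each document is tokenized as it arrives, each term points to the set of IDs of documents containing it, and an AND query intersects those sets. It is not Lucene's API; the class and method names are hypothetical, and real engines add analyzers, relevance scoring, and on-disk segments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index (illustrates the concept behind Lucene; not its API).
class MiniIndex {

    // term -> sorted set of IDs of documents containing that term
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // Index a document as soon as it arrives; every field is tokenized, whatever the schema.
    void index(String docId, Map<String, String> fields) {
        for (String value : fields.values()) {
            for (String term : value.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
    }

    // AND query: IDs of documents that contain every query term.
    Set<String> search(String query) {
        TreeSet<String> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<String> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        MiniIndex idx = new MiniIndex();
        idx.index("1", Map.of("user", "alice", "msg", "login failed"));
        idx.index("2", Map.of("host", "web-01", "msg", "login ok"));
        System.out.println(idx.search("login"));
        System.out.println(idx.search("login failed"));
    }
}
```

`main` prints `[1, 2]` for `login` and `[1]` for `login failed`. The two documents have different fields and the index does not care, which is the sense in which such a search can be schema agnostic.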
Business Intelligence
Rich GUI
Main Technologies and Frameworks
● Elasticsearch
– Apache Lucene
– REST
● Kibana
3. Scalable Recommender System
Recommender System
● Value-added Service (Loyalty Services)
● Machine-Learning
– Clustering Across Thousands of Nodes
● Apache Mahout
● Super Fast
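The slides do not spell out the recommendation algorithm. As one common approach, item-based collaborative filtering (which Apache Mahout also provides) can be sketched with item co-occurrence counts: items that often appear together in users' histories are recommended to users who have only some of them. This single-machine Java (9+) sketch and its names are illustrative assumptions, not the author's implementation; at scale, the same counting would be distributed across a cluster.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical item co-occurrence recommender (single machine; illustrates the idea only).
class CooccurrenceRecommender {

    // co.get(a).get(b) = number of users who interacted with both item a and item b
    private final Map<String, Map<String, Integer>> co = new HashMap<>();

    // Build co-occurrence counts from each user's set of items.
    void train(Map<String, Set<String>> userItems) {
        for (Set<String> items : userItems.values()) {
            for (String a : items) {
                for (String b : items) {
                    if (!a.equals(b)) {
                        co.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
    }

    // Score unseen items by summed co-occurrence with the user's items; return the top n.
    List<String> recommend(Set<String> userHistory, int n) {
        Map<String, Integer> scores = new HashMap<>();
        for (String item : userHistory) {
            for (Map.Entry<String, Integer> e : co.getOrDefault(item, Map.of()).entrySet()) {
                if (!userHistory.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically for deterministic output.
        Comparator<String> byScoreDesc = (x, y) -> Integer.compare(scores.get(y), scores.get(x));
        ranked.sort(byScoreDesc.thenComparing(Comparator.naturalOrder()));
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        CooccurrenceRecommender rec = new CooccurrenceRecommender();
        rec.train(Map.of(
            "u1", Set.of("phone", "case", "charger"),
            "u2", Set.of("phone", "case"),
            "u3", Set.of("phone", "headphones")));
        System.out.println(rec.recommend(Set.of("phone"), 2));
    }
}
```

`main` prints `[case, charger]`: `case` co-occurs with `phone` for two users, while `charger` and `headphones` tie at one and are ordered alphabetically. Precomputed lists like this are the kind of result a fast key-value store such as Redis could serve.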
How Does It Work?
Technologies and Frameworks
● Microservices Architecture
● Java 1.6
● Apache Mahout
● Redis
4. Fedora Digital Library and HDFS Integration
Migrating from Expensive Servers to Commodity Machines
● Using HDFS as the Fedora Digital Library Storage Backend
– Research and Development
– Hadoop 1.2, Later Hadoop 2.2 (YARN)
– Integrating with Solr over HDFS
● Java 1.7
● Fedora
– Islandora
– GSearch
Data Streaming
Big-Data Streaming: Most Popular Technologies
● Piping and Messaging
– Kafka, Flume, Fluentd and ZeroMQ
● Stream Processing
– Storm, Samza and Spark
● Machine Learning
– MLlib and Mahout
● Persisting
– NoSQL DBs
– HDFS
1. Log Management, Forensics and BI
Log Management, Forensics and BI
● Every Digital System Writes to Log Files
– Log Files Are Streams of Data
– Log Files Are Messy
– Log Files Arrive Very Fast and Unpredictably
– Log Files Are About Everything within Your Business
● Log Files Are Full of Insight, for Whoever Can
– Retain Them for a Reasonable Period of Time
– Search Them Rapidly
– Visualize Them Easily (BI)
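Because log files are messy, the first job of a collector such as Logstash is usually to turn each raw line into structured fields before indexing. A minimal Java (9+) sketch, assuming the Apache/NCSA common log format (the slides do not say which formats were collected); the regex and the class name are illustrative and are not Logstash's grok patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for Apache/NCSA common log format lines (illustrative only).
class AccessLogParser {

    // host ident user [timestamp] "method path protocol" status bytes
    private static final Pattern COMMON_LOG = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-)$");

    // Returns structured fields, or empty if the line does not match (messy input is expected).
    static Optional<Map<String, String>> parse(String line) {
        Matcher m = COMMON_LOG.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("client", m.group(1));
        fields.put("user", m.group(3));
        fields.put("timestamp", m.group(4));
        fields.put("method", m.group(5));
        fields.put("path", m.group(6));
        fields.put("status", m.group(8));
        fields.put("bytes", m.group(9).equals("-") ? "0" : m.group(9)); // "-" means no body
        return Optional.of(fields);
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        parse(line).ifPresent(System.out::println);
        System.out.println(parse("garbage line").isPresent());
    }
}
```

`main` prints the parsed fields of the sample line and then `false` for the garbage line. Returning an empty result instead of throwing lets a pipeline count or reroute unparseable lines without stopping the stream.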
Network Topology (diagram: load balancer (LB), master nodes, data nodes)
Main Technologies and Frameworks
● Logstash
– Flume
● Elasticsearch
● Kibana
Snapshot
2. Fraud Detection
Inputs & Outputs
● Inputs: One or more sources generate data continuously, in real time
– Sensor Networks
– Transaction Logs
– Text Streams such as News
– Network Traffic Analysis
● Outputs: Up-to-date answers generated continuously or periodically
Data Processing
● Transient Queries
– Issued once, then forgotten
● Persistent Data
– Stored until deleted by users or applications
Stream Processing
● Transient Data
– Deleted as the window slides forward
● Persistent Queries
– Generate up-to-date answers as time goes on
– Over time-based or count-based windows
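The contrast between transient data and persistent queries is easiest to see with windows. Below is a Java sketch of a persistent query (a running sum) kept up to date over a count-based window (the last N events) and a time-based window (events from the last T milliseconds); old events are evicted as the window slides forward. The class names are hypothetical, and the time window assumes events arrive in timestamp order; stream engines such as Spark Streaming provide their own windowing operators.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window aggregates: transient data, persistent (continuously answered) query.
class SlidingWindows {

    // Count-based window: keeps only the last `size` values.
    static final class CountWindow {
        private final int size;
        private final Deque<Long> values = new ArrayDeque<>();
        private long sum;

        CountWindow(int size) { this.size = size; }

        // Adds a value, evicts the oldest when the window is full, returns the up-to-date sum.
        long add(long value) {
            values.addLast(value);
            sum += value;
            if (values.size() > size) {
                sum -= values.removeFirst();
            }
            return sum;
        }
    }

    // Time-based window: keeps values with timestamp in (newest - spanMillis, newest].
    static final class TimeWindow {
        private final long spanMillis;
        private final Deque<long[]> events = new ArrayDeque<>(); // each entry: {timestamp, value}
        private long sum;

        TimeWindow(long spanMillis) { this.spanMillis = spanMillis; }

        // Assumes events arrive in timestamp order and spanMillis > 0.
        long add(long timestampMillis, long value) {
            events.addLast(new long[] {timestampMillis, value});
            sum += value;
            while (events.peekFirst()[0] <= timestampMillis - spanMillis) {
                sum -= events.removeFirst()[1]; // the window slides forward
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        CountWindow last3 = new CountWindow(3);
        for (long v : new long[] {5, 1, 4, 2}) {
            System.out.println("count window sum: " + last3.add(v));
        }
        TimeWindow last10s = new TimeWindow(10_000);
        System.out.println("time window sum: " + last10s.add(0, 7));
        System.out.println("time window sum: " + last10s.add(4_000, 3));
        System.out.println("time window sum: " + last10s.add(12_000, 1));
    }
}
```

`main` prints count-window sums 5, 6, 10, 7 (the 5 is evicted when the fourth value arrives) and time-window sums 7, 10, 4 (the event at 0 ms leaves the 10-second window when the event at 12000 ms arrives). Keeping a running sum instead of rescanning the window makes each update constant time apart from evictions.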
Features
● Scalability
● Real-Time (at most 1 second of delay)
● Super-Fast Decision Making
● Implementing Complex Fraud Scenarios as Easily as Defining Queries
● Uniform API for Processing Old or Early Events
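Defining a fraud scenario "as a query" amounts to writing a predicate evaluated over a window of recent events. As an illustration, the Java sketch below implements one hypothetical velocity rule: flag a card when it makes more than `maxTx` transactions within `windowMillis`. The rule, thresholds, and names are assumptions, not the scenarios used in the described system; in a deployment like the one on these slides, equivalent logic would presumably run inside Spark over events delivered through Flume and Kafka.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical velocity rule: too many transactions per card within a time window.
class VelocityRule {

    private final int maxTx;
    private final long windowMillis;
    // card ID -> timestamps of that card's recent transactions (oldest first)
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    VelocityRule(int maxTx, long windowMillis) {
        this.maxTx = maxTx;
        this.windowMillis = windowMillis;
    }

    // Processes one transaction event; returns true if the card should be flagged.
    // Assumes events for the same card arrive in timestamp order and windowMillis > 0.
    boolean onTransaction(String cardId, long timestampMillis) {
        Deque<Long> times = recent.computeIfAbsent(cardId, k -> new ArrayDeque<>());
        times.addLast(timestampMillis);
        // Slide the window: drop transactions that are windowMillis or more in the past.
        while (times.peekFirst() <= timestampMillis - windowMillis) {
            times.removeFirst();
        }
        return times.size() > maxTx;
    }

    public static void main(String[] args) {
        VelocityRule rule = new VelocityRule(3, 60_000); // more than 3 transactions per minute
        long[] times = {0, 10_000, 20_000, 30_000, 200_000};
        for (long t : times) {
            System.out.println("t=" + t + " flagged=" + rule.onTransaction("card-1", t));
        }
    }
}
```

`main` flags only the fourth transaction (four within one minute); by 200000 ms the earlier ones have slid out of the window, so the fifth is not flagged. Adding another scenario means adding another predicate of this shape, which is what keeps fraud rules close to query definitions.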
Main Technologies and Frameworks
● Java 1.7, Scala 2.11
● Apache Flume
● Apache Kafka
● Apache Spark
Where To Start?
● You Need a Large Amount of Data
● You Need to Change Your Mindset
– About Rack Space and Number of Servers, I/O and Processing Limitations
● You Need to Understand the Fundamentals
– Linux (Bash Scripting)
– Java Is a Must, Python Works and Scala Is an Advantage
– SQL and ETL
– MapReduce, Resource Management and Serialization Frameworks
– Apache Hadoop Ecosystem and Successors
Thank You! Questions?
http://slideshare.net/amirsedighi

VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456KiaraTiradoMicha
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ
 

Dernier (20)

Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 

Case Studies on Big-Data Processing and Streaming - Iranian Java User Group

  • 1. Case Studies on Big-Data Processing and Data Streaming
    By: Amir Sedighi
    LinkedIn: http://linkedin.com/in/amirsedighi
    Twitter: @amirsedighi
  • 2. Background (JUG - A.Sedighi - 2015)
    ● BS and MS Degrees in Software Engineering
    ● Senior Software Engineer
      – 20+ Years of Programming Experience
    ● Cross-Platform Software Development
      – 4+ Years of Big-Data Processing and Machine-Learning Experience
    ● Log Management and Forensics
    ● Big-Data Visualization
    ● Data Warehouse Using Big-Data Technologies
    ● Recommender Systems
    ● Analytical Real-Time Search Engines
    ● Integrating Fedora Digital Library with HDFS
    ● Next-Generation Event Processing
    ● Online Resume
      – http://linkedin.com/in/amirsedighi
  • 3. Outline
    ● An Introduction to Big-Data Processing
    ● Big-Data Processing and Data Streaming
      – Data Processing
        1. Multi-TB Scale Data Warehouse
        2. Analytical Real-Time Search Solution and BI
        3. Scalable Recommender System
        4. Integrating Fedora Digital Library with HDFS
      – Stream and Event Processing
        1. Super-Fast, Scalable Log Management, Forensics and BI
        2. Super-Fast, Scalable Fraud Detection
  • 4. What Is Big-Data?
  • 5. “Every two days we create as much information as we did up to 2003.” - Eric Schmidt
  • 6. Big-Data Characteristics
    ● Volume
    ● Variety
    ● Velocity
  • 7. You're a Part of It Every Day
    ● We Have the Ability to Store Anything
    ● Companies and People Are Generating Data Like Never Before in History
      – Social Networks
      – Online Web Portals
      – Log Writers - Our Digital Footprint!
  • 8. You're a Part of It Every Day
    ● Big-Data is whatever people do in the digital world: the footprint (logs) of what people, companies, devices and services do, as well as traditional tabular data stores.
  • 9. As a Manager, You're Still a Part of It
    ● “Over half of business leaders today realize they don't have access to the insights they need to do their job.” - IBM
  • 10. Vertical or Horizontal?
  • 11. Scale Up or Scale Out?
  • 12. Linear Scalability
  • 13. Big-Data Processing Solutions
  • 14. Q: How Can We Scale Linearly on Commodity Machines? A: MapReduce
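To make slide 14 concrete, here is a minimal single-process sketch of the MapReduce model (map, shuffle, reduce) as the classic word count. It does not use Hadoop's API: the class and method names are hypothetical, and `parallelStream()` only stands in for a cluster mapping each input split on a different machine.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MiniMapReduce {

    // Map phase: emit (word, 1) for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group every emitted value by its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine all values of one key (here, a sum).
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> splits) {
        // Stand-in for distributed execution: splits are mapped independently, in parallel.
        List<Map.Entry<String, Integer>> emitted = splits.parallelStream()
                .flatMap(s -> map(s).stream())
                .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        shuffle(emitted).forEach((word, values) -> result.put(word, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("big data on commodity machines", "big data scales out");
        // prints {big=2, commodity=1, data=2, machines=1, on=1, out=1, scales=1}
        System.out.println(wordCount(splits));
    }
}
```

Each map call depends only on its own split and each reduce call only on one key's values, so in principle more machines means more splits and keys processed at once, which is the property the slide relies on.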
  • 15. Q: How Can We Store Big Data on Commodity Machines? A: A Distributed File System
  • 16. Replication → Fault Tolerance
    Replication → Data Locality → Utilization
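The two chains on slide 16 can be sketched as follows. This assumes a simplified round-robin placement of each block on `replication` distinct nodes (real HDFS uses a rack-aware placement policy), and the class, method and node names are hypothetical. `survives` shows fault tolerance; `scheduleTask` shows data locality, preferring a node that already holds the block.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ReplicaPlacement {

    // Simplified placement: block b goes to nodes b, b+1, ..., b+replication-1 (mod node count).
    // This is an assumption for illustration; HDFS itself uses a rack-aware policy.
    static Map<Integer, List<String>> place(int blocks, List<String> nodes, int replication) {
        if (replication < 1 || replication > nodes.size()) {
            throw new IllegalArgumentException("replication must be between 1 and the node count");
        }
        Map<Integer, List<String>> placement = new TreeMap<>();
        for (int b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(nodes.get((b + r) % nodes.size()));
            }
            placement.put(b, replicas);
        }
        return placement;
    }

    // Fault tolerance: true if every block still has at least one replica on a live node.
    static boolean survives(Map<Integer, List<String>> placement, Set<String> failed) {
        return placement.values().stream()
                .allMatch(replicas -> replicas.stream().anyMatch(node -> !failed.contains(node)));
    }

    // Data locality: run the task for a block on an idle node that already stores it, if any.
    static String scheduleTask(int block, Map<Integer, List<String>> placement, Set<String> idle) {
        if (idle.isEmpty()) {
            throw new IllegalStateException("no idle node available");
        }
        for (String node : placement.get(block)) {
            if (idle.contains(node)) {
                return node;                 // local read: no block copied over the network
            }
        }
        return idle.iterator().next();       // fallback: remote read from another node
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> placement = place(4, List.of("n1", "n2", "n3", "n4"), 3);
        System.out.println(placement);
        System.out.println(survives(placement, Set.of("n1", "n2")));        // true
        System.out.println(survives(placement, Set.of("n1", "n2", "n3")));  // false: block 0 lost
        System.out.println(scheduleTask(0, placement, new LinkedHashSet<>(List.of("n4", "n3"))));  // n3
    }
}
```

With three replicas, any two node failures leave every block readable, and the extra copies also give the scheduler more nodes on which a task can read its data locally.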
  • 17. Big-Data Processing, Most Popular Technologies
    ● Apache Hadoop Ecosystem
    ● NoSQL Databases
      – HBase
      – Cassandra
      – MongoDB
      – Neo4j
    ● Elasticsearch
      – Lucene
      – Solr
    ● Java
  • 18. Data Processing Case Study 1: Multi-TB Scale Data Warehouse
  • 19. DW Solution
    ● SQL
    ● ETL
      – RDBMS
      – NoSQL
      – File System
    ● REST API
  • 20. REST Admin Panel
  • 21. Features
    ● Extensible Data Warehouse Capacity
    ● Building Very Large Integrated Databases from Different Technologies/Schemas
      – DB2, Oracle, MS-SQL …
      – Different Schemas such as HRMS, Banking, Sales...
      – Building Small, Dense Integrated RDBMSs
    ● SQL Language Interface
    ● Linear Scalability
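A toy version of the integration idea on slide 21: records from two differently shaped sources are transformed into one unified schema and then queried with the equivalent of a SQL `GROUP BY`. The source layouts (`REGION_CODE`, `AMOUNT_CENTS`, a `region;amount` text line), the region codes and the `Sale` schema are all hypothetical; in the stack on slide 22 the extract and query steps would presumably be handled by Sqoop and by Hive, Drill or Impala rather than by hand-written Java.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MiniEtl {

    // Hypothetical unified warehouse schema: one sale per row.
    static final class Sale {
        final String region;
        final long amountCents;

        Sale(String region, long amountCents) {
            this.region = region;
            this.amountCents = amountCents;
        }
    }

    // Source 1 (hypothetical RDBMS table): a row already read into a column -> value map.
    static Sale fromRdbmsRow(Map<String, Object> row) {
        String region = ((String) row.get("REGION_CODE")).trim().toUpperCase();
        long cents = ((Number) row.get("AMOUNT_CENTS")).longValue();
        return new Sale(region, cents);
    }

    // Source 2 (hypothetical file export): a line "region;amount", amount in whole currency units.
    static Sale fromCsvLine(String line) {
        String[] parts = line.split(";");
        long cents = new BigDecimal(parts[1].trim()).movePointRight(2).longValueExact();
        return new Sale(parts[0].trim().toUpperCase(), cents);
    }

    // Query over the loaded rows, like: SELECT region, SUM(amount_cents) FROM sales GROUP BY region
    static Map<String, Long> totalByRegion(List<Sale> warehouse) {
        Map<String, Long> totals = new TreeMap<>();
        for (Sale s : warehouse) {
            totals.merge(s.region, s.amountCents, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> warehouse = new ArrayList<>();   // the "load" target
        warehouse.add(fromRdbmsRow(Map.of("REGION_CODE", "thr ", "AMOUNT_CENTS", 12_500)));
        warehouse.add(fromCsvLine("isf; 40.25"));
        warehouse.add(fromCsvLine("THR;10.00"));
        System.out.println(totalByRegion(warehouse));   // {ISF=4025, THR=13500}
    }
}
```

The transform step is where differing schemas are normalized (trimming, case, units), so the query side only ever sees one schema.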
  • 22. Main Technologies and Frameworks
    ● Apache Hadoop
      – Sqoop
      – YARN/HDFS
      – Hive, Drill or Impala
    ● Microservices Architecture
      – Java 1.7
      – Spring Boot
  • 23. Data Processing Case Study 2: Analytical Real-Time Scalable Search Solution and BI
  • 24. Multi-TB Scale Real-Time Search
    ● Indexing Incoming Data On the Fly
    ● Highly Scalable and Reliable
    ● Simple or Complex Queries
    ● REST API
    ● Schema Agnostic
    ● Customizable GUI and BI
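On-the-fly, schema-agnostic indexing (slide 24) rests on an inverted index, the data structure Lucene, and therefore Elasticsearch, is built on. The sketch below is a deliberately tiny in-memory version in plain Java, not the Lucene or Elasticsearch API; the class name, the `field:term` key convention and the sample documents are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OnTheFlyIndex {

    // Inverted index: term -> sorted ids of the documents that contain it (a posting list).
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Map<Integer, Map<String, String>> documents = new HashMap<>();
    private int nextId = 0;

    // Index a document the moment it arrives; any field names are accepted (no fixed schema).
    public int index(Map<String, String> doc) {
        int id = nextId++;
        documents.put(id, doc);
        for (Map.Entry<String, String> field : doc.entrySet()) {
            String fieldName = field.getKey().toLowerCase();
            for (String token : field.getValue().toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // Store both "field:term" and the bare "term" so a query may target a field or not.
                postings.computeIfAbsent(fieldName + ":" + token, k -> new TreeSet<>()).add(id);
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // AND query: intersect the posting lists of all query terms.
    public Set<Integer> searchAll(List<String> terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(term.toLowerCase(), Set.of());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        if (result == null) {
            return new TreeSet<>();
        }
        return result;
    }

    public Map<String, String> get(int id) {
        return documents.get(id);
    }

    public static void main(String[] args) {
        OnTheFlyIndex idx = new OnTheFlyIndex();
        idx.index(Map.of("level", "ERROR", "message", "disk full on node-3"));
        idx.index(Map.of("user", "alice", "message", "login failed"));
        idx.index(Map.of("level", "error", "message", "login timeout"));
        for (int id : idx.searchAll(List.of("level:error", "login"))) {
            System.out.println(id + " -> " + idx.get(id));   // only document 2 matches
        }
    }
}
```

A document becomes searchable as soon as `index` returns, and queries touch only the posting lists of their terms rather than scanning every document.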
  • 25. Business Intelligence
  • 26. Rich GUI
  • 27. Main Technologies and Frameworks
    ● Elasticsearch
      – Apache Lucene
      – REST
    ● Kibana
  • 28. Data Processing Case Study 3: Scalable Recommender System
  • 29. Recommender System
    ● Value-Added Service (Loyalty Services)
    ● Machine Learning
      – Clustering Across Thousands of Nodes
    ● Apache Mahout
    ● Super Fast
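Slide 29 mentions clustering with Apache Mahout. The sketch below is plain single-machine k-means (Lloyd's algorithm) in Java, not Mahout's API and not the talk's recommender; the assignment loop is the step a distributed implementation would spread across nodes. The customer features and the fixed starting centroids are hypothetical.

```java
import java.util.Arrays;

public class KMeans {

    // Squared Euclidean distance between two points.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Lloyd's algorithm; note that `centroids` is updated in place.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid (parallelizable per data split).
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (dist2(points[p], centroids[c]) < dist2(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (iter == 0 || assignment[p] != best) {
                    changed = true;
                }
                assignment[p] = best;
            }
            if (!changed) {
                break;   // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its assigned points.
            int dims = points[0].length;
            double[][] sums = new double[centroids.length][dims];
            int[] counts = new int[centroids.length];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue;   // leave an empty cluster's centroid where it is
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical customers: (visits per month, average basket size)
        double[][] customers = { {1, 2}, {2, 1}, {1, 1}, {9, 8}, {8, 9}, {9, 9} };
        double[][] centroids = { {0, 0}, {10, 10} };   // fixed initial guesses for reproducibility
        System.out.println(Arrays.toString(cluster(customers, centroids, 20)));   // [0, 0, 0, 1, 1, 1]
    }
}
```

In a loyalty setting, customers in the same cluster could then be offered items popular within that cluster; that last step is an assumption, not something the slides specify.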
  • 30. How Does It Work?
  • 31. Technologies and Frameworks
    ● Microservices Architecture
    ● Java 1.6
    ● Apache Mahout
    ● Redis
  • 32. Data Processing Case Study 4: Fedora Digital Library and HDFS Integration
  • 33. Migrating from Expensive Servers to Commodity Machines
    ● Using HDFS as Fedora Digital Library Storage
      – Research and Development
      – Hadoop 1.2, Later Hadoop 2.2 (YARN)
      – Integrating with Solr over HDFS
    ● Java 1.7
    ● Fedora
      – Islandora
      – GSearch
  • 34. Data Streaming
  • 35. Big-Data Streaming, Most Popular Technologies
    ● Piping and Messaging
      – Kafka, Flume, Fluentd and ZeroMQ
    ● Stream Processing
      – Storm, Samza and Spark
    ● Machine Learning
      – MLlib and Mahout
    ● Persisting
      – NoSQL DBs
      – HDFS
  • 36. Streaming Case Study 1: Log Management, Forensics and BI
  • 37. Log Management, Forensics and BI
    ● Every Digital System Writes to Log Files
      – Log Files Are Streams of Data
      – Log Files Are Messy
      – Log Files Arrive Very Fast and Unpredictably
      – Log Files Cover Everything in Your Business
    ● Log Files Are Full of Insight, for Those Who Can
      – Retain Them for a Reasonable Period of Time
      – Search Them Rapidly
      – Visualize Them Easily (BI)
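Because log files are messy (slide 37), the first step in a pipeline such as Logstash is usually to turn raw lines into structured events. This sketch uses `java.util.regex` rather than Logstash's own filters, and the log line format is a hypothetical example; lines that do not match are tagged `UNPARSED` rather than dropped, so nothing is lost for later forensics.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

    // Hypothetical line format: "<date> <time> <LEVEL> [<component>] <message>"
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}) (\\w+) \\[([^\\]]+)\\] (.*)$");

    // Turn one raw line into named fields; lines that do not match are kept, not discarded.
    static Map<String, String> parse(String raw) {
        Map<String, String> event = new LinkedHashMap<>();
        Matcher m = LINE.matcher(raw.trim());
        if (m.matches()) {
            event.put("timestamp", m.group(1) + "T" + m.group(2));
            event.put("level", m.group(3).toUpperCase());   // normalize "error" vs "ERROR"
            event.put("component", m.group(4));
            event.put("message", m.group(5));
        } else {
            event.put("level", "UNPARSED");
            event.put("message", raw);
        }
        return event;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2015-03-01 12:00:01 error [auth] login failed for user alice",
                "garbage line from a misconfigured device",
                "2015-03-01 12:00:02 INFO [db] connection pool resized");
        lines.stream().map(LogLineParser::parse).forEach(System.out::println);
    }
}
```

Once every line is a field map like this, it can be indexed for search and charted, which is the role Elasticsearch and Kibana play on slide 39.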
  • 38. Network Topology [diagram: LB, Masters, Data]
  • 39. Main Technologies and Frameworks
    ● Logstash
      – Flume
    ● Elasticsearch
    ● Kibana
  • 40. Snapshot
  • 41. Streaming Case Study 2: Fraud Detection
  • 42. Inputs & Outputs
    ● Inputs: One or More Sources Generate Data Continuously, in Real Time
      – Sensor Networks
      – Transaction Logs
      – Text Streams such as News
      – Network Traffic Analysis
    ● Outputs: Up-to-Date Answers, Generated Continuously or Periodically
  • 43. Data Processing
    ● Transient Queries: Issued Once, Then Forgotten
    ● Persistent Data: Stored Until Deleted by Users or Applications
  • 44. Stream Processing
    ● Transient Data: Deleted as the Window Slides Forward
    ● Persistent Queries: Generate Up-to-Date Answers as Time Goes On
    ● Windows: Time-Based or Count-Based
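The contrast between slides 43 and 44 can be made concrete with the two window types slide 44 names. In this plain-Java sketch (not Spark's windowing API; the class names are hypothetical), each window holds only transient data and re-answers a persistent query, a running sum, on every new event. The time-based window assumes events arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindows {

    // Count-based window: only the last `size` values are kept.
    static final class CountWindow {
        private final int size;
        private final Deque<Long> values = new ArrayDeque<>();
        private long sum = 0;

        CountWindow(int size) {
            if (size < 1) throw new IllegalArgumentException("size must be positive");
            this.size = size;
        }

        // Persistent query "sum of the last N values", re-answered on every new event.
        long add(long value) {
            values.addLast(value);
            sum += value;
            if (values.size() > size) {
                sum -= values.removeFirst();   // the oldest value slides out and is forgotten
            }
            return sum;
        }
    }

    // Time-based window: only events from the last `spanMillis` milliseconds are kept.
    static final class TimeWindow {
        private final long spanMillis;
        private final Deque<long[]> events = new ArrayDeque<>();   // each entry: {timestamp, value}
        private long sum = 0;

        TimeWindow(long spanMillis) {
            if (spanMillis < 1) throw new IllegalArgumentException("span must be positive");
            this.spanMillis = spanMillis;
        }

        // Assumes events arrive in timestamp order; keeps events in (now - span, now].
        long add(long timestampMillis, long value) {
            events.addLast(new long[] {timestampMillis, value});
            sum += value;
            while (events.peekFirst()[0] <= timestampMillis - spanMillis) {
                sum -= events.removeFirst()[1];
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        CountWindow lastThree = new CountWindow(3);
        for (long v : new long[] {5, 1, 4, 2}) {
            System.out.println(lastThree.add(v));        // 5, 6, 10, 7
        }
        TimeWindow lastMinute = new TimeWindow(60_000);
        System.out.println(lastMinute.add(0, 10));        // 10
        System.out.println(lastMinute.add(30_000, 5));    // 15
        System.out.println(lastMinute.add(61_000, 1));    // 6: the event at t=0 has expired
    }
}
```

Memory stays bounded by the window rather than by the whole stream, which is what lets a persistent query run indefinitely.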
  • 45. Features
    ● Scalability
    ● Real Time (at Most 1 Second of Delay)
    ● Super-Fast Decision Making
    ● Implementing Complex Fraud Scenarios as Easily as Defining Queries
    ● Uniform API for Processing Old or Early Events
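As an illustration of "fraud scenarios as queries" (slide 45), the sketch below evaluates one hypothetical rule on every incoming transaction: flag a card that has more than 3 transactions, or more than 100,000 cents in total, within any 60-second window. The rule, thresholds and `Txn` type are assumptions rather than the scenarios of the system in the talk, and it runs in a single JVM instead of on Kafka and Spark.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class FraudRule {

    // Hypothetical transaction event.
    static final class Txn {
        final String card;
        final long timeMillis;
        final long amountCents;

        Txn(String card, long timeMillis, long amountCents) {
            this.card = card;
            this.timeMillis = timeMillis;
            this.amountCents = amountCents;
        }
    }

    private final long windowMillis;
    private final int maxCount;
    private final long maxTotalCents;
    // Per-card time window of recent transactions (transient data).
    private final Map<String, Deque<Txn>> recentByCard = new HashMap<>();

    FraudRule(long windowMillis, int maxCount, long maxTotalCents) {
        if (windowMillis < 1) throw new IllegalArgumentException("window must be positive");
        this.windowMillis = windowMillis;
        this.maxCount = maxCount;
        this.maxTotalCents = maxTotalCents;
    }

    // Persistent query, evaluated on every event. Assumes each card's events arrive in time order.
    boolean onTransaction(Txn t) {
        Deque<Txn> recent = recentByCard.computeIfAbsent(t.card, k -> new ArrayDeque<>());
        recent.addLast(t);
        while (recent.peekFirst().timeMillis <= t.timeMillis - windowMillis) {
            recent.removeFirst();   // older than the window: no longer relevant to this rule
        }
        long total = 0;
        for (Txn r : recent) {
            total += r.amountCents;
        }
        return recent.size() > maxCount || total > maxTotalCents;
    }

    public static void main(String[] args) {
        // Hypothetical rule: more than 3 transactions or more than 100,000 cents per card per minute.
        FraudRule rule = new FraudRule(60_000, 3, 100_000);
        Txn[] stream = {
            new Txn("card-A", 0, 2_000),
            new Txn("card-B", 5_000, 150_000),
            new Txn("card-A", 10_000, 2_000),
            new Txn("card-A", 20_000, 2_000),
            new Txn("card-A", 30_000, 2_000),
        };
        for (Txn t : stream) {
            String verdict = rule.onTransaction(t) ? "FLAG" : "ok";
            System.out.println(t.card + " @ " + t.timeMillis + " ms -> " + verdict);
        }
        // card-A is ok for its first three transactions and flagged on the fourth; card-B is flagged.
    }
}
```

Adding another scenario then amounts to adding another predicate over the same per-card window, which is the sense in which defining a fraud scenario can resemble defining a query.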
  • 46. Main Technologies and Frameworks
    ● Java 1.7, Scala 2.11
    ● Apache Flume
    ● Apache Kafka
    ● Apache Spark
  • 47. Where to Start?
    ● You Need a Large Amount of Data
    ● You Need to Change Your Mindset
      – Rack Space and Number of Servers, I/O and Processing Limitations
    ● You Need to Understand the Fundamentals
      – Linux (Bash Scripting)
      – Java Is a Must, Python Works and Scala Is an Advantage
      – SQL and ETL
      – MapReduce, Resource Management and Serialization Frameworks
      – Apache Hadoop Ecosystem and Successors
  • 48. Thank You! Questions?
    http://slideshare.net/amirsedighi