Impetus Technologies Inc. 
1 © 2014 Impetus Technologies 
Big Data Architectures 
Beyond the Elephant Ride 
Recorded version available at 
http://www.impetus.com/webinar_registration?event=archived&eid=60
Outline 
• The Hadoop ecosystem and challenges 
• Big Data solutions beyond Hadoop 
- How and where to use them? 
• Some use cases 
• Big Data Architecture Strategy 
Disclaimer 
• Not advising to discard Hadoop 
• Will discuss Big Data technologies that complement and supplement Hadoop
What is Hadoop? 
Scalable data processing engine 
• DFS: Scalable, fault-tolerant distributed file system
• MapReduce: Parallel processing programming model
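To make the MapReduce bullet concrete, here is a minimal single-process sketch of the programming model, using word count as the job. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions. In a real cluster the framework would run the map and reduce calls in parallel across nodes and do the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data beyond Hadoop", "Hadoop and big data"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread over many machines without coordination, which is where the scalability comes from.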
Hadoop Ecosystem 
Where to use Hadoop? 
• Risk Analysis 
– Intrusion detection, Credit scoring 
• Recommendation 
– Customers who purchased this also liked 
Where to use Hadoop? 
• Sentiment Analysis 
– Positive, Negative or Neutral sentiment in sentences 
• Targeted Ads 
– Display ads based on user behavior and preferences 
Where to use Hadoop? 
• Machine Learning 
– Spam vs. Not Spam 
• And a lot of other areas… 
Challenges with Hadoop 
Limitations 
• Data security 
• Dependence on OS/Language 
• MapReduce programming 
• Batch processing only 
Beyond Hadoop 
Faster Hadoop 
• MapR 
– Simple to manage (NFS) 
– MapR Express Lane 
– Handles real-time data flows 
Dominant Players – Hortonworks, Cloudera, Hadapt
Transactional Systems 
• E-commerce Websites 
- ATM
- Traditional solutions: MySQL, Oracle, MSSQL
Go NewSQL - VoltDB, Clustrix
Explaining VoltDB 
Real Time Computation 
• Continuous computation 
- Trending topics 
• Stream processing 
- Twitter Firehose 
Try Storm, Esper, S4, CloudScale!
Explaining Storm 
Spouts – Data sources
Bolts – Data processors
Topologies – Combinations of spouts and bolts
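The spout, bolt and topology roles can be sketched in plain Python with generators, using the trending-topics use case as the job. This is only an illustration of the dataflow idea, not Storm's actual API; every name below (`tweet_spout`, `hashtag_bolt`, `TrendingBolt`, `run_topology`) is an assumption made for the example.

```python
from collections import Counter


def tweet_spout(tweets):
    """Spout: the data source; yields one tuple (here, a tweet string) at a time."""
    for tweet in tweets:
        yield tweet


def hashtag_bolt(stream):
    """Bolt: a processor; emits every hashtag found in each incoming tweet."""
    for tweet in stream:
        for token in tweet.split():
            if token.startswith("#") and len(token) > 1:
                yield token.lower()


class TrendingBolt:
    """Bolt: keeps running counts so the result is current after every tuple."""

    def __init__(self):
        self.counts = Counter()

    def process(self, stream):
        for tag in stream:
            self.counts[tag] += 1
            yield tag, self.counts[tag]

    def top(self, n):
        """Return the n most frequent hashtags seen so far."""
        return self.counts.most_common(n)


def run_topology(tweets):
    """Topology: wire spout -> hashtag bolt -> trending bolt and drain it."""
    trending = TrendingBolt()
    for _ in trending.process(hashtag_bolt(tweet_spout(tweets))):
        pass
    return trending


if __name__ == "__main__":
    demo = run_topology(["Loving #Storm and #Hadoop", "#storm is fast"])
    print(demo.top(2))
```

Unlike a batch job, the counts are updated as each tuple arrives, so `top` can be queried at any moment while the stream is still flowing.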
Real-time Traffic 
Doing it Right 
Graph Computation 
• Page Rank 
• Shortest Path 
• “Friends of my friends’ friends” 
We suggest – Giraph, Pregel 
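As a rough illustration of why these problems suit a vertex-centric engine, the sketch below computes hop distances in Pregel-style supersteps: in each round, vertices that receive a message record their distance and message their neighbours. It runs in a single process and uses neither Giraph's nor Pregel's API; the function names and the adjacency-dict graph format are assumptions for the example.

```python
def degrees_of_separation(graph, source):
    """Hop distance from source to every reachable vertex, computed in supersteps.

    graph maps each vertex to the list of its neighbours (undirected edges
    are listed in both directions).
    """
    distance = {source: 0}
    # Messages are (vertex, proposed distance) pairs delivered next superstep.
    inbox = [(neighbour, 1) for neighbour in graph.get(source, [])]
    while inbox:  # the computation halts when no messages are in flight
        outbox = []
        for vertex, proposed in inbox:
            if vertex not in distance:
                distance[vertex] = proposed
                outbox.extend((n, proposed + 1) for n in graph.get(vertex, []))
        inbox = outbox
    return distance


def within(graph, source, hops):
    """Everyone reachable from source in at most `hops` steps, excluding source."""
    found = degrees_of_separation(graph, source)
    return {v for v, d in found.items() if 0 < d <= hops}


if __name__ == "__main__":
    friends = {
        "ann": ["bob"],
        "bob": ["ann", "cat"],
        "cat": ["bob", "dan"],
        "dan": ["cat", "eve"],
        "eve": ["dan"],
    }
    # "Friends of my friends' friends" is everyone within three hops.
    print(within(friends, "ann", 3))
```

Each superstep only touches vertices that received messages, so the work follows the frontier of the search instead of rescanning the whole graph, as a chain of MapReduce jobs would.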
LinkedIn Degrees of Separation 
Fast Key-Value Access 
• Show the latest item listings on your homepage
• Caching 
Explore NoSQL - Redis and Riak! 
How does it work?
Latest comments posted by user
Traditional Approach: Query at Runtime
SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10 
Redis Live Cache Approach 
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start+num_items-1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time DESC LIMIT ...")
    END
    RETURN id_list
END
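The pseudocode above can be turned into a runnable sketch under a few loudly stated assumptions. `InMemoryList` is a hypothetical stand-in that mimics only the Redis list commands `LPUSH`, `LTRIM` and `LRANGE` for non-negative indexes, so the example runs without a server; a real client would return strings or bytes, not Python integers. SQLite plays the SQL database, the `comments` table and `CACHE_SIZE` are invented for the example, and the auto-increment `id` is used as a proxy for the `time` column.

```python
import sqlite3

CACHE_KEY = "latest.comments"
CACHE_SIZE = 5000  # hypothetical cap on how many recent IDs the list keeps


class InMemoryList:
    """Stand-in for a Redis client, limited to the three list commands used here."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)

    def ltrim(self, key, start, end):
        # Redis ranges are inclusive at both ends.
        self.lists[key] = self.lists.get(key, [])[start:end + 1]

    def lrange(self, key, start, end):
        return self.lists.get(key, [])[start:end + 1]


def make_db():
    """Create an empty in-memory comments table (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
    return db


def add_comment(db, cache, body):
    """Write path: store the comment, then push its ID onto the capped list."""
    cursor = db.execute("INSERT INTO comments (body) VALUES (?)", (body,))
    db.commit()
    cache.lpush(CACHE_KEY, cursor.lastrowid)
    cache.ltrim(CACHE_KEY, 0, CACHE_SIZE - 1)
    return cursor.lastrowid


def get_latest_comments(db, cache, start, num_items):
    """Read path: serve from the list, falling back to SQL on a short read."""
    id_list = cache.lrange(CACHE_KEY, start, start + num_items - 1)
    if len(id_list) < num_items:
        rows = db.execute(
            "SELECT id FROM comments ORDER BY id DESC LIMIT ? OFFSET ?",
            (num_items, start),
        )
        id_list = [row[0] for row in rows]
    return id_list


if __name__ == "__main__":
    db, cache = make_db(), InMemoryList()
    for n in range(1, 6):
        add_comment(db, cache, f"comment {n}")
    print(get_latest_comments(db, cache, 0, 3))
```

The common case (the first pages of recent comments) never touches the SQL database; the fallback query runs only when a reader pages past what the capped list holds or the cache is cold.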
How does it work?
Good to Know 
Recap 
• Already invested in Hadoop: Explore Faster Hadoop – Hortonworks, Cloudera, MapR, Hadapt
• Alternatives to Hadoop: HPCC, Disco
• Complex business queries, online transaction processing: “New Gen” SQL – VoltDB, Clustrix, Hadapt
• Real-time analytics: CloudScale, Storm, Esper
• Fast key-value access: Redis, Riak
Bleeding Edge
A peek into the future
• High-performance supercomputing: Open MPI, BSP
• Highly efficient, large-scale graph computing: Pregel, Giraph
• Low-latency queries over very large data sets: Dremel
• Incremental updates on massive datasets: Percolator (Caffeine)
Architecture Strategy 
Recommendations 
• Think Beyond the Warehouse 
• Time for Real-time 
• Not Only Hadoop 
• Hadoop is an enabler for better data warehouse solutions, not a replacement
• Back To SQL? 
• SQL is not bad 
• Hadoop and SQL complement each other 
• Integrations & Visualizations 
• Real-time
Recommendations 
• Big Data in upstream operational systems 
• Forecasting Systems 
• Supply Chains 
• CRMs 
Our Architecture Strategy 
About Impetus 
Our Expertise 
• Strategic partners for software product engineering and R&D
• Thought leaders in cutting-edge technologies
• Mature processes and practices that are methodical, yet flexible
• Diverse domain expertise 
Q & A 
Thank You 
Write to us at inquiry@impetus.com 
Follow us on Twitter @impetustech 

Big Data Architectures: Beyond Hadoop- Impetus Webinar

  • 1. Impetus Technologies Inc. 1 © 2014 Impetus Technologies Big Data Architectures Beyond the Elephant Ride Recorded version available at http://www.impetus.com/webinar_registration?event=archived&eid=60
  • 2. Outline • The Hadoop ecosystem and challenges • Big Data solutions beyond Hadoop - How and where to use them? • Some use cases • Big Data Architecture Strategy
  • 3. Disclaimer • Not advising to discard Hadoop • Will discuss Big Data technologies that complement and supplement Hadoop
  • 4. What is Hadoop? Scalable data processing engine • DFS: Scalable fault-tolerant distributed file-system • Map Reduce: Parallel processing programming model
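As a rough illustration of the Map Reduce programming model named on slide 4, the sketch below counts words in plain Python on a single machine. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made for this example, and the grouping step that the framework would perform across a cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big elephant"]))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls could in principle run on different machines, which is the property that makes the model parallel.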
  • 5. Hadoop Ecosystem
  • 6. Where to use Hadoop? • Risk Analysis – Intrusion detection, Credit scoring • Recommendation – Customers who purchased this also liked
  • 7. Where to use Hadoop? • Sentiment Analysis – Positive, Negative or Neutral sentiment in sentences • Targeted Ads – Display ads based on user behavior and preferences
  • 8. Where to use Hadoop? • Machine Learning – Spam vs. Not Spam • And a lot of other areas…
  • 9. Challenges with Hadoop
  • 10. Limitations • Data security • Dependence on OS/Language • MapReduce programming • Batch processing only
  • 11. Beyond Hadoop
  • 12. Faster Hadoop • MapR – Simple to manage (NFS) – MapR Express Lane – Handles real-time data flows • Dominant Players – HortonWorks, Cloudera, Hadapt
  • 13. Transactional Systems • E-commerce Websites • ATM • Traditional solutions – MySQL, Oracle, MSSQL • Go NewSQL – VoltDB, Clustrix
  • 14. Explaining VoltDB
  • 15. Real Time Computation • Continuous computation – Trending topics • Stream processing – Twitter Firehose • Try Storm, Esper, S4, CloudScale!
  • 16. Explaining Storm • Spouts – Data Source • Bolts – Data Processors • Topologies – Combination of Spouts and Bolts
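The spout, bolt, and topology roles on slide 16 can be sketched with ordinary Python generators. This is only a single-process analogy and does not use the Storm API: the names `sentence_spout`, `hashtag_bolt`, `count_bolt`, and `run_topology` are hypothetical, and the hashtag-counting task is an assumed example loosely based on the "trending topics" use case from slide 15.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the data source; emits one tuple (here, a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def hashtag_bolt(stream):
    """Bolt: extracts lower-cased hashtags from each incoming sentence."""
    for sentence in stream:
        for token in sentence.split():
            if token.startswith("#"):
                yield token.lower()


def count_bolt(stream):
    """Bolt: keeps a running count per hashtag and emits each updated count."""
    counts = Counter()
    for tag in stream:
        counts[tag] += 1
        yield tag, counts[tag]


def run_topology(sentences):
    """Topology: wires the spout to the bolts and returns the latest count per tag."""
    latest = {}
    for tag, count in count_bolt(hashtag_bolt(sentence_spout(sentences))):
        latest[tag] = count
    return latest


if __name__ == "__main__":
    print(run_topology(["#Storm is fast", "try #storm and #esper"]))
```

Each stage consumes tuples as they arrive rather than waiting for a complete batch, which is the contrast with batch-only processing that the deck draws.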
  • 17. Real-time Traffic
  • 18. Doing it Right
  • 19. Graph Computation • Page Rank • Shortest Path • “Friends of my friends’ friends” • We suggest – Giraph, Pregel
  • 20. LinkedIn Degrees of Separation
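To make the "friends of my friends' friends" and degrees-of-separation examples from slides 19 and 20 concrete, here is a minimal breadth-first search over an adjacency-list graph. It runs on one machine and is not the vertex-centric model that Giraph and Pregel use. The function names and the sample graph are assumptions for illustration only.

```python
from collections import deque


def degrees_of_separation(graph, start):
    """Breadth-first search: hop count from start to every reachable member."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


def within_degrees(graph, start, max_degree):
    """Everyone reachable from start in at most max_degree hops, excluding start."""
    distances = degrees_of_separation(graph, start)
    return {person for person, hops in distances.items() if 0 < hops <= max_degree}


if __name__ == "__main__":
    # Hypothetical friendship graph stored as an adjacency list.
    friends = {
        "ann": ["bob"],
        "bob": ["ann", "cat"],
        "cat": ["bob", "dan"],
        "dan": ["cat", "eve"],
        "eve": ["dan"],
    }
    print(within_degrees(friends, "ann", 3))
```

With unweighted edges, the hop count found by breadth-first search is also the shortest path length, so the same traversal covers the "Shortest Path" bullet for this simple case.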
  • 21. Fast Key-Value Access • Show latest items listing on your homepage • Caching • Explore NoSQL – Redis and Riak!
  • 22. How it works? Latest comments posted by user
      Traditional Approach: Query in Runtime
        SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10
      Redis Live Cache Approach
        FUNCTION get_latest_comments(start, num_items):
          id_list = redis.lrange("latest.comments", start, start + num_items - 1)
          IF id_list.length < num_items
            id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
          END
          RETURN id_list
        END
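The live-cache pseudocode on slide 22 could look roughly like the following in Python. So that it runs without a Redis server, the sketch uses `InMemoryListStore`, a hypothetical stand-in that imitates only the `lpush`, `ltrim`, and `lrange` list calls for non-negative indices. The write path (`record_comment`), the `cache_size` cap of 5000, and the `query_db` callback are assumptions that do not appear on the slide.

```python
CACHE_KEY = "latest.comments"


class InMemoryListStore:
    """Hypothetical stand-in for a Redis client; supports non-negative indices only."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, value):
        # Newest item goes to the head of the list.
        self._lists.setdefault(key, []).insert(0, value)

    def ltrim(self, key, start, stop):
        # Keep only the inclusive range [start, stop].
        self._lists[key] = self._lists.get(key, [])[start:stop + 1]

    def lrange(self, key, start, stop):
        # As with Redis LRANGE, the stop index is inclusive.
        return self._lists.get(key, [])[start:stop + 1]


def record_comment(store, comment_id, cache_size=5000):
    """Assumed write path: push the new ID and cap the cached list."""
    store.lpush(CACHE_KEY, comment_id)
    store.ltrim(CACHE_KEY, 0, cache_size - 1)


def get_latest_comments(store, start, num_items, query_db):
    """Serve from the cache; fall back to the database if the cache comes up short."""
    id_list = store.lrange(CACHE_KEY, start, start + num_items - 1)
    if len(id_list) < num_items:
        # query_db is a caller-supplied function standing in for the SQL query.
        id_list = query_db(start, num_items)
    return id_list


if __name__ == "__main__":
    store = InMemoryListStore()
    for comment_id in range(1, 6):
        record_comment(store, comment_id, cache_size=3)
    print(get_latest_comments(store, 0, 2, lambda start, n: []))
```

The common case (the newest few items) never touches the SQL database; only requests that reach past the cached window pay for the `ORDER BY` query.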
  • 23. How it works?
  • 24. Good to Know
  • 25. Recap • Already Invested in Hadoop: Explore Faster Hadoop – HortonWorks, Cloudera, MapR, Hadapt • Alternatives to Hadoop: HPCC, Disco • Complex business queries, online transaction processing: “New Gen” SQL – VoltDB, Clustrix, Hadapt • Real Time Analytics: CloudScale, Storm, Esper • Fast Key-Value Access: Redis, Riak
  • 26. Bleeding Edge: A peek into the future • High performance super computing: Open MPI, BSP • Highly efficient, large scale graph computing: Pregel, Giraph • Low latency queries over very large data sets: Dremel • Incremental updates on massive datasets: Percolator (Caffeine)
  • 27. Architecture Strategy
  • 28. Recommendations • Think Beyond the Warehouse • Time for Real-time • Not Only Hadoop • Hadoop is an enabler for better data warehouse solutions, not a replacement • Back To SQL? • SQL is not bad • Hadoop and SQL complement each other • Integrations & Visualizations • Realtime
  • 29. Recommendations • Big Data in upstream operational systems • Forecasting Systems • Supply Chains • CRMs
  • 30. Our Architecture Strategy
  • 31. About Impetus
  • 32. Our Expertise • Strategic partners for software product engineering and R&D • Thought leaders in cutting-edge technologies • Mature processes and practices that are methodical, yet flexible • Diverse domain expertise
  • 33. Q & A
  • 34. Thank You Write to us at inquiry@impetus.com Follow us on Twitter @impetustech