SlideShare une entreprise Scribd logo
1  sur  18
Welcome
• Thank you: Francis - Silicon Valley Strategy,
Innovation and Product Management group
• Thank you: Michael & Sam and the Microsoft
Store
• Thank you: Aleks & David & SAP HANA
• Thank you: All of You… You are the ‘Secret
Sauce’
Agenda
• Quick Poll
• Overview – AIBDP / Big Data Connection
• Prasad Mavuduri – Board Member, AIBDP –
“Demystifying Big Data”
• David Sonnenschein – Vice President & Aleks
Swerdlow Community Manager – SAP Labs -
HANA In-Memory – Start-ups Success Stories”
• Networking & Q&A
Quick Poll
• Relationship & Experience w/ Big Data
• Job Role
• Industry
• Company Years - Start-up?
• Big Data Implementation Status
• Biggest Challenges / Opportunities
• Vs Cometitors?
Overview - Big Data Connections
Mission: Demystify Big Data
– Five E’s – entertain, engage, educate etc
– Focus on Solutions (vs technology)
– Focus on Specific Verticals
• ex Healthcare, Risk, eCom/eMarketing,
Manufacturing, Logistics, Telecom…)
– Best Practices Case Study Reviews
– Networking & Shared Learning
– Sponsored by the American Institute of
Big Data Professionals (AIBDP.org)
– Sponsored by Big Data consulting firm,
Data-Magnum
BI Platform / Reporting
OSS
Visualizations
Unstructured/ Search
Indexing / Metadata
Search
NLP
Hadoop Analytics
Hadoop Dev Platforms / Automation
HDFS
Predictive Analytics
THE CONFUSING WORLD OF BIG DATAAPPLICATIONSTOOLSDATAMANAGEMENT
STRUCTURED UNSTRUCTURED
Transactional
DB
OSS
High Performance
Analytical DB
NewSQL
Enhancement
Distributed
NoSQL
Graph Document
Key Value /
Column
Enterprise
Apps
Internet
Apps
Social Media Web Content Mobile Devices Camera / DVR Sensors / RFID Logfiles
Hadoop
aaS
HDFS Alternatives
DBaaS
HANA
GraphDB
Filesystem
EMR
Text / Sentiment Analysis
Data as a Service
Data
Warehouses
vFabric L
Drill
Vertical Market Applications
Impala
Messaging Optimization Data Integration / CEP
OSS
IMDG
Redshift
Based on Source: Perella Weinberg Partners
AI
Source:
Source: CapGemini: http://www.capgemini.com/sites/default/files/technology-blog/files/2012/09/big-data-vendors.jpg
Big Data Landscape
http://www.bigdatalandscape.com/
Source: http://www.forbes.com/special-report/2013/industry-atlas.html
Business Intelligence Analytics / Visualization
Big Data BI & Analytics/Visualization Landscape
Oracle Essbase Laurén
Predictive Analytics Leaders
Source: http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond
AH.. Simplicity… This looks pretty straight-forward… I can handle this..
Our Landscape Collection as published on Startup50.com
Simplified (so far)
 Data Input - Sources, Databases, & Integration
Tools
 Platform / Infrastructure - Data Preparation,
map reduce, filing, governance…
 Data Presentation & Analysis – BI, Data
Discovery, Visualization
 Predictive Analytics & Machine Learning
 Vertical & Horizontal Products (Specialized
Applications)
It can be made more complicated…
o Hadoop
o NoSQL
o NewSQL
o Structured Databases
o NGDW (next generation data warehouse)
o Cloud Services
o Technical Services
o Professional Services
o Distributors
o Deployment services
o Deployment stack/appliances
o Development services
o Application stacks
o Database stacks
o Managed Monitoring
o Storage
o Security
Example Optimized Marketing

Contenu connexe

Tendances

"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio..."Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...Dataconomy Media
 
U of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreU of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreTim Schneider
 
DataScience and BigData Cebu 1st meetup
DataScience and BigData Cebu 1st meetupDataScience and BigData Cebu 1st meetup
DataScience and BigData Cebu 1st meetupFrancisco Liwa
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Raghu Kashyap
 
5 Myths about Spark and Big Data by Nik Rouda
5 Myths about Spark and Big Data by Nik Rouda5 Myths about Spark and Big Data by Nik Rouda
5 Myths about Spark and Big Data by Nik RoudaSpark Summit
 
UCSD: Building a Big Data Culture - It Takes a Village
UCSD: Building a Big Data Culture - It Takes a VillageUCSD: Building a Big Data Culture - It Takes a Village
UCSD: Building a Big Data Culture - It Takes a VillagePaul Barsch
 
zData Inc. Big Data Consulting and Services - Overview and Summary
zData Inc. Big Data Consulting and Services - Overview and SummaryzData Inc. Big Data Consulting and Services - Overview and Summary
zData Inc. Big Data Consulting and Services - Overview and SummaryzData Inc.
 
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...Kaan Onuk
 
Goa public affairs 20121213
Goa public affairs 20121213Goa public affairs 20121213
Goa public affairs 20121213Tim Schneider
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitzRaghu Kashyap
 
Hadoop Perspectives for 2017
Hadoop Perspectives for 2017Hadoop Perspectives for 2017
Hadoop Perspectives for 2017Precisely
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.Łukasz Grala
 
SiSense Overview
SiSense OverviewSiSense Overview
SiSense OverviewBruno Aziza
 
Metadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approachMetadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approachRoland Bullivant
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...Data Con LA
 
Benchmarking Digital Readiness: Moving at the Speed of the Market
Benchmarking Digital Readiness: Moving at the Speed of the MarketBenchmarking Digital Readiness: Moving at the Speed of the Market
Benchmarking Digital Readiness: Moving at the Speed of the MarketApigee | Google Cloud
 
Big Data Meetup: Analytical Systems Evolution
Big Data Meetup: Analytical Systems EvolutionBig Data Meetup: Analytical Systems Evolution
Big Data Meetup: Analytical Systems EvolutionProvectus
 
Geek Sync - Cloud Considerations
Geek Sync - Cloud ConsiderationsGeek Sync - Cloud Considerations
Geek Sync - Cloud ConsiderationsIDERA Software
 
BI and Predictive analytics 2011 shyam desigan presentation
BI and Predictive analytics 2011 shyam desigan presentationBI and Predictive analytics 2011 shyam desigan presentation
BI and Predictive analytics 2011 shyam desigan presentationShyam Desigan
 

Tendances (20)

"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio..."Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovatio...
 
U of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreU of A Web Strategy and Sitecore
U of A Web Strategy and Sitecore
 
DataScience and BigData Cebu 1st meetup
DataScience and BigData Cebu 1st meetupDataScience and BigData Cebu 1st meetup
DataScience and BigData Cebu 1st meetup
 
Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011Web analyticsandbigdata techweek2011
Web analyticsandbigdata techweek2011
 
SAS Visual Analytics Overview
SAS Visual Analytics OverviewSAS Visual Analytics Overview
SAS Visual Analytics Overview
 
5 Myths about Spark and Big Data by Nik Rouda
5 Myths about Spark and Big Data by Nik Rouda5 Myths about Spark and Big Data by Nik Rouda
5 Myths about Spark and Big Data by Nik Rouda
 
UCSD: Building a Big Data Culture - It Takes a Village
UCSD: Building a Big Data Culture - It Takes a VillageUCSD: Building a Big Data Culture - It Takes a Village
UCSD: Building a Big Data Culture - It Takes a Village
 
zData Inc. Big Data Consulting and Services - Overview and Summary
zData Inc. Big Data Consulting and Services - Overview and SummaryzData Inc. Big Data Consulting and Services - Overview and Summary
zData Inc. Big Data Consulting and Services - Overview and Summary
 
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...
[Strata NYC 2019] Turning big data into knowledge: Managing metadata and data...
 
Goa public affairs 20121213
Goa public affairs 20121213Goa public affairs 20121213
Goa public affairs 20121213
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitz
 
Hadoop Perspectives for 2017
Hadoop Perspectives for 2017Hadoop Perspectives for 2017
Hadoop Perspectives for 2017
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.
 
SiSense Overview
SiSense OverviewSiSense Overview
SiSense Overview
 
Metadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approachMetadata discovery for enterprise packages - a better approach
Metadata discovery for enterprise packages - a better approach
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
 
Benchmarking Digital Readiness: Moving at the Speed of the Market
Benchmarking Digital Readiness: Moving at the Speed of the MarketBenchmarking Digital Readiness: Moving at the Speed of the Market
Benchmarking Digital Readiness: Moving at the Speed of the Market
 
Big Data Meetup: Analytical Systems Evolution
Big Data Meetup: Analytical Systems EvolutionBig Data Meetup: Analytical Systems Evolution
Big Data Meetup: Analytical Systems Evolution
 
Geek Sync - Cloud Considerations
Geek Sync - Cloud ConsiderationsGeek Sync - Cloud Considerations
Geek Sync - Cloud Considerations
 
BI and Predictive analytics 2011 shyam desigan presentation
BI and Predictive analytics 2011 shyam desigan presentationBI and Predictive analytics 2011 shyam desigan presentation
BI and Predictive analytics 2011 shyam desigan presentation
 

En vedette

1 to 1 Presentation 2015
1 to 1 Presentation 20151 to 1 Presentation 2015
1 to 1 Presentation 2015James Puliatte
 
加速開發! 在Windows開發hadoop程式,直接運行 map/reduce
加速開發! 在Windows開發hadoop程式,直接運行 map/reduce加速開發! 在Windows開發hadoop程式,直接運行 map/reduce
加速開發! 在Windows開發hadoop程式,直接運行 map/reduceWei-Yu Chen
 
Hadoop 2.0 之古往今來
Hadoop 2.0 之古往今來Hadoop 2.0 之古往今來
Hadoop 2.0 之古往今來Wei-Yu Chen
 
Hadoop ecosystem - hadoop 生態系
Hadoop ecosystem - hadoop 生態系Hadoop ecosystem - hadoop 生態系
Hadoop ecosystem - hadoop 生態系Wei-Yu Chen
 
大資料趨勢介紹與相關使用技術
大資料趨勢介紹與相關使用技術大資料趨勢介紹與相關使用技術
大資料趨勢介紹與相關使用技術Wei-Yu Chen
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道Etu Solution
 
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖Etu Solution
 
Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動Etu Solution
 
那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景Etu Solution
 
資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例Fred Chiang
 
Summary of Insights Learned from the Data Science Program Team Training
Summary of Insights Learned from the Data Science Program Team TrainingSummary of Insights Learned from the Data Science Program Team Training
Summary of Insights Learned from the Data Science Program Team TrainingFred Chiang
 
轉兌數據的價值 — 從導購到策購
轉兌數據的價值 — 從導購到策購轉兌數據的價值 — 從導購到策購
轉兌數據的價值 — 從導購到策購Fred Chiang
 
資料價值 — 一位資料產品經理的視野
資料價值 — 一位資料產品經理的視野資料價值 — 一位資料產品經理的視野
資料價值 — 一位資料產品經理的視野Fred Chiang
 
Big Data vs. Open Data
Big Data vs. Open DataBig Data vs. Open Data
Big Data vs. Open DataFred Chiang
 
那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper
那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper
那些你知道的,但還沒看過的 Big Data 風景 ─ 致 HadooperFred Chiang
 
Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們Fred Chiang
 

En vedette (20)

1 to 1 Presentation 2015
1 to 1 Presentation 20151 to 1 Presentation 2015
1 to 1 Presentation 2015
 
Hadoop hive
Hadoop hiveHadoop hive
Hadoop hive
 
加速開發! 在Windows開發hadoop程式,直接運行 map/reduce
加速開發! 在Windows開發hadoop程式,直接運行 map/reduce加速開發! 在Windows開發hadoop程式,直接運行 map/reduce
加速開發! 在Windows開發hadoop程式,直接運行 map/reduce
 
Big Data Landscape 2016
Big Data Landscape 2016Big Data Landscape 2016
Big Data Landscape 2016
 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
 
Hadoop 2.0 之古往今來
Hadoop 2.0 之古往今來Hadoop 2.0 之古往今來
Hadoop 2.0 之古往今來
 
Hadoop ecosystem - hadoop 生態系
Hadoop ecosystem - hadoop 生態系Hadoop ecosystem - hadoop 生態系
Hadoop ecosystem - hadoop 生態系
 
大資料趨勢介紹與相關使用技術
大資料趨勢介紹與相關使用技術大資料趨勢介紹與相關使用技術
大資料趨勢介紹與相關使用技術
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道啟程:Data Technology 的待客之道
啟程:Data Technology 的待客之道
 
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
台灣 Hadoop Big Data 2014 趨勢預測與企業策略藍圖
 
Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動Data Leaders in Action - 資料價值領袖風範與關鍵行動
Data Leaders in Action - 資料價值領袖風範與關鍵行動
 
那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景那些你知道的,但還沒看過的 Big Data 風景
那些你知道的,但還沒看過的 Big Data 風景
 
資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例資料科學團隊人才培育分享 ─ 以 DSP 為例
資料科學團隊人才培育分享 ─ 以 DSP 為例
 
Summary of Insights Learned from the Data Science Program Team Training
Summary of Insights Learned from the Data Science Program Team TrainingSummary of Insights Learned from the Data Science Program Team Training
Summary of Insights Learned from the Data Science Program Team Training
 
轉兌數據的價值 — 從導購到策購
轉兌數據的價值 — 從導購到策購轉兌數據的價值 — 從導購到策購
轉兌數據的價值 — 從導購到策購
 
資料價值 — 一位資料產品經理的視野
資料價值 — 一位資料產品經理的視野資料價值 — 一位資料產品經理的視野
資料價值 — 一位資料產品經理的視野
 
Big Data vs. Open Data
Big Data vs. Open DataBig Data vs. Open Data
Big Data vs. Open Data
 
那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper
那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper
那些你知道的,但還沒看過的 Big Data 風景 ─ 致 Hadooper
 
Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們
 

Similaire à Big Data Connections Event Agenda and Overview

BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
A modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your businessA modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your businessMarcos Quezada
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02email2jl
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitecturePerficient, Inc.
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQLPhilippe Julio
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case studySudhi Seshachala
 
Latest corp big data and acme
Latest corp   big data and acmeLatest corp   big data and acme
Latest corp big data and acmehooduku
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaSanjeev Kumar
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleDatabricks
 
Big data landscape map collection by aibdp
Big data landscape map collection by aibdpBig data landscape map collection by aibdp
Big data landscape map collection by aibdpAIBDP
 
Big data tim
Big data timBig data tim
Big data timT Weir
 
SAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service PlatformSAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service PlatformSUSE Italy
 
Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2Roland Bullivant
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopDataWorks Summit
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data scienceMahesh Kumar CV
 

Similaire à Big Data Connections Event Agenda and Overview (20)

BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
A modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your businessA modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your business
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case study
 
Latest corp big data and acme
Latest corp   big data and acmeLatest corp   big data and acme
Latest corp big data and acme
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - Informatica
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Big data landscape map collection by aibdp
Big data landscape map collection by aibdpBig data landscape map collection by aibdp
Big data landscape map collection by aibdp
 
Big data tim
Big data timBig data tim
Big data tim
 
SAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service PlatformSAP Data Hub e SUSE Container as a Service Platform
SAP Data Hub e SUSE Container as a Service Platform
 
Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2
 
Ramesh kutumbaka resume
Ramesh kutumbaka resumeRamesh kutumbaka resume
Ramesh kutumbaka resume
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data science
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 

Dernier

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Dernier (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Big Data Connections Event Agenda and Overview

  • 1. Welcome • Thank you: Francis - Silicon Valley Strategy, Innovation and Product Management group • Thank you: Michael & Sam and the Microsoft Store • Thank you: Aleks & David & SAP HANA • Thank you: All of You… You are the ‘Secret Sauce’
  • 2. Agenda • Quick Poll • Overview – AIBDP / Big Data Connection • Prasad Mavuduri – Board Member, AIBDP – “Demystifying Big Data” • David Sonnenschein – Vice President & Aleks Swerdlow Community Manager – SAP Labs - HANA In-Memory – Start-ups Success Stories” • Networking & Q&A
  • 3. Quick Poll • Relationship & Experience w/ Big Data • Job Role • Industry • Company Years - Start-up? • Big Data Implementation Status • Biggest Challenges / Opportunities • Vs Cometitors?
  • 4. Overview - Big Data Connections Mission: Demystify Big Data – Five E’s – entertain, engage, educate etc – Focus on Solutions (vs technology) – Focus on Specific Verticals • ex Healthcare, Risk, eCom/eMarketing, Manufacturing, Logistics, Telecom…) – Best Practices Case Study Reviews – Networking & Shared Learning – Sponsored by the American Institute of Big Data Professionals (AIBDP.org) – Sponsored by Big Data consulting firm, Data-Magnum
  • 5. BI Platform / Reporting OSS Visualizations Unstructured/ Search Indexing / Metadata Search NLP Hadoop Analytics Hadoop Dev Platforms / Automation HDFS Predictive Analytics THE CONFUSING WORLD OF BIG DATAAPPLICATIONSTOOLSDATAMANAGEMENT STRUCTURED UNSTRUCTURED Transactional DB OSS High Performance Analytical DB NewSQL Enhancement Distributed NoSQL Graph Document Key Value / Column Enterprise Apps Internet Apps Social Media Web Content Mobile Devices Camera / DVR Sensors / RFID Logfiles Hadoop aaS HDFS Alternatives DBaaS HANA GraphDB Filesystem EMR Text / Sentiment Analysis Data as a Service Data Warehouses vFabric L Drill Vertical Market Applications Impala Messaging Optimization Data Integration / CEP OSS IMDG Redshift Based on Source: Perella Weinberg Partners AI
  • 7.
  • 9.
  • 12. Business Intelligence Analytics / Visualization Big Data BI & Analytics/Visualization Landscape Oracle Essbase Laurén
  • 15. Our Landscape Collection as published on Startup50.com
  • 16. Simplified (so far)  Data Input - Sources, Databases, & Integration Tools  Platform / Infrastructure - Data Preparation, map reduce, filing, governance…  Data Presentation & Analysis – BI, Data Discovery, Visualization  Predictive Analytics & Machine Learning  Vertical & Horizontal Products (Specialized Applications)
  • 17. It can be made more complicated… o Hadoop o NoSQL o NewSQL o Structured Databases o NGDW (next generation data warehouse) o Cloud Services o Technical Services o Professional Services o Distributors o Deployment services o Deployment stack/appliances o Development services o Application stacks o Database stacks o Managed Monitoring o Storage o Security

Notes de l'éditeur

  1. Source: sqrll:To simplify the NoSQL world, lets take a look at the top 3 databases in terms of current popularity and how they compare to Apache Accumulo, which is at the core of our product, Sqrrl Enterprise.MongoDB:  It is a wonderfully easy-to-use document store that many select as a flexible replacement for a SQL database, as it (like all NoSQL databases) does not require pre-defined schemas.   However, MongoDB has difficulty scaling to very large datasets (e.g., 100+ TB) and does not natively work with your Hadoop cluster.  It also does not possess fine-grained security controls.Cassandra:  This is an excellent choice if your data is too big for MongoDB and you require multi-datacenter replication.  Although Cassandra was not originally designed to run natively on your Hadoop cluster, it now has integrations with MapReduce, Pig, and Hive.  It does not possess fine-grained security controls.HBase:  HBase natively integrates with Hadoop, and it can handle very large datasets.  However, it does not have fine-grained security controls. Accumulo:  Accumulo has an architecture most similar to HBase, which allows it also to natively plug into your Hadoop cluster.  It is far more scalable than MongoDB, and with reported cluster sizes in the multiple thousands within the Intelligence Community it is also significantly more scalable than HBase and Cassandra.  Accumulo is the only NoSQL database with cell-level security capabilities.  Accumulo also has other features that could lead one to choose it over HBase or Cassandra for reasons other than security or scalability.  For example, Accumulo has a powerful server-side programming mechanism called Iterators, which provide it with the capability to do a variety of real-time aggregations and analytics.These high level differences between MongoDB, Cassandra, HBase, and Accumulo are summarized in the decision tree diagram below.  Of course, there are a wide variety of more detailed technical differences that will be explored in greater detail in a later post.  This decision tree can be summarized with a few simple statements:If you need a quick, simple solution and have “small” Big Data (e.g., a few dozen terabytes), MongoDB may be the answer.If you need cell-level security or multi-petabyte scalability, Accumulo is the right answer.If you have data that is too big for MongoDB and don’t need cell-level security or massive scalability, we would recommend testing HBase, Cassandra, and Accumulo for your specific workloads.  Each has their own nuanced advantages and disadvantages.If you don’t need real-time analytics, you are probably on the wrong decision tree and can stick with the Hadoop Distributed File System and batch analytics. It is worth noting that the NoSQL databases above are all open source databases.  Sqrrl Enterprise builds upon Accumulo and adds a number of additional features to Accumulo including streaming ingest, JSON, encryption, identity management integrations, full-text search, SQL queries, graph search, and statistics.  We believe that these features set Sqrrl Enterprise apart from other Big Data platforms.
  2. http://www.capgemini.com/blog/capping-it-off/2012/09/big-data-vendors-and-technologies-the-listBig Data Vendors and TechnologiesData Acquisition stream - technological providers Ab InitioHPIBM (Datastage, Streams, Data mirror)Informatica (PowerCenter, PowerExchange, CEP)KalidoMicrosoftNumentaOracleSAPSASSplunkSyncsortTalendTibcoData ProvidersComScoreDatasiftExperianFactualGfKGnipIMSInrixKaggleKnoemaLexisNexisMicrosoft (with their Windows Azure Marketplace data market)NielsenReutersSalesforce Radian6Symphony IRIsocial network websites like Facebook, Google+, LinkedIn, Tumblr, Twitter or Viadeoall the Open Data providers, like governments, regions, etc.Marshalling domain - Very Large Data Warehousing and BI AppliancesActian; ParaccelEMC² (Greenplum)HP (Vertica)IBM (Netezza)KognitioMicrosoft (SQL 2012 and PDW)Oracle (Exadata)SAP (HANA and Sybase IQ)SASTeradataNoSQL Domain – Main technologies and vendors: Amazon (as cloud provider or with their own NoSQL solution)CassandraCloudera (CDH, Hadoop distribution)CouchDBEMC²GoogleHadoop (of course)GoogleHortonworks (Hadoop distribution)HPIBMKXMapR (Hadoop distribution)MarkLogicMicrosoft (Hadoop on Windows and Azure)MongoDBNeo4JOraclePalantirSnaplogicSparsitySplunkTeradata (Aster Data)ZL TechnologiesContent Management Space:AdobeAlfrescoEMC² (Documentum)IBM (FileNet)HP (Autonomy)MicrosoftOpenTextOracle.Analytics phasePredictive technologies (such as data mining) and vendors which are Adobe, EMC², GoodData, Hadoop Map Reduce, HP, IBM (SPSS), Karmasphere, Kxen, Microsoft, Mzinga, Oracle, R, Salesforce, SAS, SAP (R on HANA) and Teradata (Aprimo). Data Virtualization (and data federation) is currently led by Composite, Denodo, HP (IDOL), IBM, Informatica, Microsoft, Oracle (Exalytics), SAP and Teiid (JBoss community).c BI Tools Vendors:ActuateDassaultSystèmes (Exalead)DomoEsriGoodDataGoogleHP (Autonomy)IBM (Cognos suite)Information BuildersLogiXML (LogiAnalytics)Microsoft (SQL 2012)MicrostrategyNeutrinoBIOracle (OBI Foundation)PanopticonPanoramaPentahoQlikviewRoambiSAP (BI4 suite)SASSpagoBITableauTIBCO Spotfire.Action Phase - Data Acquisition providers plus the ERP, CRM and BPM actorsAdobeEloquaEMC²IBMiGrafxMicrosoftOpenTextOraclePegaProgress softwareSAPSalesforceSoftware AGTeradata (Aprimo) Tibco.Data Governance area - Master Data Management (MDM), metadata and data quality toolsAdaptiveHPIBMInformaticaKalidoMicrosoftOracleOrchestra NetworksSAPSASTalendTibco. Note that the Complex Event Processing (CEP) Tools are part of Acquisition (streaming data acquisition), Marshalling (eg in-memory storage as data is used or compared immediately) and Analytics (eg Monitoring functions to detect abnormal activity) streams.Note that the BI Tools are part of Analytics (Computing Key Performance Indicators) and Action (eg Creating Alerts in a push mode by mail for instance) streams.
  3. Citrisleaf = AerospikeCouchbase – roots are in Northscale – Membase .. CouchDB; two focus audiences – Enterprise & funnel
  4. Analytics Infrastrucure = MPP – Distributed open-source, Apache-licensed distribution of Apache Hadoop ... Open source, Massively Parallel Processing (MPP) query engineInfrastucure ad a Service = Cloud IaaSOperational Infrastructure = Structure of Data – ex JSAN; ad-hoc queries; unstructured data; behaviorial, redundencyNot Listed – Hardware / Storage – NetApp, EMC, HP
  5. Per Forbes (per Wikibon), Big Data is an $18 billion industry heading to $50 billion in five years.  The companies in the inner-circle (ex: MapR, Cloudera, Splunk, Couchbase etc) are pure-plays within Big Data.  A theory is these inner-circle players will probably get gobbled up by the big boys on the outside, who are just starting to play in the Big Data space (like SAP, Microsoft, Oracle, IBM…) In the meantime, the relative sizes of the circles reflects the relative size of the companies, in terms of revenue.  The percentages reflect the % of their current business that is ‘big data’
  6. 5/18/13 w/ Paul HofmannPalantir – just text; just Homeland SecurityOracle Endica – addedHP Autonomy AddedAttivio (partner with TIBCO added)Saffron – Semantec and .. (Risk predictive) added0xData – changed logoMuSigma -= Consultant onlyRecorded Future -= Timeline; Opera = Text-only?; No predictive Analytics?Kxen – nice companySAS – Dead? Not scalable; Skytree = a platform / toolbox.. You need to have yoru own Data Quant to create yuur own analytics Sociocast – Saffron PartnerDigital Reasoning – Strong with Dept of Defense too
  7. NoSQL databases currently available include:Hbase (Apache)Cassandra (DataStax)MarkLogic (MarkLogic)Aerospike (CitrixDB)MongoDB (10gen)Accumulo (Apache)Riak (Basho)CouchDB (CouchBase)DynamoDB (Amazon)Sqrrl (?)VoltDB (?)http://thinkbiganalytics.com/leading_big_data_technologies/nosql/NoSQLNoSQL is an umbrella term for a broad class of database management systems that relax some of the tradition design constraints of relational database management systems (RDBMS) in order to meet goals of more cost-effective scalability, flexible tradeoffs of availability vs. consistency (as described by the CAP theorem), and flexibility for data structures that don’t fit well into the relational model, such as key-value data and large graphs. NoSQL databases typically don’t offer ACID transactions nor full SQL dialects.The NoSQL ecosystem is very large. Among the better known databases are HBase, Cassandra, Aerospike, DynamoDB, MongoDB, Riak, Redis, Accumulo, Datatomic, and Couchbase. Of these, HBase and Accumulo are more closely tied to Hadoop than the others, as both use HDFS, by default, for persistent storage and Zookeeper for service federation.NoSQL databases expose different information models, including key-value records, JSON or XML documents as records, or graph-oriented data. They expose corresponding programmer APIs and sometimes custom query languages that may or may not be SQL-based. However, a recent trend in this industry is the re-introduction of restricted SQL dialects to support the large user community accustomed to SQL and improving support for transactions.As an example of a scenario where a NoSQL database is a good fit, an event log for a web site might be captured in a key-value store, where fast appends and key-based retrievals are required, but not updates nor joins.HBaseHBase is a distributed, column-oriented database, where each cell is versioned (a configurable number of previous values is retained). HBase provides Bigtable-like capabilities on top of Hadoop. SQL queries (but not updates) are supported using Hive, but with high latency. Eventually, Impala will also support Hive queries with lower latency. Like many NoSQL databases, HBase does not support complex transactions, SQL, or ACID transactions. However, HBase offers high read and write performance and is used in several large applications, such as Facebook’s Messaging Platform. By default, HBase uses HDFS for durable storage, but it layers on top of this storage fast record-level queries and updates, which “raw” HDFS doesn’t support. Hence, HBase is useful when fast, record-level queries and updates are required, but storage in HDFS is desired for use with Pig, Hive, or other MapReduce-based tools.Cassandra Cassandra is the most popular NoSQL database for very large data sets. It is a key-value, clustered database that uses column-oriented storage, sharding by key ranges, and redundant storage for scalability in both data sizes and read/write performance, as well as resiliency against “hot” nodes and node failures. Cassandra has configurable consistency vs. availability (CAP theorem) tradeoffs, such as a tunable quorum model for writes.MongoDB MongoDB is a document-oriented NoSQL database where each record is a JSON document. It has a rich, Javascript-based query language that exploits the implicit structure of JSON. MongoDB supports sharding for improved scalability and resilience. It is most popular for small to large data sets and less commonly used for very large data sets.DynamoDBDynamoDB is Amazon’s highly scalable and available, key-value, NoSQL database. DynamoDB was one of the earliest NoSQL databases and papers written about it influenced the design of many other NoSQL databases, such as Cassandra.CouchbaseCouchbase is a key-value NoSQL database that is well-suited for mobile applications where a copy of a data set is resident on many devices, where changes can be performed on any copy, and copies are synchronized when connectivity is available. Think of how an email client works with local copies of your email history and corresponding email servers. RedisRedis is a key-value store with the specific support for fundamental data structures as values, including strings, hash maps, lists, sets, and sorted sets, whereas most key-value stores have limited understanding of a value’s meaning, except to represent the value as column cells, if many cases. For this reason, Redis is sometimes called a data structure server. Redis keeps all data in memory, which improves performance, but limits the data set sizes it can manage. Durability is optional, by periodic flushing to disk or writing updates to an append log. Master slave replication is also supported. Datomic Datomic is a newer entrant in the NoSQL landscape with a unique data model that remembers the state of the database at all points in the past, making historical reconstruction of events and state trivial. Many standard database operations are supported, including joins and ACID transactions. Deployments are distributed, elastic, highly available. RiakRiak is a fault-tolerant, distributed, key-value NoSQL database designed for large-scale deployments in cloud or hosted environments. A Riak database is masterless, with no single points of failure. It is resilient against the failure of multiple nodes and nodes can be added or removed easily. Riak is also optimized for read and write-intensive applications.