Productionizing Hadoop: 7 Architectural Best Practices
Mike Gualtieri, Principal Analyst
#BigData
© 2013 Forrester Research, Inc. Reproduction Prohibited
“What best describes your firm's current usage/plans to adopt Big Data technologies and solutions?”
[Chart. Values as extracted: 7%, 13%, 7%, 17%, 31%. Legend: Implemented, not expanding; Expanding/upgrading implementation; Planning to implement in the next 12 months; Planning to implement in more than 1 year; Interested but no plans]
Base: 634 business intelligence users and planners
Source: Forrsights BI/Big Data Survey, Q3 2012
Big Data has momentum
20% have implemented some big data technology
37% are planning some big data technology project
Forrester definition:
“Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all of the data it needs to operate, make decisions, reduce risks, and serve customers.”
“What are the main business and technical requirements or inadequacies of earlier-generation BI technologies that lead you to consider new BI techniques and technologies?”
2% Other
3% Don't know
21% Earlier generation technology is too expensive
22% The velocity of data is too high for earlier technologies
28% We can achieve (or are achieving) significant cost reductions by changing our data management and analytic architecture
32% Data changes or becomes available much faster than we can process in support of business decisions
32% The number of data formats that we must be able to deal with exceeds our ability to cost-effectively integrate
36% Analysis requirements change too fast to keep up with
36% We want to access data that was not accessible for us with existing technologies
38% Data volumes have grown beyond what we can cost effectively manage
41% We don't know what our entire data universe contains, we need new ways to explore data and discover patterns and…
Firms seek more value in data, struggle to
wrangle it, & seek lower cost solutions
Integrating data from a variety of data
sources is a top challenge
Big Data architecture must support three core capabilities (SPA):
• Store: Can you capture and store all your data++?
• Process: Do you have the compute power to cleanse, enrich, & analyze your data++?
• Access: Can you retrieve, search, integrate, and visualize all your data++?
#Production
How can you keep your Big Data
operations running smoothly?
Production
Productionizing Big Data can be complex
because of:
Integration with heterogeneous infrastructure
Use of multiple analytical software applications
Reliance on 3rd-party cloud services
Always available modeling and visualization sandboxes
Increasing volume, velocity, variety of data from multiple
data sources
Compute intensive analytics
Big Data production requires sound architecture.
Production
The 7 architectural qualities of Big Data production platforms
Quality: What it means
1. Experience: Users’ perceptions of the usefulness, usability, and desirability of the application.
2. Availability: The readiness of the service or application to perform its functions when needed.
3. Performance: The speed to perform functions to meet business and user expectations.
4. Scalability: Handle increasing volumes of data, transactions, services, and applications.
5. Adaptability: The ease with which an application or service can be changed or extended.
6. Security: Supports the security properties of confidentiality, integrity, authentication, authorization, and nonrepudiation.
7. Economy: Minimize cost to build, operate, & change an application or service without compromising its business value.
Operational experience is critical to production.
1. Experience
Best practices: User experience
Usefulness, Usability, Desirability of applications
require ease of use with power
Developers:
• Standard Tools
• Linux Commands
• Direct Access with NFS
Administrators:
• Visibility
• Self Healing
• Architectural Simplicity
Easy Workflow Management
Workload Automation with Cisco Tidal Enterprise Scheduler
• Detailed, dependency-driven event execution
• Point-and-click dynamic variables and parameters
• Scalable, extensible architecture
• Granular notification and alerts
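Dependency-driven execution can be illustrated with a minimal sketch. It is not the Cisco Tidal Enterprise Scheduler API; it only shows the general idea of running jobs after their prerequisites, using Python's standard `graphlib`. The job names (`ingest`, `cleanse`, `enrich`, `report`) and the function `execution_order` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every job runs after all of its prerequisites.

    `dependencies` maps a job name to the set of jobs it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"circular job dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    # Hypothetical pipeline: each job lists the jobs that must finish first.
    jobs = {
        "cleanse": {"ingest"},
        "enrich": {"cleanse"},
        "report": {"enrich", "cleanse"},
    }
    for step, job in enumerate(execution_order(jobs), start=1):
        print(step, job)
```

A production scheduler adds triggers, retries, calendars, and alerting on top of this ordering step.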
High-availability strategy and architecture are
often overlooked in proof-of-concepts.
2. Availability
What does high availability mean?
Uptime %* Downtime per year
99.999% (5 nines) 5.26 minutes
99.99% (4 nines) 52.6 minutes
99.5% 1.83 days
99% (2 nines) 3.65 days
98% 7.30 days
95% 18.25 days
*Uptime calculations assume no scheduled downtime.
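The downtime figures in the table follow from simple arithmetic: downtime is the unavailable fraction multiplied by the length of a year. The sketch below assumes a 365-day year and no scheduled downtime, as the footnote states; the function name `downtime_minutes_per_year` is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_year(uptime_percent: float, days_per_year: float = 365.0) -> float:
    """Return the downtime, in minutes per year, implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * days_per_year * MINUTES_PER_DAY


if __name__ == "__main__":
    for uptime in (99.999, 99.99, 99.5, 99.0, 98.0, 95.0):
        minutes = downtime_minutes_per_year(uptime)
        # Report short outages in minutes and longer ones in days.
        if minutes < MINUTES_PER_DAY:
            print(f"{uptime}%: {minutes:.2f} minutes")
        else:
            print(f"{uptime}%: {minutes / MINUTES_PER_DAY:.2f} days")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from 99.99% to 99.999% is usually the expensive one.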
©MapR Technologies - Confidential
High Availability and Dependability
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• Five nines (99.999%) of uptime
Dependable Storage
• Business continuity with snapshots and mirrors
• Recover to a point in time
• End-to-end checksumming
• Strong consistency
• Data safe
• Mirror across sites to meet Recovery Time Objectives
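End-to-end checksumming means a checksum is computed where data is written and verified where it is read, so corruption anywhere in between is detected. The sketch below shows the general technique only; it is not MapR's implementation, and the choice of SHA-256 and the function names are assumptions.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Compute a hex digest of the payload at the point where it is written."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Recompute the digest at the point of read and compare it to the stored one."""
    return hmac.compare_digest(checksum(data), expected)


if __name__ == "__main__":
    block = b"sensor-reading,42\n"
    stored_digest = checksum(block)      # kept alongside the data
    corrupted = b"sensor-reading,43\n"   # simulated corruption in transit or at rest
    print(verify(block, stored_digest))
    print(verify(corrupted, stored_digest))
```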
Unexpected latencies can emerge from rapid
fluctuations in volume, velocity, & variety of data
and interactions of the larger Big Data
ecosystem.
3. Performance
World Record Performance
New Minute Sort World Record: 1.5 TB in 1 minute on 2103 nodes
Previous Record: 1.4 TB
Benchmark                                          MapR 2.1.1      CDH 4.1.1       MapR Speed Increase
Terasort (1x replication, compression disabled)
  Total                                            13m 35s         26m 6s          1.9x
  Map                                              7m 58s          21m 8s          2.7x
  Reduce                                           13m 32s         23m 37s         1.7x
DFSIO throughput/node
  Read                                             1003 MB/s       656 MB/s        1.5x
  Write                                            924 MB/s        654 MB/s        1.4x
YCSB (50% read, 50% update)
  Throughput                                       36,584.4 op/s   12,500.5 op/s   2.9x
  Runtime                                          3.80 hr         11.11 hr        2.9x
YCSB (95% read, 5% update)
  Throughput                                       24,704.3 op/s   10,776.4 op/s   2.3x
  Runtime                                          0.56 hr         1.29 hr         2.3x
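The speed-increase column can be reproduced from the raw figures: for runtimes the ratio is baseline divided by candidate, and for throughput it is candidate divided by baseline. The sketch below uses values from the table; the helper names `to_seconds` and `speedup` are illustrative.

```python
def to_seconds(duration: str) -> int:
    """Convert a duration such as '13m 35s' into seconds."""
    minutes, seconds = duration.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def speedup(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Return how many times better the candidate is than the baseline."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("measurements must be positive")
    return candidate / baseline if higher_is_better else baseline / candidate


if __name__ == "__main__":
    # Terasort total runtime: lower is better.
    terasort = speedup(to_seconds("26m 6s"), to_seconds("13m 35s"), higher_is_better=False)
    # DFSIO read throughput per node in MB/s: higher is better.
    dfsio_read = speedup(656, 1003, higher_is_better=True)
    print(f"Terasort total: {terasort:.1f}x")
    print(f"DFSIO read: {dfsio_read:.1f}x")
```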
Scalability is as much about scaling up as it is
about scaling down.
4. Scalability
MapR’s Relative Scale
Testing completed on 10 node cluster, 2x Quad-Core, 24G DRAM, 12 x 1TB SATA Drives @ 7200 rpm
[Charts: file creates/s versus files (M), one for the MapR distribution and one for the other distribution]
Scale Advantage: 4600x
Firms have barely scratched the surface of what
is possible with Big Data analytics. Change is
always in the wind.
5. Adaptability
“I am a data scientist.”
“I am a data scientist.”
“I am a data scientist.”
Data scientists will constantly have new requirements
…to accelerate the pace of discovery
Compress…
Production must address and help compress the full Big Data analytics life cycle
Direct Integration with Existing Applications
• 100% POSIX compliant
• Industry standard APIs: NFS, ODBC, LDAP, REST
• More 3rd party solutions
• Proprietary connectors unnecessary
• Language neutral
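One consequence of POSIX compliance and NFS access is that existing applications can use ordinary file I/O with no special connector. The sketch below illustrates that idea with Python's standard library; the function `append_event` is hypothetical, and a temporary directory stands in for a cluster mount point, which in practice would be whatever path the NFS export is mounted at.

```python
import tempfile
from pathlib import Path


def append_event(mount_point: Path, name: str, line: str) -> int:
    """Append a line to a file under mount_point and return the file's line count.

    Only standard file operations are used, so the same code works on a
    local disk or on any POSIX-compliant network mount.
    """
    target = mount_point / name
    with target.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    with target.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the mounted cluster path.
    with tempfile.TemporaryDirectory() as stand_in_mount:
        mount = Path(stand_in_mount)
        print(append_event(mount, "events.log", "user=alice action=login"))
        print(append_event(mount, "events.log", "user=bob action=login"))
```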
A breach can devastate an organization's
reputation with customers or have legal
repercussions.
6. Security
All, some, or none of these 6 security properties may apply to Big Data
• Confidentiality: Information is available only to the people intended to use it or see it
• Integrity: Information is only changed in appropriate ways by people authorized to change it
• Readiness: Applications are available when needed and perform acceptably
• Authentication: A person’s identity is determined before access is granted if anonymous people are not allowed
• Authorization: People are allowed or denied access to applications or application resources
• Nonrepudiation: A person cannot perform an action and then later deny performing that action
Securing Big Data
Corporate Security Requirements
• Authentication
  - Wire-level security
• Authorization (Access Control)
  - Standard: UID, GID based
  - Granular: File, Table, Column Family, Column, Cell
• Integration into Existing Environments
  - Kerberos or non-Kerberos
  - Use existing Directory for credential lookups
• Seamless Access with Single Sign-On
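Standard UID/GID-based authorization follows the familiar POSIX permission model: the owner, group, or other permission bits apply depending on who is asking. The sketch below shows that check for read access only. It is a simplification that ignores the superuser and any finer-grained controls (table, column, cell), and the function name `can_read` is illustrative.

```python
import stat


def can_read(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set[int]) -> bool:
    """Decide read access from POSIX mode bits, checking owner, then group, then other.

    The first matching class decides the outcome, so an owner without the
    owner-read bit is denied even if the group or other bits would allow it.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    mode = 0o640  # owner: read/write, group: read, other: none
    print(can_read(mode, owner_uid=1000, owner_gid=200, uid=1000, gids={200}))  # owner
    print(can_read(mode, owner_uid=1000, owner_gid=200, uid=1001, gids={200}))  # group member
    print(can_read(mode, owner_uid=1000, owner_gid=200, uid=1002, gids={300}))  # everyone else
```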
Every architectural decision has an impact on
the return on investment for Big Data analytics
platforms.
7. Economy
Beware of pilot programs that don’t scale economically
[Chart: business value of big data versus investment, comparing people-intensive platforms and technology-intensive platforms, with the production sweet spot marked]
Maximizing Economic Value
• Analytics: Ability to perform broader and deeper analytics
  - Real-time streaming
  - Mission critical SLAs
  - Cloud based analysis
• Ease of Development
• Ease of Administration
• Value of Uptime
• Value of Data Protection
• Hardware Efficiency
• First Class Support
One Platform for Big Data
[Diagram]
Workloads (Batch, Interactive, Real-time): MapReduce, File-Based Applications, SQL, Database, Search, Stream Processing, …
Platform capabilities: 99.999% HA, Data Protection, Disaster Recovery, Scalability & Performance, Enterprise Integration, Multi-tenancy
The 7 qualities of Big Data production platforms
Quality: What it means
1. Experience: Users’ perceptions of the usefulness, usability, and desirability of the application.
2. Availability: The readiness of the service or application to perform its functions when needed.
3. Performance: The speed to perform functions to meet business and user expectations.
4. Scalability: Handle increasing or decreasing volumes of transactions, services, and data.
5. Adaptability: The ease with which an application or service can be changed or extended.
6. Security: Supports the security properties of confidentiality, integrity, authentication, authorization, and nonrepudiation.
7. Economy: Minimize cost to build, operate, & change an application or service without compromising its business value.
Big Data is about innovation, but not if you
don’t productionize it.
Collectors
• Capture
• Store
Journalists
• Reports
• Dashboards
Innovators
• Predictive analytics
Operations, Business Intelligence, Predictive Power
Frontier
Big data is about pushing limits. Exponential
growth in data means the frontier is vast.
Thank you
Mike Gualtieri
mgualtieri@forrester.com
Twitter: @mgualtieri

Contenu connexe

Tendances

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsDataWorks Summit
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinarCloudera, Inc.
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoophadooparchbook
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetupiwrigley
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And HadoopEdureka!
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 

Tendances (20)

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
A Mayo Clinic Big Data Implementation
A Mayo Clinic Big Data ImplementationA Mayo Clinic Big Data Implementation
A Mayo Clinic Big Data Implementation
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the expertsHadoop in the cloud – The what, why and how from the experts
Hadoop in the cloud – The what, why and how from the experts
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 

En vedette

HDFS HA : Stockage à haute disponibilité par Damien Hardy
HDFS HA : Stockage à haute disponibilité par Damien HardyHDFS HA : Stockage à haute disponibilité par Damien Hardy
HDFS HA : Stockage à haute disponibilité par Damien HardyOlivier DASINI
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with ExamplesJoe McTee
 
Java collections-interview-questions
Java collections-interview-questionsJava collections-interview-questions
Java collections-interview-questionsyearninginjava
 
Paris stormusergroup intrudocution
Paris stormusergroup intrudocutionParis stormusergroup intrudocution
Paris stormusergroup intrudocutionParis_Storm_UG
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Best Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerBest Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerDatameer
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSAmazon Web Services
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

En vedette (12)

Java Collections
Java CollectionsJava Collections
Java Collections
 
HDFS HA : Stockage à haute disponibilité par Damien Hardy
HDFS HA : Stockage à haute disponibilité par Damien HardyHDFS HA : Stockage à haute disponibilité par Damien Hardy
HDFS HA : Stockage à haute disponibilité par Damien Hardy
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with Examples
 
Java collections-interview-questions
Java collections-interview-questionsJava collections-interview-questions
Java collections-interview-questions
 
Paris stormusergroup intrudocution
Paris stormusergroup intrudocutionParis stormusergroup intrudocution
Paris stormusergroup intrudocution
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Best Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerBest Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by Datameer
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similaire à Productionizing Hadoop: 7 Architectural Best Practices

Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...Denodo
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 IBM Sverige
 
Introduction of big data and analytics
Introduction of big data and analyticsIntroduction of big data and analytics
Introduction of big data and analyticsSanjeev Solanki
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Benefits of Operating an On-Premises Infrastructure
Benefits of Operating an On-Premises InfrastructureBenefits of Operating an On-Premises Infrastructure
Benefits of Operating an On-Premises InfrastructureRebekah Rodriguez
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM Events
 
The New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need ThemThe New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need ThemPrecisely
 
Slides-Discover-Power-of-Live-Data(2).pdf
Slides-Discover-Power-of-Live-Data(2).pdfSlides-Discover-Power-of-Live-Data(2).pdf
Slides-Discover-Power-of-Live-Data(2).pdfbutthead7
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Essential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big DataEssential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big DataSociety of Petroleum Engineers
 
Sphere 3D presentation for Credit Suisse technology conference 2014
Sphere 3D presentation for Credit Suisse technology conference 2014Sphere 3D presentation for Credit Suisse technology conference 2014
Sphere 3D presentation for Credit Suisse technology conference 2014Peter Bookman
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapNeo4j
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014MapR Technologies
 
Capgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini
 

Similaire à Productionizing Hadoop: 7 Architectural Best Practices (20)

Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
Introduction of big data and analytics
Introduction of big data and analyticsIntroduction of big data and analytics
Introduction of big data and analytics
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Benefits of Operating an On-Premises Infrastructure
Benefits of Operating an On-Premises InfrastructureBenefits of Operating an On-Premises Infrastructure
Benefits of Operating an On-Premises Infrastructure
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & WieckIBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
IBM InterConnect 2013 Expert Integrated Systems Keynote: Sotiropoulos & Wieck
 
The New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need ThemThe New Trillium DQ: Big Data Insights When and Where You Need Them
The New Trillium DQ: Big Data Insights When and Where You Need Them
 
Slides-Discover-Power-of-Live-Data(2).pdf
Slides-Discover-Power-of-Live-Data(2).pdfSlides-Discover-Power-of-Live-Data(2).pdf
Slides-Discover-Power-of-Live-Data(2).pdf
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Essential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big DataEssential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big Data
 
Sphere 3D presentation for Credit Suisse technology conference 2014
Sphere 3D presentation for Credit Suisse technology conference 2014Sphere 3D presentation for Credit Suisse technology conference 2014
Sphere 3D presentation for Credit Suisse technology conference 2014
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
 
Capgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with Cloudera
 

Productionizing Hadoop: 7 Architectural Best Practices

  • 1. Productionizing Hadoop: 7 Architectural Best Practices Mike Gualtieri, Principal Analyst
  • 3. © 2013 Forrester Research, Inc. Reproduction Prohibited. Big Data has momentum. Survey question: “What best describes your firm's current usage/plans to adopt Big Data technologies and solutions?”
      • Chart values, in extracted order: 7%, 13%, 7%, 17%, 31%
      • Legend: Implemented, not expanding; Expanding/upgrading implementation; Planning to implement in the next 12 months; Planning to implement in more than 1 year; Interested but no plans
      • Callouts: 20% have implemented some big data technology; 37% are planning some big data technology project
      • Base: 634 business intelligence users and planners. Source: Forrsights BI/Big Data Survey, Q3 2012
  • 4. Forrester definition: “Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all of the data it needs to operate, make decisions, reduce risks, and serve customers.”
  • 5. Firms seek more value in data, struggle to wrangle it, & seek lower cost solutions. Survey question: “What are the main business and technical requirements or inadequacies of earlier-generation BI technologies that lead you to consider new BI techniques and technologies?” Responses, with values and labels paired in extracted order:
      • 2%: Other
      • 3%: Don't know
      • 21%: Earlier generation technology is too expensive
      • 22%: The velocity of data is too high for earlier technologies
      • 28%: We can achieve (or are achieving) significant cost reductions by changing our data management and analytic architecture
      • 32%: Data changes or becomes available much faster than we can process in support of business decisions
      • 32%: The number of data formats that we must be able to deal with exceeds our ability to cost-effectively integrate
      • 36%: Analysis requirements change too fast to keep up with
      • 36%: We want to access data that was not accessible for us with existing technologies
      • 38%: Data volumes have grown beyond what we can cost effectively manage
      • 41%: We don't know what our entire data universe contains, we need new ways to explore data and discover patterns and…
  • 6. Integrating data from a variety of data sources is a top challenge
  • 7. Big Data architecture must support three core capabilities (SPA):
      • Store: Can you capture and store all your data++?
      • Process: Do you have the compute power to cleanse, enrich, & analyze your data++?
      • Access: Can you retrieve, search, integrate, and visualize all your data++?
  • 10. Production: How can you keep your Big Data operations running smoothly?
  • 11. Productionizing Big Data can be complex because of:
      • Integration with heterogeneous infrastructure
      • Use of multiple analytical software applications
      • Reliance on 3rd-party cloud services
      • Always available modeling and visualization sandboxes
      • Increasing volume, velocity, variety of data from multiple data sources
      • Compute intensive analytics
  • 12. Production: Big Data production requires sound architecture.
  • 13. The 7 architectural qualities of Big Data production platforms (quality: what it means)
      • 1 Experience: Users’ perceptions of the usefulness, usability, and desirability of the application.
      • 2 Availability: The readiness of the service or application to perform its functions when needed
      • 3 Performance: The speed to perform functions to meet business and user expectations
      • 4 Scalability: Handle increasing volumes of data, transactions, services, and applications.
      • 5 Adaptability: The ease with which an application or service can be changed or extended
      • 6 Security: Supports the security properties of confidentiality, integrity, authentication, authorization, and nonrepudiation
      • 7 Economy: Minimize cost to build, operate, & change an application or service without compromising its business value
  • 14. Experience (quality 1): Operational experience is critical to production.
  • 15. Best practices: User experience. Usefulness, Usability, Desirability of applications require ease of use with power
      • Developers: Standard Tools; Linux Commands; Direct Access with NFS
      • Administrators: Visibility; Self Healing; Architectural Simplicity
  • 16. Easy Workflow Management: Workload Automation with Cisco Tidal Enterprise Scheduler
      • Detailed, dependency-driven event execution
      • Point-and-click dynamic variables and parameters
      • Scalable, extensible architecture
      • Granular notification and alerts
  • 17. Availability (quality 2): High-availability strategy and architecture are often overlooked in proof-of-concepts.
  • 18. What does high availability mean? Uptime %* and downtime per year:
      • 99.999% (5 nines): 5.26 minutes
      • 99.99% (4 nines): 52.6 minutes
      • 99.5%: 1.83 days
      • 99% (2 nines): 3.65 days
      • 98%: 7.30 days
      • 95%: 18.25 days
      *Uptime calculations assume no scheduled downtime.
  • 19. © MapR Technologies - Confidential. High Availability and Dependability
      • Reliable Compute: Automated stateful failover; Automated re-replication; Self-healing from HW and SW failures; Load balancing; Rolling upgrades; No lost jobs or data; Five nines (99.999%) of uptime
      • Dependable Storage: Business continuity with snapshots and mirrors; Recover to a point in time; End-to-end check summing; Strong consistency; Data safe; Mirror across sites to meet Recovery Time Objectives
  • 20. Performance (quality 3): Unexpected latencies can emerge from rapid fluctuations in volume, velocity, & variety of data and interactions of the larger Big Data ecosystem.
  • 21. World Record Performance. New Minute Sort World Record: 1.5 TB in 1 minute on 2103 nodes (previous record: 1.4 TB). Benchmark results (MapR 2.1.1 vs. CDH 4.1.1, MapR speed increase):
      • Terasort (1x replication, compression disabled), Total: 13m 35s vs. 26m 6s, 1.9x
      • Terasort, Map: 7m 58s vs. 21m 8s, 2.7x
      • Terasort, Reduce: 13m 32s vs. 23m 37s, 1.7x
      • DFSIO throughput/node, Read: 1003 MB/s vs. 656 MB/s, 1.5x
      • DFSIO throughput/node, Write: 924 MB/s vs. 654 MB/s, 1.4x
      • YCSB (50% read, 50% update), Throughput: 36,584.4 op/s vs. 12,500.5 op/s, 2.9x
      • YCSB (50% read, 50% update), Runtime: 3.80 hr vs. 11.11 hr, 2.9x
      • YCSB (95% read, 5% update), Throughput: 24,704.3 op/s vs. 10,776.4 op/s, 2.3x
      • YCSB (95% read, 5% update), Runtime: 0.56 hr vs. 1.29 hr, 2.3x
  • 22. Scalability (quality 4): Scalability is as much about scaling up as it is about scaling down.
  • 23. MapR’s Relative Scale. Testing completed on 10 node cluster, 2x Quad-Core, 24G DRAM, 12 x 1TB SATA Drives @ 7200 rpm. [Charts: file creates/s versus files (M), for the MapR distribution and another distribution.] Scale Advantage: 4600x
  • 24. Adaptability (quality 5): Firms have barely scratched the surface of what is possible with Big Data analytics. Change is always in the wind.
  • 25. I am a data scientist. I am a data scientist. I am a data scientist. Data scientists will constantly have new requirements
  • 26. Production must address and help compress the full Big Data analytics life cycle. Compress… to accelerate the pace of discovery
  • 27. Direct Integration with Existing Applications
      • 100% POSIX compliant
      • Industry standard APIs: NFS, ODBC, LDAP, REST
      • More 3rd party solutions
      • Proprietary connectors unnecessary
      • Language neutral
  • 28. Security (quality 6): A breach can devastate an organization's reputation with customers or have legal repercussions.
  • 29. All, some, or none of these 6 security properties may apply to Big Data
      • Confidentiality: Information is available only to the people intended to use it or see it
      • Integrity: Information is only changed in appropriate ways by people authorized to change it
      • Readiness: Applications are available when needed and perform acceptably
      • Authentication: A person’s identity is determined before access is granted if anonymous people are not allowed
      • Authorization: People are allowed or denied access to applications or application resources
      • Nonrepudiation: A person cannot perform an action and then later deny performing that action
  • 30. Securing Big Data: Corporate Security Requirements
      • Authentication; wire-level security
      • Authorization (Access Control). Standard: UID, GID based. Granular: File, Table, Column Family, Column, Cell
      • Integration into Existing Environments: Kerberos or non-Kerberos; use existing Directory for credential lookups
      • Seamless Access with Single Sign-On
  • 31. Economy (quality 7): Every architectural decision has an impact on the return on investment for Big Data analytics platforms.
  • 32. Production Sweet Spot: Beware of pilot programs that don’t scale economically. [Chart: business value of big data versus investment, with regions for people-intensive platforms and technology-intensive platforms.]
  • 33. Maximizing Economic Value
      • Analytics: Ability to perform broader and deeper analytics; Real-time streaming; Mission critical SLAs; Cloud based analysis
      • Ease of Development
      • Ease of Administration
      • Value of Uptime
      • Value of Data Protection
      • Hardware Efficiency
      • First Class Support
  • 34. One Platform for Big Data …
      • Platform services: 99.999% HA; Data Protection; Disaster Recovery; Scalability & Performance; Enterprise Integration; Multi-tenancy
      • Workloads: Map Reduce; File-Based Applications; SQL; Database; Search; Stream Processing
      • Modes: Batch; Interactive; Real-time
  • 35. The 7 qualities of Big Data production platforms (quality: what it means)
      • 1 Experience: Users’ perceptions of the usefulness, usability, and desirability of the application.
      • 2 Availability: The readiness of the service or application to perform its functions when needed
      • 3 Performance: The speed to perform functions to meet business and user expectations
      • 4 Scalability: Handle increasing or decreasing volumes of transactions, services, and data
      • 5 Adaptability: The ease with which an application or service can be changed or extended
      • 6 Security: Supports the security properties of confidentiality, integrity, authentication, authorization, and nonrepudiation
      • 7 Economy: Minimize cost to build, operate, & change an application or service without compromising its business value
  • 36. Big Data is about innovation, but not if you don’t productionize it.
      • Collectors (Operations): Capture; Store
      • Journalists (Business Intelligence): Reports; Dashboards
      • Innovators (Predictive Power): Predictive analytics
  • 37. Frontier: Big data is about pushing limits. Exponential growth in data means the frontier is vast.
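
The "MapR Speed Increase" column on slide 21 appears to be a plain ratio of the two measurements. The sketch below recomputes a few of those ratios from the figures on the slide, assuming that durations are "lower is better" and throughputs are "higher is better". The function names (`parse_mmss`, `speedup_from_times`, `speedup_from_rates`) are hypothetical helpers written for this illustration and are not part of any benchmark tool.

```python
def parse_mmss(text: str) -> int:
    """Convert a duration such as '13m 35s' into whole seconds."""
    minutes, seconds = text.replace("m", "").replace("s", "").split()
    return int(minutes) * 60 + int(seconds)


def speedup_from_times(faster: str, slower: str) -> float:
    """Speed-up when lower is better: slower duration / faster duration."""
    return parse_mmss(slower) / parse_mmss(faster)


def speedup_from_rates(higher: float, lower: float) -> float:
    """Speed-up when higher is better: higher rate / lower rate."""
    return higher / lower


if __name__ == "__main__":
    # Figures copied from slide 21 (first value MapR 2.1.1, second CDH 4.1.1).
    print("Terasort total:", round(speedup_from_times("13m 35s", "26m 6s"), 1))
    print("Terasort map:  ", round(speedup_from_times("7m 58s", "21m 8s"), 1))
    print("DFSIO read:    ", round(speedup_from_rates(1003, 656), 1))
    print("YCSB 50/50:    ", round(speedup_from_rates(36584.4, 12500.5), 1))
```

Rounding each ratio to one decimal place is an assumption about how the slide was prepared; it happens to reproduce the published column for the rows checked here.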

Editor's notes

  1. Best practices in productionizing Hadoop
  2. Question Q26
  3. Base: 634 Business Intelligence users and planners
  4. Base: 634 Business Intelligence users and planners
  5. Image source: Google (http://www.google.com/)
  6. Image source: Google (http://www.google.com/)
  7. Image source: istockphoto
  8. Image source: istockphoto
  9. 1 year = 525,948.766 minutes. 1 - .9999 = .0001; 1 - .9995 = .0005. 4 nines = .0001 x 525,949 = 52 minutes per year. 5 nines = .00001 x 525,949 = 5 minutes. 99.95% = .0005 x 525,949 = 263 minutes = 4 hours and
  10. With MapR, Hadoop is lights-out data center ready. MapR provides five nines (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end check summing, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
  11. Image source: istockphoto
  12. Image source: istockphoto.com
  13. Image source: istockphoto
  14. Image source (clothing): istockphoto. Image source (To Tell the Truth logo): Wikimedia
  15. MapR enables integration by providing industry-standard interfaces. More 3rd party solutions work with MapR than any other distribution. Proprietary connectors are not needed.
      • NFS: All file-based applications can read and write data. Examples: Linux utilities, file browsers, Informatica UltraMessaging
      • ODBC 3.52: All BI applications can leverage Hive. Examples: Excel, Crystal Reports, Tableau, MicroStrategy
      • Linux PAM: Any authentication provider can be used. Examples: LDAP, Kerberos, 3rd party
  16. Image source: Mike Gualtieri
  17. Image source: istockphoto
  18. A recent Wall Street Journal article cited that “MapR is Cheaper than Free”. We provide a very powerful ROI that encompasses hard-dollar opex and capex savings and provides value across multiple dimensions.
  19. Source: NASA (http://www.nasa.gov/)
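
The availability arithmetic in note 9 and the downtime table on slide 18 can be reproduced with a short sketch. It assumes downtime is simply the unavailable fraction multiplied by the length of a year, with no scheduled downtime, as the slide footnote states. The function name and the 365-day default are choices made for this illustration; note 9 itself uses 525,948.766 minutes per year, which can be passed in explicitly.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_365_DAY_YEAR = 365 * MINUTES_PER_DAY  # 525,600; appears to match slide 18
MINUTES_PER_YEAR_NOTE_9 = 525_948.766             # constant quoted in editor's note 9


def downtime_minutes_per_year(uptime_percent: float,
                              minutes_per_year: float = MINUTES_PER_365_DAY_YEAR) -> float:
    """Return the unplanned downtime per year, in minutes, for a given uptime %."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * minutes_per_year


if __name__ == "__main__":
    for uptime in (99.999, 99.99, 99.5, 99.0, 98.0, 95.0):
        minutes = downtime_minutes_per_year(uptime)
        if minutes < MINUTES_PER_DAY:
            print(f"{uptime}% uptime: {minutes:.2f} minutes per year")
        else:
            print(f"{uptime}% uptime: {minutes / MINUTES_PER_DAY:.2f} days per year")
```

Passing `MINUTES_PER_YEAR_NOTE_9` as the second argument reproduces the note's figures instead, for example roughly 263 minutes for 99.95% uptime.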