4. “Big Data is the frontier of a firm’s
ability to store, process, and access
(SPA) all of the data it needs to
operate, make decisions, reduce
risks, and serve customers.”
DEFINITION
FORRESTER
13. The 7 architectural qualities of Big Data
production platforms
Quality: What it means
1 Experience: Users’ perceptions of the usefulness, usability, and desirability of the application.
2 Availability: The readiness of the service or application to perform its functions when needed
3 Performance: The speed to perform functions to meet business and user expectations
4 Scalability: Handle increasing volumes of data, transactions, services, and applications.
5 Adaptability: The ease with which an application or service can be changed or extended
6 Security: Supports the security properties of confidentiality, integrity, authentication, authorization, and nonrepudiation
7 Economy: Minimize cost to build, operate, & change an application or service without compromising its business value
15. Best practices: User experience
Usefulness, Usability, Desirability of applications
require ease of use with power
Developers
• Standard Tools
• Linux Commands
• Direct Access with NFS
Administrators
• Visibility
• Self Healing
• Architectural Simplicity
18. What does high availability mean?
Uptime %* Downtime per year
99.999% (5 nines) 5.26 minutes
99.99% (4 nines) 52.6 minutes
99.5% 1.83 days
99% (2 nines) 3.65 days
98% 7.30 days
95% 18.25 days
*Uptime calculations assume no scheduled downtime.
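The downtime figures in the table follow directly from the uptime percentage. A minimal sketch of that arithmetic, assuming a 365-day year and no scheduled downtime; the function name `downtime_minutes_per_year` is illustrative, not taken from the slides:

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365  # assumption: non-leap calendar year, no scheduled downtime


def downtime_minutes_per_year(uptime_percent: float,
                              days_per_year: float = DAYS_PER_YEAR) -> float:
    """Return the expected unplanned downtime, in minutes per year."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # Fraction of the year the service is allowed to be unavailable.
    unavailable_fraction = 1 - uptime_percent / 100
    return unavailable_fraction * days_per_year * MINUTES_PER_DAY


if __name__ == "__main__":
    for uptime in (99.999, 99.99, 99.5, 99.0, 98.0, 95.0):
        minutes = downtime_minutes_per_year(uptime)
        days = minutes / MINUTES_PER_DAY
        print(f"{uptime}% uptime: {minutes:.1f} minutes ({days:.2f} days) per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from 99.99% to 99.999% is the difference between most of an hour and a few minutes per year.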
20. Unexpected latencies can emerge from rapid
fluctuations in volume, velocity, & variety of data
and interactions of the larger Big Data
ecosystem.
3. Performance
28. A breach can devastate an organization's
reputation with customers or have legal
repercussions.
6. Security
29. All, some, or none of these 6 security
properties may apply to Big Data
• Confidentiality: Information is available only to the people intended to use it or see it
• Integrity: Information is only changed in appropriate ways by people authorized to change it
• Readiness: Applications are available when needed and perform acceptably
• Authentication: A person’s identity is determined before access is granted if anonymous people are not allowed
• Authorization: People are allowed or denied access to applications or application resources
• Nonrepudiation: A person cannot perform an action and then later deny performing that action
31. Every architectural decision has an impact on
the return on investment for Big Data analytics
platforms.
7. Economy
32. Beware of pilot programs that don’t scale economically
[Chart: business value of big data plotted against investment for people-intensive and technology-intensive platforms, with the production sweet spot marked]
35. The 7 qualities of Big Data production
platforms
Quality: What it means
1 Experience: Users’ perceptions of the usefulness, usability, and desirability of the application.
2 Availability: The readiness of the service or application to perform its functions when needed
3 Performance: The speed to perform functions to meet business and user expectations
4 Scalability: Handle increasing or decreasing volumes of transactions, services, and data
5 Adaptability: The ease with which an application or service can be changed or extended
6 Security: Supports the security properties of confidentiality, integrity, authentication, authorization, and nonrepudiation
7 Economy: Minimize cost to build, operate, & change an application or service without compromising its business value
36. Big Data is about innovation, but not if you
don’t productionize it.
Collectors
• Capture
• Store
Journalists
• Reports
• Dashboards
Innovators
• Predictive analytics
Operations
Business Intelligence
Predictive Power
37. Frontier
Big data is about pushing limits. Exponential
growth in data means the frontier is vast.
Base: 634 Business Intelligence users and planners
Image source: Google (http://www.google.com/)
Image source: istockphoto
1 year = 525,948.766 minutes
1 - .9999 = .0001
1 - .9995 = .0005
4 nines = .0001 x 525949 = 52 minutes per year
5 nines = .00001 x 525949 = 5 minutes per year
99.95% = .0005 x 525949 = 263 minutes per year (just over 4 hours)
With MapR, Hadoop is lights-out data center ready. MapR provides five nines (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. End-to-end checksumming means data corruption is automatically detected and corrected by MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
Image source (clothing): istockphoto
Image source (to tell the truth logo): Wikimedia
MapR enables integration by providing industry-standard interfaces
• More 3rd party solutions work with MapR than any other distribution
• Proprietary connectors not needed
NFS
• All file-based applications can read and write data
• Examples: Linux utilities, file browsers, Informatica UltraMessaging
ODBC 3.52
• All BI applications can leverage Hive
• Examples: Excel, Crystal Reports, Tableau, MicroStrategy
Linux PAM
• Any authentication provider can be used
• Examples: LDAP, Kerberos, 3rd party
Image source: Mike Gualtieri
A recent Wall Street Journal article said that “MapR is Cheaper than Free”. We provide a very powerful ROI that encompasses hard-dollar opex and capex savings and provides value across multiple dimensions.