The document discusses the evolution of data and analytics. It notes that early predictions of future "big data" were inaccurate and that scaling laws are changing radically. The document then summarizes MapR's data platform which enhances Apache Hadoop to provide better performance, reliability, integration and administration compared to other Hadoop distributions. MapR delivers a unified platform for file, analytics and NoSQL workloads with innovations like lockless storage and high throughput.
The different kinds of scaling laws have different shapes, and I think that shape is the key.
The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.
In classical analytics, the cost of doing analytics increases sharply as the amount of data grows.
The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.
This next sequence shows how the net value changes with linear cost models of different slopes.
Notice how the best net value has jumped up significantly.
And as the line approaches horizontal, the highest net value occurs at dramatically larger data scale.
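The trade-off described in this sequence can be sketched numerically. This is only an illustration under assumed curve shapes: the saturating value function, the quadratic "classical" cost, and every constant below are hypothetical choices made to reproduce the qualitative picture, not figures from the slides.

```python
# Illustrative sketch: net value = value - cost under different cost shapes.
# All functional forms and constants are assumptions chosen for illustration.

def value(n, v_max=100.0, n_half=10.0):
    """Saturating value: rises quickly at first, then flattens out."""
    return v_max * n / (n + n_half)

def classical_cost(n, c=0.5):
    """Assumed super-linear cost for classical analytics."""
    return c * n ** 2

def linear_cost(n, slope):
    """Linear cost, as with scale-out techniques such as Hadoop."""
    return slope * n

def best_scale(cost, scales):
    """Return (data scale, net value) at the scale where net value peaks."""
    best = max(scales, key=lambda n: value(n) - cost(n))
    return best, value(best) - cost(best)

# Candidate data scales in arbitrary units: 0.5, 1.0, ..., 2000.0
scales = [i * 0.5 for i in range(1, 4001)]

n, net = best_scale(classical_cost, scales)
print(f"classical cost: optimum at scale {n:.1f}, net value {net:.1f}")

for slope in (1.0, 0.1, 0.01):
    n, net = best_scale(lambda x, s=slope: linear_cost(x, s), scales)
    print(f"linear cost, slope {slope}: optimum at scale {n:.1f}, net value {net:.1f}")
```

With these assumed curves, each reduction in slope moves the optimum to a larger data scale and raises the best achievable net value, which is the pattern the sequence of plots illustrates.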
MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce (EMR) service, hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features such as mirroring between sites and multi-tenancy support that further enhance cloud deployments.
MapR is used today across industries. Ten of the Fortune 100 are using MapR in production. Leading Web 2.0 properties, such as leading digital advertising platforms, are also using MapR. These customers are using MapR in production for a variety of use cases. Examples include one of the largest credit card issuers in the world, which has standardized on MapR for fraud and consumer targeting applications. Other examples include a major health care group, national cyber security, and one of the largest retailers in the world. All of these use cases are served by MapR’s complete distribution for Apache Hadoop.
MapR enables integration by providing industry-standard interfaces. More 3rd party solutions work with MapR than any other distribution. Proprietary connectors are not needed.
NFS: All file-based applications can read and write data. Examples: Linux utilities, file browsers, Informatica UltraMessaging.
ODBC 3.52: All BI applications can leverage Hive. Examples: Excel, Crystal Reports, Tableau, MicroStrategy.
Linux PAM: Any authentication provider can be used. Examples: LDAP, Kerberos, 3rd party.
With MapR, Hadoop is lights-out data center ready. MapR provides five 9’s (99.999%) of availability, including support for rolling upgrades, self-healing and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
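The general idea behind checksum-based detection and self-healing can be sketched in a few lines. This is a toy model and not MapR’s implementation: the `ReplicatedStore` class, its in-memory replicas, and the use of CRC32 are all assumptions made purely for illustration.

```python
import zlib

class ReplicatedStore:
    """Toy block store (hypothetical): each block is kept on several
    in-memory replicas together with a CRC32 checksum of its contents."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]

    def write(self, block_id, data):
        # Compute the checksum once at write time and store it with every copy.
        checksum = zlib.crc32(data)
        for replica in self.replicas:
            replica[block_id] = (bytearray(data), checksum)

    def read(self, block_id):
        good = None
        corrupt = []
        for replica in self.replicas:
            data, checksum = replica[block_id]
            if zlib.crc32(data) == checksum:
                if good is None:
                    good = bytes(data)
            else:
                corrupt.append(replica)  # stored bytes no longer match checksum
        if good is None:
            raise IOError(f"all replicas of block {block_id!r} are corrupt")
        # Self-heal: overwrite each corrupt copy with a verified good one.
        for replica in corrupt:
            replica[block_id] = (bytearray(good), zlib.crc32(good))
        return good

store = ReplicatedStore()
store.write("block-1", b"important data")
store.replicas[0]["block-1"][0][0] ^= 0xFF   # simulate a flipped byte on one drive
print(store.read("block-1"))                 # read detects and repairs the bad copy
```

In this sketch the corruption is noticed on read because the stored bytes no longer match their checksum, and the damaged copy is rewritten from a replica that still verifies.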