Big Data has existed for a while and is far more than just massive volumes of data from novel sources. As customers note, it is about how you manage, analyze, use, and combine the data. SAP makes Big Data real. Cover what Big Data is not (e.g., SoH or BWoH alone) and what would make them Big Data (e.g., sales forecasting on SoH/BWoH, or mashing up corporate data with social media data or with call center notes).
What I say: HANA is tremendously important to SAP's vision. We are no longer just the "ERP company", as you may think. Following the acquisitions of Business Objects, Sybase, and recently SuccessFactors, SAP now leads several important enterprise business categories. Furthermore, by investing in in-house innovation, we have assembled a vertically integrated business data management stack, all the way from data management appliances to applications to on-demand application services, providing increased customer value. And HANA is at the heart of this strategy!
SAP big data platform: open strategy. We are planning for SAP to formally resell a HANA + Hadoop bundle on SAP's price list. Hadoop distribution vendors (Hortonworks, Intel, MapR, Cloudera) are in every account, and we work with them all. Strategic announcement coming.
SAP offers the applications and analytic tools that help you infuse Big Data insights directly into your business processes, equipping your employees, partners, and customers with access to data so they can uncover and monetize insights.
Big Data is a big opportunity for most companies and is something they must embrace to remain competitive.
Hadoop runs on the Hadoop Distributed File System (HDFS), a distributed file system that scales out on commodity servers. Since Hadoop is file-based, developers don't need to create a data model to store or process data, which makes Hadoop ideal for managing semi-structured Web data, which comes in many shapes and sizes. Because it is "schema-less," Hadoop can be used to store and process any kind of data, including structured transactional data and unstructured audio and video data. The biggest advantage of Hadoop, however, is that it is open source, which means that the up-front costs of implementing a system to process large volumes of data are lower than for commercial systems. On the other hand, Hadoop does require companies to purchase and manage dozens, if not hundreds, of servers and to train developers and administrators on this new technology.

Apache Hadoop enables applications to work with thousands of independent computers (nodes), which are collectively referred to as a cluster (if all nodes use the same hardware) or a grid (if the nodes use different hardware). The main components used in Hadoop to run a job include:
- Client: submits the MapReduce job.
- Jobtracker: coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
- Tasktrackers: run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.

Hadoop Distributed File System (HDFS):
- HDFS is a file system that sits on top of the native file system.
- Different blocks of a file are stored on different nodes.
- The name node keeps track of which blocks make up a file and where those blocks are located.
- Automatic rebalancing and replication.
- Uniform namespace.

MapReduce: Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
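Before turning to how MapReduce jobs run, the HDFS block model above can be sketched in a few lines. This is a toy simulation, not the real HDFS API: the block size, node names, and round-robin placement are illustrative assumptions standing in for HDFS's much larger blocks and rack-aware placement policy.

```python
# Toy model of HDFS block placement (illustrative only, not the real API):
# a file is split into fixed-size blocks, each block is replicated on
# several data nodes, and the name node keeps the block-to-node map.
BLOCK_SIZE = 4           # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 3          # HDFS's default replication factor
NODES = ["node1", "node2", "node3", "node4"]

def place_file(name, data, namenode):
    """Split data into blocks and record replica locations on the name node."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, _block in enumerate(blocks):
        # Round-robin placement stands in for HDFS's rack-aware policy;
        # the point is that replicas of one block land on different nodes.
        replicas = [NODES[(idx + r) % len(NODES)] for r in range(REPLICATION)]
        namenode[(name, idx)] = replicas

namenode = {}
place_file("web.log", b"0123456789AB", namenode)
# The 12-byte file becomes three blocks, each replicated on three nodes,
# and only the name node knows which blocks make up the file and where they live.
```

If a data node fails, the name node can re-replicate the affected blocks from the surviving copies, which is what "auto replication" and "auto rebalancing" refer to above.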
Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).
- Map: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
- Reduce: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.

MapReduce allows for distributed processing of the map and reduce operations. Since each mapping operation is independent of the others, all maps can be performed in parallel, though in practice parallelism is limited by the number of independent data sources and/or the number of CPUs near each source. Similarly, a set of "reducers" can perform the reduce phase in parallel, provided all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can appear inefficient compared to more sequential algorithms, MapReduce can be applied to far larger datasets than a single commodity server can handle: a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled, assuming the input data is still available.
SAP HANA:
- In-memory platform
- Store billions of records
- Analyze in real time
- Built-in predictive, text, and spatial algorithms

Hadoop:
- Distributed disk platform
- Store virtually unlimited amounts of unstructured data
- Search in batch
- Non-relational data store
- Specialized skills to implement and code
- Many add-on libraries and packages