Page 1 Author – Ramkumar Rajendran
Integration of SAP HANA with Hadoop
Author Biography
Ramkumar Rajendran
Ramkumar Rajendran is a Consultant at a leading firm with 4 years of
experience. He specializes in tools such as SAP HANA, SAP BI, SAP BO
(Xcelsius, Webi and IDT), Tableau, Lumira and Hadoop-Hive. He has worked
on sentiment analysis of Twitter data and has been involved in the
integration of HANA and Hadoop. He has worked on multiple implementation
projects for various industry sectors.
Table of Contents
1 About this document
2 Introduction
SAP HANA
Hadoop
3 Combined Potential of HANA and Hadoop
4 Scenarios of Hadoop and HANA integration
Federated Data Query through Smart Data Access (SDA)
Business Objects Data Services
SQOOP
JAVA Program
5 Summary
6 Reference Material
About this document
This document discusses the combined potential of the in-memory database 'SAP HANA' and
the big data solution 'Hadoop', the various methods of integrating the two technologies,
and the scenarios in which each method is applicable.
SAP HANA specializes in real-time in-memory processing, while Hadoop is suited to
massively parallel processing. Integrating the two technologies combines the advantages
of both.
Hadoop handles both structured and unstructured data from social media, machine logs,
etc. This data can be used alongside the transactional data present in HANA, resulting
in more mature business analysis.
This document has been prepared based upon SAP HANA SP6 and Hadoop CDH 4.5.
Introduction
SAP HANA
SAP HANA is an innovative in-memory database and data management platform,
specifically developed to take full advantage of the capabilities provided by modern
hardware to increase application performance. By keeping all relevant data in main
memory, data processing operations are significantly accelerated.
Design for scalability is a core SAP HANA principle. SAP HANA can be distributed across
multiple hosts to achieve scalability in terms of both data volume and user
concurrency. Unlike clusters, distributed HANA systems also distribute the data efficiently,
achieving high scaling without I/O locks.
The key performance indicators of SAP HANA appeal to many of our customers, and
thousands of deployments are in progress. SAP HANA has become the fastest growing
product in SAP’s 40+ year history.
Hadoop
Hadoop is an open source software project that enables the distributed processing of large
data sets across clusters of commodity servers. It is designed to scale up from a single
server to thousands of machines, with a very high degree of fault tolerance. Rather than
relying on high-end hardware, the resiliency of these clusters comes from the software’s
ability to detect and handle failures at the application layer.
Hadoop is known for its massively parallel processing capabilities on large datasets. It
is also scalable, cost-effective owing to inexpensive commodity processors, flexible and
fault tolerant.
Combined Potential of HANA and Hadoop
Hadoop can store very large amounts of data. It is well suited to storing unstructured
data, is good at manipulating very large files and is tolerant of hardware and software
failures. The main challenge with Hadoop is getting information out of this data in real
time.
HANA is well suited to processing data in real time, thanks to its in-memory technology.
By integrating Hadoop's massively parallel processing with HANA's in-memory computing
capabilities, the resulting solution would be capable of the following:
• Accommodating both structured and unstructured data.
• Providing cost-efficient data storage and processing for large volumes of data.
• Performing complex information processing.
• Enabling heavily recursive algorithms, machine learning and queries that cannot be
easily expressed in SQL.
• Archiving low-value data, which stays available though access is slower.
• Mining raw data that is either schema-less or whose schema changes over time.
Scenarios of Hadoop and HANA integration
[Figure: four integration scenarios. In each, reporting tools sit on top of SAP HANA and
Hadoop is the remote source.]
• Federated Data Query through Smart Data Access (SDA): no data loading.
• Data loading from Hadoop to HANA with Business Objects Data Services (BODS): PULL
mechanism.
• Data loading from Hadoop to HANA with SQOOP: PUSH mechanism.
• Data loading with Java programming: PUSH or PULL mechanism.
Federated Data Query through Smart Data Access (SDA)
SAP HANA smart data access enables remote Hadoop data to be accessed as if it were held
in local tables in SAP HANA, without loading the data into SAP HANA.
Not only does this capability provide operational and cost benefits, but most importantly it
supports the development and deployment of the next generation of analytical applications
which require the ability to access, synthesize and integrate data from multiple systems in
real-time regardless of where the data is located or what systems are generating it.
Specifically, in SAP HANA we can create virtual tables that point to remote tables in
Hadoop. Customers can then write SQL queries in SAP HANA that operate on these virtual
tables. The SAP HANA query processor optimizes these queries, executes the relevant
part of each query in the target database, returns the results to SAP HANA, and
completes the operation.
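The virtual-table setup described above can be sketched as a small Python helper that only
builds the SQL text; it does not connect to any system. The statement shapes follow the
HANA SQL commands for remote sources and virtual tables as commonly documented, but the
adapter name "hiveodbc", the DSN-based configuration, the "HIVE" database part and all
object names (HIVE1, REPORTING, V_TWEETS, tweets) are assumptions for illustration and
should be checked against the HANA revision in use.

```python
ADAPTER = "hiveodbc"  # assumed adapter name; verify for your HANA revision


def ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def literal(value: str) -> str:
    """Quote an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def remote_source_sql(source: str, dsn: str, user: str, password: str) -> str:
    """Statement that registers Hive (reached over ODBC) as a remote source."""
    return (
        f"CREATE REMOTE SOURCE {ident(source)} ADAPTER {ident(ADAPTER)} "
        f"CONFIGURATION {literal('DSN=' + dsn)} "
        f"WITH CREDENTIAL TYPE 'PASSWORD' "
        f"USING {literal('user=' + user + ';password=' + password)}"
    )


def virtual_table_sql(schema: str, table: str, source: str,
                      remote_schema: str, remote_table: str,
                      remote_db: str = "HIVE") -> str:
    """Statement that creates a HANA virtual table pointing at a Hive table."""
    target = ".".join(ident(p) for p in (source, remote_db, remote_schema, remote_table))
    return f"CREATE VIRTUAL TABLE {ident(schema)}.{ident(table)} AT {target}"


def federated_count_sql(schema: str, table: str, group_col: str) -> str:
    """An ordinary SQL query; HANA decides which part runs on the remote side."""
    return (
        f"SELECT {ident(group_col)}, COUNT(*) AS \"N\" "
        f"FROM {ident(schema)}.{ident(table)} GROUP BY {ident(group_col)}"
    )


if __name__ == "__main__":
    # Hypothetical names, used only to show the generated statements.
    print(remote_source_sql("HIVE1", "hive1", "hive", "secret"))
    print(virtual_table_sql("REPORTING", "V_TWEETS", "HIVE1", "default", "tweets"))
    print(federated_count_sql("REPORTING", "V_TWEETS", "lang"))
```

Generating the statements as text keeps the sketch runnable without a HANA instance; in
practice they would be executed once by an administrator in a SQL console.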
Recommended Scenarios
When SDA is used to access Hadoop from HANA, each execution of a report fires a federated
query on Hadoop. This technique is recommended when the reporting query has to process a
large amount of data at Hadoop. Smart Data Access aggregates the dataset at Hadoop using
Hadoop's system resources, so only the end results are transferred from Hadoop to HANA.
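The benefit of aggregating at the source can be illustrated with a toy model. This is not
SDA itself: the rows, the region names and both functions are hypothetical, and the
"transfer" is simply a count of the rows that would have to cross from Hadoop to HANA.

```python
from collections import defaultdict

# Hypothetical detail rows held on the Hadoop side: (region, amount).
REMOTE_ROWS = [
    ("EMEA", 120.0), ("EMEA", 80.0), ("APJ", 50.0),
    ("AMER", 200.0), ("APJ", 25.0), ("AMER", 10.0),
]


def aggregate(rows):
    """Sum the amount per region and return the totals sorted by region."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return sorted(totals.items())


def pull_then_aggregate(remote_rows):
    """Transfer every detail row first, then aggregate on the HANA side."""
    transferred = list(remote_rows)
    return aggregate(transferred), len(transferred)


def aggregate_then_pull(remote_rows):
    """Aggregate at the source, then transfer only the result rows."""
    transferred = aggregate(remote_rows)
    return transferred, len(transferred)


if __name__ == "__main__":
    print(pull_then_aggregate(REMOTE_ROWS))
    print(aggregate_then_pull(REMOTE_ROWS))
```

Both paths return the same totals; the second moves one row per region instead of one row
per detail record, which is the effect the recommendation above relies on.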
Advantages of this technique
• Real-time data access from Hadoop without actually having to load the data into HANA.
• Helps in scenarios where the data residing in Hadoop is updated so frequently that
loading it would make no sense.
• The query can be optimized by pushing the processing down to Hadoop, which then
returns aggregated data.
Disadvantages of this technique
• The federated query slows down when heavy processing needs to be done on the data at
the Hadoop end.
• Data transformation is not possible while using Smart Data Access.
• With this technique the reporting query is also fired on Hadoop, which makes it
critical for Hadoop to be up at all times. With multiple Hadoop systems, this risk
increases further.
• Data can only be extracted from Hive.
• Data access can happen only from Hadoop to HANA.
Business Objects Data Services
SAP Data Services delivers a single enterprise-class solution for data integration, data
quality, data profiling and text data processing. This technique uses a PULL mechanism
to move data from Hadoop to HANA, so the entire process is controlled by BODS.
This wide range of features helps to:
• Integrate, transform, improve, and deliver trusted data from Hadoop to HANA.
• Provide development user interfaces, a metadata repository, a data connectivity
layer, a run-time environment, and a management console, enabling IT organizations
to lower total cost of ownership and accelerate time to value.
• Enable IT organizations to maximize operational efficiency with a single solution to
improve data quality and gain access to heterogeneous sources and applications.
Recommended Scenarios
Integrating HANA with Hadoop using BODS involves loading data on a scheduled basis.
It suits scenarios where there is no requirement for real-time reporting but complex
calculations on large datasets are involved. This technique proves very effective
in scenarios that involve multiple Hadoop systems with a variety of unstructured data to
be processed on a large scale.
Advantages of this technique
• Unstructured data can be loaded from Hadoop to HANA with all the transformation
done during the load.
• It is better suited to loading large datasets.
• BODS can be used to implement complex transformations while loading data from
Hadoop to HANA.
• Performance of HANA can be improved by moving complex calculations to BODS.
• Its error handling helps with support and maintenance.
• A data encryption function for sensitive data is one of the niche aspects of data
loading through BODS.
• Centralized monitoring favors better IT support.
• Delta loads are also supported.
• Data transfer can happen in both directions.
Disadvantages of this technique
• Data present in Hadoop is not available in real time, since BODS loads data
from Hadoop to HANA as a batch job.
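The idea behind a delta load, as opposed to a full reload on every batch run, can be
sketched generically. This is not the BODS API: the Row type, the modified_at change
timestamp and the in-memory target dictionary are assumptions chosen only to show the
watermark pattern that a scheduled job would follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    payload: str
    modified_at: int  # hypothetical change timestamp, e.g. epoch seconds


def delta_load(source_rows, target, last_watermark):
    """Upsert only rows changed after last_watermark.

    Returns the new watermark and the number of rows loaded in this run.
    """
    new_watermark = last_watermark
    loaded = 0
    for row in source_rows:
        if row.modified_at > last_watermark:
            target[row.key] = row.payload  # insert or overwrite by key
            loaded += 1
            new_watermark = max(new_watermark, row.modified_at)
    return new_watermark, loaded


if __name__ == "__main__":
    target = {}
    source = [Row(1, "a", 10), Row(2, "b", 20), Row(3, "c", 30)]
    watermark, count = delta_load(source, target, last_watermark=0)
    print(watermark, count, target)
```

Each scheduled run stores the returned watermark and passes it to the next run, so
unchanged rows are skipped and only new or modified rows are moved.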
SQOOP
SQOOP is a tool designed for efficiently transferring bulk data between Hadoop and
structured data stores like Oracle, MS SQL Server, SAP HANA, etc. SQOOP can be used to import
data from external structured data stores into Hadoop Distributed File System or related
systems like Hive and HBase. Conversely, SQOOP can be used to extract data from Hadoop
and export it to external structured data stores such as relational databases and enterprise
data warehouses.
SQOOP provides a pluggable connector mechanism for optimal connectivity to external
systems. The SQOOP extension API provides a convenient framework for building new
connectors. New connectors can be dropped into SQOOP installations to provide
connectivity to various systems. SQOOP itself comes bundled with various connectors that
can be used for popular database and data warehousing systems.
With SQOOP, data transfer can be automated through batch jobs, and SQOOP uses native
tools for high-performance data transfer. It uses data store metadata to infer
structure definitions. It uses the MapReduce framework of Hadoop to transfer data in
parallel, which pays off for large amounts of data. It also provides an extension
mechanism to incorporate high-performance connectors for external systems.
For exporting data to external targets, SQOOP supports staging tables, which
considerably improve the efficiency of data transfer and also insulate the target from
data corruption if a transfer fails.
This technique uses a PUSH mechanism to load data from Hadoop to HANA, so the entire
process is controlled by SQOOP on the Hadoop side.
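A push from HDFS into HANA with a staging table can be sketched as the command line a
scheduler on the Hadoop side might run. The Python helper below only assembles the
argument list; it does not invoke SQOOP. The flags are standard Sqoop 1 export options,
but whether each one is available depends on the installed release, and the host, port,
schema, table names, HDFS path and password file are all hypothetical.

```python
import shlex


def sqoop_export_args(host, port, user, password_file,
                      table, staging_table, export_dir, mappers=4):
    """Build the argument list for a Sqoop export into SAP HANA over JDBC."""
    return [
        "sqoop", "export",
        "--connect", f"jdbc:sap://{host}:{port}/",   # HANA JDBC URL
        "--driver", "com.sap.db.jdbc.Driver",        # HANA JDBC driver class
        "--username", user,
        "--password-file", password_file,            # older releases may need -P
        "--table", table,
        "--staging-table", staging_table,            # rows land here first
        "--clear-staging-table",                     # empty it before the run
        "--export-dir", export_dir,                  # HDFS directory to export
        "--input-fields-terminated-by", ",",
        "--num-mappers", str(mappers),               # degree of parallelism
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    args = sqoop_export_args(
        "hana.example.com", 30015, "LOADER", "/user/etl/hana.pwd",
        "SALES.TWEETS", "SALES.TWEETS_STG", "/data/tweets/clean",
    )
    print(shlex.join(args))
```

With a staging table, the parallel map tasks write into the staging table and the rows
are moved to the final table only after all tasks succeed, which is what protects the
target from partial loads. The staging table must already exist with the same structure
as the target.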
Recommended Scenarios
SQOOP is a component in the Hadoop ecosystem that helps transfer data from HDFS to
external databases and vice versa. Integrating SAP HANA with Hadoop in this way
involves periodically loading data directly from the underlying Hadoop files into HANA
tables. SQOOP doesn't support any transformation while transferring data. Hence this
technique can be used in scenarios that require no real-time reporting and whose source
data is already formatted and needs no cleansing. It is also best suited to bulk data
transfers, since SQOOP uses the underlying MapReduce framework of Hadoop to transfer
data in parallel.
Advantages of this technique
• It is better suited to loading bulk datasets.
• Data transfer can happen in both directions.
• It is open source and hence cost-effective.
Disadvantages of this technique
• Data present in Hadoop is not available in real time, since SQOOP loads
data from Hadoop to HANA as a batch job.
• No cleansing or formatting of the data can be done with SQOOP.
JAVA Program
A Java program can be used to load data from Hadoop to HANA through JDBC connectivity.
This technique of HANA-Hadoop integration offers a very high level of customization in
terms of cleansing, transformation, refining, filtering, etc. Both PUSH and PULL
mechanisms can be implemented to transfer data from Hadoop to HANA, depending on where
the program is installed and scheduled.
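The cleanse, filter and batch-insert pattern described above can be sketched as follows.
The technique in this document is a Java program over JDBC; the sketch below swaps in
Python with the standard-library sqlite3 module as a stand-in target so that it runs
without a HANA system. The "id,name,amount" record layout, the sales table and the batch
size are assumptions for illustration.

```python
import sqlite3


def cleanse(line):
    """Parse one 'id,name,amount' record; return a tuple, or None if invalid."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return None
    try:
        return int(parts[0]), parts[1].upper(), float(parts[2])
    except ValueError:
        return None


def load(lines, conn, batch_size=2):
    """Cleanse records and insert them in batches; return (loaded, rejected)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    batch, loaded, rejected = [], 0, 0

    def flush():
        nonlocal batch, loaded
        if batch:
            conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", batch)
            conn.commit()  # one commit per batch, not per row
            loaded += len(batch)
            batch = []

    for line in lines:
        row = cleanse(line)
        if row is None:
            rejected += 1  # filtering step: drop malformed records
            continue
        batch.append(row)
        if len(batch) >= batch_size:
            flush()
    flush()  # write the final partial batch
    return loaded, rejected


if __name__ == "__main__":
    records = ["1, alice, 10.5", "bad line", "2,bob,abc", "3, carol ,7", "4,dave,1"]
    with sqlite3.connect(":memory:") as conn:
        print(load(records, conn))
```

In the Java version, the same structure maps onto a JDBC PreparedStatement with
addBatch and executeBatch, and the record source would be files read from HDFS.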
Recommended Scenarios
This technique is recommended in scenarios where only a small amount of data is
transferred from Hadoop to HANA. It gives developers a very high level of control, so
they can come up with a highly customized solution.
Advantages of this technique
• It offers a greater degree of customization.
• Java is open source, so it is a cost-effective solution.
• A Java program can be executed from the command line and doesn't require any
additional setup to host.
Disadvantages of this technique
• It requires a high level of programming skill.
• Error tracking and debugging become difficult.
Summary
The integration of HANA with Hadoop enables customers to move data between Hive,
Hadoop's Distributed File System and SAP HANA. Hadoop is good at processing bulk data at
low cost. Hence, if a particular chunk of data is of little value to users and is not
accessed often, storing it in HANA would be cost-prohibitive and it is better kept in
Hadoop.
By combining SAP HANA and Hadoop, customers get the power of instant access with SAP
HANA and practically unlimited scale with Hadoop. This gives SAP users a broad range of
options for storing and analyzing new types of data, and the ability to create
applications that uncover new business opportunities from vast amounts of data, which
would not previously have been possible.
References
http://blog.cloudera.com/blog/
https://www.brighttalk.com/webcast/9727/86361
http://scn.sap.com/community/developer-center/hana/blog/2014/01/27/exporting-and-importing-data-to-hana-with-hadoop-sqoop
http://www.saphana.com/docs/DOC-2934

why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 

Integration of SAP HANA with Hadoop

  • 1. Page 1 Author – Ramkumar Rajendran Integration of SAP HANA with Hadoop
  • 2. Author Biography
    Ramkumar Rajendran is a Consultant at a leading firm with 4 years of experience. He specializes in tools such as SAP HANA, SAP BI, SAP BO (Xcelsius, Webi and IDT), Tableau, Lumira and Hadoop-Hive. He has worked on sentiment analysis of Twitter data, has been involved in the integration of HANA and Hadoop, and has worked on multiple implementation projects for various industry sectors.
  • 3. Table of Contents
    1 About this document (page 4)
    2 Introduction (page 5)
        SAP HANA (page 5)
        Hadoop (page 5)
    3 Combined Potential of HANA and Hadoop (page 6)
    4 Scenarios of Hadoop and HANA integration (page 7)
        Federated Data Query through Smart Data Access (SDA) (page 8)
        Business Objects Data Services (page 9)
        SQOOP (page 10)
        JAVA Program (page 12)
    5 Summary (page 13)
    6 Reference Material (page 13)
  • 4. About this document
    This document discusses the combined potential of the in-memory database 'SAP HANA' and the big data solution 'Hadoop', the various methods of integrating the two technologies, and the scenarios in which each method is applicable. SAP HANA specializes in real-time in-memory processing, while Hadoop is suited to massively parallel processing; integrating the two combines the advantages of both. Hadoop handles both structured and unstructured data from social media, machine logs, etc., which can then be used together with the transactional data in HANA, resulting in more mature business analysis. This document was prepared based on SAP HANA SP6 and Hadoop CDH 4.5.
  • 5. Introduction
    SAP HANA
    SAP HANA is an innovative in-memory database and data management platform, specifically developed to take full advantage of the capabilities of modern hardware to increase application performance. Keeping all relevant data in main memory significantly accelerates data processing operations. Design for scalability is a core SAP HANA principle. SAP HANA can be distributed across multiple hosts to achieve scalability in terms of both data volume and user concurrency. Unlike clusters, distributed HANA systems also distribute the data efficiently, achieving high scaling without I/O locks. The key performance indicators of SAP HANA appeal to many of our customers, and thousands of deployments are in progress. SAP HANA has become the fastest growing product in SAP's 40+ year history.
    Hadoop
    Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer. Hadoop is known for its massively parallel processing capabilities on large datasets. It is also scalable, flexible, fault tolerant, and cost-effective because it runs on cheaper processors.
  • 6. Combined Potential of HANA and Hadoop
    Hadoop can store very large amounts of data. It is well suited to storing unstructured data, is good at manipulating very large files, and is tolerant of hardware and software failures. The main challenge with Hadoop is getting information out of this data in real time. HANA is well suited to processing data in real time, thanks to its in-memory technology. By integrating Hadoop's massively parallel processing with HANA's in-memory computing capabilities, the resulting solution would be capable of the following:
      • Accommodating both structured and unstructured data.
      • Providing cost-efficient data storage and processing for large volumes of data.
      • Performing complex information processing.
      • Enabling heavily recursive algorithms, machine learning, and queries that cannot easily be expressed in SQL.
      • Archiving low-value data, which stays available, though access is slower.
      • Mining raw data that is either schema-less or whose schema changes over time.
  • 7. Scenarios of Hadoop and HANA integration
    [Figure: four integration paths connecting Hadoop, SAP HANA and the reporting tools]
      • Smart Data Access (SDA): federated data query, no data loading.
      • Business Objects Data Services (BODS): data loading from Hadoop to HANA, PULL mechanism.
      • SQOOP: data loading from Hadoop to HANA, PUSH mechanism.
      • Java: data loading with Java programming, PUSH or PULL mechanism.
  • 8. Federated Data Query through Smart Data Access (SDA)
    SAP HANA smart data access enables remote Hadoop data to be accessed as if it were held in local SAP HANA tables, without loading the data into SAP HANA. This capability provides operational and cost benefits and, most importantly, supports the development and deployment of the next generation of analytical applications, which need to access, synthesize and integrate data from multiple systems in real time, regardless of where the data is located or which systems generate it. Specifically, in SAP HANA we can create virtual tables that point to remote tables in Hadoop. Customers can then write SQL queries in SAP HANA that operate on these virtual tables. The SAP HANA query processor optimizes these queries, executes the relevant part of each query in the target database, returns the results to SAP HANA, and completes the operation.
    Recommended Scenarios
    Using SDA to access Hadoop from HANA means that a federated query is fired on Hadoop whenever the report is executed. This technique is recommended when the reporting query generates a large result set at Hadoop. Smart Data Access aggregates the dataset at Hadoop, using Hadoop's system resources, so that only the end results are transferred from Hadoop to HANA.
    Advantages of this technique
      • Real-time access to Hadoop data without actually having to load it into HANA.
      • Helps in scenarios where the data residing in Hadoop is updated so frequently that loading it would make no sense.
      • Queries can be optimized by pushing the processing down to Hadoop, which then returns aggregated data.
    Disadvantages of this technique
      • Federated queries slow down when heavy processing has to be done on the data at the Hadoop end.
      • Data transformation is not possible while using Smart Data Access.
  • 9. Disadvantages of Smart Data Access (continued)
      • With this technique the reporting query is also fired on Hadoop, which makes it critical for Hadoop to be up at all times. With multiple Hadoop systems, the risk increases.
      • Data can only be extracted from Hive.
      • Data access can happen only from Hadoop to HANA.
    Business Objects Data Services
    SAP Data Services delivers a single enterprise-class solution for data integration, data quality, data profiling and text data processing. This technique uses a PULL mechanism to bring data from Hadoop into HANA, so control rests entirely with BODS. Its wide range of features helps to:
      • Integrate, transform, improve, and deliver trusted data from Hadoop to HANA.
      • Provide development user interfaces, a metadata repository, a data connectivity layer, a run-time environment, and a management console, enabling IT organizations to lower total cost of ownership and accelerate time to value.
      • Enable IT organizations to maximize operational efficiency with a single solution that improves data quality and gives access to heterogeneous sources and applications.
    Recommended Scenarios
    Integrating HANA with Hadoop using BODS involves loading data periodically. It can be used in scenarios where real-time reporting is not required but complex calculations on large datasets are. This technique proves very effective in scenarios that involve multiple Hadoop systems with a variety of unstructured data to be processed on a large scale.
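To make the Smart Data Access description above more concrete, the following minimal Java sketch builds the two kinds of SQL statement the text mentions: a virtual table in HANA that points at a remote Hive table, and an aggregating query against it. Everything named here is an assumption for illustration: the remote source `HIVE1`, the Hive database `default`, the schema `REPORTING`, and the table and column names. The four-part `AT` path follows examples published for SAP HANA and should be checked against the SQL reference for your HANA revision; the remote source itself is assumed to exist already.

```java
/** Sketch only: builds SQL text for a Smart Data Access setup; it does not connect to HANA. */
final class SdaSketch {

    // Hypothetical names; none of them come from the original document.
    static final String REMOTE_SOURCE = "HIVE1";
    static final String HIVE_DATABASE = "default";

    /** Wraps an identifier in double quotes, doubling any embedded quote. */
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    /** DDL for a HANA virtual table that points at a remote Hive table (assumed path layout). */
    static String createVirtualTable(String schema, String virtualTable, String hiveTable) {
        return "CREATE VIRTUAL TABLE " + quote(schema) + "." + quote(virtualTable)
             + " AT " + quote(REMOTE_SOURCE) + "." + quote("HIVE") + "."
             + quote(HIVE_DATABASE) + "." + quote(hiveTable);
    }

    /** An aggregating query; the intent is that only the grouped result travels back to HANA. */
    static String aggregateQuery(String schema, String virtualTable, String groupColumn) {
        return "SELECT " + quote(groupColumn) + ", COUNT(*) AS " + quote("ROW_COUNT")
             + " FROM " + quote(schema) + "." + quote(virtualTable)
             + " GROUP BY " + quote(groupColumn);
    }

    public static void main(String[] args) {
        System.out.println(createVirtualTable("REPORTING", "VT_TWEETS", "tweets"));
        System.out.println(aggregateQuery("REPORTING", "VT_TWEETS", "REGION"));
    }
}
```

Whether the aggregation is actually pushed down to Hadoop is decided by the HANA query processor, so the second statement only expresses the intent described in the text.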
  • 10. Advantages of this technique
      • Unstructured data can be loaded from Hadoop to HANA, with all transformation done during data loading.
      • It is better suited to loading large datasets.
      • BODS can be used to implement complex transformations while loading data from Hadoop to HANA.
      • Performance of HANA can be improved by moving complex calculations to BODS.
      • Its error handling helps in better support and maintenance.
      • The data encryption function for sensitive data is one of the distinguishing features of loading data through BODS.
      • Centralized monitoring favors better IT support.
      • Delta loads are also supported.
      • Data transfer can happen in both directions.
    Disadvantages of this technique
      • Data present in Hadoop is not available in real time, since BODS loads data from Hadoop to HANA as a batch job.
    SQOOP
    SQOOP is a tool designed for efficiently transferring bulk data between Hadoop and structured data stores such as Oracle, MS SQL, SAP HANA, etc. SQOOP can be used to import data from external structured data stores into the Hadoop Distributed File System or related systems such as Hive and HBase. Conversely, SQOOP can be used to extract data from Hadoop and export it to external structured data stores such as relational databases and enterprise data warehouses. SQOOP provides a pluggable connector mechanism for optimal connectivity to external systems. The SQOOP extension API provides a convenient framework for building new connectors, which can be dropped into SQOOP installations to provide connectivity to various systems. SQOOP itself comes bundled with various connectors for popular database and data warehousing systems.
  • 11. With SQOOP, data transfer is automated through batch jobs, and it uses native tools for high-performance data transfer. It uses data store metadata to infer structure definitions. It uses Hadoop's MapReduce framework to transfer data in parallel, which pays off for very large amounts of data. It provides an extension mechanism to incorporate high-performance connectors for external systems. For exporting data to external targets, SQOOP supports staging tables, which considerably improve the efficiency of data transfer and also insulate the target from data corruption if a failure occurs. This technique uses a PUSH mechanism to load data from Hadoop to HANA, so control rests entirely with SQOOP in Hadoop.
    Recommended Scenarios
    SQOOP is a Hadoop component that transfers data from HDFS to external databases and vice versa. Integrating SAP HANA with Hadoop in this way involves periodically loading data directly from the underlying Hadoop files into HANA tables. SQOOP does not support any transformation while transferring data. This technique can therefore be used in scenarios that require no real-time reporting and have readily formatted source data that needs no cleansing. It is also well suited to bulk data transfers, since SQOOP uses Hadoop's underlying MapReduce framework to transfer data in parallel.
    Advantages of this technique
      • It is better suited to loading bulk datasets.
      • Data transfer can happen in both directions.
      • It is open source and hence cost-effective.
    Disadvantages of this technique
      • Data present in Hadoop is not available in real time, since SQOOP loads data from Hadoop to HANA as a batch job.
      • No cleansing or formatting of the data can be done with SQOOP.
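As an illustration of the PUSH mechanism and the staging-table option described above, the following Java sketch assembles a `sqoop export` command line of the kind a scheduled batch job might run. It only builds and prints the command; it does not launch SQOOP. The host, port, user, table names and HDFS directory are hypothetical, and the sketch assumes SAP's JDBC driver (`com.sap.db.jdbc.Driver`) has been made available to SQOOP. Field delimiter options are left out, in which case SQOOP expects comma-separated input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch only: assembles a "sqoop export" command that pushes HDFS files into a HANA table. */
final class SqoopExportSketch {

    static List<String> exportCommand(String jdbcUrl, String user, String table,
                                      String stagingTable, String exportDir, int mappers) {
        List<String> cmd = new ArrayList<>();
        Collections.addAll(cmd,
            "sqoop", "export",
            "--connect", jdbcUrl,                      // JDBC URL of the HANA system
            "--driver", "com.sap.db.jdbc.Driver",      // SAP HANA JDBC driver class
            "--username", user,
            "-P",                                      // prompt for the password on the console
            "--table", table,                          // target table in HANA
            "--staging-table", stagingTable,           // rows land here first, then move to the target
            "--clear-staging-table",                   // empty the staging table before the run
            "--export-dir", exportDir,                 // HDFS directory holding the source files
            "--num-mappers", String.valueOf(mappers)); // number of parallel map tasks
        return cmd;
    }

    public static void main(String[] args) {
        // All values below are hypothetical.
        List<String> cmd = exportCommand("jdbc:sap://hanahost:30015/", "LOADER",
                "TWEETS", "TWEETS_STG", "/user/hive/warehouse/tweets", 4);
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the arguments in a list rather than one string avoids shell quoting problems if the command is later handed to a process launcher or scheduler.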
  • 12. JAVA Program
    A Java program can be used to load data from Hadoop to HANA through JDBC connectivity. This technique of HANA-Hadoop integration offers a very high level of customization in terms of cleansing, transformation, refining, filtering, etc. We can implement either a PUSH or a PULL mechanism to transfer data from Hadoop to HANA, depending on where the program is installed and scheduled.
    Recommended Scenarios
    This technique is recommended in scenarios that involve transferring only small amounts of data. It gives developers a very high level of control, so they can come up with a highly customized solution.
    Advantages of this technique
      • It offers a greater degree of customization.
      • Java is open source, and hence it is a cost-effective solution.
      • A Java program can be executed from the command line and does not require any additional setup to host.
    Disadvantages of this technique
      • It requires a high level of programming skill.
      • Error tracking and debugging become difficult.
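A minimal sketch of such a Java loader follows, assuming tab-separated lines that have already been read from Hadoop (for example from a file copied out of HDFS) and a target table `"STAGING"."TWEETS"` with three text columns. The table, the column names and the cleansing rule are all hypothetical. The `Connection` is assumed to come from `DriverManager.getConnection` with a `jdbc:sap://host:port/` URL and SAP's JDBC driver on the classpath; obtaining it is not shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;

/** Sketch only: cleanses tab-separated lines and batch-inserts them into HANA over JDBC. */
final class HadoopToHanaLoader {

    static final int BATCH_SIZE = 1000;

    /** Returns {id, user, text} with whitespace trimmed, or null if the line is malformed. */
    static String[] cleanse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length != 3 || fields[0].trim().isEmpty()) {
            return null; // filtering step: skip malformed rows
        }
        return new String[] {
            fields[0].trim(),
            fields[1].trim().toLowerCase(Locale.ROOT), // example transformation
            fields[2].trim()
        };
    }

    /** Inserts all well-formed lines in batches and returns how many rows were loaded. */
    static int load(Connection hana, Iterable<String> lines) throws SQLException {
        // Hypothetical target table and columns.
        String sql = "INSERT INTO \"STAGING\".\"TWEETS\" (\"ID\", \"USER_NAME\", \"BODY\") VALUES (?, ?, ?)";
        boolean previousAutoCommit = hana.getAutoCommit();
        hana.setAutoCommit(false);
        int loaded = 0;
        try (PreparedStatement insert = hana.prepareStatement(sql)) {
            int pending = 0;
            for (String line : lines) {
                String[] row = cleanse(line);
                if (row == null) {
                    continue;
                }
                insert.setString(1, row[0]);
                insert.setString(2, row[1]);
                insert.setString(3, row[2]);
                insert.addBatch();
                loaded++;
                if (++pending == BATCH_SIZE) {
                    insert.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                insert.executeBatch();
            }
            hana.commit();
        } catch (SQLException e) {
            hana.rollback(); // leave the target untouched if the load fails part-way
            throw e;
        } finally {
            hana.setAutoCommit(previousAutoCommit);
        }
        return loaded;
    }
}
```

Running this program on a Hadoop node gives a PUSH arrangement, while running it on a host next to HANA and fetching the files from there gives a PULL arrangement, in line with the text above.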
  • 13. Summary
    The integration of HANA with Hadoop enables customers to move data between Hive, Hadoop's Distributed File System and SAP HANA. Hadoop is good at processing bulk data at a much lower cost. Hence, if a particular chunk of data is of little value to the users and they do not access it often, storing it in HANA would be cost-prohibitive. By combining SAP HANA and Hadoop, customers get the power of instant access with SAP HANA and infinite scale with Hadoop. This gives SAP users a broad range of options for storing and analyzing new types of data, and the ability to create applications that can uncover new business opportunities from vast amounts of data, which would not previously have been possible.
    References
      • http://blog.cloudera.com/blog/
      • https://www.brighttalk.com/webcast/9727/86361
      • http://scn.sap.com/community/developer-center/hana/blog/2014/01/27/exporting-and-importing-data-to-hana-with-hadoop-sqoop
      • http://www.saphana.com/docs/DOC-2934