SECURE YOUR DATA IN HADOOP
Current state of security, approach for comprehensive strategy

CONTENTS
Introduction
Big data - What is happening?
Hadoop - Security
  Current Hadoop Security Features/Initiatives
  Work to be done in Hadoop
XA Secure - Big Data Security Approach
  XA Secure differentiators
Summary

www.xasecure.com| +1.510.585.3289|7100 Stevenson Blvd Fremont CA 94538

INTRODUCTION
Big data is emerging as the next technology wave and enterprises across different
industries are adopting tools such as Hadoop. While there are efficiencies in processing
varied and distributed data, big data presents a unique challenge for managing
information security.
BIG DATA - WHAT IS HAPPENING?
Digital data is everywhere, and global data is growing at 40% per year. Companies are
capturing trillions of bytes of information about their customers, suppliers, and
operations, and millions of networked sensors are being embedded in the physical
world in devices such as mobile phones, energy meters, and automobiles, sensing,
creating, and communicating data. By collecting and analyzing all this information,
companies can gain insight into new business opportunities and threats. To harness the
ever-expanding data volumes, new technologies have emerged that process massive
data sets using a technique called massively parallel processing (MPP). A recent survey
by Talend found that 60% of companies looking at big data are considering open
source Apache Hadoop or Hadoop-based distributions.
From its initial development to support Yahoo’s increasing search and web
management needs, Hadoop has emerged as the leading platform for big data
analytics applications. The Hadoop software market itself is predicted to reach around
$813 million by 2016 (IDC research). Enterprises are moving to a phase where they
have completed pilot or proof-of-concept work and are embracing Hadoop to solve
core business needs in production.
At the same time, organizations are trying to analyze different kinds of data, from web
logs and social media streams to sales and customer information, to get better insights.
With Hadoop, they can achieve this at a fraction of the cost of traditional data
warehouses. There is a movement towards creating large data lakes or data hubs
where enterprise-wide data can be stored and processed using Hadoop.
Therein lies the data security risk, as data moves from behind the protected walls of
enterprise applications into the kitchen sink called Big Data. Organizations need to
provide the same level of security across their organization, and data within big data
initiatives is no exception.


HADOOP - SECURITY
Hadoop was developed to process massive amounts of disparate data using
commodity hardware. From its initial success at Yahoo, it has matured to support
various verticals. However, the security controls inside Hadoop are very basic and still
evolving.
CURRENT HADOOP SECURITY FEATURES/INITIATIVES

Given the security challenges, a lot of work is being undertaken within the open
source and vendor community to make Hadoop a more secure environment. We have
summarized some of the important initiatives here.
Kerberos Authentication: As one of the first steps towards security, Kerberos
authentication was introduced in Hadoop in 2008 to add a basic level of security that
was previously missing, and today it is the primary method for providing secure
authentication in Hadoop. Kerberos is a computer network authentication protocol
that works on the basis of "tickets" to allow nodes communicating over a non-secure
network to prove their identity to one another in a secure manner. Kerberos
authentication enables Hadoop services, such as the NameNode and the MapReduce
JobTracker, to authenticate the user and grant permissions based on that identity.
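To make the ticket idea concrete, here is a deliberately simplified Python sketch. It is not the Kerberos protocol itself: real Kerberos encrypts tickets, uses session keys and authenticators, and involves a separate ticket-granting step. The key store `SERVICE_KEYS`, the functions `issue_ticket` and `verify_ticket`, and the HMAC-based signature are all assumptions chosen for illustration only.

```python
import hashlib
import hmac
import json
import time

# Hypothetical long-term key shared between the toy issuer and one service.
SERVICE_KEYS = {"namenode": b"namenode-secret-key"}


def issue_ticket(user, service, lifetime_s=3600, now=None):
    """Toy issuer step: bind a user to a service for a limited time and sign it."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"user": user, "service": service, "expires": now + lifetime_s},
        sort_keys=True,
    ).encode()
    tag = hmac.new(SERVICE_KEYS[service], payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_ticket(ticket, service, now=None):
    """Service step: return the authenticated user name, or None if invalid."""
    now = time.time() if now is None else now
    expected = hmac.new(
        SERVICE_KEYS[service], ticket["payload"], hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        return None  # forged or tampered ticket
    claims = json.loads(ticket["payload"])
    if claims["service"] != service or claims["expires"] < now:
        return None  # issued for another service, or expired
    return claims["user"]
```

The point of the sketch is that the service never sees the user's password: it trusts a time-limited ticket that only the issuer could have produced.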
Access Control Lists (ACLs): In core HDFS, file permissions are similar to permissions in
a UNIX system. Read and write access is maintained per user and per user group, and
users and groups are basically strings of characters. At the MapReduce level,
MapReduce ACLs define which users can submit jobs. The list of user groups can be
maintained within the Hadoop layer or obtained from external LDAP or Active
Directory systems. HBase ACLs were introduced in HBase 0.92 and give the ability to
define an authorization policy (Read/Write/Create/Admin), with table/family/qualifier
granularity, for a specified user.
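The UNIX-like permission model described above can be sketched in a few lines of Python. This is an illustration of the owner/group/other check only, not HDFS code; the `FileStatus` class, the `is_allowed` function, and the sample names are hypothetical.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    owner: str  # user name, a plain string
    group: str  # group name, a plain string
    mode: int   # permission bits, e.g. 0o640


def is_allowed(status, user, user_groups, wanted):
    """Pick the owner, group, or other permission class, then test the bits."""
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted
```

With mode `0o640`, the owner can read and write, members of the file's group can only read, and everyone else is denied. Note that the decision applies to the file as a whole, which is the coarse granularity the rest of this paper discusses.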
Sentry (Cloudera): Cloudera recently introduced a role-based authorization framework
that controls the access of users and groups to Hive and Cloudera’s Impala. The
authorization framework uses a file-based policy provider and can be configured at
multiple levels, i.e., server, database, table, column, etc.
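A role-based, multi-level policy of this kind can be sketched as follows. The privilege strings are loosely modeled on the `server=...->db=...->table=...->action=...` style used in Sentry policy files, but the matching logic, the group and role names, and the functions `parse_privilege`, `privilege_allows`, and `group_allows` are simplified assumptions, not Sentry's implementation.

```python
# Hypothetical policy: groups map to roles, roles map to privilege strings.
GROUP_ROLES = {"analysts": ["sales_reader"], "admins": ["sales_admin"]}
ROLE_PRIVILEGES = {
    "sales_reader": ["server=server1->db=sales->table=orders->action=select"],
    "sales_admin": ["server=server1->db=sales"],  # database-level grant
}


def parse_privilege(text):
    """Split 'k=v->k=v' into an ordered list of (key, value) pairs."""
    return [tuple(part.split("=", 1)) for part in text.split("->")]


def privilege_allows(privilege, request):
    """A privilege covers a request when every part it names matches."""
    for key, value in parse_privilege(privilege):
        if value in ("*", "all"):
            continue  # wildcard at this level
        if request.get(key) != value:
            return False
    return True  # levels the privilege does not name are unrestricted


def group_allows(group, request):
    """Resolve group -> roles -> privileges and test each privilege."""
    return any(
        privilege_allows(privilege, request)
        for role in GROUP_ROLES.get(group, [])
        for privilege in ROLE_PRIVILEGES.get(role, [])
    )
```

In this sketch a grant at a higher level (database) implies access to everything beneath it, while a table-level grant with an explicit action is the narrowest rule shown.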
Project Knox (Hortonworks): Project Knox from Hortonworks is currently focused on
providing a gateway to Hadoop clusters that serves as a single point of authentication
and access for Apache Hadoop services in a cluster. Planned features include
perimeter security for Hadoop, a single cluster endpoint for data and jobs, and
management of security across multiple clusters and Hadoop versions, among other
areas. The initiative, started in 2013, has already delivered a couple of releases.
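The gateway pattern can be illustrated with a small routing sketch in Python. It shows only the idea of a single external endpoint that authenticates first and then forwards to internal services; the `TOPOLOGIES` table, the host names, the ports, and the `route` function are hypothetical and do not reflect Knox's actual configuration or code.

```python
from urllib.parse import urlsplit

# Hypothetical topology: cluster name -> service name -> internal base URL.
TOPOLOGIES = {
    "prod": {
        "webhdfs": "http://namenode.internal:50070/webhdfs",
        "templeton": "http://webhcat.internal:50111/templeton",
    }
}


def route(external_url, authenticated):
    """Translate a gateway URL into an internal service URL, or refuse."""
    if not authenticated:
        raise PermissionError("authenticate at the gateway first")
    parts = urlsplit(external_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 3 or segments[0] != "gateway":
        raise ValueError("expected /gateway/<cluster>/<service>/...")
    _, cluster, service, *rest = segments
    try:
        base = TOPOLOGIES[cluster][service]
    except KeyError:
        raise LookupError(f"unknown cluster or service: {cluster}/{service}") from None
    internal = "/".join([base, *rest])
    return internal + ("?" + parts.query if parts.query else "")
```

Clients only ever see the gateway address, so internal host names and ports can change, or sit behind a firewall, without affecting them.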


WORK TO BE DONE IN HADOOP

There is a long way to go before Hadoop can meet the exacting security standards of
large enterprises. Despite the current work, there are still some challenges for CIOs and
CISOs adopting the Hadoop stack, including:

• No framework for managing enterprise policies. Large enterprises have complex
and constantly evolving policies for managing data access. Native Hadoop does
not offer an easy framework for customizing and managing employee policies.
• Fine-grained authorization. Current authorization gives users or user groups
access to tables or file systems/directories as a whole. Enterprises are looking for
more fine-grained authorization to ensure that sensitive data is protected from access
while still being able to analyze the complete set of data and leverage its full potential.
• Decentralized data ownership. As the use of Hadoop expands in the organization,
business units still want to retain control of their data and grant access
themselves to users from other units.
• Lack of a uniform authorization method. While HBase uses ACLs for managing
authorization, HDFS refers to its own set of defined groups when vetting
authorization. Enterprises are looking for a universal process for authorization across
all components.
• Lack of a universal audit control mechanism. Currently each component has its
own audit tracking mechanism, and there is no uniformity in the elements
tracked or the format of the audit log. Enterprises are looking for an easy way of
reporting the access history of their employees.
• Lack of reporting and governance capabilities. Enterprises need tools to
readily report policy status and access history and to check compliance
across various assets.
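The audit gap listed above is easier to see with an example. The Python sketch below normalizes two differently shaped audit events into one common record so that a single query can report a user's access history. The input field names (`ugi`, `cmd`, `src`, `op`, `granted`, and so on) and the functions `from_hdfs`, `from_hbase`, and `access_history` are assumptions for illustration and are not the exact formats emitted by any Hadoop component.

```python
from datetime import datetime, timezone


def _iso(ts):
    """Render an epoch timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def from_hdfs(event):
    """Map a hypothetical HDFS-style event onto the common record."""
    return {
        "time": _iso(event["ts"]),
        "component": "hdfs",
        "user": event["ugi"],
        "action": event["cmd"],
        "resource": event["src"],
        "allowed": event["allowed"],
    }


def from_hbase(event):
    """Map a hypothetical HBase-style event onto the common record."""
    return {
        "time": _iso(event["when"]),
        "component": "hbase",
        "user": event["user"],
        "action": event["op"],
        "resource": f'{event["table"]}:{event["family"]}',
        "allowed": event["granted"],
    }


def access_history(records, user):
    """Return one user's records across all components, oldest first."""
    return sorted(
        (r for r in records if r["user"] == user), key=lambda r: r["time"]
    )
```

Once every component writes, or is translated into, the same record shape, reporting and compliance checks no longer need per-component logic.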

XA SECURE - BIG DATA SECURITY APPROACH
At XA Secure, we recognize these challenges for Hadoop and other big data tools, and
are trying to solve them through our solution offerings. Our initial product is built from
the ground up for big data infrastructure. We are addressing some of the security
challenges of Hadoop infrastructure by providing a governance layer that enables:
a) Centralized policy management, with the ability to define policies for fine-grained
access control over files (HDFS), column families, and cells (HBase, Hive), and
differentiated views of data based on user function
b) Protection of sensitive data through masking and encryption
c) A common, extensive audit layer across Hadoop components. Auditing can be set at
the resource and user group level.
d) Delegated administration of data
e) Policy analytics to monitor and report access and to enable compliance
The tool is currently built over the HBase, Hive, and HDFS components, with planned
incorporation of other big data tools, such as Greenplum and MongoDB, in future
releases.
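As a generic illustration of differentiated views and masking, and not a description of the XA Secure product, the following Python sketch returns the same row with sensitive columns masked unless the caller's function is cleared to see them. The `CLEAR_ACCESS` policy, the column names, and the functions `mask` and `view_for` are hypothetical.

```python
# Hypothetical policy: column -> user functions allowed to see it in the clear.
CLEAR_ACCESS = {
    "ssn": {"compliance"},
    "email": {"compliance", "support"},
}


def mask(value, keep_last=4):
    """Replace all but the last keep_last characters with asterisks."""
    text = str(value)
    keep = min(keep_last, len(text))
    return "*" * (len(text) - keep) + text[len(text) - keep:]


def view_for(row, function):
    """Build the view of a row that a given user function is allowed to see."""
    view = {}
    for column, value in row.items():
        restricted = column in CLEAR_ACCESS
        if restricted and function not in CLEAR_ACCESS[column]:
            view[column] = mask(value)  # sensitive and caller not cleared
        else:
            view[column] = value
    return view
```

Every user can still query the full data set, but what each one sees depends on the policy, which is the balance between protection and analysis that the fine-grained authorization challenge calls for.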
XA SECURE DIFFERENTIATORS
As noted before, a lot of work is being done to make Hadoop more secure, and at
XA Secure we continue to work with the open source community to leverage that
collective work and deliver value to our customers. As a company with a rich history
in security and identity management, and a pure focus on big data, we believe we
bring a unique value proposition through our offerings, which include:
a) An end-to-end access management and governance suite over Hadoop. We
focus on making it easier for both business users and administrators to manage
data security over Hadoop.
b) A distribution-agnostic solution. We support most of the prevalent Hadoop
distributions and can easily integrate with the management tools that come as part
of the distribution.
c) Hooks to integrate with an enterprise’s existing provisioning or access management
systems. We currently integrate with LDAP and also support import and export of
our policies.
d) Industry-specific compliance and audit reports. We are building support for
government, financial, and healthcare compliance requirements.
e) Built on current open source efforts on authentication and encryption. We will
continue to embed other open source initiatives as they are released.

SUMMARY
The big data ecosystem is evolving, and there are a lot of initiatives in the open source
and vendor community for building mature capabilities. It is important that enterprises
embed a security strategy in their plans early and think about what data they will put
into big data tools and how they will extend security controls over that data. CISOs
can choose to adopt XA Secure’s solution to provide enterprise-level security and
credibility to their big data initiatives.


XA Secure | Whitepaper on data security within Hadoop

  • 1. SECURE YOUR DATA IN HADOOP Current state of security, approach for comprehensive strategy
  • 2. 1 CONTENTS Introduction ........................................................................................................................................................ 2 Big data- What is happening?........................................................................................................................ 2 Hadoop- Security .............................................................................................................................................. 3 Current Hadoop Security Features/Initiatives .......................................................................................... 3 Work to be done in Hadoop ....................................................................................................................... 4 XA Secure - Big Data Security Approach ..................................................................................................... 4 XA Secure differentiators ............................................................................................................................. 5 Summary ............................................................................................................................................................. 5 www.xasecure.com| +1.510.585.3289|7100 Stevenson Blvd Fremont CA 94538
  • 3. 2 INTRODUCTION Big data is emerging as the next technology wave and enterprises across different industries are adopting tools such as Hadoop. While there are efficiencies in processing varied and distributed data, big data presents a unique challenge for managing information security. BIG DATA- WHAT IS HAPPENING? Digital data is everywhere and global data is growing at 40% per year. Companies are capturing trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedded in the physical world in devices such as mobile phones, energy meters and automobiles, sensing, creating, and communicating data. By collecting and analyzing all this information, companies can gain insight into new business opportunities and threats. To harness the ever expanding data volumes, new technologies have emerged to enable processing of massive sets of data in a technique called massive parallel processing (mpp). In a recent survey by Talend, it was found 60% of companies looking at big data are considering open source Apache Hadoop or Hadoop based distributions. From its initial development to supporting Yahoo’s increasing search and web management needs, Hadoop has emerged as the leading platform to support big data analytics applications. Hadoop software market itself is predicted to be around $813 million by 2016(IDC research). Enterprises are moving to a phase whether they have completed pilot or proof of concept work and embracing Hadoop to solve core business needs in production. At the same time, organizations are trying to analyze different kinds of data, from web logs, social media streams to sales and customer information to get better insights. With Hadoop, they are able to achieve this at a fraction of a cost compared to traditional data warehouses. There is a movement towards creating large data lakes or data hubs where enterprise wide can be stored and processed using Hadoop. 
Therein presents the risk of data security, as data moves from protected walls of enterprise applications to the kitchen sink called Big Data. Organizations need to provide the same level of security across their organization. Data within big data initiatives are no exception. www.xasecure.com| +1.510.585.3289|7100 Stevenson Blvd Fremont CA 94538
  • 4. 3 HADOOP- SECURITY Hadoop was developed to process massive amounts of disparate data using commodity hardware. From its initial success in Yahoo, it has matured as an application to support various verticals. However, the security controls inside Hadoop are very basic and still evolving. CURRENT HADOOP SECURITY FE ATURES/INITIATIVES Given the security challenges, there has been lot of work being undertaken within the open source and vendor community to enable Hadoop to be a more secure environment. We have summarized some of the important initiatives Kerberos Authentication: As one of the first steps towards security, Kerberos authentication was introduced in Hadoop in 2008 to add a basic level of security that was missing before and today it is the primary method for providing secure authentication in Hadoop. Kerberos is a computer network authentication protocol which works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner. Kerberos authentication enables the MapReduce jobs or Namenode tracker in Hadoop to authenticate the user and enabling permissions based on that Access Control Lists (ACLs): In core HDFS, file permissions are similar to permission in a UNIX system. Read-write access is maintained for each user groups which are basically a string of characters. At the MapReduce level, which users can be used to submit jobs can be defined by MapReduce ACLs. The list of users groups can be maintained with the Hadoop layer or can be configured to get it from external LDAP or Active directory systems. HBase ACLs were introduced from HBase 0.92 onward and gives the ability to define authorization policy (Read/Write/Create/Admin), with table/family/qualifier granularity, for a specified user. Sentry (Cloudera): Cloudera recently introduced role-based authorization framework which provides access to user and groups over Hive and Cloudera’s Impala. 
The authorization framework uses a file-based policy provider and can be configured at multiple levels, i.e., server, database, table, column, etc.

Project Knox (Hortonworks): Project Knox from Hortonworks is currently focused on providing a gateway to Hadoop clusters, giving a single point of authentication and access for Apache Hadoop services in a cluster. Planned features include perimeter security for Hadoop, a single cluster endpoint for data and jobs, and management of security across multiple clusters and Hadoop versions, among other areas. The initiative, started in 2013, has already delivered a couple of releases.
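To make the UNIX-style HDFS permission model mentioned under Access Control Lists more concrete, the following is a minimal sketch of how an owner/group/other check works. It is an illustration only, not Hadoop code: the names FileStatus and is_allowed are hypothetical, the users, groups and path are invented, and the sketch ignores the superuser and any extended ACLs.

```python
from dataclasses import dataclass
from typing import AbstractSet

# Permission bits, as in UNIX: read=4, write=2, execute=1.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for the owner/group/mode metadata of a path."""
    path: str
    owner: str
    group: str
    mode: int  # octal mode such as 0o640


def is_allowed(user: str, user_groups: AbstractSet[str],
               status: FileStatus, wanted: int) -> bool:
    """Return True if `user` holds every bit in `wanted` on `status`.

    Exactly one permission class is consulted: owner, else group, else other.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted


# Illustrative data only: owner may read and write, group may read, others get nothing.
report = FileStatus("/data/sales/q1.csv", owner="alice", group="analysts", mode=0o640)

print(is_allowed("alice", {"finance"}, report, READ | WRITE))  # owner class applies
print(is_allowed("bob", {"analysts"}, report, READ))           # group class applies
print(is_allowed("bob", {"analysts"}, report, WRITE))          # group lacks write
print(is_allowed("carol", {"marketing"}, report, READ))        # falls through to other
```

Because users and groups are plain strings, the check is only as trustworthy as the identity handed to it, which is why authentication such as Kerberos has to sit in front of this kind of authorization.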
WORK TO BE DONE IN HADOOP
There is a long way to go before Hadoop can meet the exacting security standards of large enterprises. Despite the current work, there are still some challenges for CIOs and CISOs adopting the Hadoop stack, including:

• No framework for managing enterprise policies. Large enterprises have complex and constantly evolving policies for managing data access. Hadoop does not natively offer an easy way to customize and manage such policies for employees.
• Fine-grained authorization. The current authorization lets users or user groups get access to tables or file systems/directories as a whole. Enterprises are looking for more fine-grained authorization, so that sensitive data is protected from access while the complete data set can still be analyzed and its full potential leveraged.
• Decentralizing data ownership. As the use of Hadoop expands in the organization, business units will still want to retain control of their data and grant access themselves to users from other units.
• Lack of a uniform authorization method. While HBase uses ACLs for managing authorization, HDFS refers to its own set of defined groups when vetting authorization. Enterprises are looking for a universal process for authorization across all components.
• Lack of a universal audit control mechanism. Currently each component has its own audit tracking mechanism, and there is no uniformity in the elements tracked or in the format of the audit log. Enterprises are looking for an easy way to report the access history of their employees.
• Lack of reporting and governance capabilities. Enterprises need tools to readily report policy status and access history and to check compliance conformance across various assets.

XA SECURE - BIG DATA SECURITY APPROACH
At XA Secure, we recognize these challenges for Hadoop and other big data tools, and are working to solve them through our solution offerings.
Our initial product is built from the ground up for big data infrastructure. We are trying to address some of the security challenges with Hadoop infrastructure by providing a governance layer to enable:
a) Centralized policy management, with the ability to define policies for fine-grained access control over files (HDFS), column families and cells (HBase, Hive), etc., and differentiated views of data based on user function
b) Protection of sensitive data through masking and encryption
c) A common, extensive audit layer across Hadoop components. Auditing can be set at the resource and user group level.
d) Delegated administration of data
e) Policy analytics to monitor and report access and to enable compliance conformance

The tool is currently built over the HBase, Hive and HDFS components, with incorporation of other big data tools, such as Greenplum and MongoDB, planned for future releases.

XA SECURE DIFFERENTIATORS
As noted before, a lot of work is being done to make Hadoop more secure, and at XA Secure we continue to work with the open source community to leverage that collective work and deliver value to our customers. As a company with a rich history in security and identity management, and one purely focused on big data, we believe we bring a unique value proposition through our offerings, which include:
a) An end-to-end access management and governance suite over Hadoop. We focus on making it easier for both business users and administrators to manage data security over Hadoop.
b) A distribution-agnostic solution. We support most of the prevalent Hadoop distributions and can easily integrate with the management tools that come as part of the distribution.
c) Hooks to integrate with an enterprise’s existing provisioning or access management systems. We currently integrate with LDAP, and also support import and export of our policies.
d) Industry-specific compliance and audit reports. We are building support for government, financial and healthcare compliance requirements.
e) Leveraging and building on current open source efforts on authentication and encryption.
We will continue to embed other open source initiatives as they are released.

SUMMARY
The big data ecosystem is evolving, and there are a lot of initiatives in the open source and vendor community for building mature capabilities. It is important that enterprises embed a security strategy in their plans early, and think about what data they will put into big data tools and how they are going to extend security controls over that data. CISOs can choose to adopt XA Secure’s solution to provide enterprise-level security and credibility to their big data initiatives.