A Dell Big Data White Paper
Security for Big Data 
By Joey Jablonski
Security for Big Data 
As more companies deploy big data technologies, including Apache Hadoop[1], Cassandra[2] and related technologies, security is becoming more critical. As these technologies become mainstream, they must be deployed with the same safeguards, auditing and protection capabilities inherent in existing IT platforms such as BI tools, RDBMS platforms and data storage platforms. Because big data platforms are relatively new, the security community is working rapidly to create the capabilities needed for seamless integration into existing security frameworks. Gaps still exist, but they are rapidly being closed. Until they are, organizations need to focus on these gaps as they manage risk related to the security of, and access to, corporate data assets. 
Planning security for a big data solution is similar to planning it for the other data-centric platforms deployed by many IT departments. There are two striking differences: 
1) Maturity of the technology – Many big data technologies are less than 10 years old, while relational database management systems have been around for more than 20 years. This relative immaturity means big data platforms offer fewer features and less flexibility in how they are deployed and integrated with existing systems. 
2) Pan-organizational data in a single platform – Big data platforms, like Hadoop, commonly become an integration point across existing systems. As a result, Hadoop stores together data sets that had not previously been combined. This combined data can introduce new risks around who accesses it and when, and it may carry a higher level of risk because it is more detailed than the data held in any individual silo. 
Big Data Security Challenges 
Figure 1 - Today's Big Data Security Challenge 
[Figure 1 labels: The Problem Today; Movement, Analysis, Presentation; Security Magic]
As shown in Figure 1, many big data projects today struggle to integrate modern technology platforms into existing mechanisms for access control, authentication and authorization. Today's big data technologies, including platforms like Hadoop and Cassandra, can be integrated with existing identity and access control systems such as Lightweight Directory Access Protocol (LDAP), Kerberos and Active Directory. All newly deployed platforms should leverage existing access control implementations to ensure uniformity across the silos that store data. 
Often, new big data deployments force organizations to make a tradeoff between their existing security policies and new technologies that have not matured to the point of being able to implement access controls as the company has defined or prefers. Each organization should weigh its own risk profile in this situation and determine whether deploying the new technology now is worth the organizational benefit, or whether the risk of data compromise is so great that deployment should wait until the technology has matured further. 
Big Data Design Considerations 
Figure 2 - Big Data Design Considerations
Figure 2 outlines the key considerations for designing any big data environment. There are four key areas of design consideration, all equally weighted in a final solution deployment: 
 Performance – Any system for analyzing data and enabling decisions must be designed with performance in mind. This ensures that all users get access to data in the time needed to make effective decisions that positively affect the direction of the organization. 
 Compliance – Compliance is a key component of all big data system designs. Compliance is the assurance that defined policies can be reported on and adverse effects acted upon. 
 Security – Security, the key focus of this white paper, includes the controls, safeguards and supporting technologies that control access to data and restrict it to authorized users, applications and processes. 
 Access – Access is the alignment of user preferences with supplied technology for the use and presentation of the data stored in the big data solutions and platforms. Access is about ensuring the right user interfaces are available for consumption of the data. 
Change of Paradigms in Big Data 
Today's big data platforms, like Hadoop, can integrate data from many different sources and of many different types. Particularly in Hadoop, it is common to have a mix of structured and unstructured data, as well as primary copies of information supplemented by secondary copies from other systems. This mix introduces specific challenges that must be addressed in system design and implementation: 
 Analyze versus view – Many traditional data systems were responsible only for responding to user requests for access to data; very little analysis or transformation occurred. Today, many big data platforms are asked to both analyze data and present information. Access controls can differ for these activities. For example, a user may have the appropriate access to analyze a de-identified data set but not to view the individual records of that data set. Big data implementations should account for this difference and ensure users can properly analyze data without being exposed to details that are unnecessary for the business decisions drawn from that information. One method to enable consistent views of data is "tokenization." Tokenization in data security is the process of replacing sensitive data with unique identification symbols that retain all the essential information about the data without compromising its security. Tools like Accumulo[3] enable all data to be tagged, and allow those tags, along with user and access information, to be carried throughout the process of encryption, decryption, analysis and presentation. 
 Validation of Input, Validation of Results – Many big data platforms pull data from a large variety of sources. These sources can contain data of varied quality that is then used for analysis and processing. All systems should have checkpoints in every workflow to ensure ingested data is valid and of high quality. Invalid data, including data from unvalidated sources, should be discarded so that results are not adversely affected for users who do not have clear visibility into the data sources. 
 Data with a date component – Much of the data created today has a date component, and that date component can often affect access levels. Financial information is the most common example: the current quarter's financials are often much more sensitive because they cannot be shared publicly except under specific conditions. Big data platforms should always factor in creation dates as part of tracking data lineage and access. 
 Data + Data – Traditional data systems like databases were very effective at isolating data types through the use of tables and instances. Modern big data platforms do not have this separation, leading to cases where two pieces of data, when combined, require a higher level of protection than the individual components. This Data + Data problem requires organizations to be conscious of how people access and combine data, and requires IT departments to think about how data will be exposed to end users and what access controls are needed to protect data at the higher level required. 
 Documents with mixed data levels – Many documents stored in big data environments today contain multiple levels of information that should be protected at different levels. Because these documents are unstructured, it is challenging to identify which data belongs to which classification level. The most common method for protecting this information is to protect each document at the highest level of data it contains, while also creating replica copies of the files that are manually scrubbed of confidential information to allow wider access to a subset of the file. 
 System Integration – As more and more systems become integrated with data flowing among them, it is critical that data models and access controls be consistent across systems. This ensures uniformity in data access for users that access similar data from different systems. 
 Single View across access types – Platforms like Hadoop provide many different methods for accessing data, including Apache Hive[4], Apache HBase[5] and Apache Pig[6]. Each of these tools can create and manage its own independent set of access controls. Tools like Apache Sentry[7] enable a uniform set of access controls across tools; this ensures lower-risk data access, easier management of data access and uniform auditing of data access by users. 
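The tokenization idea described under "Analyze versus view" above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC) as a stand-in for a tokenization service; production systems often use a token vault or a dedicated product instead. The key, field names and sample records are hypothetical and are not taken from any specific platform.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; in practice it would come from a key management system.
TOKEN_KEY = b"example-key-not-for-production"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Replace a sensitive value with a deterministic, keyed token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def tokenize_records(records, sensitive_fields):
    """Return copies of the records with the sensitive fields tokenized."""
    return [
        {k: tokenize(v) if k in sensitive_fields else v for k, v in r.items()}
        for r in records
    ]


# Illustrative data only.
records = [
    {"customer_id": "C-1001", "region": "EMEA", "amount": 120},
    {"customer_id": "C-1002", "region": "APAC", "amount": 75},
    {"customer_id": "C-1001", "region": "EMEA", "amount": 30},
]

analyst_view = tokenize_records(records, sensitive_fields={"customer_id"})

# Equal inputs map to equal tokens, so an analyst can still count purchases
# per customer without seeing the real identifiers.
purchases_per_customer = Counter(r["customer_id"] for r in analyst_view)
```

Because the tokens are deterministic, grouping and joining still work on the tokenized view, while viewing the original identifiers would require separate, more restricted access to the source records.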
Figure 3 - Big Data Security Best Practices 
[Figure 3 labels: Security context needs to be sent with the data; Identities need to be consistent; Data + Data has to be managed; ETL; Public Data; Restricted Data]
As shown in Figure 3, Dell recommends three best practices for designing security implementations for big data solutions: 
 Security Context – Any time data is transmitted between systems, the context of that data should be carried with it, including access policies, data lineage and aging policies. 
 Identities – Identities should be managed across the organization, with a centralized identity database used to verify access to data in big data platforms. This ensures uniform access and compliance reporting across data silos in an organization. 
 Data + Data – As part of the data access policies, a record of data sets and the implications of their integration should be tracked and updated on a regular basis. This record can be used to modify access controls on the fly and report on possible risks created by integrating data in new platforms and systems. 
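The Data + Data record described above could be sketched as a small registry of combination rules. This is a minimal illustration, assuming three hypothetical protection levels and made-up data set names; it is not the mechanism of any particular product.

```python
from dataclasses import dataclass

# Ordered from least to most restrictive; the level names are illustrative.
LEVELS = ["public", "internal", "restricted"]


@dataclass(frozen=True)
class CombinationRule:
    datasets: frozenset  # data sets that become more sensitive when combined
    level: str           # protection level required for the combination


def required_level(dataset_levels, rules, requested):
    """Return the strictest level implied by the requested data sets."""
    requested = frozenset(requested)
    # Start from the strictest level of any individual data set.
    level = max((dataset_levels[d] for d in requested), key=LEVELS.index)
    # Raise it if the request includes every data set named by a rule.
    for rule in rules:
        if rule.datasets <= requested:
            level = max(level, rule.level, key=LEVELS.index)
    return level


# Hypothetical inventory and one combination rule.
dataset_levels = {
    "store_locations": "public",
    "loyalty_cards": "internal",
    "purchases": "internal",
}
rules = [CombinationRule(frozenset({"loyalty_cards", "purchases"}), "restricted")]

separate = required_level(dataset_levels, rules, ["store_locations", "purchases"])
combined = required_level(dataset_levels, rules, ["loyalty_cards", "purchases"])
```

Here `separate` evaluates to "internal" and `combined` to "restricted". Keeping the rules as data rather than code is one way to let the record be updated regularly without redeploying the access layer.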
In addition to the above, it is critical that organizations standardize how data is tagged across all departments and teams. A standard tagging mechanism ensures that access controls and auditing can be applied uniformly. Users are the best source of this tagging, as they can provide feedback in real time on untagged data they create or access. 
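A standard tag can be as simple as a small, fixed record attached to every data set. The sketch below is one possible shape, assuming hypothetical field names and classification levels. It also illustrates how a date component can affect access: the `public_after` field is an assumed aging policy, not something defined in this paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative classification levels, ranked from least to most restrictive.
CLEARANCE_RANK = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class DataTag:
    classification: str                  # one of CLEARANCE_RANK
    source_system: str                   # lineage: where the data came from
    created: date                        # creation date of the data
    public_after: Optional[date] = None  # assumed aging policy


def effective_classification(tag: DataTag, today: date) -> str:
    """Apply the aging policy: data becomes public on its release date."""
    if tag.public_after is not None and today >= tag.public_after:
        return "public"
    return tag.classification


def can_view(user_clearance: str, tag: DataTag, today: date) -> bool:
    """Allow access when the user's clearance covers the effective level."""
    needed = CLEARANCE_RANK[effective_classification(tag, today)]
    return CLEARANCE_RANK[user_clearance] >= needed


# Hypothetical tag for current-quarter financials.
q1_financials = DataTag(
    classification="restricted",
    source_system="finance_ledger",
    created=date(2014, 1, 15),
    public_after=date(2014, 4, 20),
)

before_release = can_view("internal", q1_financials, date(2014, 3, 1))
after_release = can_view("internal", q1_financials, date(2014, 5, 1))
```

In this sketch `before_release` is `False` and `after_release` is `True`, because the tag carries enough context (classification, lineage and dates) for any system that receives the data to make the same decision.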
Compliance is an important part of data security. Strong tools should be deployed as part of any big data solution to ensure that all data access and use can be reported on, and that alerts are generated for inappropriate data access. As data sets become more complex and more disparate data sets are integrated, ensuring compliance will become more difficult, but it can be managed if data is integrated in steps rather than all at once. 
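As a rough illustration of reporting and alerting on data access, the sketch below records every access decision and raises an alert for each denied attempt. The class, field names and alerting rule are assumptions made for this example; a real deployment would rely on the auditing features of the platform and its security tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)  # every access decision
    alerts: list = field(default_factory=list)   # denied attempts only

    def record(self, user, dataset, allowed, when=None):
        """Store one access decision and alert if it was denied."""
        entry = {
            "time": when or datetime.now(timezone.utc),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        }
        self.entries.append(entry)
        if not allowed:
            self.alerts.append(entry)
        return entry

    def report(self, dataset):
        """Return all recorded access decisions for one data set."""
        return [e for e in self.entries if e["dataset"] == dataset]


# Hypothetical users and data sets.
log = AuditLog()
log.record("alice", "purchases", allowed=True)
log.record("bob", "q1_financials", allowed=False)
log.record("alice", "q1_financials", allowed=True)

financials_report = log.report("q1_financials")
```

After these three calls, `financials_report` holds two entries and `log.alerts` holds the single denied attempt by "bob".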
Security is a key component of all big data projects. All solution designs should encompass Performance, Access, Compliance and Security. Security should be defined at all levels of the system implementation and account for both at-rest and in-flight data. Big data systems introduce new security challenges that must be accounted for, including Data + Data policies and the handling of documents that may contain multiple levels of security. All big data projects should start small, with low-risk data as the focal point; this enables organizations to get comfortable with new technologies and with how best to implement them in conformance with corporate security policies.
References 
[1] Apache Hadoop - http://hadoop.apache.org/ 
[2] Cassandra - http://cassandra.apache.org/ 
[3] Accumulo - https://accumulo.apache.org/ 
[4] Apache Hive - https://hive.apache.org/ 
[5] Apache HBase - http://hbase.apache.org/ 
[6] Apache Pig - http://pig.apache.org/ 
[7] Apache Sentry - http://sentry.incubator.apache.org/ 
To learn more 
To learn more about Dell big data solutions, contact your Dell representative or visit: 
www.Dell.com/bigdata www.DellBigData.com 
©2014 Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell’s Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer’s statutory rights. 
Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.

 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Security for Big Data

  • 1. Security for Big Data: A Dell Big Data White Paper, by Joey Jablonski
  • 2. As more and more companies deploy big data technologies, including Apache Hadoop[1], Cassandra[2] and related technologies, security is becoming more critical. As these technologies become mainstream, they must be deployed with the same safeguards, auditing and protection capabilities inherent in existing IT platforms such as BI tools, RDBMS platforms and data storage platforms. Because big data platforms are relatively new, the security community is working rapidly to create the capabilities needed for seamless integration into existing security frameworks. Gaps still exist, but they are rapidly being closed. Until they are, these gaps demand focus from organizations as they manage the risk related to security and access of corporate data assets.

    Planning security for a big data solution is similar to planning it for the other data-centric platforms deployed by many IT departments. The striking difference is twofold:

    1) Maturity of the technology – Many big data technologies are less than 10 years old, while relational database management systems have been around for 20+ years. This shorter history leads to fewer features and less flexibility in how big data platforms are deployed and integrated with existing systems.
    2) Pan-organizational data in a single platform – Big data platforms, like Hadoop, commonly become an integration point across other existing systems. This leads Hadoop to store together data sets that had not previously been combined. The combined data can introduce new risks around who accesses it and when, and raises the level of risk because it is more detailed than the data in the individual silos.

    Big Data Security Challenges

    Figure 1 - Today’s Big Data Security Challenge (figure labels: The Problem Today; Movement, Analysis, Presentation; Security; Magic)
  • 3. As shown in Figure 1, many big data projects today struggle to integrate modern technology platforms into existing mechanisms for access control, authentication and authorization. Today’s big data technologies, including platforms like Hadoop and Cassandra, can be integrated with existing identity and access control systems such as Lightweight Directory Access Protocol (LDAP), Kerberos and Active Directory. All newly deployed platforms should leverage existing access control implementations to ensure uniformity across the silos that store and house data.

    New big data deployments often force organizations to trade off their existing security policies against new technologies that have not yet matured to the point of implementing access controls as the company has defined or prefers. Each organization should weigh its own risk profile in this situation and determine whether to deploy the new technology for organizational benefit, or whether the risk of data compromise is great enough that deployment should wait until the technology has matured further.

    Big Data Design Considerations

    Figure 2 - Big Data Design Considerations
  • 4. Figure 2 outlines the key considerations for designing any big data environment. There are four key areas of design consideration, all equally weighted in a final solution deployment:

      • Performance – Any system for analyzing data and enabling decisions must be designed with performance in mind, so that all users get access to data in the time needed to make effective decisions that positively affect the direction of the organization.
      • Compliance – Compliance is a key component of all big data system designs. Compliance is the assurance that defined policies can be reported on and adverse effects acted upon.
      • Security – Security, the key focus of this white paper, includes the controls, safeguards and supporting technologies that control access to data and restrict it to authorized users, applications and processes.
      • Access – Access is the alignment of user preferences with supplied technology for the use and presentation of the data stored in big data solutions and platforms. Access is about ensuring the right user interfaces are available for consumption of the data.

    Change of Paradigms in Big Data

    Today’s big data platforms, like Hadoop, can integrate a multitude of data from different sources and of different types. Particularly in Hadoop, it is common to have a mix of structured and unstructured data, as well as primary copies of information supplemented by secondary copies from other systems. This mix introduces specific challenges that must be addressed in system design and implementation:

      • Analyze versus view – Many traditional data systems were responsible only for responding to user requests for access to data; very little analysis or transformation occurred. Today, many big data platforms are asked to both analyze data and present information.
        Access controls can differ for these activities: a user may have the appropriate access to analyze a de-identified data set without being able to view the individual records in it. Big data implementations should account for this difference and ensure users can properly analyze data without being exposed to details that are unnecessary for the business decisions drawn from that information. One method to enable consistent views of data is “tokenization.” Tokenization in data security is the process of replacing sensitive data with unique identification symbols that retain all the essential information about the data without compromising its security. Tools like Accumulo[3] enable all data to be tagged, and users and access to be carried throughout the process of encryption, decryption, analysis and presentation.
      • Validation of Input, Validation of Results – Many big data platforms pull data from a large variety of sources. The quality of data from these sources varies, and that data is then used for analysis and processing. Every workflow should include checkpoints to ensure ingested data is valid and of high quality. Invalid data, including data from unvalidated sources, should be discarded so that results are not adversely affected for users who do not have clear visibility into the data sources.
      • Data with a date component – Much of the data created today has a date component, and that date component can often affect access levels. Financial information is the most common example: the current quarter’s financials are often much more sensitive because they cannot be shared publicly except under
  • 5. specific conditions. Big data platforms should always factor in creation dates as part of tracking data lineage and access.
      • Data + Data – Traditional data systems like databases were very effective at isolating data types through the use of tables and instances. Modern big data platforms do not have this separation, leading to instances where two pieces of data, when combined, require a higher level of protection than the individual components. This Data + Data problem requires organizations to be conscious of how people access and combine data, and leads IT departments to think about how data will be exposed to end users and what access controls are required to protect data at the higher levels required.
      • Documents with mixed data levels – Many documents stored in big data environments today contain multiple levels of information that should be protected at different levels. Because these documents are unstructured, it is challenging to identify which data belongs to which classification level. The most common method for protecting this information is to protect documents at the highest level of data they contain, while also creating replica copies of the files that are manually scrubbed of confidential information to allow wider access to a subset of the file.
      • System Integration – As more and more systems become integrated with data flowing among them, it is critical that data models and access controls be consistent across systems. This ensures uniformity in data access for users who access similar data from different systems.
      • Single View across access types – Platforms like Hadoop provide many different methods for accessing data, including Apache Hive[4], Apache HBase[5] and Apache Pig[6]. Each of these tools can create and manage its own independent set of access controls.
        Tools like Apache Sentry[7] enable a uniform set of access controls across tools; this ensures lower-risk data access, easier management of data access and uniform auditing of data access by users.

    Figure 3 - Big Data Security Best Practices (figure labels: Security context needs to be sent with the data; Identities need to be consistent; Data + Data has to be managed; ETL; Public Data; Restricted Data)
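The “Analyze versus view” distinction and the tokenization idea described above can be illustrated with a minimal Python sketch. It assumes keyed hashing (HMAC-SHA256) as the tokenization method, which is only one possible approach and is not taken from this paper or from any of the tools it names. The key `TOKEN_KEY` and the functions `tokenize`, `deidentify` and `total_by` are hypothetical names chosen for illustration. The tokens here are one-way: they preserve equality, so records can still be grouped, but the original identifier cannot be read back from them.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret for illustration only; a real deployment would load
# this from a key management system rather than hard-coding it.
TOKEN_KEY = b"example-key-not-for-production"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Replace a sensitive value with a deterministic, one-way token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def deidentify(records, sensitive_fields):
    """Return copies of the records with the sensitive fields tokenized."""
    return [
        {k: tokenize(v) if k in sensitive_fields else v for k, v in rec.items()}
        for rec in records
    ]


def total_by(records, group_field, amount_field):
    """Analyze: sum amounts per group without needing raw identifiers."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_field]] += rec[amount_field]
    return dict(totals)


records = [
    {"customer_id": "C-1001", "amount": 25.0},
    {"customer_id": "C-1002", "amount": 10.0},
    {"customer_id": "C-1001", "amount": 5.0},
]

# An analyst works only with the de-identified copy.
safe = deidentify(records, {"customer_id"})
totals = total_by(safe, "customer_id", "amount")
```

Because the same input always yields the same token, the analyst can still compute per-customer totals from `safe`, while only a user who is allowed to see `records` can view the individual customer identifiers.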
  • 6. As shown in Figure 3, Dell recommends three best practices for designing security implementations for big data solutions:

      • Security Context – Any time data is transmitted between systems, the context of that data should be carried with it, including access policies, data lineage and aging policies.
      • Identities – Identities should be managed across an organization, with a centralized identity database used for verifying access to data in big data platforms. This ensures uniform access and compliance reporting across data silos in an organization.
      • Data + Data – As part of the data access policies, a record of data sets and the implications of their integration should be tracked and updated on a regular basis. This record can be used to modify access controls on the fly and to report on possible risks created by integrating data in new platforms and systems.

    In addition to the above, it is critical that organizations standardize across all departments and teams on how to tag data. A standard tagging mechanism ensures that access controls and auditing can be applied uniformly. Users are the best source of this tagging, as they can provide real-time feedback on untagged data that they create or access.

    Compliance is an important part of the security of data. Strong tools should be deployed as part of any big data solution to ensure that all data access and use can be reported on, and that alerts are generated for inappropriate data access. As data sets become more complex and more disparate data sets are integrated, ensuring compliance will become more difficult, but it can be managed if data is integrated in steps rather than all at once.

    Security is a key component of all big data projects. All solution designs should encompass Performance, Access, Compliance and Security. Security should be defined at all levels of the system implementation and account for both at-rest and in-flight data.
    Big data systems introduce new security challenges that must be accounted for, including Data + Data policies and the handling of documents that may contain multiple levels of security. All big data projects should start small, with low-risk data as the focal point; this enables organizations to get comfortable with the new technologies and to learn how best to implement them in conformance with corporate security policies.
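The Security Context and Data + Data recommendations above can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the three classification levels, the `TaggedDataset` type, the `COMBINATION_RULES` record and the `combine` function are hypothetical and do not come from this paper or from any particular product. The sketch shows a data set carrying its own tags (level, creation date, lineage) and a lookup that raises the protection level when two data sets are integrated.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "restricted"]


@dataclass(frozen=True)
class TaggedDataset:
    name: str
    level: str            # access policy tag
    created: date         # creation date, available for aging policies
    lineage: tuple = ()   # names of the data sets this one was built from


# Hypothetical Data + Data record: combinations that need more protection
# than either input. Keys are frozensets, so input order does not matter.
COMBINATION_RULES = {
    frozenset({"zip_codes", "purchase_history"}): "restricted",
}


def combine(a: TaggedDataset, b: TaggedDataset, name: str, today: date) -> TaggedDataset:
    """Integrate two data sets, carrying the security context into the result."""
    # Start from the more sensitive of the two inputs.
    level = max(a.level, b.level, key=LEVELS.index)
    # Raise the level if this specific combination is on record.
    rule = COMBINATION_RULES.get(frozenset({a.name, b.name}))
    if rule is not None:
        level = max(level, rule, key=LEVELS.index)
    lineage = tuple(sorted({a.name, b.name, *a.lineage, *b.lineage}))
    return TaggedDataset(name, level, today, lineage)


zips = TaggedDataset("zip_codes", "public", date(2014, 1, 1))
purchases = TaggedDataset("purchase_history", "public", date(2014, 2, 1))
merged = combine(zips, purchases, "customer_profile", date(2014, 3, 1))
```

In this sketch two data sets tagged `public` produce a result tagged `restricted`, and the result keeps a lineage entry for each source, so access controls and audit reports can be driven from the tags rather than from knowledge of how the data was assembled.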
  • 7. References

    [1] Apache Hadoop - http://hadoop.apache.org/
    [2] Cassandra - http://cassandra.apache.org/
    [3] Accumulo - https://accumulo.apache.org/
    [4] Apache Hive - https://hive.apache.org/
    [5] Apache HBase - http://hbase.apache.org/
    [6] Apache Pig - http://pig.apache.org/
    [7] Apache Sentry - http://sentry.incubator.apache.org/

    To learn more

    To learn more about Dell big data solutions, contact your Dell representative or visit: www.Dell.com/bigdata www.DellBigData.com

    ©2014 Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell’s Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer’s statutory rights. Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.