SECURITY ISSUES IN BIG DATA
Shallote Dsouza
WHAT IS BIG DATA?
Big data refers to data that is so large and complex that it exceeds the processing
capability of conventional data management systems and software techniques.
Data becomes big data when individual records stop mattering and only a large
collection of them, or the analysis derived from it, is of value.
Big data offers many opportunities: advancement of science, improvement of health care,
promotion of economic growth, enhancement of the education system, and new forms
of social interaction and entertainment.
But big data also raises security and privacy issues because of its huge volume,
high velocity, and large variety of data sources and formats.
DIMENSIONS OF BIG DATA
Big Data possesses characteristics that can be defined by several V’s
Volume
Refers to the quantity of data. Big data is characterized by massive data sets measured in
units such as petabytes and zettabytes. Vast amounts of data are generated every second,
by machines, networks and human interaction on systems such as social media. The volume
of data to be analysed is massive.
Velocity
Refers to the accelerating speed at which data flows in from sources such as business
processes, machines, networks, social media sites and mobile devices. The flow of
data is continuous. Reacting quickly enough to keep up with this velocity is a challenge
for most organizations.
Variety
Refers to the various formats of data, from structured, numeric data in traditional
databases to unstructured text documents, email, video, audio, stock ticker data and financial
Veracity
Refers to the quality of big data: biases, noise and abnormalities in the data,
uncertainties that cannot be measured, and the truthfulness and trustworthiness of the data.
Data that is erroneous, duplicated, incomplete or outdated is collectively referred to as
dirty data.
Valence
Refers to the connectedness of big data, which can be represented as a graph, by analogy
with the bonds between atoms. Data items are often directly connected to one another: a city
is connected to its country, and two Facebook users are connected if they are friends.
High-valence data is denser.
Value
Refers to how big data benefits us and our organization. It measures the usefulness of
data in decision making. Queries can be run on the stored data to deduce important
results and gain insights.
TOOLS FOR BIG DATA
Big Data storage and management tools
Hadoop - Provides a software framework for distributed storage and processing of big
data using the MapReduce programming model.
Cassandra - Suited to environments with very heavy writes and reads, where the stored
data is too large to fit on a single server but a friendly, familiar interface is still
wanted.
MongoDB - Used for dynamic queries and for defining indexes that give good performance
on a big database, which makes applications faster and more efficient at scale.
Apache Hive - Used to analyse large datasets stored in HDFS, and for data summarization,
querying and ad-hoc analysis of structured and semi-structured data in Hadoop.
HBase - Used for real-time big data applications whose tables contain billions of rows and
millions of columns and are built for low-latency operations.
Cloudera - A Hadoop distribution presented by its vendor as 100% open source and as the
only Hadoop solution to offer batch processing, interactive SQL and interactive search as
well as enterprise-grade continuous availability.
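The MapReduce programming model mentioned for Hadoop above can be illustrated with a small in-memory word count. This is only a sketch of the map, shuffle and reduce phases in plain Python, not Hadoop's API; the function names are chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between the phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines in one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run on many nodes in parallel and the shuffle moves data across the network; here everything runs in a single process to show the data flow only.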
TYPICAL BIG DATA ARCHITECTURE
Big data architecture varies based on a company's infrastructure and needs, but it usually
contains the following components:
1. Data sources: This can include data from databases, data from real-time sources, and
static files generated from applications, such as Windows logs.
2. Data store: Storage is needed for the data that will be processed by the big data
architecture. Often, data is stored in a data lake, a large store for unstructured data
that scales easily.
3. A combination of batch processing and real-time processing: Large volumes of data
can be handled efficiently with batch processing, while real-time data
needs to be processed immediately to deliver value.
4. Analytical data store: Keeps all the data in one place so that analysis can be
comprehensive, and is optimized for analysis rather than transactions. This might
take the form of a cloud-based data warehouse or a relational database.
5. Automation: Ingesting and transforming the data, moving it through batch and stream
processes, loading it into an analytical data store, and finally deriving insights must
form a repeatable workflow so that you can continually gain insights from your big data.
GENERAL BIG DATA SECURITY ISSUES
Insecure Computation
Attackers use malicious programs to extract sensitive information from data
sources. Such programs can also corrupt the data, leading to incorrect results in
prediction or analysis, and can cause denial of service (DoS).
Input Validation and Filtering
Big data systems collect input from multiple sources, so input validation is required.
This involves verifying that data sources are trusted and filtering malicious data out
from the good data. With gigabytes and terabytes of data flowing in continuously, it is
very difficult to validate or filter each incoming batch.
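The two checks described above, trusting only known sources and filtering out bad values, can be sketched as a generator that processes records one at a time, so the stream never has to fit in memory. The record layout, the allow-list and the value range below are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical allow-list of trusted data sources.
TRUSTED_SOURCES = {"sensor-a", "sensor-b"}


def is_valid(record: Dict[str, Any]) -> bool:
    """Accept a record only if its source is trusted and its value is sane."""
    if record.get("source") not in TRUSTED_SOURCES:
        return False
    value = record.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # Assumed plausible range; NaN fails this comparison and is rejected.
    return -50.0 <= value <= 150.0


def filter_stream(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily yield only the records that pass validation."""
    for record in records:
        if is_valid(record):
            yield record
```

Because `filter_stream` is lazy, it can sit in front of a batch or streaming job and drop bad records as they arrive.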
Privacy Concerns in Data Mining and Analytics
Monetization of big data involves sharing analytical results, which brings
challenges such as invasion of privacy, invasive marketing and unintentional disclosure of
information. For example, AOL Inc. released search logs in which users could easily be
identified.
Granular Access Controls
Big data platforms were traditionally designed with almost no security in mind. As a
workaround, the parts of the data sets that users have the right to see are copied to a
separate big data warehouse and provided to particular user groups. For medical research,
for example, only the medical information (without names and addresses) is copied. Volumes
of big data grow even faster this way, and complex solutions adversely affect the system’s
performance and maintenance.
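The medical research example above, where only the medical fields are copied, can be sketched as an allow-list projection. The field names are hypothetical. Dropping names and addresses does not by itself guarantee that individuals cannot be re-identified, as the AOL search log case suggests.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical fields that researchers are allowed to see.
RESEARCH_FIELDS = ("age", "diagnosis", "treatment")


def research_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only the allow-listed fields; everything else is left behind."""
    return {key: record[key] for key in RESEARCH_FIELDS if key in record}


def build_research_copy(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Produce the separate data set that is handed to the research group."""
    return [research_view(record) for record in records]
```

An allow-list is used rather than a deny-list so that a newly added sensitive field is excluded by default.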
Insecure data storage
Authentication, authorization and encryption of data across thousands of nodes are
challenging. Auto-tiering moves cold data, which might still be of use, to less secure
media. Encryption of real-time data may also have a performance impact. Secure
communication among nodes, middleware and end users is disabled by
default, so it needs to be enabled explicitly.
SECURITY ISSUES IN BIG DATA – SOME
RELEVANT USE CASES
Vulnerability to fake data generation
For instance, if a manufacturing company uses sensor data to detect malfunctioning
production processes, cybercriminals can penetrate the system and make the sensors
report fake results. The company may then fail to notice alarming trends and miss the
opportunity to solve problems before serious damage is caused. Such challenges can be
addressed by applying a fraud detection approach.
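The slides do not say which fraud detection approach is meant. As one simple stand-in, the sketch below flags sensor readings that deviate strongly from the rest of a window, using the modified z-score based on the median absolute deviation (MAD). The 3.5 threshold is a commonly used default and is an assumption here. A check like this catches implausible values only, not fake readings crafted to look normal.

```python
from statistics import median
from typing import List, Sequence


def flag_outliers(readings: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of readings whose modified z-score exceeds threshold."""
    if not readings:
        return []
    med = median(readings)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # At least half the readings equal the median; flag anything that differs.
        return [i for i, x in enumerate(readings) if x != med]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]
```

The median and MAD are used instead of the mean and standard deviation because a single injected extreme value distorts them far less.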
Amazon’s Galaxy Data Lakes
Challenges faced by Amazon: data silos, difficulty analyzing diverse datasets, managing
data access and security.
1. A data silo is a situation in which only one group in an organization can access a set
of data. As Amazon expanded internationally, data came to be stored in different places
and in different ways, which kept important data hidden. A data lake solves this problem
by uniting all the data in one central location.
2. Amazon Prime has data for fulfilment centres and packaged goods, while Amazon
Fresh has data for grocery stores and food. Even shipping programs differ
internationally; for example, different countries sometimes have different box sizes
and shapes. Different systems may also hold the same type of information under
different labels: in Europe the term used is “cost per unit,” but in
North America it is “cost per package.”
Data lakes allow you to import any amount of data in any format because there is no
predefined schema.
3. Amazon’s operations finance data is spread across more than 25 databases, with
regional teams creating their own local versions of datasets. Audits and controls must
be in place for each database to ensure that nobody has improper access. With a data
lake, it is easier to get the right data to the right people at the right time.
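The labelling problem in item 2 can be handled when the data is read rather than when it is stored. The sketch below maps regional labels onto one canonical field name; the alias table and the canonical name `unit_cost` are hypothetical, and this is not a description of how Amazon's system works.

```python
from typing import Any, Dict

# Hypothetical aliases: regional labels mapped to one canonical field name.
FIELD_ALIASES = {
    "cost per unit": "unit_cost",     # label used in Europe (per the slide)
    "cost per package": "unit_cost",  # label used in North America (per the slide)
}


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known aliases at read time; unknown fields pass through unchanged."""
    return {FIELD_ALIASES.get(key.lower(), key): value for key, value in record.items()}
```

Records stay in their original form inside the lake, and each consumer applies the mapping it needs when querying.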
Possibility of sensitive information mining
Lack of control within big data solutions may let corrupt IT specialists or
unscrupulous business rivals mine unprotected data and sell it for their own benefit.
Companies can incur huge losses if such information relates to a new
product or service launch, or to users’ personal information. An employee
in charge of the big data store can misuse that power and violate
privacy policies, for example by stalking people through their chats. To
avoid this, proper security tools should be in place and access controls should
be applied strictly at the different levels of the organization.
High speed of NoSQL databases’ evolution and lack of security focus
NoSQL databases handle many challenges of big data analytics with little attention to
security, which is typically embedded only in the middleware; the databases themselves
provide no explicit security enforcement. They tend to have weak authentication techniques
and weak password storage mechanisms, and are exposed to attacks such as JSON injection,
REST injection, schema injection and man-in-the-middle attacks. Lenient security mechanisms
also leave them open to insider attacks. To mitigate this, the following should be done:
1. Encrypting sensitive database fields
2. Keeping unencrypted values in a sandboxed environment
3. Using sufficient input validation
4. Applying strong user authentication policies
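Two of the measures above can be sketched with the Python standard library alone: stronger password storage, using salted PBKDF2, and input validation that rejects JSON containing keys that look like query operators (MongoDB-style operators begin with `$`). The iteration count and function names are assumptions. Field encryption is left out because it needs a third-party library.

```python
import hashlib
import hmac
import os
from typing import Any, Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; store the salt and digest, never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 600_000) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)


def reject_operators(value: Any) -> None:
    """Raise ValueError if user-supplied JSON contains operator-like keys."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if str(key).startswith("$"):
                raise ValueError(f"operator key not allowed in input: {key!r}")
            reject_operators(inner)
    elif isinstance(value, list):
        for item in value:
            reject_operators(item)
```

For example, a login payload of `{"password": {"$ne": ""}}` is rejected before it reaches the query layer, while `{"password": "s3cret"}` passes.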
RECOMMENDATIONS TO ENHANCE BIG
DATA SECURITY
Secure Your Computation Code
To prevent malicious data entry, implement access control, code signing and dynamic
analysis of the computational code. Strategies are also needed to contain the
impact of untrusted code that has managed to get into the big data solution.
There are generally two ways of preventing attacks: securing the data when an insecure
mapper is present, and securing the mapper.
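One way to picture "securing the mapper" is to refuse to run mapper code whose signature does not verify. The sketch below uses an HMAC with a shared secret as a stand-in for code signing. Production code signing normally uses asymmetric signatures and a key management system. The function names and the convention that the source defines a function called `mapper` are assumptions.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict


def sign_code(source: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the mapper source."""
    return hmac.new(key, source, hashlib.sha256).hexdigest()


def load_mapper(source: bytes, signature: str, key: bytes) -> Callable[..., Any]:
    """Execute the mapper source only if its signature verifies."""
    if not hmac.compare_digest(sign_code(source, key), signature):
        raise PermissionError("mapper signature mismatch; refusing to run")
    namespace: Dict[str, Any] = {}
    # Only verified source reaches exec; it is expected to define `mapper`.
    exec(compile(source, "<mapper>", "exec"), namespace)
    return namespace["mapper"]
```

Verification shows only that the code was approved by a key holder. It says nothing about whether the code is safe, which is why the slide pairs signing with access control and dynamic analysis.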
Implement Comprehensive Input Validation and Filtering.
Input validation and filtering should be implemented on both internal
and external sources. The key input validation and filtering features of each
component need to be properly evaluated.
Implement Granular Access Control.
Defining and enforcing roles for the different kinds of users, such as admins,
knowledge workers, end users and developers, is the core of
granular access control.
Use policy to define which sudo sessions are keystroke-logged, based on risk
and user. Implement granular assignments for who can switch sessions ("su"),
and audit privileged activity.
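The role definitions and the auditing of privileged activity described above can be sketched as a small role-based check that records every decision. The roles follow the slide; the permission sets and the shape of the audit entries are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Role(Enum):
    ADMIN = "admin"
    KNOWLEDGE_WORKER = "knowledge_worker"
    END_USER = "end_user"
    DEVELOPER = "developer"


# Hypothetical permission sets; a real deployment would load these from policy.
PERMISSIONS: Dict[Role, Set[str]] = {
    Role.ADMIN: {"read", "write", "grant"},
    Role.KNOWLEDGE_WORKER: {"read", "write"},
    Role.DEVELOPER: {"read"},
    Role.END_USER: {"read"},
}

# (user, role, action, allowed)
AuditEntry = Tuple[str, str, str, bool]


def authorize(user: str, role: Role, action: str, audit_log: List[AuditEntry]) -> bool:
    """Decide whether the role may perform the action and record the decision."""
    allowed = action in PERMISSIONS.get(role, set())
    audit_log.append((user, role.value, action, allowed))
    return allowed
```

Denied attempts are logged as well as granted ones, since repeated denials are often the first visible sign of misuse.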
Secure data storage and computation.
This matters because much sensitive data leakage occurs in
this phase. Sensitive data should therefore be segregated. Enabling
encryption for sensitive data and auditing administrative access on data nodes
are major steps.
The final step for secure data storage and computation is to verify that API
security is properly configured on all components.
CONCLUSION
Big data is trending. No new application can be imagined that does not produce
new forms of data, operate on data-driven algorithms and consume
a specified amount of data.
With data storage and computing environments becoming cheaper,
encryption and compliance have introduced challenges that need to
be handled in a very systematic manner.
A large ecosystem of tools exists for specific big data problems. The major
recommendations for dealing with the security issues are the implementation of
data lakes, access controls, validation and filtering, and the securing of data
storage and computation.
THANK YOU

Contenu connexe

Tendances

IoT - Data Management Trends, Best Practices, & Use Cases
IoT - Data Management Trends, Best Practices, & Use CasesIoT - Data Management Trends, Best Practices, & Use Cases
IoT - Data Management Trends, Best Practices, & Use CasesCloudera, Inc.
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherDATAVERSITY
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
Privacy and Data Security
Privacy and Data SecurityPrivacy and Data Security
Privacy and Data SecurityWilmerHale
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
Privacy, security and ethics in data science
Privacy, security and ethics in data sciencePrivacy, security and ethics in data science
Privacy, security and ethics in data scienceNikolaos Vasiloglou
 
Data Governance Best Practices and Lessons Learned
Data Governance Best Practices and Lessons LearnedData Governance Best Practices and Lessons Learned
Data Governance Best Practices and Lessons LearnedDATAVERSITY
 

Tendances (20)

IoT - Data Management Trends, Best Practices, & Use Cases
IoT - Data Management Trends, Best Practices, & Use CasesIoT - Data Management Trends, Best Practices, & Use Cases
IoT - Data Management Trends, Best Practices, & Use Cases
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data
Big DataBig Data
Big Data
 
Data Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working TogetherData Management, Metadata Management, and Data Governance – Working Together
Data Management, Metadata Management, and Data Governance – Working Together
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Big Data
Big DataBig Data
Big Data
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
Big Data
Big DataBig Data
Big Data
 
Iot and ethics
Iot and ethicsIot and ethics
Iot and ethics
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
 
Big data security
Big data securityBig data security
Big data security
 
Privacy and Data Security
Privacy and Data SecurityPrivacy and Data Security
Privacy and Data Security
 
Big data
Big dataBig data
Big data
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Privacy, security and ethics in data science
Privacy, security and ethics in data sciencePrivacy, security and ethics in data science
Privacy, security and ethics in data science
 
Data Governance Best Practices and Lessons Learned
Data Governance Best Practices and Lessons LearnedData Governance Best Practices and Lessons Learned
Data Governance Best Practices and Lessons Learned
 

Similaire à Security issues in big data

Similaire à Security issues in big data (20)

Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
1
11
1
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
Unit 1
Unit 1Unit 1
Unit 1
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
All About Big Data
All About Big Data All About Big Data
All About Big Data
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Big data security
Big data securityBig data security
Big data security
 
Big data security
Big data securityBig data security
Big data security
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Unit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfUnit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdf
 
BD1.pptx
BD1.pptxBD1.pptx
BD1.pptx
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 

Dernier

CALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual serviceanilsa9823
 
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...Pooja Nehwal
 
GENUINE Babe,Call Girls IN Baderpur Delhi | +91-8377087607
GENUINE Babe,Call Girls IN Baderpur  Delhi | +91-8377087607GENUINE Babe,Call Girls IN Baderpur  Delhi | +91-8377087607
GENUINE Babe,Call Girls IN Baderpur Delhi | +91-8377087607dollysharma2066
 
Continuous Improvement Infographics for Learning
Continuous Improvement Infographics for LearningContinuous Improvement Infographics for Learning
Continuous Improvement Infographics for LearningCIToolkit
 
situational leadership theory by Misba Fathima S
situational leadership theory by Misba Fathima Ssituational leadership theory by Misba Fathima S
situational leadership theory by Misba Fathima Smisbafathima9940
 
{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai
{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai
{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, MumbaiPooja Nehwal
 
Does Leadership Possible Without a Vision.pptx
Does Leadership Possible Without a Vision.pptxDoes Leadership Possible Without a Vision.pptx
Does Leadership Possible Without a Vision.pptxSaqib Mansoor Ahmed
 
Call Now Pooja Mehta : 7738631006 Door Step Call Girls Rate 100% Satisfactio...
Call Now Pooja Mehta :  7738631006 Door Step Call Girls Rate 100% Satisfactio...Call Now Pooja Mehta :  7738631006 Door Step Call Girls Rate 100% Satisfactio...
Call Now Pooja Mehta : 7738631006 Door Step Call Girls Rate 100% Satisfactio...Pooja Nehwal
 
Agile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptxAgile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptxalinstan901
 
Day 0- Bootcamp Roadmap for PLC Bootcamp
Day 0- Bootcamp Roadmap for PLC BootcampDay 0- Bootcamp Roadmap for PLC Bootcamp
Day 0- Bootcamp Roadmap for PLC BootcampPLCLeadershipDevelop
 
operational plan ppt.pptx nursing management
operational plan ppt.pptx nursing managementoperational plan ppt.pptx nursing management
operational plan ppt.pptx nursing managementTulsiDhidhi1
 
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 

Dernier (20)

Imagine - Creating Healthy Workplaces - Anthony Montgomery.pdf
Imagine - Creating Healthy Workplaces - Anthony Montgomery.pdfImagine - Creating Healthy Workplaces - Anthony Montgomery.pdf
Imagine - Creating Healthy Workplaces - Anthony Montgomery.pdf
 
CALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Charbagh Lucknow best sexual service
 
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
Call now : 9892124323 Nalasopara Beautiful Call Girls Vasai virar Best Call G...
 
GENUINE Babe,Call Girls IN Baderpur Delhi | +91-8377087607
GENUINE Babe,Call Girls IN Baderpur  Delhi | +91-8377087607GENUINE Babe,Call Girls IN Baderpur  Delhi | +91-8377087607
GENUINE Babe,Call Girls IN Baderpur Delhi | +91-8377087607
 
Leadership in Crisis - Helio Vogas, Risk & Leadership Keynote Speaker
Leadership in Crisis - Helio Vogas, Risk & Leadership Keynote SpeakerLeadership in Crisis - Helio Vogas, Risk & Leadership Keynote Speaker
Leadership in Crisis - Helio Vogas, Risk & Leadership Keynote Speaker
 
Empowering Local Government Frontline Services - Mo Baines.pdf
Empowering Local Government Frontline Services - Mo Baines.pdfEmpowering Local Government Frontline Services - Mo Baines.pdf
Empowering Local Government Frontline Services - Mo Baines.pdf
 
Peak Performance & Resilience - Dr Dorian Dugmore
Peak Performance & Resilience - Dr Dorian DugmorePeak Performance & Resilience - Dr Dorian Dugmore
Peak Performance & Resilience - Dr Dorian Dugmore
 
Continuous Improvement Infographics for Learning
Continuous Improvement Infographics for LearningContinuous Improvement Infographics for Learning
Continuous Improvement Infographics for Learning
 
situational leadership theory by Misba Fathima S
situational leadership theory by Misba Fathima Ssituational leadership theory by Misba Fathima S
situational leadership theory by Misba Fathima S
 
{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai
{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai
{ 9892124323 }} Call Girls & Escorts in Hotel JW Marriott juhu, Mumbai
 
LoveLocalGov - Chris Twigg, Inner Circle
LoveLocalGov - Chris Twigg, Inner CircleLoveLocalGov - Chris Twigg, Inner Circle
LoveLocalGov - Chris Twigg, Inner Circle
 
Intro_University_Ranking_Introduction.pptx
Intro_University_Ranking_Introduction.pptxIntro_University_Ranking_Introduction.pptx
Intro_University_Ranking_Introduction.pptx
 
Does Leadership Possible Without a Vision.pptx
Does Leadership Possible Without a Vision.pptxDoes Leadership Possible Without a Vision.pptx
Does Leadership Possible Without a Vision.pptx
 
Call Now Pooja Mehta : 7738631006 Door Step Call Girls Rate 100% Satisfactio...
Call Now Pooja Mehta :  7738631006 Door Step Call Girls Rate 100% Satisfactio...Call Now Pooja Mehta :  7738631006 Door Step Call Girls Rate 100% Satisfactio...
Call Now Pooja Mehta : 7738631006 Door Step Call Girls Rate 100% Satisfactio...
 
Agile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptxAgile Coaching Change Management Framework.pptx
Agile Coaching Change Management Framework.pptx
 
Unlocking the Future - Dr Max Blumberg, Founder of Blumberg Partnership
Unlocking the Future - Dr Max Blumberg, Founder of Blumberg PartnershipUnlocking the Future - Dr Max Blumberg, Founder of Blumberg Partnership
Unlocking the Future - Dr Max Blumberg, Founder of Blumberg Partnership
 
Day 0- Bootcamp Roadmap for PLC Bootcamp
Day 0- Bootcamp Roadmap for PLC BootcampDay 0- Bootcamp Roadmap for PLC Bootcamp
Day 0- Bootcamp Roadmap for PLC Bootcamp
 
operational plan ppt.pptx nursing management
operational plan ppt.pptx nursing managementoperational plan ppt.pptx nursing management
operational plan ppt.pptx nursing management
 
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 99 Noida Escorts >༒8448380779 Escort Service
 
Rohini Sector 16 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 16 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 16 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 16 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 

Security issues in big data

  • 1. SECURITY ISSUES IN BIG DATA Shallote Dsouza
  • 2. WHAT IS BIG DATA? Big data refers to data that is so large and complex that it exceeds the processing capability of conventional data management systems and software techniques. Data becomes big data when individual data stops mattering and only a large collection of it or analysis derived from it are of value Offers many opportunities - advancement of science, improvement of health care, promotion of economic growth, enhancement of education system and more ways of social interaction and entertainment.  But Big data has its issues of security and privacy too due to its huge volume, high velocity, large variety in data sources and formats etc.
  • 3. DIMENSIONS OF BIG DATA Big Data possesses characteristics that can be defined by several V’s Volume Refers to quantity of data. Big data is defined as massive data sets with measures such as petabytes and zeta bytes. Vast amounts of data are generated every second. Today big data is generated by machines, networks and human interaction on systems like social media. Volume of data to be analysed is massive. Velocity Deals with the accelerating speed at which data flows in from sources like business processes, machines, networks like social media sites, mobile devices, etc. The flow of data is continuous. Reacting quickly enough to deal with data velocity is a challenge for most organizations. Variety Refers to various formats of data . Structured, numeric data in traditional databases. Unstructured text documents, email, video, audio, stock ticker data and financial
  • 4. Veracity Refers to the quality of big data like biases, noise, abnormality of data, immeasurable uncertainties and truthfulness and trustworthiness of data. Data that are erroneous, duplicate and incomplete or outdated, as a whole are referred to as dirty data. Valence Refers to the connectedness of big data in the form of graphs just like atoms. Data items are often directly connected to one another like a city is connected to its country. Two Facebook users are connected as they are friends. A high valence data is denser. Value Refers to the fact how big data is going to benefit us and our organization. It helps in measuring the usefulness of data in decision making. Queries can be run on the stored data so as to deduce important results and gain insights
  • 5. TOOLS FOR BIG DATA Big Data storage and management tools  Hadoop- Provides a software framework for distributed storage and processing of big data using the Map Reduce programming model  Cassandra- used for fast processing during very heavy writes and reads the environment and stored data which is very large to fit on the server, but still want a friendly familiar interface MongoDB- used for dynamic queries, defining indexes for good performance on a big database which makes applications faster and more efficiently at scale. Apache Hive- Analysis of large datasets stored in HDFS. Also, used for data summarization, query and ad-hoc analysis to process structured and semi-structured data in Hadoop Hbase- Used for real-time big data applications which contain billions of rows and millions of columns in tables built for low latency operations Cloudera- 100% open source and is the only Hadoop solution to offer batch processing, interactive SQL and interactive search as well as enterprise-grade continuous availability.
  • 6. TYPICAL BIG DATA ARCHITECTURE Big data architecture varies based on a company's infrastructure and needs, but it usually contains the following components: 1. Data sources: This can include data from databases, data from real-time sources, and static files generated from applications, such as Windows logs. 2. Data store: Need storage for the data that will be processed via big data architecture. Often, data will be stored in a data lake, which is a large unstructured database that scales easily. 3. A combination of batch processing and real-time processing: Large volume of data processed can be handled efficiently using batch processing, while real-time data needs to be processed immediately to bring value. 4. Analytical data store: Helps keep all the data is in one place so analysis can be comprehensive, and it is optimized for analysis rather than transactions. This might take the form of a cloud-based data warehouse or a relational database 5. Automation: Ingesting and transforming the data, moving it in batches and stream processes, loading it to an analytical data store, and finally deriving insights must be in a repeatable workflow so that you can continually gain insights from your big data
  • 7. GENERAL BIG DATA SECURITY ISSUESInsecure Computation Malicious programs are used by attackers to extract sensitive information from data sources. This can also corrupt the data, leading to incorrect results in prediction or analysis. It can also result into Denial of Services (DoS) Input Validation and Filtering Big Data collects inputs from multiple sources hence input validation is required. This involves validating trusted data sources and filtering malicious data from the good one. In big data gigabytes and terabytes of continuous data flow makes it really very difficult to perform input validation or data filtering on the incoming batch of data. Privacy Concerns in Data Mining and Analytics Monetization of Big Data involves sharing of analytical results which involves multiple challenges like invasion of privacy, invasive marketing and unintentional disclosure of information. Quite a few examples of these include - AOL Inc. released search logs where users could be identified easily, which was really concerning.
  • 8. Granular Access Controls Big data was traditionally designed with almost no security in mind. As a way out, the parts of needed data sets, that users have right to see, are copied to a separate big data warehouse and provided to particular user groups. For a medical research, only the medical info (without the names, addresses) gets copied. Volumes of big data grow even faster this way. Complex solutions adversely affect the system’s performance and maintenance. Insecure data storage Authentication, authorization and encryption of data at thousands of nodes becomes a challenging work. Auto–tiering moves cold data, which might be of use, to lesser secure medium. Also encryption of real time data may have performance impacts. Secure communication amongst various nodes, middlewares, and end users is disabled by default, hence it needs to be enabled explicitly.
  • 9. SECURITY ISSUES IN BIG DATA – SOME RELEVANT USE CASES Vulnerability to fake data generation For instance, if a manufacturing company uses sensor data to detect malfunctioning production processes, cybercriminals can penetrate the system and make the sensors report fake results. The company may then fail to notice alarming trends and miss the opportunity to solve problems before serious damage is caused. Such challenges can be addressed by applying a fraud detection approach. Amazon's Galaxy Data Lake Challenges faced by Amazon: data silos, difficulty analyzing diverse datasets, and managing data access and security. 1. A data silo is a situation in which only one group in an organization can access a set of data. To support international expansion, data is stored in different places and in different ways, which keeps important data hidden. A data lake solves this problem by uniting all the data in one central location.
  • 10. 2. Amazon Prime has data for fulfilment centres and packaged goods, while Amazon Fresh has data for grocery stores and food. Even shipping programs differ internationally; for example, different countries sometimes use different box sizes and shapes. Different systems may also hold the same type of information under different labels. For example, the term used in Europe is “cost per unit,” but in North America it is “cost per package.” Data lakes allow you to import any amount of data in any format because there is no predefined schema. 3. Amazon's operations finance data is spread across more than 25 databases, with regional teams creating their own local versions of datasets. Audits and controls must be in place for each database to ensure that nobody has improper access. With a data lake, it is easier to get the right data to the right people at the right time.
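Since a data lake has no predefined schema, differently labelled fields are usually reconciled when the data is read. The sketch below shows one possible way to do that with a label mapping; the canonical name `unit_cost` and the mapping itself are assumptions for illustration, not a description of Amazon's system.

```python
# Hypothetical mapping from region-specific labels to one canonical field name.
LABEL_ALIASES = {
    "cost per unit": "unit_cost",     # label used in Europe, per the slide
    "cost per package": "unit_cost",  # label used in North America, per the slide
}


def normalize_labels(record: dict) -> dict:
    """Rename known regional labels to the canonical name; keep other fields as-is."""
    return {
        LABEL_ALIASES.get(key.strip().lower(), key): value
        for key, value in record.items()
    }
```

The raw records stay untouched in the lake; normalization is applied only when an analysis reads them, which is what allows data to be imported in any format.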
  • 11. Possibility of sensitive information mining Lack of control within big data solutions may let corrupt IT specialists or unscrupulous business rivals mine unprotected data and sell it for their own benefit. Companies can incur huge losses if such information concerns a new product or service launch, or users' personal information. An employee in charge of a company's big data store can misuse that power and violate privacy policies, for example by stalking people through monitoring their chats. To avoid this, proper security tools should be in place and access controls should be applied strictly at every level of the organization.
  • 12. High speed of NoSQL databases' evolution and lack of security focus NoSQL databases handle many challenges of big data analytics without much concern for security, which is embedded only in the middleware; no explicit security enforcement is provided. NoSQL databases often have weak authentication techniques and weak password storage mechanisms. They are subject to attacks such as JSON injection, REST injection, man-in-the-middle attacks and schema injection. Because of their lenient security mechanisms, NoSQL databases are also subject to insider attacks. To avoid this, the following should be done: 1. Encrypt sensitive database fields 2. Keep unencrypted values in a sandboxed environment 3. Use sufficient input validation 4. Apply strong user authentication policies
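Two of the measures listed above, input validation and stronger password storage, can be sketched with the Python standard library alone. The function names and the iteration count are assumptions for this example. The injection check targets MongoDB-style operator objects such as `{"$ne": None}` supplied where a plain string is expected.

```python
import hashlib
import hmac
import os

# Illustrative work factor only; choose a value based on current guidance and hardware.
PBKDF2_ITERATIONS = 200_000


def safe_equals_filter(field: str, value: object) -> dict:
    """Build an equality filter, rejecting anything that is not a plain string.

    This blocks JSON injection where an attacker sends an operator object
    (for example {"$ne": None}) in place of a string value.
    """
    if not isinstance(value, str):
        raise ValueError("expected a plain string value")
    return {field: value}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256 instead of plain storage."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The returned filter is an ordinary dictionary, so it can be passed to whichever database client is in use; field-level encryption would need a dedicated cryptography library and is not shown here.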
  • 13. RECOMMENDATIONS TO ENHANCE BIG DATA SECURITY Secure Your Computation Code To prevent malicious code from entering the system, implement access control, code signing and dynamic analysis of the computational code. Strategies are also needed to limit the impact of untrusted code that has managed to get into the big data solution. There are generally two ways of preventing attacks: securing the data when an insecure mapper is present, and securing the mapper itself. Implement Comprehensive Input Validation and Filtering Implement input validation and filtering for both internal and external sources, and evaluate the key input validation and filtering features of the tools in use.
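The idea of checking computation code before it runs can be sketched as follows. Real code signing normally relies on asymmetric signatures and a certificate chain; this example substitutes an HMAC tag with a shared key as a simplified stand-in, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_code(code: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the mapper code (stand-in for a signature)."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def is_trusted(code: bytes, tag: str, key: bytes) -> bool:
    """Return True only if the code still matches the tag issued when it was approved."""
    return hmac.compare_digest(sign_code(code, key), tag)
```

A job scheduler could call `is_trusted` before dispatching a mapper and refuse to run any code whose tag does not match, so that code modified after approval is rejected.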
  • 14. Implement Granular Access Control Defining and enforcing roles for the different kinds of users, such as admins, knowledge workers, end users and developers, is the core of granular access control. Use policy to define which sudo sessions are keystroke-logged, based on risk and user. Implement granular assignments for who can switch sessions ("su") and audit privileged activity. Secure Data Storage and Computation This matters because much sensitive data leakage occurs in this phase. Sensitive data should therefore be segregated. Enabling encryption for sensitive data and auditing administrative access on data nodes are major steps. The final step is to verify that API security is properly configured for all components.
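A minimal sketch of role-based access checks with an audit trail is shown below. The role names follow the slide, while the permission sets and the audit record layout are assumptions made for illustration.

```python
# Hypothetical permissions per role; the roles come from the slide, the actions do not.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "manage_users"},
    "developer": {"read", "write"},
    "knowledge_worker": {"read"},
    "end_user": {"read"},
}


def check_access(role: str, action: str, audit_log: list) -> bool:
    """Decide whether a role may perform an action and record the decision."""
    # Unknown roles get an empty permission set, so they are denied by default.
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append((role, action, allowed))
    return allowed
```

Every decision, allowed or denied, is appended to the audit log so that privileged activity can be reviewed later.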
  • 15. CONCLUSION Big data is trending. It is hard to imagine a new application that does not produce new forms of data, operate on data-driven algorithms and consume large amounts of data. As data storage and computing environments become cheaper, encryption and compliance have introduced challenges that need to be handled systematically. A large ecosystem of tools exists for specific big data problems. The major recommendations for dealing with the security issues are the implementation of data lakes, access controls, input validation and filtering, and secure data storage and computation.