Guided By:
Mrs. Basima Yoosaf
Assistant Professor
Dept. of Computer Science and Engineering
Presented By:
Sherin Mariam Reji
R12U024
S7, CSE
Introduction
Literature survey
Existing System
Proposed System
Advantages
Conclusion
Future Work
References
Big Data
• Data sets so large that traditional applications cannot process them.
• Processing large volumes of data in a distributed computing environment with Hadoop can reduce the processing time.
• The term often also refers to extracting value from such large data sets.
• Big data plays a big role in critical infrastructure, a term used by governments to describe assets that are essential for the functioning of a society and economy.
• Applications in:
  • Distributed analytics
    • Systematic analysis of data on different platforms.
    • Example: massively multiplayer online games.
  • Cyber security
    • Protection of information systems.
    • Identifying malicious activity hidden in the masses of data.
  • Digital forensics
    • Recovery and investigation of material found in digital devices.
What Comes Under Big Data?
• Big data involves the data produced by different devices and applications.
• Social media data: social media such as Facebook and Twitter hold information and views posted by millions of people across the globe.
• Search engine data: search engines retrieve large amounts of data from different databases.
• Black box data: a black box is a component of helicopters, airplanes, and jets. It captures the voices of the flight crew, recordings from microphones and earphones, and the performance information of the aircraft.
• DIVE-C: Distributed-parallel Virtual Environment on Cloud Computing Platform.
• DIVE-C is intended for distributed parallel data processing applications.
• It hides the complexity of the cloud and helps users focus on their new applications and core services.
• Traditional approach
  • Data is stored in an RDBMS.
  • Software interacts with the database.
  • The data is processed and presented to users.
• Limitations
  • Handles only a limited volume of data.
  • Most event logs and other recorded computer activities were deleted after a fixed retention period.
  • Traditional databases are expensive to scale.
  • The design is difficult to distribute.
What is Big Data?
• "Big Data" refers to very large data sets.
• It aims to solve new problems, or old problems in a better way.
• It generates value from stored data.
• It cannot be analyzed with traditional computing techniques.
• Facebook generates 10 TB of data daily.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years; now it can be achieved in one week.
• Twitter generates 7 TB of data daily.
The four V's of Big Data:
• Volume: quantity of data
• Veracity: accuracy
• Variety: types of data
• Velocity: speed
1. Data source layer
• In this layer, data arrives from different sources.
• Sources include customer databases, e-mails, social media channels, feedback, etc.
2. Data storage layer
• This is where Big Data lives once it has been gathered from the sources.
• HDFS (Hadoop Distributed File System).
3. Data processing/analysis layer
• Here the stored data is processed and analyzed to find something useful.
• MapReduce tool.
4. Data output layer
• Here we get the output.
• Output takes the form of reports, charts, figures, etc.
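The processing layer names MapReduce without showing what a job looks like. The following is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample log lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one input record (a line of text)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over all records and return word counts."""
    intermediate = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


# Hypothetical input: three short log lines.
logs = ["login failed", "login ok", "login failed"]
counts = run_job(logs)
print(counts)
```

In a real cluster the map and reduce calls would run on different machines over data stored in HDFS; here everything runs in one process so the data flow is easy to follow.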
• Distributed computing
  • Refers to the use of distributed systems to solve computational problems.
  • A problem is divided into many tasks, each of which is solved by one or more computers.
  • Big Data technologies include distributed computational systems, distributed file systems, massively parallel processing (MPP) systems, cloud-based storage and computing, and data mining based on grid computing.
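The idea of dividing a problem into tasks can be sketched on a single machine, with worker threads standing in for the separate computers. This is only an illustration under that assumption; the helper names (`split`, `solve_task`, `solve_distributed`) and the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_tasks):
    """Divide the input into roughly equal chunks, one per task."""
    size = max(1, -(-len(data) // n_tasks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def solve_task(chunk):
    """The work done by one worker: here, a partial sum of squares."""
    return sum(x * x for x in chunk)


def solve_distributed(data, n_workers=4):
    """Split the problem, solve the tasks in parallel, combine the partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(solve_task, chunks))
    return sum(partials)


print(solve_distributed(list(range(10))))
```

The same split, solve, and combine shape carries over when the workers are machines in a cluster; what changes is how chunks and partial results travel over the network.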
• Apache Hadoop is a software platform supporting data-intensive distributed applications.
• NoSQL databases are used for large-scale, distributed data management and database design.
• Much of the data in Big Data is unstructured, that is, it has no schema, so NoSQL is used to access it.
• A distributed database (DDB) is a collection of interconnected databases distributed over a computer network.
• A distributed database management system (distributed DBMS) manages the distributed database and makes the distribution transparent to the users.
• A parallel DBMS is implemented on a multiprocessor computer.
• Parallel database systems help improve data processing performance by parallelizing indexing, loading, and querying of data.
• Hadoop is a framework for distributed processing of large data sets across clusters of computers.
• Hadoop is also a parallel data processing model intended for large-scale data processing on cluster-based computing architectures.
• The work is distributed across clusters of computers, and the results are collected on a single system.
Figure: distributed processing of Big Data.
In a distributed setting, the file system is expected to achieve the following goals:
• Reliability: The file system can recreate the original data from the distributed nodes.
• High performance: It can locate the data of interest in a timely manner on the distributed nodes.
• High availability: It can account for failures and incorporate mechanisms for monitoring, fault tolerance, error detection, and automatic recovery.
• Scalability: The file system should permit additional hardware to be added for more storage capacity and/or better performance.
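The reliability and availability goals can be illustrated with a toy block store that keeps each block on more than one node and rebuilds the file from whichever replicas survive. This is a simplified sketch, not how HDFS is implemented; the function names, the round-robin placement, and the node names are assumptions, and it expects `replicas` to be no larger than the number of nodes.

```python
def place_blocks(data, block_size, nodes, replicas=2):
    """Split data into blocks and store each block on `replicas` distinct nodes."""
    storage = {node: {} for node in nodes}
    n_blocks = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        for r in range(replicas):
            # Round-robin placement: consecutive nodes hold the copies of a block.
            node = nodes[(n_blocks + r) % len(nodes)]
            storage[node][n_blocks] = block
        n_blocks += 1
    return storage, n_blocks


def read_file(storage, n_blocks, failed=()):
    """Reassemble the file from any surviving replica of each block."""
    parts = []
    for block_id in range(n_blocks):
        for node, blocks in storage.items():
            if node not in failed and block_id in blocks:
                parts.append(blocks[block_id])
                break
        else:
            raise IOError(f"block {block_id} lost: no surviving replica")
    return b"".join(parts)


data = b"event-log-data-0123456789"
storage, n_blocks = place_blocks(data, block_size=4, nodes=["n1", "n2", "n3"])
# The file is still readable after one node fails.
print(read_file(storage, n_blocks, failed={"n2"}) == data)
```

Scalability in this picture amounts to adding entries to `nodes`; a production file system would also rebalance existing blocks and re-replicate those that lost a copy.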
• Big Data is by nature a distributed processing and distributed analytics method.
• It can handle large and diverse structured, semi-structured, and unstructured datasets.
• It helps reduce the processing time of the growing volumes of data that are common in today's distributed computing environments.
WHY IS BIG DATA BECOMING AN ESSENTIAL KEY IN A CYBERSECURITY STRATEGY?
• The number of connected devices is increasing continuously.
• In 2016 there will be about 18,900 million devices connected to the Internet worldwide.
• Every day we create 2.5 quintillion bytes of data.
• Using Big Data in the cyber security sector offers a number of benefits.
Figure: Big Data in cyber security (detection, agile, psychosocial risks).
• Take, for example, terrorists hacking into secure government networks.
  • Big Data analysis can present information regarding which IP addresses are associated with the individuals.
  • Big Data analysis can also provide as much information about an IT environment as possible.
  • Understanding the underlying IT infrastructure makes it possible to recognize irregular activities and abnormalities that indicate high-risk events.
  • The unusual is what matters most when it comes to security threats.
  • Big Data delivers this information directly to security analysts.
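One simple way to surface "the unusual" is to compare each source against the overall pattern and flag strong outliers. The sketch below uses a z-score over per-IP event counts. The slides do not prescribe this technique; the function name, the threshold, and the sample counts are assumptions for illustration.

```python
from statistics import mean, pstdev


def unusual_sources(counts, threshold=2.0):
    """Return sources whose count lies more than `threshold` standard deviations above the mean."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every source behaves identically, so nothing stands out
    return [src for src, n in counts.items() if (n - mu) / sigma > threshold]


# Hypothetical failed-login counts per source IP address.
failed_logins = {
    "10.0.0.1": 12, "10.0.0.2": 9, "10.0.0.3": 11,
    "10.0.0.4": 10, "10.0.0.5": 8, "10.0.0.6": 240,
}
print(unusual_sources(failed_logins))
```

With only a handful of sources a single outlier inflates the standard deviation, so the threshold has to be modest; at Big Data scale the baseline is far more stable, which is part of the argument for analyzing large volumes of history.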
• Digital forensics (DF) is a set of techniques and methods for collecting, analyzing, and preserving digital data collected from digital media.
• DF uses scientific methods to analyze and interpret electronically stored information (ESI) to reconstruct events.
• Reconstructing events from the beginning involves a huge amount of data, which is where Big Data technology is used.
• In traditional forensics, the forensic examiner analyzes entire hard drives.
• An integrated proactive digital forensic (IPDF) model was proposed for internal and external attacks and overall network security in the context of high-volume network traffic, big data, and virtualized cloud environments.
• The model is a three-layered intrusion detection system (IDS).
• The first layer registers malicious attacks from blacklisted web sites and unauthorized internal user processes.
• The second layer captures the internal unauthorized processes associated with a particular user role.
• The third layer performs statistical analysis over the remaining users' processes for any "low-and-slow" deviations from the reference process patterns associated with the roles of users and groups of users.
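The three layers can be read as a chain of checks in which an event reaches the next layer only if the previous one did not flag it. The following is a rough sketch of that reading, not the IPDF model's actual implementation. The event fields, the site and process lists, the role table, and the z-score test in layer 3 are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical reference data; a real deployment would load these from policy.
BLACKLISTED_SITES = {"bad.example"}
BANNED_PROCESSES = {"keylogger"}
ALLOWED_BY_ROLE = {"analyst": {"python", "excel"}, "clerk": {"excel"}}


def classify(event, baseline, z_threshold=3.0):
    """Pass one event through the three layers and return the first finding."""
    # Layer 1: blacklisted web sites and processes no internal user may run.
    if event["site"] in BLACKLISTED_SITES or event["process"] in BANNED_PROCESSES:
        return "layer1: blacklisted site or unauthorized process"
    # Layer 2: processes that are not authorized for this user's role.
    if event["process"] not in ALLOWED_BY_ROLE.get(event["role"], set()):
        return "layer2: process not authorized for role"
    # Layer 3: statistical deviation from the role's reference pattern.
    history = baseline[event["role"]]
    sigma = pstdev(history)
    if sigma > 0 and abs(event["cpu_seconds"] - mean(history)) / sigma > z_threshold:
        return "layer3: deviation from reference pattern"
    return "ok"


# Hypothetical reference pattern: CPU seconds used by past clerk sessions.
baseline = {"clerk": [10, 12, 11, 9, 10]}
event = {"site": "intranet.example", "process": "excel", "role": "clerk", "cpu_seconds": 30}
print(classify(event, baseline))
```

A single z-score is a crude stand-in for the statistical analysis the model describes; catching genuinely "low-and-slow" behavior would need longer time windows than one event.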
• Big Data analytics can help with fraud detection.
• Big Data can provide security intelligence by shortening the time needed to correlate long-term historical data for forensic purposes.
ADVANTAGES
• Shortens processing time.
• Enhances cyber security.
• Facilitates cyber defense.
• Detects fraud and identity theft.
• Facilitates digital forensics analysis.
DISADVANTAGES
• Privacy invasion
• Data provenance
• Privacy violation
• Distributed analytics
• Cyber security
• Digital forensics
• Health care
• Transportation
• Business sector
• Big Data in the cybersecurity and cyber warfare domains, including non-Internet-connected networks, can be a topic of further research.
• Future work should overcome the remaining challenges in Big Data.
• This seminar identified the early challenges and successes of Big Data in reducing the processing time of growing volumes of data.
• It also showed Big Data applications in distributed analytics, general cyber security, cyber warfare, cyber defense, and digital forensics.
DISCUSSION
Big Data in Distributed Analytics,Cybersecurity And Digital Forensics
