This document provides an overview of big data and its applications in distributed analytics, cyber security, and digital forensics. It discusses how big data can reduce the processing time of large volumes of data in distributed computing environments using Hadoop. Examples of big data applications include using social media, search engine, and aircraft black box data for analysis. The document also outlines the challenges of traditional systems and how distributed big data architectures help address them by allowing data to be processed across clustered computers.
3. Big Data
Data sets so large that traditional applications can’t
process them.
Big data technology can reduce the processing time of large
volumes of data in a distributed computing environment
using Hadoop.
The term often refers to extracting value from big data sets.
Big data plays a big role in critical infrastructure (a term
used by governments to describe assets that are essential
for the functioning of a society and economy).
4. Applications in
Distributed Analytics
Systematic analysis of data across different platforms.
Example: massively multiplayer online games.
Cyber security
Protection of information systems.
Identifies malicious activity hidden in masses of
data.
Digital Forensics
Recovery and investigation of material found in
digital devices.
5. What Comes Under Big Data?
Big data involves the data produced by different devices and
applications.
Social Media Data: Social media such as Facebook and
Twitter hold information and views posted by millions of
people across the globe.
Search Engine Data: Search engines retrieve large amounts
of data from different databases.
Black Box Data: The black box is a component of helicopters,
airplanes, and jets. It captures the voices of the flight crew,
recordings from microphones and earphones, and the
performance information of the aircraft.
7. DIVE-C: Distributed-parallel Virtual Environment
on Cloud Computing Platform.
DIVE-C: for distributed parallel data processing
applications.
It hides the complexity of the cloud, and helps
users to focus on their new applications and
core services.
8. Traditional Approach
• Data is stored in an RDBMS.
• Software interacts with the database.
• The software processes the data and presents it to users.
9. Limitations
Handles only a small volume of data.
Most event logs and other recorded computer
activities were deleted after a fixed retention
period.
Traditional databases are expensive to scale.
The design is difficult to distribute.
10. What is Big Data?
‘Big Data’ refers to very large datasets.
It aims to solve new problems, or old problems in
a better way.
It generates value from stored data.
It cannot be analyzed with traditional computing
techniques.
11. Facebook generates 10 TB of data daily.
Facebook handles 40 billion photos from its user base.
Decoding the human genome originally took 10 years to
process; now it can be achieved in one week.
Twitter generates 7 TB of data daily.
14.
1. Data source layer:
• In this layer, data arrives from different sources.
• It includes customer databases, e-mails, social
media channels, feedback, etc.
2. Data storage layer:
• Here Big Data lives once it is gathered from the
sources.
• Storage: HDFS (Hadoop Distributed File System).
3. Data processing/analysis layer:
• Here the stored data is processed and analyzed
to find something useful.
• Tool: MapReduce.
4. Data output layer:
• Here we get the output.
• Output takes the form of reports, charts, figures, etc.
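The processing layer names MapReduce. As a minimal sketch of the idea, assuming a toy in-memory input rather than a real Hadoop cluster, the map, shuffle, and reduce phases can be imitated in plain Python. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` and the sample records are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Data source layer: hypothetical records standing in for e-mails or feedback.
records = ["big data lives in HDFS", "MapReduce processes big data"]

# Data output layer: here the "report" is simply a dictionary of counts.
counts = word_count(records)
print(counts)
```

On a real cluster the map and reduce steps would run on different machines; only the division of work into the three phases is shown here.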
15. Distributed computing
Refers to the use of distributed systems to solve
computational problems.
A problem is divided into many tasks, each of which is
solved by one or more computers.
Big Data technologies include distributed computational
systems, distributed file systems, massively parallel-
processing (MPP) systems, cloud-based storage and
computing, and data mining based on grid computing,
etc.
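Dividing a problem into tasks can be sketched on a single machine. The example below is an assumption-laden illustration: worker threads from Python's standard `concurrent.futures` module stand in for the computers of a distributed system, and the helper names `split_into_tasks`, `solve_task`, and `distributed_sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(data, n_tasks):
    """Divide the problem into roughly equal chunks, one per task."""
    size = max(1, -(-len(data) // n_tasks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def solve_task(chunk):
    """Work done by one worker: a partial sum of squares."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, n_workers=4):
    """Scatter the chunks to workers, then gather and combine the partial results."""
    tasks = split_into_tasks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(solve_task, tasks))
    return sum(partials)


total = distributed_sum_of_squares(list(range(1, 101)))
print(total)
```

Threads are used only to keep the sketch self-contained; in an actual distributed system each task would be shipped to a separate node.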
16.
Apache Hadoop is a software platform supporting data-
intensive distributed applications.
NoSQL databases are used for large-scale, distributed data
management and database design.
Much of the data in big data is unstructured, that is, it has
no schema, so NoSQL databases are used to access it.
A distributed database (DDB) is a collection of interconnected
databases distributed over a computer network.
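What "no schema" means in practice can be illustrated with a small sketch. It assumes plain Python dictionaries as stand-ins for the documents of a NoSQL store; the records, field names, and the `find` helper are hypothetical and do not correspond to any particular database's API.

```python
# Hypothetical records: each document carries its own fields, with no shared schema.
documents = [
    {"id": 1, "type": "tweet", "user": "alice", "text": "big data talk today"},
    {"id": 2, "type": "photo", "user": "bob", "tags": ["hadoop", "cluster"]},
    {"id": 3, "type": "log", "ip": "192.0.2.7", "status": 403},
]


def find(docs, **criteria):
    """Return documents matching every criterion; a missing field never matches."""
    return [
        doc for doc in docs
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


print(find(documents, user="alice"))
print(find(documents, status=403))
```

A relational table would force all three records into the same columns; here each record keeps only the fields it actually has.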
17.
• A distributed database management system (DBMS)
manages the distributed database and makes the
distribution transparent to the users.
• A parallel DBMS is implemented on a multiprocessor
computer.
• Parallel database systems help improve data
processing performance by parallelizing indexing,
loading, and querying of data.
• Hadoop is a framework for distributed processing of
large data sets across clusters of computers.
18.
Hadoop is also a parallel data processing model intended for
substantial data processing on cluster-based computing
architectures.
Here the work is distributed across clusters of computers, and
the results are collected on a single system.
Figure: distributed processing of Big Data.
19.
In a distributed system, the file system is expected to achieve
the following goals:
• Reliability: The file system can recreate the original data from
the distributed nodes.
• High performance: It can locate the data of interest in a
timely manner on the distributed nodes.
• High availability: It can account for failures and incorporate
mechanisms for monitoring, fault tolerance, error detection,
and automatic recovery.
• Scalability: The file system should permit additional
hardware to be added for more storage capacity and/or better
performance.
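The reliability and availability goals can be illustrated with block replication. The following is a minimal sketch, assuming round-robin placement and a replication factor of 2 (which must not exceed the number of nodes); the node names and the helpers `place_blocks` and `recreate` are hypothetical and do not reproduce how HDFS places replicas.

```python
def place_blocks(blocks, nodes, replication=2):
    """Store each block on `replication` distinct nodes, chosen round-robin."""
    placement = {node: {} for node in nodes}
    for index, block in enumerate(blocks):
        for offset in range(replication):
            node = nodes[(index + offset) % len(nodes)]
            placement[node][index] = block
    return placement


def recreate(placement, n_blocks, failed=()):
    """Rebuild the original data from surviving nodes; fail if a block is lost."""
    recovered = []
    for index in range(n_blocks):
        copies = [
            store[index]
            for node, store in placement.items()
            if node not in failed and index in store
        ]
        if not copies:
            raise RuntimeError(f"block {index} is unavailable")
        recovered.append(copies[0])
    return b"".join(recovered)


data = b"big data is stored in blocks"
blocks = [data[i:i + 8] for i in range(0, len(data), 8)]
nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster

placement = place_blocks(blocks, nodes)
restored = recreate(placement, len(blocks), failed={"node-b"})
print(restored == data)
```

With two copies of every block, the data survives the loss of any single node; losing two nodes at once can make a block unavailable, which is why a higher replication factor trades storage for availability.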
20.
• Big Data is by nature a distributed processing and
distributed analytics method.
• It can handle large and diverse structured, semi-
structured, and unstructured datasets.
• It helps reduce the processing time of the growing
volumes of data that are common in today’s
distributed computing environments.
21. WHY IS BIG DATA AN ESSENTIAL KEY IN A
CYBERSECURITY STRATEGY?
The number of interconnected devices is continuously
increasing.
About 18,900 million devices were expected to be
connected to the Internet worldwide in 2016.
Every day we create 2.5 quintillion bytes of data.
Using Big Data in the cyber security sector offers a
number of benefits.
23.
Take, for example, terrorists hacking into secure government
networks.
• Big Data analysis can present information regarding
which IP addresses are associated with the individuals.
• Big Data analysis can also provide as much information
about an IT environment as possible.
• Understanding the underlying IT infrastructure allows
analysts to recognize irregular activities and abnormalities
that indicate high-risk events.
• The unusual is what matters the most when it comes to
security threats.
• Big Data delivers this information directly to security
analysts.
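Finding "the unusual" in activity data can be sketched with a simple outlier test. The slides do not name a method, so this example assumes one: the modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5. The IP addresses (from a documentation range), the request counts, and the function name `unusual_sources` are hypothetical.

```python
from statistics import median


def unusual_sources(requests_per_ip, threshold=3.5):
    """Return the IPs whose request count is an outlier by modified z-score."""
    counts = list(requests_per_ip.values())
    if not counts:
        return []
    mid = median(counts)
    mad = median(abs(count - mid) for count in counts)
    if mad == 0:
        return []  # no spread to compare against
    return sorted(
        ip for ip, count in requests_per_ip.items()
        if 0.6745 * abs(count - mid) / mad > threshold
    )


# Hypothetical hourly request counts per source address.
requests_per_ip = {
    "192.0.2.1": 12,
    "192.0.2.2": 9,
    "192.0.2.3": 11,
    "192.0.2.4": 10,
    "192.0.2.5": 95,
}

flagged = unusual_sources(requests_per_ip)
print(flagged)
```

The median-based score is used here because a single extreme value inflates the ordinary standard deviation and can hide itself in a small sample.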
24. Digital forensics (DF): a set of techniques and
methods for collecting, analyzing, and preserving digital
data collected from digital media.
DF uses scientific methods to analyze and interpret
electronically stored information (ESI) to reconstruct
events.
Reconstructing events from the beginning involves a huge
amount of data, which is where Big Data technology is used.
In traditional forensics, the forensic examiner analyzes
entire hard drives.
25.
• An integrated proactive digital forensic (IPDF) model was
proposed for internal and external attacks and overall
network security in the context of high-volume network
traffic, big data, and virtualized cloud environments.
• The model is a three-layered intrusion detection system
(IDS).
• The first layer registers malicious attacks from
blacklisted web sites and unauthorized internal user
processes.
• The second layer captures the internal unauthorized
processes associated with a particular user role.
• The third layer performs statistical analysis over the
remaining users’ processes for any “low-and-slow”
deviations from the reference process patterns
associated with the roles of users and groups of users.
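The three layers can be sketched as successive checks on an event. This is a simplified illustration, not the IPDF model itself: the blacklist, the role-to-process table, the baseline of daily megabytes sent, the z-score limit of 3, and all function names are assumptions made for the example.

```python
from statistics import mean, pstdev

BLACKLISTED_SITES = {"malware.example"}  # hypothetical blacklist
ALLOWED_BY_ROLE = {  # hypothetical role policy
    "clerk": {"browser", "mail"},
    "admin": {"browser", "mail", "ssh"},
}


def layer1(event):
    """Layer 1: traffic involving a blacklisted site."""
    return event["site"] in BLACKLISTED_SITES


def layer2(event):
    """Layer 2: a process that the user's role is not authorized to run."""
    return event["process"] not in ALLOWED_BY_ROLE.get(event["role"], set())


def layer3(event, baseline, z_limit=3.0):
    """Layer 3: statistical deviation from the reference pattern for the role."""
    history = baseline.get((event["role"], event["process"]), [])
    if not history:
        return False  # nothing to compare against
    spread = pstdev(history)
    if spread == 0:
        return event["mb_sent"] != history[0]
    return abs(event["mb_sent"] - mean(history)) / spread > z_limit


def classify(event, baseline):
    """Apply the layers in order and report the first one that fires."""
    if layer1(event):
        return "layer 1: blacklisted site"
    if layer2(event):
        return "layer 2: process not allowed for role"
    if layer3(event, baseline):
        return "layer 3: statistical deviation"
    return "ok"


# Hypothetical reference pattern: megabytes sent per day by clerks using mail.
baseline = {("clerk", "mail"): [10, 12, 11, 9, 10, 12, 11]}

event = {"role": "clerk", "process": "mail", "site": "intranet.example", "mb_sent": 18}
print(classify(event, baseline))
```

Only events that pass the first two layers reach the statistical check, which mirrors the idea that the third layer examines the remaining processes.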
26.
• Big Data analytics can provide help for fraud
detection.
• Big Data can provide security intelligence by
shortening the time of correlating long-term
historical data for forensic purposes.
28. Distributed Analytics
Cyber security
Digital Forensics
Health care
Transportation
Business sector
29. Big Data in cybersecurity and cyber warfare
domains with non-Internet-connected
networks, etc. can be topics for further research.
Future work should address the remaining
challenges in Big Data.
30. This seminar identifies the early challenges and
successes of Big Data in reducing the processing
time of growing volumes of data.
It shows Big Data applications in distributed
analytics, general cyber security, cyber warfare,
cyber defense, and digital forensics.