5. INTRODUCTION
• An algorithm is a sequence of steps that
takes input from the user and, after some
computation, produces an output.
• A parallel algorithm is an
algorithm that can execute several
instructions simultaneously on
different processing devices and then
combine all the individual outputs to
produce the final result.
• The Parallel Random Access Machine
(PRAM) is the model assumed by most
parallel algorithms.
• In this model, multiple processors are
attached to a single block of memory.
6. PRAM MODEL
• A set of processors of the same type.
• All the processors share a common
memory unit.
• Processors can communicate among
themselves only through the shared
memory.
• A memory access unit (MAU) connects
the processors to the single shared
memory.
7. DIFFERENT TYPES OF PRAM
• Exclusive Read Exclusive Write (EREW) − Here no two processors
are allowed to read from or write to the same memory location at
the same time.
• Exclusive Read Concurrent Write (ERCW) − Here no two processors
are allowed to read from the same memory location at the same
time, but are allowed to write to the same memory location at the
same time.
• Concurrent Read Exclusive Write (CREW) − Here all the processors
are allowed to read from the same memory location at the same
time, but are not allowed to write to the same memory location at
the same time.
• Concurrent Read Concurrent Write (CRCW) − All the processors are
allowed to read from or write to the same memory location at the
same time.
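The EREW restriction above can be illustrated with a small sketch (not from the slides): a pairwise parallel sum in which, in every synchronous round, each simulated processor reads two distinct memory cells and writes one distinct cell, so no two processors ever access the same location at the same time. The function name and the list-as-shared-memory representation are illustrative assumptions.

```python
def erew_parallel_sum(values):
    # Shared memory is modelled as a Python list; each loop
    # iteration inside a round stands for one processor's work.
    mem = list(values)
    n = len(mem)
    step = 1
    while step < n:  # O(log n) synchronous rounds
        # Processors i = 0, 2*step, 4*step, ... act "in parallel":
        # each reads mem[i] and mem[i + step] and writes mem[i].
        # All locations touched in a round are distinct, which is
        # exactly the EREW requirement.
        for i in range(0, n - step, 2 * step):
            mem[i] = mem[i] + mem[i + step]
        step *= 2
    return mem[0]

print(erew_parallel_sum([3, 1, 4, 1, 5, 9, 2, 6]))  # 31
```

Under CREW, by contrast, several processors could read the same cell in one round (e.g. all reading a broadcast value), while writes would still have to go to distinct cells.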
8. Hadoop Ecosystem
Introduction: The Hadoop ecosystem is a
platform, or suite, that provides various
services to solve big data problems. It
includes Apache projects as well as various
commercial tools and solutions. There
are four major elements of Hadoop:
• HDFS
• MapReduce
• YARN
• Hadoop Common
Most of the other tools or solutions are
used to supplement or support these major
elements. All these tools work collectively
to provide services such as ingestion,
analysis, storage and maintenance of
data.
9. COMPONENTS
OF HADOOP
The following components collectively
form the Hadoop ecosystem:
• HDFS: Hadoop Distributed File System
• YARN: Yet Another Resource Negotiator
• MapReduce: programming-based data
processing
• Spark: in-memory data processing
• Pig, Hive: query-based data processing
services
• HBase: NoSQL database
• Mahout, Spark MLlib: machine
learning algorithm libraries
• Solr, Lucene: searching and indexing
• ZooKeeper: cluster management
• Oozie: job scheduling
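The MapReduce component listed above can be sketched in plain Python as a word count, the classic example of the model. This is a hypothetical, single-machine illustration only; a real Hadoop job would define Mapper and Reducer classes and run on YARN over data in HDFS.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big insight", "big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'insight': 1}
```

The three phases mirror the framework's division of labour: map and reduce are user code, while the shuffle (grouping by key) is what the framework performs between them.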
12. Data Ingestion → Data Storage →
Data Processing → Data Analysis and
Exploration → Data Presentation and
Visualization
13. STAGES OF BIG DATA PROCESSING
Data Ingestion:
• Description: The process of collecting and importing raw data from
diverse sources into a data storage system.
• Objective: Capture data from various origins, including databases,
logs, sensors, and external feeds.
Data Storage:
• Description: Storing the ingested data in a format suitable for
processing and analysis.
• Objective: Establish a scalable and distributed storage system capable
of handling massive amounts of data.
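The ingestion and storage stages above can be sketched with a tiny, hypothetical example: records arriving from two different sources are tagged with their origin and written into one storage layer. The plain dict keyed by record id is only a stand-in for a distributed store such as HDFS.

```python
def ingest(source_name, raw_records, store):
    # Tag each incoming record with its source before storing it,
    # so later stages can trace where the data came from.
    for record_id, payload in raw_records:
        store[record_id] = {"source": source_name, "data": payload}

store = {}  # stand-in for a scalable, distributed storage system
ingest("sensor_feed", [("s1", {"temp": 21.5})], store)
ingest("web_logs", [("w1", {"path": "/home"})], store)
print(len(store))  # 2
```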
14. STAGES OF BIG DATA PROCESSING
Data Processing:
• Description: Transforming and processing the stored data to derive
valuable insights.
• Objective: Apply batch or real-time processing to cleanse, aggregate,
and analyze data.
Data Analysis and Exploration:
• Description: Analyzing the processed data to extract patterns, trends,
and meaningful information.
• Objective: Use querying and exploration tools to gain insights and
make data-driven decisions.
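The processing and analysis stages can be illustrated with a hedged, small-scale sketch: cleanse raw readings by dropping invalid ones, then aggregate to a per-sensor average. This is the kind of batch step that Spark or MapReduce would run at scale; the data and names here are invented for illustration.

```python
raw = [("a", 20.0), ("a", None), ("b", 30.0), ("a", 22.0)]

# Cleanse: drop records with missing values.
cleaned = [(key, value) for key, value in raw if value is not None]

# Aggregate: accumulate (count, total) per key.
totals = {}
for key, value in cleaned:
    count, total = totals.get(key, (0, 0.0))
    totals[key] = (count + 1, total + value)

# Analyze: derive a per-key average from the aggregates.
averages = {k: total / count for k, (count, total) in totals.items()}
print(averages)  # {'a': 21.0, 'b': 30.0}
```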
15. STAGES OF BIG DATA PROCESSING
Data Presentation and Visualization:
• Description: Presenting the analyzed data to end-users through
reports, dashboards, or visualizations.
• Objective: Communicate findings effectively, enabling stakeholders to
understand and act upon the insights.
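As a minimal sketch of the presentation stage, analyzed metrics can be rendered as a text "bar chart", a stand-in for a real dashboard or visualization tool; the function and metric names are illustrative assumptions.

```python
def render_bars(metrics, width=20):
    # Scale each metric against the largest value and draw a
    # proportional bar of '#' characters next to its name.
    peak = max(metrics.values())
    lines = []
    for name, value in metrics.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{name:<8} {bar} {value}")
    return "\n".join(lines)

print(render_bars({"ingested": 120, "stored": 118, "errors": 2}))
```

In practice this role is filled by dashboarding tools; the point is only that the final stage turns numbers into a form stakeholders can read at a glance.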