2. Contents
Introduction.
Components.
Methods.
What is Hadoop.
Hadoop Offers.
MapReduce.
What is HPCC.
HPCC Components.
Big Data Samples.
Difference between HPCC and Hadoop.
Privacy and Security Issues.
Knowledge Discovery.
Conclusion.
3. Introduction
Big data and its analysis are at the center of modern science and
business.
This data is generated from online transactions, emails, videos,
audio, images, etc.
Stored in databases, it grows massively and becomes difficult to
capture, store, manage and share.
The volume of data is predicted to double every two years, reaching
about 8 zettabytes by 2015.
4. Components
Variety.
Variety makes big data really big.
Big data comes from a great variety of sources.
It generally comes in three types: structured, unstructured and
semi-structured.
Structured data enters a data warehouse already tagged and is
easily sorted.
Unstructured data is random and difficult to analyze.
5. Components
Semi-structured data does not conform to fixed fields but contains
tags to separate data elements.
Volume.
Volume, or the size of the data, is now measured in terabytes,
petabytes and even zettabytes.
Velocity.
The flow of data is massive and continuous.
Big data should be used as it streams into the organization in order to
maximize its value.
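The three data types described under Variety can be illustrated with a minimal sketch. The `SCHEMA` fields and the `classify` helper are hypothetical, and the rule used (fields matching a fixed schema means structured, tagged JSON without fixed fields means semi-structured, anything else unstructured) is a simplifying assumption.

```python
import json

# Hypothetical fixed schema, standing in for a data warehouse table.
SCHEMA = {"id", "name", "amount"}

def classify(record: str) -> str:
    """Roughly classify a raw text record by how much structure it carries."""
    try:
        parsed = json.loads(record)
    except json.JSONDecodeError:
        # No tags or fields to parse: treat as unstructured text.
        return "unstructured"
    if isinstance(parsed, dict):
        # Fields match the fixed schema exactly: structured.
        if set(parsed) == SCHEMA:
            return "structured"
        # Tagged elements but no fixed fields: semi-structured.
        return "semi-structured"
    return "unstructured"
```

For example, a record with exactly the `id`, `name` and `amount` fields is reported as structured, a JSON object with other tags as semi-structured, and free text such as an email body as unstructured.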
6. Methods
Organizations face large amounts of new data that arrive in many different forms.
Big data has generated a whole new industry of supporting
architectures such as MapReduce.
MapReduce is a programming framework for distributed computing.
It was created by Google and uses a divide-and-conquer method.
MapReduce can be divided into two stages.
Map Step.
Reduce Step.
HPCC.
Hadoop.
7. What is Hadoop?
Hadoop is an open-source software framework.
It is a Java-based framework.
Essentially it accomplishes two tasks: massive data storage and faster
processing.
It is not a replacement for a database, data warehouse or ETL.
8. Hadoop Offers
HDFS - responsible for storing data on the clusters.
MapReduce.
HBase - distributed database for random read/write access.
Pig - high-level data processing system.
Hive - data warehouse application.
Sqoop - tool for transferring data between relational databases and Hadoop.
9. MapReduce
MapReduce is a programming framework for distributed computing.
It was created by Google and uses a divide-and-conquer method.
MapReduce can be divided into two stages.
Map Step.
Reduce Step.
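The two stages can be illustrated with a minimal single-process word count in Python. This is only a sketch of the idea, not the Hadoop API: the function names `map_step`, `shuffle`, `reduce_step` and `word_count` are hypothetical, and a real framework would run the map and reduce calls in parallel across a cluster.

```python
from collections import defaultdict

def map_step(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document is mapped independently (the "divide" part).
    intermediate = []
    for doc in documents:
        intermediate.extend(map_step(doc))
    # The grouped values for each key are then combined (the "conquer" part).
    return dict(reduce_step(k, v) for k, v in shuffle(intermediate).items())
```

Calling `word_count` on a list of strings returns a dictionary mapping each lowercased word to the number of times it occurs across all inputs.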
11. What is HPCC?
HPCC (High-Performance Computing Cluster) is also known as DAS (Data Analytics Supercomputer).
HPCC Systems is a distributed, data-intensive, open-source computing
platform that provides big data workflow management services.
Unlike Hadoop, HPCC's data model is defined by the user.
The HPCC platform does not require third-party tools such as Greenplum,
Cassandra, RDBMS or Oozie.
12. HPCC Components
HPCC Data Refinery
Massively parallel ETL engine that enables data integration
and provides batch oriented data manipulation.
HPCC Data Delivery Engine
High throughput, ultra fast, low latency.
Enterprise Control Language
Easy-to-use programming language (ECL) optimized for big data
operations and query transactions.
13. Big Data Samples
Biological science.
Life sciences.
Medical records.
Scientific research.
Mobile phones.
Government.
15. Knowledge Discovery
Knowledge discovery consists of operations designed to extract information
from complicated data sets.
Removing noise, handling missing data fields and accounting for time
information.
Mapping goals to particular data mining methods.
Choosing a data mining algorithm and a method for searching for data
patterns.
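The cleaning step (removing noise and handling missing fields) could look like the following minimal sketch. The `clean` function, the range-based definition of noise and the choice of median filling are assumptions made for illustration; the slides do not prescribe a specific technique.

```python
from statistics import median

def clean(readings, low, high):
    """Drop out-of-range values as noise, then fill missing fields with the median."""
    # Noise removal: keep missing entries (None) for now, discard values outside [low, high].
    kept = [r for r in readings if r is None or low <= r <= high]
    known = [r for r in kept if r is not None]
    if not known:
        # Nothing reliable is left to estimate missing values from.
        return []
    fill = median(known)
    # Missing-field handling: replace each None with the median of the known values.
    return [fill if r is None else r for r in kept]
```

Median filling is used here because it is less sensitive to remaining outliers than the mean; other strategies, such as dropping incomplete records, are equally valid depending on the data.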
16. Privacy and Security Issues
Big data stores must be properly controlled.
To ensure authentication, a cryptographically secure communication
framework has to be implemented.
Data must be controlled as specified by regulations, such as
imposed storage periods.
Organizations have to consider the legal ramifications of storing data.
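One building block of authenticated communication is a message authentication code. The sketch below uses Python's standard `hmac` module with a shared secret key; the `sign` and `verify` helpers are hypothetical names, and this covers message authentication only. A real deployment would rely on an established protocol such as TLS rather than a hand-rolled scheme.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag showing the sender holds the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison to avoid timing leaks."""
    return hmac.compare_digest(sign(key, message), tag)
```

The receiver recomputes the tag with its own copy of the key and accepts the message only if `verify` returns `True`; a modified message or a different key produces a mismatch.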
18. Conclusion
Big data is difficult to manage.
Data must be kept in a secure manner.
It is used by more and more organizations.