SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
© VH Education Services Pvt. Ltd.
http://venturehire.co
Tools for Hadoop and NoSQL
In our previous blogs, we defined Big Data and Big Data Analytics. In this we are going to
discuss the tools being used for solving big data from technology standpoint – Hadoop
(HDFS, MapReduce) which is an open source computing framework and NoSQL which is
non-relational database.
Big Data Technologies and Tools
HADOOP
High-availability distributed object-oriented platform or “Hadoop” is a software
framework which analyze structured and unstructured data and distribute applications on
different servers. Below is an overall Hadoop architecture-
Source: Cisco
Basic Application of Hadoop
Hadoop is used in maintaining, scaling, error handling, self healing and securing large
scale of data. These data can be structured or unstructured. What I mean to say that if the
data is large then, traditional systems are unable to handle and finally Hadoop comes in
the picture. Below are some basic features of Hadoop -
© VH Education Services Pvt. Ltd.
http://venturehire.co
 Hadoop maintains and secures the data by storing and keeping its replica.
 It is focused on scaling according to data usage.
 It can detect and delete the failed task and as well as failed transaction of data.
 It not only recovers the data but also automatically restores the data at its place.
Typical Hadoop Platform Stack – HDFS + Hive + HBase + Pig
HDFS stands for Hadoop Distributed File System – is part of Hadoop and known as a
special file system which deals with distribution and storage of large sets of data. HDFS
stores file as sequence of same size of block except the last block. It also deals with
hardware failure and smoothen the data handling.
Hive – Hive was initiated by Facebook. Hive is data warehouse tool which is based on
Hadoop and converts query language into MapReduce jobs. It deals with the storage ,
analysis and queries of large set of data. Query language in hive used as HQL statement.
Hive Query Language is similar to standard SQL statement.
Hbase – Hbase is a Hadoop application which runs on top of HDFS. Hbase system
represents set of table but Hbase is column oriented database management system i.e.
different from the row oriented database management system. Generally if we talk about
database then we think of relational database system but unlikely Hbase is not relational
database at all and also it doesn’t support Structured Query Language like SQL. Java is
prefered language use for Hbase application. One most important feature of Hbase is to
real time read or write to large set of data.
Pig – initiated by Yahoo and in 2007 it became open source. Do you know why it is named
as Pig? It is called as Pig because it can handle any type of data!! strange but true. Pig is
high level procedural programming platform developed for simplifying large data sets
query in Hadoop and MapReduce. Pig have two component one is PigLatin which is
programming language and other is run time environment where PigLatin programs are
executed.
Advantage and Disadvantage of Hadoop:
© VH Education Services Pvt. Ltd.
http://venturehire.co
NoSQL
As the term says NoSQL, it means non relational or Non-SQL database, refer to Hbase,
Cassandra, MongoDb, Riak, CouchDB. It is not based on table formats and that’s the
reason we don’t use SQL for data access. A traditional database deals with structured data
while a relational database deals with the vertical as well as horizontal storage system.
NoSQL deals with the unstructured, unpredictable kind of data according to the system
requirement.
NoSQL Technologies HBase (part of the Hadoop
ecosystem),Cassandra, MongoDB, Riak, CouchDB.
Cassandra database is used to handle the large set of data when we need to scale the
database with high performance. Cassandra deals with the fault tolerance and replication
of the data. With this we can go deeper in columns, supercolumns and more. It is a partial
relational database system, supports best query capability but don’t have joins feature. It
follows the column family model map with two dimensional and 3 dimensional. 2D model
includes column family with some column in it, while 3D model created by associating
super column in column family.
© VH Education Services Pvt. Ltd.
http://venturehire.co
MongoDB is an agile NoSQL document database, unlike the traditional database which
store the data in rows and column, MongoDB stores the document data in binary form of
JSON document which is also known as BSON format. It is used for high scalability,
availability and performance. In MongoDB dynamic schemas are the unit of database,
which found in document where set of documents are found in collection while set of
collection makes the database.
Riak is open source NoSQL database system which is designed for availability, fault
tolerance, scalability and high performance. It provides three kind of storage key/value
store, document oriented store and web shaped store. It also store document in the JSON
format. when we talk about data modeling then we will see that there is no ‘Master’, only
nodes are there. All nodes are same and don’t have different responsibility.
CouchDB is open source NoSQL database ,distributed, and schemaless.It stores the
document data in the JSON format. It also provide feature related with web like access of
document from the web browser through HTTP. Javascript can also be use to modify the
document. In CouchDB document is combination of strings “keys” and “values”.
Advantage and Disadvantage of NoSQL:
Tools from Companies – Cassandra, Riak, Redis, HBase, Oracle, membase, mongoDB.
© VH Education Services Pvt. Ltd.
http://venturehire.co
Related Article:
1. What is Big Data and How Fast is it Growing?
2. Why Big Data is Next Big Thing?
Links which can be useful to you regarding this topic are
Big Data Training Course in Bangalore
Big Data Analytics Training in Bangalore

Contenu connexe

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

En vedette

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Tools for hadoop and no sql

  • 1. © VH Education Services Pvt. Ltd. http://venturehire.co Tools for Hadoop and NoSQL In our previous blogs, we defined Big Data and Big Data Analytics. In this we are going to discuss the tools being used for solving big data from technology standpoint – Hadoop (HDFS, MapReduce) which is an open source computing framework and NoSQL which is non-relational database. Big Data Technologies and Tools HADOOP High-availability distributed object-oriented platform or “Hadoop” is a software framework which analyze structured and unstructured data and distribute applications on different servers. Below is an overall Hadoop architecture- Source: Cisco Basic Application of Hadoop Hadoop is used in maintaining, scaling, error handling, self healing and securing large scale of data. These data can be structured or unstructured. What I mean to say that if the data is large then, traditional systems are unable to handle and finally Hadoop comes in the picture. Below are some basic features of Hadoop -
  • 2. © VH Education Services Pvt. Ltd. http://venturehire.co  Hadoop maintains and secures the data by storing and keeping its replica.  It is focused on scaling according to data usage.  It can detect and delete the failed task and as well as failed transaction of data.  It not only recovers the data but also automatically restores the data at its place. Typical Hadoop Platform Stack – HDFS + Hive + HBase + Pig HDFS stands for Hadoop Distributed File System – is part of Hadoop and known as a special file system which deals with distribution and storage of large sets of data. HDFS stores file as sequence of same size of block except the last block. It also deals with hardware failure and smoothen the data handling. Hive – Hive was initiated by Facebook. Hive is data warehouse tool which is based on Hadoop and converts query language into MapReduce jobs. It deals with the storage , analysis and queries of large set of data. Query language in hive used as HQL statement. Hive Query Language is similar to standard SQL statement. Hbase – Hbase is a Hadoop application which runs on top of HDFS. Hbase system represents set of table but Hbase is column oriented database management system i.e. different from the row oriented database management system. Generally if we talk about database then we think of relational database system but unlikely Hbase is not relational database at all and also it doesn’t support Structured Query Language like SQL. Java is prefered language use for Hbase application. One most important feature of Hbase is to real time read or write to large set of data. Pig – initiated by Yahoo and in 2007 it became open source. Do you know why it is named as Pig? It is called as Pig because it can handle any type of data!! strange but true. Pig is high level procedural programming platform developed for simplifying large data sets query in Hadoop and MapReduce. Pig have two component one is PigLatin which is programming language and other is run time environment where PigLatin programs are executed. Advantage and Disadvantage of Hadoop:
  • 3. © VH Education Services Pvt. Ltd. http://venturehire.co NoSQL As the term says NoSQL, it means non relational or Non-SQL database, refer to Hbase, Cassandra, MongoDb, Riak, CouchDB. It is not based on table formats and that’s the reason we don’t use SQL for data access. A traditional database deals with structured data while a relational database deals with the vertical as well as horizontal storage system. NoSQL deals with the unstructured, unpredictable kind of data according to the system requirement. NoSQL Technologies HBase (part of the Hadoop ecosystem),Cassandra, MongoDB, Riak, CouchDB. Cassandra database is used to handle the large set of data when we need to scale the database with high performance. Cassandra deals with the fault tolerance and replication of the data. With this we can go deeper in columns, supercolumns and more. It is a partial relational database system, supports best query capability but don’t have joins feature. It follows the column family model map with two dimensional and 3 dimensional. 2D model includes column family with some column in it, while 3D model created by associating super column in column family.
  • 4. © VH Education Services Pvt. Ltd. http://venturehire.co MongoDB is an agile NoSQL document database, unlike the traditional database which store the data in rows and column, MongoDB stores the document data in binary form of JSON document which is also known as BSON format. It is used for high scalability, availability and performance. In MongoDB dynamic schemas are the unit of database, which found in document where set of documents are found in collection while set of collection makes the database. Riak is open source NoSQL database system which is designed for availability, fault tolerance, scalability and high performance. It provides three kind of storage key/value store, document oriented store and web shaped store. It also store document in the JSON format. when we talk about data modeling then we will see that there is no ‘Master’, only nodes are there. All nodes are same and don’t have different responsibility. CouchDB is open source NoSQL database ,distributed, and schemaless.It stores the document data in the JSON format. It also provide feature related with web like access of document from the web browser through HTTP. Javascript can also be use to modify the document. In CouchDB document is combination of strings “keys” and “values”. Advantage and Disadvantage of NoSQL: Tools from Companies – Cassandra, Riak, Redis, HBase, Oracle, membase, mongoDB.
  • 5. © VH Education Services Pvt. Ltd. http://venturehire.co Related Article: 1. What is Big Data and How Fast is it Growing? 2. Why Big Data is Next Big Thing? Links which can be useful to you regarding this topic are Big Data Training Course in Bangalore Big Data Analytics Training in Bangalore