SlideShare une entreprise Scribd logo
1  sur  8
Msquare Systems Inc.,
What is Hadoop?

Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain
insight from massive amounts of structured and unstructured data quickly and without significant investment.

Hadoop is designed to run on commodity hardware and can scale up or down without system interruption. It consists
of three main functions: storage, processing and resource management.
Core services on Hadoop

MapReduce: MapReduce is a framework for writing applications that process large amounts of structured and unstructured data in
parallel across a cluster of several machines in a reliable and fault-tolerant.

HDFS: Hadoop Distributed File System is a java-based file system that provides scalable and reliable data storage for large group of
clusters.

Hadoop Yarn: Yarn is a next generation framework for Hadoop Data processing extending MapReduce capabilities by supporting nonMapReduce workloads associated with other programming models.

Apache Tez: Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic
graph) of tasks for near real-time big data processing
Hadoop Data Services

Apache Pig: Its platform for processing and analyzing large data sets.

Apache Hbase: A column-oriented No SQL data storage system that provides random real-time read/write access to big data for user
applications.

Apache Hive: Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and add-hoc queries via
SQL-like interface for large datasets stored in HDFS.

Apache Flume: Allows efficiently aggregating and moving large amounts of log data from many different sources to Hadoop.

Apache Mahout: Apache Mahout scalable machine learning algorithms for hadoop, which aids with data science for clustering, classification
and batch based collaborative filtering
Hadoop Data Services

Apache Accumulo : Accumulo is a high performance data storage and retrieval system with cell-level access control. It is a scalable
implementation of Google’s Big Table design that works on top of Apache Hadoop and Apache ZooKeeper.

Apache Storm : Storm is a distributed real-time computation system for processing fast, large streams of data adding reliable real-time data
processing capabilities to Apache Hadoop 2.x.
Apache Catalog : A table and metadata management service that provides a centralized way for data processing systems to understand the
structure and location of the data stored within Apache Hadoop

Apache Sqoop : Sqoop is a tool that speeds and eases movement of data in and out of Hadoop. It provides a reliable parallel load for
various, popular enterprise data sources.
Hadoop Operational Services

Apache Zookeeper: A highly available system for coordinating distributing processes.

Apache Falcon: Falcon is a data management framework for simplifying data lifecycle management and processing pipelines on Apache
hadoop.

Apache Ambari: Open source installation lifecycle management, administration, and monitoring system for Apache Hadoop Clusters.

Apache knox: “Knox” gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster.

Apache Oozie: Oozie Java web application used to schedule Apache Hadoop Jobs. Oozie combines multiple jobs sequentially into one logical
unit of work.
What Hadoop can, and can't do

What Hadoop can't do
You can't use Hadoop for
 Structured data
 Transactional data

What Hadoop can do
You can use Hadoop for
 Big Data
Support & Partner
Getting Started or Support –

Muthu Natarajan

muthu.n@msquaresystems.com

www.msquaresystems.com

Phone: 212-941-6000

Contenu connexe

Tendances

The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.Data Con LA
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoopOmar Jaber
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irdatastack
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceeakasit_dpu
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1Abbas Maazallahi
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 

Tendances (20)

The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
 
Introduction to Apache hadoop
Introduction to Apache hadoopIntroduction to Apache hadoop
Introduction to Apache hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Big Data and Hadoop - An Introduction
Big Data and Hadoop - An IntroductionBig Data and Hadoop - An Introduction
Big Data and Hadoop - An Introduction
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
HDFS
HDFSHDFS
HDFS
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 

En vedette

Processing cassandra datasets with hadoop streaming based approaches
Processing cassandra datasets with hadoop streaming based approachesProcessing cassandra datasets with hadoop streaming based approaches
Processing cassandra datasets with hadoop streaming based approachesLeMeniz Infotech
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)Yu Liu
 
Silicon Halton - Meetup 72 - Co-Accelerate Your Business
Silicon Halton - Meetup 72 - Co-Accelerate Your BusinessSilicon Halton - Meetup 72 - Co-Accelerate Your Business
Silicon Halton - Meetup 72 - Co-Accelerate Your BusinessSilicon Halton
 
Silicon Halton Meetup 75: Augmented Reality, A Primer
Silicon Halton Meetup 75: Augmented Reality, A PrimerSilicon Halton Meetup 75: Augmented Reality, A Primer
Silicon Halton Meetup 75: Augmented Reality, A PrimerSilicon Halton
 
Silicon Halton Meetup 77 - Work Unscripted
Silicon Halton Meetup 77 - Work UnscriptedSilicon Halton Meetup 77 - Work Unscripted
Silicon Halton Meetup 77 - Work UnscriptedSilicon Halton
 
Silicon Halton Meetup 83 - Sr. HR Panel
Silicon Halton Meetup 83 - Sr. HR PanelSilicon Halton Meetup 83 - Sr. HR Panel
Silicon Halton Meetup 83 - Sr. HR PanelSilicon Halton
 
Silicon Halton Meetup 79 - Chart of Accounts
Silicon Halton Meetup 79 - Chart of AccountsSilicon Halton Meetup 79 - Chart of Accounts
Silicon Halton Meetup 79 - Chart of AccountsSilicon Halton
 

En vedette (7)

Processing cassandra datasets with hadoop streaming based approaches
Processing cassandra datasets with hadoop streaming based approachesProcessing cassandra datasets with hadoop streaming based approaches
Processing cassandra datasets with hadoop streaming based approaches
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
 
Silicon Halton - Meetup 72 - Co-Accelerate Your Business
Silicon Halton - Meetup 72 - Co-Accelerate Your BusinessSilicon Halton - Meetup 72 - Co-Accelerate Your Business
Silicon Halton - Meetup 72 - Co-Accelerate Your Business
 
Silicon Halton Meetup 75: Augmented Reality, A Primer
Silicon Halton Meetup 75: Augmented Reality, A PrimerSilicon Halton Meetup 75: Augmented Reality, A Primer
Silicon Halton Meetup 75: Augmented Reality, A Primer
 
Silicon Halton Meetup 77 - Work Unscripted
Silicon Halton Meetup 77 - Work UnscriptedSilicon Halton Meetup 77 - Work Unscripted
Silicon Halton Meetup 77 - Work Unscripted
 
Silicon Halton Meetup 83 - Sr. HR Panel
Silicon Halton Meetup 83 - Sr. HR PanelSilicon Halton Meetup 83 - Sr. HR Panel
Silicon Halton Meetup 83 - Sr. HR Panel
 
Silicon Halton Meetup 79 - Chart of Accounts
Silicon Halton Meetup 79 - Chart of AccountsSilicon Halton Meetup 79 - Chart of Accounts
Silicon Halton Meetup 79 - Chart of Accounts
 

Similaire à Hadoop white papers

Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Muthu Natarajan
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellKhalid Imran
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big pictureJ S Jodha
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceNeev Technologies
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoopManoj Jangalva
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache HadoopOleksiy Krotov
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkLaxmi8
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersMrigendra Sharma
 

Similaire à Hadoop white papers (20)

Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
 
In15orlesss hadoop
In15orlesss hadoopIn15orlesss hadoop
In15orlesss hadoop
 
Hadoop vs Apache Spark
Hadoop vs Apache SparkHadoop vs Apache Spark
Hadoop vs Apache Spark
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Case study on big data
Case study on big dataCase study on big data
Case study on big data
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 

Plus de Muthu Natarajan

Understanding about relational database m-square systems inc
Understanding about relational database m-square systems incUnderstanding about relational database m-square systems inc
Understanding about relational database m-square systems incMuthu Natarajan
 
Agile methodologiesvswaterfall
Agile methodologiesvswaterfallAgile methodologiesvswaterfall
Agile methodologiesvswaterfallMuthu Natarajan
 
Business intelligence data analytics-visualization
Business intelligence data analytics-visualizationBusiness intelligence data analytics-visualization
Business intelligence data analytics-visualizationMuthu Natarajan
 
Business intelligence, Data Analytics & Data Visualization
Business intelligence, Data Analytics & Data VisualizationBusiness intelligence, Data Analytics & Data Visualization
Business intelligence, Data Analytics & Data VisualizationMuthu Natarajan
 
Social Media Strategies and Social Marketing
Social Media Strategies and Social MarketingSocial Media Strategies and Social Marketing
Social Media Strategies and Social MarketingMuthu Natarajan
 
Cloud Computing & Benefits
Cloud Computing & BenefitsCloud Computing & Benefits
Cloud Computing & BenefitsMuthu Natarajan
 

Plus de Muthu Natarajan (8)

Understanding about relational database m-square systems inc
Understanding about relational database m-square systems incUnderstanding about relational database m-square systems inc
Understanding about relational database m-square systems inc
 
Agile methodologiesvswaterfall
Agile methodologiesvswaterfallAgile methodologiesvswaterfall
Agile methodologiesvswaterfall
 
Business intelligence data analytics-visualization
Business intelligence data analytics-visualizationBusiness intelligence data analytics-visualization
Business intelligence data analytics-visualization
 
Business intelligence, Data Analytics & Data Visualization
Business intelligence, Data Analytics & Data VisualizationBusiness intelligence, Data Analytics & Data Visualization
Business intelligence, Data Analytics & Data Visualization
 
Social Media Strategies and Social Marketing
Social Media Strategies and Social MarketingSocial Media Strategies and Social Marketing
Social Media Strategies and Social Marketing
 
Protect your website
Protect your websiteProtect your website
Protect your website
 
Hr presentation
Hr presentationHr presentation
Hr presentation
 
Cloud Computing & Benefits
Cloud Computing & BenefitsCloud Computing & Benefits
Cloud Computing & Benefits
 

Dernier

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Dernier (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Hadoop white papers

  • 2. What is Hadoop? Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment. Hadoop is designed to run on commodity hardware and can scale up or down without system interruption. It consists of three main functions: storage, processing and resource management.
  • 3. Core services on Hadoop MapReduce: MapReduce is a framework for writing applications that process large amounts of structured and unstructured data in parallel across a cluster of several machines in a reliable and fault-tolerant. HDFS: Hadoop Distributed File System is a java-based file system that provides scalable and reliable data storage for large group of clusters. Hadoop Yarn: Yarn is a next generation framework for Hadoop Data processing extending MapReduce capabilities by supporting nonMapReduce workloads associated with other programming models. Apache Tez: Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks for near real-time big data processing
  • 4. Hadoop Data Services Apache Pig: Its platform for processing and analyzing large data sets. Apache Hbase: A column-oriented No SQL data storage system that provides random real-time read/write access to big data for user applications. Apache Hive: Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and add-hoc queries via SQL-like interface for large datasets stored in HDFS. Apache Flume: Allows efficiently aggregating and moving large amounts of log data from many different sources to Hadoop. Apache Mahout: Apache Mahout scalable machine learning algorithms for hadoop, which aids with data science for clustering, classification and batch based collaborative filtering
  • 5. Hadoop Data Services Apache Accumulo : Accumulo is a high performance data storage and retrieval system with cell-level access control. It is a scalable implementation of Google’s Big Table design that works on top of Apache Hadoop and Apache ZooKeeper. Apache Storm : Storm is a distributed real-time computation system for processing fast, large streams of data adding reliable real-time data processing capabilities to Apache Hadoop 2.x. Apache Catalog : A table and metadata management service that provides a centralized way for data processing systems to understand the structure and location of the data stored within Apache Hadoop Apache Sqoop : Sqoop is a tool that speeds and eases movement of data in and out of Hadoop. It provides a reliable parallel load for various, popular enterprise data sources.
  • 6. Hadoop Operational Services Apache Zookeeper: A highly available system for coordinating distributing processes. Apache Falcon: Falcon is a data management framework for simplifying data lifecycle management and processing pipelines on Apache hadoop. Apache Ambari: Open source installation lifecycle management, administration, and monitoring system for Apache Hadoop Clusters. Apache knox: “Knox” gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. Apache Oozie: Oozie Java web application used to schedule Apache Hadoop Jobs. Oozie combines multiple jobs sequentially into one logical unit of work.
  • 7. What Hadoop can, and can't do What Hadoop can't do You can't use Hadoop for  Structured data  Transactional data What Hadoop can do You can use Hadoop for  Big Data
  • 8. Support & Partner Getting Started or Support – Muthu Natarajan muthu.n@msquaresystems.com www.msquaresystems.com Phone: 212-941-6000