Hadoop - An Introduction


Shankar R

Introduction to Hadoop






1. Hadoop – An Introduction
   Shankar Radhakrishnan, HCL Technologies

2. Agenda
   - State of the Data
   - What is Hadoop
   - Hadoop Ecosystem
   - References

3. State of the data
   - Data driven businesses: businesses have always been collecting information
   - Mine more == Collect more (and vice versa)
   - Challenges: application complexities, data growth, infrastructure, economics
   - Need of the day

4. State of the data
   - Data driven business: businesses have always been collecting information
   - Mine more == Collect more (and vice versa)
   - Challenges: application complexities, data growth, infrastructure, economics

5. Data driven business
   - Applications: searches, message posts, comments, emails, blogs, photos, video clips, product listings; ERP, CRM, databases, internal applications, customer/consumer-facing products; mobile
   - Context: web, customers, products, business systems, processes, services
   - Support systems: CRM, SOA, recommendation systems/processes, data warehouses, business intelligence, BPM

6. State of the data
   - Data driven businesses: businesses have always been collecting information
   - Mine more == Collect more (and vice versa)
   - Challenges: application complexities, data growth, infrastructure, economics

7. Mine more
   - Drivers: ROI, customer retention, product affinity, market trends, research analysis, customer/consumer analytics
   - Process: clustering, classification, building relationships, regression
   - Types: structured, semi-structured, unstructured

8. State of the data
   - Data driven businesses: businesses have always been collecting information
   - Mine more == Collect more (and vice versa)
   - Challenges: application complexities, data growth, infrastructure, economics

9. Challenges
   - Complex applications: data integration is a worthwhile but complex problem to solve
   - Data growth: growth is exponential
   - Infrastructure: availability, unscalable hardware
   - Economics: managing high data volume comes at a price, and failures are very costly

10. Need of the day
   - A system that can handle high-volume data
   - A system that can perform complex operations
   - Scalable, robust, highly available, fault tolerant, cheap

11. Top-level Apache project
   - Open source
   - Inspired by Google's white papers on Map/Reduce (MR) and the Google File System (GFS)
   - Originally developed to support the Apache Nutch search engine
   - Software framework written in Java
   - Designed for sophisticated analysis of structured and unstructured complex data

12. Why Hadoop?
   - Runs on commodity hardware
   - Shared-nothing architecture
   - Scale hardware whenever you want; the system compensates for hardware scaling and hardware issues (if any)
   - Runs large-scale, high-volume data processes
   - Scales well with complex analysis jobs
   - Handles failures
   - Ideal for consolidating data from both new and legacy data sources
   - Value to the business
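The following minimal Python sketch shows what "shared-nothing" means in practice. Every record is routed to exactly one node by hashing its key. Each node then works only on its own partition, and scaling out only changes the node count. The helpers `node_for`, `partition` and `local_count` are hypothetical, and all nodes are simulated in one process. The hash-modulo routing resembles the idea behind Hadoop MapReduce's default hash partitioner, but this is not Hadoop code. Using CRC32 is an assumption: Python's built-in string `hash()` changes between runs, so it would give unstable routing.

```python
import zlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Pick the node that owns `key` (hypothetical helper).

    CRC32 is used instead of built-in hash(), whose value for strings is
    randomized per process and would make routing unstable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(records: list[str], num_nodes: int) -> dict[int, list[str]]:
    """Route every record to exactly one node; no two nodes share a record."""
    parts: dict[int, list[str]] = defaultdict(list)
    for rec in records:
        parts[node_for(rec, num_nodes)].append(rec)
    return dict(parts)


def local_count(part: list[str]) -> int:
    """Work one node can do using only its own partition (simulated here)."""
    return len(part)


if __name__ == "__main__":
    records = [f"customer-{i}" for i in range(1000)]
    for n in (4, 8):  # scaling out: only the node count changes
        parts = partition(records, n)
        total = sum(local_count(p) for p in parts.values())
        print(n, "nodes ->", total, "records processed")
```

With plain modulo routing, changing the node count reassigns most keys. That is tolerable when partitioning one job's data but not for long-lived storage, which is one reason HDFS records block locations explicitly in its name node.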

13. Hadoop in an enterprise - Example

14. Hadoop Ecosystem
   - HDFS: Hadoop Distributed File System
   - Map/Reduce: software framework for clustered, distributed data processing
   - ZooKeeper: coordination service for distributed applications
   - Avro: data serialization
   - Chukwa: data collection system to monitor distributed systems
   - HBase: data storage for distributed large tables
   - Hive: data warehousing infrastructure
   - Pig: high-level query language

15. HDFS – Hadoop Distributed File System
   - Master/slave architecture
   - Runs on commodity hardware
   - Fault tolerant
   - Handles large volumes of data
   - Provides high throughput
   - Streaming data access
   - Simple file coherency model (write once, read many)
   - Portable to heterogeneous hardware and software
   - Robust: handles disk failures and replication (and re-replication); performs cluster rebalancing and data integrity checks
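The following simplified sketch illustrates replication and re-replication. It assumes a name-node-style map from block IDs to the data nodes holding each replica, and a target replication factor of 3 (the HDFS default). When a data node fails, its replicas are forgotten, and each under-replicated block gets new copies on the least-loaded live nodes that lack it. The function names and the least-loaded heuristic are assumptions for illustration. Real HDFS detects failed data nodes through missed heartbeats and places replicas with a rack-aware policy.

```python
from collections import Counter


def find_under_replicated(block_map: dict[str, set[str]], live: set[str],
                          replication: int) -> dict[str, int]:
    """Return {block_id: replicas_missing} for blocks with too few live copies."""
    missing: dict[str, int] = {}
    for block, nodes in block_map.items():
        alive = nodes & live
        if len(alive) < replication:
            missing[block] = replication - len(alive)
    return missing


def re_replicate(block_map: dict[str, set[str]], live: set[str],
                 replication: int = 3) -> dict[str, set[str]]:
    """Restore the replication factor after data nodes fail (simplified).

    Replicas on dead nodes are dropped, then each under-replicated block is
    copied to the least-loaded live nodes that do not already hold it.
    """
    for block in block_map:
        block_map[block] &= live  # forget replicas that lived on failed nodes
    load = Counter({node: 0 for node in live})
    for nodes in block_map.values():
        load.update(nodes)  # replicas each live node already stores
    for block, needed in find_under_replicated(block_map, live, replication).items():
        candidates = sorted((n for n in live if n not in block_map[block]),
                            key=lambda n: (load[n], n))
        for target in candidates[:needed]:  # may fall short if too few nodes
            block_map[block].add(target)
            load[target] += 1
    return block_map


if __name__ == "__main__":
    blocks = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
    live_nodes = {"dn1", "dn2", "dn4", "dn5"}  # dn3 has failed
    print(re_replicate(blocks, live_nodes))
```

If fewer live nodes remain than the replication factor, the sketch leaves the block under-replicated rather than placing two copies on one node.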

16. HDFS – Example: Name node

17. Maps data-nodes; Data node

18. Handles Data-blocks

19.
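The name node / data node split in the HDFS example slides can be sketched as follows. The sketch assumes a file is cut into fixed-size blocks and that the name node keeps only metadata: which data nodes hold each block. Recent Hadoop releases default to 128 MB blocks (the `dfs.blocksize` property). The round-robin placement, the helper names and the block ID format are hypothetical simplifications, not HDFS's actual placement policy or naming.

```python
import itertools


def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Sizes of the blocks a file is cut into.

    Every block is `block_size` except possibly the last one, which holds
    only the remainder.
    """
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(path: str, file_size: int, block_size: int,
                 datanodes: list[str], replication: int = 3) -> dict[str, list[str]]:
    """Name-node-style metadata: block id -> data nodes holding a replica.

    Round-robin placement is a deliberate simplification, not HDFS's
    rack-aware policy; the block id format is made up for illustration.
    """
    copies = min(replication, len(datanodes))  # cannot exceed the node count
    ring = itertools.cycle(datanodes)
    namespace: dict[str, list[str]] = {}
    for i, _size in enumerate(split_into_blocks(file_size, block_size)):
        # Consecutive picks from the ring are distinct while copies <= len(datanodes).
        namespace[f"{path}#blk_{i}"] = [next(ring) for _ in range(copies)]
    return namespace


if __name__ == "__main__":
    # Sizes in MB, using a 128 MB block size.
    print(split_into_blocks(300, 128))  # [128, 128, 44]
    for block, nodes in place_blocks("/logs/day1", 300, 128,
                                     ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, nodes)
```

The name node never holds the file's bytes in this model. Clients would ask it where each block lives and then read the blocks directly from the listed data nodes.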

20. Example: Mapper Function

21. Example: Reduce Function
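The mapper and reduce code from slides 20 and 21 did not survive extraction. As a stand-in, here is a word-count sketch in plain Python that simulates the map, shuffle/sort and reduce phases in a single process. It is not the Hadoop Java API (`org.apache.hadoop.mapreduce.Mapper` and `Reducer`), and the function names are hypothetical. In a real cluster the framework runs many mapper and reducer tasks in parallel and performs the shuffle itself.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: combine all counts for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases the way the framework would, in one process."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(w, c) for w, c in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "Hadoop processes data"]))
    # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Keeping `mapper` and `reducer` free of shared state is what lets a framework run them on any node in any order. Only the shuffle step needs to see all pairs for a given key.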

22. Who runs Hadoop?