SlideShare une entreprise Scribd logo
1  sur  43
EcossistemaHadoop
Motivações
Játentouprocessarumarquivo-textocom1TB?
ODougCuttingtentou...
OqueéHadoop?
Umsistemadearquivosdistribuído
● HDFS (Hadoop Distributed File
System)
○ Baseado em blocos gigantes
○ Alta disponibilidade
○ Quase POSIX
○ Fricção baixa para usuários *nix
HDFSéumFS
>hdfs dfs -ls -h /data/customers/customers_joined
Found 3 items
-rw-r--r-- 4 hadoop supergroup 0 2016-05-18 21:37
/data/customers/customers_joined/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2016-05-18 22:43
/data/customers/customers_joined/output
-rw-r--r-- 4 hadoop supergroup 380.1 M 2016-05-18 21:37
/data/customers/customers_joined/part-r-00000.lzo
>hdfs fsck /data/customers/customers_joined/part-r-00000.lzo -files -blocks
0. BP-351819023-10.113.142.94-1441385804011:blk_1096286844_357944571
len=134217728 repl=4
1. BP-351819023-10.113.142.94-1441385804011:blk_1096286845_357945328
len=134217728 repl=4
2. BP-351819023-10.113.142.94-1441385804011:blk_1096286846_357946005
len=130107025 repl=4
The filesystem under path '/data/customers/customers_joined/part-r-00000.lzo'
is HEALTHY
Umaplataforma deprocessamento distribuído
YARN (Yet Another Resource
Negotiator)
Distribui aplicações MapReduce
Divide processamento para os nós
Gerencia recursos
Hadoopéumaplataformafacilmente
escaláveldearmazenamentoe
processamentodistribuído,
agnósticoaotipodedado,hardwaree
sistemaoperacional.
Topologia
Namenode
Secondaries
Secondaries
Datanodes
H
D
F
S
ResourceManager
Secondaries
Secondaries
NodeManager
Y
A
R
N
Localidade
D=2
D=0
D=4
D=6
MapReduce
MapReducecomPYdoop
import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pp
class Mapper(api.Mapper):
def map(self, context):
words = context.value.split()
for w in words:
context.emit(w, 1)
class Reducer(api.Reducer):
def reduce(self, context):
s = sum(context.values)
context.emit(context.key, s)
def __main__():
pp.run_task(pp.Factory(Mapper, Reducer))
Ecossistema
BigDataLandscape2016
mattturck.com/2016/02/01/big-data-landscape
Hive
Criado pelo Facebook
Expõe MapReduce como SQL
Mapeia arquivos gigantescos
Suporta compressão (particionável)
Recursos e síntaxe similares ao MySQL
Analítico
Hiveemação
hive> select keyword, count(1) total from searches group by keyword order by
total desc limit 10;
Hadoop job information for Stage-1: number of mappers: 70; number of reducers:
2
2016-05-17 11:39:27,729 Stage-1 map = 0%, reduce = 0%
2016-05-17 11:40:12,255 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
406.39 sec
OK
sofa 138361
tablet 125837
fogao 106451
geladeira 99641
notebook 92787
celular 92195
tv 87594
iphone 82045
microondas 75806
Time taken: 116.755 seconds, Fetched: 10 row(s)
Flume
Coletor/Agregador/Streamer/Sinker
Ingestor de tail -f distribuído
Agregador de logs
Compressão do stream
Requisições HTTP
Sinker para HBase, HDFS, Avro,
BuscasnoFlume
# Sources
search_stream.sources = SearchStream
search_stream.sources.SearchStream.type = http
search_stream.sources.SearchStream.handler = org.apache.flume.source.http.JSONHandler
# Channels
search_stream.channels = FileChannel
search_stream.channels.FileChannel.type = file
search_stream.channels.FileChannel.dataDirs = /mnt/flume/data
# Sinks
search_stream.sinks = HDFSSink
search_stream.sinks.HDFSSink.type = hdfs
search_stream.sinks.HDFSSink.hdfs.filePrefix = search
search_stream.sinks.HDFSSink.hdfs.path = /flume/events/search
search_stream.sinks.HDFSSink.hdfs.fileType = CompressedStream
search_stream.sinks.HDFSSink.hdfs.codeC = lzop
# Tie everything together
search_stream.sources.SearchStream.channels = FileChannel
search_stream.sinks.HDFSSink.channel = FileChannel
Impala
Usa engine própria (MPP)
Usa metadados do Hive
Aproveita-se bem do Apache Parquet
Bilhões de linhas em pouquíssimos
segundos
Performance linear ao tamanho do cluster
Impalaesuasbilhõesdelinhasempoucossegundos
impala-shell> select count(1) total from price_changes;
+------------+
| total |
+------------+
| 2127126382 |
+------------+
Returned 1 row(s) in 2.11s
impala-shell> select status, count(1) total from price_changes group by status;
+-----------+------------+
| status | total |
+-----------+------------+
| 0 | 2119762471 |
| 1 | 7363911 |
+-----------+------------+
Returned 2 row(s) in 2.82s
HBase
Baseado no Google BigTable
Colunar (dinâmico): trilhões de colunas e
linhas
Acesso aleatório e em tempo-real
Divide chaves em regiões, que estão em
diferentes máquinas
ClientenoHBase
hbase> get customers, '40019869212'
COLUMN CELL
cliente:nomcli
timestamp=1462020386697, value=NELSON FORTE
cliente:numdoc
timestamp=1462020386697, value=598745328
cliente:origem
timestamp=1462020386697, value=ERP
cliente:qtcompras
timestamp=1462020386697, value=3
cliente:sexo
timestamp=1462020386697, value=M
1 row(s) in 0.0910 seconds
Apache Pig
Pig
Pig Latin
Simplicidade nas expressões
Constrói resultados analíticos com
MapReduce
ETL sobre grandes dados
Extensível (Java, Python, Ruby,
Relacionandobasesdecliente
grunt> set output.compression.enable true;
grunt> set output.compression.codec com.hadoop.compression.lzo.LzopCodec;
erp = LOAD 'hbase://customers_erp' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('client:customer_id', '-
loadKey=true') AS (id:bytearray, name_erp:chararray);
site = LOAD 'hbase://customers_site' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('client:name', '-loadKey=true') AS
(id:bytearray, name_site:chararray);
joined = JOIN erp BY id, site BY id;
stripped = FOREACH joined GENERATE name_erp, LOWER(name_erp) AS lowered_name_erp,
name_site, LOWER(name_site) AS lowered_name_site;
rmf /data/customers/joined
STORE stripped INTO '/data/customers/joined' USING PigStorage('|');
Sqoop
Importador de dados estruturados pro HDFS
e vice-versa
Distribui a importação pelas máquinas
(via MapReduce)
Configuração simples
Servidor de Jobs
Importandopushes
sqoop job --create push_notification -- import --connect
jdbc:mysql://server/push 
--username importer 
--password-file file:///.password_file 
--split-by created_at 
--num-mappers 4 
--compress 
--compression-codec com.hadoop.compression.lzo.LzopCodec 
--delete-target-dir 
--target-dir /tmp/push 
--null-string 'N' 
--null-non-string 'N' 
--as-textfile 
--outdir /sqoop-scripts/src 
--bindir /sqoop-scripts/jar 
--direct --query 'select push_id, created_at, email, sent + 0 from pushes
where $CONDITIONS'
Quemusae
contribui?
https://wiki.apache.org/hadoop/PoweredBy
2000+nós|8800cores|60PB
Hive+Presto
Cassandra->HDFS(Análise)
“Novo”LZO+ElephantBird
1000+nós
Genie+Lipstick
900+nós
Luigi+Snakebite
200+nós|~1PB/dia
(metadados)
~1GB/dia
Dadosclimáticos
100+nós 1400+nós
10000+nós|100000+ cores
72PBRAM
6nós|56cores|458GBRAM|30TB
1000jobs/dia
Perguntas?
Obrigado!

Contenu connexe

Tendances

Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoopShashwat Shriparv
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an exampleNikita Kesharwani
 
Hadoop migration and upgradation
Hadoop migration and upgradationHadoop migration and upgradation
Hadoop migration and upgradationShashwat Shriparv
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentationpuneet yadav
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop AdministrationEdureka!
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoopdatasalt
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsAnandMHadoop
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 

Tendances (20)

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoop
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
Hadoop migration and upgradation
Hadoop migration and upgradationHadoop migration and upgradation
Hadoop migration and upgradation
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop Administration
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop administration
Hadoop administrationHadoop administration
Hadoop administration
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Hadoop2.2
Hadoop2.2Hadoop2.2
Hadoop2.2
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoop
 
Session 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic CommandsSession 03 - Hadoop Installation and Basic Commands
Session 03 - Hadoop Installation and Basic Commands
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 

En vedette

Arquitetura para solução Big Data – open source
Arquitetura para solução Big Data – open sourceArquitetura para solução Big Data – open source
Arquitetura para solução Big Data – open sourceFelipe RENZ - MBA TI / Big
 
Exemplos de uso de apache spark usando aws elastic map reduce
Exemplos de uso de apache spark usando aws elastic map reduceExemplos de uso de apache spark usando aws elastic map reduce
Exemplos de uso de apache spark usando aws elastic map reduceFelipe
 
Arquitetura do Framework Apache Hadoop 2.6
Arquitetura do Framework Apache Hadoop 2.6Arquitetura do Framework Apache Hadoop 2.6
Arquitetura do Framework Apache Hadoop 2.6Felipe Schimith Batista
 
Estudo sobre ferramentas de BI Open Source
Estudo sobre ferramentas de BI Open SourceEstudo sobre ferramentas de BI Open Source
Estudo sobre ferramentas de BI Open SourceNelson Forte
 
Big Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hiveBig Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hiveFlavio Fonte, PMP, ITIL
 
Arquiteturas, Tecnologias e Desafios para Análise de BigData
Arquiteturas, Tecnologias e Desafios para Análise de BigDataArquiteturas, Tecnologias e Desafios para Análise de BigData
Arquiteturas, Tecnologias e Desafios para Análise de BigDataSandro Andrade
 
Arquiteturas escaláveis o exemplo da spotify aplicado ao e commerce
Arquiteturas escaláveis  o exemplo da spotify aplicado ao e commerceArquiteturas escaláveis  o exemplo da spotify aplicado ao e commerce
Arquiteturas escaláveis o exemplo da spotify aplicado ao e commerceRafael Rocha
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 

En vedette (11)

Arquitetura para solução Big Data – open source
Arquitetura para solução Big Data – open sourceArquitetura para solução Big Data – open source
Arquitetura para solução Big Data – open source
 
Exemplos de uso de apache spark usando aws elastic map reduce
Exemplos de uso de apache spark usando aws elastic map reduceExemplos de uso de apache spark usando aws elastic map reduce
Exemplos de uso de apache spark usando aws elastic map reduce
 
Arquitetura do Framework Apache Hadoop 2.6
Arquitetura do Framework Apache Hadoop 2.6Arquitetura do Framework Apache Hadoop 2.6
Arquitetura do Framework Apache Hadoop 2.6
 
Hadoop
HadoopHadoop
Hadoop
 
The ABC of Big Data
The ABC of Big DataThe ABC of Big Data
The ABC of Big Data
 
Proposta de arquitetura Hadoop
Proposta de arquitetura HadoopProposta de arquitetura Hadoop
Proposta de arquitetura Hadoop
 
Estudo sobre ferramentas de BI Open Source
Estudo sobre ferramentas de BI Open SourceEstudo sobre ferramentas de BI Open Source
Estudo sobre ferramentas de BI Open Source
 
Big Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hiveBig Data - O que é o hadoop, map reduce, hdfs e hive
Big Data - O que é o hadoop, map reduce, hdfs e hive
 
Arquiteturas, Tecnologias e Desafios para Análise de BigData
Arquiteturas, Tecnologias e Desafios para Análise de BigDataArquiteturas, Tecnologias e Desafios para Análise de BigData
Arquiteturas, Tecnologias e Desafios para Análise de BigData
 
Arquiteturas escaláveis o exemplo da spotify aplicado ao e commerce
Arquiteturas escaláveis  o exemplo da spotify aplicado ao e commerceArquiteturas escaláveis  o exemplo da spotify aplicado ao e commerce
Arquiteturas escaláveis o exemplo da spotify aplicado ao e commerce
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 

Similaire à Ecossistema Hadoop no Magazine Luiza

R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big DataDhafer Malouche
 
Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)Swapan Shridhar
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0Heiko Loewe
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120Hyoungjun Kim
 
Tools, not only for Oracle RAC
Tools, not only for Oracle RACTools, not only for Oracle RAC
Tools, not only for Oracle RACMarkus Flechtner
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringKeith Kraus
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs
12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs
12c: Testing audit features for Data Pump (Export & Import) and RMAN jobsMonowar Mukul
 
Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)Swapan Shridhar
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf TuningHighLoad2009
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopDataWorks Summit
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Hvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøsterHvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøsterLibriotech
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataDataWorks Summit
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 

Similaire à Ecossistema Hadoop no Magazine Luiza (20)

R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big Data
 
Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
RHadoop - beginners
RHadoop - beginnersRHadoop - beginners
RHadoop - beginners
 
Hadoop
HadoopHadoop
Hadoop
 
Tajo_Meetup_20141120
Tajo_Meetup_20141120Tajo_Meetup_20141120
Tajo_Meetup_20141120
 
Tools, not only for Oracle RAC
Tools, not only for Oracle RACTools, not only for Oracle RAC
Tools, not only for Oracle RAC
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs
12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs
12c: Testing audit features for Data Pump (Export & Import) and RMAN jobs
 
Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)Ambari Management Packs (Apache Ambari Meetup 2018)
Ambari Management Packs (Apache Ambari Meetup 2018)
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf Tuning
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on Hadoop
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Hvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøsterHvordan sette opp en OAI-PMH metadata-innhøster
Hvordan sette opp en OAI-PMH metadata-innhøster
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big Data
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 

Dernier

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Dernier (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Ecossistema Hadoop no Magazine Luiza

Notes de l'éditeur

  1. Resolve problema de leitura sequencial e aleatório (100MB/s) de um disco rígido, distribuindo partes do arquivo em várias máquinas Taxa de transferência dos discos atuais é muito maior que a velocidade de busca B-Trees funcionam bem até os gigabytes nessa taxa de transferência
  2. HDDs tem normalmente blocos de 512 bytes Ext4 tem blocos de 4 Kb HDFS tem blocos de 128 Mb
  3. Suporta: S3 FTP RAID distribuído Local WebHDFS (REST)
  4. CERN - 30 PB/dia