SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
Hadoop and
                Subsystems
                     in
                 livedoor
                Hadoop Conference Japan
                        2011 Fall
                      2011/09/26
                       tagomoris
2011   9   26
2011   9   26
we are hiring!



2011   9   26
what's livedoor?



2011   9   26
2011   9   26
large scale web services

                     2800+ servers
                      3200+ hosts

                   530+ web servers


2011   9   26
20 Aug 2009




2011   9   26
Aug 2011


                15Gbps
                (10Gbps + CDN 5Gbps)



2011   9   26
Hadoop in livedoor
                • 10 nodes (1+9)
                 • 36 core, 32TB HDFS
                • CDH3b2
                 • with libhdfs, fuse-hdfs
                 • Hive 0.6.0 (community package)




2011   9   26
Hadoop in livedoor
                   data mining
                     reporting
                  page views, unique users,
                  traffic amount per page,
                             ...
2011   9   26
super large scale
                 'sed | grep | wc'
                         with
                Hadoop Streaming + Hive


2011   9   26
httpd logs

                from 96 servers
                   (apache / nginx)




                580GB/day (raw)

2011   9   26
overview



                           hourly
                            daily     on
                  hourly
                                    demand

2011   9   26
topics

                •log delivery network with scribe
                 •and 'scribeline'
                •hive client web application 'shib'


2011   9   26
overview



                           hourly
                            daily     on
                  hourly
                                    demand

2011   9   26
scribe
                       log delivery daemon
                          based on Thrift
                         scalable, reliable
                          supports HDFS

                https://github.com/facebook/scribe



2011   9   26
scribe nodes
                               scribed




                               scribed




                               scribed




2011   9   26
deliver node traffic




2011   9   26
scribe nodes
                               scribed




                               scribed




                               scribed




2011   9   26
what we want
                from scribe agent
                •easy to deploy
                •works w/o any httpd configurations
                •delivery target failover/takeback
                •lightweight (without JVM)
                •stable
2011   9   26
scribe nodes
                               scribed




                               scribed




                scribeline
                               scribed




2011   9   26
scribeline
                     log delivery agent tool
                        python 2.4, thrift

              easy to setup and start/stop
         works without any httpd configurations
            works with logrotate-ed log files
       automatic delivery target failover/takeback

            https://github.com/tagomoris/scribe_line
2011   9   26
how to setup scribeline
                     in livedoor

                       1. yum install scribeline
                (tar xzf && cd && sudo make install)

                     2. vi /etc/scribeline.conf
                       blog /var/log/httpd/access_log
                      blogimg /var/log/nginx/access_log



                   3. /etc/init.d/scribeline start

2011   9   26
scribe nodes
                               scribed




                               scribed




                               scribed




2011   9   26
overview



                           hourly
                            daily     on
                  hourly
                                    demand

2011   9   26
what we want
                 about hive client
                •easy to experiment
                 •from PC on our desks
                •result caching
                •protection against data loss
                •friendly look & feel
2011   9   26
shib
                   hive client web application
                   node.js, thrift, kyoto tycoon

                       query history browser
                query editor, based on copy&paste
                result caching & download tsv/csv
                   filter INSERT/DROP/CREATE ...

                https://github.com/tagomoris/shib
2011   9   26
2011   9   26
shib system overview




2011   9   26
what shib cannot
                     do now
                •access control
                •graph & chart
                •hive 0.7.0+ features support
                 •database, authentication and ...
                •mapreduce status notification
2011   9   26
what we are trying now

                •New cluster
                 •more nodes
                 •CDH3b2 + Hive 0.6.0 -> CDH3u1
                •New tools
                 •Hoop (instead of fuse-hdfs)
                 •Any stream processing framework
2011   9   26
thanks!



2011   9   26

Contenu connexe

Tendances

Openstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelOpenstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - Israel
Arthur Berezin
 

Tendances (20)

CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
 
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
 
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with ScyllaScylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
 
Benchmarking Redis by itself and versus other NoSQL databases
Benchmarking Redis by itself and versus other NoSQL databasesBenchmarking Redis by itself and versus other NoSQL databases
Benchmarking Redis by itself and versus other NoSQL databases
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
Back your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, Pivotal
Back your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, PivotalBack your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, Pivotal
Back your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, Pivotal
 
Understanding Storage I/O Under Load
Understanding Storage I/O Under LoadUnderstanding Storage I/O Under Load
Understanding Storage I/O Under Load
 
Scylla Summit 2018: Scylla 3.0 and Beyond
Scylla Summit 2018: Scylla 3.0 and BeyondScylla Summit 2018: Scylla 3.0 and Beyond
Scylla Summit 2018: Scylla 3.0 and Beyond
 
Scylla Summit 2018: Getting the Most Out of Scylla on Kubernetes
Scylla Summit 2018: Getting the Most Out of Scylla on KubernetesScylla Summit 2018: Getting the Most Out of Scylla on Kubernetes
Scylla Summit 2018: Getting the Most Out of Scylla on Kubernetes
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Openstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelOpenstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - Israel
 
Persistent Storage for Containerized Applications
Persistent Storage for Containerized ApplicationsPersistent Storage for Containerized Applications
Persistent Storage for Containerized Applications
 
Ansible and CloudStack
Ansible and CloudStackAnsible and CloudStack
Ansible and CloudStack
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
 
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
 
Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016
 

Similaire à Hadoop and subsystems in livedoor #Hcj11f

Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
Deepak Garg
 
Cloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBaseCloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBase
DATAVERSITY
 
Splunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxSplunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gx
Damien Dallimore
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
Timothy Spann
 

Similaire à Hadoop and subsystems in livedoor #Hcj11f (20)

Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
 
Manuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4octManuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4oct
 
Couchbase Day
Couchbase DayCouchbase Day
Couchbase Day
 
Mining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through VisualizationMining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through Visualization
 
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
 
Cloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBaseCloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBase
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Splunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxSplunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gx
 
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
 
The path to a serverless-native era with Kubernetes
The path to a serverless-native era with KubernetesThe path to a serverless-native era with Kubernetes
The path to a serverless-native era with Kubernetes
 
OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...
OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...
OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...
 
Build DynamoDB-Compatible Apps with Python
Build DynamoDB-Compatible Apps with PythonBuild DynamoDB-Compatible Apps with Python
Build DynamoDB-Compatible Apps with Python
 
Couchbase Data Pipeline
Couchbase Data PipelineCouchbase Data Pipeline
Couchbase Data Pipeline
 
Achieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with ChefAchieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with Chef
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStack
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
dodai grizzly
dodai grizzlydodai grizzly
dodai grizzly
 
OpenStack Deployments with Chef
OpenStack Deployments with ChefOpenStack Deployments with Chef
OpenStack Deployments with Chef
 

Plus de SATOSHI TAGOMORI

Plus de SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Hadoop and subsystems in livedoor #Hcj11f

  • 1. Hadoop and Subsystems in livedoor Hadoop Conference Japan 2011 Fall 2011/09/26 tagomoris 2011 9 26
  • 2. 2011 9 26
  • 5. 2011 9 26
  • 6. large scale web services 2800+ servers 3200+ hosts 530+ web servers 2011 9 26
  • 8. Aug 2011 15Gbps (10Gbps + CDN 5Gbps) 2011 9 26
  • 9. Hadoop in livedoor • 10 nodes (1+9) • 36 core, 32TB HDFS • CDH3b2 • with libhdfs, fuse-hdfs • Hive 0.6.0 (community package) 2011 9 26
  • 10. Hadoop in livedoor data mining reporting page views, unique users, traffic amount per page, ... 2011 9 26
  • 11. super large scale 'sed | grep | wc' with Hadoop Streaming + Hive 2011 9 26
  • 12. httpd logs from 96 servers (apache / nginx) 580GB/day (raw) 2011 9 26
  • 13. overview hourly daily on hourly demand 2011 9 26
  • 14. topics •log delivery network with scribe •and 'scribeline' •hive client web application 'shib' 2011 9 26
  • 15. overview hourly daily on hourly demand 2011 9 26
  • 16. scribe log delivery daemon based on Thrift scalable, reliable supports HDFS https://github.com/facebook/scribe 2011 9 26
  • 17. scribe nodes scribed scribed scribed 2011 9 26
  • 19. scribe nodes scribed scribed scribed 2011 9 26
  • 20. what we want from scribe agent •easy to deploy •works w/o any httpd configurations •delivery target failover/takeback •lightweight (without JVM) •stable 2011 9 26
  • 21. scribe nodes scribed scribed scribeline scribed 2011 9 26
  • 22. scribeline log delivery agent tool python 2.4, thrift easy to setup and start/stop works without any httpd configurations works with logrotate-ed log files automatic delivery target failover/takeback https://github.com/tagomoris/scribe_line 2011 9 26
  • 23. how to setup scribeline in livedoor 1. yum install scribeline (tar xzf && cd && sudo make install) 2. vi /etc/scribeline.conf blog /var/log/httpd/access_log blogimg /var/log/nginx/access_log 3. /etc/init.d/scribeline start 2011 9 26
  • 24. scribe nodes scribed scribed scribed 2011 9 26
  • 25. overview hourly daily on hourly demand 2011 9 26
  • 26. what we want about hive client •easy to experiment •from PC on our desks •result caching •protection against data loss •friendly look & feel 2011 9 26
  • 27. shib hive client web application node.js, thrift, kyoto tycoon query history browser query editor, based on copy&paste result caching & download tsv/csv filter INSERT/DROP/CREATE ... https://github.com/tagomoris/shib 2011 9 26
  • 28. 2011 9 26
  • 30. what shib cannot do now •access control •graph & chart •hive 0.7.0+ features support •database, authentication and ... •mapreduce status notification 2011 9 26
  • 31. what we are trying now •New cluster •more nodes •CDH3b2 + Hive 0.6.0 -> CDH3u1 •New tools •Hoop (instead of fuse-hdfs) •Any stream processing framework 2011 9 26
  • 32. thanks! 2011 9 26