SlideShare une entreprise Scribd logo
1  sur  8
Télécharger pour lire hors ligne
Hydra
Chris Birchall
2014/2/17
M3 Tech Talk #m3dev
What is it?
https://github.com/addthis/hydra
● Hadoop-style distrib processing
framework, optimised for trees
● The Big Idea:
data processing = building and navigating
tree data structures
Components
● Spawn: Job control (+ UI)
○ (think JobTracker, in Hadoop-speak)

● Minion: task runner
○ (think TaskTracker)

● QueryMaster + QueryWorker
● Meshy: Distrib filesystem
○ (think read-only HDFS)

● Zookeeper, RabbitMQ
Getting started (OSX)
# Prerequisites
brew install rabbitmq maven coreutils wget
# Check this works without a passphrase
ssh localhost
# Check that the GNU coreutils cmds
# (grm, gcp, gln, gmv) are on your PATH
# Clone & build
git clone https://github.com/addthis/hydra.git
cd hydra
mvn package
Getting started (2)
# Start local stack
hydra-uber/bin/local-stack.sh start
hydra-uber/bin/local-stack.sh start
# yes, twice!
hydra-uber/bin/local-stack.sh seed
# UI should now be running
open http://localhost:5052
Hello world
# Sample job definition file available at
hydra-uber/local/sample/self-gen-tree.json
# Click ‘Create’, copy-paste the job config,
# save the job and click ‘Kick’ to run it.
# Click the ‘Q’ button to open the query UI
# and see the resulting data.
Analysing text files
# Tips:
## “files” source is broken. Use “mesh2”.
## Docs are out of date. Read the source
code!
# Mesh filesystem root is here:
hydra-local/streams/
# Here’s an example job config I used to
parse some TSV-formatted Apache logs
https://gist.github.com/cb372/9046464
Conclusions
● If you have Small Data,
use grep, awk, sort, uniq
● If you have Big Data,
use Hadoop
● If you really like trees,
use Hydra ;)

Contenu connexe

Tendances

LSA2 - 02 Control Groups
LSA2 - 02   Control GroupsLSA2 - 02   Control Groups
LSA2 - 02 Control GroupsMarian Marinov
 
はじめてのGlusterFS
はじめてのGlusterFSはじめてのGlusterFS
はじめてのGlusterFSTakahiro Inoue
 
Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"
Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"
Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"Fwdays
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Ian Lumb
 
Whirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveWhirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveEdward Capriolo
 
Building Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC ContainerBuilding Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC ContainerHitoshi Sato
 
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)Naoto MATSUMOTO
 
Hadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyHadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyFirman Gautama
 
Terraform, Ansible or pure CloudFormation
Terraform, Ansible or pure CloudFormationTerraform, Ansible or pure CloudFormation
Terraform, Ansible or pure CloudFormationgeekQ
 
Setting up repositories: Technical Requirements, Repository Software, Metad...
Setting up repositories:  Technical Requirements,  Repository Software, Metad...Setting up repositories:  Technical Requirements,  Repository Software, Metad...
Setting up repositories: Technical Requirements, Repository Software, Metad...Iryna Kuchma
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkThoughtWorks
 
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...DataStax Academy
 
Terraform, Ansible, or pure CloudFormation?
Terraform, Ansible, or pure CloudFormation?Terraform, Ansible, or pure CloudFormation?
Terraform, Ansible, or pure CloudFormation?geekQ
 

Tendances (20)

LSA2 - 02 Control Groups
LSA2 - 02   Control GroupsLSA2 - 02   Control Groups
LSA2 - 02 Control Groups
 
Juju 基礎編
Juju 基礎編Juju 基礎編
Juju 基礎編
 
はじめてのGlusterFS
はじめてのGlusterFSはじめてのGlusterFS
はじめてのGlusterFS
 
Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"
Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"
Philipp Krenn "Elasticsearch (R)Evolution — You Know, for Search…"
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
 
Alluxio
AlluxioAlluxio
Alluxio
 
Whirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIveWhirlwind tour of Hadoop and HIve
Whirlwind tour of Hadoop and HIve
 
Php dba cache
Php dba cachePhp dba cache
Php dba cache
 
Building Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC ContainerBuilding Software Ecosystems for AI Cloud using Singularity HPC Container
Building Software Ecosystems for AI Cloud using Singularity HPC Container
 
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
 
LSA2 - PostgreSQL
LSA2 - PostgreSQLLSA2 - PostgreSQL
LSA2 - PostgreSQL
 
Hadoop 101 - Big Data Technology
Hadoop 101 - Big Data TechnologyHadoop 101 - Big Data Technology
Hadoop 101 - Big Data Technology
 
Terraform, Ansible or pure CloudFormation
Terraform, Ansible or pure CloudFormationTerraform, Ansible or pure CloudFormation
Terraform, Ansible or pure CloudFormation
 
Hadoop 101 v2
Hadoop 101 v2Hadoop 101 v2
Hadoop 101 v2
 
Cluster Drm
Cluster DrmCluster Drm
Cluster Drm
 
Cluster Drm
Cluster DrmCluster Drm
Cluster Drm
 
Setting up repositories: Technical Requirements, Repository Software, Metad...
Setting up repositories:  Technical Requirements,  Repository Software, Metad...Setting up repositories:  Technical Requirements,  Repository Software, Metad...
Setting up repositories: Technical Requirements, Repository Software, Metad...
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Ca...
 
Terraform, Ansible, or pure CloudFormation?
Terraform, Ansible, or pure CloudFormation?Terraform, Ansible, or pure CloudFormation?
Terraform, Ansible, or pure CloudFormation?
 

En vedette

En vedette (9)

Introducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing FrameworkIntroducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing Framework
 
Hid 2
Hid 2Hid 2
Hid 2
 
Introduction to Hydra
Introduction to HydraIntroduction to Hydra
Introduction to Hydra
 
What is Hydra?
What is Hydra?What is Hydra?
What is Hydra?
 
Training_deck_081015
Training_deck_081015Training_deck_081015
Training_deck_081015
 
Protozoa
ProtozoaProtozoa
Protozoa
 
Protozoa
ProtozoaProtozoa
Protozoa
 
Surveys Everywhere
Surveys EverywhereSurveys Everywhere
Surveys Everywhere
 
Veena new ppt
Veena  new pptVeena  new ppt
Veena new ppt
 

Similaire à Hydra

Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug
 
Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Mark Kerzner
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Schedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop clusterSchedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop clusterShivraj Raj
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an exampleNikita Kesharwani
 
Introduction to Docker at SF Peninsula Software Development Meetup @Guidewire
Introduction to Docker at SF Peninsula Software Development Meetup @GuidewireIntroduction to Docker at SF Peninsula Software Development Meetup @Guidewire
Introduction to Docker at SF Peninsula Software Development Meetup @GuidewiredotCloud
 
Big data Analytics hands-on sessions
Big data Analytics hands-on sessionsBig data Analytics hands-on sessions
Big data Analytics hands-on sessionsPraveen Hanchinal
 
Introduction to Hadoop part1
Introduction to Hadoop part1Introduction to Hadoop part1
Introduction to Hadoop part1Giovanna Roda
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopfann wu
 
DevOps: Cooking Drupal Deployment
DevOps: Cooking Drupal DeploymentDevOps: Cooking Drupal Deployment
DevOps: Cooking Drupal DeploymentGerald Villorente
 
A "Box" Full of Tools and Distros
A "Box" Full of Tools and DistrosA "Box" Full of Tools and Distros
A "Box" Full of Tools and DistrosDario Faggioli
 

Similaire à Hydra (20)

Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
 
Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2Automated Hadoop Cluster Construction on EC2
Automated Hadoop Cluster Construction on EC2
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Schedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop clusterSchedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop cluster
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Hadoop description
Hadoop descriptionHadoop description
Hadoop description
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
Introduction to Docker at SF Peninsula Software Development Meetup @Guidewire
Introduction to Docker at SF Peninsula Software Development Meetup @GuidewireIntroduction to Docker at SF Peninsula Software Development Meetup @Guidewire
Introduction to Docker at SF Peninsula Software Development Meetup @Guidewire
 
Training
TrainingTraining
Training
 
Big data Analytics hands-on sessions
Big data Analytics hands-on sessionsBig data Analytics hands-on sessions
Big data Analytics hands-on sessions
 
Introduction to Hadoop part1
Introduction to Hadoop part1Introduction to Hadoop part1
Introduction to Hadoop part1
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
DevOps: Cooking Drupal Deployment
DevOps: Cooking Drupal DeploymentDevOps: Cooking Drupal Deployment
DevOps: Cooking Drupal Deployment
 
A "Box" Full of Tools and Distros
A "Box" Full of Tools and DistrosA "Box" Full of Tools and Distros
A "Box" Full of Tools and Distros
 

Plus de Chris Birchall

Scala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSScala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSChris Birchall
 
Tour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaTour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaChris Birchall
 
Tour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraTour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraChris Birchall
 
Guess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming APIGuess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming APIChris Birchall
 
Tour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperTour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperChris Birchall
 
ScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaChris Birchall
 
Load testing with gatling
Load testing with gatlingLoad testing with gatling
Load testing with gatlingChris Birchall
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES SystemsChris Birchall
 
Phone Home: A client-side error collection system
Phone Home: A client-side error collection systemPhone Home: A client-side error collection system
Phone Home: A client-side error collection systemChris Birchall
 
Branching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionBranching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionChris Birchall
 

Plus de Chris Birchall (11)

Scala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSScala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGS
 
Rust 超入門
Rust 超入門Rust 超入門
Rust 超入門
 
Tour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaTour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache Kafka
 
Tour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraTour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - Cassandra
 
Guess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming APIGuess the Country - Playing with Twitter Streaming API
Guess the Country - Playing with Twitter Streaming API
 
Tour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperTour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeper
 
ScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaScalaCache: simple caching in Scala
ScalaCache: simple caching in Scala
 
Load testing with gatling
Load testing with gatlingLoad testing with gatling
Load testing with gatling
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES Systems
 
Phone Home: A client-side error collection system
Phone Home: A client-side error collection systemPhone Home: A client-side error collection system
Phone Home: A client-side error collection system
 
Branching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionBranching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by Abstraction
 

Hydra

  • 2. What is it? https://github.com/addthis/hydra ● Hadoop-style distrib processing framework, optimised for trees ● The Big Idea: data processing = building and navigating tree data structures
  • 3. Components ● Spawn: Job control (+ UI) ○ (think JobTracker, in Hadoop-speak) ● Minion: task runner ○ (think TaskTracker) ● QueryMaster + QueryWorker ● Meshy: Distrib filesystem ○ (think read-only HDFS) ● Zookeeper, RabbitMQ
  • 4. Getting started (OSX) # Prerequisites brew install rabbitmq maven coreutils wget # Check this works without a passphrase ssh localhost # Check that the GNU coreutils cmds # (grm, gcp, gln, gmv) are on your PATH # Clone & build git clone https://github.com/addthis/hydra.git cd hydra mvn package
  • 5. Getting started (2) # Start local stack hydra-uber/bin/local-stack.sh start hydra-uber/bin/local-stack.sh start # yes, twice! hydra-uber/bin/local-stack.sh seed # UI should now be running open http://localhost:5052
  • 6. Hello world # Sample job definition file available at hydra-uber/local/sample/self-gen-tree.json # Click ‘Create’, copy-paste the job config, # save the job and click ‘Kick’ to run it. # Click the ‘Q’ button to open the query UI # and see the resulting data.
  • 7. Analysing text files # Tips: ## “files” source is broken. Use “mesh2”. ## Docs are out of date. Read the source code! # Mesh filesystem root is here: hydra-local/streams/ # Here’s an example job config I used to parse some TSV-formatted Apache logs https://gist.github.com/cb372/9046464
  • 8. Conclusions ● If you have Small Data, use grep, awk, sort, uniq ● If you have Big Data, use Hadoop ● If you really like trees, use Hydra ;)