Jean-Pierre König, MeMo News AG



  OPENING THE TOOL BOX
  DEVELOPMENT, TESTING AND DEPLOYMENT IN THE HADOOP
  ECOSYSTEM

  14.05.12

http://www.flickr.com/photos/theaucitron/5810163712/sizes/l/in/photostream/
Development

 THE APPLICATION


http://www.flickr.com/photos/oskay/2523189273/sizes/l/in/photostream/
Development

The Application is a ...
  • Distributed newsagent
  • GUI-less Java Application
  • Spring-based 2-layer architecture
     • Services and data access objects
  • Client of Hadoop
     • Dependencies to Zookeeper and HBase
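
The two-layer split named above can be sketched in plain Java. This is only an illustration: `SubscriberDao`, `InMemorySubscriberDao` and `SubscriberService` are hypothetical names, the DAO keeps its data in memory instead of talking to ZooKeeper or HBase, and the constructor injection is done by hand where Spring would normally wire the beans.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Data access layer: hides where subscribers are stored (hypothetical interface).
interface SubscriberDao {
    void add(String subscriber);
    List<String> findAll();
}

// In-memory stand-in; a real DAO would talk to ZooKeeper or HBase.
class InMemorySubscriberDao implements SubscriberDao {
    private final List<String> subscribers = new ArrayList<String>();

    public void add(String subscriber) {
        subscribers.add(subscriber);
    }

    public List<String> findAll() {
        return Collections.unmodifiableList(subscribers);
    }
}

// Service layer: business rules only, storage is delegated to the DAO.
class SubscriberService {
    private final SubscriberDao dao;

    // Constructor injection, wired by hand here instead of by Spring.
    SubscriberService(SubscriberDao dao) {
        this.dao = dao;
    }

    void subscribe(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("subscriber name must not be empty");
        }
        dao.add(name.trim());
    }

    int numberOfSubscribers() {
        return dao.findAll().size();
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        SubscriberService service = new SubscriberService(new InMemorySubscriberDao());
        service.subscribe("alice");
        service.subscribe(" bob ");
        System.out.println(service.numberOfSubscribers()); // prints 2
    }
}
```

Because the service only sees the DAO interface, a test can swap in the in-memory DAO (or a mock) without a running cluster.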




Development (2)

We use Maven 3 for
  • Project structure: corporate POM & modules
  • Dependency Management
  • Build the artifact

  [Diagram: Corporate POM at the top, with the modules global, newsagent, tools and mapred below it. Components shown: Model, Utils, Infrastructure, Loader (Client), Services, Data Access Objects.]

Development

 MAPREDUCE JOBS


http://www.flickr.com/photos/elasticsoul/61062372/sizes/l/in/photostream/
MapReduce


    • Java MR jobs for business processes
      • Input and output paths are either HDFS or HBase
      • MR jobs are chained with Azkaban
    • Pig and Hive for ad-hoc queries
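
Job chaining means that one job's output becomes the next job's input. The sketch below shows that idea in plain Java, without Hadoop or Azkaban; the two "jobs" (`countPerSource`, `keepFrequent`), the sample domains and the threshold are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JobChainSketch {
    // "Job 1": count records per key (here: per news source).
    static Map<String, Integer> countPerSource(List<String> sources) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String source : sources) {
            Integer old = counts.get(source);
            counts.put(source, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // "Job 2": consumes the output of job 1 and keeps sources with at least minCount records.
    static Map<String, Integer> keepFrequent(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("nzz.ch", "t-online.de", "nzz.ch", "srf.ch", "nzz.ch");
        Map<String, Integer> stage1 = countPerSource(input);   // first job
        Map<String, Integer> stage2 = keepFrequent(stage1, 2); // second job runs on stage1's output
        System.out.println(stage2); // prints {nzz.ch=3}
    }
}
```

In a real chain each stage would read from and write to HDFS or HBase, and the scheduler would start the second job only after the first one has finished successfully.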




Development

 HBASE


http://www.flickr.com/photos/isherwoodchris/6902155937/sizes/l/in/photostream/
HBase

• HBase Schema Manager
  • github.com/jkoenig/hbase-schema-manager
• Utilities to copy/move/rename column families
  and copy complete tables with their data
  • github.com/memonews/hbase-utils
• Stargate REST API without compression
  • github.com/memonews/hbase-stargate



Hadoop, HBase, Zookeeper

 TESTING


http://www.flickr.com/photos/42106306@N00/4380803535/sizes/m/in/photostream/
HBase

• We use the Apache HBaseTestingUtility
• It’s an in-memory, complete Hadoop instance
  with DFS, ZooKeeper and HBase
• It’s very slow: consider these long-running integration tests
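import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import static org.junit.Assert.fail; // assumes JUnit 4

public class ConfigurableHBaseClient {
    // Shared mini cluster, started once when the class is loaded.
    protected static HBaseTestingUtility TEST_UTIL;

    static {
        final Configuration conf = HBaseConfiguration.create();
        conf.addResource("hbase-default-test.xml");
        try {
            // HBaseTestingUtilityFactory is presumably a project helper; its import is not shown on the slide.
            TEST_UTIL = HBaseTestingUtilityFactory.getMiniCluster(1, conf);
        } catch (final Exception e) {
            fail("Could not start hadoop mini cluster.");
        }
    }
}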

MapReduce

• Since business logic is involved, we use hadoop-mrunit
  for testing Map/Reduce jobs
• It’s in-memory testing
    • Parameterized Mapper/Reducer with a driver


@Test
public void reduceShouldWriteExactlyOneLinePerMap() throws IOException {
    final List<DoubleWritable> values = new ArrayList<DoubleWritable>();
    values.add(new DoubleWritable(399287729));
    this.driver.withInput(new Text("de.t-online/nachrichten"), values);
    this.driver.run();
    assertEquals(1, this.driver.getCounters().findCounter(
            MeMoCounters.SIGNALS_WRITTEN).getValue());
}
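
The driver pattern itself (feed input to a reducer in-process, run it, then inspect output and counters) can be sketched without any Hadoop dependency. The classes below are hypothetical stand-ins written for this illustration, not the MRUnit API; only the input key, the value and the counter name are borrowed from the test above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reducer: emits one line per key and counts what it wrote.
class SignalSumReducer {
    String reduce(String key, List<Double> values, Map<String, Long> counters) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        Long written = counters.get("SIGNALS_WRITTEN");
        counters.put("SIGNALS_WRITTEN", written == null ? 1L : written + 1L);
        return key + "\t" + sum;
    }
}

// Hypothetical mini driver: runs the reducer in-process and records output and counters.
class MiniReduceDriver {
    private final SignalSumReducer reducer;
    private final Map<String, List<Double>> input = new LinkedHashMap<String, List<Double>>();
    private final Map<String, Long> counters = new HashMap<String, Long>();

    MiniReduceDriver(SignalSumReducer reducer) {
        this.reducer = reducer;
    }

    // Registers one key with its values; returns this so calls can be chained.
    MiniReduceDriver withInput(String key, List<Double> values) {
        input.put(key, values);
        return this;
    }

    // Calls the reducer once per registered key and collects the emitted lines.
    List<String> run() {
        List<String> output = new ArrayList<String>();
        for (Map.Entry<String, List<Double>> entry : input.entrySet()) {
            output.add(reducer.reduce(entry.getKey(), entry.getValue(), counters));
        }
        return output;
    }

    long counter(String name) {
        Long value = counters.get(name);
        return value == null ? 0L : value;
    }
}

class MiniReduceDriverDemo {
    public static void main(String[] args) {
        MiniReduceDriver driver = new MiniReduceDriver(new SignalSumReducer());
        List<String> lines = driver
                .withInput("de.t-online/nachrichten", Arrays.asList(399287729.0))
                .run();
        System.out.println(lines.size() + " line(s), counter = " + driver.counter("SIGNALS_WRITTEN"));
    }
}
```

No cluster or file system is involved, which is what keeps this style of test fast compared with the mini-cluster approach.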

Zookeeper

• We use the Apache ZooKeeper ClientBase
• It’s not in-memory; tests run against the staging
  cluster
    • Paths are prefixed, e.g.: /test/memo/subscribers



@Test
public void getNumberOfSubscribersShouldSetWatchFlag()
        throws KeeperException, InterruptedException {
    final SubscriberDaoImpl subscriberDao =
            new SubscriberDaoImpl(zookeeperDao, DIR, null);
    subscriberDao.getNumberOfSubscribers(listener);
    verify(this.zookeeper, times(1)).getChildren(eq(DIR), eq(subscriberDao));
}
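
Path prefixing keeps test znodes apart from real data on the shared staging cluster. A minimal sketch of such a helper follows; the class name `TestPaths` and the fixed `/test` prefix are assumptions derived from the example path on the slide.

```java
// Hypothetical helper: puts every znode path used by a test under a common prefix.
class TestPaths {
    static final String TEST_PREFIX = "/test"; // assumed prefix, taken from the slide's example

    static String prefixed(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("znode paths must be absolute: " + path);
        }
        // The root path maps to the prefix itself, so the result never ends in a slash.
        return "/".equals(path) ? TEST_PREFIX : TEST_PREFIX + path;
    }

    public static void main(String[] args) {
        System.out.println(prefixed("/memo/subscribers")); // prints /test/memo/subscribers
    }
}
```

Cleaning up after a test run then only requires deleting the subtree below the prefix.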

Deployment

 THE APPLICATION


http://www.flickr.com/photos/navalsurfaceforces/5553412190/sizes/l/in/photostream/
The Application

• Automated build and restart via Capistrano
• Build on every machine
    • There is a .m2 repository everywhere

set :deploy_to, "/usr/share/memo-newsagent"
set :keep_releases, 1

after "deploy:setup" do
  run "mkdir -p /var/run/memo #{shared_path}/logs /var/log/memo/"
  ...
end

after "deploy:update_code" do
  run "cd #{current_release} && mvn install -Pfast > #{shared_path}/logs/build.log"
end

after "deploy", "rowlog:stop", "newsagent:restart", "rowlog:start"

Deployment

 MAPREDUCE JOBS


http://www.flickr.com/photos/navalsurfaceforces/6257239933/sizes/l/in/photostream/
Map Reduce Jobs

• We use a Maven Hadoop plugin
    • hadoop:pack: similar to mvn package
    • hadoop:deploy: copies to HDFS and a target folder
• All dependencies are packed in. Careful: huge
  JARs without dependency management



see github.com/memonews/maven-hadoop

DevOps

 OTHER TOOLS IN USE


http://www.flickr.com/photos/damongman/4979871047/sizes/l/in/photostream/
Other Tools

• In-house staging environment, a 1:1 copy
  of production (virtualized)
• Azkaban for MR job scheduling
• Jenkins for (integration) tests and metrics
• Git
• Icinga for Monitoring & Alerting
• Ganglia / Graphite for Hadoop Metrics
• Fliwi for automated cluster provisioning

jean-pierre.koenig@menonews.com

THANKS!