CASE STUDY




Enhancing, Monitoring and
Managing a Hadoop Based
Analytics Solution
In this engagement, the Imaginea team contributed over 14 patches to the Hadoop community,
all of which were verified and accepted.



COPYRIGHT © 2012, IMAGINEA TECHNOLOGIES, INC.




COPYRIGHT © 2012, IMAGINEA TECHNOLOGIES, INC. THIS DOCUMENT IS CONFIDENTIAL AND NOT FOR DISTRIBUTION WITHOUT
WRITTEN PERMISSION FROM IMAGINEA TECHNOLOGIES, INC.

1. Executive Summary
One of Imaginea’s clients is a video marketing company that deals with branding,
real-time media buying, ad serving, targeting, optimization and brand
measurement.

Imaginea enhanced and managed a platform for statistical analysis of video
playtime for this client. The solution used Hadoop (Cloudera distribution) and
Hive. The cluster had 500 nodes holding 300 TB of existing data, with over
200 GB of new data streamed in and processed every day.



2. Hadoop Migration and New Features
We helped migrate the entire platform from Hadoop 0.19 to 0.20.2, porting all
the MapReduce (MR) jobs. The migration also included back-porting some features
from 0.21 to 0.20. The back-ported features included:

• Map-side join
• CompositeInputFormat
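
A map-side join avoids the shuffle phase by merging inputs that are already
sorted and partitioned the same way on the join key, which is the arrangement
CompositeInputFormat relies on. The following is a minimal Python sketch of that
merge step only. It is not the Hadoop API and not the code from this engagement;
the function name merge_join and the sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs that are sorted by key.

    Each input is read once, in order, so neither side has to be shuffled
    or held in memory as a whole.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lkey, lgroup = next(left_groups, (None, None))
    rkey, rgroup = next(right_groups, (None, None))
    while lgroup is not None and rgroup is not None:
        if lkey < rkey:
            lkey, lgroup = next(left_groups, (None, None))
        elif lkey > rkey:
            rkey, rgroup = next(right_groups, (None, None))
        else:
            # Same key on both sides: emit every pairing of values.
            right_values = [value for _, value in rgroup]
            for _, lvalue in lgroup:
                for rvalue in right_values:
                    yield lkey, lvalue, rvalue
            lkey, lgroup = next(left_groups, (None, None))
            rkey, rgroup = next(right_groups, (None, None))


# Hypothetical sample data: playtime seconds and titles keyed by video ID.
plays = [("v1", 30), ("v2", 45), ("v4", 10)]
titles = [("v1", "Intro"), ("v3", "Demo"), ("v4", "Promo")]
print(list(merge_join(plays, titles)))
# [('v1', 30, 'Intro'), ('v4', 10, 'Promo')]
```

The sketch assumes both inputs are sorted on the same key; if they are not, the
merge silently misses matches, which is why such a join needs the inputs
prepared in advance.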



3. Cluster Monitoring, Management & Resolution
We helped monitor and manage the cluster during IST business hours. During this
work we uncovered workflow instability issues and the lack of a resume feature,
both of which we later resolved.

The solution used a custom workflow manager, which had stability issues,
especially as the load increased by orders of magnitude.

ZooKeeper was introduced as the central workflow status manager, and the
workflow manager was changed to use it. This improved system stability by
about 90%.
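
To illustrate why a central status store makes a resume feature possible, here
is a minimal Python sketch. It is an assumption-laden illustration, not the
client's workflow manager: a plain dictionary stands in for ZooKeeper, and the
names StatusStore and run_workflow are hypothetical.

```python
class StatusStore:
    """Stand-in for a central status store (ZooKeeper in the case study).

    A plain dict is used here so that the sketch runs anywhere.
    """

    def __init__(self):
        self._status = {}

    def get(self, step):
        return self._status.get(step, "PENDING")

    def set(self, step, status):
        self._status[step] = status


def run_workflow(steps, store):
    """Run (name, callable) steps in order, skipping those already DONE.

    Returns the names of the steps executed in this call. A failing step
    is marked FAILED and the error is re-raised, so a later call resumes
    from that step instead of starting over.
    """
    executed = []
    for name, action in steps:
        if store.get(name) == "DONE":
            continue  # finished in an earlier run; do not repeat it
        store.set(name, "RUNNING")
        try:
            action()
        except Exception:
            store.set(name, "FAILED")
            raise
        store.set(name, "DONE")
        executed.append(name)
    return executed
```

Because the status lives outside the process that runs the steps, a restarted
workflow manager can call run_workflow again with the same store and continue
from the first step that is not marked DONE.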

We also discovered problems in publishing configuration and code changes to all
the nodes in the cluster during this phase. We used Ganglia and Nagios for
monitoring. We also solved some of the memory overflow issues in the Hadoop
nodes.



4. Configuration Management using Puppet
As part of the engagement, Imaginea introduced Puppet into the system, replacing
a custom configuration management tool. We developed some recipes and were able
to solve many of the issues that had been raised with replicating configuration
changes and deploying new code.



5. Performance Improvements
Imaginea improved performance in a variety of ways. Below are a couple of
highlights.

Job Starvation

Problem: Many cases of data overflow at the collector level

The solution had business analytics Hive queries that used to starve the
normal MR jobs. Imaginea helped develop a fair scheduling algorithm that
balances the production tasks and Hive query jobs. Before this change,
there were many cases of data overflow at the collector level.
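
The case study does not describe the scheduling algorithm itself. As a rough
illustration of the general idea of fair sharing, the Python sketch below gives
each free task slot to the pool that is furthest below its weighted share. The
function name assign_slots, the pool names and the weights are all hypothetical.

```python
def assign_slots(free_slots, pools):
    """Hand out free task slots one at a time to the most underserved pool.

    `pools` maps a pool name to a dict with:
      weight  - relative share the pool is entitled to
      running - tasks currently running
      pending - tasks waiting for a slot
    Returns a dict of pool name -> number of slots granted.
    """
    granted = {name: 0 for name in pools}
    for _ in range(free_slots):
        candidates = [name for name, pool in pools.items()
                      if pool["pending"] - granted[name] > 0]
        if not candidates:
            break  # nothing left to schedule
        # The pool with the lowest running-to-weight ratio is furthest
        # below its fair share, so it gets the next slot. The name is a
        # tie-breaker that keeps the result deterministic.
        chosen = min(
            candidates,
            key=lambda n: ((pools[n]["running"] + granted[n])
                           / pools[n]["weight"], n),
        )
        granted[chosen] += 1
    return granted


# Hypothetical state: Hive queries already hold 8 slots, production holds none.
pools = {
    "production": {"weight": 3, "running": 0, "pending": 10},
    "hive": {"weight": 1, "running": 8, "pending": 20},
}
print(assign_slots(4, pools))
# {'production': 4, 'hive': 0}
```

In this example all four free slots go to the production pool, because the Hive
pool is already well above its share. Without such a rule, long-running queries
can keep taking slots as they free up and starve the production jobs.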

Job Optimization

Problem: The job that identifies whether a user is unique took over 8 hours

Imaginea helped reduce the job's runtime from 8-10 hours to 4 hours by
distributing keys more evenly and using a better hashing algorithm.
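
The hashing algorithm used in the engagement is not named in this case study.
The Python sketch below only illustrates the effect: it compares how many keys
each reducer partition receives under a naive hash and under a well-mixed one.
The user ID format, the partition count of 64 and all function names are
assumptions made for the example.

```python
import hashlib
from collections import Counter


def weak_partition(key, num_partitions):
    """Naive partitioner: sum of byte values. Similar keys collide often."""
    return sum(key.encode("utf-8")) % num_partitions


def hashed_partition(key, num_partitions):
    """Partitioner based on a well-mixed hash (SHA-256 in this sketch)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_loads(keys, partitioner, num_partitions):
    """Return the number of keys each partition (reducer) would receive."""
    counts = Counter(partitioner(key, num_partitions) for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical user IDs that differ only in their digits.
user_ids = [f"user-{i:06d}" for i in range(10_000)]
weak = partition_loads(user_ids, weak_partition, 64)
mixed = partition_loads(user_ids, hashed_partition, 64)
print("weak hash:  busiest =", max(weak), "idle =", weak.count(0))
print("mixed hash: busiest =", max(mixed), "idle =", mixed.count(0))
```

A job finishes only when its busiest reducer does, so lowering the load on the
busiest partition and keeping no partition idle shortens the overall runtime.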





6. Apache Hadoop Involvement and Other Contributions
We have worked on Apache Hadoop and other components. The following patches
were contributed to the community by Imaginea.

Jira Id           Severity/Priority      Component                Brief Description

MAPREDUCE-3360    Critical-Improvement   mrv2                     Provide information about lost nodes in the UI
MAPREDUCE-3686    Critical-Bug           mrv2                     History server web UI: job counter values for map/reduce not shown properly
MAPREDUCE-3532    Critical-Bug           mrv2, nodemanager        When 0 (zero) is provided as port number in yarn.nodemanager.webapp.address, the NM's web server component picks up a random port, but the NM keeps reporting port 0 to the RM
MAPREDUCE-3952    Major-Bug              mrv2                     In MR2, when total input paths to process == 1, CombineFileInputFormat.getSplits() returns 0 splits
MAPREDUCE-3316    Major-Bug              Resource Manager         Rebooted link not working
MAPREDUCE-3708    Major-Bug              mrv2                     Metrics: incorrect Apps submitted count
MAPREDUCE-3723    Major-Bug              mrv2, test, webapp       TestAMWebServicesJobs & TestHSWebServicesJobs incorrectly asserting tests
MAPREDUCE-4050    Major-Bug              mrv2                     Invalid Node link
MAPREDUCE-3870    Major-Bug              mrv2                     Invalid App Metrics
MAPREDUCE-4102    Major-Bug              Webapps                  Job counter not available in Job History Web UI for killed jobs
MAPREDUCE-4002    Major-Bug              Examples                 MultiFileWordCount job fails if the input path is not from the default file system
MAPREDUCE-4040    Minor-Bug              mrv2, jobhistoryserver   History links should use hostname rather than IP address
MAPREDUCE-3212    Minor-Bug              mrv2                     Message displayed while executing the yarn command should be proper



