Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
The Elephant Meets Scrum
Rommel Garcia
Agenda
• Introductions
• Why Scrum?
• Scrum Basic Concepts
• Scrum Team
• Scrum Framework
• Hadoop Meets Scrum
• Scrum Exercise
• Open Forum
Introductions
What’s your name?
What’s your role?
Why are you here?
Why Scrum?
Nobody wants to fail too big….too co$tly…on projects.
Monolithic SDLC
• Small change, impacts everything
• Cost of failure, extremely big
• Slow, unpredictable progress
• Hard to prioritize
• Not business friendly
Scrum..
• Produces immediate results
• Makes the development team nimble and adaptable
• Gives full visibility into the development process
• Is a perfect fit for Hadoop
– Hadoop isolates data and processing (HDFS and YARN, respectively)
– Failure in Hadoop is cheap
– Complete traceability of which apps were deployed, run, and tested, by whom, when, and where
Scrum Concepts
Agile. Iterative. Adaptive. Fast results.
Scrum is..
• A framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value
• A framework to employ various processes and techniques
• Lightweight
• Simple to understand
• Difficult to master….if RULES are not followed religiously
Success of SCRUM depends on..
• Transparency
– A common language must be shared by all team members
– What does “Done” mean?
• Inspection
– Check Scrum artifacts and progress frequently
– But be careful not to overdo it, or it gets in the way of work
• Adaptation
– Adjust properly and promptly when the process deviates outside acceptable limits
– Adjust immediately to prevent further deviation
Scrum Formal Events
1. Sprint Planning
2. Daily Scrum
3. Sprint Review
4. Sprint Retrospective
Scrum consists of..
• Team
• Roles
• Events
• Artifacts
• Rules
Scrum Team
Committed or Involved.
The ‘Ham-n-Eggs’ Paradigm
The Team
• Product Owner
• Development Team
• Scrum Master
Product Owner
• Mainly responsible for Product Story management
• Clearly defines Product Story items
• Effectively orders items in the Product Story
• Ensures the Product Story is visible, transparent, and clear to all, and shows what the Scrum Team will work on next
• Validates with the Scrum Team that they understand the items in the Product Story
• In the real world, this could be the Project Manager, Program Manager, Development Manager, or Product Manager
Development Team
• Self-organizing
– They decide how to produce and release incremental, releasable functionality
– Scrum Master has no influence on how the team develops functionality
• Cross-functional
– Pig, Hive, HDFS, YARN, and more
– Develop and release features faster
• Accountability belongs to the Development Team as a whole
• Team size: >=3 but <=9
• Normally composed of Hadoop Developer, Hadoop Architect, Data Scientist, Data Analyst, and QA
Scrum Master
• Ensures Scrum theory, practices, and rules are enacted
• Servant-leader for the Scrum Team
– Coaches the Development Team in self-organization and cross-functionality
– Removes impediments to the Development Team’s progress
• Serves the Product Owner
– Finds techniques for effective Product Story management
– Helps with clear, concise definition of Product Story items
– Ensures the Product Owner knows how to arrange the Product Story to maximize value
– Facilitates Scrum events as requested or needed
• Serves the Organization
– Leads Scrum adoption
– Works with other Scrum Masters to increase the effectiveness of Scrum application in the organization
Scrum Framework
Fail fast in Hadoop. Move fast with Scrum.
The Sprint
• It is the heart of Scrum
• Time-boxed at 1 month or less. 2 weeks is pretty common.
• New Sprint starts immediately after conclusion of previous Sprint
• Consists of
– Sprint Planning
– Daily Scrums
– Development Work
– Sprint Review
– Sprint Retrospective
During the Sprint
• No changes are made that would compromise the Sprint Goal
• Quality goals do NOT decrease
• Scope may be clarified and re-negotiated between the Product Owner and the Development Team as more is learned
• ONLY Product Owner has the authority to cancel a Sprint
Sprint Planning
• Time-boxed
– 8 hours of planning for a 1-month Sprint, or 4 hours for a 2-week Sprint
• Answers the questions:
• What can be done this Sprint?
– Development Team forecasts what Product Story items it will deliver
– Output is Sprint Goal
• How will the chosen work get done?
– Development Team determines how to deliver the increments
– Output is Sprint Story
Daily Scrum
• Driven by Scrum Master
• Time-boxed at 15 mins
• Synchronize activities and plan for the next 24 hours
• Each Development Team member will be asked the questions:
– What was done yesterday?
– What needs to be done today?
– What issues blocked incremental progress on the work?
• Highlights and promotes quick decision-making
• Improves communication and eliminates other meetings
Sprint Review
• Time-boxed
– 4-hour review for a 1-month Sprint, or 2-hour review for a 2-week Sprint
• Scrum Team and Stakeholders collaborate on what was done in the Sprint
• Informal meeting, NOT a status meeting; a demo of the product is presented
• Scrum Team discusses
– What went well during the Sprint
– What issues were faced
– What could be improved
• Output is a revised set of Product Story items for the next Sprint
Sprint Retrospective
• Time-boxed
– 3-hour meeting for a 1-month Sprint, or less than 2 hours for a 2-week Sprint
• Main purpose
– Review how the previous Sprint went with respect to people, relationships, process, and tools
– Identify and order the major items that went well and potential improvements
– Create a plan for implementing improvements to the way the Scrum Team does its work
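The time-boxes for Sprint Planning, Sprint Review, and Sprint Retrospective all shrink as the Sprint gets shorter. The following is a minimal sketch of that arithmetic, assuming linear scaling and a 4-week month. Both are simplifying assumptions, not rules stated in the deck, and the names `ONE_MONTH_TIMEBOX_HOURS` and `timeboxes` are made up for illustration.

```python
# Time-boxes for a one-month Sprint, in hours, as given on the slides.
ONE_MONTH_TIMEBOX_HOURS = {
    "sprint_planning": 8.0,
    "sprint_review": 4.0,
    "sprint_retrospective": 3.0,
}

WEEKS_PER_MONTH = 4  # assumption: a one-month Sprint is treated as 4 weeks


def timeboxes(sprint_weeks):
    """Scale each one-month time-box linearly to the given Sprint length."""
    if not 0 < sprint_weeks <= WEEKS_PER_MONTH:
        raise ValueError("Sprint length must be more than 0 and at most 4 weeks")
    factor = sprint_weeks / WEEKS_PER_MONTH
    return {
        event: hours * factor
        for event, hours in ONE_MONTH_TIMEBOX_HOURS.items()
    }


if __name__ == "__main__":
    for event, hours in timeboxes(2).items():
        print(f"{event}: {hours:g} hours")
```

For a 2-week Sprint this gives 4 hours of planning, a 2-hour review, and a 1.5-hour retrospective, which lines up with the "4 hours for 2 wk Sprint", "2 hours", and "<2 hours" labels on the Scrum Timelines slide.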
Scrum Timelines
[Figure: timeline of Product Story -> Sprint Planning -> Sprint -> Sprint Review -> Sprint Retrospective, each step following the previous one immediately. Labels, in order: Business Input, driven by Product Owner, Stakeholders, and Scrum Master; 4 hours for a 2-week Sprint; 2 weeks, with Daily Scrum; 2 hours; 2 hours; <2 hours.]
Hadoop Meets Scrum
Scrum Tools
Scrum Tools: Go modern or Archaic
• Agile software is available, e.g. www.rallydev.com
• LCD Projector
• Whiteboard and colored markers
• Long, contiguous wall
• Clustered cubicles
• Index cards
Hadoop Meets Scrum
Supporting Scrum in Hadoop Development
What is needed in Hadoop to support Scrum?
• Multi-tenancy is critical
– Set up security -> LDAP/AD, Ranger, Kerberos, Knox
– Set up an HDFS quota for each Scrum Team
– Set up a Capacity Scheduler queue for each Scrum Team or member
• High Availability is important but not critical
– Set up NN HA
– Set up RM HA
– Set up HiveServer2 HA
– Set up Hive Metastore HA
– Set up Multi HBase Master
– Set up Multi Knox Cluster
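The per-team HDFS quota and Capacity Scheduler queue can be sketched as generated settings. This is only an illustration under stated assumptions: the team names, the `/teams` base directory, the quota sizes, and the capacity split are all hypothetical. The script only builds `hdfs dfsadmin -setSpaceQuota` command strings and `yarn.scheduler.capacity.*` property names; it does not run anything against a cluster.

```python
# Hypothetical Scrum Teams: name -> (HDFS space quota, YARN queue capacity in %).
TEAMS = {
    "team_alpha": ("10t", 60),
    "team_beta": ("5t", 40),
}


def hdfs_quota_commands(teams, base_dir="/teams"):
    """Build one `hdfs dfsadmin -setSpaceQuota` command per team directory."""
    return [
        f"hdfs dfsadmin -setSpaceQuota {quota} {base_dir}/{name}"
        for name, (quota, _capacity) in sorted(teams.items())
    ]


def capacity_scheduler_properties(teams):
    """Build Capacity Scheduler properties: one leaf queue per team under root."""
    total = sum(capacity for _quota, capacity in teams.values())
    if total != 100:
        # Sibling queue capacities under the same parent must add up to 100%.
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(sorted(teams))}
    for name, (_quota, capacity) in sorted(teams.items()):
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(capacity)
    return props


if __name__ == "__main__":
    for command in hdfs_quota_commands(TEAMS):
        print(command)
    for key, value in capacity_scheduler_properties(TEAMS).items():
        print(f"{key}={value}")
```

On an Ambari-managed cluster, the generated properties would be entered through the Ambari UI rather than written into config files by hand, in keeping with the deck's advice on Ambari.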
What is needed in Hadoop to support Scrum?
• Establish a habit of regular, disciplined Hadoop performance tuning
– YARN, Hive, Tez, Spark, Kafka, Storm, Flume, HBase, Solr, MapReduce, etc.
• Truncate logs regularly
– All Hadoop component logs
– Truncate when at 80% disk utilization
• Logs are a gold mine. Learn to interpret them correctly.
– Troubleshooting purposes
– Understanding how components operate and interoperate
• Turn off Hadoop services that are not needed
– Saves CPU, memory, and disk space
– Do not forget to turn on maintenance mode. Ask your Hadoop Admin why.
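The 80% rule for log truncation can be checked with a few lines of standard-library Python. This is a rough sketch, not a production log-rotation tool: the `/var/log` path and the function names are assumptions, and it only reports whether the threshold has been reached, leaving the actual truncation to whatever process the cluster admins prefer.

```python
import shutil

THRESHOLD = 0.80  # from the slide: truncate when at 80% disk utilization


def utilization(used_bytes, total_bytes):
    """Return the fraction of the volume that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_truncate(used_bytes, total_bytes, threshold=THRESHOLD):
    """True once utilization reaches or exceeds the threshold."""
    return utilization(used_bytes, total_bytes) >= threshold


def check_log_volume(path="/var/log"):
    """Check the volume holding `path` (hypothetical log location)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return should_truncate(usage.used, usage.total)
```

A scheduled job could call `check_log_volume()` for each log directory and alert or trigger cleanup when it returns `True`.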
What is needed in Hadoop to support Scrum?
• Know your tools
Component | Best used for
Sqoop | Ingesting RDBMS tables into HDFS and/or Hive
Flume | Ingesting flat files from network file systems or file servers. Capped at 400,000 records/sec
NFS | Ingesting flat files from NFS-based file servers. ONLY ingest files smaller than 1 GB
Kafka, Storm, HBase | Real-time, streaming, and online processing. Perfect for IoT and CEP. They all go together in real-time systems.
Slider | Deploying custom long-running applications, e.g. Tomcat apps
Spark | Data science (Spark ML), micro-batch streaming (Spark Streaming)
MapReduce | Only use it when Pig and Hive can’t do the job
Pig | Perfect for ETL processing. Data mining and statistics (Apache DataFu)
Hive | Reporting and analytics. Data warehousing. Always use ORC!
Tez | Never turn it off. Enable it for both Pig and Hive for fast data processing
Falcon | Process orchestration and data lineage
Knox, Kerberos, Ranger | AuthN, AuthZ, audit. Preventing impersonation.
Ambari | Do NOT update config files manually. Use the Ambari UI to make config changes in Hadoop.
Let’s Scrum!
Putting Hadoop and Scrum to the test
The Project – HVAC Sensor Analytics
• Business wants to understand how the buildings are consuming energy and wants to start with HVAC. They want to determine which HVAC systems are working harder and prioritize them for maintenance or replacement.
• Determine which HVAC products have the highest temperature deviation and order them by age.
• Recommend which buildings likely have the poorest maintenance practices.
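The second requirement can be prototyped before any Hive code is written. The sketch below shows one possible reading of it in plain Python: compute the mean absolute gap between target and actual temperature per product, keep the products with the largest gap, and list them oldest first. The field names (`product`, `age_years`, `target_temp`, `actual_temp`), the `top_n` cut-off, and the sample readings are all made up for illustration; the exercise itself asks for the app to be developed in Hive.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative readings; field names are assumptions, not a real schema.
READINGS = [
    {"product": "model_a", "age_years": 12, "target_temp": 68, "actual_temp": 75},
    {"product": "model_a", "age_years": 12, "target_temp": 70, "actual_temp": 65},
    {"product": "model_b", "age_years": 20, "target_temp": 68, "actual_temp": 70},
    {"product": "model_c", "age_years": 5, "target_temp": 69, "actual_temp": 79},
]


def rank_products(readings, top_n=2):
    """Pick the top_n products by mean absolute deviation, then order by age."""
    deviations = defaultdict(list)
    ages = {}
    for reading in readings:
        gap = abs(reading["actual_temp"] - reading["target_temp"])
        deviations[reading["product"]].append(gap)
        ages[reading["product"]] = reading["age_years"]

    summary = [
        {"product": p, "age_years": ages[p], "mean_abs_deviation": mean(gaps)}
        for p, gaps in deviations.items()
    ]
    # Highest deviation first, keep the worst top_n ...
    worst = sorted(summary, key=lambda s: s["mean_abs_deviation"], reverse=True)[:top_n]
    # ... then present them oldest first.
    return sorted(worst, key=lambda s: s["age_years"], reverse=True)


if __name__ == "__main__":
    for row in rank_products(READINGS):
        print(row)
```

Agreeing on an interpretation like this during Sprint Planning gives the team a concrete acceptance check for the Hive query they build during the Sprint.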
TODO
• Apply SCRUM principles and rules
• Properly size your team
• Break down the requirements into Product Story
• Determine Sprint Goal
• Generate one Sprint Story
• Develop the app in Hive
• Any performance tuning of your tables and CREATE statements is a big plus
Rules
• Spend 15 minutes on Sprint Planning
• We will do a 2-hour Sprint
• We will do one Daily Scrum meeting, in the middle of the 2-hour Sprint
• Spend 15 minutes on the Sprint Review
Q&A…
Discussion

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Hadoop Meets Scrum

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The Elephant Meets Scrum Rommel Garcia
  • 2. Agenda • Introductions • Why Scrum? • Scrum Basic Concepts • Scrum Team • Scrum Framework • Hadoop Meets Scrum • Scrum Exercise • Open Forum
  • 3. Introductions What’s your name? What’s your role? Why are you here?
  • 4. Why Scrum? Nobody wants to fail too big….too co$tly…on projects.
  • 5. Monolithic SDLC • A small change impacts everything • Cost of failure is extremely high • Slow, unpredictable progress • Hard to prioritize • Not business friendly
  • 6. Scrum.. • Produces immediate results • Makes the development team nimble and adaptable • Gives full visibility into the development process • Is a perfect fit for Hadoop • Hadoop provides isolation of data and processing (HDFS and YARN respectively) • Failure in Hadoop is cheap • Complete traceability of apps: who deployed, ran, and tested them, when, and where
  • 7. Scrum Concepts Agile. Iterative. Adaptive. Fast results.
  • 8. Scrum is.. • A framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value • A framework to employ various processes and techniques • Lightweight • Simple to understand • Difficult to master….if RULES are not followed religiously
  • 9. Success of SCRUM depends on.. • Transparency • Common language must be shared by all team members • What does “Done” mean? • Inspection • Frequent progress checks on Scrum artifacts • But be careful not to overdo it or it gets in the way of work • Adaptation • Adjust promptly and appropriately when the process deviates outside acceptable limits • Adjust immediately to prevent further deviation
  • 10. Scrum Formal Events 1. Sprint Planning 2. Daily Scrum 3. Sprint Review 4. Sprint Retrospective
  • 11. Scrum consists of.. • Team • Roles • Events • Artifacts • Rules
  • 12. Scrum Team Committed or Involved.
  • 13. The ‘Ham-n-Eggs’ Paradigm
  • 14. The Team • Product Owner • Development Team • Scrum Master
  • 15. Product Owner • Mainly responsible for Product Story management • Clearly defines Product Story items • Effectively orders items in the Product Story • Ensures the Product Story is visible, transparent, and clear to all, and shows what the Scrum Team will work on next • Validates with the Scrum Team that they understand the items in the Product Story • In the real world, this could be the Project Manager, Program Manager, Development Manager, or Product Manager
  • 16. Development Team • Self-organizing • They decide how to produce and release incremental releasable functionality • Scrum Master has no influence on how the team develops functionality • Cross-functional • Pig, Hive, HDFS, YARN, and more • Develop and release features faster • Accountability belongs to the Development Team as a whole • Team size: >=3 and <=9 • Normally composed of Hadoop Developer, Hadoop Architect, Data Scientist, Data Analyst, QA.
  • 17. Scrum Master • Ensures Scrum theory, practices, and rules are enacted • Servant-leader for the Scrum Team • Coaches the Development Team in self-organization and cross-functionality • Removes impediments to the Development Team’s progress • Serves the Product Owner • Finds techniques for effective Product Story management • Helps with clear, concise definition of Product Story items • Ensures the Product Owner knows how to arrange the Product Story to maximize value • Facilitates Scrum events as requested/needed • Serves the Organization – Leads Scrum adoption – Works with other Scrum Masters to increase the effectiveness of Scrum application in the organization
  • 18. Scrum Framework Fail fast in Hadoop. Move fast with Scrum.
  • 19. The Sprint • It is the heart of Scrum • Time-boxed at 1 month or less. 2 weeks is pretty common. • New Sprint starts immediately after conclusion of previous Sprint • Consists of • Sprint Planning • Daily Scrums • Development Work • Sprint Review • Sprint Retrospective
  • 20. During the Sprint • No changes are made that would compromise the Sprint Goal • Quality goals do NOT decrease • Scope may be clarified and re-negotiated between the Product Owner and the Development Team as more is learned • ONLY the Product Owner has the authority to cancel a Sprint
  • 21. Sprint Planning • Time-boxed • 8 hours of planning for a 1-month Sprint, or 2 hours of planning for a 2-week Sprint • Answers the questions: • What can be done this Sprint? – Development Team forecasts which Product Story items it will deliver – Output is the Sprint Goal • How will the chosen work get done? – Development Team determines how to deliver the increments – Output is the Sprint Story
  • 22. Daily Scrum • Driven by Scrum Master • Time-boxed at 15 mins • Synchronize activities and plan for the next 24 hours • Each Development Team member will be asked the questions: – What was done yesterday? – What needs to be done today? – What issues prevented incremental progress? • Highlights and promotes quick decision-making • Improves communication and eliminates other meetings
  • 23. Sprint Review • Time-boxed • 4-hour review for a 1-month Sprint, or 2-hour review for a 2-week Sprint • Scrum Team and Stakeholders collaborate on what was done in the Sprint. • Informal meeting, NOT a status meeting. A demo of the product is presented • Scrum Team discusses • What went well during the Sprint • What issues were faced • What could be improved • Output is a revised set of Product Story items for the next Sprint
  • 24. Sprint Retrospective • Time-boxed • 3-hour meeting for a 1-month Sprint, or <2-hour meeting for a 2-week Sprint • Main purpose • Review how the previous Sprint went with respect to people, relationships, process, and tools • Identify and order the major items that went well and potential improvements • Create a plan for implementing improvements to the way the Scrum Team does its work
  • 25. Scrum Timelines [diagram] Stages: Product Story, Sprint Planning, Sprint, Sprint Review, Sprint Retrospective. Labels as extracted from the diagram: Business Input; Immediate; Driven by Product Owner, Stakeholders, Scrum Master; Immediate; 4 hours for 2 wk Sprint; 2 weeks; Daily Scrum; 2 hours; 2 hours; Immediate; <2 hours
  • 26. Hadoop Meets Scrum Scrum Tools
  • 27. Scrum Tools: Go Modern or Archaic • Agile software is available, e.g. www.rallydev.com • LCD projector • Whiteboard and colored markers • Long, contiguous wall • Clustered cubicles • Index cards
  • 28. Hadoop Meets Scrum Supporting Scrum in Hadoop Development
  • 29. What is needed in Hadoop to support Scrum? • Multi-tenancy is critical • Set up security: LDAP/AD, Ranger, Kerberos, Knox • Set up an HDFS quota for each Scrum Team • Set up a Capacity Scheduler queue for each Scrum Team or member • High Availability is important but not critical • Set up NN HA • Set up RM HA • Set up HiveServer2 HA • Set up Hive Metastore HA • Set up multiple HBase Masters • Set up a multi-node Knox cluster
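The per-team quota and queue setup on slide 29 could be scripted along the following lines. This is a minimal sketch, not the presenter's procedure: the team names, the `/teams` base directory, the `1t` quota, and the equal capacity split are all assumptions made for illustration. It only builds the `hdfs dfsadmin -setSpaceQuota` command strings and the `yarn.scheduler.capacity.*` properties; it does not run or apply anything.

```python
def hdfs_quota_commands(teams, quota="1t", base_dir="/teams"):
    """Build shell commands that create one HDFS directory per team
    and put a space quota on it. Paths and quota size are hypothetical."""
    commands = []
    for team in teams:
        path = f"{base_dir}/{team}"
        commands.append(f"hdfs dfs -mkdir -p {path}")
        commands.append(f"hdfs dfsadmin -setSpaceQuota {quota} {path}")
    return commands


def capacity_scheduler_properties(teams):
    """Build Capacity Scheduler properties with one queue per team.
    Capacity is split evenly (an assumption); any remainder goes to the
    first queues so the percentages add up to exactly 100."""
    share, remainder = divmod(100, len(teams))
    props = {"yarn.scheduler.capacity.root.queues": ",".join(teams)}
    for index, team in enumerate(teams):
        capacity = share + (1 if index < remainder else 0)
        props[f"yarn.scheduler.capacity.root.{team}.capacity"] = str(capacity)
    return props


# Hypothetical Scrum Teams, used only to show the output shape.
example_teams = ["alpha", "beta", "gamma"]
for command in hdfs_quota_commands(example_teams):
    print(command)
for key, value in capacity_scheduler_properties(example_teams).items():
    print(f"{key}={value}")
```

Since slide 31 advises against editing config files by hand, the generated properties are meant to be entered through the Ambari UI rather than written into `capacity-scheduler.xml` directly.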
  • 30. What is needed in Hadoop to support Scrum? • Establish a habit of regular, disciplined performance tuning of Hadoop • YARN, Hive, Tez, Spark, Kafka, Storm, Flume, HBase, Solr, MapReduce, etc. • Truncate logs regularly • All Hadoop component logs • Truncate when at 80% disk utilization • Logs are a gold mine. Learn to interpret them correctly. • Troubleshooting purposes • Understanding how components operate and interoperate • Turn off Hadoop services that are not needed • Saves CPU, memory, disk space • Do not forget to turn on maintenance mode. Ask your Hadoop Admin why.
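The "truncate at 80% disk utilization" rule on slide 30 can be expressed as a small check. This is a sketch under assumptions: the log directory, the `*.log` pattern, and the choice to empty files in place are illustrative, and the slide does not say how truncation should actually be done. Emptying a log that a running service still holds open may need extra care, so treat this as the decision logic only.

```python
import shutil
from pathlib import Path


def disk_utilization(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def truncate_logs(log_dir, pattern="*.log"):
    """Empty every file matching pattern in log_dir; return the file names."""
    truncated = []
    for log_file in sorted(Path(log_dir).glob(pattern)):
        if log_file.is_file():
            # Opening in write mode truncates the file to zero bytes.
            with open(log_file, "w"):
                pass
            truncated.append(log_file.name)
    return truncated


def truncate_if_needed(log_dir, threshold=0.80, utilization=None):
    """Truncate logs only when utilization reaches the threshold.
    utilization can be passed in explicitly, which makes dry runs easy."""
    if utilization is None:
        utilization = disk_utilization(log_dir)
    if utilization >= threshold:
        return truncate_logs(log_dir)
    return []
```

A scheduled job could call `truncate_if_needed` with the component's log directory; the directory to use depends on the cluster and is not given in the slides.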
  • 31. What is needed in Hadoop to support Scrum? • Know your tools
      Component | Best used for
      Sqoop | Ingesting RDBMS tables into HDFS and/or Hive
      Flume | Ingesting flat files from network file systems or file servers. Capped at 400,000 records/sec
      NFS | Ingesting flat files from NFS-based file servers. ONLY ingest less than 1GB per file
      Kafka, Storm, HBase | Realtime, streaming, and online processing. Perfect for IoT, CEP. They all go together in realtime systems.
      Slider | Deploying custom long-running applications, e.g. Tomcat apps
      Spark | Data science (Spark ML), micro-batch streaming (Spark Streaming)
      MapReduce | Only use it when Pig and Hive can’t do the job
      Pig | Perfect for ETL processing. Data mining and statistics (Apache DataFu)
      Hive | Reporting and analytics. Data warehousing. Always use ORC!
      Tez | Never turn it off. Enable it for both Pig and Hive for fast data processing
      Falcon | Process orchestration and data lineage
      Knox, Kerberos, Ranger | AuthN, AuthZ, Audit. Preventing impersonation.
      Ambari | Do NOT update config files manually. Use the Ambari UI to make config changes in Hadoop.
  • 32. Let’s Scrum! Putting Hadoop and Scrum to the test
  • 33. The Project: HVAC Sensor Analytics • Business wants to understand how the buildings are consuming energy and wants to start with HVAC. They want to determine which HVAC systems are working harder and prioritize them for maintenance or replacement. • Determine which HVAC products have the highest temperature deviation and order them by age. • Identify which buildings possibly have the poorest maintenance practices
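The second requirement on slide 33 (highest temperature deviation, ordered by age) could be prototyped as below before writing the Hive version the exercise asks for. This is a Python sketch of the logic, not the exercise's Hive solution. The field names (`product`, `system_age`, `target_temp`, `actual_temp`) and the sample rows are made up for illustration, and reading the requirement as "take the top N products by average absolute deviation, then sort those by age, oldest first" is one interpretation among several.

```python
from collections import defaultdict


def top_deviating_products(readings, top_n=3):
    """Return (product, average_deviation, age) tuples for the top_n products
    with the highest average |actual - target| temperature, oldest first.
    Assumes each product has a single age value (hypothetical schema)."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # deviation sum, count, age
    for reading in readings:
        entry = totals[reading["product"]]
        entry[0] += abs(reading["actual_temp"] - reading["target_temp"])
        entry[1] += 1
        entry[2] = reading["system_age"]

    averages = [
        (product, total / count, age)
        for product, (total, count, age) in totals.items()
    ]
    # First keep the products that deviate most, then order them by age.
    worst = sorted(averages, key=lambda row: row[1], reverse=True)[:top_n]
    return sorted(worst, key=lambda row: row[2], reverse=True)


# Made-up sample readings, only to show the call shape.
sample_readings = [
    {"product": "A", "system_age": 10, "target_temp": 70, "actual_temp": 75},
    {"product": "A", "system_age": 10, "target_temp": 70, "actual_temp": 65},
    {"product": "B", "system_age": 20, "target_temp": 68, "actual_temp": 70},
    {"product": "C", "system_age": 5, "target_temp": 70, "actual_temp": 80},
]
for row in top_deviating_products(sample_readings, top_n=2):
    print(row)
```

The same shape maps onto a Hive query with a `GROUP BY` on the product, an `AVG(ABS(...))` aggregate, and an `ORDER BY`, which is where the ORC and Tez advice from slide 31 would apply.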
  • 34. TODO • Apply SCRUM principles and rules • Properly size your team • Break down the requirements into Product Story items • Determine the Sprint Goal • Generate one Sprint Story • Develop the app in Hive • Any performance tuning of your tables and CREATE statements is a big +
  • 35. Rules • Spend 15 minutes on Sprint Planning • We will do a 2-hour Sprint • We will do a Daily Scrum meeting (just once) in the middle of the 2-hour Sprint • Spend 15 minutes on Sprint Review
  • 36. Q&A… Discussion