Hadoop engineering bo_f_final
12 Jun 2015
Best Practices in Hadoop Engineering: operations, quality and releases
Ramya Sunil
Senior Software Engineer in Test, Hadoop | Hortonworks at The Apache Software Foundation
© Hortonworks Inc. 2011
Hadoop Engineering Best Practices
Raja Aluri, Release Engineering
Deepesh Khandelwal, Quality Engineering
Ramya Sunil, Quality Engineering
Agenda
• Source Mechanics
• Why do System Testing?
• Test Matrix
• Automated Testing Flow
• Test Planning
• Planning your own System Testing
• Q & A
Apache-Hortonworks-Partner Source Mechanics
• Hortonworks open source philosophy
• How we do Apache-first development
• How we incorporate fixes or features that have not yet made it into Apache
• How we integrate our partners' contributions into the source code
• Bookkeeping of the delta between Apache and Hortonworks
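One way to keep the delta between Apache and Hortonworks branches auditable is to compare the Apache JIRA keys referenced in each branch's commit subjects. The sketch below assumes commit subjects start with a key such as `HADOOP-1234`, as is common in Apache Hadoop commits; the function names and sample subjects are hypothetical. In practice the subject lists could come from `git log --format=%s` run on each branch.

```python
import re

# Apache JIRA keys look like PROJECT-NUMBER, e.g. HDFS-1234 or YARN-42.
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def jira_keys(subjects):
    """Collect every JIRA key mentioned in a list of commit subjects."""
    keys = set()
    for subject in subjects:
        keys.update(JIRA_KEY.findall(subject))
    return keys


def branch_delta(apache_subjects, vendor_subjects):
    """Return (only_in_vendor, only_in_apache) as sorted lists of JIRA keys."""
    apache = jira_keys(apache_subjects)
    vendor = jira_keys(vendor_subjects)
    return sorted(vendor - apache), sorted(apache - vendor)


# Hypothetical commit subjects, for illustration only.
apache_log = [
    "HDFS-1001. Fix block report race. Contributed by A.",
    "YARN-2002. Add queue metrics.",
]
vendor_log = [
    "HDFS-1001. Fix block report race. Contributed by A.",
    "HADOOP-3003. Fix not yet committed to Apache.",
]

only_vendor, only_apache = branch_delta(apache_log, vendor_log)
print(only_vendor)  # ['HADOOP-3003']
print(only_apache)  # ['YARN-2002']
```

Keys present only in the vendor branch are the patches still to be upstreamed; keys present only in Apache are candidates for the next merge.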
Apache-Hortonworks-Partner Source Flow
[Diagram: Apache Git (Hadoop branch-2, branch-2.4), the HWX repo (ApacheRef, HDP) and the partner repo (ApacheRef, HDPRef, Partner), marked as read-write or read-only repositories and linked by continuous merges; HDP build CI feeds the HDP package repo and HDP Maven repository, the QE workflow for testing, and published releases.]
Issue type: course of action
• Normal issue: patch in Apache first
• Urgent issue: patch in the HWX repo first
Unit Testing
• Tests individual parts of the program in isolation (white-box testing)
• Homogeneous cluster, usually in-memory
• One configuration, usually one operating system and non-secure
• Limited dataset, usually a few kilobytes
[Diagram: unit tests for components A, B and C each run in isolation, leaving open questions (??) about DB interaction, concurrent user interaction and third-party connectors.]
System Testing
• Mimics the production environment
  – Multiple nodes in the cluster
  – Multiple concurrent users
  – Different workloads
• Multiple configurations to test
• Large datasets that are more complex and richer
• Encompasses different types of testing
  – Functional
  – Performance, stress and reliability
  – High availability
  – Backwards compatibility
  – Integration testing
  – Third-party connectors
  – Upgrade testing
System Testing (cont.)
• Heterogeneous testing
  – Cross-version testing
  – Cross-operating-system testing
  – Hardware configurations such as disk and CPU
  – Security settings and level of encryption
Test Matrix
A total of 15,000+ configurations to test!
• OS: CentOS, SuSE, Debian, Ubuntu, Windows
• JDK: Oracle JDK, OpenJDK; versions 1.6.x, 1.7.x, 1.8.x
• Security: disabled; enabled with MIT-only, AD-only or MIT-AD; Ranger enabled/disabled
• Encryption: wire encryption enabled/disabled; Transparent Data Encryption enabled/disabled
• DB: MySQL, Oracle, Postgres, MSSQL
• File system: HDFS, WASB, other vendor-specific file systems
• Others: Tez enabled/disabled; Slider apps vs. standalone
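The size of such a matrix comes from multiplying the number of options in each dimension. The sketch below transcribes the slide's dimensions into a dictionary and enumerates them with `itertools.product`. Splitting the JDK into vendor and version and treating every option as independent are assumptions, so the naive count is not meant to reproduce the slide's figure; the `is_valid` rule is a hypothetical example of pruning combinations a real matrix would not run.

```python
from itertools import product

# Dimensions transcribed from the Test Matrix slide. Splitting the JDK into
# vendor x version, and security into four modes, is an assumption.
MATRIX = {
    "os": ["CentOS", "SuSE", "Debian", "Ubuntu", "Windows"],
    "jdk_vendor": ["Oracle JDK", "OpenJDK"],
    "jdk_version": ["1.6.x", "1.7.x", "1.8.x"],
    "security": ["disabled", "MIT-only", "AD-only", "MIT-AD"],
    "ranger": [False, True],
    "wire_encryption": [False, True],
    "tde": [False, True],
    "db": ["MySQL", "Oracle", "Postgres", "MSSQL"],
    "filesystem": ["HDFS", "WASB", "vendor-specific"],
    "tez": [False, True],
    "slider": [False, True],
}


def is_valid(config):
    """Hypothetical rule for illustration: only enable Ranger on secured clusters."""
    return not (config["ranger"] and config["security"] == "disabled")


def enumerate_configs(matrix, valid=lambda config: True):
    """Yield one dict per combination of options that passes `valid`."""
    keys = list(matrix)
    for values in product(*(matrix[k] for k in keys)):
        config = dict(zip(keys, values))
        if valid(config):
            yield config


full = sum(1 for _ in enumerate_configs(MATRIX))
pruned = sum(1 for _ in enumerate_configs(MATRIX, is_valid))
print(full, pruned)  # 46080 40320
```

Even one constraint removes thousands of combinations, which is why a real matrix is usually curated or sampled rather than run exhaustively.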
Automated Testing Flow
• Continuous integration: a build job builds from the Apache repos and internal commits
• Publishing: builds are published to a staging repo
• QE deploy trigger: provision VMs, then the installer deploys the HDP stack from the staging repo to the test cluster
• Test setup and execution, followed by test analysis
• Issues found are tracked in the bug tracking system
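The test-analysis step between execution and the bug tracking system can be sketched as triage: compare failures against known, already-tracked issues and open new bugs only for the rest. The result type, the known-failures mapping and the bug keys below are hypothetical; a real pipeline would parse its test reports and call its bug tracker's own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    name: str
    passed: bool


def triage(results, known_failures):
    """Split failures into already-tracked ones and ones needing a new bug.

    known_failures maps a test name to an existing (hypothetical) bug key.
    """
    tracked, new = {}, []
    for result in results:
        if result.passed:
            continue
        if result.name in known_failures:
            tracked[result.name] = known_failures[result.name]
        else:
            new.append(result.name)
    return tracked, new


results = [
    CaseResult("hdfs.test_write_read", True),
    CaseResult("hive.test_orc_insert", False),
    CaseResult("oozie.test_coordinator", False),
]
known = {"hive.test_orc_insert": "BUG-101"}

tracked, new = triage(results, known)
print(tracked)  # {'hive.test_orc_insert': 'BUG-101'}
print(new)      # ['oozie.test_coordinator']
```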
Test Planning
20+ components in the HDP stack, and growing!
Inputs to the test plan:
• Internal developers
• Apache JIRAs and community forums
• Product management
• Support tickets
Planning your own QATS
Typical User Scenarios
• Fresh install
• Stack upgrade, going from an earlier release to a newer one
• Migration, i.e. changing distributions
• Applying changes to an existing cluster
  – Upgrading hardware: CPU, memory, disks
  – Changing dependent software such as the OS or JDK
  – Changing security settings, such as turning on Kerberos or encryption
  – Changing component configs in *-site.xml, or enabling HA
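For the scenario of changing component configs in *-site.xml, diffing the properties before and after a change helps decide which tests to rerun. Hadoop `*-site.xml` files list `<property>` elements with `<name>` and `<value>` children under `<configuration>`; the sketch below parses two such documents with the standard library and reports added, removed and changed properties. The sample values are illustrative, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_site_xml(text):
    """Parse a Hadoop-style *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name is not None:
            props[name.strip()] = value.strip()
    return props


def diff_configs(old, new):
    """Return (added, removed, changed) properties between two configs."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


before = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>false</value></property>
</configuration>"""

after = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
</configuration>"""

added, removed, changed = diff_configs(load_site_xml(before), load_site_xml(after))
print(added)    # {'dfs.nameservices': 'mycluster'}
print(removed)  # {}
print(changed)  # {'dfs.ha.automatic-failover.enabled': ('false', 'true')}
```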
Planning your own QATS
Three phases leading to end-to-end (E2E) automation:
• Preparation phase
  – Collect requirements on the stack and workload
  – Identify appropriate hardware
• CI development phase
  – Build an in-house CI system for deployment and testing
• Testing phase
  – Build basic acceptance tests
  – End-to-end automation for your application
Preparation Phase
• Collect the stack requirements
  – Identify all the stack components that will be installed, including third-party applications and connectors
  – Identify the installer
  – Identify configs
• Hardware selection
  – Should be scaled appropriately to mimic the production environment
  – Prefer multi-node over single-node, with component services distributed across nodes
• Collect workload information
  – Use the actual workload whenever possible
  – If not, simulate the workload; some tools are available:
    – Use Rumen to obtain job traces from existing clusters
    – Use GridMix to generate a workload
  – Data set size and complexity
  – Number of concurrent users
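When the actual workload is not available, even a simple seeded generator keeps a simulated workload repeatable from run to run. The sketch below assumes a hypothetical job mix and produces a schedule of jobs for a given number of concurrent users; it only plans the workload and submits nothing. It is not a substitute for Rumen and GridMix, which build workloads from real job traces.

```python
import random
from collections import Counter

# Hypothetical job mix: job type -> relative weight.
JOB_MIX = {"etl_hive_query": 5, "mapreduce_sort": 3, "spark_ml_training": 2}


def plan_workload(users, jobs_per_user, job_mix, seed=42):
    """Return a list of (user, job_type) pairs drawn from a weighted mix.

    A fixed seed makes the plan identical across test runs, so results
    can be compared against a baseline.
    """
    rng = random.Random(seed)
    job_types = list(job_mix)
    weights = [job_mix[j] for j in job_types]
    plan = []
    for user in range(users):
        for job in rng.choices(job_types, weights=weights, k=jobs_per_user):
            plan.append((f"user{user:02d}", job))
    return plan


plan = plan_workload(users=10, jobs_per_user=20, job_mix=JOB_MIX)
print(len(plan))  # 200
print(Counter(job for _, job in plan))  # per-type counts, drawn with 5:3:2 weights
```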
CI Development Phase
• Implement a CI system
  – Modularize the CI system, e.g. individual Jenkins jobs for provisioning, deployment and testing
• Determine the cadence of testing
• Establish reporting
[Diagram: Provision cluster → Deploy → Test]
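Keeping provision, deploy and test as separate steps, as with individual Jenkins jobs, lets a stage be rerun without repeating the earlier ones. The Python sketch below models that idea generically; the stage functions are hypothetical stand-ins, not Jenkins APIs, and each one reads and extends a shared context dictionary.

```python
def provision(ctx):
    # Hypothetical stand-in: a real job would allocate VMs for the test cluster.
    ctx["hosts"] = [f"node{i}" for i in range(1, ctx["cluster_size"] + 1)]
    return True


def deploy(ctx):
    # Hypothetical stand-in: a real job would run the installer on ctx["hosts"].
    ctx["deployed"] = bool(ctx.get("hosts"))
    return ctx["deployed"]


def run_tests(ctx):
    # Hypothetical stand-in: a real job would run the acceptance suites.
    ctx["tests_passed"] = ctx.get("deployed", False)
    return ctx["tests_passed"]


# Ordered stages; each can also be started on its own, like separate CI jobs.
STAGES = [("provision", provision), ("deploy", deploy), ("test", run_tests)]


def run_pipeline(ctx, start_at="provision"):
    """Run stages in order from start_at and stop at the first failure.

    Returns the name of the failed stage, or None if every stage passed.
    """
    names = [name for name, _ in STAGES]
    for name, stage in STAGES[names.index(start_at):]:
        if not stage(ctx):
            return name
    return None


ctx = {"cluster_size": 3}
print(run_pipeline(ctx))  # None
print(ctx["hosts"])       # ['node1', 'node2', 'node3']

# Rerun only the test stage, e.g. after a flaky failure, without reprovisioning.
print(run_pipeline(ctx, start_at="test"))  # None
```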
Testing Phase
• Basic acceptance tests
  – Basic service checks for individual deployed components
  – Basic acceptance tests to validate integrations
• Establish a baseline to track the performance of pipeline components over time
• Compatibility tests (including apps, third-party connectors, dashboards, etc.)
• E2E automation to simulate production workloads
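A baseline only pays off when later runs are compared against it. A minimal sketch, assuming per-component durations in seconds and a hypothetical 10% tolerance, flags components whose runtime regressed beyond that tolerance; the component names and numbers are illustrative.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {component: (baseline_s, current_s)} for runs slower than allowed.

    A component regresses when current > baseline * (1 + tolerance).
    Components missing from the baseline are ignored rather than flagged.
    """
    regressions = {}
    for component, seconds in current.items():
        base = baseline.get(component)
        if base is not None and seconds > base * (1 + tolerance):
            regressions[component] = (base, seconds)
    return regressions


baseline = {"ingest": 120.0, "hive_etl": 300.0, "report": 45.0}
current = {"ingest": 125.0, "hive_etl": 360.0, "report": 44.0, "new_step": 10.0}

print(find_regressions(baseline, current))  # {'hive_etl': (300.0, 360.0)}
```

A relative tolerance absorbs normal run-to-run noise; new components such as `new_step` should be added to the baseline once their first runs are accepted.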
Q & A
Thank You!