SlideShare une entreprise Scribd logo
1  sur  28
Bug bites Elephant?
Test-driven Quality Assurance
in Big Data Application Development
Dr. Dominik Benz, inovex GmbH
2013/06/03, Berlin Buzzwords
? TDD!
?
?
?
?
?
?
?
Write/execute tests, specify
acceptance criteria, …
2
Who speaks… … the Elephant language?
Class A
extends
Mapper…
ROI, $$, …
apt-get
install…
3
The road… … to Big Data QA
our Big Data
QA problem
the FitNesse
approach
test data
definition /
selection
job & workflow
control
result
inspection
4
QA problem Web Intelligence @ 1&1
DWH
Hadoop Cluster
~ 1 billion log
events / day,
~ 1 TB (thrift)
logfiles
chains of MR jobs,
running on
20 nodes / 8 cores /
96 GB RAM (CDH)
BI reporting, web
analytics, …
5
QA problem An exemplary workflow
Log Files
(thrift)
Log Files
(thrift)
Log Files
(thrift)
Inter-
mediate
result
(avro)
MR
job 1
… DWH
(RDBMS)
MR
job 2
create
(sample)
input data
create
(sample)
input data
? inspect
(binary)
formats
inspect
(binary)
formats
? control
workflows
control
workflows
?
method tests what? issues for our usecase
JUnit isolated functions no integration, Java syntax
MRUnit 1 mapper + 1 reducer „little“ integration, Java
syntax
iTest hadoop jobs/workflows Java / Groovy syntax
Scripts/CLI (manual) scripting/inspect. „script chaos“, syntax
6
QA problem Existing Approaches
FitNesse as suitable addition / solution!
7
The road… … to Big Data QA
Big Data QA
is different!
the FitNesse
approach
test data
definition /
selection
job & workflow
control
result
inspection
8
FitNesse In a nutshell
„fully integrated
standalone wiki and
acceptance testing
framework”
• „executable“ Wiki-
Pages (returning test
results)
• (almost) natural
language test
specification
• connection to SUT
via (Java-)“Fixtures“
9
FitNesse Architecture Overview
script |
check |
num results |
3 |
Browser
FitNesse
Server
public int
numResults
{ ... }
System under Test
Fixtures
„calling java methods from
wiki“, compare return values
Integrates with REST, Jenkins…
10
FitNesse An Exemplary Test
11
FitNesse Exemplary Test Source
!path /home/inovex/lib/*.jar
| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |
12
FitNesse Hadoop Fixture Java Code
public class Hadoop {
public boolean uploadToHdfs(String localFile,
String remoteFile) {...}
public boolean hadoopJobFromJar(String jar,
String input, String output) {...}
public String jobOutput() {...}
public String numberOfOutputFiles() {...}
}
13
The road… … to Big Data QA
Big Data QA
is different!
Fitnesse Wiki
test execution!
test data
definition /
selection
job & workflow
control
result
inspection
14
Test Data CSV
‣ Big Data: Efficient data transfer among heterogeneous sources
‣ Define Interface via IDL, Compiler for many languages
15
Test Data Thrift
‣ Dev/Test Hadoop Cluster: Identical Hardware like Prod, but
fewer nodes
‣ (random/biased) sampling e.g. on daily basis
‣ Feedback loop:
‣ identify „special cases“ from real data
‣ include them in (manual) data definition
‣ Gradually increase test coverage / artefact quality
16
Test Data Real World Data
17
The road… … to Big Data QA
Big Data QA
is different!
FitNesse Wiki
test execution!
Define CSV /
thrift / real-
world test data!
job & workflow
control
result
inspection
‣ Execute arbitrary (shell) commands
‣ Mainly a wrapper around apache.commons.exec.CommandLine
18
Job Control Swiss Army Knife: Shell
‣ Hide complexity from test authors
‣ „define“ appropriate test language via (Java) method names
‣ re-use other fixtures (Shell, …) internally
19
Job Control Hadoop Fixture
‣ FitNesse allows to group tests into suites
‣ Can be used to simulate MR processing chains
‣ SetupSuite / TearDownSuite for creating /
destroying test conditions
‣ Tests can still be executed individually
20
Job Control Workflows & Suites
MR job 1
MR job 2
21
The road… … to Big Data QA
Big Data QA
is different!
FitNesse Wiki
test execution!
Define CSV /
thrift / real-
world data!
Use suites & fixtures
for jobs/workflows!
result
inspection
‣ Validate RDBMS contents (via JDBC)
‣ E.g. for checking the final result
‣ Or use Hive + Hive-Server to query raw data
22
Results Data Warehouse / Hive
‣ Execute arbitrary pig commands from Wiki page
‣ Inspect e.g. binary intermediate results (avro, …)
23
Results Pig
public class PigConsole extends PigServer {
public void loadAvroFileUsingAlias(String
filename, String alias) {
this.registerQuery(
alias + "= LOAD" + filename + "USING" +
AVRO_STORAGE_LOADER + ";");
}
}
24
Results Pig Fixture extends PigServer
25
Results Server Infrastructure
Fitnesse Master
TestEnvironments
ProjA ProjB
TestConfigurations
ProjA ProjB
dev qs live dev qs live
Import / edit
tests remotely
QS
ProjA Slave
Dev
ProjA Slave
Live
ProjA SlaveProjA
QS
ProjA Slave
Dev
ProjA Slave
Live
ProjA Slave
Import / edit
config remotely
dev qs live
26
Thank you! dominik.benz@inovex.de
Big Data QA
is different!
FitNesse Wiki
test execution!
Define CSV /
thrift / real-
world data!
Inspect results
via Pig/Hive
Use suites & fixtures
for jobs/workflows!
27
Want more? Inovex trains you!
Android Developer Training (3 days, Karlsruhe/München)
Hadoop Developer Training (3 days, Karlsruhe/Köln)
Certified Scrum Developer Training (5 days, Köln)
Pentaho Data Integration Training (4 days, München/Köln)
Liferay Portal-Admin Training (3 days, Karlsruhe)
Liferay Portal-Developer Training (4 days, Karlsruhe)
information and registration at
www.inovex.de/offene-trainings
28
Inovex @bbuzz
Stefan Kathrin Bernhard
Jörg
Andrew
Christian Christian

Contenu connexe

Tendances

RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsNextMove Software
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_onSri Ambati
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiGrowth Intelligence
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases MongoDB
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure wayBahadir Cambel
 
Deep Dumpster Diving
Deep Dumpster DivingDeep Dumpster Diving
Deep Dumpster DivingRonnBlack
 
Presentation of OrientDB v2.2 - Webinar
Presentation of OrientDB v2.2 - WebinarPresentation of OrientDB v2.2 - Webinar
Presentation of OrientDB v2.2 - WebinarOrient Technologies
 
Interactive Session on Sparkling Water
Interactive Session on Sparkling WaterInteractive Session on Sparkling Water
Interactive Session on Sparkling WaterSri Ambati
 

Tendances (10)

Ajax Lecture Notes
Ajax Lecture NotesAjax Lecture Notes
Ajax Lecture Notes
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical Depictions
 
Meetup spark structured streaming
Meetup spark structured streamingMeetup spark structured streaming
Meetup spark structured streaming
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure way
 
Deep Dumpster Diving
Deep Dumpster DivingDeep Dumpster Diving
Deep Dumpster Diving
 
Presentation of OrientDB v2.2 - Webinar
Presentation of OrientDB v2.2 - WebinarPresentation of OrientDB v2.2 - Webinar
Presentation of OrientDB v2.2 - Webinar
 
Interactive Session on Sparkling Water
Interactive Session on Sparkling WaterInteractive Session on Sparkling Water
Interactive Session on Sparkling Water
 

En vedette

Dudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ final
Dudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ finalDudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ final
Dudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ finalAquaSmartData
 
Metadata quality in digital repositories
Metadata quality in digital repositoriesMetadata quality in digital repositories
Metadata quality in digital repositoriesNikos Palavitsinis, PhD
 
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...BigDataEverywhere
 
Big data governance as a corporate governance imperative
Big data governance as a corporate governance imperativeBig data governance as a corporate governance imperative
Big data governance as a corporate governance imperativeGuy Pearce
 
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBig Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBigDataExpo
 
Data donderdag data quality sas
Data donderdag data quality sasData donderdag data quality sas
Data donderdag data quality sasCre-Aid
 
Quality Approaches to Big Data
Quality Approaches to Big DataQuality Approaches to Big Data
Quality Approaches to Big DataPiet J.H. Daas
 
Big Data Quality Panel : Diachron Workshop @EDBT
Big Data Quality Panel: Diachron Workshop @EDBTBig Data Quality Panel: Diachron Workshop @EDBT
Big Data Quality Panel : Diachron Workshop @EDBTPaolo Missier
 
Data quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeData quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeStefan Kühn
 
Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperativeTrillium Software
 
Strengthening the Quality of Big Data Implementations
Strengthening the Quality of Big Data ImplementationsStrengthening the Quality of Big Data Implementations
Strengthening the Quality of Big Data ImplementationsCognizant
 
Acquire Grow & Retain customers - The business imperative for Big Data
Acquire Grow & Retain customers - The business imperative for Big DataAcquire Grow & Retain customers - The business imperative for Big Data
Acquire Grow & Retain customers - The business imperative for Big DataIBM Software India
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICEPooyan Jamshidi
 
Big data for quality education
Big data for quality educationBig data for quality education
Big data for quality educationMalintha Adikari
 
Data profiling-best-practices
Data profiling-best-practicesData profiling-best-practices
Data profiling-best-practicesBlaise Cheuteu
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyCloudify Community
 
Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...
Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...
Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...Hugo van Hoogstraten
 

En vedette (20)

Dudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ final
Dudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ finalDudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ final
Dudley dolan q-validus_big data_workshop_dcu_event_aqua_smart_march16_ final
 
Metadata quality in digital repositories
Metadata quality in digital repositoriesMetadata quality in digital repositories
Metadata quality in digital repositories
 
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
Big Data Everywhere Chicago: The Big Data Imperative -- Discovering & Protect...
 
Big data governance as a corporate governance imperative
Big data governance as a corporate governance imperativeBig data governance as a corporate governance imperative
Big data governance as a corporate governance imperative
 
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data QualityBig Data Expo 2015 - Trillium software Big Data and the Data Quality
Big Data Expo 2015 - Trillium software Big Data and the Data Quality
 
Data donderdag data quality sas
Data donderdag data quality sasData donderdag data quality sas
Data donderdag data quality sas
 
Quality Approaches to Big Data
Quality Approaches to Big DataQuality Approaches to Big Data
Quality Approaches to Big Data
 
Big Data Quality Panel : Diachron Workshop @EDBT
Big Data Quality Panel: Diachron Workshop @EDBTBig Data Quality Panel: Diachron Workshop @EDBT
Big Data Quality Panel : Diachron Workshop @EDBT
 
Data quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeData quality - The True Big Data Challenge
Data quality - The True Big Data Challenge
 
Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperative
 
Strengthening the Quality of Big Data Implementations
Strengthening the Quality of Big Data ImplementationsStrengthening the Quality of Big Data Implementations
Strengthening the Quality of Big Data Implementations
 
Acquire Grow & Retain customers - The business imperative for Big Data
Acquire Grow & Retain customers - The business imperative for Big DataAcquire Grow & Retain customers - The business imperative for Big Data
Acquire Grow & Retain customers - The business imperative for Big Data
 
Towards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICETowards Quality-Aware Development of Big Data Applications with DICE
Towards Quality-Aware Development of Big Data Applications with DICE
 
Big data for quality education
Big data for quality educationBig data for quality education
Big data for quality education
 
Data profiling-best-practices
Data profiling-best-practicesData profiling-best-practices
Data profiling-best-practices
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
 
HUGE List of IEP Goals
HUGE List of IEP Goals HUGE List of IEP Goals
HUGE List of IEP Goals
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
DICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made EasyDICE & Cloudify – Quality Big Data Made Easy
DICE & Cloudify – Quality Big Data Made Easy
 
Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...
Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...
Data Quality Challenges to Big Data_Practical Insights_KPMG Presentation 20.4...
 

Similaire à Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development

QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiously
QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiouslyQA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiously
QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiouslyQAFest
 
Testing Big Data solutions fast and furiously
Testing Big Data solutions fast and furiouslyTesting Big Data solutions fast and furiously
Testing Big Data solutions fast and furiouslyKatherine Golovinova
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real ExperienceIhor Bobak
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Lars Albertsson
 
Hadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault InjectionHadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault InjectionCloudera, Inc.
 
Java one 2010
Java one 2010Java one 2010
Java one 2010scdn
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Inside Azure Diagnostics (DevLink 2014)
Inside Azure Diagnostics (DevLink 2014)Inside Azure Diagnostics (DevLink 2014)
Inside Azure Diagnostics (DevLink 2014)Michael Collier
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 
What to expect from Java 9
What to expect from Java 9What to expect from Java 9
What to expect from Java 9Ivan Krylov
 
Inside Azure Diagnostics
Inside Azure DiagnosticsInside Azure Diagnostics
Inside Azure DiagnosticsMichael Collier
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInVitaly Gordon
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingBart Vandewoestyne
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learningMikhail Rozhkov
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data Amar kumar
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using MahoutIMC Institute
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoopAmbuj Kumar
 
OWASP ZAP Workshop for QA Testers
OWASP ZAP Workshop for QA TestersOWASP ZAP Workshop for QA Testers
OWASP ZAP Workshop for QA TestersJavan Rasokat
 

Similaire à Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development (20)

QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiously
QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiouslyQA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiously
QA Fest 2019. Дмитрий Собко. Testing Big Data solutions fast and furiously
 
Testing Big Data solutions fast and furiously
Testing Big Data solutions fast and furiouslyTesting Big Data solutions fast and furiously
Testing Big Data solutions fast and furiously
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real Experience
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0
 
Hadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault InjectionHadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault Injection
 
Java one 2010
Java one 2010Java one 2010
Java one 2010
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Inside Azure Diagnostics (DevLink 2014)
Inside Azure Diagnostics (DevLink 2014)Inside Azure Diagnostics (DevLink 2014)
Inside Azure Diagnostics (DevLink 2014)
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
What to expect from Java 9
What to expect from Java 9What to expect from Java 9
What to expect from Java 9
 
Inside Azure Diagnostics
Inside Azure DiagnosticsInside Azure Diagnostics
Inside Azure Diagnostics
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedIn
 
Hadoop & Big Data benchmarking
Hadoop & Big Data benchmarkingHadoop & Big Data benchmarking
Hadoop & Big Data benchmarking
 
Deep Dive Modern Apps Lifecycle with Visual Studio 2012: How to create cross ...
Deep Dive Modern Apps Lifecycle with Visual Studio 2012: How to create cross ...Deep Dive Modern Apps Lifecycle with Visual Studio 2012: How to create cross ...
Deep Dive Modern Apps Lifecycle with Visual Studio 2012: How to create cross ...
 
Start with version control and experiments management in machine learning
Start with version control and experiments management in machine learningStart with version control and experiments management in machine learning
Start with version control and experiments management in machine learning
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoop
 
OWASP ZAP Workshop for QA Testers
OWASP ZAP Workshop for QA TestersOWASP ZAP Workshop for QA Testers
OWASP ZAP Workshop for QA Testers
 

Plus de inovex GmbH

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegeninovex GmbH
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIinovex GmbH
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolutioninovex GmbH
 
Network Policies
Network PoliciesNetwork Policies
Network Policiesinovex GmbH
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learninginovex GmbH
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungeninovex GmbH
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeteninovex GmbH
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetesinovex GmbH
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreiheninovex GmbH
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenteninovex GmbH
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?inovex GmbH
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Projectinovex GmbH
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretabilityinovex GmbH
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use caseinovex GmbH
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessinovex GmbH
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumiinovex GmbH
 

Plus de inovex GmbH (20)

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegen
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AI
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolution
 
WWDC 2019 Recap
WWDC 2019 RecapWWDC 2019 Recap
WWDC 2019 Recap
 
Network Policies
Network PoliciesNetwork Policies
Network Policies
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungen
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeten
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetes
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Azure IoT Edge
Azure IoT EdgeAzure IoT Edge
Azure IoT Edge
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenten
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?
 
Dev + Ops = Go
Dev + Ops = GoDev + Ops = Go
Dev + Ops = Go
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Project
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madness
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
 

Dernier

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 

Dernier (20)

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development

  • 1. Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, inovex GmbH 2013/06/03, Berlin Buzzwords
  • 2. ? TDD! ? ? ? ? ? ? ? Write/execute tests, specify acceptance criteria, … 2 Who speaks… … the Elephant language? Class A extends Mapper… ROI, $$, … apt-get install…
  • 3. 3 The road… … to Big Data QA our Big Data QA problem the FitNesse approach test data definition / selection job & workflow control result inspection
  • 4. 4 QA problem Web Intelligence @ 1&1 DWH Hadoop Cluster ~ 1 billion log events / day, ~ 1 TB (thrift) logfiles chains of MR jobs, running on 20 nodes / 8 cores / 96 GB RAM (CDH) BI reporting, web analytics, …
  • 5. 5 QA problem An exemplary workflow Log Files (thrift) Log Files (thrift) Log Files (thrift) Inter- mediate result (avro) MR job 1 … DWH (RDBMS) MR job 2 create (sample) input data create (sample) input data ? inspect (binary) formats inspect (binary) formats ? control workflows control workflows ?
  • 6. method tests what? issues for our usecase JUnit isolated functions no integration, Java syntax MRUnit 1 mapper + 1 reducer „little“ integration, Java syntax iTest hadoop jobs/workflows Java / Groovy syntax Scripts/CLI (manual) scripting/inspect. „script chaos“, syntax 6 QA problem Existing Approaches FitNesse as suitable addition / solution!
  • 7. 7 The road… … to Big Data QA Big Data QA is different! the FitNesse approach test data definition / selection job & workflow control result inspection
  • 8. 8 FitNesse In a nutshell „fully integrated standalone wiki and acceptance testing framework” • „executable“ Wiki- Pages (returning test results) • (almost) natural language test specification • connection to SUT via (Java-)“Fixtures“
  • 9. 9 FitNesse Architecture Overview script | check | num results | 3 | Browser FitNesse Server public int numResults { ... } System under Test Fixtures „calling java methods from wiki“, compare return values Integrates with REST, Jenkins…
  • 11. 11 FitNesse Exemplary Test Source !path /home/inovex/lib/*.jar | script | Hadoop | | upload | viewLog.csv | to hdfs | /testdata/ | | hadoop job from jar | viewLog.jar | [...] | | show | job output | | check | number of output files | 3 |
  • 12. 12 FitNesse Hadoop Fixture Java Code public class Hadoop { public boolean uploadToHdfs(String localFile, String remoteFile) {...} public boolean hadoopJobFromJar(String jar, String input, String output) {...} public String jobOutput() {...} public String numberOfOutputFiles() {...} }
  • 13. 13 The road… … to Big Data QA Big Data QA is different! Fitnesse Wiki test execution! test data definition / selection job & workflow control result inspection
  • 15. ‣ Big Data: Efficient data transfer among heterogeneous sources ‣ Define Interface via IDL, Compiler for many languages 15 Test Data Thrift
  • 16. ‣ Dev/Test Hadoop Cluster: Identical Hardware like Prod, but fewer nodes ‣ (random/biased) sampling e.g. on daily basis ‣ Feedback loop: ‣ identify „special cases“ from real data ‣ include them in (manual) data definition ‣ Gradually increase test coverage / artefact quality 16 Test Data Real World Data
  • 17. 17 The road… … to Big Data QA Big Data QA is different! FitNesse Wiki test execution! Define CSV / thrift / real- world test data! job & workflow control result inspection
  • 18. ‣ Execute arbitrary (shell) commands ‣ Mainly a wrapper around apache.commons.exec.CommandLine 18 Job Control Swiss Army Knife: Shell
  • 19. ‣ Hide complexity from test authors ‣ „define“ appropriate test language via (Java) method names ‣ re-use other fixtures (Shell, …) internally 19 Job Control Hadoop Fixture
  • 20. ‣ FitNesse allows to group tests into suites ‣ Can be used to simulate MR processing chains ‣ SetupSuite / TearDownSuite for creating / destroying test conditions ‣ Tests can still be executed individually 20 Job Control Workflows & Suites MR job 1 MR job 2
  • 21. 21 The road… … to Big Data QA Big Data QA is different! FitNesse Wiki test execution! Define CSV / thrift / real- world data! Use suites & fixtures for jobs/workflows! result inspection
  • 22. ‣ Validate RDBMS contents (via JDBC) ‣ E.g. for checking the final result ‣ Or use Hive + Hive-Server to query raw data 22 Results Data Warehouse / Hive
  • 23. ‣ Execute arbitrary pig commands from Wiki page ‣ Inspect e.g. binary intermediate results (avro, …) 23 Results Pig
  • 24. public class PigConsole extends PigServer { public void loadAvroFileUsingAlias(String filename, String alias) { this.registerQuery( alias + "= LOAD" + filename + "USING" + AVRO_STORAGE_LOADER + ";"); } } 24 Results Pig Fixture extends PigServer
  • 25. 25 Results Server Infrastructure Fitnesse Master TestEnvironments ProjA ProjB TestConfigurations ProjA ProjB dev qs live dev qs live Import / edit tests remotely QS ProjA Slave Dev ProjA Slave Live ProjA SlaveProjA QS ProjA Slave Dev ProjA Slave Live ProjA Slave Import / edit config remotely dev qs live
  • 26. 26 Thank you! dominik.benz@inovex.de Big Data QA is different! FitNesse Wiki test execution! Define CSV / thrift / real- world data! Inspect results via Pig/Hive Use suites & fixtures for jobs/workflows!
  • 27. 27 Want more? Inovex trains you! Android Developer Training (3 days, Karlsruhe/München) Hadoop Developer Training (3 days, Karlsruhe/Köln) Certified Scrum Developer Training (5 days, Köln) Pentaho Data Integration Training (4 days, München/Köln) Liferay Portal-Admin Training (3 days, Karlsruhe) Liferay Portal-Developer Training (4 days, Karlsruhe) information and registration at www.inovex.de/offene-trainings
  • 28. 28 Inovex @bbuzz Stefan Kathrin Bernhard Jörg Andrew Christian Christian