Testing Big Data
Prepared by: Anca Andreea Sfecla, Quality Assurance Manager
Embarcadero Technologies Romania
@ CODECAMP 2013,
20th April 2013
Prepared by Anca Sfecla, QAM - Embarcadero Technologies
What is Big Data?
• “Big Data is the frontier of a firm’s ability to store,
process, and access all the data it needs to
operate effectively, make decisions, reduce risks,
and serve customers.” - Forrester Research
• “Big data creates a new layer in the economy
which is all about information, turning
information, or data, into revenue. In 2013, big
data is forecast to drive $34 billion of IT spending”
– Gartner Research
Big Data Characteristics
• Volume
• Variety
• Velocity
• Value
Big Data Success Stories
• Detecting infections in premature infants up
to 24 hours before they exhibit symptoms
• Reducing the cost of sequencing a genome
from $10,000 to less than $100
• Predicting flu outbreaks by analyzing a massive
number of Google searches related to flu
symptoms
EDW versus Big Data
EDW                                | Big Data
Clean data                         | Unclean data
Gigabytes to terabytes (1000 GB)   | Petabytes (1000 TB) to exabytes (1000 PB)
Simplified, structured             | Complex, semi-structured or unstructured
Data from relational database      | Data from non-relational flat file storage
Centralized data                   | Distributed data
Structured database schema         | Customized, instantly generated schema
Big Data Solutions
Microsoft Big Data Solution
Big Data Processing using Hadoop Framework
Big Data Architecture
[Diagram] Data sources: web logs, streaming data, social data, transactional data (RDBMS).
Load: data load using Sqoop.
Hadoop: HDFS (Hadoop Distributed File System), HBase (NoSQL DB), MapReduce (job execution), Hive, Pig.
Downstream: processed data, ETL process, Enterprise Data Warehouse, Big Data Analytics.
1. Pre-Hadoop Processing
Possible problems
• incorrect data captured from source systems
• incorrect storage of data
• incomplete or incorrect replications
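One way to catch incomplete or incorrect loads is to reconcile what was captured at the source with what landed in storage. The sketch below is only an illustration: the record layout, the `record_fingerprint` and `reconcile` helpers, and the sample rows are all hypothetical, and a real check would read from the actual source system and from HDFS rather than from in-memory lists.

```python
import hashlib

def record_fingerprint(record):
    """Return a stable hash of one record (a tuple of field values)."""
    joined = "\x1f".join(str(field) for field in record)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def reconcile(source_records, loaded_records):
    """Compare two record collections by count and by content.

    Note: identical duplicate records collapse to one fingerprint here,
    so duplicates are only visible through the count fields.
    """
    source = {record_fingerprint(r): r for r in source_records}
    loaded = {record_fingerprint(r): r for r in loaded_records}
    return {
        "source_count": len(source_records),
        "loaded_count": len(loaded_records),
        "missing": [source[k] for k in source.keys() - loaded.keys()],
        "unexpected": [loaded[k] for k in loaded.keys() - source.keys()],
    }

# Hypothetical sample data: one record was altered and one was never loaded.
source_rows = [(1, "alice", 10.0), (2, "bob", 20.5), (3, "carol", 7.25)]
loaded_rows = [(1, "alice", 10.0), (2, "bob", 25.0)]
report = reconcile(source_rows, loaded_rows)
```

Here `report["missing"]` lists source records with no exact match after loading, and `report["unexpected"]` lists loaded records that do not correspond to any source record.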
2. Map-Reduce process validation
Possible problems
• coding issues in map-reduce jobs
• jobs that work correctly on a standalone node but
incorrectly when run on multiple nodes
• incorrect aggregations, incorrect node
configurations, and incorrect output format
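A job whose result depends on how the data is partitioned is a typical source of "works on one node, fails on many" defects. The toy word count below is a plain Python simulation, not Hadoop code: `map_phase`, `run_job`, and the byte-sum partitioner are assumptions made for illustration. It shows the kind of check a tester might automate, namely that the output is identical for one partition and for several.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def run_job(lines, num_partitions):
    """Run a toy map-reduce word count with the given number of partitions."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for line in lines:
        for key, value in map_phase(line):
            # Deterministic partitioner: every value of a key lands in one partition.
            index = sum(key.encode("utf-8")) % num_partitions
            partitions[index][key].append(value)
    result = {}
    for partition in partitions:
        # Reduce step: sum the values collected for each key.
        for key, values in partition.items():
            result[key] = sum(values)
    return result

lines = ["big data is big", "testing big data", "data is complex"]
single_node = run_job(lines, 1)
multi_node = run_job(lines, 4)
```

If `single_node` and `multi_node` ever differ, the aggregation logic depends on partitioning and would likely misbehave on a real cluster.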
3. Data Extract and Load Process
Possible problems
• incorrectly applied transformation
rules
• incomplete data extract from HDFS
• incorrect load of HDFS files into
analysis tools
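Transformation and completeness problems can be checked by applying the expected rule independently and comparing the result with what reached the target. This is a minimal sketch under stated assumptions: the rule in `expected_transform` (cents to currency units, upper-cased country code), the field names, and the sample rows are all hypothetical.

```python
def expected_transform(row):
    """Hypothetical rule: cents -> currency units, country code upper-cased."""
    return {
        "id": row["id"],
        "amount": round(row["amount_cents"] / 100, 2),
        "country": row["country"].strip().upper(),
    }

def validate_load(hdfs_rows, target_rows):
    """Return (id, problem) pairs for rows that are missing or wrongly transformed."""
    target_by_id = {row["id"]: row for row in target_rows}
    issues = []
    for row in hdfs_rows:
        actual = target_by_id.get(row["id"])
        if actual is None:
            issues.append((row["id"], "missing in target"))
        elif actual != expected_transform(row):
            issues.append((row["id"], "transformation mismatch"))
    return issues

# Hypothetical sample data: row 2 was loaded without conversion, row 3 was dropped.
hdfs_rows = [
    {"id": 1, "amount_cents": 1999, "country": "ro "},
    {"id": 2, "amount_cents": 500, "country": "us"},
    {"id": 3, "amount_cents": 120, "country": "de"},
]
target_rows = [
    {"id": 1, "amount": 19.99, "country": "RO"},
    {"id": 2, "amount": 500, "country": "US"},
]
issues = validate_load(hdfs_rows, target_rows)
```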
Reports testing
Possible problems
• report definitions not set as per requirement
• report data issues
• layout and format issues
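Report data issues can often be found by recomputing the figures from the underlying detail rows. The sketch below assumes a hypothetical report that shows one total per region; `check_report`, the tolerance, and the sample values are illustrative only.

```python
import math

def check_report(detail_rows, report_totals, tolerance=0.01):
    """Recompute per-region totals and compare them with the report figures."""
    recomputed = {}
    for region, amount in detail_rows:
        recomputed[region] = recomputed.get(region, 0.0) + amount
    mismatches = {}
    for region in sorted(set(recomputed) | set(report_totals)):
        expected = recomputed.get(region)
        shown = report_totals.get(region)
        # A region missing on either side, or a differing total, is a mismatch.
        if expected is None or shown is None or not math.isclose(
            expected, shown, abs_tol=tolerance
        ):
            mismatches[region] = (expected, shown)
    return mismatches

# Hypothetical sample data: "south" is wrong and "east" is absent from the report.
detail_rows = [("north", 100.0), ("north", 50.5), ("south", 75.0), ("east", 20.0)]
report_totals = {"north": 150.5, "south": 80.0}
mismatches = check_report(detail_rows, report_totals)
```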
Non-functional testing
Possible problems
• imbalance in input splits
• redundant sorts
• moving most of the aggregation computations to the
Reduce process
• node failures
• data corruption
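Two of the problems above can be quantified with simple metrics. The sketch below is a plain Python illustration with hypothetical numbers: `split_skew` measures how unbalanced the input splits are, and `shuffled_records` estimates how many records travel from map to reduce with and without map-side pre-aggregation (the role a combiner plays in Hadoop MapReduce).

```python
from collections import Counter

def split_skew(split_sizes):
    """Ratio of the largest split to the mean split size; 1.0 is perfectly balanced."""
    mean = sum(split_sizes) / len(split_sizes)
    return max(split_sizes) / mean

def shuffled_records(mapper_outputs, use_combiner):
    """Count the records that would be sent from the map side to the reduce side."""
    total = 0
    for output in mapper_outputs:
        if use_combiner:
            # Pre-aggregation leaves one record per distinct key per mapper.
            total += len(Counter(key for key, _ in output))
        else:
            total += len(output)
    return total

# Hypothetical split sizes in MB: the last split is five times larger than the others.
skew = split_skew([128, 128, 128, 640])

# Hypothetical (key, value) output of two mappers.
mapper_outputs = [
    [("a", 1), ("a", 1), ("b", 1)],
    [("a", 1), ("c", 1), ("c", 1), ("c", 1)],
]
without_combiner = shuffled_records(mapper_outputs, use_combiner=False)
with_combiner = shuffled_records(mapper_outputs, use_combiner=True)
```

A skew well above 1.0 suggests that one task will dominate the job's run time, and a large gap between the two shuffle counts suggests that too much aggregation is being left to the Reduce process.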
New to the tester
• Semi-structured and unstructured data
• Immense volumes of dynamic, complex data
• Test environment
• Big Data ecosystem
• Pure programming tools
• Non-SQL queries
Testing Big Data
• Big
• Fast
• Complex
• Rewarding
Q&A
Thank you!
& Please fill in your evaluation form
anca.sfecla@embarcadero.com

SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Iasi Codecamp, 20 April 2013: Testing Big Data - Anca Sfecla, Embarcadero

  • 1. Testing Big Data. Prepared by: Anca Andreea Sfecla, Quality Assurance Manager, Embarcadero Technologies Romania. @ CODECAMP 2013, 20th April 2013
  • 2. Prepared by Anca Sfecla, QAM - Embarcadero Technologies
  • 3. What is Big Data?
      • “Big Data is the frontier of a firm’s ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” - Forrester Research
      • “Big data creates a new layer in the economy which is all about information, turning information, or data, into revenue. In 2013, big data is forecast to drive $34 billion of IT spending” – Gartner Research
  • 4-8. Big Data Characteristics: Volume, Variety, Velocity, Value
  • 9. Big Data Success Stories
      • Detecting infections in premature infants up to 24 hours before they exhibit symptoms
      • Reducing the cost of sequencing a genome from $10,000 to less than $100
      • Predicting flu outbreaks by analyzing the massive number of Google searches related to flu symptoms
  • 10. EDW versus Big Data
  • 11. EDW versus Big Data
      • EDW: clean data / Big Data: unclean data
      • EDW: gigabytes to terabytes (1000 GB) / Big Data: petabytes (1000 TB) to exabytes (1000 PB)
      • EDW: simplified, structured / Big Data: complex, semi-structured or unstructured
      • EDW: data from relational database / Big Data: data from non-relational flat file storage
      • EDW: centralized data / Big Data: distributed data
      • EDW: structured database schema / Big Data: customized-instant schema, generated
  • 12. Big Data Solutions: Microsoft Big Data Solution
  • 13-14. Big Data Solutions
  • 16. Big Data Processing using Hadoop Framework
  • 17. Big Data Architecture (diagram). Components: Web Logs, Streaming Data, Social Data, Transactional Data (RDBMS); Data Load using Sqoop; HADOOP: Hive, Pig, MapReduce (Job Execution), HBase (NoSQL DB), HDFS (Hadoop Distributed File System); Processed Data; ETL Process; Enterprise Data Warehouse; Big Data Analytics
  • 18. Big Data Architecture (same diagram as slide 17), marked with test stage 1: Pre-Hadoop Processing
  • 19. Possible problems
      • incorrect data captured from source systems
      • incorrect storage of data
      • incomplete or incorrect replications
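One way to catch the problems listed on slide 19 is to compare a fingerprint of the source records with a fingerprint of the records after loading. The following is a minimal sketch, not the method used in the talk; the function names `fingerprint` and `compare_loads` and the sample records are hypothetical, and it assumes both sides can be read as plain text records.

```python
import hashlib


def fingerprint(records):
    """Return (count, checksum) for an iterable of text records.

    The checksum is the sum of per-record SHA-256 digests modulo 2**256,
    so it does not depend on record order (assumption: ordering may change
    during load) but still changes when a record is altered or dropped.
    """
    count = 0
    checksum = 0
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, checksum


def compare_loads(source_records, loaded_records):
    """Compare source data with its loaded copy by count and content."""
    src_count, src_sum = fingerprint(source_records)
    dst_count, dst_sum = fingerprint(loaded_records)
    return {
        "count_match": src_count == dst_count,
        "content_match": src_sum == dst_sum,
    }


# Hypothetical sample data: a complete (reordered) copy, an incomplete
# replication, and a copy with one incorrectly stored value.
source = ["1,alice,10", "2,bob,20", "3,carol,30"]
complete = ["3,carol,30", "1,alice,10", "2,bob,20"]
truncated = ["1,alice,10", "2,bob,20"]
corrupted = ["1,alice,10", "2,bob,99", "3,carol,30"]

print(compare_loads(source, complete))
print(compare_loads(source, truncated))
print(compare_loads(source, corrupted))
```

A count mismatch points at an incomplete replication, while a matching count with a differing checksum points at altered content.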
  • 20. Big Data Architecture (same diagram as slide 17), marked with test stages 1: Pre-Hadoop Processing; 2: Map-Reduce process validation
  • 21. Possible problems
      • coding issues in map-reduce jobs
      • jobs working correctly when run on a standalone node, but working incorrectly when run on multiple nodes
      • incorrect aggregations, incorrect node configurations, and incorrect output format
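The second problem on slide 21 (correct on one node, incorrect on several) can be illustrated without a cluster. The sketch below is a pure-Python simulation, not Hadoop code: `run_job`, `parse`, `mean`, and `is_split_invariant` are hypothetical names, and the combiner step is an assumption about where such bugs typically come from. It runs the same job over 1 to 4 input splits and checks that the result does not change.

```python
from collections import defaultdict


def run_job(records, mapper, combiner, reducer, num_splits):
    """Simulate map -> local combine -> reduce over `num_splits` input splits."""
    splits = [records[i::num_splits] for i in range(num_splits)]
    shuffled = defaultdict(list)
    for split in splits:
        # Each split is mapped and combined on its own, as a separate node would do.
        local = defaultdict(list)
        for record in split:
            for key, value in mapper(record):
                local[key].append(value)
        for key, values in local.items():
            shuffled[key].append(combiner(values))
    return {key: reducer(values) for key, values in shuffled.items()}


def parse(record):
    """Mapper: turn 'region,amount' into a (region, amount) pair."""
    region, amount = record.split(",")
    yield region, float(amount)


def mean(values):
    return sum(values) / len(values)


def is_split_invariant(records, mapper, combiner, reducer, max_splits=4):
    """True if the job gives the single-split result for every split count."""
    baseline = run_job(records, mapper, combiner, reducer, 1)
    return all(
        run_job(records, mapper, combiner, reducer, n) == baseline
        for n in range(2, max_splits + 1)
    )


# Hypothetical sample input.
records = ["north,10", "north,20", "north,60", "south,5", "south,7"]

# Summing partial sums is safe on any number of splits.
print(is_split_invariant(records, parse, sum, sum))
# Averaging partial averages only works when there is a single split.
print(is_split_invariant(records, parse, mean, mean))
```

The mean-of-means job passes a standalone run and fails as soon as the input is split, which is the kind of defect a single-node test would miss.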
  • 22. Big Data Architecture (same diagram as slide 17), marked with test stages 1: Pre-Hadoop Processing; 2: Map-Reduce process validation; 3: Data Extract and Load Process
  • 23. Possible problems
      • incorrectly applied transformation rules
      • incomplete data extract from HDFS
      • incorrect load of HDFS files into analysis tools
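A common way to test for the problems on slide 23 is to re-apply the transformation rule independently in the test and reconcile the outcome with the loaded rows. This is a minimal sketch under stated assumptions: the rule in `expected_row` (trim and upper-case the country code, convert cents to a decimal amount), the field names, and the sample rows are all hypothetical.

```python
def expected_row(source_row):
    """Hypothetical transformation rule, re-implemented independently for the test."""
    return {
        "id": source_row["id"],
        "country": source_row["country"].strip().upper(),
        "amount": source_row["amount_cents"] / 100,
    }


def reconcile(source_rows, target_rows):
    """List ids that are missing from the target or were transformed incorrectly."""
    target_by_id = {row["id"]: row for row in target_rows}
    missing, mismatched = [], []
    for source_row in source_rows:
        expected = expected_row(source_row)
        actual = target_by_id.get(expected["id"])
        if actual is None:
            missing.append(expected["id"])      # incomplete extract or load
        elif actual != expected:
            mismatched.append(expected["id"])   # rule applied incorrectly
    return {"missing": missing, "mismatched": mismatched}


# Hypothetical sample data: row 2 kept a lower-case country code, row 3 was never loaded.
source_rows = [
    {"id": 1, "country": " ro ", "amount_cents": 1000},
    {"id": 2, "country": "us", "amount_cents": 2550},
    {"id": 3, "country": "de", "amount_cents": 700},
]
target_rows = [
    {"id": 1, "country": "RO", "amount": 10.0},
    {"id": 2, "country": "us", "amount": 25.5},
]

print(reconcile(source_rows, target_rows))
```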
  • 24. Big Data Architecture (same diagram as slide 17), marked with test stages 1: Pre-Hadoop Processing; 2: Map-Reduce process validation; 3: Data Extract and Load Process; Reports testing
  • 25. Possible problems
      • report definitions not set as per requirement
      • report data issues
      • layout and format issues
  • 26. Big Data Architecture (same diagram as slide 17), marked with test stages 1: Pre-Hadoop Processing; 2: Map-Reduce process validation; 3: Data Extract and Load Process; Reports testing; Non-Functional Testing
  • 27. Possible problems
      • imbalance in input splits
      • redundant sorts
      • moving most of the aggregation computations to the Reduce process
      • node failures
      • data corruption
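The first item on slide 27, imbalance in input splits, can be quantified with a simple skew ratio. The sketch below is an illustration only: `split_skew` and `imbalanced` are hypothetical helpers, the split sizes are made-up numbers, and the threshold of 1.5 is an arbitrary assumption that would need tuning for a real cluster.

```python
from statistics import mean


def split_skew(split_sizes):
    """Ratio of the largest split to the average split size (1.0 means balanced)."""
    if not split_sizes:
        raise ValueError("no splits given")
    average = mean(split_sizes)
    return max(split_sizes) / average if average else 0.0


def imbalanced(split_sizes, threshold=1.5):
    """Flag a job whose largest split exceeds `threshold` times the average.

    The default threshold is an assumption for illustration, not a Hadoop setting.
    """
    return split_skew(split_sizes) > threshold


# Hypothetical split sizes in MB.
print(imbalanced([128, 128, 128, 128]))
print(imbalanced([10, 10, 10, 482]))
```

When one split is far larger than the rest, the task that processes it tends to dominate the job's run time, so a check like this is a cheap first signal before deeper performance testing.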
  • 28. New to the tester
      • Semi-structured and unstructured data
      • Immense volumes of dynamic, complex data
      • Test environment
      • Big Data ecosystem
      • Pure programming tools
      • Non-SQL queries
  • 29. Testing Big Data
      • Big
      • Fast
      • Complex
      • Rewarding
  • 30. Q&A
  • 31. Thank you! Please fill in your evaluation form. anca.sfecla@embarcadero.com