Big Data - Hadoop and MapReduce - new age tools for aid to testing and QA
Big Data, with its slew of technologies and terms, has been one of the most talked-about areas of the last couple of years. It has evolved into Big Data science and analytics, and now extends to IoT and automation. Testers and QA teams need not only to get used to this new area of digital transformation but also to embrace the technology to their own advantage. We experimented with and successfully used two Big Data technologies, Hadoop and MapReduce, in a recent testing engagement. The application under test was implemented using classic technologies such as CentOS and C++. The testing team used Hadoop and MapReduce to achieve a quick turnaround for the testing.
2. Aditya Garg @Adigindia
Co-Founder and Director QAAgility.com
Co-founder & Steering Committee Member of Agile Testing Alliance – run meetup groups across multiple cities
Co-creator and licensed trainer of Agile Testing Alliance’s certifications CP-BAT, CP-MAT, CP-AAT, CP-SAT
Co-Author of a book on Selenium
Love Cooking Indian Dishes
Tasting (Testing) World food
Travelling and meeting testers
(Get inspired and maybe inspire a few)
@adigindia
https://www.linkedin.com/in/adigarg
3. Topic for the presentation
Big Data - Hadoop and MapReduce - new age tools for aid to testing and QA
5. What are we going to discuss?
1. How to test Big Data applications?
2. How can QA and testing teams use Big Data tools for their testing needs?
9. Which search engine do you use?
How much data does Google store?
https://www.cirrusinsight.com/blog/how-much-data-does-google-store
http://searchstorage.techtarget.com/definition/Kilo-mega-giga-tera-peta-and-all-that
14. Definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Ref: goo.gl/iWZhjJ
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#379879e621a9
15. Big Data Applications
1. Finance
2. Insurance
3. Health Care
4. Agriculture
5. Defense
6. Manufacturing
7. Aerospace
8. Oil and Gas
9. Advertisement and Marketing
10. Election Campaigns
11. The list goes on: Big Data is applicable across industries
16. Let's go back to the definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
19. Hadoop – Key components: HDFS and MapReduce (MR)
*Source Udacity
20. Hadoop Ecosystem
1. Sqoop takes data from a regular RDBMS and puts it into HDFS
2. Flume ingests data into HDFS as it is generated by external systems
3. HBase is a real-time database on top of HDFS
4. Hue is a graphical front end to the cluster
5. Oozie is a workflow management tool
6. Mahout is a machine learning library
*Source Udacity
21. HDFS
• HDFS stands for Hadoop Distributed File System, which is the storage system used by Hadoop. The following is a high-level architecture that explains how HDFS works.
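The original architecture diagram is not reproduced here. As a rough stand-in, the sketch below illustrates the core idea in Python: a file is cut into fixed-size blocks, and each block is stored on several DataNodes while a NameNode-style map records where the replicas live. The 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2.x and later; the function name `plan_blocks`, the node names, and the round-robin placement are assumptions made for illustration (real HDFS uses a rack-aware placement policy).

```python
import math

BLOCK_SIZE_MB = 128   # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3       # HDFS default replication factor


def plan_blocks(file_size_mb, datanodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a NameNode-style map: block index -> DataNodes holding a replica.

    Hypothetical helper for illustration only; not part of any Hadoop API.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(n_blocks):
        # Simplified round-robin placement; real HDFS is rack-aware.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block, replicas in plan_blocks(500, nodes).items():
        print(f"block {block}: {replicas}")
```

Because every block lives on several nodes, losing one DataNode does not lose data, and different blocks of the same file can be read or processed in parallel.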
27. MTBT – Multicast Tick by Tick Adaptor
Input was the exchange feed; output was given to the HFT engine
Exchange TAP: co-location servers listen to it at high speed
MTBT Adaptor:
• Legacy adaptor (3rd party): connects to the TAP and converts the feed to a format that can be used by HFT platforms (algorithmic trading platforms)
• New adaptor: being built in-house to increase the speed by 10 times
HFT Engine
28. MTBT – Multicast Tick by Tick Adaptor
• The client was building a brand-new MTBT exchange adaptor
• The adaptor was being developed in C on Unix and was to run in co-location with NSE (National Stock Exchange)
• The new adaptor was supposed to increase the overall speed by more than 10 times over the existing adaptor
• The goal was to test the new adaptor
29. MTBT - Adaptor Challenges
(Diagram labels: Input, Output, Output over time)
1. Manual testing was next to impossible
2. Even samples of a few seconds ran into files of many megabytes (MB)
3. Comparing the legacy records with the records processed by the new code was impossible manually
4. Daily processed data ran into files of 150 gigabytes (GB) or more
30. MTBT – SOLUTION
1. Reduce the legacy MTBT output file into a standard format
2. Reduce the new in-house MTBT output file into a standard format
3. Compare the two files
4. Generate a report
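The four steps above can be sketched as a small map, shuffle, and reduce pipeline. This is a minimal in-memory illustration in Python, not the team's actual job: the record layouts (`seq|symbol|price|qty` for the legacy output, `symbol,seq,price,qty` for the new one), the use of a sequence number as the join key, and all function names are hypothetical. On a real cluster the same mapper and reducer logic would run as a Hadoop job over files in HDFS.

```python
from collections import defaultdict


def map_legacy(line):
    # Hypothetical legacy layout: seq|symbol|price|qty
    seq, symbol, price, qty = line.strip().split("|")
    return int(seq), ("legacy", (symbol, float(price), int(qty)))


def map_new(line):
    # Hypothetical new-adaptor layout: symbol,seq,price,qty
    symbol, seq, price, qty = line.strip().split(",")
    return int(seq), ("new", (symbol, float(price), int(qty)))


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework would."""
    grouped = defaultdict(dict)
    for key, (source, record) in pairs:
        grouped[key][source] = record
    return grouped


def reduce_compare(by_source):
    """Classify one key: match, mismatch, or missing on one side."""
    legacy, new = by_source.get("legacy"), by_source.get("new")
    if legacy is None:
        return "missing_in_legacy"
    if new is None:
        return "missing_in_new"
    return "match" if legacy == new else "mismatch"


def compare(legacy_lines, new_lines):
    """Steps 1-4: normalise both outputs, compare them, build a report."""
    pairs = [map_legacy(l) for l in legacy_lines]
    pairs += [map_new(l) for l in new_lines]
    report = defaultdict(list)
    for key, by_source in sorted(shuffle(pairs).items()):
        report[reduce_compare(by_source)].append(key)
    return dict(report)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    legacy = ["1|ABC|100.50|10", "2|XYZ|20.00|5", "3|ABC|100.55|7"]
    new = ["ABC,1,100.5,10", "XYZ,2,20.25,5", "ABC,4,100.60,1"]
    for status, keys in compare(legacy, new).items():
        print(status, keys)
```

Converting both outputs to the same (key, record) shape in the map step is what makes the comparison cheap: the reducer sees only the records that share a key, so the work splits naturally across many nodes instead of requiring one machine to hold both multi-gigabyte files.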