Big Data Little Tests, John Heintz
- 1. Big Data, Little Tests
John Heintz
Founder, Gist Labs
Technical Consultant, Cutter Consortium
john@gistlabs.com @jheintz
http://gistlabs.com
- 2. About John Heintz
• Developer since 1995
• Agilist since 1999
• Founded Gist Labs in 2008
• Developer, Mentor, Consultant
• Intuitive, Abstract, Precise
Kool-Aids I’ve drunk:
Agile/Lean/Kanban, OO, TDD, REST, Mentoring, Craftsmanship,
Emergent/Progressive Design, InnovationGames®, Systems and
Complexity Theory
© 2012 Gist Labs, LLC
- 3. My Goals for You
• Demystify test automation for Big Data
• Provide executable examples
- 4. What you shouldn’t expect…
• More than a brief introduction to Big Data concepts
• Performance tuning
- 5. Simple Code, Config
• I went as simple and clear as possible
• Java, JUnit4
• Maven… okay, maybe not simple :-)
- 6. Mostly Code
• Remember the Law of Two Feet
• If code isn’t what you were looking for, I
totally respect you finding something better
for your time :-)
- 7. • Everything available from
http://gistlabs.com/2012/08/big-data-little-tests/
• The entire command script is there,
so you can take notes knowing it is available
- 8. My Soapboxes…
These are topics I’ll repeat myself on
• Fast test execution
• One-click build
- 9. Big Data
• Too much
• Too fast
• Not trivially structured
- 10. Map Reduce
• Map from one input to one output
• Reduce from many inputs to one output
• Can be run in parallel
• Crude, but massive
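The map and reduce steps above can be sketched in plain Java without a cluster. This is only an illustration, not the Hadoop API: the class `MaxTemperature`, the `city,temperature` record format, and the method names are all assumptions made for this example. Each map call and each per-key reduce call is independent of the others, which is what lets a real framework run them in parallel; the driver here is deliberately sequential to keep the sketch small.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; not part of Hadoop or of the talk's code.
public class MaxTemperature {

    // Map: one input record ("city,temperature") becomes one key/value pair.
    static Map.Entry<String, Integer> map(String record) {
        String[] fields = record.split(",");
        return new SimpleEntry<>(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Reduce: many values for one key become one output value.
    static int reduce(String key, List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int value : values) {
            max = Math.max(max, value);
        }
        return max;
    }

    // Driver: map every record, group values by key, then reduce each group.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, Integer> pair = map(record);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("Austin,35", "Boston,28", "Austin,39", "Boston,31");
        System.out.println(run(records)); // prints {Austin=39, Boston=31}
    }
}
```

Because `map` and `reduce` are ordinary static methods, each can be exercised directly by a fast unit test with a handful of records.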
- 11. CAP Theorem
• Consistency
• Availability
• Partition Tolerance
- 12. Big Data Ecosystem
• Hadoop: A giant among giants
(Tons of projects on this platform!!)
• Cassandra: Feels like a weird RDBMS
• Riak: An elegant key/value/search store
• MongoDB: Document store
- 16. Other Frameworks
• CassandraUnit
https://github.com/jsevellec/cassandra-unit
• PigUnit, for testing Pig scripts (a Hadoop query language)
http://pig.apache.org/docs/r0.8.1/pigunit.html
- 17. Code Questions?
• Fast test execution?
• One-click build?
- 18. What about Big Tests?
• Real test data
• Realistic cluster
- 19. Real Test Data
My favorite strategy is to:
• Develop with small, crafted data
• Build/test the same way
• Run another test on top of real prod data
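One way this strategy might look in code, as a sketch under stated assumptions: the job `countByStatus`, the log-line format, and the system property `realdata.path` are all hypothetical names invented for this example. The crafted data gets exact assertions because the expected values are known; the real data, whose exact results are unknown, is checked only against an invariant that should hold for any input. In a JUnit4 setup like the one the talk uses, the two parts would more naturally be two separate test methods.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; names and data format are assumptions.
public class StatusCountCheck {

    // Job under test: count lines by their first whitespace-separated token.
    static Map<String, Integer> countByStatus(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String status = line.trim().split("\\s+")[0];
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    // Invariant valid for any input size: every line is counted exactly once.
    static boolean countsAddUp(List<String> lines, Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total == lines.size();
    }

    public static void main(String[] args) throws IOException {
        // 1. Small, crafted data: exact expected values are known.
        List<String> crafted = Arrays.asList("200 /index", "404 /missing", "200 /about");
        Map<String, Integer> counts = countByStatus(crafted);
        if (counts.get("200") != 2 || counts.get("404") != 1) {
            throw new AssertionError("crafted data produced " + counts);
        }

        // 2. Real data: runs only when a file path is supplied, for example
        //    java -Drealdata.path=/some/file StatusCountCheck
        String realPath = System.getProperty("realdata.path");
        if (realPath != null) {
            List<String> lines = Files.readAllLines(Paths.get(realPath));
            if (!countsAddUp(lines, countByStatus(lines))) {
                throw new AssertionError("counts do not add up for " + realPath);
            }
        }
        System.out.println("ok");
    }
}
```

Keeping the real-data run behind an opt-in property means the default build stays fast, while the same code path can still be pointed at production-sized input.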
- 20. [Diagram of a CI/DevOps environment. Labels: Developers, Version Control, Developer Sandboxes, Continuous Integration Servers, Continuous Deployment Servers, Build, Test1, Test2, Staging, Production, Cluster (three instances), Virtual vs Physical Servers, Private vs Public Cloud, Network Infrastructure, Storage Infrastructure, Self-service Provisioning]
- 21. Realistic Cluster
• Use a CI/DevOps environment
• Virtualize, “X as a Service”
• Virtual Machines
• Virtual Infrastructure (Network, Storage)
- 22. Jenkins CI Server
• Master/slave clusters
• Plugins for Hadoop and VMWare
• http://jenkins-ci.org/
- 24. Thank you!
• Everything available from:
http://gistlabs.com/2012/08/big-data-little-tests/
• John Heintz, @jheintz, http://gistlabs.com