3. Even in simple datasets, common statistics
fail (avg, min, max, distribution)
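A minimal sketch of the point on this slide, using two made-up series (the numbers are invented for illustration, not taken from the talk). Both series have the same average, minimum, maximum and value distribution, yet they behave very differently over time. The helper names `summary` and `jumps` are assumptions of this sketch.

```python
from collections import Counter
from statistics import mean

# Two hypothetical series with identical values in a different order.
oscillating = [0, 10, 0, 10, 0, 10]
step_change = [0, 0, 0, 10, 10, 10]


def summary(values):
    """Return (avg, min, max, value distribution) for a sequence."""
    return mean(values), min(values), max(values), Counter(values)


def jumps(values):
    """Count positions where two consecutive values differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


# The common statistics cannot tell the two series apart ...
same_summary = summary(oscillating) == summary(step_change)

# ... although one flips constantly and the other changes only once.
jump_counts = (jumps(oscillating), jumps(step_change))
```

Here the order of the values carries the information, which none of the four summary statistics looks at.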
Thursday, 24 May 12
4. One iPhone has 79 times more CPU power
than was used in the Apollo missions
5. Why you need big data
                                   Yield
2010s    Systems Thinking          Wisdom         (You Are Here!)
2000s    Knowledge Ecology         Intelligence
1990s    Knowledge Management      Knowledge
1980s
         Information Management    Information
1970s
1960s
1950s    Data Processing           Data
7. You are not looking for patterns,
you are looking for anomalies
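One way to look for anomalies rather than patterns is to flag values that sit far from the bulk of the data. The sketch below uses the modified z-score based on the median absolute deviation; this technique, the threshold of 3.5, the function name `find_anomalies` and the sample readings are all assumptions of this example and are not from the talk.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: anything different from the median stands out.
        return [v for v in values if v != med]
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 55, 11]
anomalies = find_anomalies(readings)
```

The median is used instead of the mean because a single extreme value would otherwise shift the baseline it is measured against.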
8. Cloud Computing 1.0
Is
When the IT guys are finally
able to explain to business
people what they were
talking about 20 years ago!
15. BASE
(Basically Available, Soft State, Eventual consistency)
not
ACID
(Atomicity, Consistency, Isolation, Durability)
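To make the BASE idea concrete, here is a toy sketch of eventual consistency with two in-memory replicas. Each replica accepts writes on its own (basically available), their contents may differ for a while (soft state), and a later exchange brings them into agreement (eventual consistency). The `Replica` class, the `sync` function and the last-write-wins rule are assumptions of this sketch, not the design of any particular database.

```python
class Replica:
    """A toy key-value replica that keeps the newest write per key."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        # Last write wins: only accept strictly newer timestamps.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange all entries in both directions so the replicas converge."""
    for key, (timestamp, value) in list(a.store.items()):
        b.write(key, value, timestamp)
    for key, (timestamp, value) in list(b.store.items()):
        a.write(key, value, timestamp)


replica_a, replica_b = Replica(), Replica()
replica_a.write("cart", ["book"], timestamp=1)
replica_b.write("cart", ["book", "pen"], timestamp=2)

stale_read = replica_a.read("cart")   # replicas disagree before the sync
sync(replica_a, replica_b)
fresh_read = replica_a.read("cart")   # both replicas now hold the newer write
```

An ACID system would instead block or reject one of the writes until both copies agree.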
16. How to scale
(AWS Example)
• Do not allocate instances manually
• Each component needs to be independent
• Plan for failure
• Actively provoke failure
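The last two points can be sketched without any cloud API. The example below is a plain in-memory simulation and makes no AWS calls; `Instance`, `call_with_failover` and `provoke_failure` are hypothetical names chosen for illustration. It shows a caller that plans for failure by trying the next independent instance, and a test helper that deliberately takes one instance down.

```python
import random


class Instance:
    """A stand-in for one independent, replaceable component."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def call_with_failover(instances, request):
    """Plan for failure: try each instance until one answers."""
    for instance in instances:
        try:
            return instance.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("all instances failed")


def provoke_failure(instances, rng):
    """Actively provoke failure: take down one randomly chosen instance."""
    victim = rng.choice(instances)
    victim.alive = False
    return victim


pool = [Instance("web-1"), Instance("web-2"), Instance("web-3")]
victim = provoke_failure(pool, random.Random(0))
reply = call_with_failover(pool, "ping")  # still answered by a survivor
```

Running the failure injection regularly is what shows whether the failover path actually works.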
17. Human Software
• Click workers and Mechanical Turks are not just
cheap labour
• They allow programmers to hand tasks to humans
that cannot be handled algorithmically
• Make use of them to
• Do things too complicated for machine learning
• Pre-populate machine learning spaces
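A rough sketch of how the two uses above might fit together: the program asks a machine classifier first, hands the item to a human when the machine is unsure, and keeps the human answer as training data. Everything here is hypothetical, including the function names, the confidence threshold and the toy stand-ins; a real setup would call a crowd-working service where `toy_ask_human` is used.

```python
def label_item(item, machine_label, ask_human, training_data,
               min_confidence=0.8):
    """Label an item by machine if confident, otherwise by a human."""
    label, confidence = machine_label(item)
    if confidence >= min_confidence:
        return label, "machine"
    # Too hard for the algorithm: hand the task to a human worker.
    human_label = ask_human(item)
    # Keep the answer so it can pre-populate future training data.
    training_data.append((item, human_label))
    return human_label, "human"


def toy_machine_label(text):
    """Stand-in classifier: only confident about one keyword."""
    if "refund" in text:
        return "billing", 0.95
    return "unknown", 0.10


def toy_ask_human(text):
    """Stand-in for a human worker's answer."""
    return "complaint"


training_data = []
easy = label_item("please refund my order",
                  toy_machine_label, toy_ask_human, training_data)
hard = label_item("this is the worst, thanks a lot",
                  toy_machine_label, toy_ask_human, training_data)
```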
18. Old Style (Imperative)
Programming
• Step-by-step explanation of
what to do
• Explaining WHAT to do
rather than the RESULTS
you want
• Always necessary
for basic algorithms
19. One New Style (Functional)
Programming I
• Combine results to
become a program
• Allows dynamic
distribution
• Map-Reduce is only one
way of doing it!
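As an illustration of combining results into a program, here is a small word count in the map-reduce style using only the Python standard library. The chunk texts and the function names `count_words` and `merge` are made up for this sketch. Because each chunk is counted independently, the map step is the part that could be handed out to different machines.

```python
from collections import Counter
from functools import reduce


def count_words(chunk):
    """Map step: count the words of one chunk on its own."""
    return Counter(chunk.lower().split())


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Hypothetical input, already split into independent chunks.
chunks = ["big data big", "data is big"]

total = reduce(merge, map(count_words, chunks), Counter())
```

The program is just the combination `reduce(merge, map(count_words, ...))`; there is no loop that spells out the order of the steps.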
20. Functional
Programming II
F(G(H(A, B), C), D)
getMusicLikes(getFriends(facebookID))
instead of
for i in getFriends(facebookID):
    getMusicLikes(i)
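The contrast can be tried out with stub versions of the two functions. The sketch below reuses the names from the slide, but the bodies and the sample data are invented; there is no real Facebook API call here, and it assumes `getMusicLikes` accepts a whole collection of user IDs.

```python
# Hypothetical sample data standing in for a social network.
FRIENDS = {"alice": ["bob", "carol"]}
LIKES = {"bob": ["jazz"], "carol": ["rock", "jazz"]}


def getFriends(facebookID):
    """Stub: return the friend IDs of a user."""
    return FRIENDS.get(facebookID, [])


def getMusicLikes(userIDs):
    """Stub: return the music likes for a collection of users."""
    return {uid: LIKES.get(uid, []) for uid in userIDs}


# Functional style: the result of one function feeds the next.
composed = getMusicLikes(getFriends("alice"))

# Imperative style: the caller spells out the iteration itself.
looped = {}
for i in getFriends("alice"):
    looped[i] = LIKES.get(i, [])
```

Both variants produce the same result, but in the composed form the callee decides how to process the collection, which leaves room for doing it in parallel.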
21. Check out my tool list:
http://www.hcboos.net/100-links/
26. Credits
• "Big Data Just Beginning to Explode" by CSC
http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
• "Social media network connections among Twitter users" by Marc Smith
http://www.flickr.com/photos/marc_smith/
• Asteroid Datasets by Bruce Gary
http://brucegary.net/POVENMIRE/x.htm