Real Time-Big Data-Social Network-Data Science
1. Real Time-Big Data-Social
Network-Data Science-Gamified!
Jason Capehart
a.k.a. The Cascade Project 12/12/12
(Okay … that last part of the title isn’t true)
9. Surely, You Must Be Joking.
Store       Examples
Key-Value   Hadoop, Memcached, Redis
Document    MongoDB, CouchDB
Graph       Neo4j, Giraph, Titan
Real Time   Storm, Impala
10.
11. Citation:
Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News
Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600).
Raleigh, NC: ACM.
12. Citation:
A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM
Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
13. 800,000,000
(that’s a lot of users)
(cost = 200k for fire hose)
14. Sampled
Not Sampled
Citation:
Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free:
Sampling properties of networks. Proceedings of the National Academy of Sciences, 102(12), 4221-4224.
15.
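16. # Pseudo Code
from random import randint
while True:  # repeat until tired or rate limited
    id_guess = randint(0, 10**9)  # 10**9: in Python, 10^9 is XOR, not a power
    user = api.get_user(id=id_guess)  # api is assumed to be an authenticated Twitter client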
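The guessing loop above can be tried offline. What follows is a minimal sketch, not the code used for the talk: `FakeApi`, its `rate_limit` argument, and `sample_users` are hypothetical names made up for illustration, and the stub only imitates a client that rejects unassigned IDs and eventually rate-limits the caller.

```python
import random


class FakeApi:
    """Hypothetical stand-in for a rate-limited user-lookup client."""

    def __init__(self, valid_ids, rate_limit):
        self.valid_ids = set(valid_ids)
        self.calls_left = rate_limit

    def get_user(self, id):
        # Every call counts against the limit, including failed lookups.
        if self.calls_left <= 0:
            raise RuntimeError("rate limited")
        self.calls_left -= 1
        if id not in self.valid_ids:
            raise LookupError("no such user")
        return {"id": id}


def sample_users(api, max_id, rng):
    """Guess IDs uniformly at random until the client reports a rate limit."""
    users = []
    while True:
        id_guess = rng.randint(0, max_id)
        try:
            users.append(api.get_user(id=id_guess))
        except LookupError:
            continue  # unassigned ID: discard the guess and try again
        except RuntimeError:
            return users  # rate limited: stop and keep what was collected


rng = random.Random(0)  # fixed seed so the run is repeatable
api = FakeApi(valid_ids=range(0, 1000, 3), rate_limit=200)
users = sample_users(api, max_id=999, rng=rng)
print(len(users), "users collected before the rate limit")
```

Because every ID is equally likely to be guessed, each existing account has the same chance of being drawn, which is the point of guessing IDs rather than crawling follower links. The sketch samples with replacement, so the same user can appear more than once.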
17.
18. Power Law (xmin = 281, α = 2.19)
Lognormal
Discrete Power Law vs. Lognormal
Loglikelihood Ratio: 89.46
Vuong’s Test Statistic: 7.14
p-val (1-sided): >0.99
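For readers who have not seen the comparison behind these numbers, here is a rough sketch of a likelihood ratio test with Vuong's normalization, in the style described by Clauset, Shalizi, and Newman. It is not the analysis from the talk. To stay self-contained it compares a continuous power law against a shifted exponential on synthetic data, rather than a discrete power law against a lognormal, and every function name is made up for illustration. The slide does not say which tail convention produced its one-sided p-value, so reporting the normal CDF of the statistic is an assumption.

```python
import math
import random


def fit_power_law(xs, xmin):
    """MLE for a continuous power law on x >= xmin; returns (alpha, per-point log-likelihoods)."""
    n = len(xs)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in xs)
    ll = [math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin) for x in xs]
    return alpha, ll


def fit_shifted_exponential(xs, xmin):
    """MLE for an exponential shifted to start at xmin; returns (rate, per-point log-likelihoods)."""
    rate = 1.0 / (sum(xs) / len(xs) - xmin)
    ll = [math.log(rate) - rate * (x - xmin) for x in xs]
    return rate, ll


def vuong(ll_a, ll_b):
    """Log-likelihood ratio R of model A over model B, normalized statistic v, and p-values."""
    n = len(ll_a)
    diffs = [a - b for a, b in zip(ll_a, ll_b)]
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    v = R / (sigma * math.sqrt(n))
    p_two_sided = math.erfc(abs(v) / math.sqrt(2.0))
    normal_cdf_of_v = 0.5 * math.erfc(-v / math.sqrt(2.0))  # assumed one-sided convention
    return R, v, p_two_sided, normal_cdf_of_v


# Synthetic tail data drawn from a power law by inverse-transform sampling.
rng = random.Random(0)
xmin, alpha_true = 281.0, 2.19
xs = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0)) for _ in range(2000)]

alpha_hat, ll_pl = fit_power_law(xs, xmin)
_, ll_exp = fit_shifted_exponential(xs, xmin)
R, v, p_two_sided, normal_cdf_of_v = vuong(ll_pl, ll_exp)
print(f"alpha_hat={alpha_hat:.2f}  R={R:.1f}  v={v:.2f}  p(2-sided)={p_two_sided:.3g}")
```

A positive R favors the first model (here the power law), and the two-sided p-value says whether the sign of R can be trusted. A small p-value means the sign is unlikely to be a chance fluctuation.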
19.
20. Power Law (xmin = 222, α = 2.33)
Lognormal
Stretched Exponential
21. • Conclusions = None!
– All work is in progress
• Discussion
– Cascade uses open source
– Opportunities to give back?
22. References
1. A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703
(2009). (arXiv:0706.1062, doi:10.1137/070710111)
– Code: http://tuvalu.santafe.edu/~aaronc/powerlaws/
2. Newman, M. (2005, September-October). Power laws, Pareto distributions and Zipf's law. Contemporary Physics,
46(5), 323-351.
3. Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings
of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
4. Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of
networks. Proceedings of the National Academy of Sciences, 102(12), 4221-4224.