3. “Machine learning is a discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data”
4. [Diagram: Algorithm, Data Input, Data Output, Runtime] ML & AI in academia, and how it’s commonly taught
5. [Diagram: Algorithm, Data Input, Data Output, Runtime] ML & AI in the real world, or at least where the trends are going
13. [Chart: data volume growing at an exponential rate] Data is often no longer scarce… in fact, we (Rubyists) are responsible for generating a lot of it…
14. [Diagram: many Data Inputs feeding many Runtimes] Mo’ data, mo’ problems? Requires more resources? No better off…?
15. “Mitigating the Paucity-of-Data Problem: Exploring the Effect of Training Corpus Size on Classifier Performance for Natural Language Processing” Michelle Banko, Eric Brill http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.646 “More input data vs. Better Algorithms”
16. “Data-Driven Learning”: "We were able to significantly reduce the error rate, compared to the best system trained on the standard training set size, simply by adding more training data... We see that even out to a billion words the learners continue to benefit from additional training data."
18. 新星歐唐尼爾 保守特立獨行 (Chinese is written without spaces between words.) Wordsegmentationistricky → Word|segmentation|is|tricky. Strategy 1: grammar for dummies. Strategy 2: natural language toolkit (encode a language model). Strategy 3: take a guess! NLP with big data: Google does this better than anyone else…
19. Word segmentation: take a guess! Estimate the probability of every possible segmentation, pick the best performer:
P("W") × P("ordsegmentationistricky")
P("Wo") × P("rdsegmentationistricky")
…
P("Word") × P("segmentationistricky")
argmax P(W) = ?
21. Word segmentation: take a guess! Adding a new language: scrape the web, count the words, done. That’s how Google does it, and does it well…
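A minimal sketch of the take-a-guess strategy in Ruby, assuming a toy COUNTS hash in place of real web-scale word counts (all names and numbers here are illustrative, not how Google does it):

require 'pp'

# Hypothetical unigram counts, as if scraped from a web corpus
COUNTS = { "word" => 500, "segmentation" => 20, "is" => 4000, "tricky" => 30 }
TOTAL  = COUNTS.values.reduce(:+).to_f

def probability(word)
  return COUNTS[word] / TOTAL if COUNTS.key?(word)
  1.0 / (TOTAL * 10 ** word.length)   # heavily penalize long unseen words
end

# Try every split point, score each candidate by the product of its
# word probabilities, and memoize to keep the recursion tractable.
def segment(text, memo = {})
  return [[], 1.0] if text.empty?
  memo[text] ||= (1..text.length).collect { |i|
    head, tail = text[0...i], text[i..-1]
    words, p   = segment(tail, memo)
    [[head] + words, probability(head) * p]
  }.max_by { |_, p| p }                # argmax over all segmentations
end

words, _ = segment("wordsegmentationistricky")
puts words.join("|")                   # => word|segmentation|is|tricky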
22. [Diagram: one Algorithm across many Data Inputs and Runtimes] Of course, smarter algorithms still matter! Don’t get me wrong…
23. “Machine Learning”: if we can identify the significant concepts within a dataset, then we can represent that dataset with fewer bits. Conversely, if we can represent our data with fewer bits (compress it), then we have identified the “significant” concepts! Learning vs. compression: closely correlated concepts.
25. [Scatter plot: axes Color and Feel; red = not tasty, green = tasty; exercise: maximize the margin] Predicting a “tasty fruit” with the perceptron algorithm (y = mx + b) http://bit.ly/bMcwhI
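A rough sketch of the perceptron update rule on made-up fruit data (the features, labels, and the ten-epoch loop are illustrative assumptions, not the data behind the slide):

# Perceptron: learn a separating line w.x + b = 0 from labeled examples.
# Each fruit is [color, feel]; label +1 = tasty, -1 = not tasty.
data = [
  [[0.9, 0.8],  1], [[0.8, 0.9],  1],   # green-ish, firm  => tasty
  [[0.1, 0.2], -1], [[0.2, 0.1], -1]    # red-ish,   mushy => not tasty
]

w = [0.0, 0.0]
b = 0.0

10.times do
  data.each do |x, label|
    prediction = (w[0] * x[0] + w[1] * x[1] + b) >= 0 ? 1 : -1
    next if prediction == label
    # Misclassified: nudge the line toward the example
    w[0] += label * x[0]
    w[1] += label * x[1]
    b    += label
  end
end

p w, b   # the learned separating line: y = mx + b in disguise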
26. [Plot: green = positive, purple = negative; no straight line separates them] Where the perceptron breaks down: we need a better model…
27. [Plot: green = positive, purple = negative, now linearly separable] Idea: y = x². Throw the data into a “higher dimensional” space! Perfect! http://bit.ly/dfG7vD
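A tiny illustration of that idea, with made-up one-dimensional points that no single threshold can separate until they are lifted through x → [x, x²]:

# The negatives sit between the positives, so no 1-D threshold works.
points = { -2.0 => :positive, -0.5 => :negative, 0.5 => :negative, 2.0 => :positive }

# In the lifted 2-D space the horizontal line y = 1 separates the classes.
points.each do |x, label|
  lifted = [x, x * x]
  side   = lifted[1] > 1 ? :positive : :negative
  puts "#{x} -> #{lifted.inspect} classified #{side} (actual: #{label})"
end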
28.
require 'SVM'

# Build a toy two-class training problem, then classify a new point
sp = Problem.new
sp.addExample("spam", [1, 1, 0])
sp.addExample("ham",  [0, 1, 1])

pa = Parameter.new
m  = Model.new(sp, pa)
m.predict [1, 0, 0]

Support Vector Machines: that’s the core insight! Simple as that. http://bit.ly/a2oyMu
30. [Matrix: rows = users Ben, Fred, Tom, James, Bob; columns = items A, B, C, D]
Any M×N matrix (where M >= N) can be decomposed into:
M×M, call it U
M×N, call it S
N×N, call it V
Observation: we can use this decomposition to approximate the original M×N matrix (by fiddling with S and then recomputing U × S × V).
Linear algebra + singular value decomposition: a bit of linear algebra for good measure…
31. SVD in action: the bread and butter of computer vision systems
32.
require 'linalg'

# One row per user, one column per item
m = Linalg::DMatrix[[1, 0, 1, 0],
                    [1, 1, 1, 1],
                    ... ]

# Compute the SVD decomposition
u, s, vt = m.singular_value_decomposition
# ... compute user similarity
# ... make recommendations based on similar users!

gem install linalg to do the heavy lifting… http://bit.ly/9lXuOL
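Continuing the sketch above, one way the “fiddling with S” step might look: keep only the two largest singular values and recompose a low-rank approximation (the rank of 2 is an arbitrary illustrative choice, and the API usage follows the linalg gem conventions already shown):

# Keep the first two columns of U, the top-left 2x2 block of S,
# and the first two rows of Vt, then recompose.
u2 = Linalg::DMatrix.join_columns [u.column(0), u.column(1)]
s2 = Linalg::DMatrix[[s[0, 0], 0.0], [0.0, s[1, 1]]]
v2 = Linalg::DMatrix.join_columns [vt.transpose.column(0), vt.transpose.column(1)]

approx = u2 * s2 * v2.transpose   # rank-2 approximation of the original m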
34. Raw data, similarity?
1. AAAA AAA AAAA AAA AAAAA
2. BBBBB BBBBBB BBBBB BBBBB
3. AAAA BBBBB AAA BBBBB AA
similarity(1, 3) > similarity(1, 2)
similarity(2, 3) > similarity(1, 2)
Yeah… but how did you figure that out? Learning & compression are closely correlated concepts. Some of you ran Lempel-Ziv on it…
35. Exercise: cluster your iTunes library…

require 'zlib'
require 'pp'

files = Dir['data/*']

# Compressed size of one or more files concatenated together
def deflate(*files)
  z = Zlib::Deflate.new
  z.deflate(files.collect { |f| open(f).read }.join(""), Zlib::FINISH).size
end

pairwise = files.combination(2).collect do |f1, f2|
  a    = deflate(f1)
  b    = deflate(f2)
  both = deflate(f1, f2)
  { :files => [f1, f2], :score => (a + b) - both }
end

pp pairwise.sort { |a, b| b[:score] <=> a[:score] }.first(20)

Similarity = amount of space saved when compressed together vs. individually. Clustering with Zlib: no knowledge of the domain, just straight-up compression.
36. [Diagram: multiple Algorithms, each with its own Data Input, combining into one Data Output]
“Ensemble Methods in Machine Learning”, Thomas G. Dietterich (2000): “Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a vote of their predictions… ensembles can often perform better than any single classifier.”
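A minimal majority-vote sketch, with three hypothetical classifiers (here just lambdas over made-up features) standing in for real trained models:

# Ask each classifier for a prediction, return the most popular answer
classifiers = [
  lambda { |x| x[:color] > 0.5 ? :tasty : :not_tasty },
  lambda { |x| x[:feel]  > 0.5 ? :tasty : :not_tasty },
  lambda { |x| x[:smell] > 0.5 ? :tasty : :not_tasty }
]

def ensemble_predict(classifiers, example)
  votes = classifiers.collect { |c| c.call(example) }
  votes.group_by { |v| v }.max_by { |_, vs| vs.size }.first
end

p ensemble_predict(classifiers, :color => 0.9, :feel => 0.8, :smell => 0.2)
# => :tasty (two votes to one)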
37. The Ensemble = 30+ members; BellKor = 7 members. http://nyti.ms/ccR7ul
38.
require 'open-uri'

class Crowdsource
  def initialize
    load_leaderboard # scrape GitHub contest leaders
    parse_leaders    # find their top-performing results
    fetch_results    # download the best results
    cleanup_leaders  # clean up missing or incorrect data
    crunchit         # build an ensemble
  end
  # ...
end

Crowdsource.new

Collaborative, collaborative filtering? Unfortunately, the GitHub crew didn’t buy into the idea…
41. Complex ideas are constructed on simple ideas: explore the simple ideas. More resources + more data + more models = collaborative, data-driven learning.
42. Collaborative Filtering with Ensembles: http://www.igvita.com/2009/09/01/collaborative-filtering-with-ensembles/
Support Vector Machines in Ruby: http://www.igvita.com/2008/01/07/support-vector-machines-svm-in-ruby/
SVD Recommendation System in Ruby: http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
gem install ai4r: http://ai4r.rubyforge.org/
Phew, time for questions? Hope this convinced you to explore the area further…
Editor’s notes
Now, I believe that as the Rails ecosystem grows and becomes older, end-to-end performance only becomes more important, because all of a sudden the projects are larger and more successful, and they’re feeling the pain of “scaling the Rails stack”.