3. This talk will
- introduce machine learning
- make you ML-aware
- walk through examples
4. This talk will not
- give you a PhD
- implement algorithms
- cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...
5. What is Machine Learning?
Many different algorithms that predict data from other data, using applied statistics.
12. Classification
• Documents
o Sort email (Gmail's importance filter)
o Route questions to appropriate expert (Aardvark)
o Categorize reviews (Amazon)
• Users
o Expertise; interests; pro vs. free; likelihood of paying; expected future karma
• Events
o Abnormal vs. normal
14. Algorithms: Decision Tree Learning
[Diagram: a decision tree over email features. The root node asks whether the email contains the word "viagra":
  no → does it contain the word "Ruby"?
    no → P(Spam) = 10%
    yes → P(Spam) = 5%
  yes → does it have an attachment?
    no → P(Spam) = 70%
    yes → P(Spam) = 95%
Internal nodes test features; the leaves give the labels.]
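In code, the tree above amounts to a chain of feature tests. A minimal Python sketch (the function name and inputs are made up for illustration; the tests and leaf probabilities are copied from the diagram, and in practice both would be learned from labeled training data):

    def p_spam(body, has_attachment):
        # Root: does the email contain the word "viagra"?
        if "viagra" in body.lower():
            # Yes branch: does it have an attachment?
            return 0.95 if has_attachment else 0.70
        # No branch: does it contain the word "Ruby"?
        return 0.05 if "ruby" in body.lower() else 0.10

    print(p_spam("Cheap viagra, act now!", has_attachment=True))   # 0.95
    print(p_spam("New Ruby gem released", has_attachment=False))   # 0.05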
15. Algorithms: Support Vector Machines (SVMs)
Graphics from Wikipedia
16. Algorithms: Support Vector Machines (SVMs)
Graphics from Wikipedia
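These two slides lean on the graphics; the idea they illustrate is that an SVM finds the separating boundary with the largest margin between the two classes. A minimal sketch of training one, assuming scikit-learn and toy data (not from the talk):

    from sklearn import svm

    X = [[0, 0], [1, 0], [3, 3], [4, 3]]   # toy 2-D feature vectors
    y = [0, 0, 1, 1]                       # toy class labels

    clf = svm.SVC(kernel="linear")   # fit the maximum-margin separating line
    clf.fit(X, y)
    print(clf.predict([[4, 4]]))     # -> [1]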
17. Algorithms: Naive Bayes
• Break documents into words and treat each word as an independent feature
• Surprisingly effective on simple text and document classification
• Works well when you have lots of data
Graphics from Wikipedia
18. Algorithms: Naive Bayes
You received 100 emails, 70 of which were spam.

Word     Spam emails containing it   Ham emails containing it
viagra   42 (60% of spam)            1 (3.3% of ham)
ruby     7 (10% of spam)             15 (50% of ham)
hello    35 (50% of spam)            24 (80% of ham)

A new email contains "hello" and "viagra". The probability that it is spam:

P(S | hello, viagra) = P(S) * P(hello, viagra | S) / P(hello, viagra)
                     ≈ P(S) * P(hello | S) * P(viagra | S) / (P(hello) * P(viagra))
                     = 0.7 * (0.5 * 0.6) / (0.59 * 0.43)
                     ≈ 83%
Graphics from Wikipedia
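The same arithmetic in a few lines of Python (counts copied from the table above; note that, as on the slide, the denominator also treats the two words as independent across all email):

    p_spam = 70 / 100                 # P(S): 70 of 100 emails were spam
    p_hello_given_spam = 35 / 70      # 50%
    p_viagra_given_spam = 42 / 70     # 60%
    p_hello = (35 + 24) / 100         # 59% of all email contains "hello"
    p_viagra = (42 + 1) / 100         # 43% of all email contains "viagra"

    # "Naive" step: treat the words as independent, so the joint
    # probabilities factor into products.
    posterior = (p_spam * p_hello_given_spam * p_viagra_given_spam
                 / (p_hello * p_viagra))
    print(posterior)   # ~0.83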
19. Algorithms: Neural Nets
[Diagram: a feed-forward network with an input layer (the features), a hidden layer, and an output layer (the classification).]
Graphics from Wikipedia
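Not from the talk: a toy forward pass through a network shaped like the diagram, with made-up random weights, just to show how features flow from the input layer to the output:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 0.0, 1.0])           # input layer: 3 features
    W_hidden = np.random.randn(4, 3) * 0.1  # weights into 4 hidden units
    W_output = np.random.randn(1, 4) * 0.1  # weights into 1 output unit

    h = sigmoid(W_hidden @ x)   # hidden-layer activations
    y = sigmoid(W_output @ h)   # output in (0, 1): the classification score
    print(y)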
20. Curse of Dimensionality
The more features and labels you have, the more data you need.
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
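One quick way to see why (a sketch, not from the slides): with d binary features there are 2**d distinct feature combinations, so the data needed to cover the feature space grows exponentially with d.

    for d in (5, 10, 20, 30):
        print(f"{d} binary features -> {2 ** d:,} possible inputs")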
21. Overfitting
• With enough parameters, anything is possible.
• We want our algorithms to generalize and infer, not memorize specific training examples.
• Therefore, we test our algorithms on different data than we train them on.
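A common way to enforce this, sketched with scikit-learn's train_test_split (X and y here are placeholders for real features and labels):

    from sklearn.model_selection import train_test_split

    X = [[i] for i in range(100)]     # placeholder feature vectors
    y = [i % 2 for i in range(100)]   # placeholder labels

    # Hold out 25% of the examples; the model never sees them in training.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    # ...fit on (X_train, y_train), then score on (X_test, y_test).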