Machine learning is teaching the computer how to learn by itself. It is far easier to do than it sounds, especially when you have a small data set and a good level of expertise in your field. Classifying objects, predicting who will buy, where to park your car or which test will fail can be achieved with algorithms like neural networks, genetic algorithms or ant herding. PHP is in a good position to make use of these techniques and to take advantage of related technologies like R. By the end of the session, you'll know where you want to try it.
4. MACHINE LEARNING
Teaching the machine
Supervised learning : learn, then apply
The application builds its own model : training phase
It applies its model to real cases : applying phase
5. APPLICATIONS
Play go, chess, tic-tac-toe and beat everyone else
Fraud detection and risk analysis
Automated translation or automated transcription
OCR and face recognition
Medical diagnostics
Walk, welcome guests at hotels, play football
Finding good PHP code
7. REAL USE CASE
Identify code in comments
Classic problem
Good problem for machine learning
Complex, no simple solution
A lot of data and expertise are available
9. THE FANN EXTENSION
ext/fann (https://pecl.php.net/package/fann)
Fast Artificial Neural Network
http://leenissen.dk/fann/wp/
Neural networks in PHP
Works on PHP 7, thanks to the hard work of Jakub Zelenka
https://github.com/bukka/php-fann
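The training and applying phases can be sketched with ext/fann roughly as follows. This is a minimal sketch, not the speaker's code: the function name `trainAndScore`, the layer sizes (5 inputs, 3 intermediate neurons, 1 output), the activation function and the stopping criteria are all assumptions chosen for illustration. It returns null when the extension or the training file is missing.

```php
<?php
// Sketch only: assumes ext/fann is installed and that $trainFile is a
// FANN-format training file with 5 inputs and 1 output per sample.
// trainAndScore() is a hypothetical helper, not part of ext/fann.
function trainAndScore(string $trainFile, array $input): ?float
{
    if (!extension_loaded('fann') || !is_readable($trainFile)) {
        return null; // nothing to do without the extension or the data
    }

    // 3 layers: 5 inputs, 3 intermediate neurons (arbitrary), 1 output.
    $ann = fann_create_standard(3, 5, 3, 1);
    if (!$ann) {
        return null;
    }
    fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
    fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

    // Training phase: at most 500000 epochs, a report every 1000 epochs,
    // stop once the error drops below 0.001 (all three values are assumptions).
    $trained = fann_train_on_file($ann, $trainFile, 500000, 1000, 0.001);

    // Applying phase: run the trained network on one input vector.
    $output = $trained ? fann_run($ann, $input) : false;
    fann_destroy($ann);

    return is_array($output) ? (float) $output[0] : null;
}
```

A trained network can also be stored with `fann_save()` and reloaded with `fann_create_from_file()`, so that training does not have to be repeated on every run.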
14. EXPERT AT WORK
// Test if the if is in a compressed format
// none need yet
// icon
// There is a parser specified in `Parser::$KEYWORD_PARSERS`
// $result should exist, regardless of $_message
// $a && $b and multidimensional
// numGlyphs + 1
// TODO : fix this; var_dump($var);
// if(ob_get_clean()){
//$annots .= ' /StructParent ';
// $cfg['Servers'][$i]['controlpass'] = 'pmapass';
15. INPUT VECTOR
'length' : size of the comment
'countDollar' : number of $
'countEqual' : number of =
'countObjectOperator' : number of -> operators ($o->p)
'countSemicolon' : number of semicolons ;
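The five characteristics above can be computed with plain string functions. The following is a minimal sketch; the function name `commentToVector` is hypothetical, and it assumes that each character or operator is simply counted wherever it appears in the comment (so `==` counts as two `=`).

```php
<?php
// Hypothetical helper: builds the five-value input vector for one comment.
function commentToVector(string $comment): array
{
    return [
        'length'              => strlen($comment),              // size of the comment, in bytes
        'countDollar'         => substr_count($comment, '$'),   // number of $
        'countEqual'          => substr_count($comment, '='),   // number of =
        'countObjectOperator' => substr_count($comment, '->'),  // number of ->
        'countSemicolon'      => substr_count($comment, ';'),   // number of ;
    ];
}

// Single quotes keep the $ signs literal.
$vector = commentToVector('// $x[3] or $x[] and multidimensional');
```

Passing `array_values($vector)` gives the plain list of numbers that a neural network expects as input.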
16. INPUT DATA
46 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...
* This file is part of Exakat.
*
* Exakat is free software: you can redist
* it under the terms of the GNU Affero Ge
* the Free Software Foundation, either ve
* (at your option) any later version.
*
* Exakat is distributed in the hope that
* but WITHOUT ANY WARRANTY; without even
* MERCHANTABILITY or FITNESS FOR A PARTIC
* GNU Affero General Public License for m
*
* You should have received a copy of the
* along with Exakat. If not, see <http:/
*
* The latest code can be found at <http:/
*
*/
// $x[3] or $x[] and multidimensional
//if ($round == 3) { die('Round '.$round);
//$this->errors[] = $this->language->get('
First line of the file:
46 : number of training samples
5 : number of input values per sample
1 : number of output values per sample
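A training file in this layout (one header line, then an input line and an output line for each sample) can be generated from labelled vectors. This is a sketch under assumptions: the function name `toFannTrainingData` and the `[inputs, outputs]` pair structure are hypothetical, and the sample list is assumed to be non-empty with the same sizes throughout.

```php
<?php
// Hypothetical helper: formats labelled samples as FANN training data text.
// Each sample is a pair [inputs, outputs]; $samples must not be empty.
function toFannTrainingData(array $samples): string
{
    [$firstInputs, $firstOutputs] = $samples[0];

    // Header: number of samples, inputs per sample, outputs per sample.
    $lines = [sprintf('%d %d %d', count($samples), count($firstInputs), count($firstOutputs))];

    foreach ($samples as [$inputs, $outputs]) {
        $lines[] = implode(' ', $inputs);   // input vector
        $lines[] = implode(' ', $outputs);  // expected answer: 1 = code, 0 = not code
    }

    return implode("\n", $lines) . "\n";
}

$data = toFannTrainingData([
    [[825, 0, 0, 0, 1], [0]],
    [[37, 2, 0, 0, 0], [0]],
    [[55, 2, 2, 0, 1], [1]],
]);
```

The resulting string can be written to disk with `file_put_contents()` and used as the training file.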
24. RESULTS > 0.8
Answer between 0 and 1
Values range from -14 to 0.999
The closer to 1, the safer. The closer to 0, the safer.
Is this a percentage? Is this a carrot count?
It's a mix of counts…
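The 0.8 cut-off from the slide title can be applied to the raw scores as a simple filter. A minimal sketch: the function name `flagAsCode` and the shape of `$scores` (comment identifier mapped to network output) are assumptions for illustration.

```php
<?php
// Hypothetical helper: keeps the identifiers of comments whose score is
// strictly above the threshold (0.8 by default, as on the slide).
function flagAsCode(array $scores, float $threshold = 0.8): array
{
    $kept = array_filter($scores, function ($score) use ($threshold) {
        return $score > $threshold;
    });

    return array_keys($kept);
}

$flagged = flagAsCode(['a.php:12' => 0.95, 'b.php:3' => 0.2, 'c.php:40' => -14.0]);
```

Lowering the threshold reports more comments at the cost of more false positives, so it is worth tuning against comments that have been checked by hand.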
31. RESULTS
1960 issues
50+% false positives
With an easy clean, 822 issues reported
14k comments, analyzed in 367 ms
Total time of coding : 27 mins.
// = ( 59 x 84 ) mm = ( 2.32 x 3.31 ) in
/* vim: set expandtab sw=4 ts=4 sts=4: */
32. LEARN BETTER, NOT HARDER
Better training data
Improve characteristics
Configure the neural network
Change algorithm
Automate learning
Update constantly
[Diagram: real data, history data, training, model, results, feedback]
33. BETTER TRAINING DATA
More data, more data, more data
Varied situations, real case situations
Include specific cases
Experience is crucial
https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
34. IMPROVE CHARACTERISTICS
Add new characteristics
Remove the ones that are less useful
Find the right set of characteristics
35. NETWORK CONFIGURATION
Input vector
Intermediate neurons
Activation function
Output vector
[Chart: time of training (ms), from 0 to 20000, for values 1 to 10 on the horizontal axis, with one series each for 1, 2, 3 and 4 layers]
36. CHANGE ALGORITHM
Add more data before changing the algorithm
Try cascade2 algorithm from FANN
0.6 => 0 found
0.5 => 2 found
Not found by the first algorithm
44. WHICH APPLICATIONS?
Non-deterministic
Eliminate everything that can be found systematically
Access to expertise and to characteristic vectors
Final layer after the results
Classification, prioritization, fast approximation