We all make mistakes when we build predictive models, and that is how we learn. Frank Vanden Berghen (CEO of Timi) and Daniel Soto Zeevaert (Director for Latin America at Timi), with more than 30 years of experience making mistakes while building predictive models, share their experience and the many creative ways analysts can go wrong and build bad predictive models, even when using the best tools and algorithms.
BDAS-2017 | Handling Target Bias in Predictive Modelling
1. How to Fail in Predictive Analytics
Failing analytical processes requires a lot of attention on the Business and Management side.
... But there are also secrets we will share to make sure that, even if the organization doesn't let you, you can still have great-looking but completely invalid models. And nobody will even notice!
2. Hi!
I'm Frank Vanden Berghen
CEO of Timi
Ir, Dr in Applied Mathematics
Expert in algorithm failures
You can reach me at frank@timi.eu
3. Hi!
And I'm Daniel Soto Zeevaert
Executive Director LatAm
Lic, M.Sc.
Expert in process failures
You can reach me at daniel.soto@timi.eu
5. Why do projects fail? (Rexer Analytics, 2012)
Too much effort       21%
Nobody understands    19%
"Bad" model           18%
Insights only         16%
Lack of support       15%
Change                10%
With so many errors in management, our bad models often go unnoticed!
12. How can we make sure our models fail?
Six incredible ways to have good-looking, perfectly invalid models.
13. Target Bias
Root of the problem: non-random targeting in customer selection.
[Diagram: Selection Bias → Tool Impact]
14. Solutions?
• Remove the problematic variables (e.g. Age, Gender, Region). Risk: other variables may be highly correlated with them (the selection may have made sense!).
• Filter out all the "non-target" records, create a clean model on the contacted subset, and apply it to the whole population (see the sketch after this list). The lift will not look as wonderful, but the model may have a huge impact and make money.
• Or, if you want to fail: try to mess up the production phase!
It is critical to focus on the process!
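A minimal sketch of the clean-model recipe above, with synthetic data and scikit-learn. All names and numbers are illustrative assumptions, not from the deck: here contact is biased toward young customers, while the true purchase propensity depends only on income.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
age = rng.integers(18, 80, n)
income = rng.normal(50, 15, n)
X = np.column_stack([age, income])

# True propensity to buy depends on income only, never on age.
buy = rng.random(n) < 1 / (1 + np.exp(-(income - 50) / 10))

# Non-random targeting: customers under 35 are far more likely to be contacted.
contacted = rng.random(n) < np.where(age < 35, 0.8, 0.05)

# The trap: purchases are only observed for contacted customers, and the
# never-contacted are silently treated as non-buyers.
dirty = LogisticRegression(max_iter=1000).fit(X, buy & contacted)

# The fix: filter out the non-contacted, fit a clean model on the contacted
# subset only, then score the whole population with it.
clean = LogisticRegression(max_iter=1000).fit(X[contacted], buy[contacted])

print("dirty coefs (age, income):", dirty.coef_[0])  # age looks predictive
print("clean coefs (age, income):", clean.coef_[0])  # age ~ 0, as it should be
```

The clean model's lift, measured on the full population, will indeed look less wonderful than the dirty one's, which is exactly the point of the slide.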
15. Missing Values
Is missing information actual information? MAR vs. NMAR.
MAR: we can safely replace the missing value with a "central" value.
NMAR: missing means something; the missingness itself is information (see the sketch below).
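A minimal sketch of the two stances, using pandas; the column names and values are illustrative assumptions, not from the deck.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [35.0, np.nan, 52.0, np.nan, 70.0]})

# MAR: the missingness is unrelated to the hidden value, so a "central"
# value (here the median) is a safe replacement.
df["income_mar"] = df["income"].fillna(df["income"].median())

# NMAR: the fact that the value is missing IS information (e.g. the
# customer refused to answer), so keep it as an explicit feature.
df["income_missing"] = df["income"].isna().astype(int)
df["income_nmar"] = df["income"].fillna(df["income"].median())

print(df)
```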
16. Too Generic
The trap: the "probability to buy a particular product" you modelled is really the probability to buy any product.
The model accurately predicts a purchase. We just don't know which one!
Solution: remember Bayes. Since buying product A implies buying something (A ⊆ B), P(A) = P(A|B) · P(B).
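A worked instance of that decomposition, with illustrative numbers: one model scores the chance of any purchase, a second scores which product it is, given a purchase.

```python
# P(buy product X) = P(buy X | any purchase) * P(any purchase);
# the identity holds exactly because buying X implies buying something.
p_any_purchase = 0.12        # model 1: will this client buy at all?
p_x_given_purchase = 0.30    # model 2: if they buy, is it product X?

p_buy_x = p_x_given_purchase * p_any_purchase
print(f"P(buy product X) = {p_buy_x:.3f}")  # 0.036
```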
17. The trap: assuming the cost of an error is the same for everyone.
The fix: two-step models. E(R) = P(X) · E(V): a probability model times an expected-value model gives the expected revenue.
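A minimal sketch of the two-step combination, assuming `clf` is any fitted scikit-learn style classifier (probability of response) and `reg` any fitted regressor (value if the client responds); both are placeholders, not part of the deck.

```python
def expected_revenue(clf, reg, X):
    """E(R) = P(X) * E(V): rank clients by expected money, not by probability."""
    p_response = clf.predict_proba(X)[:, 1]  # P(X): chance the client responds
    value = reg.predict(X)                   # E(V): revenue if they respond
    return p_response * value                # E(R): expected revenue per client

# Usage (once clf and reg are fitted):
#   scores = expected_revenue(clf, reg, X_new)
#   ranking = scores.argsort()[::-1]         # target the highest E(R) first
```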
18. Too Obvious
TARGET = at least 36 purchases in 6 months. Tadaaaa!
[Chart: predicted probability vs. number of transactions]
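A minimal sketch of why this "works", with synthetic data: when the target is a direct function of a candidate feature, the "model" is a tautology.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_transactions = rng.poisson(30, 5_000)

# The target is defined FROM the feature itself...
target = (n_transactions >= 36).astype(int)

# ...so the feature separates the classes perfectly. Tadaaaa!
print("AUC:", roc_auc_score(target, n_transactions))  # 1.0
```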
19. Leakage from the Future
"If a client pays an invoice, he WILL buy a product."
"If DATE_CANCELATION is missing, it means clients are very loyal."
[Diagram: the target time interval lies in the future; contacts, transactions, and products must be taken from the past. "Obvious" predictors are often leaking from the future.]
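A minimal sketch of the guard, using pandas; all column names and dates are illustrative assumptions. Features may only be built from events strictly before the target window starts.

```python
import pandas as pd

events = pd.DataFrame({
    "client_id": [1, 1, 2, 2],
    "event_date": pd.to_datetime(
        ["2016-11-02", "2017-02-15", "2016-12-20", "2017-03-01"]),
    "amount": [100.0, 40.0, 75.0, 60.0],
})

TARGET_WINDOW_START = pd.Timestamp("2017-01-01")

# Anything at or after the start of the target window (a paid invoice,
# a missing DATE_CANCELATION...) would leak the answer into the features.
past = events[events["event_date"] < TARGET_WINDOW_START]
features = past.groupby("client_id")["amount"].agg(["count", "sum"])
print(features)
```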
20. TABLE: COMPARING DATA
          A    B    C
Yellow   10   20    7
Blue     30   15   10
Orange    5   24   16
Fail even when everything else keeps working…
By not including the Business people, you will make sure that nobody supports you. A perk is that any change in the organization will go completely unnoticed, and you will be able to keep spending a lot of time on a useless model before anyone even realizes it is useless!
Also, back to CRISP-DM: if the Business Understanding phase is poorly done, failure is GUARANTEED!
It is critical to control what would happen if the organization lets you finish on time and believes in your model, and to mitigate the risk that you may have forgotten to build strong overfitting into your data. Basically, the worst-case scenario: you have a good, robust model, support from management, and you even let other people present it with you.
As a last resort: use some old, slow, and complex data management tool.
If your ETL cannot do a transposition, or if you are forced to rely on SQL for at least SOME of the variable creation, you might be safe! This will be out of your hands: IT will not give you the bandwidth.
If this fails, then convince your organization to acquire open-source tools with no support! Talend and Pentaho have been a secret weapon of many consultants!
It is really critical that any change requires heavy lifting. Use ONLY tools that force you to specify the metadata of every variable. Use something that runs in RAM, so it will be very costly, and make sure that any change in requirements means a long process. Most ETLs do this incredibly well. Avoid Anatella: it is too fast, too easy, and requires too few resources. You would be forced to put things in production.
Scala is a great option: complex programming, few experts available. Combine it with SAP or other systems that will lock you out!
It is really important not to have any clarity in your reports. Avoid graphics, and if you must use them, change the color scheme, use obscure methods, and insist on precise numbers for p-values, AIC, or BIC.
Spend all your time setting parameters or tweaking code. This is the best investment to avoid having any meaningful impact on model precision!