Machine Learning in the
Financial Industry
A Bird's Eye View
Subrat Panda and Biswa G Singh
Brief Introduction - Subrat
● BTech (2002), PhD (2009) – CSE, IIT Kharagpur
● Synopsys (EDA), IBM (CPU), NVIDIA (GPU), Taro (Full Stack Engineer), Capillary
(Principal Architect - AI)
● Applying AI to Retail
● Co-Founded IDLI (for social good) with Prof. Amit Sethi (IIT Bombay), Jacob Minz
(Synopsys) and Biswa Gourav Singh (AMD)
● https://www.facebook.com/groups/idliai/
● LinkedIn - https://www.linkedin.com/in/subratpanda/
● Facebook - https://www.facebook.com/subratpanda
● Twitter - @subratpanda
Brief Introduction - Biswa
● BTech (NIST - 2005), MS (2009) – Clemson University
● Synopsys (EDA), IBM (CPU), ARM, AMD, Capillary (Lead ML Engineer - Data
Sciences)
● Applying AI to Retail
● Co-Founded IDLI (for social good) with Prof. Amit Sethi (IIT Bombay), Jacob Minz
(Synopsys) and Subrat Panda
● https://www.facebook.com/groups/idliai/
● LinkedIn - https://www.linkedin.com/in/biswagsingh/
● Facebook - https://www.facebook.com/biswa.singh
● Kaggle Expert, Winner of AV (Click stream prediction)
Preface
• Artificial intelligence is already part of our everyday lives.
Application of AI, Machine Learning and Deep Learning
Machine Learning Classical Definition
▪ Arthur Samuel (1959): a computer's "ability to learn without being
explicitly programmed."
▪ Tom M. Mitchell (1998): "A computer program is said to learn from
experience E with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured by P,
improves with experience E."
▪ Optimize a performance criterion using example data or past
experience.
Types of Machine Learning Algorithms
▪ Supervised Learning: Input data with labeled
responses
▪ Regression : Given a picture of a person, we have to
predict their age on the basis of the given picture
▪ Classification : Given a patient with a tumor, we
have to predict whether the tumor is malignant or
benign.
Examples (figures): Iris dataset species classification, image classification, text classification; linear regression vs. non-linear regression
▪ Unsupervised Learning: Input data without labeled responses.
▪ Clustering: Take a collection of 1,000,000 different genes, and find a way to
automatically group these genes into groups that are somehow similar or
related by different variables, such as lifespan, location, roles, and so on.
▪ Non-clustering: Exploratory data analysis (PCA, autoencoders)
Types of Machine Learning Algorithms
Examples (figures): customer segmentation, MNIST digit segmentation
Data Modeling
Pop Quiz
▪ Predicting housing prices based on input parameters like house size, number of
rooms, location of house etc. falls under which category of machine learning
problem:
▪ A) Regression
▪ B) Classification
▪ C) Clustering
▪ D) None
▪ Automatically segmenting your customers according to customer
information falls under which category of machine learning problem?
▪ A) Regression
▪ B) Classification
▪ C) Clustering
▪ D) None
Intelligent Loan Application system
Credit: Coursera, University of Washington Machine Learning course
What makes a loan Risky?
Credit: Coursera, University of Washington Machine Learning course
Classifier View
1) Review credit application with an expert
2) Learn algorithms to replicate expert judgement
3) Use of traditional data
4) Additional Insight:
a) Give Applicant a questionnaire
b) Add the questionnaire data to predict outcome
c) Long-term effort, as risk outcomes need to be
observed
d) Mining Voice data
Map Expert Judgement to improve
Why Machines?
- For banks, non-performing assets (NPAs) are a big problem (whether the bank is big or small)
- NPAs happen because of a lot of reasons:
- Human Error in Judgement
- Lack of analysis of all available data points
- Long term data and multiple data sources not considered together
- Inherent biases in some people
- Big view analytics of data missing
- Incomplete Risk analysis
- Market dynamics and correlation change over time
- Machine Learning Algorithms can model most if not all of the
conditions
- Can assist Risk Analysts - Augmented Intelligence
- Multiple models can be used, with voting between them, so that the
people responsible don't get blindsided
Techniques we will discuss
- Logistic Regression
- Discussion of Concepts
- Demo
- Boosting
- Discussion of Concepts
- Demo
- Time Series Analysis ( Discussion )
Logistic Regression
Introduction
▪ Logistic regression is an approach to the classification problem.
▪ The output is either 1 or 0 instead of a continuous range of
values
▪ y ∈ {0,1}
▪ Binary classification problem (two values)
▪ Linear regression won't work for the classification problem
Logistic Regression: Hypothesis
▪ The hypothesis should satisfy
▪ 0 ≤ h(x) ≤ 1
▪ Use the "Sigmoid Function," also called the
"Logistic Function": hθ(x) = g(θTx), where g(z) = 1 / (1 + e^(−z))
▪ We want to restrict the range to 0 and 1.
This is accomplished by plugging θTx into the
Logistic Function
Decision Boundary
In order to get our discrete 0 or 1 classification, we can translate the output of the hypothesis function as
follows:
hθ(x)≥0.5→y=1
hθ(x)<0.5→y=0
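A minimal NumPy sketch of this hypothesis and decision rule (the function names sigmoid and predict are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real-valued z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    # h_theta(x) = sigmoid(theta^T x); output 1 where h >= 0.5, else 0
    probs = sigmoid(X @ theta)
    return (probs >= threshold).astype(int)
```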
Cost Function
▪ Cannot use the squared-error cost function: with the logistic function
inside it, the cost surface becomes wavy (non-convex), causing many local optima.
Cost Function
▪ Logistic regression cost function: Cost(hθ(x), y) = −y·log(hθ(x)) − (1 − y)·log(1 − hθ(x))
We will have to maximize the log likelihood (equivalently, minimize this cost)
Maximizing log likelihood
Similar to linear regression, we use gradient descent. Now the
updates look like: θj := θj − (α/m) · Σi (hθ(x(i)) − y(i)) · xj(i)
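A minimal NumPy sketch of these updates as batch gradient descent (the learning rate and iteration count are illustrative choices):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x) for every example
        grad = X.T @ (h - y) / m                # gradient of the cost
        theta -= lr * grad                      # simultaneous update of all theta_j
    return theta
```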
▪ Bias is the algorithm's tendency to
consistently learn the wrong thing by
not taking into account all the
information in the data
▪ Variance is the algorithm's tendency to
learn random things irrespective of the
real signal by fitting highly flexible
models that follow the error/noise in
the data too closely
Bias/Variance
• Generalization ability is an algorithm's ability to give accurate
predictions on new, previously unseen data
• Models that are too complex for the amount of training data
available are said to overfit and are not likely to generalize well to
new examples
• High variance can cause an algorithm to model the random noise in
the training data, rather than the intended outputs (overfitting).
• Models that are too simple, that do not even do well on training data,
are said to underfit and also not likely to generalize well.
• High bias can cause an algorithm to miss the relevant relations
between features and target outputs (underfitting).
Problem of high Bias/Variance
Bias-Variance: An Example
Bias/Variance is a Way to Understand
Overfitting and Underfitting
(Figure: error/loss on the training set Dtrain and on an unseen test set Dtest, plotted from simple to complex classifiers. A "too simple" classifier has high error on both sets; a "too complex" classifier has low training error but high test error.)
Definitions
• Overfitting: too much reliance on the training data
• Underfitting: a failure to learn the relationships in the training data
• High Variance: model changes significantly based on training data
• High Bias: assumptions about model lead to ignoring training data
• Overfitting and underfitting cause poor generalization on the test set
• A validation set for model tuning can help prevent under- and overfitting
▪ Underfitting:
▪ Easier to resolve
▪ Try different machine learning models
▪ Try stronger models with higher capacity
(hyperparameter tuning)
▪ Try more features
▪ Overfitting
▪ Use a resampling technique like K-fold cross validation
▪ Improve the feature quality or remove some features
▪ Training with more data
▪ Early stopping
▪ Regularization
▪ Ensembling
Ways to Deal with Overfitting and Underfitting
Early Stopping
• Regularization penalizes the coefficients; in neural networks, it
penalizes the weight matrices of the nodes.
• L1 and L2 are the most common types of regularization.
• These update the general cost function by adding another term
known as the regularization term.
Regularization
Cost function = Loss (say, binary cross entropy) + Regularization term
▪ In L2, we have: Cost function = Loss + (λ / 2m) · Σ ‖w‖²
▪ Here, lambda is the regularization parameter: the hyperparameter whose
value is tuned for better results. L2 regularization is also known as weight
decay, as it forces the weights to decay towards zero (but not exactly zero).
▪ In L1, we have: Cost function = Loss + (λ / 2m) · Σ ‖w‖
▪ Here we penalize the absolute value of the weights. Unlike L2, the weights
may be reduced to exactly zero.
L1 and L2 Regularization
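As a hedged illustration, here is one way to apply the two penalties with scikit-learn's LogisticRegression (C is the inverse of lambda here; the values are illustrative, not prescriptions):

```python
from sklearn.linear_model import LogisticRegression

# L2 (weight decay): shrinks weights toward zero, but not exactly to zero
l2_model = LogisticRegression(penalty="l2", C=1.0)

# L1: can drive some weights to exactly zero (implicit feature selection);
# liblinear is one scikit-learn solver that supports the L1 penalty
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
```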
Tree based modeling
Decision Tree
▪ A decision tree is a supervised learning algorithm.
▪ We split the population or sample into two or more homogeneous
sets (sub-populations) based on the most significant differentiator
among the input variables.
1. Root Node: represents the entire population or sample; it gets divided into two or more homogeneous sets.
2. Splitting: the process of dividing a node into two or more sub-nodes.
3. Decision Node: a sub-node that splits into further sub-nodes.
4. Leaf/Terminal Node: a node that does not split.
Another Example
Methods of splitting: Information gain
Which node can be described most easily?
▪ Information theory defines this degree of disorganization in a system through a measure known as entropy:
Entropy = −p·log₂(p) − q·log₂(q)
Here p and q are the probabilities of success and failure, respectively, in that node.
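A small Python sketch of this entropy measure (the helper name is illustrative):

```python
import numpy as np

def entropy(p):
    # Entropy of a binary node, where p is the probability of success
    # and q = 1 - p the probability of failure; 0*log(0) is taken as 0.
    q = 1.0 - p
    return -sum(x * np.log2(x) for x in (p, q) if x > 0)

print(entropy(0.5))   # 1.0  -> maximally disorganized (hardest to describe)
print(entropy(0.99))  # ~0.08 -> nearly pure node (easy to describe)
```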
Other Tree based methods
▪ Trade-off management of bias-variance errors.
▪ Bagging is a simple ensembling technique in which we
build many independent predictors/models/learners
and combine them using some model averaging
techniques.
▪ Ensemble methods involve a group of predictive models
to achieve better accuracy and model stability.
▪ Random Forest: multiple trees instead of a single tree; it is a bagging method.
▪ To classify a new object based on its attributes, each tree gives a
classification, and we say the tree "votes" for that class.
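A short scikit-learn sketch of this voting behavior on toy data (the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagging of decision trees: each tree is trained on a bootstrap sample
# (with a random feature subset per split); predictions are majority-voted.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))  # majority vote across the 100 trees
```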
▪ Gradient Boosting is a tree ensemble technique that creates a
strong classifier from a number of weak classifiers.
▪ It combines weak learners in an additive model.
▪ Boosting is an ensemble technique in which the predictors are not
built independently, but sequentially.
Other Tree based methods
INTRODUCTION TO
BOOSTING
DEFINITION
• The term 'Boosting' refers to a family of algorithms that convert weak learners into strong learners.
• Let’s understand this definition in detail by solving a problem of spam email identification:
• How would you classify an email as SPAM or not? Like everyone else, our initial approach would be to identify
'spam' and 'not spam' emails using the following criteria. If:
• Email has only one image file (promotional image), It’s a SPAM
• Email has only link(s), It’s a SPAM
• Email body consist of sentence like “You won a prize money of $ xxxxxx”, It’s a SPAM
• Email from our official domain “metu.edu.tr” , Not a SPAM
• Email from known source, Not a SPAM
• Above, we've defined multiple rules to classify an email as 'spam' or 'not spam'. But do you think these rules
individually are strong enough to successfully classify an email? No.
• Individually, these rules are not powerful enough to classify an email as 'spam' or 'not spam'. Therefore, these
rules are called weak learners.
DEFINITION
• To convert weak learners into a strong learner, we combine the
prediction of each weak learner using methods like:
• Using an average / weighted average
• Taking the prediction with the higher vote
• For example: above, we defined 5 weak learners. Out of these
5, 3 vote 'SPAM' and 2 vote 'Not a SPAM'. In this case,
by default, we'll consider the email as SPAM because 'SPAM' has
the higher vote (3).
How Do Boosting Algorithms Work?
• To find a weak rule, we apply base learning algorithms with a different distribution each time. Each time the
base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process:
after many iterations, the boosting algorithm combines these weak rules into a single strong
prediction rule.
• For choosing the right distribution, here are the steps:
Step 1: The base learner takes the initial distribution and assigns equal weight (attention) to each
observation.
Step 2: If the first base learning algorithm makes prediction errors, we pay higher
attention to the observations with prediction errors. Then we apply the next base learning algorithm.
Step 3: Iterate Step 2 until the limit of the base learning algorithm is reached or a higher accuracy is
achieved.
• Finally, it combines the outputs of the weak learners to create a strong learner, which eventually
improves the prediction power of the model. Boosting pays more attention to examples that are
misclassified or have higher errors under the preceding weak rules.
Types of Boosting Algorithms
• The underlying engine used for boosting can be almost anything: a
decision stump, a margin-maximizing classification algorithm,
etc. Many boosting algorithms build on such
engines, for example:
• AdaBoost (Adaptive Boosting)
• Gradient Tree Boosting
• GentleBoost
• LPBoost
• BrownBoost
• XGBoost
Gradient Boosting
• Gradient boosting trains many models sequentially. Each new
model gradually reduces the loss function (y = ax + b + e, where the error term e needs
special attention) of the whole system
using the gradient descent method. The learning procedure
consecutively fits new models to provide a more accurate estimate of
the response variable.
• The principal idea behind this algorithm is to construct new base
learners that are maximally correlated with the negative gradient of
the loss function of the whole ensemble.
Gradient Boosting
• Type of problem – You have a set of variable vectors x1, x2 and x3. You need to predict y,
which is a continuous variable.
• Steps of Gradient Boost algorithm
Step 1 : Take the mean as the initial prediction for all observations.
Step 2 : Calculate errors of each observation from the mean (latest prediction).
Step 3 : Find the variable that can split the errors perfectly and find the value for the split. This
is assumed to be the latest prediction.
Step 4 : Calculate errors of each observation from the mean of both the sides of split (latest
prediction).
Step 5 : Repeat steps 3 and 4 until the objective function is minimized (or maximized).
Step 6 : Take a weighted mean of all the classifiers to come up with the final model.
• We have excluded the mathematical formulation of boosting algorithms here to keep
the discussion simple; a toy sketch follows instead.
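A toy sketch of these steps for squared-error loss, using shallow regression trees as the weak learners (the depth, number of rounds, and learning rate are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, lr=0.1):
    pred = np.full(len(y), y.mean())               # Step 1: start from the mean
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                       # Steps 2/4: errors vs. latest prediction
        tree = DecisionTreeRegressor(max_depth=2)  # a weak learner
        tree.fit(X, residuals)                     # Step 3: fit splits to the errors
        pred += lr * tree.predict(X)               # update the latest prediction
        trees.append(tree)                         # Step 5: iterate
    return y.mean(), trees                         # Step 6: the additive model
```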
Example
• Assume you are given a previous model M to improve on. Currently you observe that the model
has an accuracy of 80% (by some metric). How do you go about improving it?
• One simple way is to build an entirely different model using a new set of input variables and trying
better ensemble learners. Instead, here is a much simpler way. It goes like
this:
Y = M(x) + error
• What if I can see that the error is not white noise but has some correlation with the
outcome (Y)? What if we can develop a model for this error term? Like:
error = G(x) + error2
Example
• Probably, you'll see the accuracy improve to a higher number, say
84%. Let's take another step and regress against error2.
error2 = H(x) + error3
• Now we combine all these together :
Y = M(x) + G(x) + H(x) + error3
• This will probably have an accuracy of even more than 84%. What if I
can find optimal weights for each of the three learners?
Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4
Example
• If we find good weights, we have probably made an even better
model. This is the underlying principle of a boosting learner.
• Boosting is generally done with weak learners, which lack the
capacity to fit white noise.
• Boosting can lead to overfitting, so we need to stop at the right point.
XGBoosting (Extreme Gradient Boosting)
• Execution Speed: Generally, XGBoost is fast. Really fast when
compared to other implementations of gradient boosting.
• Model Performance: XGBoost dominates structured or tabular
datasets on classification and regression predictive modeling
problems.
• The evidence is that it is the go-to algorithm for competition winners
on the Kaggle competitive data science platform.
What Algorithm Does XGBoost Use?
• The XGBoost library implements the gradient boosting decision tree algorithm.
• This algorithm goes by lots of different names such as gradient boosting, multiple additive
regression trees, stochastic gradient boosting or gradient boosting machines.
• Boosting is an ensemble technique where new models are added to correct the errors made by
existing models. Models are added sequentially until no further improvements can be made. A
popular example is the AdaBoost algorithm that weights data points that are hard to predict.
• Gradient boosting is an approach where new models are created that predict the residuals or
errors of prior models and then added together to make the final prediction. It is called gradient
boosting because it uses a gradient descent algorithm to minimize the loss when adding new
models.
• This approach supports both regression and classification predictive modeling problems.
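A hedged usage sketch with the xgboost Python package (assuming it is installed; the dataset and hyperparameter values are illustrative, and reg_lambda is the L2 regularization term discussed on the next slide):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Sequential tree boosting with a regularized objective
model = xgb.XGBClassifier(n_estimators=200, max_depth=4,
                          learning_rate=0.1, reg_lambda=1.0)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # accuracy on held-out data
```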
XGBoosting (Extreme Gradient Boosting)
• What is the difference between the R gbm (gradient boosting machine) and xgboost
(extreme gradient boosting)?
• Both xgboost and gbm follow the principle of gradient boosting. There are, however,
differences in the modeling details. Specifically, xgboost uses a more regularized model
formalization to control overfitting, which gives it better performance.
• Objective Function : Training Loss + Regularization
• The regularization term controls the complexity of the model, which helps us avoid overfitting.
This sounds a bit abstract, so consider the following problem (shown as a picture in the original
slides): you are asked to visually fit a step function to a set of input data points. Which solution
among the three candidates do you think is the best fit?
ERROR ANALYSIS
- Accuracy
- Prediction Threshold
- Precision
- Recall
- F1 Score
- AUC
● Target: 0/1, -1/+1, True/False ..
● Prediction = f(inputs) = f(x): 0/1 or real number
● Threshold: f(x) > (thres) => 1, else => 0
● threshold(f(x)): 0/1
ACCURACY
● #Right/#Total
● p(“correct”): p(threshold(f(x)) = target)
CONFUSION MATRIX

                    Actual positive       Actual negative
Predicted positive  True Positive (TP)    False Positive (FP)
Predicted negative  False Negative (FN)   True Negative (TN)
● Accuracy assumes equal cost for both kinds of errors: cost(false positive) = cost(false negative)
● Is 99% accuracy good? It can be excellent, good, mediocre,
poor, or terrible – it depends on the problem
● Is 10% accuracy bad? Not necessarily – e.g., in information retrieval
● BaseRate = accuracy of always predicting the predominant class (on
most problems, obtaining base-rate accuracy is easy)
PREDICTION THRESHOLD
In which of the following scenarios would a high accuracy value suggest that the ML model is doing a good job?
● An expensive robotic chicken crosses a very busy road a thousand times per day. An ML model
evaluates traffic patterns and predicts when this chicken can safely cross the street with an accuracy of
99.99%.
● A deadly, but curable, medical condition afflicts .01% of the population. An ML model uses
symptoms as features and predicts this affliction with an accuracy of 99.99%.
● In the game of roulette, a ball is dropped on a spinning wheel and eventually lands in one of 38
slots. Using visual features (the spin of the ball, the position of the wheel when the ball was
dropped, the height of the ball over the wheel), an ML model can predict the slot that the ball will
land in with an accuracy of 4%.
● A 99.99% accuracy value on a very busy road strongly suggests that the ML model is far better than chance. In some
settings, however, the cost of making even a small number of mistakes is still too high. 99.99% accuracy means that
the expensive chicken will need to be replaced, on average, every 10 days. (The chicken might also cause extensive
damage to cars that it hits.)
● Accuracy is a poor metric here. After all, even a "dumb" model that always predicts "not sick" would still be 99.99%
accurate. Mistakenly predicting "not sick" for a person who actually is sick could be deadly.
● This ML model is making predictions far better than chance; a random guess would be correct 1/38 of the time—
yielding an accuracy of 2.6%. Although the model's accuracy is "only" 4%, the benefits of success far outweigh the
disadvantages of failure.
● Precision attempts to answer the following question:
What proportion of positive identifications was actually correct?
Precision is defined as follows: Precision = TP / (TP + FP)
Note: A model that produces no false positives has a precision of 1.0.
E.g., a model with a precision of 0.5 is correct 50% of the time when it predicts that a tumor is malignant.
PRECISION
● Recall attempts to answer the following question:
What proportion of actual positives was identified correctly?
Mathematically, recall is defined as follows: Recall = TP / (TP + FN)
Note: A model that produces no false negatives has a recall of 1.0.
RECALL
Our model has a recall of 0.11—in other words, it correctly identifies 11% of all malignant tumors.
Consider a classification model that separates email into two categories: "spam" or "not spam."
If you raise the classification threshold, what will happen to precision?
a) Probably increase.
b) Probably decrease.
c) Definitely decrease.
d) Definitely increase.
Consider a classification model that separates email into two categories: "spam" or "not spam."
If you raise the classification threshold, what will happen to recall?
a) Always decrease or stay the same.
b) Always increase.
c) Always stay constant.
Consider two models—A and B—that each evaluate the same dataset. Which one of the following
statements is true?
a) If Model A has better precision than model B, then model A is better.
b) If model A has better recall than model B, then model A is better.
c) If model A has better precision and better recall than model B, then model A is probably
better.
In general, a model that outperforms another model on both precision and
recall is likely the better model. Obviously, we'll need to make sure that
comparison is being done at a precision / recall point that is useful in
practice for this to be meaningful. For example, suppose our spam detection
model needs to have at least 90% precision to be useful and avoid
unnecessary false alarms. In this case, comparing one model at {20%
precision, 99% recall} to another at {15% precision, 98% recall} is not
particularly instructive, as neither model meets the 90% precision
requirement. But with that caveat in mind, this is a good way to think about
comparing models when using precision and recall.
F1 Score
Various metrics have been developed that rely on both precision and recall.
The F1 score is the harmonic mean of precision and recall: F1 = 2 · (Precision · Recall) / (Precision + Recall)
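A short scikit-learn sketch computing all three metrics on toy predictions (the label vectors are made up for illustration):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1]  # 1 = malignant (toy labels)
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0]  # thresholded model outputs

print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```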
● An ROC curve (receiver operating characteristic curve) is a graph showing the
performance of a classification model at all classification thresholds. This curve plots
two parameters:
○ True Positive Rate
○ False Positive Rate
● Sweep threshold and plot
○ TPR vs. FPR
○ Sensitivity vs. 1-Specificity
○ P(true|true) vs. P(true|false)
● AUC is scale-invariant. It measures how well predictions are ranked, rather than their
absolute values.
● AUC is classification-threshold-invariant. It measures the quality of the model's
predictions irrespective of what classification threshold is chosen.
ROC and AUC
● ROC Area:
○ 1.0: perfect prediction
○ 0.9: excellent prediction
○ 0.8: good prediction
○ 0.7: mediocre prediction
○ 0.6: poor prediction
○ 0.5: random prediction
○ <0.5: something wrong!
Properties of ROC
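A minimal scikit-learn sketch of sweeping the threshold and computing the area (the scores and labels are toy values):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]  # model scores, pre-threshold

fpr, tpr, thresholds = roc_curve(y_true, y_scores)  # TPR vs. FPR at every threshold
print(roc_auc_score(y_true, y_scores))              # area under that curve
```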
● Split dataset into two groups
○ Training set: used to train the classifier
○ Test set: used to estimate the error rate of the trained classifier
● The holdout method has two basic drawbacks
○ In problems where we have a sparse dataset we may not be able to afford the
“luxury” of setting aside a portion of the dataset for testing
○ Since it is a single train-and-test experiment, the holdout estimate of error
rate will be misleading if we happen to get an “unfortunate” split
● The limitations of the holdout can be overcome with a family of resampling
methods at the expense of higher computational cost
○ Cross Validation
■ Random Subsampling
■ K-Fold Cross-Validation
Validation Strategy
● Random Subsampling performs K data splits of the entire dataset
○ Each data split randomly selects a (fixed) number of examples without
replacement
○ For each data split we retrain the classifier from scratch with the training
examples and then estimate Ei with the test examples
● The true error estimate is obtained as the average of the separate estimates Ei
○ This estimate is significantly better than the holdout estimate
Random Subsampling
● Create a K-fold partition of the dataset
○ For each of K experiments, use K−1 folds for training and a different fold
for testing (this procedure was illustrated in the original slides for K=4)
● K-Fold Cross validation is similar to Random Subsampling
○ The advantage of K-Fold Cross validation is that all the examples in the
dataset are eventually used for both training and testing
● As before, the true error is estimated as the average error rate on test
examples
K-fold Cross Validation
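A minimal scikit-learn sketch of K-fold cross-validation (the dataset, model, and K=5 are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each example is used for testing exactly once across the 5 folds;
# the error estimate is the average over the held-out folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```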
Definition of Time Series: An ordered sequence of values of a variable at equally spaced time intervals.
Two main uses of time series analysis:
● Obtain an understanding of the underlying forces and structure that produced the observed data
● Fit a model and proceed to forecasting, monitoring or even feedback and feedforward control.
Time Series Analysis is used for many applications such as:
● Economic Forecasting
● Sales Forecasting
● Budgetary Analysis
● Stock Market Analysis
● Yield Projections
● Process and Quality Control
● Inventory Studies
● Workload Projections
● Utility Studies
● Census Analysis and many more
Time Series Methodologies
Time Series Models
- ARIMA Models
- Multivariate Models
- Holt-Winters Exponential Smoothing
- We will just cover an overview
Stationary Data
- A common assumption in many time series techniques is that the data are stationary.
- A stationary process has the property that the mean, variance and autocorrelation structure do not
change over time. Stationarity can be defined in precise mathematical terms, but for our purposes we
mean a flat-looking series, without trend, with constant variance over time, a constant autocorrelation
structure over time and no periodic fluctuations (seasonality).
If the time series is not stationary, we can often transform it to stationarity with one of the following
techniques.
1. We can difference the data. That is, given the series Zt, we create the new series Yi = Zi − Zi−1. The
differenced data will contain one less point than the original data. Although you can difference the
data more than once, one difference is usually sufficient.
2. If the data contain a trend, we can fit some type of curve to the data and then model the residuals from
that fit. Since the purpose of the fit is simply to remove long-term trend, a simple fit, such as a straight
line, is typically used.
3. For non-constant variance, taking the logarithm or square root of the series may stabilize the variance.
For negative data, you can add a suitable constant to make all the data positive before applying the
transformation. This constant can then be subtracted from the model to obtain predicted (i.e., fitted)
values and forecasts for future points.
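A small pandas sketch of transforms 1 and 3 on a synthetic trending series (the series itself is made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
z = pd.Series(0.5 * np.arange(100) + rng.normal(size=100))  # linear trend + noise

y = z.diff().dropna()       # 1. differencing: Y_i = Z_i - Z_{i-1}
z_pos = z - z.min() + 1     # 3. shift negative data positive first,
z_log = np.log(z_pos)       #    then take logs to stabilize variance
```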
(Figure: removing the linear trend and making the signal stationary)
ARIMA
- The Autoregressive Integrated Moving Average model, or ARIMA for short, is a standard statistical
model for time series forecasting and analysis.
- A standard notation, ARIMA(p,d,q), is used, where the parameters are substituted with integer
values to quickly indicate the specific ARIMA model being used.
- The parameters of the ARIMA model are defined as follows:
- p: The number of lag observations included in the model, also called the lag order.
- d: The number of times that the raw observations are differenced, also called the degree of
differencing.
- q: The size of the moving average window, also called the order of moving average.
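A hedged sketch of fitting an ARIMA(1,1,1) with recent versions of statsmodels (the toy series and the order are illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.Series(np.random.default_rng(0).normal(size=200).cumsum())  # toy random walk

# order=(p, d, q): 1 lag term, 1 difference, 1 moving-average term
fitted = ARIMA(series, order=(1, 1, 1)).fit()
print(fitted.forecast(steps=5))  # forecast the next 5 points
```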
ARIMA Diagnostics
Two diagnostic plots can be used to help choose the p and q parameters of the ARMA or ARIMA. They are:
● Autocorrelation Function (ACF). The plot summarizes the correlation of an observation with lag values. The
x-axis shows the lag and the y-axis shows the correlation coefficient, between -1 and 1 for negative and positive
correlation.
● Partial Autocorrelation Function (PACF). The plot summarizes the correlations of an observation with lag
values that are not accounted for by prior lagged observations.
Some useful patterns you may observe on these plots are:
● The model is AR if the ACF trails off after a lag and has a hard cut-off in the PACF after a lag. This lag is taken
as the value for p.
● The model is MA if the PACF trails off after a lag and has a hard cut-off in the ACF after the lag. This lag value is
taken as the value for q.
● The model is a mix of AR and MA if both the ACF and PACF trail off.
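A short statsmodels sketch producing both diagnostic plots (the differenced toy series is illustrative):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = pd.Series(np.random.default_rng(0).normal(size=200).cumsum()).diff().dropna()

plot_acf(series, lags=30)    # a hard cut-off here suggests q (the MA order)
plot_pacf(series, lags=30)   # a hard cut-off here suggests p (the AR order)
plt.show()
```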
Handling Seasonality
- Seasonality is quite common in economic time series. It is less
common in engineering and scientific data.
- If seasonality is present, it must be incorporated into the time
series model. Here we discuss techniques for detecting seasonality.
- Detecting seasonality:
- A run sequence plot will often show seasonality.
- A seasonal subseries plot is a specialized technique for showing seasonality.
- Multiple box plots can be used as an alternative to the seasonal subseries plot to detect seasonality.
- The autocorrelation plot can help identify seasonality.
Acknowledgements
- https://github.com/avannaldas/Loan-Defaulter-Prediction-Machine-Learning/
- https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm
- https://machinelearningmastery.com/gentle-introduction-box-jenkins-method-time-series-forecasting/
- ML course by Andrew Ng
- https://cse.iitk.ac.in/users/piyush/courses/ml_autumn16/771A_lec21_slides.pdf
- https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-
notes/MIT15_097S12_lec10.pdf
- https://www.cs.toronto.edu/~hinton/csc2515/notes/lec11boo.htm
- http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
- http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/
Contenu connexe

Tendances

AI in Marketing
AI in MarketingAI in Marketing
AI in Marketingvganti
 
Netcomm Forum 2013: Seamless Omnichannel
Netcomm Forum 2013: Seamless OmnichannelNetcomm Forum 2013: Seamless Omnichannel
Netcomm Forum 2013: Seamless OmnichannelFederico Gasparotto
 
The growth of artificial intelligence in e commerce (1)@
The growth of artificial intelligence in e commerce (1)@The growth of artificial intelligence in e commerce (1)@
The growth of artificial intelligence in e commerce (1)@Andolasoft Inc
 
Keep control of your brand in a fashion omnichannel strategy exhausting Marke...
Keep control of your brand in a fashion omnichannel strategy exhausting Marke...Keep control of your brand in a fashion omnichannel strategy exhausting Marke...
Keep control of your brand in a fashion omnichannel strategy exhausting Marke...Federico Gasparotto
 
How artificial intelligence is transforming the e commerce industry
How artificial intelligence is transforming the e commerce industryHow artificial intelligence is transforming the e commerce industry
How artificial intelligence is transforming the e commerce industryCountants
 
B com 2013 | La strategia per un e-commerce di successo_Federico Gasparotto
B com 2013 | La strategia per un e-commerce di successo_Federico GasparottoB com 2013 | La strategia per un e-commerce di successo_Federico Gasparotto
B com 2013 | La strategia per un e-commerce di successo_Federico GasparottoB com Expo | GL events Italia
 
Surprising failure factors when implementing eCommerce and Omnichannel eBusiness
Surprising failure factors when implementing eCommerce and Omnichannel eBusinessSurprising failure factors when implementing eCommerce and Omnichannel eBusiness
Surprising failure factors when implementing eCommerce and Omnichannel eBusinessDivante
 
Top 10 retail tech trends 2017
Top 10 retail tech trends   2017Top 10 retail tech trends   2017
Top 10 retail tech trends 2017ALTEN Calsoft Labs
 
AI and CX in 2020 and Beyond
AI and CX in 2020 and BeyondAI and CX in 2020 and Beyond
AI and CX in 2020 and BeyondSogolytics
 
2016 IBM Retail Industry Solutions Guide
2016 IBM Retail Industry Solutions Guide2016 IBM Retail Industry Solutions Guide
2016 IBM Retail Industry Solutions GuideTero Angeria
 
Digital Marketing Strategies to catch the Omin-Channel customer
Digital Marketing Strategies to catch the Omin-Channel customerDigital Marketing Strategies to catch the Omin-Channel customer
Digital Marketing Strategies to catch the Omin-Channel customerFederico Gasparotto
 
Webinar: How to Scale AI in the world of eCommerce
Webinar: How to Scale AI in the world of eCommerceWebinar: How to Scale AI in the world of eCommerce
Webinar: How to Scale AI in the world of eCommerceSakshi Singh
 
The Modern Industrial Distributor - 2020 and Beyond
The Modern Industrial Distributor - 2020 and BeyondThe Modern Industrial Distributor - 2020 and Beyond
The Modern Industrial Distributor - 2020 and BeyondAndrew Johnson
 
The Artificial Intelligence Revolution is Here
The Artificial Intelligence Revolution is HereThe Artificial Intelligence Revolution is Here
The Artificial Intelligence Revolution is HereNational Retail Federation
 
Smart Retailing using IOT
Smart Retailing using IOTSmart Retailing using IOT
Smart Retailing using IOTIRJET Journal
 
IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...
IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...
IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...WithTheBest
 
Powering Digital Retail Transformation
Powering Digital Retail TransformationPowering Digital Retail Transformation
Powering Digital Retail TransformationOliver Guy
 
Strategic thought to exhaust the fashion e-commerce potential in 2014
Strategic thought to exhaust the fashion e-commerce potential in 2014Strategic thought to exhaust the fashion e-commerce potential in 2014
Strategic thought to exhaust the fashion e-commerce potential in 2014Federico Gasparotto
 
Digital transformation trends 2019: Retail
Digital transformation trends 2019: RetailDigital transformation trends 2019: Retail
Digital transformation trends 2019: RetailSerhii Uspenskyi
 
Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...
Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...
Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...Accenture Italia
 

Tendances (20)

AI in Marketing
AI in MarketingAI in Marketing
AI in Marketing
 
Netcomm Forum 2013: Seamless Omnichannel
Netcomm Forum 2013: Seamless OmnichannelNetcomm Forum 2013: Seamless Omnichannel
Netcomm Forum 2013: Seamless Omnichannel
 
The growth of artificial intelligence in e commerce (1)@
The growth of artificial intelligence in e commerce (1)@The growth of artificial intelligence in e commerce (1)@
The growth of artificial intelligence in e commerce (1)@
 
Keep control of your brand in a fashion omnichannel strategy exhausting Marke...
Keep control of your brand in a fashion omnichannel strategy exhausting Marke...Keep control of your brand in a fashion omnichannel strategy exhausting Marke...
Keep control of your brand in a fashion omnichannel strategy exhausting Marke...
 
How artificial intelligence is transforming the e commerce industry
How artificial intelligence is transforming the e commerce industryHow artificial intelligence is transforming the e commerce industry
How artificial intelligence is transforming the e commerce industry
 
B com 2013 | La strategia per un e-commerce di successo_Federico Gasparotto
B com 2013 | La strategia per un e-commerce di successo_Federico GasparottoB com 2013 | La strategia per un e-commerce di successo_Federico Gasparotto
B com 2013 | La strategia per un e-commerce di successo_Federico Gasparotto
 
Surprising failure factors when implementing eCommerce and Omnichannel eBusiness
Surprising failure factors when implementing eCommerce and Omnichannel eBusinessSurprising failure factors when implementing eCommerce and Omnichannel eBusiness
Surprising failure factors when implementing eCommerce and Omnichannel eBusiness
 
Top 10 retail tech trends 2017
Top 10 retail tech trends   2017Top 10 retail tech trends   2017
Top 10 retail tech trends 2017
 
AI and CX in 2020 and Beyond
AI and CX in 2020 and BeyondAI and CX in 2020 and Beyond
AI and CX in 2020 and Beyond
 
2016 IBM Retail Industry Solutions Guide
2016 IBM Retail Industry Solutions Guide2016 IBM Retail Industry Solutions Guide
2016 IBM Retail Industry Solutions Guide
 
Digital Marketing Strategies to catch the Omin-Channel customer
Digital Marketing Strategies to catch the Omin-Channel customerDigital Marketing Strategies to catch the Omin-Channel customer
Digital Marketing Strategies to catch the Omin-Channel customer
 
Webinar: How to Scale AI in the world of eCommerce
Webinar: How to Scale AI in the world of eCommerceWebinar: How to Scale AI in the world of eCommerce
Webinar: How to Scale AI in the world of eCommerce
 
The Modern Industrial Distributor - 2020 and Beyond
The Modern Industrial Distributor - 2020 and BeyondThe Modern Industrial Distributor - 2020 and Beyond
The Modern Industrial Distributor - 2020 and Beyond
 
The Artificial Intelligence Revolution is Here
The Artificial Intelligence Revolution is HereThe Artificial Intelligence Revolution is Here
The Artificial Intelligence Revolution is Here
 
Smart Retailing using IOT
Smart Retailing using IOTSmart Retailing using IOT
Smart Retailing using IOT
 
IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...
IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...
IoT and the Retail Marketplace: Bringing IoT, Wearables, and Smart Devices to...
 
Powering Digital Retail Transformation
Powering Digital Retail TransformationPowering Digital Retail Transformation
Powering Digital Retail Transformation
 
Strategic thought to exhaust the fashion e-commerce potential in 2014
Strategic thought to exhaust the fashion e-commerce potential in 2014Strategic thought to exhaust the fashion e-commerce potential in 2014
Strategic thought to exhaust the fashion e-commerce potential in 2014
 
Digital transformation trends 2019: Retail
Digital transformation trends 2019: RetailDigital transformation trends 2019: Retail
Digital transformation trends 2019: Retail
 
Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...
Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...
Data Driven Marketing: la nuova frontiera del Digital Marketing sempre più pr...
 

Similaire à Machine Learning in the Financial Industry

Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientistMatthew Evans
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - BengaluruKunal Jain
 
MACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxMACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxNAGARAJANS68
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new pptSalford Systems
 
KNOLX_Data_preprocessing
KNOLX_Data_preprocessingKNOLX_Data_preprocessing
KNOLX_Data_preprocessingKnoldus Inc.
 
Machine learning it is time...
Machine learning it is time...Machine learning it is time...
Machine learning it is time...Sandip Chatterjee
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 
Types of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike MoinTypes of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike MoinTanvir Moin
 
It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!To Sum It Up
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Dori Waldman
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummiesMichael Winer
 
Agile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgnirudra Sikdar
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 

Similaire à Machine Learning in the Financial Industry (20)

Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
Pricing like a data scientist
Pricing like a data scientistPricing like a data scientist
Pricing like a data scientist
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
MACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxMACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptx
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new ppt
 
KNOLX_Data_preprocessing
KNOLX_Data_preprocessingKNOLX_Data_preprocessing
KNOLX_Data_preprocessing
 
Machine learning it is time...
Machine learning it is time...Machine learning it is time...
Machine learning it is time...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Types of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike MoinTypes of Machine Learning- Tanvir Siddike Moin
Types of Machine Learning- Tanvir Siddike Moin
 
It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummies
 
Agile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity management
 
Analytics
AnalyticsAnalytics
Analytics
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 

Plus de Subrat Panda, PhD

Role of technology in agriculture courses by srmist &amp; the hindu
Role of technology in agriculture courses by srmist &amp; the hinduRole of technology in agriculture courses by srmist &amp; the hindu
Role of technology in agriculture courses by srmist &amp; the hinduSubrat Panda, PhD
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learningSubrat Panda, PhD
 

Plus de Subrat Panda, PhD (7)

Role of technology in agriculture courses by srmist &amp; the hindu
Role of technology in agriculture courses by srmist &amp; the hinduRole of technology in agriculture courses by srmist &amp; the hindu
Role of technology in agriculture courses by srmist &amp; the hindu
 
Journey so far
Journey so farJourney so far
Journey so far
 
AI in security
AI in securityAI in security
AI in security
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
ML Workshop at SACON 2018
ML Workshop at SACON 2018ML Workshop at SACON 2018
ML Workshop at SACON 2018
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
AI and The future of work
AI and The future of work AI and The future of work
AI and The future of work
 

Dernier

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Dernier (20)

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Machine Learning in the Financial Industry

  • 18. Why Machines?
- For banks, NPAs are a big mess (whether big or small)
- NPAs happen for many reasons:
  - Human error in judgement
  - Lack of analysis of all available data points
  - Long-term data and multiple data sources not considered together
  - Inherent biases in some people
  - Big-picture analytics of the data missing
  - Incomplete risk analysis
  - Market dynamics and correlations change over time
- Machine learning algorithms can model most, if not all, of these conditions
  - Can assist risk analysts - Augmented Intelligence
  - Multiple models can be used, with voting between them, so that the people responsible don't get blind-sided
  • 19. Techniques we will discuss
- Logistic Regression - Discussion of concepts - Demo
- Boosting - Discussion of concepts - Demo
- Time Series Analysis (Discussion)
  • 21. Introduction
▪ Logistic regression is an approach to the classification problem.
▪ The output is either 1 or 0 instead of a continuous range of values: y ∈ {0, 1}
▪ This is the binary classification problem (two values)
▪ Linear regression won't work well on the classification problem
[Figure: image classification example]
  • 22. Logistic Regression: Hypothesis
▪ The hypothesis should satisfy 0 ≤ hθ(x) ≤ 1
▪ We use the "Sigmoid Function," also called the "Logistic Function": g(z) = 1 / (1 + e^(-z))
▪ We want to restrict the range to (0, 1). This is accomplished by plugging θᵀx into the logistic function: hθ(x) = g(θᵀx) = 1 / (1 + e^(-θᵀx))
  • 23. Decision Boundary
In order to get our discrete 0 or 1 classification, we can translate the output of the hypothesis function as follows:
hθ(x) ≥ 0.5 → y = 1
hθ(x) < 0.5 → y = 0
  • 24. Cost Function
▪ We cannot use the squared-error cost function here: with the logistic function inside it, the cost surface becomes wavy (non-convex), causing many local optima.
  • 25. Cost Function
▪ Logistic regression cost function:
Cost(hθ(x), y) = −log(hθ(x)) if y = 1; −log(1 − hθ(x)) if y = 0
J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log hθ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − hθ(x⁽ⁱ⁾)) ]
  • 26. Maximizing log likelihood
We will have to maximize the log likelihood. Similar to linear regression, we can use gradient descent (on the negative log likelihood). The updates now look like:
θⱼ := θⱼ − α Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
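To make the update rule concrete, here is a minimal NumPy sketch of logistic regression trained with batch gradient descent; the toy data, learning rate, and iteration count are illustrative assumptions, not part of the original slides.

```python
# A minimal sketch, assuming toy data; the bias term is omitted for brevity.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, n_iters=1000):
    """X: (m, n) feature matrix, y: (m,) labels in {0, 1}."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)       # predicted probabilities h_theta(x)
        grad = X.T @ (h - y) / m     # gradient of the (averaged) log-loss
        theta -= lr * grad           # descend on the loss
    return theta

# Toy usage: two features, roughly separable labels
X = np.array([[1.0, 2.0], [1.0, 0.5], [2.0, 3.0], [3.0, 0.2]])
y = np.array([1, 0, 1, 0])
theta = train_logistic(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)  # decision boundary at 0.5
```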
  • 27. Bias/Variance
▪ Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
▪ Variance is the algorithm's tendency to learn random things irrespective of the real signal, by fitting highly flexible models that follow the error/noise in the data too closely
  • 28. Problem of High Bias/Variance
• Generalization ability is an algorithm's ability to give accurate predictions on new, previously unseen data
• Models that are too complex for the amount of training data available are said to overfit and are not likely to generalize well to new examples
• High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting)
• Models that are too simple, that do not even do well on the training data, are said to underfit and are also not likely to generalize well
• High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting)
  • 30. Bias/Variance is a Way to Understand Overfitting and Underfitting
[Figure: error/loss on the training set Dtrain vs. on an unseen test set Dtest, swept from a simple classifier ("too simple") to a complex classifier ("too complex"); test error is high at both extremes]
  • 31. Definitions
• Overfitting: too much reliance on the training data
• Underfitting: a failure to learn the relationships in the training data
• High variance: the model changes significantly based on the training data
• High bias: assumptions about the model lead to ignoring the training data
• Overfitting and underfitting cause poor generalization on the test set
• A validation set for model tuning can prevent underfitting and overfitting
  • 32. Ways to Deal with Overfitting and Underfitting
▪ Underfitting:
  ▪ Easier to resolve
  ▪ Try different machine learning models
  ▪ Try stronger models with higher capacity (hyperparameter tuning)
  ▪ Try more features
▪ Overfitting:
  ▪ Use a resampling technique like K-fold cross-validation
  ▪ Improve the feature quality or remove some features
  ▪ Train with more data
  ▪ Early stopping
  ▪ Regularization
  ▪ Ensembling
[Figure: early stopping - halt training where validation error starts to rise]
  • 33. Regularization
• Regularization penalizes the coefficients; in machine learning, it penalizes the weight matrices of the nodes.
• L1 and L2 are the most common types of regularization.
• These update the general cost function by adding another term known as the regularization term:
Cost function = Loss (say, binary cross-entropy) + Regularization term
  • 34. L1 and L2 Regularization
▪ In L2, we have: Cost = Loss + (λ/2m) Σ w²
▪ Here, lambda (λ) is the regularization parameter - a hyperparameter whose value is tuned for better results. L2 regularization is also known as weight decay, as it forces the weights to decay towards zero (but not exactly zero).
▪ In L1, we have: Cost = Loss + (λ/2m) Σ |w|
▪ Here we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to exactly zero.
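A short scikit-learn sketch of the practical difference between the two penalties; the synthetic dataset and the C value (the inverse of λ) are placeholders for illustration.

```python
# A minimal sketch, assuming toy data: L1 zeroes out coefficients, L2 only shrinks them.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print((l1_model.coef_ == 0).sum(), "coefficients zeroed by L1")
print((l2_model.coef_ == 0).sum(), "coefficients zeroed by L2")
```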
  • 36. Decision Tree
▪ A decision tree is a supervised learning algorithm.
▪ We split the population or sample into two or more homogeneous sets (sub-populations) based on the most significant differentiator among the input variables.
1. Root Node: represents the entire population or sample; it further gets divided into two or more homogeneous sets.
2. Splitting: the process of dividing a node into two or more sub-nodes.
3. Decision Node: a sub-node that splits into further sub-nodes.
4. Leaf/Terminal Node: a node that does not split.
  • 38. Methods of splitting: Information gain
Which node can be described easily? Information theory gives a measure of this degree of disorganization in a system, known as entropy:
Entropy = −p log₂(p) − q log₂(q)
Here p and q are the probabilities of success and failure, respectively, in that node.
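A tiny Python sketch of that entropy formula; the probabilities used are illustrative.

```python
# A minimal sketch of binary entropy, with 0*log(0) treated as 0 by convention.
import numpy as np

def entropy(p):
    q = 1.0 - p
    return -sum(x * np.log2(x) for x in (p, q) if x > 0)

print(entropy(0.5))   # 1.0  -> most disorganized node
print(entropy(0.9))   # ~0.47 -> purer node, easier to describe
```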
  • 39. Other Tree-Based Methods
▪ Ensemble methods combine a group of predictive models to achieve better accuracy and model stability, trading off bias and variance errors.
▪ Bagging is a simple ensembling technique in which we build many independent predictors/models/learners and combine them using some model-averaging technique.
▪ Random Forest: multiple trees instead of a single tree - a bagging method.
▪ To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class (see the sketch below).
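A minimal scikit-learn sketch of this voting behaviour with a Random Forest; the synthetic dataset and hyperparameters are assumptions for illustration.

```python
# A minimal sketch: 100 bagged trees, majority vote decides the class.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print("held-out accuracy:", forest.score(X_te, y_te))
```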
  • 40. Other Tree-Based Methods
▪ Boosting is an ensemble technique in which the predictors are not built independently, but sequentially.
▪ Gradient Boosting is a tree ensemble technique that creates a strong classifier from a number of weak classifiers.
▪ It combines weak learners into an additive model.
  • 42. DEFINITION
• The term 'boosting' refers to a family of algorithms that convert weak learners into strong learners.
• Let's understand this definition in detail by solving a problem of spam email identification: how would you classify an email as SPAM or not?
• Like everyone else, our initial approach would be to identify 'spam' and 'not spam' emails using criteria such as:
  • Email has only one image file (promotional image): it's SPAM
  • Email has only link(s): it's SPAM
  • Email body consists of a sentence like "You won prize money of $ xxxxxx": it's SPAM
  • Email from our official domain "metu.edu.tr": not SPAM
  • Email from a known source: not SPAM
• Above, we've defined multiple rules to classify an email as 'spam' or 'not spam'. But do you think these rules individually are strong enough to classify an email successfully? No.
• Individually, these rules are not powerful enough, which is why they are called weak learners.
  • 43. DEFINITION
• To convert weak learners into a strong learner, we combine the prediction of each weak learner using methods like:
  • Average / weighted average
  • Taking the prediction with the higher vote
• For example: above, we defined 5 weak learners. Out of these 5, 3 vote 'SPAM' and 2 vote 'Not SPAM'. In this case, by default, we'll consider the email as SPAM because 'SPAM' has the higher vote (3).
  • 44. How Do Boosting Algorithms Work?
• To find a weak rule, we apply a base learning algorithm with a different distribution each time. Each time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process. After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.
• For choosing the right distribution, these are the steps:
Step 1: The base learner takes all the distributions and assigns equal weight (attention) to each observation.
Step 2: If there is any prediction error caused by the first base learning algorithm, we pay higher attention to the observations with prediction errors. Then we apply the next base learning algorithm.
Step 3: Iterate Step 2 until the limit of the base learning algorithm is reached or a higher accuracy is achieved.
• Finally, it combines the outputs of the weak learners to create a strong learner, which eventually improves the predictive power of the model. Boosting pays higher focus to examples that are misclassified or have higher errors under the preceding weak rules.
  • 45. Types of Boosting Algorithms
• The underlying engine used for boosting can be anything - a decision stump, a margin-maximizing classification algorithm, etc. Common boosting algorithms include:
  • AdaBoost (Adaptive Boosting)
  • Gradient Tree Boosting
  • GentleBoost
  • LPBoost
  • BrownBoost
  • XGBoost
  • 46. Gradient Boosting
• Gradient boosting trains many models sequentially. Each new model gradually minimizes the loss function of the whole system (y = ax + b + e, where the error term e needs special attention) using the gradient descent method. The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.
• The principal idea behind this algorithm is to construct each new base learner to be maximally correlated with the negative gradient of the loss function of the whole ensemble.
  • 47. Gradient Boosting
• Type of problem: you have a set of feature vectors x1, x2 and x3, and you need to predict y, a continuous variable.
• Steps of the gradient boosting algorithm (sketched in code below):
Step 1: Take the mean as the initial prediction for all observations.
Step 2: Calculate each observation's error from the latest prediction (the mean).
Step 3: Find the variable and value that best split the errors. The split means become the latest prediction.
Step 4: Calculate each observation's error from the latest prediction (the mean on each side of the split).
Step 5: Repeat steps 3 and 4 until the objective function is optimized.
Step 6: Take a weighted mean of all the learners to come up with the final model.
• We have excluded the mathematical formulation of boosting algorithms here to keep things simple.
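A hedged from-scratch sketch of the residual-fitting loop above, using shallow regression trees as weak learners under squared loss (so the "error" in steps 2 and 4 is simply the residual); all hyperparameters and the toy data are illustrative assumptions.

```python
# A minimal sketch of gradient boosting for regression with squared loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    pred = np.full(len(y), y.mean())   # Step 1: start from the mean
    trees = []
    for _ in range(n_rounds):
        residual = y - pred            # Steps 2/4: errors so far
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # Step 3
        pred += lr * tree.predict(X)   # additive update
        trees.append(tree)
    return y.mean(), trees

def predict(base, trees, X, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)

# Toy usage on a synthetic regression problem
X = np.random.rand(200, 3)
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + np.random.normal(0, 0.1, 200)
base, trees = gradient_boost(X, y)
```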
  • 48. Example
• Assume you are given a previous model M to improve on, and you observe that the model has an accuracy of 80% (by some metric). How do you go further?
• One simple way is to build an entirely different model using a new set of input variables and try a better ensemble. A much simpler alternative goes like this: Y = M(x) + error
• What if the error is not white noise but has some correlation with the outcome Y? What if we can develop a model on this error term? error = G(x) + error2
  • 49. Example
• The accuracy will probably improve, say to 84%. Let's take another step and regress against error2: error2 = H(x) + error3
• Now we combine all of these together: Y = M(x) + G(x) + H(x) + error3
• This will probably have an accuracy of even more than 84%. What if we can find optimal weights for each of the three learners? Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4
  • 50. Example
• If we find good weights, we have probably made an even better model. This is the underlying principle of boosting.
• Boosting is generally done on weak learners, which do not have the capacity to fit the white noise.
• Boosting can lead to overfitting, so we need to stop at the right point.
  • 51. XGBoost (Extreme Gradient Boosting)
• Execution speed: XGBoost is generally fast - really fast when compared to other implementations of gradient boosting.
• Model performance: XGBoost dominates on structured/tabular datasets for classification and regression predictive modeling problems.
• The evidence: it is the go-to algorithm for competition winners on the Kaggle competitive data science platform.
  • 52. What Algorithm Does XGBoost Use?
• The XGBoost library implements the gradient boosting decision tree algorithm.
• This algorithm goes by lots of different names, such as gradient boosting, multiple additive regression trees, stochastic gradient boosting, or gradient boosting machines.
• Boosting is an ensemble technique where new models are added to correct the errors made by existing models. Models are added sequentially until no further improvements can be made. A popular example is the AdaBoost algorithm, which weights data points that are hard to predict.
• Gradient boosting is an approach where new models are created to predict the residuals (errors) of prior models, and are then added together to make the final prediction. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models.
• This approach supports both regression and classification predictive modeling problems.
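Assuming the Python xgboost package, a minimal usage sketch; the hyperparameters are placeholders, not tuned values.

```python
# A minimal sketch of XGBoost on a synthetic binary classification problem.
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    reg_lambda=1.0,   # L2 regularization term in the objective
)
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```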
  • 53. XGBoost (Extreme Gradient Boosting)
• What is the difference between R's gbm (gradient boosting machine) and xgboost (extreme gradient boosting)?
• Both xgboost and gbm follow the principle of gradient boosting; the differences are in the modeling details. Specifically, xgboost uses a more regularized model formalization to control overfitting, which gives it better performance.
• Objective function: Training Loss + Regularization
• The regularization term controls the complexity of the model, which helps us avoid overfitting. This sounds a bit abstract, so consider the following problem: you are asked to visually fit a step function given the input data points in the upper-left corner of the image. Which solution among the three do you think is the best fit?
  • 54.
  • 55. ERROR ANALYSIS
- Accuracy
- Prediction Threshold
- Precision
- Recall
- F1 Score
- AUC
  • 56. ACCURACY
● Target: 0/1, -1/+1, True/False, ...
● Prediction = f(inputs) = f(x): 0/1 or a real number
● Threshold: f(x) > thres ⇒ 1, else ⇒ 0, so threshold(f(x)) ∈ {0, 1}
● Accuracy = #Right / #Total
● p("correct"): p(threshold(f(x)) = target)
  • 58. PREDICTION THRESHOLD
● Accuracy assumes equal cost for both kinds of errors: cost(false positive) = cost(false negative)
● Is 99% accuracy good? It can be excellent, good, mediocre, poor, or terrible - it depends on the problem
● Is 10% accuracy bad? Not necessarily - consider information retrieval
● Base rate = accuracy of always predicting the predominant class (on most problems, obtaining base-rate accuracy is easy)
  • 59. In which of the following scenarios does a high accuracy value suggest that the ML model is doing a good job?
● An expensive robotic chicken crosses a very busy road a thousand times per day. An ML model evaluates traffic patterns and predicts when this chicken can safely cross the street with an accuracy of 99.99%.
● A deadly, but curable, medical condition afflicts 0.01% of the population. An ML model uses symptoms as features and predicts this affliction with an accuracy of 99.99%.
● In the game of roulette, a ball is dropped on a spinning wheel and eventually lands in one of 38 slots. Using visual features (the spin of the ball, the position of the wheel when the ball was dropped, the height of the ball over the wheel), an ML model can predict the slot the ball will land in with an accuracy of 4%.
Answers:
● A 99.99% accuracy value on a very busy road strongly suggests that the ML model is far better than chance. In some settings, however, the cost of making even a small number of mistakes is still too high. 99.99% accuracy means that the expensive chicken will need to be replaced, on average, every 10 days. (The chicken might also cause extensive damage to cars that it hits.)
● Accuracy is a poor metric here. After all, even a "dumb" model that always predicts "not sick" would still be 99.99% accurate. Mistakenly predicting "not sick" for a person who actually is sick could be deadly.
● This ML model is making predictions far better than chance; a random guess would be correct 1/38 of the time, yielding an accuracy of 2.6%. Although the model's accuracy is "only" 4%, the benefits of success far outweigh the disadvantages of failure.
  • 60. PRECISION
● Precision attempts to answer: what proportion of positive identifications was actually correct?
Precision = TP / (TP + FP)
Note: a model that produces no false positives has a precision of 1.0.
Example: when the model predicts a tumor is malignant, it is correct 50% of the time (precision = 0.5).
  • 61. RECALL
● Recall attempts to answer: what proportion of actual positives was identified correctly?
Recall = TP / (TP + FN)
Note: a model that produces no false negatives has a recall of 1.0.
Example: our model has a recall of 0.11 - in other words, it correctly identifies 11% of all malignant tumors.
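A small sketch of both definitions with scikit-learn; the labels and predictions are toy values.

```python
# A minimal sketch: precision and recall from a confusion matrix.
from sklearn.metrics import precision_score, recall_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:", tp / (tp + fp))   # TP / (TP + FP)
print("recall:   ", tp / (tp + fn))   # TP / (TP + FN)
# Or directly:
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```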
  • 62. Consider a classification model that separates email into two categories: "spam" or "not spam."
If you raise the classification threshold, what will happen to precision?
a) Probably increase.
b) Probably decrease.
c) Definitely decrease.
d) Definitely increase.
If you raise the classification threshold, what will happen to recall?
a) Always decrease or stay the same.
b) Always increase.
c) Always stay constant.
Consider two models, A and B, that each evaluate the same dataset. Which one of the following statements is true?
a) If model A has better precision than model B, then model A is better.
b) If model A has better recall than model B, then model A is better.
c) If model A has better precision and better recall than model B, then model A is probably better.
  • 63. In general, a model that outperforms another model on both precision and recall is likely the better model. Obviously, for this to be meaningful, we need to make sure the comparison is being done at a precision/recall point that is useful in practice. For example, suppose our spam detection model needs to have at least 90% precision to be useful and avoid unnecessary false alarms. In this case, comparing one model at {20% precision, 99% recall} to another at {15% precision, 98% recall} is not particularly instructive, as neither model meets the 90% precision requirement. But with that caveat in mind, this is a good way to think about comparing models when using precision and recall.
  • 64. F1 Score
Various metrics have been developed that rely on both precision and recall. The F1 score is the harmonic mean of precision and recall:
F1 = 2 · (Precision · Recall) / (Precision + Recall)
  • 65.
  • 66. ROC and AUC
● An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters:
  ○ True Positive Rate
  ○ False Positive Rate
● Sweep the threshold and plot:
  ○ TPR vs. FPR
  ○ Sensitivity vs. 1 − Specificity
  ○ P(true|true) vs. P(true|false)
● AUC is scale-invariant: it measures how well predictions are ranked, rather than their absolute values.
● AUC is classification-threshold-invariant: it measures the quality of the model's predictions irrespective of what classification threshold is chosen.
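A short scikit-learn sketch of sweeping the threshold to trace the ROC curve and computing AUC; the model and data are illustrative.

```python
# A minimal sketch: ROC points for every threshold, plus the AUC summary.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)  # one (FPR, TPR) per threshold
print("AUC:", roc_auc_score(y_te, scores))
```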
  • 67.
  • 68. Properties of ROC
● ROC area (AUC):
  ○ 1.0: perfect prediction
  ○ 0.9: excellent prediction
  ○ 0.8: good prediction
  ○ 0.7: mediocre prediction
  ○ 0.6: poor prediction
  ○ 0.5: random prediction
  ○ <0.5: something is wrong!
  • 69. Validation Strategy
● Split the dataset into two groups:
  ○ Training set: used to train the classifier
  ○ Test set: used to estimate the error rate of the trained classifier
● The holdout method has two basic drawbacks:
  ○ On problems with a sparse dataset, we may not be able to afford the "luxury" of setting aside a portion of the dataset for testing
  ○ Since it is a single train-and-test experiment, the holdout estimate of the error rate will be misleading if we happen to get an "unfortunate" split
● The limitations of the holdout method can be overcome with a family of resampling methods, at the expense of higher computational cost:
  ○ Cross-validation
    ■ Random subsampling
    ■ K-fold cross-validation
  • 70. Random Subsampling
● Random subsampling performs K data splits of the entire dataset
  ○ Each data split randomly selects a (fixed) number of examples without replacement
  ○ For each data split, we retrain the classifier from scratch with the training examples and then estimate the error Ei with the test examples
● The true error estimate is obtained as the average of the separate estimates Ei
  ○ This estimate is significantly better than the holdout estimate
  • 71. K-Fold Cross-Validation
● Create a K-fold partition of the dataset
  ○ For each of K experiments, use K−1 folds for training and a different fold for testing (the figure illustrates this for K = 4)
● K-fold cross-validation is similar to random subsampling
  ○ The advantage of K-fold cross-validation is that all the examples in the dataset are eventually used for both training and testing
● As before, the true error is estimated as the average error rate on the test examples
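A minimal scikit-learn sketch of K-fold cross-validation with K = 4, matching the figure; the dataset and model are placeholders.

```python
# A minimal sketch: every example is used for both training and testing
# across the 4 folds, and the true error is estimated by the average.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=4)
print("per-fold accuracy:", scores)
print("estimated accuracy:", scores.mean())
```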
  • 72. Time Series Methodologies
Definition of a time series: an ordered sequence of values of a variable at equally spaced time intervals.
Two-fold use of time series:
● Obtain an understanding of the underlying forces and structure that produced the observed data
● Fit a model and proceed to forecasting, monitoring, or even feedback and feedforward control
Time series analysis is used for many applications, such as:
● Economic forecasting
● Sales forecasting
● Budgetary analysis
● Stock market analysis
● Yield projections
● Process and quality control
● Inventory studies
● Workload projections
● Utility studies
● Census analysis, and many more
  • 73. Time Series Models
- ARIMA models
- Multivariate models
- Holt-Winters exponential smoothing
- We will just cover an overview
  • 74. Stationary Data
- A common assumption in many time series techniques is that the data are stationary.
- A stationary process has the property that the mean, variance and autocorrelation structure do not change over time. Stationarity can be defined in precise mathematical terms, but for our purpose we mean a flat-looking series: no trend, constant variance over time, a constant autocorrelation structure over time, and no periodic fluctuations (seasonality).
If the time series is not stationary, we can often transform it to stationarity with one of the following techniques:
1. We can difference the data. That is, given the series Zᵢ, we create the new series Yᵢ = Zᵢ − Zᵢ₋₁. The differenced data will contain one less point than the original data. Although you can difference the data more than once, one difference is usually sufficient.
2. If the data contain a trend, we can fit some type of curve to the data and then model the residuals from that fit. Since the purpose of the fit is simply to remove long-term trend, a simple fit, such as a straight line, is typically used.
3. For non-constant variance, taking the logarithm or square root of the series may stabilize the variance. For negative data, you can add a suitable constant to make all the data positive before applying the transformation. This constant can then be subtracted from the model to obtain predicted (i.e., fitted) values and forecasts for future points.
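A brief pandas sketch of transforms 1 and 3 (differencing and the log transform); the trending toy series is an assumption for illustration.

```python
# A minimal sketch: differencing removes a trend, a log stabilizes variance.
import numpy as np
import pandas as pd

t = np.arange(100)
z = pd.Series(10 + 0.5 * t + np.random.normal(0, 1, 100))  # trending series

y_diff = z.diff().dropna()   # Y_i = Z_i - Z_{i-1}; one point shorter
y_log = np.log(z)            # variance-stabilizing (series must be > 0)
```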
  • 75. [Figure] Removing the linear trend and making the signal stationary
  • 76. ARIMA
- Autoregressive Integrated Moving Average, or ARIMA for short, is a standard statistical model for time series forecasting and analysis.
- A standard notation of ARIMA(p, d, q) is used, where the parameters are substituted with integer values to quickly indicate the specific ARIMA model being used.
- The parameters of the ARIMA model are defined as follows:
  - p: the number of lag observations included in the model, also called the lag order
  - d: the number of times that the raw observations are differenced, also called the degree of differencing
  - q: the size of the moving average window, also called the order of the moving average
  • 77. ARIMA Diagnostics
Two diagnostic plots can be used to help choose the p and q parameters of an ARMA or ARIMA model:
● Autocorrelation Function (ACF): summarizes the correlation of an observation with lag values. The x-axis shows the lag and the y-axis shows the correlation coefficient, between -1 and 1 for negative and positive correlation.
● Partial Autocorrelation Function (PACF): summarizes the correlations of an observation with lag values that are not accounted for by prior lagged observations.
Some useful patterns you may observe in these plots (see the sketch below):
● The model is AR if the ACF trails off after a lag and the PACF has a hard cut-off after a lag. This lag is taken as the value for p.
● The model is MA if the PACF trails off after a lag and the ACF has a hard cut-off after a lag. This lag is taken as the value for q.
● The model is a mix of AR and MA if both the ACF and PACF trail off.
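Assuming the statsmodels package (and matplotlib for the plots), a hedged sketch of these diagnostics and an ARIMA fit; the (1, 1, 1) order is a placeholder to be read off the ACF/PACF plots as described above.

```python
# A minimal sketch: ACF/PACF diagnostics, then an ARIMA(p, d, q) fit.
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

series = pd.Series(np.cumsum(np.random.normal(0, 1, 200)))  # toy random walk

plot_acf(series.diff().dropna())    # hard cut-off here suggests q (MA order)
plot_pacf(series.diff().dropna())   # hard cut-off here suggests p (AR order)

model = ARIMA(series, order=(1, 1, 1)).fit()  # d=1 differences the data once
forecast = model.forecast(steps=10)
```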
  • 78. Handling Seasonality
- Seasonality is quite common in economic time series; it is less common in engineering and scientific data.
- If seasonality is present, it must be incorporated into the time series model. Here we discuss techniques for detecting seasonality and defer the modeling of seasonality to the references.
- Detecting seasonality:
  - A run sequence plot will often show seasonality
  - A seasonal subseries plot is a specialized technique for showing seasonality
  - Multiple box plots can be used as an alternative to the seasonal subseries plot to detect seasonality
  - An autocorrelation plot can help identify seasonality
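One closely related technique not on the slide's list, added purely as an illustration: a classical seasonal decomposition with statsmodels, which separates a series into trend, seasonal, and residual components. The monthly toy series is an assumption.

```python
# A minimal sketch of inspecting seasonality via classical decomposition.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2015-01-01", periods=48, freq="MS")  # monthly points
values = 100 + np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12)
series = pd.Series(values, index=idx)

result = seasonal_decompose(series, model="additive", period=12)
result.plot()  # trend, seasonal, and residual panels
```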
  • 79.
  • 80. Acknowledgements
- https://github.com/avannaldas/Loan-Defaulter-Prediction-Machine-Learning/
- https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm
- https://machinelearningmastery.com/gentle-introduction-box-jenkins-method-time-series-forecasting/
- ML course by Andrew Ng
- https://cse.iitk.ac.in/users/piyush/courses/ml_autumn16/771A_lec21_slides.pdf
- https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec10.pdf
- https://www.cs.toronto.edu/~hinton/csc2515/notes/lec11boo.htm
- http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
- http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/