1. Confusion matrix (aka error matrix): A matrix that visualizes the performance of a classification algorithm using the data in the matrix (performance evaluation / comparing predictive models / distinguishing between models). It compares the predicted classification against the actual classification in the form of false-positive, true-positive, false-negative and true-negative information. A confusion matrix for a two-class classifier system (Kohavi and Provost, 1998) follows.
Accuracy (aka error rate): The rate of correct (or incorrect) predictions made by the model over a dataset. Accuracy is usually
estimated by using an independent test set that was not used at any time during the learning process. More complex
accuracy estimation techniques, such as cross-validation and bootstrapping, are commonly used, especially with datasets
containing a small number of instances.
[Awad M., Khanna R. (2015) Support Vector Machines for Classification. In: Efficient Learning Machines. Apress, Berkeley,
CA]
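As a quick illustration of these definitions, here is a minimal sketch (assuming NumPy; the labels are invented, not data from the cited book) that builds a two-class confusion matrix and derives accuracy and error rate from it:

```python
import numpy as np

# Hypothetical binary labels: 1 = positive class, 0 = negative class.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# Count the four outcomes that make up the 2x2 confusion matrix.
tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fn = np.sum((y_true == 1) & (y_pred == 0))

# Rows: actual class (positive first); columns: predicted class.
confusion = np.array([[tp, fn],
                      [fp, tn]])
accuracy = (tp + tn) / len(y_true)    # rate of correct predictions
error_rate = 1 - accuracy             # rate of incorrect predictions
print(confusion, accuracy, error_rate)
```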
2. Motion Assessment for Accelerometric and Heart Rate Cycling Data Analysis
Figure 3: Feature extraction for EEG, EOG, and EMG. Multi-biosignal sensing readout integrated circuit (ROIC).
https://ieeexplore.ieee.org/abstract/document/9025032
The classification performance was analysed in terms of average sensitivity, specificity, precision and CCR (correct classification rate), obtained after testing. Sensitivity gives the measure of how many positive class members are predicted correctly by the classifier out of the total number of positive class members, while specificity does the same for negative class members. Precision defines the probability that a subject tested positive is an actual positive, and the CCR is defined as the ratio of correctly classified data points to the total number of data points.
[Computer aided detection of prostate cancer using multiwavelength photoacoustic data with convolutional neural network]
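A minimal sketch of these four measures computed from raw confusion-matrix counts (the counts below are invented for illustration, not taken from the cited paper):

```python
def classification_measures(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and CCR from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # recall on the positive class
    specificity = tn / (tn + fp)           # recall on the negative class
    precision = tp / (tp + fp)             # P(actual positive | predicted positive)
    ccr = (tp + tn) / (tp + fp + tn + fn)  # correct classification rate
    return sensitivity, specificity, precision, ccr

# Hypothetical counts for illustration only.
print(classification_measures(tp=40, fp=10, tn=45, fn=5))
```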
Three major classifications of machine learning algorithms are supervised learning, unsupervised learning and reinforcement learning. In supervised learning, a model is trained on a labeled dataset to predict future judgments by mapping the input-output relationship. In unsupervised learning, patterns are identified from an unlabelled dataset. In the reinforcement method, the system is allowed to behave in an interactive environment and learn through trial and error. As we mentioned before, trust prediction has often been modeled as a classification problem. In machine learning, classification can be done through supervised learning methods such as decision trees, k-NN, SVM, neural networks, naïve Bayes classification, ensemble methods, and logistic and linear regression. Their performance is summarized in Table 1. The performance of all the variants may vary depending on the parameters. In our analysis, we considered the implicit conversion factor as a factor for determining the performance.
The classifier assigns the dataset into 4 classes, nominally referred to as true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). TP: the number of examples correctly classified as trust. FP: the number of distrust instances incorrectly labeled as trust. TN: the number of examples with distrust relationships correctly predicted as distrust. FN: the number of trusted items misclassified as distrust. Based on the above discussion, we argue that FP and TN are more important than TP and FN in trust prediction for real-life applications. However, working out a set of formulas to describe their relative gains of correct prediction and costs of incorrect prediction involves multiple disciplines.
Accuracy represents the percentage of correctly classified trust pairs. In Table 11, both classifiers hold 80% accuracy. However, classifier 2 clearly predicts distrust much better than classifier 1; thus accuracy alone is not suitable for trust prediction. Precision/Recall: this metric differentiates the strength of each classifier to predict the trust level, but it fails to predict the distrust proportion correctly. If recall is used, it reflects the capacity of the distrust ratio but fails to evaluate the trust value. Model performance:
Accuracy = (TN+TP)/(TN+FP+FN+TP)
Precision = TP/(FP+TP)
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
F-Measure: the F-measure treats precision and recall as a single metric and can be used to obtain a trade-off between them. However, like precision and recall, the F-measure cannot reflect how well a classifier handles negative cases [21].
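A short sketch of the point about accuracy being misleading: two hypothetical classifiers (the counts below are invented, not those from Table 11 of the cited paper) can share the same accuracy while differing sharply on the negative (distrust) class:

```python
def metrics(tp, fp, tn, fn):
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (fp + tp)
    sensitivity = tp / (tp + fn)   # recall on the trust (positive) class
    specificity = tn / (tn + fp)   # recall on the distrust (negative) class
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f_measure

# Both hypothetical classifiers are 80% accurate on 100 pairs...
print(metrics(tp=80, fp=20, tn=0, fn=0))   # A: never detects distrust (specificity 0)
print(metrics(tp=60, fp=0, tn=20, fn=20))  # B: detects all distrust (specificity 1)
```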
3. Receiver Operating Characteristic (ROC): when the classifier outputs scalar score values, a full ROC curve can be traced by varying the decision threshold. In the figure, the diagonal line is used as a baseline. It is clear that classifier 2 is closer to the true-positive corner, which depicts the most trusted pair value with higher TP. Tom Fawcett recommends averaging the ROC curves of multiple test sets when comparing classifiers in order to take variance into account [22]. Even though the ROC curve is mostly preferred for evaluating a binary classifier, it has the following pitfalls: firstly, direct comparisons of classifiers of different families with different thresholds are impossible. Secondly, discrete classifiers must be converted to scoring versions to generate a full ROC curve. Thirdly, it is suitable for applications like trust prediction only after performing certain transformations [22].
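A minimal sketch of tracing an ROC curve from scalar scores, assuming scikit-learn is available (the labels and scores below are invented):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical ground truth and classifier scores.
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])

# roc_curve sweeps the decision threshold across the scores.
fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC:", auc(fpr, tpr))  # 0.5 = chance level, 1.0 = perfect ranking
```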
Precision-Recall (PR) Curve: the PR curve plots recall on the x-axis and precision on the y-axis. Even though it shares many common characteristics with the ROC curve, the PR curve is different in that optimal classifiers sit in the upper right-hand corner of the PR space. Unlike ROC with its fixed baseline, the baseline of the PR curve changes with the ratio of positive to negative instances. The PR curve is preferred over the ROC curve for tasks with imbalanced datasets, since PR plots precision, which can capture the poor performance of a classifier on imbalanced datasets. As a result, the PR curve has been proposed as an alternative to the ROC curve for tasks with strongly imbalanced datasets [23]. To sum up, an evaluation process must be executed after training a model in order to evaluate the performance of the trained model. In addition, evaluation is needed when comparing several different models to select an optimal one. [Mining Social Media Content To Predict Peer Trust Level in Social Networks]
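The same idea for a PR curve, again assuming scikit-learn (invented labels and scores); note the baseline here is the positive-class prevalence, not the diagonal:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical imbalanced data: few positives among many negatives.
y_true = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.5, 0.45, 0.4, 0.3, 0.2, 0.15, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
print("baseline precision:", y_true.mean())  # prevalence of positives
print("average precision:", average_precision_score(y_true, scores))
```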
Performance evaluation (evaluation metrics): The accuracy of the obtained results is used to evaluate different methods. The most popular training approach is tenfold cross-validation, where each fold, i.e., one horizontal segment of the dataset, is considered to be the testing dataset and the remaining nine segments are used as the training dataset.
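A minimal sketch of tenfold cross-validation with scikit-learn (the estimator and data are placeholders, not the cited study's setup):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(100, 5)        # placeholder features
y = np.random.randint(0, 2, 100)  # placeholder binary labels

scores = []
# Each fold serves once as the test set; the other nine folds train the model.
for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
print("mean 10-fold accuracy:", np.mean(scores))
```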
Besides accuracy, the performance of the classifiers is commonly measured by the following metrics: precision, recall, and F-measure. These are based on four possible classification outcomes: true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN). The table below describes each outcome for the seizure and non-seizure case.
Acronym | Detection type | Real-world scenario
TP | True-positive | A person suffers a 'seizure' and is correctly detected as a 'seizure'
TN | True-negative | The person is actually normal and the classifier also detects a 'non-seizure'
FP | False-positive | Incorrect detection: the classifier detects a normal patient as a 'seizure' case
FN | False-negative | Incorrect detection: the classifier detects a person with 'seizure(s)' as a normal person. This is a severe problem in health informatics research.
Precision is the ratio of true-positives to the total number of cases detected as positive (TP+FP). It is the percentage of selected cases that are correct, as shown in Eq. 1. High precision means a low false-positive rate.
Recall is the ratio of true-positive cases to the cases that are actually positive. Equation 2 shows the percentage of correct cases that are selected.
Even if the classifier achieves high recall, this does not indicate that it performs well in terms of precision. As a result, it is necessary to calculate the weighted harmonic mean of precision and recall; this measure is known as the F-measure score, shown in Eq. 3. Both the false-positives and the false-negatives are taken into account. Generally, it is more useful than accuracy, especially when the dataset is imbalanced.
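The referenced equations are not reproduced in the excerpt; reconstructed from the definitions above, they are the standard forms:

```latex
\text{Precision} = \frac{TP}{TP + FP} \quad (1)
\qquad
\text{Recall} = \frac{TP}{TP + FN} \quad (2)
\qquad
F\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3)
```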
4. We test random pairs using YTRP as the yeast regulatory network source for evaluation. We conducted a test consisting of 10 000 new random pairs. The accuracy (ACC) is measured by observing the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). A description of each measure is given below:
TP is the total of correctly predicted connected pairs (found in YTRP);
FP is the total of negative instances predicted as connected pairs (pairs that are not found in YTRP);
TN is the number of correctly predicted nonconnected pairs (not found in YTRP and belonging to the 6000 zeros used for training);
FN corresponds to the number of incorrectly predicted negative pairs (not found in YTRP and not part of the 6000 zeros used for training).
The value of each measure is indicated in Table 2. We also measure the positive predictive value (PPV) and negative predictive value (NPV) to diagnose the performance of testing:
PPV = TP/(TP+FP) = 0.869 (86.9%)
NPV = TN/(TN+FN) = 0.999 (99.9%)
ACC = (TP+TN)/(TP+TN+FP+FN) = 0.9248 (92.5%)
[Inferring Causation in Yeast Gene Association Networks With Kernel Logistic Regression]
The significant statistical features were extracted by different types of transformation techniques: discrete wavelet transform (DWT), continuous wavelet transform (CWT), Fourier transform (FT), discrete cosine transform (DCT), singular value decomposition (SVD), intrinsic mode functions (IMF), and time-frequency domain representations from EEG datasets.
[A review of epileptic seizure detection using machine learning classifiers]
Other evaluation metrics:
Moreover, we need to consider the performance of all the classes. In this paper, we use a modified version of the F1-score, the weighted F1-score, to evaluate the system performance. The definition of the weighted F1-score is:
F1_weighted = P_MW × F1_MW + P_non-MW × F1_non-MW
where MW is mind-wandering and P denotes the proportion of each class.
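A minimal sketch of the weighted F1 computation, assuming scikit-learn (invented labels; 1 = mind-wandering):

```python
from sklearn.metrics import f1_score

# Hypothetical labels: 1 = mind-wandering (MW), 0 = non-MW.
y_true = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

# 'weighted' averages per-class F1 scores by each class's true proportion,
# matching F1_weighted = P_MW * F1_MW + P_non-MW * F1_non-MW.
print(f1_score(y_true, y_pred, average="weighted"))
```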
Cohen's Kappa Coefficient (Kappa, κ): Cohen's Kappa coefficient stands for the agreement between two raters. It is the proportion of agreement after chance agreement is removed from consideration. The Cohen's Kappa coefficient is calculated as:
κ = (po − pe)/(1 − pe), −1 ≤ κ ≤ 1
where po is the relative observed agreement among raters, and pe is the hypothetical probability of chance agreement. In our case, we use Kappa to measure the agreement between true labels and predicted labels. The better the detection system performs, the higher the Kappa is.
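A one-line sketch with scikit-learn (invented labels):

```python
from sklearn.metrics import cohen_kappa_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical true labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # hypothetical predicted labels

# Agreement between true and predicted labels, corrected for chance.
print(cohen_kappa_score(y_true, y_pred))
```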
Area Under ROC Curve (AUC)
If the classifier can output the probability of each class, then we can calculate the receiver operating characteristic (ROC)
curve. In a ROC curve, the x-axis is the false-positive rate, and the y-axis is the true-positive rate. By calculating the area under
the ROC curve (AUC), we can analyze the effectiveness of the prediction model. The chance level of AUC is 0.5, and an
excellent model has an AUC close to 1.
[https://arxiv.org/ftp/arxiv/papers/2005/2005.12076.pdf]
/-/What Is an ROC Curve? by KAREN GRACE-MARTIN
An incredibly useful tool in evaluating and comparing predictive models is the ROC curve.
Its name is indeed strange. ROC stands for Receiver Operating Characteristic. Its origin is from sonar back in the 1940s; ROCs
were used to measure how well a sonar signal (e.g., from an enemy submarine) could be detected from noise (a school of fish).
In its current usage, ROC curves are a nice way to see how any predictive model can distinguish between the true positives
and negatives.
In order to do this, a model needs to not only correctly predict a positive as a positive, but also a negative as a negative. The
ROC curve does this by plotting sensitivity /-/
- statistical and entropy-based features (types of feature extraction)
6. [An image encryption algorithm based on a hidden attractor chaos system and the Knuth-Durstenfeld algorithm]
The model is evaluated as an edge classifier, and the F1-score is considered as the evaluation metric. [A Generative Graph Method to Solve the Travelling Salesman Problem]
https://arxiv.org/ftp/arxiv/papers/2006/2006.04611.pdf contains a classification of sentiment extraction across the different approaches used, a deep learning model, and evaluation definitions (confusion matrix).
[Quantitative Assessment of Traumatic Upper-Limb Peripheral Nerve Injuries Using Surface Electromyography]
7. G-Mean: Since the durations of seizure events are much shorter than those of non-seizure periods, seizure detection can be regarded as an imbalanced classification problem. This indicator is very informative and suitable for the evaluation of the imbalanced classification in this work; its definition follows below.
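The defining equation did not survive extraction; the standard G-mean for binary classification, which the excerpt presumably refers to, is the geometric mean of sensitivity and specificity:

```latex
G\text{-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}
```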
The Kappa score is considered a statistically robust measure to verify the quality of the method (Ismail Fawaz et al., 2019). Generally, the values of the Kappa score range between −1 and 1, where +1 indicates perfect classification. We found the Kappa values as follows:
[CWT Based Transfer Learning for Motor Imagery Classification for Brain-Computer Interfaces]
D. Evaluation: Evaluation of the proposed failure prediction method is provided using metrics. It is studied on TensorFlow and an FPGA (Altera Arria 10 GX FPGA, 10AX115N2F45E1SG device). The evaluation metrics (fault predictions = prediction failures = outcomes) are as follows; a worked sketch of metrics 4-6 follows the citation below.
1. TP refers to the total number of failures (outcomes) correctly predicted (identified as positive) within a specific duration. E.g., if the correct number of predictions is 85 out of 100 within 1 minute, then TP = 85.
2. FP refers to the total number of failures (outcomes) that have not occurred but were mistakenly predicted within a specific duration (incorrectly identified as positive): the total number of incorrectly predicted failures.
3. FN refers to the total number of unpredicted failures (outcomes) which have occurred within a specific duration (incorrectly identified as negative). E.g., if the correct number of predictions is 85 from 100 within 1 minute, the number of unpredicted failures is 15 (100 − 85 = FN).
4. Sensitivity refers to the ratio between the correctly identified failures (TP) and the total sum of TP and FN (the total number of outcomes).
5. Precision refers to the ratio between the correctly identified failures and the sum of the correctly and incorrectly predicted failures.
6. Tension refers to the relation between sensitivity and precision, which should be balanced. Increasing precision results in decreasing sensitivity, so there is a trade-off between them. Sensitivity improves with low FN, which results in increasing FP and reduces precision. Tension = 2 × (Sensitivity × Precision)/(Sensitivity + Precision).
7. Specificity refers to the measurement of the proportion of actual negatives that are correctly identified.
8. Accuracy (of a test) refers to the test's ability to differentiate classes correctly.
[Machine learning-based approach for hardware faults prediction]
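A minimal sketch of metrics 4-6 using the TP = 85 / FN = 15 example from the text (the FP count is invented, since the excerpt does not give one; "Tension" is the excerpt's term for what is usually called the F1-score):

```python
def fault_prediction_metrics(tp, fp, fn):
    sensitivity = tp / (tp + fn)  # metric 4: correctly identified failures / all failures
    precision = tp / (tp + fp)    # metric 5: correct predictions / all predicted failures
    tension = 2 * (sensitivity * precision) / (sensitivity + precision)  # metric 6
    return sensitivity, precision, tension

# TP = 85 correct predictions out of 100 failures, so FN = 15;
# FP = 10 is a hypothetical count of falsely predicted failures.
print(fault_prediction_metrics(tp=85, fp=10, fn=15))
```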
A good paper illustrating some of these terms:
[W. Zhu et al., "Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementation," Proc. NESUG: Health Care and Life Sciences, vol. 19, Baltimore, MD, USA, p. 67, 2010]
In this section, statistical information about the sensitivity and specificity measures is extracted. The higher the sensitivity
and specificity values, the better the procedure. The results for the automated method compared to the ground truth or
gold standard were calculated for each image. These metrics are defined as:
[Comparison Different Vessel Segmentation Methods in Automated Microaneurysms Detection in Retinal Images using
Convolutional Neural Networks ]
8. Deep neural networks use a nonlinear end-to-end mapping in order to transform a low resolution image to the high
resolution one. Residual blocks facilitate the flow of the information in deep neural networks and enhance the network
performance.
The goal of an open-set recognition system is to reject test samples from unknown classes while maintaining the performance on known classes. However, in some cases, the learned model should be able to not only differentiate the unknown classes from known classes, but also distinguish among different unknown classes. Zero-shot learning (ZSL) [8], [9] is one way to address the above challenges and has been applied in image tasks.
[SR2CNN: Zero-Shot Learning for Signal Recognition]
Skewness (EEG) is a complex time domain attribute of a signal, representing the amplitude regularity.
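The skewness equation itself appears to have been lost in extraction; the standard third-standardized-moment form, consistent with the symbols defined below, is:

```latex
SK = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - \mu}{\sigma} \right)^{3}
```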
where N is the total number of EEG samples, Xi represents the current time-series sample, µ is the mean and σ is the standard deviation of the signal. To realize the SK feature on-chip, we propose an approximate SK indicator (ASKI) which provides a similar trend to SK over the desired range and reduces the gate count by 86x compared to the conventional SK implementation (2), with the factor "k" in ASKI being dataset dependent.
[A 10.13uJ/classification 2-channel Deep Neural Network-based SoC for Emotion Detection of Autistic Children]
https://www.researchgate.net/profile/Khalil_Ur_Rehman/publication/342170018_Classification_of_Power_Quality_Distur
bance_Based_on_Multiscale_Singular_Spectral_Analysis_and_Multi_Resolution_Wavelet_Transforms/links/5ee702ad4585
15814a5e995f/Classification-of-Power-Quality-Disturbance-Based-on-Multiscale-Singular-Spectral-Analysis-and-Multi-
Resolution-Wavelet-Transforms.pdf
9. https://arxiv.org/pdf/2002.01925.pdf
Matthews correlation coefficient (MCC).
In addition to this, the efficacy of the proposed scheme is evaluated in terms of different performance metrics, namely, the classification accuracy (CA), sensitivity (Sn), specificity (Sp), AUC, F-measure, computation time (in seconds), and the Matthews correlation coefficient (MCC). The aforementioned performance measures are obtained with the help of the confusion matrix.
[An improved scheme for digital mammogram classification using weighted chaotic salp swarm algorithm-based kernel extreme learning machine]
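The excerpt does not give the MCC formula; its standard confusion-matrix form is:

```latex
MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
```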
https://www.sciencedirect.com/science/article/pii/S1568494620302064
The overall system precision is studied in terms of the classification accuracy, the F-measure, the area under the ROC curve (AUC) and the Kappa statistics.
https://link.springer.com/article/10.1007/s13246-020-00863-6
Energy: $E_s = \int_{-\infty}^{\infty} |x(t)|^2 \, dt$
Arithmetic mean: $\mu(x) = \frac{1}{N} \sum_{i=1}^{N} x(i)$
Standard deviation: $\sigma(x) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$
Kurtosis coefficient: $ks = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^4}{\sigma^4}$
Skewness coefficient: $sk = \frac{N}{(N-1)(N-2)} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^3$
Entropy: $\sum_{i=1}^{N} \frac{x_i}{E_s} \ln\left( \frac{x_i}{E_s} \right)$
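A minimal sketch computing discrete versions of these features for a sampled signal (assuming NumPy and SciPy; the discrete energy sum stands in for the integral, and |x| keeps the entropy logarithm defined):

```python
import numpy as np
from scipy.stats import kurtosis, skew

x = np.random.randn(1000)       # placeholder signal (e.g., one EEG channel)

energy = np.sum(x**2)           # discrete analogue of the energy integral
mean = np.mean(x)               # arithmetic mean
std = np.std(x)                 # standard deviation
ks = kurtosis(x, fisher=False)  # fourth standardized moment (kurtosis)
sk = skew(x, bias=False)        # bias-corrected skewness coefficient
# Entropy term as written above; |x_i| keeps the log argument positive.
p = np.abs(x) / energy
entropy = np.sum(p * np.log(p))
print(energy, mean, std, ks, sk, entropy)
```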
10. Covariance (the intersection between linear algebra and statistics, per Gilbert): $\mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$
Variance (mean and standard deviation are the two golden keys of statistics, per Gilbert): $\sigma^2(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
- Kurtosis is a widely used statistical method, based on high-order statistics, which is very sensitive to outliers of the amplitude distribution, and is calculated as the kurtosis coefficient given above.
[Scalp EEG classification using deep Bi-LSTM network for seizure detection]