This document presents a new statistical method called smooth isotonic regression (sIR) to calibrate predictive models. sIR improves upon existing calibration methods like logistic regression (LR) and isotonic regression (IR) by using spline curves to provide a smooth, non-parametric calibration curve. The sIR method was shown to outperform LR and IR in calibration based on both simulation data and analysis of a real biological dataset, providing calibrated predicted values that better reflected the actual observed values.
1. Smooth Isotonic Regression: A New Method to Calibrate Predictive Models
AMIA Summits Transl Sci Proc. 2011;2011:16-20. Epub 2011 Mar 7
20121210
Statistical method paper
3. [Figure: viability versus concentration for wild type and mutant]
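Conventional method: Hill equation (HE)
• Fit a logistic curve by least squares
• Toxicity measure: EC50
New method: Isotonic regression (IR)
• Estimate by quadratic programming with an added ordering constraint between cells
• Toxicity measure: AUC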
HE: $E(x) = E_{\inf} + \dfrac{E_{\mathrm{zero}} - E_{\inf}}{1 + e^{\,n(\log x - \log EC_{50})}}$
IR: $\min \sum_{i=1}^{n} w_i \{ g(c_i) - f(c_i) \}^2$
Hill, A.V. J. Physiol. 1910
Best, M.J. & Chakravarti N. Mathematical Programming. 1990
ΔEC50
ΔAUC
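The two fits on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function names `hill` and `pava` and the toy viability values are hypothetical, and the pool-adjacent-violators algorithm (PAVA) is swapped in for the quadratic program, so it enforces only monotonicity along concentration and leaves out the ordering constraint between cells.

```python
import math


def hill(x, e_zero, e_inf, ec50, n):
    """Hill (four-parameter logistic) response at concentration x > 0."""
    exponent = n * (math.log(x) - math.log(ec50))
    return e_inf + (e_zero - e_inf) / (1.0 + math.exp(exponent))


def pava(values, weights=None):
    """Weighted least-squares fit of a non-decreasing sequence to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbouring blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, k2 = blocks.pop()
            m1, w1, k1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, k1 + k2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical viability readings at increasing concentrations.
# Viability is expected to fall, so the negated values are fitted.
viability = [100.0, 95.0, 97.0, 60.0, 65.0, 20.0]
fit = [-v for v in pava([-v for v in viability])]

# At x == EC50 the Hill curve sits halfway between e_zero and e_inf.
midpoint = hill(1.0, e_zero=100.0, e_inf=0.0, ec50=1.0, n=2.0)
```

Negating the data is only a convenience for reusing a non-decreasing fitter on a decreasing dose-response curve.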
4. Reliability diagram
• Out of thirty 20% probability forecasts, the predicted event should verify six times, i.e. 20% of the time: not more, not less.
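The check described above can be sketched as a small binning routine that compares the mean forecast in each bin with the observed event frequency. The function name `reliability_table`, the equal-width binning, and the example data are assumptions made for illustration.

```python
from collections import defaultdict


def reliability_table(probs, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) per non-empty bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        table.append((mean_p, freq, len(pairs)))
    return table


# Thirty forecasts of 20%, of which six verify: a perfectly reliable bin.
probs = [0.2] * 30
outcomes = [1] * 6 + [0] * 24
table = reliability_table(probs, outcomes)
```

Plotting the observed frequency against the mean forecast for each bin gives the reliability diagram; points on the diagonal indicate good calibration.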
6. Smooth non-parametric estimators
... alleviate overfitting and underfitting problems, and thus have received more attention recently. The methods by Wang et al. [14] and Meyer [10] find a non-decreasing mapping function t() that minimizes:

$$\sum_i (c_i - t(p_i))^2 + \lambda \int_a^b [t^{(m)}(\gamma)]^2 \, d\gamma, \quad (1)$$

where m corresponds to a smoothness parameter, a and b represent the range of input predictions, and λ balances the goodness-of-fit (first component) and the smoothness (second component) of the transformation function t(). When m = 1, Equation 1 corresponds to a piece-wise linear estimator. When m = 2, Equation 1 represents a smooth monotone estimator. In the-
m: smoothness parameter
a ~ b: the range of input predictions
λ: balances the goodness-of-fit and the smoothness
Their inferences require much heavier computation and tedious parameter tuning.
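To make the two components of the penalized criterion concrete, the sketch below only evaluates it for a given candidate function t(). It does not minimize it or enforce monotonicity, and it is not the estimator of Wang et al. or Meyer. The name `penalized_loss`, the finite-difference derivatives, the midpoint integration, and the example values are all assumptions for illustration.

```python
def penalized_loss(t, p, c, lam, a, b, m=2, n_grid=200):
    """Evaluate sum_i (c_i - t(p_i))^2 + lam * integral_a^b [t^(m)(g)]^2 dg."""
    if m not in (1, 2):
        raise ValueError("this sketch supports m = 1 or m = 2 only")
    # Goodness-of-fit component.
    fit = sum((ci - t(pi)) ** 2 for pi, ci in zip(p, c))
    # Smoothness component: midpoint rule with central finite differences.
    dx = (b - a) / n_grid
    h = dx / 2.0  # keeps every evaluation point inside [a, b]
    penalty = 0.0
    for k in range(n_grid):
        g = a + (k + 0.5) * dx
        if m == 1:
            d = (t(g + h) - t(g - h)) / (2.0 * h)
        else:
            d = (t(g + h) - 2.0 * t(g) + t(g - h)) / h ** 2
        penalty += d * d * dx
    return fit + lam * penalty


# Hypothetical predictions p and binary class labels c.
p = [0.1, 0.5, 0.9]
c = [0, 1, 1]

# A straight line has zero second derivative, so only the fit term remains.
loss_linear = penalized_loss(lambda g: g, p, c, lam=0.1, a=0.0, b=1.0)

# t(g) = g^2 has second derivative 2, so the penalty integral is 4 on [0, 1].
loss_quadratic = penalized_loss(lambda g: g * g, p, c, lam=0.1, a=0.0, b=1.0)
```

Raising `lam` makes curvature more expensive relative to misfit, which is the trade-off the slide attributes to λ.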
10. Hosmer-Lemeshow test
A statistical goodness-of-fit test for logistic regression models
H0: Model is well fit
H1: Model is NOT well fit
library(eRm)
library(ResourceSelection)
library(MKmisc)
G: the number of groups (= 10)
O_i: the sum of cases (c = 0 or 1)
E_i: the estimated probabilities
n_i: the number of cases in group i
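The slide lists R packages. The sketch below is in Python instead and does not reproduce any of those packages' interfaces. It computes the usual form of the Hosmer-Lemeshow statistic, the sum over groups of (O_i − E_i)² / (n_i · π_i · (1 − π_i)) with π_i = E_i / n_i, and compares it with a chi-square distribution on G − 2 degrees of freedom. The function names, the equal-size grouping of sorted predictions, and the example data are assumptions. The tail-probability helper uses a closed form that holds only for even degrees of freedom, which covers G = 10.

```python
import math


def hosmer_lemeshow(probs, outcomes, n_groups=10):
    """Hosmer-Lemeshow statistic over equal-size groups of sorted predictions."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(group)
        if n_g == 0:
            continue
        expected = sum(p for p, _ in group)  # E_i: sum of estimated probabilities
        observed = sum(y for _, y in group)  # O_i: sum of cases (0 or 1)
        mean_p = expected / n_g
        # Assumes 0 < mean_p < 1; otherwise the denominator is zero.
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical, perfectly calibrated data: 10 groups of 20 cases each,
# with predicted probability 0.05, 0.15, ..., 0.95 and matching event counts.
probs, outcomes = [], []
for g in range(10):
    events = 2 * g + 1  # equals 20 * (0.05 + 0.1 * g)
    probs += [0.05 + 0.1 * g] * 20
    outcomes += [1] * events + [0] * (20 - events)

stat = hosmer_lemeshow(probs, outcomes)
p_value = chi2_sf_even(stat, df=10 - 2)
```

A large p-value gives no evidence against H0 (the model is well fit), while a small one favors H1.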