Model Uncertainty Quantification: A Post Hoc Calibration Approach for Heart Disease Prediction
Journal of Engineering Research and Sciences, Volume 4, Issue 12, Page # 25-54, 2025; DOI: 10.55708/js0412003
Keywords: Heart disease prediction, Machine learning, Probability calibration, Isotonic regression, Platt scaling, Temperature scaling, Uncertainty quantification, Expected calibration error (ECE), Brier score, Log loss, Spiegelhalter’s test, Reliability diagram, Post hoc calibration
(This article belongs to the Section Artificial Intelligence – Computer Science (AIC))
We investigated whether post-hoc calibration improves the trustworthiness of heart-disease risk predictions beyond discrimination metrics. Using a Kaggle heart-disease dataset (n = 1,025), we created a stratified 70/30 train-test split and evaluated six classifiers: Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Naive Bayes, Random Forest, and XGBoost. Discrimination was quantified by stratified 5-fold cross-validation with thresholds chosen by Youden’s J inside the training folds. We assessed probability quality before and after Platt scaling, isotonic regression, and temperature scaling using Brier score, Expected Calibration Error with equal-width and equal-frequency binning, Log Loss, reliability diagrams with Wilson intervals, and Spiegelhalter’s Z and p. Uncertainty was reported with bootstrap 95% confidence intervals, and calibrated versus uncalibrated states were compared with paired permutation tests on fold-matched deltas. Isotonic regression delivered the most consistent improvements in probability quality for Random Forest, XGBoost, Logistic Regression, and Naive Bayes, lowering Brier, ECE, and Log Loss while preserving ROC AUC in cross-validation. Support Vector Machine and k-Nearest Neighbors were best left uncalibrated on these metrics. Temperature scaling altered discrimination and often increased Log Loss on this structured dataset. Sensitivity analysis showed that equal-frequency ECE was systematically smaller than equal-width ECE across model-calibration pairs, while preserving the qualitative ranking of methods. Reliability diagrams built from out-of-fold predictions aligned with the numeric metrics, and Spiegelhalter’s statistics moved toward values consistent with better absolute calibration for the models that benefited from isotonic regression. The study provides a reproducible, leakage-controlled workflow for evaluating and selecting calibration strategies for structured clinical feature data.
1. Introduction
1.1. Background
Heart disease remains the leading cause of death globally, responsible for an estimated 19.8 million deaths in 2022 [1]. Early and accurate prediction therefore plays a significant role in preventing adverse outcomes and reducing healthcare costs. Machine learning (ML) models are increasingly adopted for diagnostic and prognostic tasks in cardiology due to their ability to uncover complex patterns in large clinical datasets [2].
Early ML research on heart disease cohorts primarily focused on classification accuracy, with studies routinely reporting performance above 97% using supervised classifiers [3]. These models can learn non-linear relationships and high-dimensional interactions among contributing factors such as age, cholesterol, blood pressure, and electrocardiogram results. For example, algorithms such as Random Forest and Gradient Boosting have demonstrated superior performance in identifying subtle indicators of cardiovascular abnormalities compared to traditional rule-based systems [4]. This makes them powerful techniques for risk stratification and preventive care.
However, models that achieve high predictive performance can still produce poorly calibrated probabilistic outputs; that is, the confidence scores they assign do not always align with the actual probabilities of disease presence [5]. In high-stakes domains such as healthcare, well-calibrated predictions are essential for guiding appropriate treatment decisions and managing clinical risk efficiently. Miscalibrated models may lead to overconfident or underconfident decisions, ultimately compromising patient safety [6]. This has prompted growing interest in uncertainty quantification and post hoc calibration methods, which adjust a model’s output probabilities without retraining the original model [7]. The importance of these methods has grown in response to the increasing demand for transparent and trustworthy AI systems in clinical settings, particularly with the rise of explainable AI initiatives [8].
Furthermore, recent research has shown that visual tools such as reliability diagrams and calibration metrics such as Expected Calibration Error (ECE), Brier score, and log loss are important for evaluating how well a model is calibrated [9]. While accuracy and AUROC (Area Under the Receiver Operating Characteristic curve) remain popular metrics for model evaluation, they are insufficient for assessing how well a model estimates uncertainty. Calibration metrics and reliability diagrams provide both quantitative and visual representations of uncertainty and prediction quality, which are vital for gaining the confidence of clinical stakeholders.
1.2. Motivation and Problem Statement
One of the major challenges in clinical practice is the failure to detect heart problems at an early stage. When making clinical decisions, uncalibrated predictions can be misleading. For example, if a model predicts that a patient has a 90% chance of developing heart disease, clinicians must trust that this probability truly reflects clinical reality; otherwise, it could lead to incorrect decisions and poor outcomes for the patient.
In many studies, calibration and uncertainty quantification in medical AI systems are overlooked, leading to a gap between predictive performance and clinical trust [6]. This paper addresses that gap by evaluating the calibration of several popular classifiers using post hoc techniques.
1.3. Scope and Contributions
This study aims to evaluate and compare uncertainty estimation of heart disease prediction models. The research is guided by the following questions:
- How do post-hoc calibration methods (Platt scaling, temperature scaling and isotonic regression) affect the uncertainty, calibration quality, and prediction confidence of machine learning models for heart disease classification?
- What are the baseline levels of calibration and uncertainty (ECE, Brier score, log loss, sharpness, Spiegelhalter’s Z-score) for heart disease prediction before and after post-hoc calibration?
- How does each model (e.g., Random Forest, XGBoost, SVM, KNN and Naive Bayes) perform in terms of probability calibration for heart disease before and after applying post hoc calibration?
Below, we delineate the contributions of this work in light of the research questions above. We conduct a systematic, model-agnostic evaluation of post-hoc calibration for heart-disease prediction, quantifying how Platt (sigmoid), isotonic, and temperature mappings alter probability quality without retraining the base models. Beyond headline discrimination metrics, we emphasize clinically relevant probability fidelity: calibration, sharpness, and statistical goodness-of-fit. This study makes four contributions, summarized as follows:
- A side-by-side pre/post analysis of six machine learning classifiers using reliability diagrams plus Brier, ECE, log loss, Spiegelhalter’s Z/p, and sharpness to provide complementary views of probability quality for heart disease prediction.
- Empirical demonstration that isotonic calibration most consistently improves probability estimates, whereas Platt and temperature scaling help some models but can worsen others.
- Despite perfect test-set discrimination for some models, reliability diagrams reveal overconfidence pre-calibration, demonstrating why discrimination alone is insufficient for clinical use.
- Analysis of variance in predicted probabilities shows calibration-induced smoothing and overconfidence correction, clarifying confidence reliability trade-offs relevant to clinical interpretation.
1.4. Related Works
1.4.1. Machine Learning in Heart Disease Prediction: Calibration and Reliability Considerations
Machine learning (ML) techniques have been widely applied to predict cardiovascular disease outcomes, typically using patient risk factor data to classify the presence or risk of heart disease. For example, in “Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison”, [10] evaluated several classifiers (KNN, decision tree, random forest, etc.) on a Kaggle heart disease dataset. They reported perfect performance, with random forests achieving 100% accuracy (along with 100% sensitivity and specificity). However, their evaluation emphasized accuracy and did not include any probability calibration or uncertainty quantification. Similarly, [11], evaluating heart disease prediction using machine learning methods with elastic net feature selection, compared logistic regression (LR), KNN, SVM, random forest (RF), AdaBoost, artificial neural network (ANN), and multilayer perceptron on the Kaggle dataset used in this study. They found RF to attain ~99% accuracy and AdaBoost ~94% on the full feature set and observed SVM performing best after SMOTE class-balancing and feature selection. Like [10], this study focused on accuracy improvements and other discrimination metrics, with no model calibration applied.
In another work, [12] also utilized the Kaggle dataset explored here. They evaluated a wide range of classifiers, including RF, decision tree (DT), gradient boosting (GBM), KNN, AdaBoost, LR, ANN, QDA, LDA, and SVM, and reported extremely high accuracy for ensemble methods. In fact, their RF model reached 100% training accuracy (and ~99% under cross-validation). Despite reporting precision, recall, F1-score, and ROC-AUC for each model, this work too did not report any calibration metrics or uncertainty estimates; the focus remained on discrimination performance.
Beyond the popular Kaggle/UCI datasets, researchers have explored ML on other heart disease cohorts. For instance, in a machine learning model for detection of coronary artery disease, [13] applied ML to the Z-Alizadeh Sani dataset (303 patients from Tehran’s Rajaei cardiovascular center). They employed six algorithms (DT, deep neural network, LR, RF, SVM, and XGBoost) to predict coronary artery disease (CAD). After Pearson-correlation feature selection, the best results were achieved by SVM and LR, each attaining 95.45% accuracy with 95.91% sensitivity, 91.66% specificity, F1 ≈ 0.969, and AUROC ≈ 0.98. Notably, although this study achieved excellent discrimination, it did not incorporate any post-hoc probability calibration or uncertainty analysis; the evaluation centered on accuracy and ROC curves alone.
In [14], the authors took a different approach by leveraging larger, real-world data. In an interpretable LightGBM model for predicting coronary heart disease, aimed at enhancing clinical decision-making, they trained a LightGBM model on a U.S. CDC survey dataset (BRFSS 2015) and validated it on two external cohorts (the Framingham Heart Study and the Z-Alizadeh Sani data). The LightGBM achieved about 90.6% accuracy (AUROC ~81.1%) on the BRFSS training set, with slightly lower performance on Framingham (85% accuracy, ~67% AUROC) and Z-Alizadeh (80% accuracy). While [14] prioritized model interpretability (using SHAP values) and reported standard metrics such as accuracy, precision, recall, and AUROC, they did not report any calibration-specific metrics (e.g., no ECE, Brier score, or reliability diagrams), nor did they apply Platt scaling or isotonic regression in their pipeline. Several recent studies have pushed accuracy to very high levels by combining datasets or using advanced ensembles, yet still largely ignore calibration. In [15], the authors proposed a hybrid approach for predicting heart disease using machine learning and an explainable AI method, combining a private hospital dataset with a public one and using feature selection plus ensemble methods. Their best model (an XGBoost classifier on a selected feature subset, SF-2) achieved 97.57% accuracy with 96.61% sensitivity, 90.48% specificity, 95.00% precision, F1 = 92.68%, and 98% AUROC. Despite this impressive performance, no probability calibration was mentioned; the study’s contributions focused on maximizing accuracy and explaining feature impacts (via SHAP) rather than assessing prediction uncertainty.
Using a clinical and biometric dataset (n = 571) with a man-in-the-loop paradigm for assessing coronary artery disease, [16] compared standard ML classifiers; the best accuracy reached ≈83% with expert input, but the work emphasized explainability over probabilistic calibration. To provide a diverse and comprehensive overview, we conducted a lightweight systematic review of peer-reviewed studies on ML for heart disease prediction published in the last 5–10 years, focusing on studies whose experimental setup included cohorts of at least 5,000 patients. Table 1 summarizes key studies, including their data sources, ML approaches, and whether model calibration was evaluated (and how). Each study is cited with its year and reference number (e.g., 2025 [17] means the study was published in 2025 and is reference [17] in the reference list).
Table 1: Recent ML-based heart disease prediction studies (2017-2025) – Summary of data, methods, and calibration evaluation. (Calibration metrics: HL = Hosmer–Lemeshow test; ECE = Expected Calibration Error; O/E = observed-to-expected ratio; Brier = Brier score.)
Year [Ref] | Data (Population / Dataset) | ML Approach & Key Results | Calibration (Evaluation & Metrics) |
2025 [17] | Japanese Suita cohort (n=7,260; ~15-year follow-up; ages 30-84). | Risk models (LR, RF, SVM, XGB, LGBM) for 10 year CHD; RF best (AUC ~0.73); SHAP identified key factors. | Yes – Calibration curves and O/E ratios; RF ~1:1 calibration. |
2025 [18] | NHANES (USA; ~37,000). | PSO ANN – particle swarm optimized neural net; ~97% accuracy; surpassed LR (~95.8%); feature selection + SMOTE. | No – Calibration not reported. |
2024 [19] | Simulated big dataset + UCI. | AttGRU HMSI deep model; ~95.4% accuracy; emphasis on big data processing and feature selection. | No – Calibration not reported. |
2023 [20] | UK Biobank (n≈473,000; 10 year follow up). | AutoPrognosis AutoML; AUC ≈0.76; 10 key predictors discovered. | Yes – Brier ~0.057 (good calibration). |
2023 [21] | China EHR (Ningbo; n=215,744; 5 year follow up). | XGBoost vs Cox; C index 0.792 vs 0.781. | Yes – HL χ² ≈0.6, p=0.75 in men; non significant HL (good calibration). |
2023 [22] | Stanford ECG datasets; external validation at 2 hospitals. | SEER CNN using resting ECG; 5 yr CV mortality AUC ~0.80 – 0.83; ASCVD AUC ~0.67; reclassified ~16% low risk to higher risk with true events. | No – Calibration not reported. |
2022 [23] | China hypertension cohort (n=143,043). | Ensemble (avg RF/XGB/DNN); AUC 0.760 vs LR 0.737. | No – Calibration not reported. |
2021 [24] | Korea NHIS (n≈223k) + external cohorts. | ML vs risk scores for 5 yr CVD; simple NN improved C stat (0.751 vs 0.741). | Yes – HL χ² baseline 171 vs 15-86 for ML (p>0.05). Brier ~0.031 – 0.032 (good calibration). |
2021 [25] | NCDR Chest Pain MI registry (USA; n=755,402; derivation 564k; validation 190k). | In hospital mortality after MI; ensemble/XGBoost/NN vs logistic; similar AUC (~0.89). | Yes – Calibration slope ~1.0 in validation; Brier components & recalibration tables reported. |
2021 [26] | Faisalabad Institute + Framingham + South African Heart dataset & UCI (Cleveland n=303). | Feature importance with 10 ML algorithms; XAI focus. | No – Calibration not reported. |
2020 [27] | Eastern China high risk screening (n=25,231; 3 year follow up). | Random Forest; AUC ≈0.787 vs risk charts ≈0.714. | Yes – HL χ²=10.31, p=0.24 (good calibration). |
2019 [28] | UK Biobank subset (n=423,604; 5-year follow-up). | AutoPrognosis ensemble; AUC ≈0.774 vs Framingham ≈0.724; +368 cases identified. | Yes – Pipeline includes calibration (e.g., Platt scaling [sigmoid]); good agreement of predicted vs observed risk. |
2017 [29] | UK CPRD primary care (n=378,256; 10 year follow up; 24,970 events). | Classic ML vs ACC/AHA score; NN best (AUC ≈0.764) vs 0.728; improved identification. | No – Calibration not reported. |
1.4.2. Gaps in Research
Despite abundant work on ML-based heart disease prediction, there are clear gaps in the literature regarding probability calibration and uncertainty quantification. First, most studies prioritize discriminative performance (accuracy, F1, AUROC, etc.) and devote little or no attention to how well the predicted probabilities reflect true risk. As shown above, prior works seldom report calibration metrics such as ECE or the Brier score, nor do they plot reliability diagrams. For example, only one of the 10+ studies reviewed applied calibration methods such as Platt scaling or isotonic regression to its classifiers [28]. This indicates a lack of focus on calibration quality, an important aspect if these models are to be used in clinical decision-making where calibrated risk predictions are crucial.
Second, there is a lack of unified evaluation across multiple models and calibration techniques. Prior research typically evaluates a set of ML models on a dataset (as in comparative studies) but stops at reporting raw performance metrics. No study to date has systematically taken multiple classification models for heart disease and evaluated them before and after post-hoc calibration. This means it remains unclear how different algorithms (e.g. an SVM vs. a random forest) compare in terms of probability calibration (not just classification accuracy), and whether simple calibration methods can significantly improve their reliability. Furthermore, the interplay between model uncertainty (e.g. variance in predictions) and calibration has not been explored in this domain.

Third, most heart disease prediction papers do not report uncertainty metrics or advanced calibration statistics. Metrics such as the Brier score (which combines calibration and refinement), the ECE (Expected Calibration Error), or even more domain-specific checks like Spiegelhalter’s Z-test for calibration, are virtually absent from prior studies. Sharpness (the concentration of predictive distributions) and other uncertainty measures are also not discussed. This leaves a research gap in understanding how confident we can be in these model predictions and where they might be over or under-confident. For instance, none of the reviewed studies provide reliability diagrams to visually inspect calibration; as a result, a model claiming 95% accuracy might still make poorly calibrated predictions (overestimating or underestimating risk).
To the best of our knowledge, no prior work has offered a comprehensive evaluation of pre and post-calibration metrics across multiple models on the specific Kaggle heart disease dataset (1,025 records) used in this study. While several papers have used this or similar data for model comparison, none have examined calibration changes (ECE, log-loss, Brier, sharpness, Spiegelhalter’s Z-test, calibration curves) resulting from post-hoc calibration methods (Platt scaling, isotonic regression). In short, existing studies have left a critical question unanswered: if we calibrate our heart disease prediction models, do their confidence estimates become more trustworthy, and how does this vary by model? Addressing this gap is the focus of our work. We provide a thorough assessment of multiple classifiers before and after calibration, using a suite of calibration and uncertainty metrics not previously applied in this context, thereby advancing the evaluation criteria for heart disease ML models beyond conventional accuracy-based measures.
2. Materials and Methods
2.1. Research Methodology Overview
This study employs a structured machine learning workflow to predict heart disease risk based on clinical and demographic variables. As outlined in Figure 1, the process begins with the heart disease dataset, followed by data preprocessing, model selection and training, performance evaluation, and post-hoc calibration. Three calibration techniques (Platt scaling, isotonic regression, and temperature scaling) are applied to refine probabilistic outputs, and their effectiveness is assessed.

2.2. Description of the Dataset
The Heart Disease dataset used in this study was sourced from Kaggle. It was originally created by merging data from four medical centers (Cleveland, Hungary, Switzerland, and VA Long Beach), giving a sample size of 1,025 records, including 713 males (69.6%) and 312 females (30.4%), with ages ranging from 29 to 77 years (median age ~56). The dataset contains 14 variables encompassing demographic, clinical, and diagnostic test features. Descriptions of the dataset are outlined in Table 2.
The dataset was inspected for missing values and none were identified. The target variable (heart disease) was approximately balanced, with 51.3% of records labelled presence of disease and 48.7% labelled absence of disease, as shown in Figure 2. The target was binarised as heart disease = 1 and absence = 0, retained as an integer. No re-coding of the target labels was required for the present analysis.

Table 2: Data description for heart disease dataset
Feature | Description | Data Type | Values / Range |
Age (Years) | Age of the patient | Integer | 29-77 |
sex | Sex (1 = male, 0 = female) | Categorical | 0, 1 |
cp | Chest pain type | Categorical | 1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic |
trestbps (mmHg) | Resting blood pressure (on admission to the hospital) | Integer | 94-200 |
chol (mg/dl) | Serum cholesterol | Integer | 126-564 |
fbs | Fasting blood sugar > 120 mg/dl (1 = true, 0 = false) | Categorical | 0, 1 |
restecg | Resting electrocardiographic results | Categorical | 0: normal, 1: ST-T abnormality, 2: left ventricular hypertrophy |
thalach | Maximum heart rate achieved | Integer | 71-202 |
exang | Exercise induced angina (1 = yes, 0 = no) | Categorical | 0, 1 |
oldpeak | ST depression induced by exercise relative to rest | Real | 0.0-6.2 |
slope | Slope of the peak exercise ST segment | Categorical | 1: upsloping, 2: flat, 3: downsloping |
ca | Number of major vessels (0-3) colored by fluoroscopy | Integer | 0-3 |
thal | Thalassemia test result | Categorical | 3: normal, 6: fixed defect, 7: reversible defect |
num | Presence of heart disease (target: 0 = no, 1-4 = disease) | Categorical | 0, 1, 2, 3, 4 |
2.3. Data Preprocessing
In this study, the dataset was separated into 13 predictors (i.e., patient risk factors) and the single outcome feature (i.e., the presence or risk of heart disease). Predictors were further divided into two groups: numerical features (e.g., Age, RestingBP, Cholesterol) and categorical features (e.g., ChestPainType, RestingECG, Thalassemia, Sex). We scaled numerical features using a RobustScaler, which centres values around the median and spreads them according to the interquartile range; this method was selected because it is less sensitive to outliers and skewness [30]. For categorical features, One-Hot Encoding was applied, converting each category into binary (0/1) variables. This ensured that all categories were represented in a machine-readable format.
To prevent information leakage, all preprocessing steps were fit on training data only and were implemented inside the model pipelines. Within each cross-validation fold, imputation, scaling, and encoding were learned on the fold’s training split and then applied to the corresponding validation split. The same rule was followed for the final 70/30 train-test split, where transformers were fit on the 70% training partition and then applied to the held-out 30% test set. Where missing values occurred, numerics were imputed by the median and categoricals by the most frequent level before scaling or encoding. The outcome remained binary as integers throughout the workflow.
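The preprocessing described above can be assembled as a single scikit-learn pipeline so that every transformer is fit only on training folds. The sketch below is illustrative rather than the exact project code; the column names and the Logistic Regression placeholder are assumptions.

```python
# Sketch of the leakage-controlled preprocessing pipeline (scikit-learn).
# Column names are illustrative; the actual features follow Table 2.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

numeric_cols = ["age", "trestbps", "chol", "thalach", "oldpeak"]
categorical_cols = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # median imputation
    ("scale", RobustScaler()),                           # median / IQR scaling
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),  # binary indicator columns
])
preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Fitting the whole pipeline on the training split only prevents leakage:
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
# model.fit(X_train, y_train); model.predict_proba(X_test)
```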
2.4. Model Selection
In this work, we benchmark six models (spanning linear, non-linear and ensemble model architectures) to classify patients based on the presence or absence of heart disease. The selected models include Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Naive Bayes (NB). Using training (70%) and testing (30%) sets, we trained each model on the preprocessed training data and evaluated it on the held-out test data.
Logistic Regression (LR): Logistic Regression is a supervised machine learning model well-suited for binary classification, such as determining the presence or absence of heart disease. LR calculates the probability of a class (e.g., disease or no disease) by applying a sigmoid function to a weighted sum of predictor variables. Its strengths include simplicity, efficiency, and the ability to interpret coefficients as odds ratios, which is valuable in clinical settings for understanding feature importance and risk factors. Logistic Regression has a proven track record in medical research for risk stratification and is easily calibrated for probability estimation [31].
Support Vector Machines (SVM): Support Vector Machines are powerful, supervised classification models that work by finding the optimal hyperplane that separates classes in the feature space. SVMs excel at handling high-dimensional data and can model nonlinear relationships through kernel tricks, making them highly effective for complex medical datasets. Their ability to maximize the margin between classes reduces the likelihood of misclassification, which is especially useful when distinguishing subtle differences between patients with and without heart disease. SVMs are known for their robustness in real-world clinical prediction tasks [32].
Random Forest (RF): Random Forest is an ensemble algorithm that builds multiple decision trees during training and aggregates their outputs via majority voting for classification. It is especially effective at capturing nonlinear relationships and interactions among risk factors in heart disease prediction. The ensemble nature of RF mitigates overfitting and variance, providing more reliable and stable predictions on diverse patient populations. Its embedded feature importance scores help clinicians identify key predictors of heart disease, further supporting its use in healthcare analytics [33].
Extreme Gradient Boosting (XGBoost): XGBoost is a gradient boosting framework that creates a series of weak learners (usually decision trees) and optimizes them sequentially. It is renowned for combining high predictive accuracy with speed and efficiency, making it a top performer in medical classification challenges. XGBoost handles missing data gracefully and is robust to outliers, both of which are common in clinical datasets. Its sophisticated regularization techniques reduce overfitting, and its model interpretability tools are advantageous for validating results in heart disease risk prediction [34].
K-Nearest Neighbors (KNN): K-Nearest Neighbors is a non-parametric classification method that predicts the class of a sample based on the majority class among its k closest neighbors in feature space. KNN is intuitive, easy to implement, and doesn’t assume data distribution, making it suitable for heterogeneous clinical datasets. KNN is effective at leveraging local patterns, which can help identify at-risk heart disease patients by matching them to previously observed cases. However, it can be sensitive to feature scaling and less efficient with extensive datasets [35].
Naive Bayes (NB): Naive Bayes is a probabilistic classification algorithm that applies Bayes’ theorem, assuming feature independence. Its simplicity and computational efficiency make it attractive for medical tasks with many categorical variables. Despite its “naive” independence assumption, NB often performs surprisingly well for heart disease prediction because it can handle missing values, is robust with noisy data, and quickly estimates posterior probabilities. This makes it valuable for real-time risk assessment and decision support in clinical environments [36].
2.5. Model Tuning Strategy
In this study, GridSearchCV was used as the primary hyperparameter-tuning strategy due to its structured and reproducible approach [37], [38]. GridSearchCV works by exhaustively evaluating all possible combinations of predefined hyperparameters for a given algorithm [37], [38]. For each candidate configuration, the model is trained and validated using 5-fold cross-validation, ensuring stable performance estimates; this setup is widely recommended for clinical prediction models and has been applied to heart-disease prediction tasks [39], [40]. This is particularly important in healthcare datasets such as heart disease prediction, where sample sizes may be limited and class distributions may be imbalanced [40], [41]. By systematically exploring the parameter space, GridSearchCV helps identify the configuration that yields an appropriate balance between accuracy and generalisation performance [37], [38], [39]. In our heart-disease model, we used GridSearchCV to improve the stability of probability outputs before applying post-hoc calibration techniques. Table 3 summarises the parameter grid and chosen parameters for each model trained in this experiment.
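As an illustration of this tuning strategy, the sketch below wires GridSearchCV around a pipeline using the Random Forest grid from Table 3; the scoring choice, random seed, and the bare pipeline (without the preprocessing step shown in Section 2.3) are assumptions made for brevity.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# In practice the classifier is wrapped with the preprocessing ColumnTransformer
# from Section 2.3; here the pipeline holds only the classifier for brevity.
pipe = Pipeline([("clf", RandomForestClassifier(random_state=42))])

param_grid = {
    # Random Forest grid from Table 3
    "clf__n_estimators": [200, 300, 400],
    "clf__max_depth": [None, 5, 10],
    "clf__min_samples_leaf": [1, 2, 4],
    "clf__max_features": ["sqrt", "log2"],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="roc_auc", n_jobs=-1)
# search.fit(X_train, y_train); print(search.best_params_)
```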
2.6. Cross-validated discrimination
To measure discrimination outside one held-out test split, we used stratified 5-fold cross-validation on the 70% training set. In every outer fold, the full preprocessing pipeline and the classifier were fitted only on that fold’s training partition, then applied to the corresponding validation partition. This guards against information leakage from scaling or encoding into validation data.
Threshold-dependent metrics used a single, data-driven cutpoint per model based on Youden’s J index. For a given threshold \(t\) on predicted probabilities, \(J(t) = \text{Sensitivity}(t) + \text{Specificity}(t) - 1\), and the selected cutpoint is \(t^{*} = \arg\max_{t} J(t)\) [42]. Within each outer-fold training partition we ran an inner 5-fold CV to estimate \(t^{*}\) using only the inner validation predictions, then fixed \(t^{*}\) and applied it to the outer-fold validation data to compute Accuracy and F1. ROC AUC was computed from continuous scores and did not use a threshold. Using Youden’s J places the operating point where sensitivity and specificity are jointly maximized in the training data, a practice with well-studied statistical properties for cutpoint selection [43].
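A minimal sketch of this threshold selection is shown below; it assumes scikit-learn and uses cross_val_predict as a stand-in for the inner 5-fold out-of-fold probabilities.

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def youden_threshold(y_true, y_prob):
    """Return the cutpoint t* that maximizes J(t) = sensitivity(t) + specificity(t) - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    j = tpr - fpr                      # J(t) = TPR(t) - FPR(t)
    return thresholds[np.argmax(j)]

# Inside one outer training fold: inner 5-fold out-of-fold probabilities.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# oof_prob = cross_val_predict(model, X_outer_train, y_outer_train,
#                              cv=inner_cv, method="predict_proba")[:, 1]
# t_star = youden_threshold(y_outer_train, oof_prob)
# The fixed t_star is then applied to the outer validation fold:
# y_pred_val = (model.fit(X_outer_train, y_outer_train)
#                    .predict_proba(X_outer_val)[:, 1] >= t_star).astype(int)
```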
Table 3: Hyperparameter Grids and Selected Best Settings by Model
Model | Parameter grid | Best parameter |
K-Nearest Neighbors | Minkowski p: 1, 2; Number of neighbors: 3, 5, 7, 9; Weights: uniform, distance | Minkowski p: 1; Number of neighbors: 9; Weights: distance |
Random Forest | Number of trees: 200, 300, 400; Max depth: None, 5, 10; Min samples per leaf: 1, 2, 4; Max features: sqrt, log2 | Number of trees: 200; Max depth: None; Max features: sqrt; Min samples per leaf: 1 |
XGBoost | Number of trees: 200, 300; Learning rate: 0.03, 0.05, 0.1; Max depth: 3, 4, 5; Subsample: 0.8, 1.0; Column sample by tree: 0.8, 1.0 | Number of trees: 200; Learning rate: 0.05; Max depth: 4; Subsample: 1.0; Column sample by tree: 0.8 |
Support Vector Machine | Kernel: rbf, linear; Regularization strength (C): 0.1, 1, 10; Gamma: scale, auto | Kernel: rbf; Regularization strength (C): 10; Gamma: scale |
Logistic Regression | Regularization strength (C): 0.1, 1, 10; Solver: lbfgs, liblinear; Class weight: None, balanced | Regularization strength (C): 10; Solver: lbfgs; Class weight: None |
Naive Bayes | Variance smoothing: 1e-09, 1e-08, 1e-07 | Variance smoothing: 1e-07 |
This nested procedure helps control overfitting and preserves statistical validity. The threshold is chosen strictly inside the training portion of each outer fold, never on the outer validation or test data, which avoids optimistic bias and the circularity that arises when model selection and error estimation are performed on the same data [44]. When comparing uncalibrated and calibrated variants, the identical \(t^{*}\) learned within the outer-fold training data was applied to both sets of probabilities for that fold. This preserves a paired design, reduces variance in fold differences, and maintains the validity of subsequent significance testing based on matched resamples [45].
2.7. Model Performance Metrics
We evaluated classification performance using Accuracy, ROC-AUC, Precision, Recall, and F1-score. Let TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
Accuracy. Defined as \(\left(\frac{TP+TN}{TP+FP+TN+FN}\right)\), accuracy reflects the share of correctly classified cases in the test set. In clinical screening contexts where disease prevalence may be low, accuracy depends on the decision threshold and can mask deficiencies under class imbalance, yielding seemingly strong performance while missing many positive cases [46].
ROC-AUC. The receiver-operating-characteristic area summarizes discrimination across all thresholds; it equals the probability that a randomly selected positive receives a higher score than a randomly selected negative and ranges from 0.5 (no discrimination) to 1.0 (perfect). ROC-AUC is broadly used in clinical prediction for its threshold-agnostic view of separability, though it does not reflect calibration or the clinical costs of specific error types [47].
Precision. Given by \(\left(\frac{TP}{TP+FP}\right)\), precision quantifies how reliable positive alerts are: among patients flagged as having heart disease, the fraction that is truly positive. As thresholds are lowered to capture more cases, precision typically decreases, illustrating the trade-off clinicians face between false alarms and case finding [48].
Recall. Defined as \(\left(\frac{TP}{TP+FN}\right)\), recall measures the proportion of truly diseased patients the model detects (sensitivity). Raising recall generally requires a lower threshold, which increases false positives and reduces precision; selecting an operating point should therefore reflect clinical consequences and disease prevalence [49].
F1-score. The harmonic mean \(\left(\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}\right)\) provides a single summary when both missed cases and false alarms matter. F1 is commonly reported in imbalanced biomedical tasks, though its interpretation should be complemented by other metrics given known limitations under skewed prevalence [50].
These metrics establish a consistent baseline for cross-model comparison and inform our subsequent calibration and uncertainty quantification analysis.
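For concreteness, the helper below computes these five metrics from predicted probabilities and a decision threshold using scikit-learn; it is a sketch of the evaluation step, not the exact project code.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def discrimination_report(y_true, y_prob, threshold=0.5):
    """Threshold-dependent metrics plus ROC-AUC from predicted probabilities."""
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),  # threshold-free
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```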
2.8. Post-Hoc Calibration and Evaluation
2.8.1. Selected Calibration Techniques
Post-hoc calibration refers to techniques applied after model training that map raw scores to probabilities without changing the underlying classifier. In clinical settings where decisions hinge on risk estimates, these procedures use a held-out calibration set to fit a simple, typically monotonic mapping so that predicted probabilities better match observed event rates [9], [51], [52]. In this study, calibration was fit strictly on training-only validation data inside cross-validation and applied to the corresponding validation folds, then to the held-out test split, which avoids information leakage and optimistic bias as recommended in prior work [5], [7], [9], [51].
In clinical prediction pipelines for heart disease, this is attractive: one can retain the trained model and its operating characteristics, then calibrate its outputs to yield probabilities that are more trustworthy for downstream decision thresholds, alerts, or shared decision-making [51], [52]. For this study, we applied three post-hoc calibration methods, Platt scaling, isotonic regression, and temperature scaling, to adjust model outputs into well-calibrated probabilities [5], [7].
- Platt scaling works by fitting a smooth S-shaped sigmoid curve to the model’s scores using a separate validation set, so that predicted probabilities better match actual outcomes. This method is simple and efficient but assumes that the relationship between scores and probabilities follows a logistic pattern [9], [53]. In our pipeline, the sigmoid mapping was learned on training-only validation folds and then applied to their matched validation sets.
- Isotonic regression is a more flexible, non-parametric method that does not assume any specific shape. Instead, it fits a step-like monotonic curve that can adapt to complex patterns in the data [54]. While this flexibility can better capture irregular relationships, it can also lead to overfitting if the validation dataset is small, hence our use of cross-validated, training-only fits to mitigate instability [5], [7], [51].
- Temperature scaling applies a single global temperature \(T > 0\) to sharpen or soften probabilities via \(p_T = \sigma(\mathrm{logit}(p)/T)\). We estimated \(T\) on training-only out-of-fold predictions by minimizing the log loss (negative log-likelihood), then applied the learned \(T\) to the corresponding validation folds and the held-out test split. Temperature scaling is lightweight and widely used to correct overconfident scores without altering class ranking [5].
In practice, Platt scaling is most useful when a sigmoid relationship is expected, isotonic regression is preferred when the calibration pattern is unknown or more complex [9], and temperature scaling provides a simple, global adjustment of confidence that can be effective when miscalibration is primarily due to score overconfidence rather than shape distortions [5]. Using all three methods provides a robust calibration toolbox, ensuring reliable probability estimates across different models, while our training-only fitting approach addresses concerns about leakage and preserves valid evaluation.
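The sketch below illustrates how the three calibrators can be fit on training data only: Platt and isotonic via scikit-learn's CalibratedClassifierCV, and temperature scaling as a one-parameter fit that minimizes log loss on out-of-fold predictions. The base estimator and the optimization bounds are assumptions rather than settings reported in the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

base_model = LogisticRegression(max_iter=1000)  # stand-in for any tuned pipeline

# Platt (sigmoid) and isotonic calibration, fit with internal training-only folds.
platt = CalibratedClassifierCV(base_model, method="sigmoid", cv=5)
isotonic = CalibratedClassifierCV(base_model, method="isotonic", cv=5)
# platt.fit(X_train, y_train); isotonic.fit(X_train, y_train)

# Temperature scaling: p_T = sigmoid(logit(p) / T), with T fit on out-of-fold predictions.
def _logit(p, eps=1e-12):
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))

def fit_temperature(y_true, p_oof):
    """Find T > 0 that minimizes the log loss of sigmoid(logit(p)/T)."""
    def nll(T):
        return log_loss(y_true, 1.0 / (1.0 + np.exp(-_logit(p_oof) / T)))
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x

def apply_temperature(p, T):
    return 1.0 / (1.0 + np.exp(-_logit(p) / T))
```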
2.8.2. Model Uncertainty Quantification and Calibration Evaluation Metrics
In this study, we measure the uncertainty of the models using these key calibration evaluation metrics: Reliability diagram, Brier Score, Expected Calibration Error (ECE), Log Loss and Sharpness. A combination of these metrics provides a holistic understanding of each model’s effectiveness in quantifying model uncertainty.
Reliability diagram (calibration plot). A reliability diagram visualizes how predicted probabilities align with observed event rates by plotting, across confidence bins, the empirical outcome frequency against the mean predicted probability. A perfectly calibrated model traces the 45-degree diagonal line, while systematic deviations reveal over- or under-confidence [9]. Reliability diagrams are standard in forecast verification and machine-learning calibration, and they provide a visual check of probability accuracy while preserving discrimination. Practical caveats include sensitivity to binning and sample size, and the fact that the plot alone does not indicate how many samples fall into each bin, often addressed by adding a companion confidence histogram [5], [55], [56]. We experiment with two binning strategies (i.e., equal-width bins and equal-frequency bins). A rolling-mean curve over the predicted probabilities was added to stabilise visual trends without changing the bin statistics.
Brier Score – The Brier Score measures the mean squared difference between predicted probabilities and the actual binary outcomes. Unlike accuracy, which reduces predictions to “yes/no” and ignores the uncertainty behind probability values, the Brier Score penalizes poorly calibrated or overly confident predictions. This makes it more informative for model uncertainty quantification, especially in clinical settings where knowing the probability of heart disease (and not just a binary label) aids risk discussions and decision-making. Lower Brier Scores indicate better calibrated and more reliable probability forecasts, a key aspect of clinical utility [57].
Expected Calibration Error (ECE). ECE summarizes how closely a model’s predicted probabilities match the observed frequencies of outcomes. It divides predictions into probability bins and measures the mismatch between average predicted probability and the actual outcome rate in each bin. In heart disease prediction, ECE helps verify if model confidence reflects real-world risks, ensuring patients with a predicted 70% heart disease risk, for example, actually face that risk. Lower ECE values indicate better calibrated models, which is crucial for trusted clinical decision support [5]. In this work, we report two ECE variants to assess robustness to binning: equal-width bins with K = 10 and equal-frequency bins with K = 10; the latter balances counts per bin and often yields more stable estimates on modest sample sizes [5], [56].
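A minimal implementation of ECE under both binning strategies is sketched below; the handling of duplicate quantile edges is an assumption about edge cases rather than a detail reported in the text.

```python
import numpy as np

def ece(y_true, y_prob, n_bins=10, strategy="uniform"):
    """Expected Calibration Error with equal-width ('uniform') or
    equal-frequency ('quantile') bins, weighted by bin counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    if strategy == "uniform":
        edges = np.linspace(0.0, 1.0, n_bins + 1)
    else:  # equal-frequency bins
        edges = np.quantile(y_prob, np.linspace(0.0, 1.0, n_bins + 1))
        edges = np.unique(edges)          # guard against duplicate quantile edges
    idx = np.clip(np.digitize(y_prob, edges[1:-1], right=True), 0, len(edges) - 2)
    total, n = 0.0, len(y_prob)
    for b in range(len(edges) - 1):
        mask = idx == b
        if mask.any():
            conf = y_prob[mask].mean()    # mean predicted probability in the bin
            acc = y_true[mask].mean()     # observed event rate in the bin
            total += mask.sum() / n * abs(acc - conf)
    return total
```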
Log Loss – Log Loss (or cross-entropy loss) evaluates the uncertainty of probabilistic outputs by heavily penalizing confident but incorrect predictions. Log Loss is sensitive to how far predicted probabilities diverge from the actual class, providing a continuous measure of model reliability. For heart disease prediction, low Log Loss means the model rarely makes wildly overconfident errors, promoting safer, uncertainty-aware clinical interpretation [58].
Sharpness (variance of predicted probabilities) – Sharpness measures the spread or concentration of predicted probabilities, independent of whether they’re correct. High sharpness means the model often predicts risks near 0 or 1, indicating confident, decisive forecasts. For heart disease prediction, greater sharpness is desirable only if paired with good calibration: confident predictions should also be correct. Thus, sharpness reveals how much intrinsic uncertainty the model expresses, helping physicians judge whether predictions are actionable or too vague for clinical use [55].
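The remaining scalar metrics are straightforward to compute; the helper below sketches Brier score, Log Loss, and sharpness (variance of the predicted probabilities) with scikit-learn and NumPy.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

def probability_quality(y_true, y_prob):
    """Brier score, log loss, and sharpness (variance of predicted probabilities)."""
    return {
        "brier": brier_score_loss(y_true, y_prob),
        "log_loss": log_loss(y_true, y_prob),
        "sharpness": float(np.var(y_prob)),
    }
```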
Table 4: Pipeline decisions for Baseline Classification Performance & Calibration – summary of experiment setup, evaluation choices, and preprocessing decisions
Component | Description |
Test Split | 30% of dataset (~306 instances), stratified by target class |
Cross-Validation | 5-fold StratifiedKFold with shuffling |
Scaling | RobustScaler for numeric variables |
Encoding | OneHotEncoder for nominal categorical fields |
Models | Logistic Regression, SVM, Random Forest, XGBoost, KNN, Naive Bayes |
Development Environment | Google Colab |
Python libraries | Sklearn, matplotlib, scipy, numpy, pandas, seaborn |
Model Evaluation Metrics | Accuracy, ROC-AUC, Precision, Recall, and F1 Score |
Uncertainty Quantification Metrics | Brier Score, Expected Calibration Error (ECE), Log Loss, Spiegelhalter’s Z-score & p-value, Sharpness, Reliability diagram |
Train/test split ratio | 70% training: 30% testing |
2.9. Confidence intervals and statistical tests
Confidence intervals. For test-set discrimination metrics, we computed 95% bootstrap percentile intervals with 2,000 resamples, using stratified resampling to preserve class balance and skipping resamples with a single class for AUROC [59]. For cross-validated summaries we formed per-fold estimates, then bootstrapped across the out-of-fold units to obtain fold-aware 95% intervals for Brier score, ECE, Log Loss, and sharpness. For reliability diagrams we reported Wilson 95% intervals for bin-wise observed event rates to stabilize proportions in modest bin counts [60].
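The two interval constructions can be sketched as follows; the resampling scheme shown (within-class resampling plus a single-class guard) is one reasonable reading of the procedure described above, not the verbatim project code.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import roc_auc_score

def stratified_bootstrap_ci(y_true, y_prob, metric=roc_auc_score,
                            n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI, resampling within each class and skipping
    degenerate resamples that contain a single class."""
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    pos, neg = np.flatnonzero(y_true == 1), np.flatnonzero(y_true == 0)
    stats = []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(pos, pos.size, replace=True),
                              rng.choice(neg, neg.size, replace=True)])
        if np.unique(y_true[idx]).size < 2:
            continue  # guard for metrics that need both classes (e.g. AUROC)
        stats.append(metric(y_true[idx], y_prob[idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

def wilson_interval(k, n, alpha=0.05):
    """Wilson interval for a bin-wise observed event rate (k events out of n)."""
    z = norm.ppf(1 - alpha / 2)
    p = k / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half
```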
Spiegelhalter’s Z-score & p-value – Spiegelhalter’s Z-score tests overall calibration by comparing predicted probabilities to actual outcomes, normalized by their variance. A non-significant p-value suggests the model is well-calibrated; otherwise, the probabilistic forecasts may be systematically over or under-confident. This calibration test is especially important in health applications, assuring clinicians that model probabilities are statistically valid reflections of true outcome chances [61].
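A minimal implementation of this test, using the standard form of Spiegelhalter's statistic, is sketched below.

```python
import numpy as np
from scipy.stats import norm

def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter's Z statistic and two-sided p-value for overall calibration."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_prob, dtype=float)
    num = np.sum((y - p) * (1.0 - 2.0 * p))
    den = np.sqrt(np.sum((1.0 - 2.0 * p) ** 2 * p * (1.0 - p)))
    z = num / den
    return z, 2.0 * (1.0 - norm.cdf(abs(z)))
```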
Permutation p-tests on fold-matched deltas. To compare calibrated to uncalibrated states we used paired permutation tests on fold-matched differences, for example \(\Delta = \text{metric}_{\text{cal}} - \text{metric}_{\text{uncal}}\). Within each model, we repeatedly flipped the signs of fold-level deltas to generate the null distribution under the hypothesis that the median delta equals zero, using 10,000 permutations, two-sided. We report the observed delta, its bootstrap 95% interval, and the corresponding permutation p-value, which answers whether the improvement is larger than expected by chance under the paired design [62], [63].
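A sketch of this sign-flip permutation test on fold-matched deltas is given below; the use of the median as the test statistic follows the description above, while the random seed is an assumption.

```python
import numpy as np

def paired_sign_flip_test(deltas, n_perm=10_000, seed=42):
    """Two-sided sign-flip permutation test that the fold-matched deltas
    (metric_calibrated - metric_uncalibrated) are centred on zero."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    observed = np.median(deltas)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, deltas.size))  # random sign flips
    null = np.median(signs * deltas, axis=1)                     # null distribution
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p
```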
Wilcoxon signed-rank tests. For the equal-width versus equal-frequency ECE comparison, we also report paired Wilcoxon signed-rank tests on fold-matched differences, alongside bootstrap intervals for the median delta, to summarize direction and robustness of the binning effect without distributional assumptions [64].
3. Baseline model performance
Six classifiers were trained and evaluated on the held-out test set. Table 5 reports Accuracy, F1, and ROC AUC with 95% bootstrap confidence intervals alongside precision and recall. Four models, KNN, Random Forest, XGBoost, and SVM, achieved very high scores across metrics. For example, KNN achieved 99.0% Accuracy, 99.0% F1, and 100.0% ROC AUC, while Random Forest, XGBoost, and SVM were in the 97.1% to 99.6% range across these metrics. Logistic Regression was lower, with 86.0% Accuracy, 86.6% F1, and 94.3% ROC AUC. Naive Bayes was lowest, with 80.2% Accuracy, 77.8% F1, and 88.4% ROC AUC. Confidence intervals are tight for the top four models, as shown in Figures 3 to 5, and wider for Logistic Regression and Naive Bayes, indicating greater sampling uncertainty for the latter pair.
Table 5: Performance metrics of baseline classification models (before calibration) with 95% confidence interval (CI) bootstrap
(number of boots = 2,000)
Model | Accuracy (%) | Accuracy 95% CI (Lower – Upper) | F1 (%) | F1 95% CI (Lower – Upper) | ROC AUC (%) | ROC AUC 95% CI (Lower – Upper) | Precision (%) | Recall (%) |
KNN | 99 | 98.1 – 100.0 | 99 | 97.9 – 100.0 | 100 | 100.0 – 100.0 | 100 | 98.1 |
RF | 98.1 | 96.4 – 99.4 | 98.1 | 96.4 – 99.4 | 99.6 | 99.1 – 100.0 | 100 | 96.2 |
XGB | 98.1 | 96.4 – 99.4 | 98.1 | 96.5 – 99.4 | 99.2 | 98.5 – 99.8 | 98.1 | 98.1 |
SVM | 97.1 | 95.1 – 98.7 | 97.1 | 95.1 – 98.8 | 98.6 | 96.9 – 100.0 | 98.1 | 96.2 |
LR | 86 | 82.1 – 89.6 | 86.6 | 82.3 – 90.3 | 94.3 | 91.7 – 96.7 | 85.3 | 88.0 |
NB | 80.2 | 75.6 – 84.4 | 77.8 | 71.9 – 82.9 | 88.4 | 84.2 – 92.1 | 91.5 | 67.7 |



To quantify discrimination without relying on a single partition, we used stratified 5-fold cross-validation, fitting preprocessing and models within each training fold. We selected the decision threshold by Youden’s J using inner cross-validation, then applied that fixed threshold to the outer validation fold. Following best practice, we tuned the decision threshold in each fold on the training predictions, selecting the cut-point that maximized Youden’s J rather than using a fixed 0.5 threshold [65], while still maintaining statistical validity [66]. Table 6 reports the fold means for Accuracy, F1, and ROC AUC for the uncalibrated models optimized via Youden’s J, side by side with baseline performance from Table 5.
Discrimination was strongest for four models, with consistently high values. Random Forest and KNN reach 99.60% Accuracy and 99.60% F1, with ROC AUC at 100.00%. SVM attains 99.0% Accuracy, 99.1% F1, and 100% ROC AUC. XGBoost follows closely with 99.0% Accuracy, 99.0% F1, and 100% ROC AUC. Logistic Regression and Naive Bayes remain well below this cluster, with 86.8% and 83.8% Accuracy, 87.5% and 84.7% F1, and 94.0% and 89.5% ROC AUC, respectively.
These results reflect two effects. First, ROC AUC values confirm very strong class separability on this dataset. Second, optimizing the threshold on training data via Youden’s J raises fold-wise Accuracy and F1 compared with a fixed cutpoint, which explains the higher values relative to our earlier fixed-threshold point estimate summaries [67]. The Youden J optimised values in Table 6 serve as the discrimination baseline for all later comparisons, where we examine how post-hoc calibration changes calibration metrics while tracking any movement in Accuracy and F1 relative to these uncalibrated, Youden-J estimates.
Table 6: Uncalibrated Cross-validated Accuracy, F1, and ROC AUC with tuned parameters
Model | Accuracy (baseline + tuning) | F1 (baseline + tuning) | ROC AUC (baseline + tuning) | Accuracy (+ CV=5 OOF, inner 5-fold Youden J) | F1 (+ CV=5 OOF, inner 5-fold Youden J) | ROC AUC (+ CV=5 OOF, inner 5-fold Youden J) |
KNN | 99.0 | 99.0 | 100 | 99.6 | 99.6 | 100 |
RF | 98.1 | 98.1 | 99.6 | 99.6 | 99.6 | 100 |
XGB | 98.1 | 98.1 | 99.2 | 99.0 | 99.0 | 100 |
SVM | 97.1 | 97.1 | 98.6 | 99.0 | 99.1 | 100 |
LR | 86.0 | 86.6 | 94.3 | 86.8 | 87.5 | 94.0 |
NB | 80.2 | 77.8 | 88.4 | 83.8 | 84.7 | 89.5 |
3.1. Reliability Plots
We plot reliability diagrams to visualise calibration effects using out-of-fold predictions from stratified 5-fold cross-validation. Given a test set of 306 instances (30% of the 1,025-record dataset), predicted probabilities were partitioned into ten equal-frequency bins so that each bin contained a similar number of cases, which stabilizes bin estimates. This choice balances resolution and stability in modest samples, consistent with guidance that discourages aggressive binning when counts per bin become small [56]. For each bin we plot the bin mean against the observed event rate with Wilson 95% intervals, together with a thin rolling mean over the sorted predictions. Figures 6 to 9 present the six models for the uncalibrated outputs and for Platt, Isotonic, and Temperature calibration.
Before calibration (Figure 6), Logistic Regression and XGBoost track the diagonal closely through most of the probability range, with small departures near the extremes. Random Forest shows overconfidence in the upper tail, where predicted risks exceed observed frequencies. SVM tracks the diagonal in the mid-range but is less reliable at the extremes. KNN exhibits a flat, underconfident shape over much of the scale. Naive Bayes displays the familiar S-shape, underestimating risk at intermediate probabilities and overshooting near 1, consistent with prior reports of miscalibration for these families of models [7], [9], [53].
Platt scaling (Figure 7) improves Logistic Regression, SVM and Naive Bayes, drawing curves toward the diagonal where deviations were approximately monotonic, but it leaves clear residual error for Random Forest and KNN, likely due to its monotonic, logistic-form constraint [68], [69]. XGBoost shows little gain and, in places, mild distortion relative to its already good pre-calibration fit.
Figure 6: Reliability diagrams before calibration (uncalibrated outputs), equal-frequency bins K = 10.

Figure 7: Reliability diagrams after Platt scaling, equal-frequency bins K = 10.
Figure 8: Reliability diagrams after Isotonic regression, equal-frequency bins K = 10.

Figure 9: Reliability diagrams after Temperature scaling, equal-frequency bins K = 10.
Isotonic regression (Figure 8) provides the largest and most consistent improvements. Naive Bayes becomes markedly more tightly positioned across the range, and SVM tightens around the diagonal with narrower uncertainty bands. Random Forest is corrected at high probabilities, reducing overconfidence. KNN remains relatively unstable, with small bins at the extremes still showing variance. These findings suggest that while sigmoid calibration is suitable for models with nearly linear miscalibration, isotonic regression better handles complex, non-monotonic distortions in probabilistic estimates [70], [71].
Temperature scaling (Figure 9) yields modest, mostly uniform shifts in confidence. It reduces the top-end overconfidence for Random Forest and XGBoost, but its effect is smaller than isotonic and, as expected for a single-parameter rescaling, it does not correct non-linear distortions.
The reliability plots show three consistent themes. First, calibration needs are model-specific, with ensembles tending to be overconfident near 1, Naive Bayes showing S-shaped error, and Logistic Regression close to calibrated at baseline. Second, isotonic is the most effective general-purpose post-hoc adjustment on this dataset, while Platt helps when deviations are nearly logistic in form. Third, confidence intervals make departures from perfect calibration most apparent at the extremes of the probability scale, where data are sparse.
3.2. Sensitivity of ECE to binning choice
We assessed the stability of ECE using two binning strategies with K = 10, equal-width and equal-frequency. For each model, calibration state, and fold, we computed the paired difference \(\Delta\mathrm{ECE} = \mathrm{ECE}_{\text{uniform}} - \mathrm{ECE}_{\text{quantile}}\). Positive values indicate smaller ECE when bins carry similar counts. The paired summaries are presented in Table 7 below, and we plot per-model medians with 95% CIs in Figure 10.
Across all models and calibration states combined, equal-frequency binning produced smaller ECE values. As shown in Table 7, the overall median ΔECE was 0.0069 with a 95 % CI 0.0056 to 0.0089 and a Wilcoxon p value 4.87×10⁻⁸, with 74.2% of paired fold comparisons favoring equal frequency. The largest effects occur for the tree-based ensembles. For XGBoost the median ΔECE was 0.0115 (95 % CI 0.0074 to 0.0149, p 9.54×10⁻⁶), and for Random Forest it was 0.0098 (95 % CI 0.0057 to 0.0119, p 2.61×10⁻⁴). These two bars are the tallest in Figure 10, matching the entries in Table 7.
Table 7: Paired comparison of ECE with K = 10 using equal-width and equal-frequency bins over CV folds. CIs are 95% CIs bootstrap (number of boots = 10,000). Paired Wilcoxon tests on fold-matched deltas.
Section | Sub section | Number of pairs | Median Δ ECE | 95% Median CI Low | 95% Median CI High | Mean Δ ECE | Wilcoxon p | Frac quantile < uniform |
Overall | — | 120 | 0.0069 | 0.0056 | 0.0089 | 0.0054 | 4.87×10⁻⁸ | 0.7417 |
By model | XGB | 20 | 0.0115 | 0.0074 | 0.0149 | 0.011 | 9.54×10⁻⁶ | 0.9 |
By model | RF | 20 | 0.0098 | 0.0057 | 0.0119 | 0.0099 | 0.000261 | 0.95 |
By model | SVM | 20 | 0.0066 | 0.0007 | 0.01 | 0.006 | 0.009436 | 0.8 |
By model | LR | 20 | 0.0061 | -0.0044 | 0.008 | 0.0024 | 0.2774 | 0.6 |
By model | KNN | 20 | 0.0053 | 0.0017 | 0.0074 | 0.0066 | 0.000655 | 0.75 |
By model | NB | 20 | -0.0024 | -0.0093 | 0.013 | -0.0037 | 0.7841 | 0.45 |
By calibration | Uncalibrated | 30 | 0.0069 | 0.0012 | 0.0119 | 0.0078 | 8.09×10⁻⁵ | 0.7333 |
By calibration | Isotonic | 30 | 0.0068 | 0.0048 | 0.0083 | 0.0069 | 0.00073 | 0.8667 |
By calibration | Platt | 30 | 0.0073 | 0.0016 | 0.0108 | -0.0004 | 0.2534 | 0.7 |
By calibration | Temperature | 30 | 0.0064 | 0.0004 | 0.0147 | 0.0072 | 0.005383 | 0.6667 |

Figure 10: Per-model median ΔECE with 95 % CIs bootstrap (number of boots = 10,000).
SVM and KNN show smaller but consistent gains. As seen in Table 7, SVM has median ΔECE 0.0066 (95 % CI 0.0007 to 0.0100, p 9.44×10⁻³), and KNN has 0.0053 (95 % CI 0.0017 to 0.0074, p 6.55×10⁻⁴). Logistic Regression shows a modest median with a CI that crosses zero, 0.0061 (95 % CI -0.0044 to 0.0080, p 0.277). Naive Bayes shows no advantage for equal-frequency, -0.0024 (95 % CI -0.0093 to 0.0130, p 0.784). These patterns are visible in Figure 10, where LR has a short bar with wide whiskers and NB dips slightly below zero.
By calibration method, the same direction holds. As shown in Table 7, the median ΔECE is 0.0069 for Uncalibrated (95 % CI 0.0012 to 0.0119, p 8.09×10⁻⁵), 0.0068 for Isotonic (95 % CI 0.0048 to 0.0083, p 7.30×10⁻⁴), and 0.0064 for Temperature (95 % CI 0.0004 to 0.0147, p 5.38×10⁻³). Platt shows a positive median 0.0073 with a non-significant p value 0.253, which is consistent with its shorter bar and wide CI in Figure 10.
This sensitivity analysis indicates that ECE is lower on average with equal-frequency bins, as shown in Table 7 and Figure 10. We therefore report both ECE variants throughout and treat the quantile-based ECE as a robustness check rather than as evidence of intrinsically better calibration.
3.3. Calibration metrics by model and calibration method
Table 8 reports fold means for Accuracy, F1, AUC ROC, Brier score, ECE with equal-width bins at K = 10, ECE with equal-frequency bins at K = 10, and Log Loss for each model under Uncalibrated, Platt, Isotonic, and Temperature. For each model, we identify the best calibration method as the one achieving the minimum Brier score, the minimum of each ECE variant, and the minimum Log Loss.
Across models, Isotonic most often provides the strongest calibration. This pattern is consistent with the reliability plots where a monotone nonparametric map aligns S-shaped or overconfident regions while preserving ordering. Platt is competitive when deviations are close to a logistic shift, and Temperature yields smaller, uniform corrections that can trim overconfidence without altering rank.
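To make the three post-hoc maps concrete, the sketch below shows one way each could be fit on held-out training predictions p_cal with labels y_cal (NumPy arrays); it is an illustrative simplification under our naming assumptions, not the pipeline used to produce Table 8.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

EPS = 1e-12

def _logit(p):
    p = np.clip(np.asarray(p, dtype=float), EPS, 1 - EPS)
    return np.log(p / (1 - p))

def fit_platt(p_cal, y_cal):
    """Platt scaling: a logistic fit on the logit of the raw probabilities."""
    lr = LogisticRegression(C=1e6).fit(_logit(p_cal).reshape(-1, 1), y_cal)  # near-unregularized
    return lambda p: lr.predict_proba(_logit(p).reshape(-1, 1))[:, 1]

def fit_isotonic(p_cal, y_cal):
    """Isotonic regression: a monotone, nonparametric map from scores to probabilities."""
    iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)
    return iso.predict

def fit_temperature(p_cal, y_cal):
    """Temperature scaling: a single parameter T divides the logits, chosen to minimize NLL."""
    z, y = _logit(p_cal), np.asarray(y_cal, dtype=float)
    def nll(t):
        q = np.clip(1 / (1 + np.exp(-z / t)), EPS, 1 - EPS)
        return -np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))
    t_star = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
    return lambda p: 1 / (1 + np.exp(-_logit(p) / t_star))
```

Each function returns a mapping that is then applied to validation-fold or test-set probabilities, so the calibrator itself never sees the data it is evaluated on.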
Two models, KNN and SVM, are best uncalibrated across the calibration metrics in this dataset. For these models, applying Platt, Isotonic, or Temperature does not improve Brier, ECE, or Log Loss relative to the uncalibrated scores in Table 8, and in places calibration slightly worsens these quantities. This matches the reliability plots, which show limited systematic miscalibration for SVM and persistent variance for KNN that calibration does not correct.
Table 8: Cross-validated means for Accuracy, F1, AUC ROC, Brier, ECE (uniform, 10), ECE (quantile, 10), and Log Loss by model and calibration method. For each model, the method achieving the minimum Brier, each ECE variant, and Log Loss is highlighted in bold.
Model | Calibration | Accuracy (%) | F1 (%) | ROC AUC (%) | Brier Score | Log Loss | ECE (uniform, 10) | ECE (quantile, 10) | Sharpness (Var) | Z-score | Z p-value
KNN | Isotonic | 99.6 | 99.6 | 100 | 0.0044 | 0.0211 | 0.0146 | 0.0094 | 0.2396 | 0.9252 | 0.5618
KNN | Platt | 99.6 | 99.6 | 100 | 0.0054 | 0.0388 | 0.0308 | 0.0237 | 0.2231 | 0.6622 | 0.5969
KNN | Temperature | 96.7 | 96.7 | 99 | 0.0258 | 0.1228 | 0.0287 | 0.0148 | 0.2295 | 1.0477 | 0.3933
KNN | Uncalibrated | 99.6 | 99.6 | 100 | 0.0026 | 0.007 | 0.0039 | 0.0039 | 0.2487 | 0.9849 | 0.6608
LR | Isotonic | 87.3 | 87.8 | 94.4 | 0.0905 | 0.3018 | 0.055 | 0.0482 | 0.1639 | -0.1645 | 0.5713
LR | Platt | 86.7 | 87.5 | 94 | 0.0957 | 0.3182 | 0.0567 | 0.0645 | 0.1394 | -0.0513 | 0.6791
LR | Temperature | 85.1 | 85.7 | 93.6 | 0.0975 | 0.3259 | 0.0593 | 0.056 | 0.1504 | 0.4082 | 0.4916
LR | Uncalibrated | 86.8 | 87.5 | 94 | 0.0944 | 0.3171 | 0.0646 | 0.0571 | 0.1565 | 0.021 | 0.577
NB | Isotonic | 83.8 | 84.7 | 90.7 | 0.1196 | 0.3839 | 0.0621 | 0.0534 | 0.1344 | -0.0773 | 0.5412
NB | Platt | 83.7 | 84.7 | 90.1 | 0.1291 | 0.4222 | 0.0545 | 0.0942 | 0.1023 | -0.1822 | 0.6847
NB | Temperature | 81.2 | 80.1 | 89.9 | 0.1248 | 0.4487 | 0.0741 | 0.0689 | 0.1656 | -0.0968 | 0.6696
NB | Uncalibrated | 83.8 | 84.7 | 89.5 | 0.1492 | 1.51 | 0.146 | 0.1348 | 0.2292 | -3.1409 | 0.2343
RF | Isotonic | 99.6 | 99.6 | 100 | 0.0042 | 0.0201 | 0.0144 | 0.0098 | 0.2387 | 0.8125 | 0.5283
RF | Platt | 99.6 | 99.6 | 100 | 0.0048 | 0.0366 | 0.0331 | 0.0223 | 0.2217 | 0.5198 | 0.6463
RF | Temperature | 97 | 97 | 99 | 0.0242 | 0.1024 | 0.0318 | 0.0201 | 0.2264 | 0.9775 | 0.4323
RF | Uncalibrated | 99.6 | 99.6 | 100 | 0.0058 | 0.0484 | 0.0449 | 0.0322 | 0.2109 | 0.6992 | 0.506
SVM | Isotonic | 99.1 | 99.1 | 100 | 0.0087 | 0.0442 | 0.0337 | 0.0268 | 0.2228 | 0.4598 | 0.4639
SVM | Platt | 98.8 | 98.9 | 99.9 | 0.0125 | 0.075 | 0.0594 | 0.0452 | 0.1991 | 0.3284 | 0.5607
SVM | Temperature | 95.6 | 95.7 | 98.2 | 0.0365 | 0.1675 | 0.0426 | 0.0411 | 0.2074 | 0.6681 | 0.4894
SVM | Uncalibrated | 99 | 99.1 | 100 | 0.0065 | 0.0376 | 0.0226 | 0.0214 | 0.2316 | 0.0207 | 0.3804
XGB | Isotonic | 99.2 | 99.2 | 100 | 0.007 | 0.0311 | 0.0241 | 0.0147 | 0.2313 | 0.4402 | 0.5234
XGB | Platt | 99.4 | 99.4 | 100 | 0.0092 | 0.0534 | 0.0438 | 0.0307 | 0.2125 | 0.2697 | 0.7105
XGB | Temperature | 96.9 | 96.9 | 98.1 | 0.0308 | 0.1453 | 0.0385 | 0.0311 | 0.2142 | 0.7084 | 0.4043
XGB | Uncalibrated | 99 | 99 | 100 | 0.0135 | 0.0764 | 0.0639 | 0.0497 | 0.1964 | 0.2525 | 0.8046
Random Forest shows its clearest gains under Isotonic: Brier, both ECE variants, and Log Loss are lowest with Isotonic, mirroring the correction of high-probability overconfidence seen in the reliability plots, while Accuracy and F1 remain close to the uncalibrated Youden-J values and AUC ROC is essentially unchanged. XGBoost starts close to calibrated; differences among methods are small, with Isotonic producing the best Log Loss and competitive ECE values, and Accuracy and F1 shift only marginally relative to the uncalibrated Youden-J baseline. Logistic Regression is already well behaved: Isotonic yields the best Log Loss and ECE, with discrimination metrics essentially unchanged. Naive Bayes shows the largest calibration gains with Isotonic: Brier, both ECE variants, and Log Loss drop, consistent with the straightening of the S-shaped reliability curve, while AUC ROC remains constant and Accuracy and F1 change only slightly without a systematic direction.
On the calibration-discrimination balance, Temperature does not behave as a neutral rescaling. In our fold means, Temperature shifts Accuracy and F1 for every model, and AUC ROC also changes rather than remaining fixed. Isotonic and Platt tend to preserve AUC ROC within small deltas while improving Brier, ECE, and Log Loss, but Temperature's global rescaling can move operating points and rankings enough to register in discrimination metrics. Consequently, when discrimination stability is a priority, Isotonic is generally preferred for RF, XGB, LR, and NB, Uncalibrated is preferred for SVM and KNN, and Temperature should be used with caution because of its measurable impact on Accuracy, F1, and sometimes AUC ROC, as reflected in Table 8.
3.4. Calibration metrics with uncertainty
We report cross-validated calibration performance for Uncalibrated, Platt, Isotonic, and Temperature using the Brier score, ECE with equal-width bins (K = 10), ECE with equal-frequency bins (K = 10), and Log Loss. Table 9 presents per-model means with 95% bootstrap CIs across folds. These tabulated intervals anchor the comparisons that follow and are the source for the error bars in the grouped plots.
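The intervals in Table 9 are percentile bootstrap CIs over per-fold metric values; a minimal sketch of that computation, using hypothetical fold values, is given below.

```python
import numpy as np

def bootstrap_ci(fold_values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a per-fold metric."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(fold_values, dtype=float)
    boot_means = np.array([rng.choice(vals, size=vals.size, replace=True).mean()
                           for _ in range(n_boot)])
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return vals.mean(), lower, upper

# Hypothetical per-fold Brier scores for one model-calibration pair
print(bootstrap_ci([0.0060, 0.0042, 0.0055, 0.0048, 0.0051]))
```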
Table 9: Calibration metrics with 95% bootstrap confidence intervals (2,000 resamples) by model and calibration state.
Model | Calibration | Brier | Brier 95% CI (lower – upper) | ECE (uniform, 10) | ECE (uniform, 10) 95% CI (lower – upper) | ECE (quantile, 10) | ECE (quantile, 10) 95% CI (lower – upper) | Log Loss | Log Loss 95% CI (lower – upper)
KNN | Uncalibrated | 0.0026 | 0.0 – 0.0075 | 0.0039 | 0.0 – 0.01 | 0.0039 | 0.0 – 0.01 | 0.007 | 0.0 – 0.0192
KNN | Platt | 0.0054 | 0.0019 – 0.0114 | 0.0308 | 0.0263 – 0.0352 | 0.0237 | 0.0185 – 0.029 | 0.0388 | 0.0274 – 0.0537
KNN | Isotonic | 0.0044 | 0.0009 – 0.0108 | 0.0146 | 0.0083 – 0.0211 | 0.0094 | 0.0036 – 0.0162 | 0.0211 | 0.0088 – 0.0393
KNN | Temperature | 0.0258 | 0.0199 – 0.0326 | 0.0287 | 0.0206 – 0.0388 | 0.0148 | 0.0102 – 0.0193 | 0.1228 | 0.068 – 0.1916
RF | Uncalibrated | 0.0058 | 0.0046 – 0.0078 | 0.0449 | 0.0422 – 0.049 | 0.0322 | 0.0316 – 0.0328 | 0.0484 | 0.0449 – 0.054
RF | Platt | 0.0048 | 0.0027 – 0.0083 | 0.0331 | 0.0289 – 0.0374 | 0.0223 | 0.0195 – 0.0256 | 0.0366 | 0.0303 – 0.0442
RF | Isotonic | 0.0042 | 0.0012 – 0.0095 | 0.0144 | 0.0104 – 0.0184 | 0.0098 | 0.0071 – 0.0133 | 0.0201 | 0.0111 – 0.0329
RF | Temperature | 0.0242 | 0.017 – 0.0306 | 0.0318 | 0.0257 – 0.0378 | 0.0201 | 0.0109 – 0.0308 | 0.1024 | 0.076 – 0.1339
XGB | Uncalibrated | 0.0135 | 0.0119 – 0.0152 | 0.0639 | 0.0592 – 0.069 | 0.0497 | 0.046 – 0.0534 | 0.0764 | 0.0716 – 0.0812
XGB | Platt | 0.0092 | 0.0074 – 0.0112 | 0.0438 | 0.0382 – 0.0496 | 0.0307 | 0.0261 – 0.0371 | 0.0534 | 0.0484 – 0.0574
XGB | Isotonic | 0.007 | 0.0044 – 0.0096 | 0.0241 | 0.0204 – 0.0294 | 0.0147 | 0.011 – 0.0194 | 0.0311 | 0.0248 – 0.0372
XGB | Temperature | 0.0308 | 0.0216 – 0.04 | 0.0385 | 0.0317 – 0.0444 | 0.0311 | 0.0268 – 0.0388 | 0.1453 | 0.1089 – 0.1871
SVM | Uncalibrated | 0.0065 | 0.002 – 0.0132 | 0.0226 | 0.0157 – 0.0307 | 0.0214 | 0.0133 – 0.0299 | 0.0376 | 0.0204 – 0.061
SVM | Platt | 0.0125 | 0.0094 – 0.0174 | 0.0594 | 0.0512 – 0.0664 | 0.0452 | 0.0312 – 0.0567 | 0.075 | 0.0668 – 0.0861
SVM | Isotonic | 0.0087 | 0.0056 – 0.0128 | 0.0337 | 0.0309 – 0.0365 | 0.0268 | 0.0221 – 0.0313 | 0.0442 | 0.0376 – 0.052
SVM | Temperature | 0.0365 | 0.0304 – 0.0412 | 0.0426 | 0.0368 – 0.0484 | 0.0411 | 0.0322 – 0.05 | 0.1675 | 0.1266 – 0.2111
LR | Uncalibrated | 0.0944 | 0.088 – 0.1002 | 0.0646 | 0.0575 – 0.0745 | 0.0571 | 0.0505 – 0.0637 | 0.3171 | 0.2912 – 0.34
LR | Platt | 0.0957 | 0.0906 – 0.1007 | 0.0567 | 0.0446 – 0.0693 | 0.0645 | 0.0546 – 0.0746 | 0.3182 | 0.3001 – 0.3352
LR | Isotonic | 0.0905 | 0.0842 – 0.0962 | 0.055 | 0.0511 – 0.0589 | 0.0482 | 0.0415 – 0.0539 | 0.3018 | 0.2784 – 0.3194
LR | Temperature | 0.0975 | 0.0922 – 0.1027 | 0.0593 | 0.0497 – 0.0697 | 0.056 | 0.0462 – 0.0655 | 0.3259 | 0.3062 – 0.3455
NB | Uncalibrated | 0.1492 | 0.1365 – 0.1634 | 0.146 | 0.1314 – 0.1649 | 0.1348 | 0.1191 – 0.148 | 1.51 | 1.2434 – 1.7586
NB | Platt | 0.1291 | 0.1201 – 0.1381 | 0.0545 | 0.0407 – 0.0715 | 0.0942 | 0.0759 – 0.1117 | 0.4222 | 0.4009 – 0.4453
NB | Isotonic | 0.1196 | 0.1105 – 0.1308 | 0.0621 | 0.0498 – 0.0784 | 0.0534 | 0.0425 – 0.0637 | 0.3839 | 0.3556 – 0.4166
NB | Temperature | 0.1248 | 0.1134 – 0.1382 | 0.0741 | 0.0542 – 0.0893 | 0.0689 | 0.057 – 0.0771 | 0.4487 | 0.3869 – 0.5153
As shown in Figure 11 (Brier score with 95% CIs), the tree ensembles benefit the most from Isotonic. For Random Forest, Brier drops from 0.0058 uncalibrated to 0.0042 with Isotonic, while Platt and Temperature are higher at 0.0048 and 0.0242. For XGBoost, Brier improves from 0.0135 uncalibrated to 0.0070 with Isotonic, with Platt at 0.0092 and Temperature at 0.0308. Naive Bayes shows a large reduction relative to its baseline, from 0.1492 uncalibrated to 0.1196 with Isotonic. Support Vector Machine and K-Nearest Neighbors are best Uncalibrated on Brier, at 0.0065 and 0.0026 respectively, and Temperature is the worst state for both.
Figure 11: Brier score across models and calibration states with 95% CIs.
Turning to Figure 12 (ECE with equal-width bins, K = 10), Random Forest falls from 0.0449 uncalibrated to 0.0144 with Isotonic, and XGBoost from 0.0639 to 0.0241. Naive Bayes improves from 0.146 to the 0.055–0.062 range under Platt or Isotonic. KNN is already very low uncalibrated at 0.0039, and all calibrators increase uniform ECE. SVM shows mixed behavior, with Temperature giving a lower uniform ECE than Platt, yet Brier and Log Loss still favor the uncalibrated state.
The sensitivity of ECE to the binning approach is clear in Figure 13 (ECE with equal-frequency bins, K = 10). Absolute values are smaller and intervals are tighter because bins carry similar counts. Random Forest improves from 0.0322 uncalibrated to 0.0098 with Isotonic, and XGBoost from 0.0497 to 0.0147. Naive Bayes drops from 0.1348 to 0.0534 with Isotonic, while Platt sits near 0.0942. KNN remains best uncalibrated at 0.0039, with Isotonic at 0.0094 and Temperature at 0.0148 above that. SVM is lowest Uncalibrated at 0.0214 and rises under calibration: Isotonic 0.0268, Temperature 0.0411, Platt 0.0452.
Likelihood trends in Figure 14 (Log Loss with 95% CIs) reinforce the Brier-score pattern, with Temperature worsening Log Loss for most models. Random Forest moves from 0.0484 uncalibrated to 0.0201 with Isotonic. XGBoost drops from 0.0764 to 0.0311. Naive Bayes is most erratic, improving from 1.51 uncalibrated to 0.3839 with Isotonic and 0.4222 with Platt. KNN and SVM are best Uncalibrated at 0.0070 and 0.0376, and Temperature increases loss across models. Logistic Regression improves modestly, from 0.3171 uncalibrated to 0.3018 with Isotonic.

Figure 12: Expected Calibration Error with equal-width bins, K = 10, across models and calibration states with 95% CIs.
Figure 13: Expected Calibration Error with equal-frequency bins, K = 10, across models and calibration states with 95% CIs.
Figure 14: Log Loss across models and calibration states with 95% CIs.
The statistical check in Figure 15 (Spiegelhalter's Z and p) complements the aggregate metrics. Values near Z = 0 with p > 0.05 indicate no detectable miscalibration at fold scale. Random Forest stays near zero across states with p ≈ 0.50–0.65, and XGBoost shows Z ≈ 0.25–0.71 with p ≈ 0.40–0.81. Naive Bayes improves from Z = -3.14, p = 0.234 uncalibrated to Z ≈ -0.08 to -0.18 with p ≈ 0.54–0.69 after calibration, consistent with its large reductions in Brier and Log Loss. KNN sits around Z ≈ 0.66–1.05 with p ≈ 0.39–0.66, which matches its already strong Brier and Log Loss when uncalibrated and the lack of benefit from calibration. SVM shows Z ≈ 0.02–0.67 and p ≈ 0.38–0.56, again echoing the mixed ECE behavior and the preference for the uncalibrated state. Logistic Regression remains close to zero, with Z from -0.16 to 0.41 and p ≈ 0.49–0.68, in line with small but consistent gains under Isotonic.
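For reference, Spiegelhalter's Z can be computed directly from out-of-fold probabilities and outcomes; the following is a minimal sketch of the standard statistic rather than our exact evaluation code.

```python
import numpy as np
from scipy.stats import norm

def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter's Z test for calibration; values near 0 with a large
    two-sided p suggest no detectable miscalibration."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_prob, dtype=float)
    numerator = np.sum((y - p) * (1 - 2 * p))
    denominator = np.sqrt(np.sum((1 - 2 * p) ** 2 * p * (1 - p)))
    z = numerator / denominator
    return z, 2 * (1 - norm.cdf(abs(z)))
```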
We further conducted statistical comparison tests using permutation p-values between pre- and post-calibration metrics, setting the number of permutations to 20,000 and the number of bootstrap resamples to 2,000. Table 10 reports changes calculated as calibrated minus uncalibrated for each metric, where negative deltas indicate improvement, with permutation p-values computed on fold-matched resamples.
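A paired permutation test on fold-matched deltas can be sketched as a sign-flip procedure, shown below with hypothetical deltas; this illustrates the general approach rather than reproducing our implementation.

```python
import numpy as np

def paired_permutation_p(deltas, n_perm=20000, seed=0):
    """Two-sided sign-flip permutation test of mean(delta) = 0 on fold-matched differences."""
    rng = np.random.default_rng(seed)
    d = np.asarray(deltas, dtype=float)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm_stats = np.abs((signs * d).mean(axis=1))
    return (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)  # add-one so p is never exactly 0

# Hypothetical calibrated-minus-uncalibrated Brier deltas across five folds
print(paired_permutation_p([-0.004, -0.006, -0.005, -0.007, -0.003]))
```

If the permutation scheme flips the signs of five fold-matched deltas, the smallest attainable two-sided p is about 2/2⁵ ≈ 0.0625, which would be consistent with the clustering of p-values near 0.06 in Table 10.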

Table 10: Statistical comparison between pre- and post-calibration metrics using permutation p-values.
Model | Calibration vs Uncalibrated | Brier Δ (Cal – Uncal) | Permutation p (Brier) | ECE (uniform, 10) Δ (Cal – Uncal) | Permutation p (ECE uniform, 10) | ECE (quantile, 10) Δ (Cal – Uncal) | Permutation p (ECE quantile, 10) | Log Loss Δ (Cal – Uncal) | Permutation p (Log Loss)
KNN | Platt | 0.0028 | 0.0626 | 0.0269 | 0.0632 | 0.0198 | 0.0624 | 0.0318 | 0.0682
KNN | Isotonic | 0.0018 | 0.0608 | 0.0107 | 0.0684 | 0.0055 | 0.0638 | 0.0141 | 0.0624
KNN | Temperature | 0.0232 | 0.0633 | 0.0248 | 0.0637 | 0.0109 | 0.1284 | 0.1158 | 0.0618
RF | Platt | -0.001 | 0.2537 | -0.0119 | 0.0601 | -0.0099 | 0.0566 | -0.0118 | 0.0637
RF | Isotonic | -0.0016 | 0.3717 | -0.0305 | 0.0612 | -0.0224 | 0.0604 | -0.0283 | 0.0664
RF | Temperature | 0.0184 | 0.0611 | -0.0131 | 0.0605 | -0.0121 | 0.1826 | 0.054 | 0.0604
XGB | Platt | -0.0043 | 0.0654 | -0.0202 | 0.0624 | -0.019 | 0.0605 | -0.0231 | 0.0611
XGB | Isotonic | -0.0065 | 0.0613 | -0.0398 | 0.064 | -0.0349 | 0.0625 | -0.0453 | 0.0612
XGB | Temperature | 0.0173 | 0.0616 | -0.0254 | 0.0642 | -0.0185 | 0.1278 | 0.0688 | 0.0632
SVM | Platt | 0.006 | 0.06 | 0.0368 | 0.0626 | 0.0238 | 0.0637 | 0.0374 | 0.0625
SVM | Isotonic | 0.0022 | 0.3037 | 0.0111 | 0.0618 | 0.0054 | 0.1889 | 0.0065 | 0.4374
SVM | Temperature | 0.03 | 0.0622 | 0.02 | 0.1236 | 0.0197 | 0.1863 | 0.1299 | 0.0634
LR | Platt | 0.0013 | 0.0637 | -0.0079 | 0.1285 | 0.0074 | 0.1236 | 0.0011 | 1
LR | Isotonic | -0.0039 | 0.0644 | -0.0096 | 0.0611 | -0.0089 | 0.1241 | -0.0153 | 0.0637
LR | Temperature | 0.0031 | 0.1859 | -0.0053 | 0.5643 | -0.0011 | 0.8708 | 0.0088 | 0.0625
NB | Platt | -0.0201 | 0.0589 | -0.0915 | 0.0619 | -0.0406 | 0.0632 | -1.0878 | 0.0628
NB | Isotonic | -0.0296 | 0.0589 | -0.0838 | 0.063 | -0.0814 | 0.0599 | -1.126 | 0.0612
NB | Temperature | -0.0244 | 0.0609 | -0.0719 | 0.0662 | -0.0659 | 0.0633 | -1.0613 | 0.0622
For Random Forest, Isotonic delivers coherent gains across all metrics; for example, ECE with equal-width bins falls by 0.0305 and ECE with equal-frequency bins by 0.0224 with p ≈ 0.06, and Log Loss drops by 0.0283 with similar uncertainty. XGBoost shows the same direction with larger magnitudes: ECE with equal-width bins falls by 0.0398, ECE with equal-frequency bins by 0.0349, and Log Loss by 0.0453, all with p near 0.06. Naive Bayes exhibits the largest changes in this study, moving from poor raw calibration to materially lower error after Isotonic: Brier decreases by 0.0296, ECE with equal-width bins by 0.0838, ECE with equal-frequency bins by 0.0814, and Log Loss by 1.126, again with p around 0.06.
In contrast, K-Nearest Neighbors and Support Vector Machine are best left uncalibrated, since all calibrators raise error on most metrics, for example KNN Log Loss increases by 0.0318 with Platt and by 0.1158 with Temperature, while SVM ECE with equal-width increases by 0.0368 with Platt and by 0.020 with Temperature. Logistic Regression shows only small, mostly favorable shifts under Isotonic, for example ECE with equal width decreases by 0.0096 and Log Loss by 0.0153, while Platt and Temperature are mixed or neutral. The p-values cluster near 0.06, so the direction and coherence across metrics carry the interpretation. Where effects are large and consistent, as in Naive Bayes and the two ensembles with Isotonic, the conclusion is strong. Where effects are small or mixed, as in Logistic Regression, claims should be conservative.
To explore the relationship between calibration and prediction quality, we plotted Expected Calibration Error (ECE) against the Brier score for all model-calibration combinations (Figures 16 and 17). Ideally, well-calibrated and accurate models should lie close to the diagonal line, where ECE and Brier score are proportionally aligned. Figure 16 shows ECE (uniform, K = 10) against the Brier score for every model-calibration pair, with a 45° reference line for proportional agreement. Points in the lower left indicate both low Brier and low ECE. XGBoost and Random Forest cluster close to this region under isotonic and Platt, consistent with the grouped bar results that showed small Brier and small ECE after calibration. Logistic Regression sits mid-left, where Brier is modest and ECE varies by method, with isotonic typically lowest. K-Nearest Neighbors and Support Vector Machine show larger spread, and their uncalibrated states lie below the diagonal with small Brier but noticeably higher ECE, matching reliability curves that showed local miscalibration at low and mid probabilities. Naive Bayes forms the upper-right cloud, reflecting both high Brier and high ECE when uncalibrated, with clear leftward and downward shifts after calibration.
Repeating the plot with quantile binning reduces ECE values across most points while preserving the relative ordering (Figure 17). This mirrors the sensitivity analysis, where quantile ECE was systematically lower than uniform ECE. Tree models remain in the lower-left quadrant, Logistic Regression shifts slightly and moves closer to the diagonal under isotonic, and KNN continues to show higher ECE than its Brier alone would suggest in the uncalibrated and Platt states. Naive Bayes still separates from the rest, but calibration methods shift it downward and to the left. The consistency of these patterns across both binning schemes supports the conclusion that models with better Brier also tend to have better calibration, while ECE exposes cases where an apparently small Brier can hide meaningful miscalibration.
Figure 16: ECE (uniform, K = 10) versus Brier score for all model-calibration pairs, with a 45° reference line.
Figure 17: ECE (quantile, K = 10) versus Brier score for all model-calibration pairs, with a 45° reference line.
3.5. Sharpness of predicted probabilities
Sharpness, measured as the variance of predicted probabilities, summarizes how concentrated a model’s probabilities are. Larger variance means more confident predictions; smaller variance means flatter, more conservative outputs.
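Written out, the quantity reported in the Sharpness (Var) column of Table 8 is simply the variance of the predicted probabilities of the positive class (shown here with the population normalization; a sample normalization changes the values only marginally):

```latex
\mathrm{Sharpness} \;=\; \operatorname{Var}(\hat{p}) \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{p}_i-\bar{p}\bigr)^{2},
\qquad \bar{p} \;=\; \frac{1}{N}\sum_{i=1}^{N}\hat{p}_i .
```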
Across all conditions, KNN is the sharpest. The uncalibrated KNN attains the highest variance at 0.249, and sharpness remains high after calibration, 0.240 with isotonic and 0.230 with temperature, with a modest reduction under Platt to 0.223. Tree ensembles are also highly sharp, but their behavior differs by calibration method. Random Forest rises from 0.211 uncalibrated to 0.239 with isotonic, with smaller values for Platt (0.222) and temperature (0.226). XGBoost shows a similar pattern: 0.196 uncalibrated, 0.231 isotonic, 0.214 temperature, 0.212 Platt. These results indicate that isotonic leaves ensemble predictions confident, while Platt and temperature introduce mild smoothing.
For margin-based and linear models, calibration tends to smooth more. SVM drops from 0.232 uncalibrated to 0.223 with isotonic, 0.207 with temperature, and 0.199 with Platt. Logistic Regression moves from 0.157 uncalibrated to 0.164 with isotonic, 0.150 with temperature, and 0.139 with Platt. Naive Bayes exhibits the largest reduction, from 0.229 uncalibrated to 0.166 with temperature, 0.134 with isotonic, and 0.102 with Platt, consistent with its strong decreases in ECE and Log Loss in Table 9.
Isotonic often preserves or slightly increases sharpness for the ensembles while reducing ECE and Log Loss, suggesting better-positioned confidence without blunting predictions. Platt and temperature systematically soften LR, SVM, and NB, which can be desirable when the uncalibrated model is overconfident, as evidenced by their reliability curves in Figures 6–9 and Spiegelhalter's statistics in Figure 15.
4. Interpretation of Results
This study demonstrates the impact of post-hoc calibration methods on model confidence, calibration quality, and statistical reliability in heart disease prediction. Isotonic regression remained the most effective calibrator for several models, but its advantage was model-dependent. In our cross-validated analysis, Random Forest, XGBoost, Logistic Regression, and Naive Bayes showed consistent improvements under isotonic calibration across Brier, ECE, and Log Loss, while Support Vector Machine and K-Nearest Neighbors were best left uncalibrated on the calibration metrics and likelihood, with temperature scaling often worsening discrimination. These conclusions are supported by the grouped calibration plots with 95% confidence intervals and the permutation tests that compare calibrated to uncalibrated fold by fold (Tables 8-10, Figures 11-15). As an illustration, Random Forest’s ECE and Log Loss decrease substantially under isotonic relative to uncalibrated in the grouped plots, and Naive Bayes exhibits the largest drops among all models. These effects are mirrored by near-zero Spiegelhalter Z with higher p after calibration in several models, which indicates no detectable miscalibration at fold scale while recognizing that non-significant p does not prove perfect calibration [61].

These findings support the theory that sigmoid calibration is most suitable when miscalibration is close to a logistic shift, whereas isotonic regression can correct more complex, monotone distortions [7], [9]. Temperature scaling provides a single-parameter softness control, but it shifted Accuracy and F1 across all models and frequently increased Log Loss, so it should be applied with caution here [5]. The comparative nature of our analysis is crucial: we based inferences on cross-validated fold means with confidence intervals and on paired permutation tests that quantify whether calibrated metrics are better than uncalibrated under the matched-fold design, directly addressing the need for statistical comparison rather than isolated point estimates.
In clinical applications, where predicted risks inform communication and thresholds, miscalibrated models can convey inappropriate levels of confidence, complicating risk discussions and the consistency of threshold-based decisions without necessarily improving patient-level utility. For example, Naive Bayes before calibration produced extreme probabilities that were poorly aligned with outcomes, which post-hoc calibration corrected, lowering Brier and Log Loss and moving Z and p toward values consistent with good calibration. This highlights the need for calibration pipelines in AI-assisted diagnostics to improve trustworthiness and reduce the risk that probability outputs misrepresent uncertainty [72]. Reliability diagrams built from out-of-fold predictions with Wilson intervals and per-bin counts further illustrate these corrections while avoiding test-set leakage [5], [56]. Together with the sharpness analysis, this shows when confidence is well aligned with observed risk and when it is not.
A key methodological contribution is the joint use of multiple calibration summaries; guidance on the clinical presentation of calibration and on reporting practices supports this multi-metric approach [52]. Previous work often reported only one metric, such as Brier or ECE [5], [73]. We combined Brier, ECE, Log Loss, Spiegelhalter's Z and p-value, and sharpness across six classifiers, and we visualized their relationships with grouped plots and Brier-versus-ECE scatterplots. The scatterplots show that points move down and left after isotonic for the tree ensembles and Naive Bayes, indicating lower calibration error and lower probabilistic loss, while SVM and KNN tend to cluster closer to their uncalibrated states, consistent with their preference to remain uncalibrated. The ECE sensitivity analysis confirms that equal-frequency binning yields smaller ECE than equal-width binning on average, with a positive median difference and a paired-test p below the conventional threshold. We therefore report both ECE variants, interpret their magnitudes cautiously, and base primary claims on the convergence of multiple metrics rather than on a single summary [5], [56].
Another contribution of this work is a reproducible evaluation framework for post-hoc calibration in binary heart disease prediction that couples strict leakage control with fold-conscious uncertainty and paired comparative testing. Some models, notably Naive Bayes and Random Forest, benefit substantially from isotonic calibration, while others, such as KNN and SVM, do not. By reporting sharpness alongside calibration, we examine both correctness and the dispersion of confidence, which is essential for risk stratification and model auditability [74]. Throughout, all preprocessing, threshold selection by Youden's J inside an inner loop, and calibration were fit on training data only, never on the test set, which reduces optimistic bias and supports statistically valid inference [44], [75], [76].
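As an illustration of the thresholding step, Youden's J can be maximized over the ROC operating points of out-of-fold training predictions; the sketch below assumes such arrays are available and uses our own variable names.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_train_oof, p_train_oof):
    """Probability cutoff maximizing Youden's J = sensitivity + specificity - 1 (i.e. TPR - FPR)."""
    fpr, tpr, thresholds = roc_curve(y_train_oof, p_train_oof)
    return thresholds[np.argmax(tpr - fpr)]
```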
From an operational standpoint, the calibration procedures used here are lightweight and feasible to maintain. Platt and temperature scaling add negligible compute at inference and only a small fitting cost on held-out training predictions, while isotonic regression remains inexpensive on structured clinical feature data. For integration, the same nested cross-validated approach can be embedded in routine retraining to provide continuous calibration as data drift is detected, for example by monitoring ECE and Log Loss on recent cases and triggering recalibration when control limits are exceeded. Because probability calibration can change subgroup error profiles, fairness should be checked before and after calibration, for instance by reporting calibration curves, ECE, and Brier stratified by demographic groups, and by tracking stability under distribution shift. In our setting, the per-model recommendations are actionable: isotonic for tree ensembles and Naive Bayes, uncalibrated for SVM and KNN, and cautious use of temperature scaling. This preserves inference speed and aligns with a periodic recalibration policy that is straightforward to implement in clinical pipelines.
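A lightweight version of the monitoring policy described above might look like the sketch below; the control limits and the function names are illustrative assumptions rather than values from this study, and ece_fn refers to a helper like the one sketched in Section 3.2.

```python
import numpy as np

def needs_recalibration(y_recent, p_recent, ece_fn, ece_limit=0.05, log_loss_limit=0.40):
    """Flag recalibration when ECE or Log Loss on a recent window exceeds its control limit."""
    y = np.asarray(y_recent, dtype=float)
    p = np.clip(np.asarray(p_recent, dtype=float), 1e-12, 1 - 1e-12)
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ece_fn(y, p) > ece_limit or log_loss > log_loss_limit
```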
This study is limited by the size of the dataset (N = 1,025), which can increase variability in binned metrics and in Z, even with Wilson intervals and cross-validated designs. We did not include an external cohort, so generalizability remains to be confirmed on independent populations. We focused on Platt, isotonic, and temperature scaling, leaving alternatives such as beta calibration or Bayesian binning to future work. We also did not include decision-curve analysis in the main results, which would connect calibrated probabilities to clinical net benefit, and we did not integrate model interpretability or explainability analysis. Future research should extend the framework to external and temporal validation, add decision-curve analysis under fixed thresholds selected by Youden's J, evaluate alternative calibrators, and incorporate explainability to link calibrated risk with feature attributions in support of clinical review.
5. Conclusion
This study evaluated the calibration performance of six classification models for heart disease prediction using post-hoc techniques and multiple uncertainty metrics. While several models achieved strong discrimination, their probability estimates were not always aligned with observed outcomes. This confirms the need to assess probability quality in addition to accuracy and AUC ROC.
Across methods and models, post-hoc calibration improved probability alignment in a model-dependent way. Isotonic regression yielded the most consistent gains in Brier score, ECE, and Log Loss for Random Forest, XGBoost, Logistic Regression, and Naive Bayes, with effects verified under cross-validated estimation, bootstrap intervals, and paired permutation tests. Spiegelhalter’s Z and p provided complementary evidence for absolute calibration, interpreted cautiously given sample size. In contrast, Support Vector Machine and K-Nearest Neighbors were best left uncalibrated on these metrics. Temperature scaling was included for completeness, but in this setting, it often increased Log Loss and affected discrimination.
The study contributes a reproducible calibration-evaluation framework for structured clinical predictors. Preprocessing, threshold selection via Youden’s J, and all calibrators were fit on training data within cross-validation, then applied to matched validation folds and only finally to the held-out test set. Reliability diagrams were built from out-of-fold predictions with Wilson intervals and bin counts. ECE was reported in two variants, equal-width and equal-frequency, and a paired sensitivity analysis showed lower values under quantile binning without changing the qualitative ranking. Sharpness was reported alongside calibration to characterize confidence concentration, helping to interpret when improvements reflect better aligned probabilities rather than simple smoothing.
These results indicate that isotonic calibration is a strong default for tree ensembles and Naive Bayes under this workflow, that Logistic Regression benefits from Isotonic, and that SVM and KNN may not require calibration. The framework balances calibration and discrimination by using a single threshold per model chosen with Youden’s J inside the training folds, which mirrors a stable operating policy. The overall recommendation is to evaluate calibration routinely with fold-aware uncertainty, to select the calibration method by empirical evidence on the target data, and to deploy periodic recalibration with monitoring for drift.
Conflict of Interest
The authors declare that no funding was received from any affiliated institution for this research. The work was conducted independently and the views expressed are solely those of the authors.
Acknowledgement
We would like to acknowledge Emmanuel Ughoo (Independent Researcher, Southampton, UK) for his contributions to the conceptualization of this work, including early-stage brainstorming and shaping the initial study direction.
Data and Code Availability
The dataset and code supporting the findings of this study are available from the corresponding author on reasonable request.
References
- World Health Organization, “Cardiovascular diseases (CVDs),” World Health Organization, Jul. 2025.
- D. Dey, P. J. Slomka, P. Leeson, D. Comaniciu, M. L. Bots, “Artificial intelligence in cardiovascular imaging: JACC state-of-the-art review,” Journal of the American College of Cardiology, vol. 73, no. 11, pp. 1317–1335, 2019, doi: 10.1016/j.jacc.2018.12.054.
- S. Srinivasan, S. Gunasekaran, S. K. Mathivanan, “An active learning machine technique-based prediction of cardiovascular heart disease from UCI-repository database,” Scientific Reports, vol. 13, art. no. 13588, 2023, doi: 10.1038/s41598-023-40717-1.
- S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, N. Qureshi, “Can machine-learning improve cardiovascular risk prediction using routine clinical data?,” PLOS ONE, vol. 12, no. 4, art. no. e0174944, 2017, doi: 10.1371/journal.pone.0174944.
- C. Guo, G. Pleiss, Y. Sun, K. Q. Weinberger, “On calibration of modern neural networks,” in Proc. 34th Int. Conf. Machine Learning (ICML), 2017, pp. 1321–1330, doi: 10.48550/arXiv.1706.04599.
- H. Jiang, B. Kim, M. Y. Guan, M. Gupta, “To trust or not to trust a classifier,” in Proc. 32nd Int. Conf. Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2018, pp. 5546–5557.
- B. Zadrozny, C. Elkan, “Transforming classifier scores into accurate multiclass probability estimates,” in Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2002, pp. 694–699, doi: 10.1145/775047.775151.
- S. Tonekaboni, S. Joshi, M. D. McCradden, A. Goldenberg, “What clinicians want: contextualizing explainable machine learning for clinical end use,” in Proc. 4th Machine Learning for Healthcare Conf. (PMLR, vol. 106), 2019, pp. 359–380. [Online]. Available: https://proceedings.mlr.press/v106/tonekaboni19a.html
- A. Niculescu-Mizil, R. Caruana, “Predicting good probabilities with supervised learning,” in Proc. 22nd Int. Conf. Machine Learning (ICML), 2005, pp. 625–632, doi: 10.1145/1102351.1102430.
- M. M. Ali, B. K. Paul, K. Ahmed, F. M. Bui, J. M. W. Quinn, and M. A. Moni, “Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison,” Computers in Biology and Medicine, vol. 136, art. no. 104672, 2021, doi: 10.1016/j.compbiomed.2021.104672.
- S. Ghosh, M. A. Islam, “Performance evaluation and comparison of heart disease prediction using machine learning methods with elastic net feature selection,” American Journal of Applied Mathematics and Statistics, vol. 11, no. 2, pp. 35–49, 2023, doi: 10.12691/ajams-11-2-1.
- G. N. Ahmad, Shafiullah, H. Fatima, M. Abbas, O. Rahman, Imdadullah, M. S. Alqahtani, “Mixed machine learning approach for efficient prediction of human heart disease by identifying the numerical and categorical features,” Applied Sciences, vol. 12, no. 15, art. no. 7449, 2022, doi: 10.3390/app12157449.
- M. Sayadi, V. Varadarajan, F. Sadoughi, S. Chopannejad, M. Langarizadeh, “A machine learning model for detection of coronary artery disease using noninvasive clinical parameters,” Life, vol. 12, no. 11, art. no. 1933, 2022, doi: 10.3390/life12111933.
- L. Deng, K. Lu, H. Hu, “An interpretable LightGBM model for predicting coronary heart disease: Enhancing clinical decision-making with machine learning,” PLOS ONE, vol. 20, no. 9, art. no. e0330377, 2025, doi: 10.1371/journal.pone.0330377.
- H. El-Sofany, B. Bouallegue, Y. M. A. El-Latif, “A proposed technique for predicting heart disease using machine learning algorithms and an explainable AI method,” Scientific Reports, vol. 14, art. no. 23277, 2024, doi: 10.1038/s41598-024-74656-2.
- A. -D. Samaras, S. Moustakidis, I. D. Apostolopoulos, N. Papandrianos, E. Papageorgiou, “Classification models for assessing coronary artery disease instances using clinical and biometric data: an explainable man-in-the-loop approach,” Scientific Reports, vol. 13, art. no. 6668, 2023, doi: 10.1038/s41598-023-33500-9.
- T. Vu et al., “Machine learning model for predicting coronary heart disease risk: Development and validation using insights from a Japanese population-based study,” JMIR Cardio, vol. 9, art. no. e68066, 2025, doi: 10.2196/68066.
- M. U. Rehman, S. Naseem, A. U. R. Butt, “Predicting coronary heart disease with advanced machine learning classifiers for improved cardiovascular risk assessment,” Scientific Reports, vol. 15, art. no. 13361, 2025, doi: 10.1038/s41598-025-96437-1.
- G. M. Rao, D. Ramesh, V. Sharma, “AttGRU-HMSI: Enhancing heart disease diagnosis using hybrid deep learning approach,” Scientific Reports, vol. 14, art. no. 7833, 2024, doi: 10.1038/s41598-024-56931-4.
- J. You, Y. Guo, J. J. Kang, “Development of machine learning-based models to predict 10-year risk of cardiovascular disease: A prospective cohort study,” Stroke and Vascular Neurology, vol. 8, no. 6, pp. 475–485, 2023, doi: 10.1136/svn-2023-002332.
- C. Li et al., “Improving cardiovascular risk prediction through machine learning modelling of irregularly repeated electronic health records,” European Heart Journal – Digital Health, vol. 5, no. 1, pp. 30–40, 2024, doi: 10.1093/ehjdh/ztad058.
- J. W. Hughes, J. Tooley, J. T. Soto, “A deep learning-based electrocardiogram risk score for long term cardiovascular death and disease,” npj Digit. Med., vol. 6, art. no. 169, 2023, doi: 10.1038/s41746-023-00916-6.
- Y. Xi, H. Wang, N. Sun, “Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: A study involving 143,043 Chinese patients with hypertension,” Frontiers in Cardiovascular Medicine, vol. 9, art. no. 1025705, 2022, doi: 10.3389/fcvm.2022.1025705.
- S. Y. Cho, S. H. Kim, S. H. Kang, “Pre-existing and machine learning-based models for cardiovascular risk prediction,” Scientific Reports, vol. 11, art. no. 8886, 2021, doi: 10.1038/s41598-021-88257-w.
- R. Khera, J. Haimovich, N. C. Hurley et al., “Use of machine learning models to predict death after acute myocardial infarction,” JAMA Cardiology, vol. 6, no. 6, pp. 633–641, 2021, doi: 10.1001/jamacardio.2021.0122.
- L. R. Guarneros-Nolasco, N. A. Cruz-Ramos, G. Alor-Hernández, L. Rodríguez-Mazahua, J. L. Sánchez-Cervantes, “Identifying the main risk factors for cardiovascular diseases prediction using machine learning algorithms,” Mathematics, vol. 9, no. 20, art. no. 2537, 2021, doi: 10.3390/math9202537.
- L. Yang, H. Wu, X. Jin, “Study of cardiovascular disease prediction model based on random forest in eastern China,” Scientific Reports, vol. 10, art. no. 5245, 2020, doi: 10.1038/s41598-020-62133-5.
- A. M. Alaa, T. Bolton, E. Di Angelantonio, J. H. F. Rudd, M. van der Schaar, “Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants,” PLOS ONE, vol. 14, no. 5, art. no. e0213653, 2019, doi: 10.1371/journal.pone.0213653.
- S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, N. Qureshi, “Can machine-learning improve cardiovascular risk prediction using routine clinical data?,” PLOS ONE, vol. 12, no. 4, art. no. e0174944, 2017, doi: 10.1371/journal.pone.0174944.
- P. J. Rousseeuw, C. Croux, “Alternatives to the median absolute deviation,” Journal of the American Statistical Association, vol. 88, no. 424, pp. 1273–1283, Dec. 1993, doi: 10.1080/01621459.1993.10476408.
- C. Y. J. Peng, K. L. Lee, G. M. Ingersoll, “An introduction to logistic regression analysis and reporting,” Journal of Educational Research, vol. 96, no. 1, pp. 3–14, 2002, doi: 10.1080/00220670209598786.
- T. S. Furey et al., “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, vol. 16, no. 10, pp. 906–914, 2000, doi: 10.1093/bioinformatics/16.10.906.
- L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.
- T. Chen, C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD), 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
- T. Cover, P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967, doi: 10.1109/TIT.1967.1053964.
- G. H. John, P. Langley, “Estimating continuous distributions in Bayesian classifiers,” in Proc. 11th Conf. Uncertainty in Artificial Intelligence (UAI ’95), Montreal, QC, Canada, 1995, pp. 338–345, doi: 10.5555/2074158.2074196.
- M. Feurer, F. Hutter, “Hyperparameter optimization,” in Automated Machine Learning: Methods, Systems, Challenges, F. Hutter, L. Kotthoff, and J. Vanschoren, Eds. Cham, Switzerland: Springer, 2019, pp. 3–33, doi: 10.1007/978-3-030-05318-5_1.
- W. Nugraha, A. Sasongko, “Hyperparameter tuning on classification algorithm with grid search,” Sistemasi: Jurnal Sistem Informasi, vol. 11, no. 2, pp. 391–401, 2022, doi: 10.32520/stmsi.v11i2.1750.
- G. N. Ahmad, H. Fatima, S. Ullah, A. S. Saidi, “Efficient medical diagnosis of human heart diseases using machine learning techniques with and without GridSearchCV,” IEEE Access, vol. 10, pp. 80151–80173, 2022, doi: 10.1109/ACCESS.2022.3165792.
- A. Ogunpola, F. Saeed, S. Basurra, A. Albarrak, S. Qasem, “Machine learning-based predictive models for detection of cardiovascular diseases,” Diagnostics, vol. 14, no. 2, art. no. 144, 2024, doi: 10.3390/diagnostics14020144.
- Z. S. Dunias, B. Van Calster, D. Timmerman, A.-L. Boulesteix, M. Van Smeden, “A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study,” Statistics in Medicine, vol. 43, no. 6, pp. 1119–1134, 2024, doi: 10.1002/sim.9932.
- W. J. Youden, “Index for rating diagnostic tests,” Cancer, vol. 3, no. 1, pp. 32–35, 1950, doi: 10.1002/1097-0142(1950).
- R. Fluss, D. Faraggi, B. Reiser, “Estimation of the Youden Index and its associated cutoff point,” Biometrical Journal, vol. 47, no. 4, pp. 458–472, 2005, doi: 10.1002/bimj.200410135.
- S. Varma, R. Simon, “Bias in error estimation when using cross-validation for model selection,” BMC Bioinformatics, vol. 7, art. no. 91, 2006, doi: 10.1186/1471-2105-7-91.
- C. Nadeau, Y. Bengio, “Inference for the generalization error,” Machine Learning, vol. 52, pp. 239–281, 2003, doi: 10.1023/A:1024068626366.
- T. Saito, M. Rehmsmeier, “The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets,” PLOS ONE, vol. 10, no. 3, art. no. e0118432, 2015, doi: 10.1371/journal.pone.0118432.
- T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006, doi: 10.1016/j.patrec.2005.10.010.
- J. Davis, M. Goadrich, “The relationship between precision-recall and ROC curves,” in Proc. 23rd Int. Conf. Machine Learning (ICML), 2006, pp. 233–240, doi: 10.1145/1143844.1143874.
- E. W. Steyerberg, Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating, 2nd ed. Springer, 2019, doi: 10.1007/978-3-030-16399-0.
- D. Chicco, G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, vol. 21, art. no. 6, 2020, doi: 10.1186/s12864-019-6413-7.
- C. Penso, L. Frenkel, J. Goldberger, “Confidence calibration of a medical imaging classification system that is robust to label noise,” IEEE Transactions on Medical Imaging, vol. 43, no. 6, pp. 2050–2060, 2024, doi: 10.1109/TMI.2024.3353762.
- B. Van Calster, D. J. McLernon, M. van Smeden, “Calibration: The Achilles heel of predictive analytics,” BMC Medicine, vol. 17, art. no. 230, 2019, doi: 10.1186/s12916-019-1466-7.
- M. Kull, T. D. Filho, P. A. Flach, “Beyond sigmoids: How to obtain well-calibrated probabilities from binary classifiers with beta calibration,” Electronic Journal of Statistics, vol. 11, pp. 5052–5080, 2017, doi: 10.1214/17-EJS1338SI.
- R. E. Barlow, H. D. Brunk, “The isotonic regression problem and its dual,” Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, Mar. 1972, doi: 10.1080/01621459.1972.10481216.
- T. Gneiting, F. Balabdaoui, A. E. Raftery, “Probabilistic forecasts, calibration and sharpness,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 69, no. 2, pp. 243–268, Apr. 2007, doi: 10.1111/j.1467-9868.2007.00587.x.
- G. Bröcker, L. A. Smith, “Increasing the reliability of reliability diagrams,” Weather and Forecasting, vol. 22, no. 3, pp. 651–661, 2007, doi: 10.1175/WAF993.1.
- M. Assel, D. Sjoberg, A. Vickers, “The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models,” Diagnostic and Prognostic Research, vol. 1, 2017, doi: 10.1186/s41512-017-0020-3.
- D. R. Cox, “The regression analysis of binary sequences,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 20, no. 2, pp. 215–242, 1958, doi: 10.1111/j.2517-6161.1958.tb00292.x.
- B. Efron, R. J. Tibshirani, An Introduction to the Bootstrap, 1st ed. Chapman and Hall/CRC, 1994, doi: 10.1201/9780429246593.
- L. D. Brown, T. T. Cai, A. DasGupta, “Interval estimation for a binomial proportion,” Statistical Science, vol. 16, no. 2, pp. 101–133, 2001, doi: 10.1214/ss/1009213286.
- D. J. Spiegelhalter, “Probabilistic prediction in patient management and clinical trials,” Statistics in Medicine, vol. 5, no. 5, pp. 421–433, Sep. 1986, doi: 10.1002/sim.4780050506.
- R. A. Fisher, The Design of Experiments. Oliver & Boyd, 1935.
- P. Good, Permutation, Parametric, and Bootstrap Tests of Hypotheses, 3rd ed. Springer, 2005, doi: 10.1007/b138696.
- F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945, doi: 10.2307/3001968.
- O. Rainio, J. Teuho, R. Klén, “Evaluation metrics and statistical tests for machine learning,” Scientific Reports, vol. 14, art. no. 6086, 2024, doi: 10.1038/s41598-024-56706-x.
- E. F. Schisterman, N. J. Perkins, A. Liu, H. Bondell, “Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples,” Epidemiology, vol. 16, no. 1, pp. 73–81, 2005, doi: 10.1097/01.ede.0000147512.81966.ba.
- O. Rainio, J. Tamminen, M. S. Venäläinen, “Comparison of thresholds for a convolutional neural network classifying medical images,” International Journal of Data Science and Analytics, vol. 20, pp. 2093–2099, 2025, doi: 10.1007/s41060-024-00584-z.
- H.-T. Lin, C.-J. Lin, R. C. Weng, “A note on Platt’s probabilistic outputs for support vector machines,” Machine Learning, vol. 68, pp. 267–276, 2007, doi: 10.1007/s10994-007-5018-6.
- B. Böken, “On the appropriateness of Platt scaling in classifier calibration,” Information Systems, vol. 95, art. no. 101641, 2021, doi: 10.1016/j.is.2020.101641.
- M. P. Naeini, G. F. Cooper, M. Hauskrecht, “Obtaining well calibrated probabilities using Bayesian binning,” in Proc. 29th AAAI Conf. Artificial Intelligence (AAAI), Austin, TX, USA, 2015, pp. 2901–2907, doi: 10.1609/aaai.v29i1.9602.
- Y. Huang, W. Li, F. Macheret, R. A. Gabriel, L. Ohno-Machado, “A tutorial on calibration measurements and calibration models for clinical prediction models,” Journal of the American Medical Informatics Association, vol. 27, no. 4, pp. 621–633, Apr. 2020, doi: 10.1093/jamia/ocz228.
- E. W. Steyerberg et al., “Assessing the performance of prediction models: A framework for traditional and novel measures,” Epidemiology, vol. 21, no. 1, pp. 128–138, 2010, doi: 10.1097/EDE.0b013e3181c30fb2.
- X. Jiang, M. Osl, J. Kim, L. Ohno-Machado, “Calibrating predictive model estimates to support personalized medicine,” Journal of the American Medical Informatics Association, vol. 19, no. 2, pp. 263–274, 2012, doi: 10.1136/amiajnl-2011-000291.
- V. Kuleshov, N. Fenner, S. Ermon, “Accurate uncertainties for deep learning using calibrated regression,” in Proc. 35th Int. Conf. Machine Learning (ICML), 2018, pp. 2796–2804, doi: 10.48550/arXiv.1807.00263.
- D. Krstajic, L. J. Buturovic, D. E. Leahy, “Cross-validation pitfalls when selecting and assessing regression and classification models,” Journal of Cheminformatics, vol. 6, art. no. 10, 2014, doi: 10.1186/1758-2946-6-10.
- G. C. Cawley, N. L. C. Talbot, “On over-fitting in model selection and subsequent selection bias in performance evaluation,” Journal of Machine Learning Research, vol. 11, pp. 2079–2107, 2010.
- P. A. Odesola, A. A. Adegoke, I. Babalola, “Comparative Analysis of Supervised Machine Learning Models for PCOS Prediction Using Clinical Data,” Journal of Engineering Research and Sciences, vol. 4, no. 6, pp. 16–26, 2025, doi: 10.55708/js0406003.
- P. A. Odesola, A. A. Adegoke, I. Babalola, “Fire Type Classification in the USA Using Supervised Machine Learning Techniques,” Journal of Engineering Research and Sciences, vol. 4, no. 6, pp. 1–8, 2025, doi: 10.55708/js0406001.
- P. A. Odesola, A. A. Adegoke, I. Babalola, “AI-Driven Digital Transformation: Challenges and Opportunities,” Journal of Engineering Research and Sciences, vol. 4, no. 4, pp. 8–19, 2025, doi: 10.55708/js0404002.
- P. A. Odesola, A. A. Adegoke, I. Babalola, “Enhancing Python Code Embeddings: Fusion of Code2vec with Large Language Models,” Journal of Engineering Research and Sciences, vol. 4, no. 1, pp. 1–7, 2025, doi: 10.55708/js0401001.
- P. A. Odesola, A. A. Adegoke, I. Babalola, “Enhancing Mental Health Support in Engineering Education with Machine Learning and Eye-Tracking,” Journal of Engineering Research and Sciences, vol. 3, no. 10, pp. 69–75, 2024, doi: 10.55708/js0310007.
