Explainable AI for SSD Failure Prediction: Using LIME and SHAP for Transparency
Journal of Engineering Research and Sciences, vol. 5, no. 4, pp. 1–16, 2026; DOI: 10.55708/js0504001
Keywords: Predictive Maintenance, SSD Failure Prediction, Model Interpretability, Explainable AI, LIME, SHAP
(This article belongs to the Section Artificial Intelligence – Computer Science (AIC))
Kumar, S. K. (2026). Explainable AI for SSD Failure Prediction: Using LIME and SHAP for Transparency. Journal of Engineering Research and Sciences, 5(4), 1–16. https://doi.org/10.55708/js0504001
Artificial intelligence (AI) has become increasingly important in modern data centers, automating tasks ranging from anomaly detection to predictive maintenance. A significant limitation of the underlying machine learning (ML) models, however, is their “black box” nature: this lack of transparency limits trust among stakeholders who require visibility into model decisions. We address this limitation by evaluating explainable AI techniques within an SSD failure prediction pipeline to improve interpretability and operational trust. Our study makes three main contributions. First, we provide a large-scale empirical evaluation of explainable AI techniques (LIME and SHAP) within an SSD failure prediction pipeline under realistic temporal validation and deployment constraints. Second, we provide a qualitative comparison of LIME and SHAP, focusing on their roles in local and global interpretability and their practical behavior in SSD failure prediction. Third, we analyze model performance from an operational perspective using a cost-sensitive framework, demonstrating how explainability supports decision-making in data center environments. To avoid temporal data leakage and assess model robustness, we evaluate our approach on a temporal split with 10,637,778 training records and 5,499,337 test records from the Alibaba dataset, which covers over 500,000 SSDs. By optimizing a custom "Safety-First" cost function at a decision threshold of 0.680, the tuned XGBoost model achieved a recall of 67.98%, a precision of 4.43%, and a false alarm rate of 0.1878, effectively functioning as a high-sensitivity screening tool. This approach yielded an estimated net operational savings of $13.42 million compared to baseline maintenance strategies. Finally, the findings show that LIME generates intuitive, human-readable justifications for individual predictions, while SHAP explains the model at both the global and local levels.
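The cost-sensitive thresholding described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the per-event cost figures are hypothetical placeholders and are unrelated to the values behind the reported $13.42M estimate.

```python
import numpy as np

def safety_first_cost(y_true, y_prob, threshold,
                      cost_missed_failure=5000.0,  # hypothetical cost of an undetected failure
                      cost_false_alarm=50.0):      # hypothetical cost of an unneeded inspection
    """Total operational cost of classifying at a given decision threshold.

    A "Safety-First" cost function penalizes missed failures far more
    heavily than false alarms, pushing the chosen threshold toward
    high recall. The cost values here are illustrative only.
    """
    y_pred = (y_prob >= threshold).astype(int)
    false_negatives = np.sum((y_true == 1) & (y_pred == 0))
    false_positives = np.sum((y_true == 0) & (y_pred == 1))
    return false_negatives * cost_missed_failure + false_positives * cost_false_alarm

def pick_threshold(y_true, y_prob, candidates=np.linspace(0.05, 0.95, 91)):
    # Scan candidate thresholds and keep the one with the lowest total cost.
    costs = [safety_first_cost(y_true, y_prob, t) for t in candidates]
    return candidates[int(np.argmin(costs))]
```

Because a missed failure is assumed to cost orders of magnitude more than a false alarm, minimizing this cost naturally trades precision for recall, which matches the high-sensitivity screening behavior the abstract describes.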
Integrating an explainable AI layer into ML pipelines turns “black box” models into systems that are easier to understand and verify, making them more trustworthy and reliable.
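LIME's local-surrogate idea, which underlies the human-readable per-prediction justifications mentioned above, can be sketched from scratch under simple assumptions: perturb the instance, weight the perturbed samples by proximity, and fit a weighted linear model whose coefficients serve as local feature attributions. This is a toy illustration using plain NumPy, not the `lime` library's actual algorithm (which also discretizes features and selects a sparse subset).

```python
import numpy as np

def lime_explain(predict_fn, x, num_samples=2000, kernel_width=0.75, seed=0):
    """Toy LIME-style local explanation for one instance.

    Perturbs `x` with Gaussian noise, weights each sample by an RBF
    proximity kernel, and fits a weighted linear surrogate by least
    squares; its coefficients act as local feature attributions.
    """
    rng = np.random.default_rng(seed)
    X = x + rng.normal(scale=0.5, size=(num_samples, x.shape[0]))
    y = predict_fn(X)                                  # query the black-box model
    dist = np.linalg.norm(X - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)       # proximity weights
    A = np.hstack([X, np.ones((num_samples, 1))])      # add intercept column
    sw = np.sqrt(w)[:, None]                           # sqrt-weight trick for WLS
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[:-1]                                   # drop the intercept term
```

SHAP takes a complementary route: it computes Shapley values for each prediction, and because those attributions are additive they can be aggregated across many predictions into the global feature-importance view the abstract refers to.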
