How to Fix Automation Flakiness: Root Causes and Enterprise-Level Solutions
Journal of Engineering Research and Sciences, Volume 5, Issue 2, pp. 9–23, 2026; DOI: 10.55708/js0502002
Keywords: Automation Flakiness, CI/CD Pipelines, Multi-layer Stability Framework, AI-driven Flakiness Analytics, Enterprise Test Automation
(This article belongs to the Section Hardware and Architecture – Computer Science (HAC))
Automation flakiness is one of the most intractable barriers to dependable enterprise CI/CD: organizations can run more than 50 million tests daily, and a 5–10% flaky-test rate can spoil thousands of builds. This paper synthesizes empirical research and industrial case studies spanning UI, API, mobile, and data-pipeline testing to characterize the most prevalent root causes of flakiness, including asynchronous UI rendering, dynamic DOM changes, unstable test data, environment latency spikes, concurrency defects, and missing synchronization. It proposes a multi-layer stability framework that combines deterministic locator strategies, intelligent wait handling, resilient API contracts, controlled test-data management, and reliability patterns for cloud-based environments. The framework is complemented by AI-driven analytics in which execution telemetry, a flakiness probability score, and heatmaps are used to identify unstable tests early and to prioritize remediation effort. Case studies from large-scale settings report a reduction in flaky-test rates from 5–10% to under 1%, infrastructure savings of up to 2x, and a 30–50% reduction in diagnostic effort. The paper concludes with explicit future directions, including automated flakiness localization with greater than 95% accuracy, standardization of benchmarks and KPIs, and the integration of stability engineering with broader reliability and governance practices, so that test stability becomes a first-class SRE objective.
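To illustrate the deterministic-locator and intelligent-wait tactics named in the abstract, the following is a minimal sketch assuming a Selenium-based Python UI suite; the data-testid selectors, the submit_order helper, and the timeout value are illustrative and not taken from the paper.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def submit_order(driver: webdriver.Chrome, timeout: int = 10) -> None:
    """Click checkout only once the element is actually clickable.

    Deterministic locator: a dedicated data-testid attribute rather than a
    brittle XPath or an auto-generated CSS class that changes between builds.
    Explicit wait: poll for a concrete readiness condition instead of a fixed
    sleep, so the test neither races the asynchronous UI nor wastes time.
    """
    wait = WebDriverWait(driver, timeout)
    checkout = wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "[data-testid='checkout-button']"))
    )
    checkout.click()
    # Wait for the asynchronous confirmation banner instead of asserting immediately.
    wait.until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-testid='order-confirmed']"))
    )
```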
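The flakiness probability score used by the AI-driven analytics layer can be approximated from rerun telemetry. The sketch below assumes telemetry records of (test, revision, outcome) and scores each test by the fraction of revisions on which it both passed and failed; this is a deliberate simplification, and the flakiness_scores function and sample data are hypothetical rather than the paper's model.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def flakiness_scores(runs: Iterable[Tuple[str, str, bool]]) -> dict[str, float]:
    """Estimate a per-test flakiness probability from execution telemetry.

    `runs` yields (test_id, revision, passed) tuples. A revision on which the
    same test both passed and failed is treated as flaky evidence, since the
    code under test did not change between those runs. The score is the
    fraction of observed revisions that showed mixed outcomes.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test_id, revision, passed in runs:
        outcomes[(test_id, revision)].add(passed)

    revisions: dict[str, int] = defaultdict(int)
    mixed: dict[str, int] = defaultdict(int)
    for (test_id, _), results in outcomes.items():
        revisions[test_id] += 1
        if len(results) > 1:  # both True and False observed -> flaky signal
            mixed[test_id] += 1

    return {test_id: mixed[test_id] / revisions[test_id] for test_id in revisions}


# Example: test_login passed and failed on the same commit, so it scores 1.0.
telemetry = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),
    ("test_cart", "abc123", True),
    ("test_cart", "def456", True),
]
print(flakiness_scores(telemetry))  # {'test_login': 1.0, 'test_cart': 0.0}
```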
