Open Access Article
AI-Driven Data Lake Optimization: Integrating Quality Monitoring with Intelligent Physical Design Decisions
Independent Researcher, MPS in Data Science, University of Maryland Baltimore County, Baltimore, MD 21250, USA
*To whom correspondence should be addressed. E-mail: deva20829@gmail.com
Journal of Engineering Research and Sciences, Volume 5, Issue 3, Page # 1-13, 2026; DOI: 10.55708/js0503001
Keywords: Data Lake Optimization, Machine Learning, Reinforcement Learning, Data Quality Monitoring, Physical Database Design, Drift Detection
Received: 24 December 2025, Revised: 5 February 2026, Accepted: 9 February 2026, Published Online: 6 March 2026
(This article belongs to the Section Artificial Intelligence – Computer Science (AIC))
Cite
APA Style
Deva, S. and Chintacunta, S. N. R. (2026). AI-Driven Data Lake Optimization: Integrating Quality Monitoring with Intelligent Physical Design Decisions. Journal of Engineering Research and Sciences, 5(3), 1–13. https://doi.org/10.55708/js0503001
Chicago/Turabian Style
Sowjanya Deva and Surya Narayana Reddy Chintacunta. "AI-Driven Data Lake Optimization: Integrating Quality Monitoring with Intelligent Physical Design Decisions." Journal of Engineering Research and Sciences 5, no. 3 (March 2026): 1–13. https://doi.org/10.55708/js0503001
IEEE Style
S. Deva and S.N.R. Chintacunta, "AI-Driven Data Lake Optimization: Integrating Quality Monitoring with Intelligent Physical Design Decisions," Journal of Engineering Research and Sciences, vol. 5, no. 3, pp. 1–13, Mar. 2026, doi: 10.55708/js0503001.
References
- D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, “Hidden technical debt in machine learning systems,” Advances in Neural Information Processing Systems, vol. 28, pp. 2503–2511, 2015.
- J. Dixon, “Data lakes: a new generation of data repositories,” Proceedings of the ACM SIGMOD Workshop on Data Analytics in the Cloud, 2010.
- Sharma, V. Kumar, and R. Gupta, “Modern data lakes: a conceptual framework,” IEEE Access, vol. 9, pp. 127876–127891, 2021, doi:10.1109/ACCESS.2021.3112517.
- M. Armbrust, T. Das, S. Zhu, R. Xin, B. Ghodsi, J. Stoica, and M. Zaharia, “Delta Lake: high-performance ACID table storage,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 3411–3424, 2020, doi:10.14778/3415478.3415560.
- M. Armbrust, J. Shi, A. Jindal, G. K. Lee, K. Xin, M. Zaharia, and I. Stoica, “Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics,” Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2021.
- V. Prashanth, S. Das, J. Li, and V. Narasayya, “Apache Hudi: the case for incremental processing on big data,” IEEE Data Engineering Bulletin, vol. 44, no. 1, pp. 13–27, 2021.
- R. Blue, D. Petersohn, A. Reeves, and M. Rodgers, “Apache Iceberg: a modern table format for big data,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 3411–3424, 2020.
- T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis, “The case for learned index structures,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 489–504, 2018, doi:10.1145/3183713.3196909.
- R. Marcus, P. Negi, H. Mao, C. Zhang, N. Tatbul, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Polyzotis, “Neo: a learned query optimizer,” Proceedings of the VLDB Endowment, vol. 12, no. 11, pp. 1705–1718, 2019.
- A. Kipf, T. Kipf, B. Radke, and V. Markl, “Learned cardinalities: estimating correlated joins with deep learning,” Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2019.
- S. Chaudhuri and V. Narasayya, “An efficient cost-driven index selection tool for Microsoft SQL Server,” Proceedings of the VLDB Conference, pp. 146–155, 1997.
- N. Bruno and S. Chaudhuri, “Automatic physical database tuning: a relaxation-based approach,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 227–238, 2005, doi:10.1145/1066157.1066187.
- Schirmer, T. Neumann, and A. Kemper, “Workload-driven horizontal partitioning and pruning for large OLTP systems,” Proceedings of the IEEE ICDE Workshops, pp. 146–151, 2018.
- Z. Abedjan, L. Golab, and F. Naumann, “Data profiling,” Synthesis Lectures on Data Management, vol. 10, no. 4, pp. 1–154, 2018, doi:10.2200/S00838ED1V01Y201808DTM045.
- J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Computing Surveys, vol. 46, no. 4, pp. 1–37, 2014, doi:10.1145/2523813.
- R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: a survey,” arXiv preprint arXiv:1901.03407, 2019.
- A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mühlbauer, S. Tozer, and D. Stonebraker, “Self-driving database management systems,” Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2017.
- T. Kraska, M. Alizadeh, A. Beutel, E. H. Chi, A. Kristo, G. Leclerc, S. Madden, H. Mao, and V. Nathan, “SageDB: a learned database system,” Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2019.
- D. Van Aken, A. Pavlo, G. Gordon, and B. Zhang, “Automatic database management system tuning through large-scale machine learning,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1009–1024, 2017, doi:10.1145/3035918.3064029.
- K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, Apr. 2002, doi:10.1109/4235.996017.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015, doi:10.1038/nature14236.
- R. Sutton and A. Barto, Reinforcement learning: an introduction, 2nd ed., Cambridge, MA, USA: MIT Press, 2018.
- Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, “Dueling network architectures for deep reinforcement learning,” Proceedings of the International Conference on Machine Learning, pp. 1995–2003, 2016.
- D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” Proceedings of the International Conference on Learning Representations, 2015.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” Proceedings of the International Conference on Learning Representations, 2016.
- E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1–2, pp. 100–115, 1954, doi:10.1093/biomet/41.1-2.100.
- A. Bifet and R. Gavaldà, “Learning from time-changing data with adaptive windowing,” Proceedings of the SIAM International Conference on Data Mining, 2007.
- A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” Journal of Machine Learning Research, vol. 13, pp. 723–773, 2012.
- T. Chen and C. Guestrin, “XGBoost: a scalable tree boosting system,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016, doi:10.1145/2939672.2939785.
- D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” Proceedings of the International Conference on Learning Representations, 2014.
- S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997, doi:10.1162/neco.1997.9.8.1735.
- D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” Proceedings of the International Conference on Learning Representations, 2015.
- L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
- T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” Proceedings of the International Conference on Learning Representations, 2017.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Apache Spark: a unified engine for big data processing,” Communications of the ACM, vol. 59, no. 11, pp. 56–65, Nov. 2016, doi:10.1145/2934664.
- R. Marcus, P. Negi, H. Mao, C. Zhang, N. Tatbul, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Polyzotis, “Bao: making learned query optimization practical,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1275–1288, 2021.
