An Integrated Approach to Manage Imbalanced Datasets using PCA with Neural Networks
by Swarup Kumar Mondal *
and Anindya Sen
Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, 700107, India
* Author to whom correspondence should be addressed.
Journal of Engineering Research and Sciences, Volume 3, Issue 10, Pages 1–12, 2024; DOI: 10.55708/js0310001
Keywords: Imbalanced data, Regression, Deep Neural Network, Artificial Neural Network, Support Vector Machine
Received: 19 August 2024, Revised: 20 September 2024, Accepted: 21 September 2024, Published Online: 11 October 2024
(This article belongs to the Special Issue Special Issue on Multidisciplinary Sciences and Advanced Technology 2024 & Section Computer Science and Information Technology: Artificial Intelligence – Computer Science (AIC))
Handling imbalanced datasets in real time is one of the most challenging tasks in predictive modelling. This work addresses the critical issues arising in imbalanced datasets through the implementation of artificial neural network and deep neural network architectures. Conventional machine learning algorithms fail to achieve the desired throughput under certain input conditions because of mismatched class ratios in the sample dataset, and dealing with imbalanced data leads to performance degradation and interpretability issues in traditional ML architectures. For regression tasks, where the target variable is continuous, a skewed data distribution is a major issue. In this study, we present a detailed comparison of traditional ML algorithms and neural networks combined with a dimensionality reduction method to overcome this problem. Principal Component Analysis (PCA) has been used for feature selection and analysis on a real-time satellite-based air pollution dataset. Five regression algorithms, Multilinear, Ridge, Lasso, Elastic Net and SVM regression, are evaluated both with and without PCA to interpret the outcome. To address imbalanced datasets in real time, deep neural network and artificial neural network architectures have been developed, and each model's experiments and mathematical modelling are carried out independently. The deep neural network proves superior to the other conventional models on the performance measures of the target variable in imbalanced datasets.
1. Introduction
In the domain of artificial intelligence (AI), the persistent challenge of imbalanced datasets remains a focal point. Accuracy, a prime indicator of model performance, depends heavily on the balance of the dataset. This issue is of paramount importance in practical applications such as fraud detection [1], medical diagnosis [2], remote sensing [3], engineering [4] and anomaly detection [5], where the class of interest is typically a minority within the dataset. An imbalanced dataset is one with an uneven distribution of classes: one class, called the minority class, has substantially fewer instances than the others, called the majority class. Handling such datasets is difficult precisely because of these uneven class ratios. In an imbalanced dataset, the majority class may make up a large portion of the training data while the minority class is underrepresented. The problem with a model trained on such data is that it progressively learns to achieve high accuracy by consistently predicting the majority class, even when recognizing the minority class is equally or more important in a real-world scenario.
Many machine learning approaches have been widely applied across a variety of applications. Models such as Multilinear Regression (MLR), Ridge Regression (RR), Lasso Regression (LR) and Elastic Net Regression (ELR) are mainly used for regression tasks in diverse ways. However, traditional regression algorithms are not always sufficient to meet the requirements of large-scale raw datasets.
The neural network, which aims to identify underlying relationships in a batch of data by simulating how the human brain functions, is one of the most helpful algorithms available, and neural networks are among the most effective methods for fitting various types of stationary and non-stationary data. A degree of balance in the dataset is essential for neural networks to function well. However, the conventional assumption of balanced data becomes untenable in scenarios where the minority class is significantly underrepresented. This imbalance leads neural networks to exhibit poor performance, particularly in accurately predicting minority classes. The issue of data imbalance can also be addressed via deep learning models, which have been shown to considerably raise performance benchmarks in both regression and classification tasks. Neural network topologies keep evolving and improving in the 21st century, presenting an opportunity to enhance computational modeling capabilities and enable more accurate predictions in various data-driven use cases [6].
1.1. Hypothesis
On an imbalanced stationary dataset, a neural network-based architecture works more efficiently and produces better performance accuracy than multiple regression models (MLR, RR, LR, ELR).
1.2. Objective
This research implements a novel way to apply neural networks to imbalanced real-time satellite datasets. Multiple regression models are trained to analyze performance on real-time satellite data for Air Quality Index (AQI) prediction. The same dataset is used to implement Multilinear Regression (MLR), Ridge Regression (RR), Lasso Regression (LR), Elastic Net Regression (ELR), and Support Vector Machine Regression (SVM). In addition, the dataset is processed using Principal Component Analysis (PCA). Essentially, two neural network architectures, the Artificial Neural Network (ANN) and the Deep Neural Network (DNN), are compared with the four traditional regression models and SVM. The proposed method produced promising results that show how to analyze an imbalanced dataset effectively. Experiments demonstrate how the neural network approach efficiently improves model accuracy and the R-squared score (R2).
2. Literature Review
Earlier works that proposed different approaches and methods, theoretical and practical, for handling imbalanced datasets are discussed below:
A training procedure for Multilayer Perceptrons (MLPs) is implemented using an imbalanced dataset. This article [3] uses a remote-sensing dataset that includes agricultural classes such as potatoes, carrots, wheat, sugar beets, and stubble. The suggested method seeks to increase the stability of the classification results and accelerates training by an average factor of 41.5.
A modified learning method [6] for ANNs is presented to address the imbalanced Haberman's survival dataset. The technique applies Particle Swarm Optimization (PSO) at the artificial neural network's output layer to optimize the step function's decision boundary. The study demonstrates an increase in the average Geometric-Mean of classifier performance (80.16 for training and 70.47 for testing).
Three image datasets and five document datasets with different degrees of imbalance are used in an experiment [7]. It proposed a loss function called mean false error (MFE) and its enhanced counterpart, mean squared false error (MSFE), for training DNNs on unbalanced datasets. The theoretical analysis is empirically confirmed, and experiments and comparisons demonstrate the superiority of the proposed approach over traditional methods for classifying imbalanced datasets with deep neural networks, especially on extremely imbalanced data.
The creation of a system to simulate and predict maize and soybean yields on a county-by-county basis in the American Midwest and Great Plains using Artificial Neural Networks (ANN) has been studied [8]. It utilized multi-temporal remote sensing images to derive NDVI values, which characterized the entire growing process. The methodology employed a feed-forward multi-layer perceptron (MLP) neural network for learning and the SCE-UA method for training. The outcomes showed that multivariate linear regression (MLR) is 20% inferior to the ANN.
Seven severely unbalanced datasets [9] were gathered to assess how well various Support Vector Machine (SVM) modelling techniques work. Various "rebalance" procedures, including cost-sensitive learning and over- and under-sampling, were incorporated into SVM modelling to tackle the problem of class imbalance. It introduced the GSVM-RU algorithm, a state-of-the-art approach achieving 85.2 G-Mean, 91.4 AUC-ROC, 66.5 F-Measure and 181 Efficiency.
The work [10] focuses on the distinct problem of Deep Imbalanced Regression (DIR) and uses large-scale datasets from computer vision, natural language processing, and healthcare. Feature distribution smoothing (FDS) and label distribution smoothing (LDS) are two efficient techniques that the study proposes for handling imbalanced data with continuous targets. The MAE and G-Mean improved significantly, with gains ranging from 0.1 to 2.7.
A large-scale dataset is utilized covering face attribute classification and edge detection tasks, as well as controlled class imbalance in MNIST digit classification. A novel method [11] is proposed for learning deep feature embeddings that effectively handle imbalanced data classification. The approach involves quintuplet sampling and a triple-header hinge loss to enforce relationships during feature learning. The resulting LMLE-kNN outperformed other traditional methods by a large margin, with a mean per-class accuracy of 84.
To confirm the notion that models trained using Dense Loss perform better in underrepresented areas of the dataset than models built with a conventional training approach, synthetic datasets [12] with varied characteristics, such as heavy tails, are utilized. The findings demonstrated that an MLP with Dense Loss performs better than one without, with average Root Mean Square Error (RMSE) values ranging from 1.21 to 7.02.
According to a study [13] on small and medium-sized enterprises (SMEs) in Malaysia, such as those in the transportation, storage, catering, lodging, and hotel industries, the Synthetic Minority Oversampling Technique (SMOTE) is effective in improving the classification accuracy of logistic regression models when the data are highly unbalanced. The study used a dataset of 601 failed and 26,284 non-failed SMEs between 1999 and 2013. The findings demonstrated that, with 57.23% sensitivity and 58.83% specificity, the SMOTE logistic regression technique produced superior metrics compared to classical logistic regression.
The SMOGN algorithm [14] combines two oversampling techniques known as SMOTE and Gaussian noise. It is used as a preprocessing solution for unbalanced regression problems with the aim of improving the performance of regression algorithms. SMOGN performs exceptionally well with multivariate adaptive regression spline (MARS) and random forest (RF) learners, demonstrating enhanced recall without appreciable loss of precision, according to tests on 20 distinct regression datasets.
3. Materials and Methods
This section introduces the imbalanced stationary data regression algorithms and the experimental data sets used in this study.
3.1. Dataset
The dataset consists of satellite records of greenhouse gas (GHG) emissions in India. The initial dataset contains 2,589,083 records with a total of fifteen features: Stationid, PM2.5, PM10, NO, NO2, NOx, NH3, CO, SO2, O3, Benzene, Toluene, Xylene, AQI and AQI_Bucket. The raw data belongs to the stationary imbalanced category. A preprocessed version of the data containing 5,092 samples is utilized in this study; this version contains no missing features. The original dataset is open source and can be downloaded here: https://www.kaggle.com/datasets/rohanrao/air-quality-data-in-india?select=station_hour.csv
3.2. Operating system and software
The analytical code has been implemented and executed on a 64-bit operating system with an x64-based processor and 8 GB of RAM. Version 6.3.0 of Jupyter Notebook is utilized as the interactive programming environment. In addition, the Python scripts have been run on the Google Colab cloud platform, which provides an Intel(R) Xeon(R) CPU at 2.30 GHz, 12.7 GB of RAM, and 108 GB of disk space.
3.3. Data Preprocessing
The quality of the underlying data is the primary determinant of successful visualization and the construction of efficient machine learning (ML) models. Preprocessing is essential for increasing data quality, since it reduces noise, which boosts ML systems' processing speed and capacity for generalization. Two prevalent issues encountered in data extraction and monitoring applications are outliers and missing data. Different techniques have been employed to identify outlier values and handle not-a-number (NaN) values. Table 1 shows the distribution of missing values across the features of the dataset. Notably, Xylene emerges as the feature with the highest number of missing values, while NOx exhibits the least. To guarantee that the significance of the variables is not impacted by differences in their ranges or units, all rows with missing values have been removed from the dataset and standard scaler normalization has been applied (a minimal sketch of this step follows Table 1).
Table 1: Missing Values of Features.
| Features | Total Missing Values |
| --- | --- |
| PM2.5 | 647,689 |
| PM10 | 1,119,252 |
| NO | 553,711 |
| NO2 | 528,973 |
| NOx | 490,808 |
| NH3 | 1,236,618 |
| CO | 499,302 |
| SO2 | 742,737 |
| O3 | 725,973 |
| Benzene | 861,579 |
| Toluene | 1,042,366 |
| Xylene | 2,075,104 |
| AQI | 570,190 |
| AQI Bucket | 570,190 |
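A minimal Python sketch of this cleaning step (the file name follows the Kaggle dataset cited above; the exact pipeline used in the study may differ):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the raw station-hour records (file name per the Kaggle dataset).
df = pd.read_csv("station_hour.csv")

# Drop every row containing a missing value, as described above.
df = df.dropna()

# Standardize the pollutant features so that differing ranges and
# units do not distort their relative importance.
pollutants = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
              "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene"]
df[pollutants] = StandardScaler().fit_transform(df[pollutants])
```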
3.4. Feature Selection
To select significant features, the correlation of the AQI with the other features has been analyzed. Table 2 displays the precise correlation values between the AQI and each pollutant in the dataset; a short snippet after the table illustrates this screening. It is evident that the majority of the features have very little correlation with the AQI. The optimal number of input variables for building the machine learning models has been found using a feature selection method based on Principal Component Analysis (PCA). To improve computing efficiency and accuracy, the objective is to project the d-dimensional dataset onto a k-dimensional subspace, reducing its dimensionality.
Table 2: Correlation between AQI and Pollutants.
| Sl. No. | Features | Correlation value | Sl. No. | Features | Correlation value |
| --- | --- | --- | --- | --- | --- |
| 1 | PM2.5 | 0.786344 | 7 | CO | 0.432508 |
| 2 | PM10 | 0.757663 | 8 | SO2 | 0.135806 |
| 3 | NO | 0.288469 | 9 | O3 | 0.094589 |
| 4 | NO2 | 0.441733 | 10 | Benzene | 0.125644 |
| 5 | NOx | 0.426584 | 11 | Toluene | 0.169872 |
| 6 | NH3 | 0.283593 | 12 | Xylene | 0.090680 |
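The correlation screening of Table 2 can be reproduced along these lines (a sketch; `df` and `pollutants` come from the preprocessing snippet above):

```python
# Pearson correlation of each pollutant with the AQI target (cf. Table 2).
corr_with_aqi = df[pollutants + ["AQI"]].corr()["AQI"].drop("AQI")
print(corr_with_aqi.sort_values(ascending=False))
```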
3.5. Experiment Performed
3.5.1. Proposed Architecture
Figure 1 illustrates the workflow of our proposed architecture. It begins with dataset collection and data preprocessing. It then checks whether any missing values are present in the dataset. If there are none, the non-redundant dataset goes through the exploratory data analysis process. The next steps apply the features and target value to the various ML and neural network architectures. If the results are unstable or poor, the input goes through PCA with 2, 3 and 5 components to enhance model stability and handle the imbalanced real-time dataset. After PCA, the optimal features and target values are selected, the data is split, and regression is performed to produce improved results. This approach enhances model stability and interpretability and overcomes the difficulty of handling an imbalanced dataset with standard regression models.
Figure 1: Workflow of the proposed architecture.
3.6. Algorithm
Algorithm: PCA Combined Neural Network
Input: D → Complete dataset (Station hour Satellite dataset)
Output: Predicted AQI Value
3.7. Steps
1) Load and Clean Data
Load Dataset D, handle missing values and make the dataset non-redundant.
2) Perform Initial EDA and Model Evaluation
- Define features and target
- Split dataset into training (80%) and testing (20%)
- Apply the following 7 models on the imbalanced dataset: MLR, RR, LR, ELR, SVM, Vanilla ANN and Custom DNN
- Evaluate performance metrics and check whether the results are stable. If the results are poor, go to step 3.
3) Apply Custom PCA
- Standardize the dataset: \(X_{std} = \frac{X - \mu}{\sigma}\), where \(X\) is the original data, \(\mu\) is the mean and \(\sigma\) is the standard deviation.
- Compute Covariance Matrix: \(\Sigma = \frac{1}{n - 1} X_{std}^T X_{std}\), where \(\Sigma\) is the covariance matrix, \(X_{std}^T\) is the transpose of the standardized data, and \(n\) is the number of observations.
- Obtain Eigenvectors and Eigenvalues: \(\Sigma v = \lambda v\), where \(v\) denotes the eigenvectors and \(\lambda\) the eigenvalues.
- Select Top 𝑘 Eigenvectors: Sort eigenvalues in descending order and select the top 𝑘 eigenvectors corresponding to the largest eigenvalues. Construct the projection matrix 𝑊 from these eigenvectors.
- Transform Data: \(Y = X_{std} W\), where \(Y\) is the transformed data in the new \(k\)-dimensional space.
4) Perform Regression with PCA optimized data:
- Split PCA-transformed data into training (80%) and testing (20%) sets.
- Apply the following 7 models to the dataset \(Y\) with 2-component, 3-component and 5-component PCA: MLR, RR, LR, ELR, SVM, Vanilla ANN and Custom DNN
- Evaluate performance for each configuration
5) Compare Results: Compare performance metrics from the initial and PCA-transformed datasets to identify the improved configuration (a minimal sketch of the PCA step follows).
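A minimal NumPy sketch of steps 3a–3e (function and variable names are ours, not from the paper):

```python
import numpy as np

def custom_pca(X, k):
    """Project X onto its top-k principal components (steps 3a-3e)."""
    # Step 3a: standardize, X_std = (X - mu) / sigma.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    n = X_std.shape[0]
    # Step 3b: covariance matrix, Sigma = X_std^T X_std / (n - 1).
    cov = (X_std.T @ X_std) / (n - 1)
    # Step 3c: eigen-decomposition, Sigma v = lambda v
    # (eigh is appropriate for the symmetric covariance matrix).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 3d: sort eigenvalues in descending order and keep the top-k
    # eigenvectors as the projection matrix W.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]
    # Step 3e: transform, Y = X_std W.
    return X_std @ W

# custom_pca(X, k) for k in {2, 3, 5} yields the three PCA variants used below.
```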
3.7.1. Proposed Model
For this work, five regression models and two neural network-based architectures are introduced.
3.7.1.1. Multilinear Regression
Multilinear regression (MLR) is an analytical method for predicting the value of a specific metric by taking into account several independent variables. This algorithm is used to find the optimal straight line predicting the AQI value from the various parameters. First, we define the independent features as x1, x2, …, x10. Then, we define the dependent feature y, i.e., the AQI. It can be represented mathematically via equation 1.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon \tag{1}$$
For the nth observation, β0 is a constant term, also known as the y-intercept, and βn is the slope coefficient for each explanatory variable. The parameter ϵ stands for the statistical error that separates an observation from its predicted value.
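As an illustrative sketch, the coefficients of equation 1 can be estimated with scikit-learn; the ten-feature list is our reading of equation 6 and, together with the random seed, is an assumption:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Ten retained inputs and the AQI target; the exact feature list is
# our reading of equation 6 and is an assumption.
X = df[["NO", "NO2", "NOx", "NH3", "CO", "SO2",
        "O3", "Benzene", "Toluene", "Xylene"]]
y = df["AQI"]

# 80/20 train/test split, as in step 2 of the algorithm.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # random_state is an assumption

mlr = LinearRegression().fit(X_train, y_train)
print(mlr.coef_)       # slope coefficients beta_1 ... beta_10 of equation (1)
print(mlr.intercept_)  # intercept beta_0
```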
3.7.1.2. Ridge, Lasso and Elastic Net Regression
It is common for real-time unbalanced datasets to have a significantly larger number of input variables than observations. With many predictors, fitting the full model without penalization results in large prediction intervals. For that reason, we have used three regularized regression methods, i.e., Ridge Regression (RR), Lasso Regression (LR) and Elastic Net Regression (ELR), to analyze the problem better. RR is a model refinement technique employed for the analysis of datasets afflicted by multicollinearity [15]. Employing L2 regularization, it addresses situations where multicollinearity biases the least-squares estimates and induces elevated variances. The mathematical representation of RR is given in equation 2.
$$RRL_2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{P} \beta_j^2 \tag{2}$$
Here, the L2 term is the sum of the squared coefficient magnitudes (βj), and the regularization penalty is denoted by λ. Increasing λ forces the coefficient values toward zero [16]. By applying a penalty proportional to the absolute value of the coefficient magnitudes, Lasso regression carries out L1 regularization (equation 3). Certain coefficients may become exactly zero and be removed from the model as a result of this kind of regularization, producing sparse models with few coefficients.
$$LRL_1 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{P} |\beta_j| \tag{3}$$
To improve the model prediction rate, the lasso and ridge regularizations are combined to produce the elastic net loss; the underlying rationale is that collinearity among predictors is difficult to eradicate with either penalty alone. In this study, the Grid Search (GS) approach to hyperparameter tuning has been employed to enhance the models' performance, using the Scikit-Learn class GridSearchCV. GridSearchCV evaluates all possible combinations of parameter values, and the best combination is retained. After tuning, the optimal λ values are 1, 0.0001, and 0.0001 for RR, LR, and ELR, respectively (a minimal sketch of this tuning step follows).
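In this sketch, note that scikit-learn names the penalty weight λ `alpha`; the candidate grid below is an assumption, not the grid used in the paper:

```python
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import GridSearchCV

# Candidate penalty strengths; the reported optima are 1, 1e-4 and 1e-4.
param_grid = {"alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1, 10]}

for model in (Ridge(), Lasso(), ElasticNet()):
    # 5-fold cross-validated grid search maximizing the R2 score.
    gs = GridSearchCV(model, param_grid, scoring="r2", cv=5)
    gs.fit(X_train, y_train)
    print(type(model).__name__, gs.best_params_, round(gs.best_score_, 3))
```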
3.7.1.3. Support Vector Machine (SVM)
SVM divides data into multiple classes by locating a hyperplane in a high-dimensional space [17]. The primary objective is to maximize the margin, the space between the hyperplane and the closest support vectors, while simultaneously minimizing the error. When dealing with non-linearly separable data, SVM uses the kernel trick to map the feature space into a higher-dimensional space and improve data point separation [18], [19]. The kernel functions used in the SVM model are linear and Radial Basis Function (RBF). The hyperparameters C and epsilon are taken as 5 and 1, respectively, after tuning for the proposed model. C is the regularization parameter that controls the trade-off between achieving low training and testing error and maximizing the margin. Epsilon is the width of the epsilon-insensitive tube in SVM regression; it defines a margin of tolerance within which no penalty is given to errors.
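A sketch of the SVM regression setup with the tuned hyperparameters, reusing the earlier train/test split (kernel settings such as `gamma` are left at scikit-learn defaults, an assumption):

```python
from sklearn.svm import SVR

# Linear and RBF kernels with the tuned hyperparameters C=5, epsilon=1.
for kernel in ("linear", "rbf"):
    svr = SVR(kernel=kernel, C=5, epsilon=1).fit(X_train, y_train)
    print(kernel, svr.score(X_test, y_test))  # R2 on the held-out 20%
```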
3.7.1.4. Artificial Neural Network (ANN)
The artificial neural network (ANN) is well suited to automatically identifying and modelling intricate non-linear correlations between the network's "output" (i.e., AQI) and "inputs" (i.e., NO, NO2, NH3, etc.), and it can also take into account all potential interactions between the input variables [20]. Because an imbalanced dataset contains a large proportion of skewed classes, conventional regression methods fail to recognize the non-linear relationship between inputs and outputs. The architecture of the ANN model employed in this study is displayed in Figure 2.
Figure 2: Architecture of the ANN model.
The ANN consists of one input, one hidden and one output layer. The input features are represented as X1, X2, X3, …, Xn. The synaptic weights between the input and hidden layer are denoted W1, W2, …, Wn. The biases are b0 and b1 for the hidden and output layer, respectively. Σ denotes that each node takes the weighted sum of its inputs and passes it through a non-linear activation function. The tanh activation is given in equation 4.
$$f(\Sigma) = \frac{e^x – e^{-x}}{e^x + e^{-x}} = \frac{1 – e^{-2x}}{1 + e^{-2x}} \tag{4}$$
The AQI value is then predicted by feeding the hidden layer's output into the output layer. To reduce the error rate during backpropagation, the gradient is computed and used to update the weights and biases. The ANN model has also been analysed in combination with PCA for 2, 3 and 5 components, respectively, to increase model interpretability and computational efficiency.
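A Keras sketch of this ANN, with one hidden layer of 128 tanh units as reported in Section 4 (the optimizer and epoch count are assumptions; the paper does not state them):

```python
import tensorflow as tf

# One hidden layer of 128 units with the tanh activation of equation (4);
# a single linear output neuron predicts the AQI.
ann = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(128, activation="tanh"),
    tf.keras.layers.Dense(1, activation="linear"),
])
# Gradient-based training on MSE; Adam and 100 epochs are assumptions.
ann.compile(optimizer="adam", loss="mse", metrics=["mae"])
ann.fit(X_train, y_train, epochs=100, validation_split=0.2, verbose=0)
```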
3.7.1.5. Deep Neural Network (DNN)
DNN refers to a multilayer perceptron with more than one hidden layer. For assessing the AQI value, the DNN is a more powerful and reliable neural network model than the ANN. A DNN performs layer-wise feature extraction, combining low-level features to generate high-level features [21], [22]. Figure 3 shows a visual representation of the DNN model. It is composed of one input layer with 10 neurons, two hidden layers (256 and 512 neurons, respectively) and one output layer. Weights and biases are initialized to appropriate values at each layer of the network. The Rectified Linear Unit (ReLU) activation function (represented by equation 5) and a linear function are used at the hidden layers and output layer, respectively. To achieve state-of-the-art performance, the model has also been evaluated using a combination of PCA and DNN.
$$\text{ReLU}(z) = \max(0, z) \tag{5}$$
Figure 3: Architecture of the DNN model.
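A corresponding Keras sketch of the DNN of Figure 3 (training hyperparameters are again assumptions):

```python
import tensorflow as tf

# 10-neuron input, hidden layers of 256 and 512 ReLU units (equation 5),
# and a linear output layer for the AQI value.
dnn = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),
])
dnn.compile(optimizer="adam", loss="mse", metrics=["mae"])
dnn.fit(X_train, y_train, epochs=100, validation_split=0.2, verbose=0)
```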
4. Results
The models forecasting the AQI level from the different atmospheric pollution characteristics are empirically evaluated in this section. Mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R2) have been used to evaluate the models. Every model has been trained and examined both with and without PCA integration. For RR, LR, ELR, and SVM regression, the Grid Search method has additionally been used.
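The five metrics can be computed as in the following sketch (`mean_absolute_percentage_error` requires scikit-learn ≥ 0.24):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

def report(y_true, y_pred):
    """Return the five evaluation metrics used throughout this section."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }
```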
Table 3: Comparison of Regression model results without PCA.
| Model | MSE (Normal) | R2 (Normal) | MSE (GS) | R2 (GS) |
| --- | --- | --- | --- | --- |
| MLR | 440.823 | 0.752 | — | — |
| RR | 439.022 | 0.753 | 439.022 | 0.753 |
| LR | 446.132 | 0.749 | 439.294 | 0.753 |
| ELR | 510.646 | 0.712 | 439.148 | 0.753 |
The performance characteristics of the standard regression models with and without Principal Component Analysis (PCA) integration are shown in Tables 3 and 4, respectively. RR clearly outperformed the other models in terms of MSE and R2 score, while MLR and LR show slightly degraded performance under both the normal and grid search (GS) settings. After applying PCA, five principal components produce outcomes almost identical to those of the non-PCA analysis.
The trained MLR model explains in detail the association between the independent variables and the AQI. Equation 6 provides the fitted form of the concept represented in equation 1. Because LR employs the L1 norm, some of the coefficients are exactly zero. This shows that the corresponding features have been effectively excluded from the model as uninformative for predicting the target variable, which increases model interpretability and may also improve prediction performance by decreasing overfitting.
$$\text{MLR}(Y_{AQI}) = [4.6,\ 27.41,\ -18.07,\ 10.84,\ 4.02,\ -0.21,\ 18.98,\\4.85,\ 1.61,\ -0.27]_{1 \times 10} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{10} \end{bmatrix}_{10 \times 1} + 92.57 \tag{6}$$
Equation 6 gives the coefficient matrix (β1 = 4.6, β2 = 27.41, …, β10 = -0.27), the independent features from NO to Toluene (X1, X2, …, X10) and the intercept value (β0 = 92.57) for MLR. The values β0 to β10 are experimentally obtained and correspond to the MLR entry of Table 3 without PCA. Similarly, equations 7, 8 and 9 define the coefficient matrices, independent features and intercept values for the RR, LR and ELR models, respectively. The statistical equations 7 to 9 are generated with all the features, without PCA. The final combined-PCA model equations, with their coefficient matrices and intercepts, can be generated in the same way.
$$\text{RR}(Y_{AQI}) = [2.01,\ 21.07,\ -10.47,\ 10.87,\ 3.99,\ -0.19,\ 18.95,\ 4.81,\\ 1.62,\ -0.27]_{1 \times 10} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{10} \end{bmatrix}_{10 \times 1} + 92.57 \tag{7}$$
$$\text{LR}(Y_{AQI}) = [-1.47\text{e}{-02},\ 1.15\text{e}{+01},\ 0,\ 1.11\text{e}{+01},\ 3.28,\ 0,\\ 1.8\text{e}{+01},\ 4.46,\ 8.99\text{e}{-01},\ 0]_{1 \times 10} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{10} \end{bmatrix}_{10 \times 1} + 92.57 \tag{8}$$
$$\text{ELR}(Y_{AQI}) = [-1.41,\ 5.07,\ 3.23,\ 9.27,\ 4.36,\ 2.07,\ 12.3,\ 3.81,\ 2,\\ 0.97]_{1 \times 10} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{10} \end{bmatrix}_{10 \times 1} + 92.57 \tag{9}$$
Compared to the traditional regression models, SVM performs better in terms of R2 score. Table 5 shows that the RBF kernel with the GS approach achieves the lowest RMSE and the highest R2 score compared to the linear and plain RBF kernels. With PCA combined with SVM, particularly with five components, performance improved for every kernel type. In regression, accuracy refers to the model's ability to keep the percentage difference between the actual and estimated values small. Table 5 compares all three SVM kernel configurations to determine the overall best accuracy.
The derived equations for the linear and RBF kernels are provided in equations 10 and 11, respectively. The results of the AQI prediction are obtained using these equations. For linear SVM, positive coefficients indicate that a factor has a positive effect on predicted AQI, while negative coefficients indicate a negative effect. The intercept term represents the baseline AQI value. In contrast, with the RBF-kernelized SVM, the dual coefficients indicate the importance of the support vector in defining the decision region, with positive and negative coefficients indicating the direction of the effect on the predicted AQI.
Table 4: Comparison of Regression Model results with PCA
| Model | 2 PCA: Normal (MSE, R2) | 2 PCA: GS (MSE, R2) | 3 PCA: Normal (MSE, R2) | 3 PCA: GS (MSE, R2) | 5 PCA: Normal (MSE, R2) | 5 PCA: GS (MSE, R2) |
| --- | --- | --- | --- | --- | --- | --- |
| MLR | 546.891, 0.692 | — | 461.973, 0.740 | — | 450.89, 0.746 | — |
| RR | 546.896, 0.692 | 546.997, 0.692 | 461.979, 0.740 | 462.039, 0.74 | 450.90, 0.746 | 450.97, 0.746 |
| LR | 549.518, 0.691 | 546.909, 0.692 | 467.196, 0.737 | 462.006, 0.74 | 457.81, 0.742 | 450.89, 0.746 |
| ELR | 596.489, 0.664 | 547.008, 0.692 | 527.029, 0.703 | 461.988, 0.74 | 520.04, 0.707 | 451.07, 0.746 |
Table 5: SVM Regression Results with PCA and NON PCA.
| Kernel Type | Non-PCA (RMSE, R2, Accuracy) | 2 PCA Components (RMSE, R2, Accuracy) | 3 PCA Components (RMSE, R2, Accuracy) | 5 PCA Components (RMSE, R2, Accuracy) |
| --- | --- | --- | --- | --- |
| Linear | 21.476, 0.725, — | 23.619, 0.686, — | 21.667, 0.736, — | 21.383, 0.742, — |
| RBF | 20.867, 0.741, 80.86 | 20.540, 0.762, 80.16 | 20.351, 0.767, 80.523 | 20.217, 0.770, 81.00 |
| RBF_GS | 20.045, 0.761, — | 20.186, 0.770, — | 19.541, 0.785, — | 18.984, 0.797, — |
Table 6: Comparison of ANN Model Results.
| Metrics | Non-PCA | 2 PCA Components | 3 PCA Components | 5 PCA Components |
| --- | --- | --- | --- | --- |
| MAE | 14.616 | 15.025 | 14.979 | 15.235 |
| MSE | 390.485 | 408.578 | 398.932 | 427.794 |
| RMSE | 19.760 | 20.213 | 19.973 | 20.683 |
| MAPE | 0.198 | 0.206 | 0.205 | 0.207 |
| R2 | 0.757 | 0.746 | 0.752 | 0.734 |
| Accuracy | 80.152 | 79.341 | 79.442 | 79.206 |
After training, equation 10 shows the final weight matrix of the linear SVM (W1 = -0.30, W2 = 0.52, …, W10 = 0.09), the input feature matrix (X1, X2, …, X10) and the bias (9.58) for computing the AQI output. The RBF-kernel SVM, represented by equation 11, uses dual coefficients bounded between -1 and 1, a bias term (113.14), and the input feature vectors X1 to X10 evaluated against each support vector through the kernel terms K(X, X1) to K(X, X10).
$$F(\text{AQI}) = [-0.30,\ 0.52,\ 0.03,\ 1.99,\ 4.41,\ 0.21,\ 0.59,\ 1.28,\ 0.18,\ 0.09]_{1 \times 10} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{10} \end{bmatrix}_{10 \times 1} + 9.58 \tag{10}$$
Table 6 shows the results after training on the imbalanced dataset using the ANN architecture. The ANN outperformed all of the earlier models in strength and performance, achieving the lowest RMSE and MSE compared to SVM, RR, LR, ELR and MLR. The ANN model is obtained with ten inputs, one hidden layer of 128 neurons, and a final output layer that produces the predicted AQI value.
Table 7: Comparison of DNN Model Results.
| Metrics | Non-PCA | 2 PCA Components | 3 PCA Components | 5 PCA Components |
| --- | --- | --- | --- | --- |
| MAE | 13.08907 | 15.67781 | 17.27715 | 15.70377 |
| MSE | 334.054 | 438.4483 | 583.0541 | 469.349 |
| RMSE | 18.27715 | 20.93916 | 24.14651 | 21.66446 |
| MAPE | 0.172188 | 0.213611 | 0.229836 | 0.208632 |
| R2 | 0.792717 | 0.72794 | 0.638211 | 0.708766 |
| Accuracy | 82.78118 | 78.63886 | 77.01642 | 79.13676 |
$$f(\text{AQI}) = [-1,\ -1,\ -1,\ \ldots,\ 1,\ -1,\ 1]_{1 \times 10} \begin{bmatrix} K(X, X_1) \\ K(X, X_2) \\ \vdots \\ K(X, X_{10}) \end{bmatrix}_{10 \times 1} + 113.14 \tag{11}$$
The DNN model is a sophisticated and potent neural network designed to manage unbalanced datasets. Table 7 shows that the DNN model works better than the ANN, significantly increasing the R2 score while simultaneously decreasing metrics like MSE, MAE, and RMSE. Notably, the DNN model achieves an accuracy of 82.78%, outperforming all other models in the comparison.
A lower MSE and a higher R2 score indicate that the DNN model also performs better than the conventional regression models. The DNN model is more successful without requiring feature extraction, as seen in its lower RMSE and greater accuracy compared to the SVM model with five-component PCA. During training, the ANN and DNN autonomously extract meaningful features from raw data and can adapt to high-dimensional input spaces without explicit feature extraction. The ANN and DNN results combined with PCA are shown in Tables 6 and 7, respectively. Crucially, complicated relationships among features can be handled by the proposed DNN model without the need for PCA. This not only increases efficiency and interpretability, but also reduces training time and computational cost, making the DNN model a state-of-the-art approach across the various performance metrics.
The DNN model takes ten neuronal inputs and has two hidden layers of 256 and 512 neurons, respectively. The output layer involves a 512×1-dimensional weight matrix along with a linear function to derive the AQI value.
Figures 4 and 5 illustrate the rate of MSE change for the ANN model employing 10 input features and using dimensionality reduction to 2 features, respectively. The ANN model with 10 input features clearly converges much faster than the one using 2 PCA features, owing to the latter's inadequate feature representation. Similarly, Figures 6 and 7 depict the MSE convergence dynamics during training of the DNN architectures. The DNN model exhibits standard MSE behaviour on both the training and testing datasets, while the DNN using 2 principal components tends to converge prematurely, resulting in less accurate predictions at the final output layer.
Consequently, the original DNN architecture demonstrates superior performance with robust convergence compared to the alternative networks. DNN models can converge faster and produce more accurate results without the use of PCA, and the proposed DNN model is advantageous for achieving optimal outcomes, especially in scenarios involving imbalanced datasets.
Figure 4: MSE convergence for the ANN model with 10 input features.
Figure 5: MSE convergence for the ANN model with 2 PCA components.
Figure 6: MSE convergence for the DNN model with 10 input features.
Figure 7: MSE convergence for the DNN model with 2 PCA components.
PCA effectively reduces the dimensionality of the dataset to 2, 3, and 5 components, respectively. It identifies the most significant features by projecting the data into k-dimensional spaces that capture the most critical variance. This streamlines the data, minimizing noise and redundancy, which makes the data more regular and structured. By retaining the most relevant components, PCA allows the DNN to operate more efficiently, avoiding overfitting and improving its ability to learn complex patterns in the data. This enhancement in data regularity enables the DNN to outperform the other regression models in terms of performance and interpretability.
5. Conclusion
Though "accuracy" is a prime indicator of model performance, its reliability depends on the dataset. In an imbalanced dataset, the majority class makes up a large portion of the training data while the minority class is underrepresented. A model trained on such data learns that it can achieve high accuracy by consistently predicting the majority class, even when recognizing the minority class is equally or more important in a real-world scenario. Dealing with real-time imbalanced datasets presents substantial difficulties, especially in tasks such as predicting the air quality index and designing appropriate regression network architectures. The complexity stems from the dynamic character of the environment, the variability of pollutant levels, and their geographical and temporal irregularity. In the present work, we utilize an air pollution dataset comprising various pollutants to address this challenging problem. The dataset is extensively examined using exploratory data analysis techniques to pre-process it effectively for the study. Because of the large number of parameters in the dataset, dimensionality reduction methods such as PCA are utilized. MLR, RR, LR, ELR, and SVM are the machine learning algorithms used for training and testing with the goal of predicting the air quality index (AQI). Additionally, neural network designs, particularly ANN and DNN, are examined for further evaluation.
Each model is trained with and without PCA to compare error rates and overall performance. The ANN design shows robust convergence and superior efficiency compared to the various ML designs. By harnessing the power of high-level feature extraction, the DNN model surpasses the traditional regression techniques and even outperforms the ANN. This study showcases the efficacy of neural networks, particularly the DNN model, which exhibits superior accuracy, lower MSE, and higher R2 scores, indicating its ability to automatically identify relevant characteristics from unprocessed data and eliminating the need for explicit feature extraction methods.
6. Future Work
Future research may enhance the management of imbalanced datasets by utilizing sophisticated neural architectures such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). Real-time imbalance handling may also be improved by using ensemble methods and transfer learning. Furthermore, time series models such as Seasonal Autoregressive Integrated Moving Average (SARIMA), Autoregressive Integrated Moving Average (ARIMA), and Long Short-Term Memory (LSTM) networks can forecast trends for dangerous pollutants such as CO, NO, and NO2.
Acknowledgement
We express sincere gratitude towards the Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata and our parents for providing the research environment and support that enabled us to undertake this research work.
Conflict of Interest
The authors declare no conflict of interest.
- A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, G. Bontempi, “Learned lessons in credit card fraud detection from a practitioner perspective,” Expert Systems with Applications, vol. 41, pp. 4915–4928, 2014, doi: https://doi.org/10.1016/j.eswa.2014.02.026.
- B. Anuradha, V. C. Veera Reddy, “ANN for classification of cardiac arrhythmias,” Asian Research Publishing Network Journal of Engineering and Applied Sciences, vol.3, no.3, 1-6, 2008.
- L. Bruzzone, S. B. Serpico, “A classification of imbalanced remote-sensing data by neural networks,” Pattern Recognition Letters, vol.18, pp.1323-1328, 1997, doi: https://doi.org/10.1016/S0167-8655(97)00109-8.
- G. H. Nguyen, A. Bouzerdou, S. L. Phung, “A supervised learning approach for imbalanced data sets,” Proc. of the 19th International Conference on Pattern Recognition, 1-4, 2008, doi: 10.1109/ICPR.2008.4761278.
- G. Pang, C. Shen, L. Cao, A. Van Den Hengel, “Deep learning for anomaly detection: A review,” ACM Computing Surveys (CSUR), vol. 54, article 38, 2021, doi: https://doi.org/10.1145/3439950.
- A. Adam, M. Shapiai, Z. Ibrahim, M. Khalid, “A Modified Artificial Neural Network Learning Algorithm for Imbalanced Data Set Problem,” International Conference on Computational Intelligence, Communication Systems and Networks, CICSyN 2010, doi: 10.1109/CICSyN.2010.9.
- S. Wang, W. Liu, J. Wu, “Training Deep Neural Networks on Imbalanced Data Sets,” International Joint Conference on Neural Networks (IJCNN), 4368-4374, 2016, doi: 10.1109/IJCNN.2016.7727770.
- A. Li, S. Liang, A. Wang, J. Qin, “Estimating Crop Yield from Multi-temporal Satellite Data Using Multivariate Regression and Neural Network Techniques,” American Society for Photogrammetry and Remote Sensing, Vol. 73, No. 10, 1149–1157, 2007, doi: 10.14358/PERS.73.10.1149.
- Y. Tang, V. N. Chawla, “SVMs Modeling for Highly Imbalanced Classification,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. 39, 281 – 288, 2008, doi: 10.1109/TSMCB.2008.2002909.
- Y. Yang, K. Zha, “Delving into Deep Imbalanced Regression,” ICML 2021, https://arxiv.org/abs/2102.09554.
- C. Huang, Y. Li, C. C. Loy, X. Tang, “Learning deep representation for imbalanced classification,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 5375–5384, 2016, doi: 10.1109/CVPR.2016.580.
- M. Steininger, K. Kobs, P. Davidson, “Density‑based weighting for imbalanced regression,” Mach Learn, 110, 2187–2211, 2021, doi: https://doi.org/10.1007/s10994-021-06023-5.
- A. Rahim, N.A. Rashid, A. Nayan, A. Ahmad, “SMOTE Approach to Imbalanced Dataset in Logistic Regression Analysis,” ICMS 2017, 429-433, 2019, doi: https://doi.org/10.1007/978-981-13-7279-7_53.
- P. Branco, L. Torgo, P. R. Ribeiro, “SMOGN: a Pre-processing Approach for Imbalanced Regression,” In First international workshop on learning with imbalanced domains: Theory and applications, pages 36–50. PMLR, 2017.
- C. Peng, Q. Cheng, “Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data,” IEEE Trans. on Neural Networks and Learning Systems, 2595 – 2609, 2020, doi: 10.1109/TNNLS.2020.3006877.
- A. SzeTo, K. C. Wong, “A Weight-Selection Strategy on Training Deep Neural Networks for Imbalanced Classification,” International Conference Image Analysis and Recognition, 3-10, 2017, doi: https://doi.org/10.1007/978-3-319-59876-5_1.
- R. Akbani, S. Kwek, N. Japkowicz, “Applying Support Vector Machines to Imbalanced Datasets,” European Conference on Machine Learning (ECML), 39–50, 2004, doi: https://doi.org/10.1007/978-3-540-30115-8_7
- Y. H. Liu, Y. T. Chen, S. S. Lu, “Face Detection Using Kernel PCA and Imbalanced SVM,” International Conference on Natural Computation, 351–360, 2006, doi: https://doi.org/10.1007/11881070_50.
- J. Mathew, M. Luo, C. K. Pang, H. L. Chan, “Kernel-based SMOTE for SVM classification of imbalanced datasets,” IECON, pp. 1127–1132, 2015, doi: 10.1109/IECON.2015.7392251.
- R. Anand, K. G. Mehrotra, C. K. Mohan, S. Ranka, “An improved algorithm for neural network classification of imbalanced training sets,” IEEE Transactions on Neural Networks, vol. 4, pp. 962–969, 1993, doi: 10.1109/72.286891.
- H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin, “Exploring strategies for training deep neural networks,” Journal of Machine Learning Research, vol. 10, pp. 1–40, 2009, doi: 10.1145/1577069.1577070.
- S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, R. Togneri, “Cost-Sensitive Learning of Deep Feature Representations from Imbalanced Data,” IEEE Trans. Neural Network Learn System, pp 3573 – 3587, 2017, doi: 10.1109/TNNLS.2017.2732482.