I’m currently working on image denoising.
I have written a fairly basic CNN model with the following architecture:
CNNmodel(
  (model): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): PReLU(num_parameters=1)
    (3): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): PReLU(num_parameters=1)
    (6): Dropout(p=0.2, inplace=False)
    (7): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): PReLU(num_parameters=1)
    (10): Dropout(p=0.2, inplace=False)
    (11): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): PReLU(num_parameters=1)
    (14): Dropout(p=0.2, inplace=False)
    (15): Conv2d(16, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (16): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (17): PReLU(num_parameters=1)
    (18): Dropout(p=0.2, inplace=False)
    (19): Conv2d(8, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): PReLU(num_parameters=1)
  )
)
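For anyone who wants to reproduce this, the printout above should correspond to roughly the following definition. This is a minimal sketch reconstructed from the printed layers, not my exact source file: the `conv_block` helper and the 64x64 random input are assumptions added for illustration.

```python
import torch
from torch import nn


def conv_block(in_ch: int, out_ch: int, dropout: float = 0.0) -> list:
    """Conv -> BatchNorm -> PReLU, optionally followed by Dropout.

    Hypothetical helper used only to keep the sketch short.
    """
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.PReLU(),
    ]
    if dropout > 0:
        layers.append(nn.Dropout(p=dropout))
    return layers


class CNNmodel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.model = nn.Sequential(
            *conv_block(1, 8),                 # layers 0-2
            *conv_block(8, 16, dropout=0.2),   # layers 3-6
            *conv_block(16, 32, dropout=0.2),  # layers 7-10
            *conv_block(32, 16, dropout=0.2),  # layers 11-14
            *conv_block(16, 8, dropout=0.2),   # layers 15-18
            nn.Conv2d(8, 1, kernel_size=3, stride=1, padding=1),  # layer 19
            nn.PReLU(),                        # layer 20
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)


net = CNNmodel()
noisy = torch.randn(1, 1, 64, 64)  # assumed single-channel 64x64 input
denoised = net(noisy)              # padding=1 keeps the spatial size unchanged
```

Every convolution uses a 3x3 kernel with padding 1, so the output has the same height and width as the input.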
A few details about my project:
- The training set has 5k images and the validation set has 500 images.
- The batch size is 1 and the total number of iterations is 30k, which gives num_epochs = 5.
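A quick sanity check of those numbers, assuming one iteration processes exactly one batch (the variable names are just for illustration):

```python
train_images = 5_000
batch_size = 1
total_iterations = 30_000

# One epoch is one full pass over the training set.
iters_per_epoch = train_images // batch_size

# Epochs implied by the iteration budget, and iterations implied by 5 epochs.
epochs_from_iterations = total_iterations / iters_per_epoch
iterations_for_5_epochs = 5 * iters_per_epoch

print(iters_per_epoch, epochs_from_iterations, iterations_for_5_epochs)
# prints: 5000 6.0 25000
```

So 30k iterations would be 6 full passes, while 5 epochs would be 25k iterations. One of the two figures I quoted is therefore approximate.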
The issue I’m facing:
- When I start training the model, epoch 0 (i.e. the first 5k iterations) takes around 70 minutes, while the remaining 4 epochs finish in the next 70 minutes combined. Also, the loss (I'm using MSE loss) doesn't seem to decrease or vary much over the 5 epochs.
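To put the timing difference in per-iteration terms, here is the arithmetic, assuming 5k iterations per epoch at batch size 1:

```python
iters_per_epoch = 5_000

first_epoch_minutes = 70   # epoch 0
remaining_epochs = 4
remaining_minutes = 70     # epochs 1-4 combined

# Average wall-clock seconds per iteration in each phase.
first_sec_per_iter = first_epoch_minutes * 60 / iters_per_epoch
later_sec_per_iter = remaining_minutes * 60 / (remaining_epochs * iters_per_epoch)
slowdown = first_sec_per_iter / later_sec_per_iter

print(f"{first_sec_per_iter:.2f} s/iter, {later_sec_per_iter:.2f} s/iter, {slowdown:.1f}x")
# prints: 0.84 s/iter, 0.21 s/iter, 4.0x
```

So the first epoch runs about 4 times slower per iteration than the later ones.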
I think this is due to vanishing gradients.
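One way I could test this guess is to log the per-layer gradient norms right after `backward()`. Below is a minimal sketch, assuming PyTorch. `grad_norms` is a hypothetical helper, and the small network and random tensors are stand-ins for my real model and data.

```python
import torch
from torch import nn


def grad_norms(model: nn.Module) -> dict:
    """Return the L2 norm of each parameter's gradient (call after backward())."""
    return {
        name: p.grad.norm().item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }


torch.manual_seed(0)

# Hypothetical stand-in network; substitute the real model here.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.PReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
criterion = nn.MSELoss()

noisy = torch.randn(1, 1, 32, 32)  # random placeholder input
clean = torch.randn(1, 1, 32, 32)  # random placeholder target

net.zero_grad()
loss = criterion(net(noisy), clean)
loss.backward()

norms = grad_norms(net)
for name, value in norms.items():
    print(f"{name:10s} grad norm = {value:.3e}")
```

If the norms of the early layers were orders of magnitude smaller than those of the last layers, that would support the vanishing-gradient explanation. One caveat: the bias of a convolution that feeds directly into BatchNorm gets a near-zero gradient in training mode, because the normalization subtracts the per-channel mean, so a tiny norm there is expected on its own.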
Any comments on why this is happening and what can be done to overcome it?