I have been trying to replicate a paper, building the same model but with a few changes: a non-linear contrastive loss (the Lossless Triplet Loss) and better data augmentation. I have not been able to get past the 70% accuracy mark on the test set, and the test loss does not decrease despite 20+ epochs of training, with both the standard contrastive loss and the Lossless Triplet Loss. I am forced to use a learning rate <= 1e-5, as anything greater makes the model collapse to essentially the same (bias-driven) output for every input. I have also implemented weight regularization and reduced the model complexity severely (from 250M+ to 4M+ parameters). The model still seems to overfit: only the train loss goes down, while the test loss keeps increasing.
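For reference, here is a minimal sketch of the Lossless Triplet Loss variant I am referring to, as commonly formulated for sigmoid-bounded embeddings (the constants N and beta are illustrative defaults matching the 128-d sigmoid embedding below, not values copied verbatim from my code):

import torch

def lossless_triplet_loss(anchor, positive, negative, N=128, beta=128, eps=1e-8):
    # Squared Euclidean distances; embeddings are assumed to lie in [0, 1]^N
    # (Sigmoid output), so each distance is bounded by N and the log
    # arguments below stay positive.
    pos_dist = torch.sum((anchor - positive) ** 2, dim=1)
    neg_dist = torch.sum((anchor - negative) ** 2, dim=1)
    # Non-linear (log) penalties in place of the usual linear triplet hinge
    pos_term = -torch.log(-pos_dist / beta + 1 + eps)
    neg_term = -torch.log(-(N - neg_dist) / beta + 1 + eps)
    return torch.mean(pos_term + neg_term)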
The Model:
import torch
import torch.nn as nn


class PhiNet(nn.Module):
    def __init__(self):
        super(PhiNet, self).__init__()
        # Convolutional stack: 1-channel input -> 32-channel feature maps
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=1, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(96, eps=1e-06, momentum=0.9),
            nn.MaxPool2d(kernel_size=3, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(256, eps=1e-06, momentum=0.9),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Dropout2d(p=0.3))
        self.layer3 = nn.Sequential(
            nn.Conv2d(256, 324, kernel_size=3, stride=1, padding=1),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(324, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Dropout2d(p=0.3))
        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 32, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Dropout2d(p=0.3))
        # FC head: 32 * 15 * 6 = 2880 flattened features -> 128-d embedding
        self.layer6 = nn.Sequential(
            nn.Linear(2880, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.6))
        self.layer7 = nn.Sequential(
            nn.Linear(1024, 128),
            nn.Sigmoid())
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_in')

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = out.reshape(out.size(0), -1)  # flatten for the FC head
        out = self.layer6(out)
        out = self.layer7(out)
        return out
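As a quick sanity check, a dummy forward pass confirms the 2880-feature flatten and the 128-d output for (1, 300, 150) inputs (hypothetical snippet, not from my training script):

import torch

phinet = PhiNet()
with torch.no_grad():
    emb = phinet(torch.randn(2, 1, 300, 150))  # dummy batch of 2 grayscale images
print(emb.shape)  # torch.Size([2, 128]) -- one 128-d embedding per image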
Loss:
import torch
import torch.nn.functional as F


class ContrastiveLoss(torch.nn.Module):
    """
    Contrastive loss function.
    Based on: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    Convention: label 0 = similar pair, label 1 = dissimilar pair.
    """
    def __init__(self, margin=2.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin
        self.eps = 1e-5

    def forward(self, output1, output2, label):
        euclidean_distance = F.pairwise_distance(output1, output2)
        loss_contrastive = torch.mean(
            (1 - label) * torch.pow(euclidean_distance, 2)
            + label * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2))
        return loss_contrastive
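For context, this is roughly how the loss is applied per batch; img1, img2, and label stand in for my pair-sampling DataLoader output (not shown), with label 1 marking a dissimilar pair to match the (1 - label)/label terms above:

criterion = ContrastiveLoss(margin=2.0)

out1 = phinet(img1)                  # img1, img2: (B, 1, 300, 150) tensors
out2 = phinet(img2)
loss = criterion(out1, out2, label)  # label: (B,) float, 0 = similar, 1 = dissimilar
loss.backward()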
Model Summary:
from torchsummary import summary
summary(phinet, (1, 300, 150))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 96, 292, 142] 11,712
ReLU-2 [-1, 96, 292, 142] 0
BatchNorm2d-3 [-1, 96, 292, 142] 192
MaxPool2d-4 [-1, 96, 145, 70] 0
Conv2d-5 [-1, 256, 145, 70] 614,656
ReLU-6 [-1, 256, 145, 70] 0
BatchNorm2d-7 [-1, 256, 145, 70] 512
MaxPool2d-8 [-1, 256, 72, 34] 0
Dropout2d-9 [-1, 256, 72, 34] 0
Conv2d-10 [-1, 324, 72, 34] 746,820
ReLU-11 [-1, 324, 72, 34] 0
Conv2d-12 [-1, 64, 70, 32] 186,688
ReLU-13 [-1, 64, 70, 32] 0
MaxPool2d-14 [-1, 64, 34, 15] 0
Dropout2d-15 [-1, 64, 34, 15] 0
Conv2d-16 [-1, 32, 32, 13] 18,464
ReLU-17 [-1, 32, 32, 13] 0
MaxPool2d-18 [-1, 32, 15, 6] 0
Dropout2d-19 [-1, 32, 15, 6] 0
Linear-20 [-1, 1024] 2,950,144
ReLU-21 [-1, 1024] 0
Dropout-22 [-1, 1024] 0
Linear-23 [-1, 128] 131,200
Sigmoid-24 [-1, 128] 0
================================================================
Total params: 4,660,388
Trainable params: 4,660,388
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.17
Forward/backward pass size (MB): 182.64
Params size (MB): 17.78
Estimated Total Size (MB): 200.59
----------------------------------------------------------------
And here is the model training:
0%| | 0/480 [00:00<?, ?it/s]
Epoch: 0
100%|██████████| 480/480 [02:07<00:00, 3.81it/s]
100%|██████████| 160/160 [00:15<00:00, 11.36it/s]
100%|██████████| 160/160 [00:15<00:00, 11.31it/s]
0%| | 0/480 [00:00<?, ?it/s]
[0.25258303 0.18019645 0.33193994 0.22606815 0.4684852 ]
Accuracy:62.989 Threshold:0.260
Saving..
Train Loss: 1.0204066408177217
Test Loss: 1.2720871403813363
Epoch: 1
100%|██████████| 480/480 [02:07<00:00, 3.79it/s]
100%|██████████| 160/160 [00:15<00:00, 11.28it/s]
100%|██████████| 160/160 [00:15<00:00, 11.69it/s]
0%| | 0/480 [00:00<?, ?it/s]
[1.0923935 0.52174014 0.58343613 0.5127276 0.09465909]
Accuracy:60.300 Threshold:0.390
Saving..
Train Loss: 0.7175934920708339
Test Loss: 1.2123102966696024
Epoch: 2
100%|██████████| 480/480 [02:07<00:00, 3.81it/s]
100%|██████████| 160/160 [00:15<00:00, 11.77it/s]
100%|██████████| 160/160 [00:15<00:00, 11.27it/s]
[0.20261455 0.3708717 0.37443015 1.8816589 2.446019 ]
Accuracy:62.704 Threshold:0.450
Saving..
0%| | 0/480 [00:00<?, ?it/s]
Train Loss: 0.5991284149698913
Test Loss: 1.0973380882292987
Epoch: 3
100%|██████████| 480/480 [02:07<00:00, 3.78it/s]
100%|██████████| 160/160 [00:15<00:00, 11.58it/s]
100%|██████████| 160/160 [00:15<00:00, 11.56it/s]
0%| | 0/480 [00:00<?, ?it/s]
[2.9831722 0.46837586 0.86912566 0.6062222 0.26007086]
Accuracy:62.067 Threshold:0.320
Train Loss: 0.5078814978090426
Test Loss: 1.1578331850469112
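For completeness: the Accuracy/Threshold lines above come from an evaluation helper that is not shown; it sweeps a distance threshold over the test pairs and reports the best one, roughly like this sketch (the function name and threshold grid are illustrative, not my exact code):

import numpy as np

def threshold_accuracy(distances, labels, grid=np.arange(0.0, 4.0, 0.01)):
    # distances: pairwise embedding distances on the test set
    # labels: 1 = dissimilar pair (same convention as the contrastive loss)
    best_acc, best_t = 0.0, 0.0
    for t in grid:
        preds = (distances > t).astype(int)  # far apart -> predicted dissimilar
        acc = float((preds == labels).mean())
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t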
I would really appreciate any kind of input. Thank you!