Actually, I found out the problem was my custom Siamese net, not my loss function. I want to use a pretrained VGG-Face model and fine-tune it on my dataset. My Siamese net looks like this:
class SiameseNetwork(nn.Module):
    def __init__(self, vgg_model):
        super(SiameseNetwork, self).__init__()
        self.vgg = vgg_model

    def forward(self, x0, x1):
        # Both inputs go through the same VGG backbone (shared weights).
        out1 = self.vgg(x0)
        out2 = self.vgg(x1)
        return out1, out2
And I found out that the nan values come from the output of my network, so now I'm trying my best to discover why the network produces them. I have checked the input for anomalies but found nothing abnormal. Can you give me some advice? I would appreciate it. Below is a (truncated) input batch I inspected:
[[0.9608, 0.9569, 0.9451, ..., 0.1098, 0.1059, 0.1059],
[0.9569, 0.9490, 0.9333, ..., 0.1098, 0.1098, 0.1098],
[0.9451, 0.9333, 0.9098, ..., 0.1137, 0.1137, 0.1137],
...,
[0.9529, 0.9529, 0.9490, ..., 0.4235, 0.4275, 0.4275],
[0.9490, 0.9490, 0.9529, ..., 0.4235, 0.4275, 0.4275],
[0.9490, 0.9490, 0.9529, ..., 0.4235, 0.4275, 0.4275]],
[[0.8941, 0.8902, 0.8784, ..., 0.0902, 0.0863, 0.0863],
[0.8902, 0.8863, 0.8706, ..., 0.0902, 0.0863, 0.0863],
[0.8863, 0.8745, 0.8471, ..., 0.0941, 0.0902, 0.0902],
...,
[0.9569, 0.9569, 0.9529, ..., 0.2902, 0.2902, 0.2902],
[0.9529, 0.9529, 0.9529, ..., 0.2863, 0.2902, 0.2902],
[0.9529, 0.9529, 0.9529, ..., 0.2863, 0.2902, 0.2902]]]],
device='cuda:0')
----------------------------------------------------------
output: tensor([[-0.0138, -0.0085, 0.0077, ..., -0.0088, -0.0003, -0.0021],
[ nan, nan, nan, ..., nan, nan, nan],
[ 0.0359, 0.0134, 0.0074, ..., 0.0280, 0.0116, 0.0102],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ 0.0099, 0.0099, -0.0156, ..., 0.0109, -0.0009, 0.0184],
[-0.0121, -0.0051, 0.0370, ..., 0.0406, 0.0065, -0.0012]],
device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>) tensor([[ nan, nan, nan, ..., nan, nan, nan],
[-0.0158, -0.0266, 0.0180, ..., -0.0115, 0.0026, -0.0345],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ 0.0249, 0.0007, -0.0102, ..., -0.0132, 0.0214, 0.0118],
[ 0.0028, 0.0037, 0.0042, ..., 0.0135, 0.0115, -0.0005],
[ 0.0220, 0.0144, 0.0100, ..., 0.0045, 0.0385, -0.0046]],
device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>)
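Since the dumped outputs are torch.float16, I also want to find where non-finite values first appear inside the forward pass. Here is the check I'm using: forward hooks that flag any module emitting NaN/Inf. This is just a sketch with a small stand-in network, since vgg_face_dag itself isn't shown here:

```python
import torch
import torch.nn as nn

def attach_nan_hooks(model):
    """Register forward hooks that report any module emitting NaN/Inf."""
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                print(f"non-finite output in: {name} ({module.__class__.__name__})")
        return hook
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))

# Stand-in for the VGG backbone inside the Siamese net.
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
attach_nan_hooks(net)

out = net(torch.randn(2, 8))
print(torch.isfinite(out).all().item())  # finite input -> True
```

On the real model, running one of the batches that produced the nan rows through this should name the first offending layer; I also plan to check whether the NaNs disappear when the model runs in float32 instead of float16.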
Additionally, I load the VGG model like this:
from torchsummary import summary

vgg_model = vgg_face_dag('pretrained/vgg_face_dag.pth').to(device)

# for param in vgg_model.parameters():
#     param.requires_grad = False

# Freeze everything before child index 34, leaving the last layers trainable.
idx = 0
for layer in vgg_model.children():
    idx += 1
    if idx < 34:
        for param in layer.parameters():
            param.requires_grad = False

summary(vgg_model, (3, 224, 224))
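To double-check the freezing logic (whether idx < 34 actually covers the layers I intend), I count trainable vs. frozen parameters. Sketch with a stand-in model, since the real children of vgg_face_dag aren't listed here; the cutoff index 4 below is just an analogue of 34:

```python
import torch.nn as nn

# Stand-in: six child layers; the real loop freezes children with idx < 34.
model = nn.Sequential(*[nn.Linear(4, 4) for _ in range(6)])

for idx, layer in enumerate(model.children(), start=1):
    if idx < 4:  # freeze the first three children
        for param in layer.parameters():
            param.requires_grad = False

frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(frozen, trainable)  # each Linear(4, 4) has 20 params -> 60 60
```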