Loss doesn't decrease and the output is zero

Rickyim · April 10, 2018, 5:53am

I am implementing a 3D deconvolution autoencoder. The issue I am having is that the training (and testing) loss doesn’t seem to decrease, and the output of the neural network is a zero matrix.
I wonder if my training process is right, i.e., declare a net, read in the data, wrap the data in Variable, feed the data into the net, calculate the loss, backpropagate, and update the weights.
This is my net:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv3d(1, 4, 5, padding=2)
        self.conv2 = nn.Conv3d(4, 16, 5, padding=2)
        self.enc1 = nn.Conv3d(16, 32, 5, stride=2, padding=2)
        self.enc2 = nn.Conv3d(32, 64, 5, stride=2, padding=2)
        self.enc3 = nn.Conv3d(64, 128, 5, stride=2, padding=2)
        self.enc4 = nn.Conv3d(128, 256, 5, stride=2, padding=2)
        self.dec4 = nn.ConvTranspose3d(256, 128, 5, stride=2, padding=2)
        self.dec3 = nn.ConvTranspose3d(128, 64, 5, stride=2, padding=2, output_padding=1)
        self.dec2 = nn.ConvTranspose3d(64, 32, 5, stride=2, padding=2, )
        self.dec1 = nn.ConvTranspose3d(32, 16, 5, stride=2, padding=2)
        self.conv3= nn.Conv3d(16, 8, 5, padding=2)
        self.conv4= nn.Conv3d(8, 1, 5, padding=2)
    def forward(self, x):
        x=F.relu(self.conv1(x))
        x=F.relu(self.conv2(x))
        x=F.relu(self.enc1(x))
        x=F.relu(self.enc2(x))
        x=F.relu(self.enc3(x))
        x=F.relu(self.enc4(x))
        x=F.relu(self.dec4(x))
        x=F.relu(self.dec3(x))
        x=F.relu(self.dec2(x))
        x=F.relu(self.dec1(x))
        x=F.relu(self.conv3(x))
        return x
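
As a side check on the architecture above, the spatial size after each layer can be traced by hand. The sketch below is not part of the original post. It assumes a cubic input with edge length 101 (the size mentioned later in the thread) and uses the standard output-size formulas for Conv3d and ConvTranspose3d with dilation 1. The helper names conv_out and deconv_out are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a Conv3d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of a ConvTranspose3d along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# (name, formula, keyword arguments) for every layer defined in Net.__init__.
# conv4 is listed too, although the posted forward() never calls it.
layers = [
    ("conv1", conv_out, dict(kernel=5, padding=2)),
    ("conv2", conv_out, dict(kernel=5, padding=2)),
    ("enc1", conv_out, dict(kernel=5, stride=2, padding=2)),
    ("enc2", conv_out, dict(kernel=5, stride=2, padding=2)),
    ("enc3", conv_out, dict(kernel=5, stride=2, padding=2)),
    ("enc4", conv_out, dict(kernel=5, stride=2, padding=2)),
    ("dec4", deconv_out, dict(kernel=5, stride=2, padding=2)),
    ("dec3", deconv_out, dict(kernel=5, stride=2, padding=2, output_padding=1)),
    ("dec2", deconv_out, dict(kernel=5, stride=2, padding=2)),
    ("dec1", deconv_out, dict(kernel=5, stride=2, padding=2)),
    ("conv3", conv_out, dict(kernel=5, padding=2)),
    ("conv4", conv_out, dict(kernel=5, padding=2)),
]

size = 101  # assumed input edge length
sizes = {}
for name, formula, kwargs in layers:
    size = formula(size, **kwargs)
    sizes[name] = size
    print(f"{name}: {size}")
```

Under these assumptions the decoder mirrors the encoder (101, 51, 26, 13, 7 and back), and output_padding=1 on dec3 is what recovers the even size 26. So the shapes themselves do not seem to be the source of the problem.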

And this is my training process:

#define a net
net = Net()
#using Gpus
if torch.cuda.is_available():
    net.cuda()
else:
    print('cuda disabled')
print(net)
optimizer=optim.SGD(net.parameters(), lr=0.0001)
criterion=nn.MSELoss()


for i_batch, sample in enumerate(dataLoader):
    print('read the data')
    input,target=sample['tr'].type(torch.FloatTensor), sample['gt'].type(torch.FloatTensor)
    if torch.cuda.is_available():
        input, target=input.unsqueeze(1).cuda(), target.unsqueeze(1).cuda()
    else:
        input, target=input.unsqueeze(1), target.unsqueeze(1)
    input, target=Variable(input), Variable(target)
    optimizer.zero_grad()
    print('put the data into net')
    output=net(input)
    loss = criterion(output, target)
    loss = loss*10
    print('back propagate')
    loss.backward()
    optimizer.step()
    print('iter %d,  training loss: mse %.4f' %(i_batch, loss))

    if i_batch%50==0:
        id_test=random.randint(0,100)
        testsample=dataSet.__getitem__(id_test)
        test_input=Variable(testsample['tr'].unsqueeze(0).unsqueeze(1).type(torch.FloatTensor).cuda(), volatile=True)
        test_output=net(test_input)
        save_result_path='/gdata/Deconv/testresult/test%d'%int(time.time())
        save_blur_path='/gdata/Deconv/blurresult/blur%d'%int(time.time())
        np.save(save_result_path,test_output.squeeze().cpu().data.numpy())
        np.save(save_blur_path,test_input.squeeze().cpu().data.numpy())
        print('save test sample id %d blur to %s, result to path %s'%(id_test, save_blur_path, save_result_path))

L0SG · April 10, 2018, 6:11am

Did you return the output from the forward function?

return x

I think adding this line to forward() would return the model output correctly.

Rickyim · April 10, 2018, 6:23am

Yes, I did. There was a typo in the question. I actually do have return x in my code.

Rickyim · April 10, 2018, 6:24am

I also tried a different dataset. After a few iterations, the output decreases to zero. Fairly strange.

ptrblck · April 10, 2018, 6:53am

Could you try to initialize the layers?

def weights_init(m):
    if isinstance(m, nn.Conv3d):
        nn.init.xavier_uniform(m.weight.data)
        # Xavier init needs at least 2 dimensions, so the 1-D bias is set to zero instead
        nn.init.constant(m.bias.data, 0)

model.apply(weights_init)

Rickyim · April 10, 2018, 11:20am

Thanks! I initialize this way now. Still not working…

def weights_init(m):
    if isinstance(m, nn.Conv3d):
        nn.init.xavier_uniform(m.weight.data)
    elif isinstance(m, nn.ConvTranspose3d):
        nn.init.xavier_uniform(m.weight.data)
net.apply(weights_init)

ptrblck · April 10, 2018, 11:26am

Ok, thanks for trying!
It seems you are missing the last conv layer in your forward pass, but I assume it’s a typo like the missing return statement.

In summary: your model does not learn anything, neither the training sequences nor the validation/test sequences.
The losses stay the same and the model’s output is all zeros.

In this case, I would try to boil down the problem by making the problem simpler.
One approach to see if your model architecture is good at all is to use very few samples (you could also use just one sample) and to try to overfit badly on it.
If this doesn’t work at all, we can dig a bit deeper.

Could you try that and report your results?
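
The single-sample overfitting test suggested above can be sketched roughly as follows. This is an illustration, not the original poster's code: tiny_net is a hypothetical stand-in for the real network, the 16*16*16 random volume replaces the real data, and the learning rate and step count are arbitrary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for the real model: a very small 3D conv autoencoder.
tiny_net = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv3d(8, 8, 3, stride=2, padding=1),  # 16 -> 8
    nn.ReLU(),
    nn.ConvTranspose3d(8, 8, 3, stride=2, padding=1, output_padding=1),  # 8 -> 16
    nn.ReLU(),
    nn.Conv3d(8, 1, 3, padding=1),  # no activation on the output layer
)

# One fixed sample of shape (batch, channel, D, H, W); the target is the input itself.
sample = torch.rand(1, 1, 16, 16, 16)
target = sample.clone()

criterion = nn.MSELoss()
optimizer = optim.SGD(tiny_net.parameters(), lr=0.05)

losses = []
for step in range(100):
    optimizer.zero_grad()
    loss = criterion(tiny_net(sample), target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print("first loss %.4f, last loss %.4f" % (losses[0], losses[-1]))
```

If the loss on this one sample does not fall steadily, the problem is more likely in the model or the training loop than in the amount or quality of the data.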

Rickyim · April 10, 2018, 11:37am

Sure, it is a typo. I am working on that now, and the loss is decreasing, but super slowly. The MSE loss took 20 iterations just to decrease from 2705 to 2700.

Rickyim · April 10, 2018, 11:39am

And the decrease has now slowed down to 0.003 per iteration.

ptrblck · April 10, 2018, 11:40am

As long as it’s decreasing, it’s a good sign. We can probably speed it up with an adaptive optimizer like Adam, but in the debug phase it’s kind of useless to tune the optimizer.
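
For reference, switching to an adaptive optimizer only changes the line that constructs the optimizer. The sketch below is an assumption-laden illustration: the small model, the random 8*8*8 volumes, and lr=1e-2 are placeholders, not values from the thread.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Placeholder model and data, only there to make the example self-contained.
model = nn.Sequential(
    nn.Conv3d(1, 4, 3, padding=1),
    nn.ReLU(),
    nn.Conv3d(4, 1, 3, padding=1),
)
volume = torch.rand(2, 1, 8, 8, 8)
criterion = nn.MSELoss()

# Only this line differs from an SGD setup; lr=1e-2 is an illustrative value.
optimizer = optim.Adam(model.parameters(), lr=1e-2)

adam_losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(model(volume), volume)
    loss.backward()
    optimizer.step()
    adam_losses.append(loss.item())

print("first loss %.4f, last loss %.4f" % (adam_losses[0], adam_losses[-1]))
```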

Is your network architecture taken from some working example, or did you just come up with it?

Rickyim · April 10, 2018, 11:41am

It is a simple autoencoder I came up with myself :joy:

Rickyim · April 10, 2018, 11:43am

Probably I should try some network with verified performance? But it is fairly strange that even when I feed the target as the input (the input is the same as the target output) and use only one sample, it performs so badly.

ptrblck · April 10, 2018, 11:45am

Ok, nice. Could you explain a bit more about your dataset, i.e.

the image sequence size
where does your data come from (natural images, medical images?)
batch_size

Are you normalizing the input? If not, you should try it.
If you are normalizing, try a tanh in your model output.
Your output might be just “out of bounds”, which could make the training quite hard.
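
A minimal sketch of the normalization idea, under the assumption that each volume is rescaled by its own minimum and maximum (dataset-wide statistics would be an equally reasonable choice). The function name normalize_to_unit_range and the 0 to 255 value range of the example volume are made up for illustration.

```python
import torch

torch.manual_seed(0)


def normalize_to_unit_range(volume):
    """Linearly rescale a tensor to [-1, 1] using its own min and max."""
    v_min, v_max = volume.min(), volume.max()
    scale = (v_max - v_min).clamp(min=1e-8)  # guard against constant volumes
    return 2.0 * (volume - v_min) / scale - 1.0


# Example volume with an assumed raw intensity range of roughly 0 to 255.
raw = torch.rand(1, 1, 8, 8, 8) * 255.0
normalized = normalize_to_unit_range(raw)
print(normalized.min().item(), normalized.max().item())

# tanh squashes any real value into (-1, 1), which matches the normalized
# target range. In the model this would replace the final relu, e.g.
#   x = torch.tanh(self.conv4(x))
bounded = torch.tanh(torch.randn(1, 1, 8, 8, 8) * 5.0)
```

With targets in [-1, 1], a final ReLU could never produce the negative half of the range, which is one reason to pair normalization with tanh on the output.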

Rickyim · April 10, 2018, 11:54am

I am trying to use 3D convolutional neural nets for a deconvolution task (i.e., a task like 3D deblurring).

The image sequences are of size 101*101*101.
The data are synthetic 3D tree-shaped blood vessel models. I use the code from http://vascusynth.cs.sfu.ca/Data.html to generate 1500 volume models, each of size 101*101*101. I convolve them with a 10*10*10 kernel to simulate the PSF of the imaging process and add noise at 30 dB PSNR. For now, though, I just feed the same data into the net as both input and target output. I originally hoped to use 1200 of them as the training set and the rest as the test set.
The batch size is four (a larger size like 8 causes a CUDA out-of-memory error).
I am not normalizing it currently; I will try that next.
Thank you so much for helping me!
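
The blur-plus-noise simulation described in this post could look roughly like the sketch below. It is not the poster's pipeline: the actual PSF is not given in the thread, so a uniform 10*10*10 box kernel stands in for it, the noise is assumed to be additive Gaussian, and "30 dB PSNR" is interpreted relative to the peak of the blurred volume. The function name blur_and_add_noise is hypothetical, and a 32*32*32 random volume replaces the vessel models to keep the example small.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)


def blur_and_add_noise(volume, kernel_size=10, psnr_db=30.0):
    """Blur a (D, H, W) volume with a box kernel, then add Gaussian noise.

    The box kernel is a placeholder for the real PSF. The noise standard
    deviation is chosen so the noisy volume has about psnr_db PSNR
    relative to the blurred one.
    """
    kernel = torch.ones(1, 1, kernel_size, kernel_size, kernel_size)
    kernel = kernel / kernel.sum()
    # Zero-pad so the output keeps the input size; an even kernel needs
    # asymmetric padding (4 voxels before, 5 after for kernel_size=10).
    before = (kernel_size - 1) // 2
    after = kernel_size - 1 - before
    x = F.pad(volume[None, None], (before, after) * 3)
    blurred = F.conv3d(x, kernel)[0, 0]
    sigma = blurred.max() / (10 ** (psnr_db / 20))
    noisy = blurred + sigma * torch.randn_like(blurred)
    return blurred, noisy


clean = torch.rand(32, 32, 32)  # stand-in for one vessel volume
blurred, noisy = blur_and_add_noise(clean)

mse = ((noisy - blurred) ** 2).mean()
measured_psnr = 10 * math.log10((blurred.max() ** 2 / mse).item())
print("measured PSNR: %.2f dB" % measured_psnr)
```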