Some problems with custom loss functions


#1

Hi, I’m new to PyTorch and am trying to implement a network for image enhancement.
I managed to run the code below with some warnings (numpy and softmax), but I have a few problems:

I designed a new MSE loss function for the generated images (built from the high-frequency factors of a DWT),
and added it to the generator loss like this:

loss_g = criterion_g(imgHR, imgSR.cpu()) + myMSELoss(imgSR_W, imgHR_W)# + criterion_val(validity.cpu(), valid.cpu())               

I can see that loss_g.item() has the right value,
but it doesn’t seem to affect the gradients (.grad is unchanged whether myMSELoss is added or not, and the generator produces the same images).
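A toy version of the check I did looks roughly like this (illustrative names, not my actual model); if the extra term is really in the graph, the gradients should change when it is added:

```python
import torch

# Sketch: verify that an extra loss term actually contributes to the gradients.
# If .grad is identical with and without the term, that term is detached
# from the computation graph somewhere.
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

base = ((w - x) ** 2).sum()
extra = w.abs().sum()  # stand-in for the extra loss term

base.backward(retain_graph=True)
grad_without = w.grad.clone()

w.grad = None
(base + extra).backward()
grad_with = w.grad.clone()

print(torch.allclose(grad_without, grad_with))  # False: the extra term changed the gradients
```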

For training the discriminator, I built the loss like this:

    val_LR, aux_LR = discriminator(imgLRd.cuda())
    val_HR, aux_HR = discriminator(imgHRd.cuda())
    val_SR, aux_SR = discriminator(imgSRd.cuda())
    loss_d_LR = (criterion_d(aux_LR, labels) + criterion_val(val_LR, fake)) / 2
    loss_d_HR = (criterion_d(aux_HR, labels) + criterion_val(val_HR, valid)) / 2
    loss_d_SR = (criterion_d(aux_SR, labels) + criterion_val(val_SR, fake)) / 2
    loss_d = (loss_d_HR + loss_d_LR + loss_d_SR) / 3
    loss_d.backward()

I saw some solutions for multiple losses here, but I cannot figure out what is wrong with this code.
Gradients are updated on each trial, but the loss does not converge and performance on the test set is bad.
(I’d like the discriminator to classify well across images of various resolutions.)

The discriminator has two outputs: a class label and a validity score.
I used pre-trained resnet18 weights as shown in the code.
I don’t think it’s wrong to append some layers like that, but I’m not confident.

Thanks in advance.

Here is my code:

https://pastebin.com/dVZdmx5J


#2

There seem to be some issues in your code.

myMSELoss uses the .data attribute to calculate the loss, which is not recommended. It might break the computation graph, which is most likely the case in your code. Have a look at this small code snippet:

import torch

x = torch.randn(1, requires_grad=True)
y = torch.randn(1)
loss = torch.sum((x - y)**2)  # using .data here would break the graph
loss.backward()
print(x.grad)

If you add .data to the loss calculation, you’ll get an error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I guess this error might be masked, since you are using different losses, where some of them might still calculate the correct gradients. You could just remove the .data op in this case.
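For reference, a plain tensor-op version of such a loss (a sketch, not your exact myMSELoss) keeps the graph intact without any extra work:

```python
import torch

# Sketch of a custom MSE loss that stays on the computation graph.
# Plain tensor ops (no .data, no .detach(), no numpy round-trips)
# are all autograd needs to track the operation.
def my_mse_loss(pred, target):
    return ((pred - target) ** 2).mean()

pred = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)

loss = my_mse_loss(pred, target)
loss.backward()
print(pred.grad is not None)  # True: gradients flow back to the prediction
```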

Also, your transformations like transform_RGB2YCBCR detach the tensor from the computation graph, as you are leaving PyTorch and using methods of PIL, which call numpy under the hood.
I haven’t run your code, but I think your generator won’t be updated at all, since imgSR_Y is detached from it before being passed to the discriminator.
Based on your code it looks like you would like to feed RGB images to the generator and CbCr to the discriminator. If that’s the case, you would have to write the transformation using PyTorch operations.
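For example, a differentiable RGB to YCbCr conversion can be written directly with tensor ops. Here is a sketch using the BT.601 coefficients (assuming an (N, 3, H, W) input in [0, 1]; adapt the value range to your pipeline):

```python
import torch

# Sketch: RGB -> YCbCr purely in PyTorch, so it stays on the computation
# graph (unlike a PIL/numpy round-trip). BT.601 coefficients.
_RGB2YCBCR = torch.tensor([
    [ 0.299,     0.587,     0.114   ],   # Y
    [-0.168736, -0.331264,  0.5     ],   # Cb
    [ 0.5,      -0.418688, -0.081312],   # Cr
])

def rgb_to_ycbcr(img):
    # einsum applies the 3x3 color matrix per pixel;
    # the offset shifts Cb/Cr from [-0.5, 0.5] to [0, 1]
    out = torch.einsum('ij,njhw->nihw', _RGB2YCBCR.to(img), img)
    out[:, 1:] += 0.5
    return out

x = torch.rand(1, 3, 8, 8, requires_grad=True)
y = rgb_to_ycbcr(x)
y.sum().backward()         # gradients flow back through the conversion
print(x.grad is not None)  # True
```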


#3

Thanks a lot.
Now I can see why the generated results were the same with different loss functions:
in fact only one criterion was actually working.
I’ll try to rewrite my code more cleanly.

By the way, about my second question: does the loss function of D work properly?
I’ve read various ACGAN codebases on GitHub, and they design the discriminator loss by adding the two criterions for the real and fake images.
Many answers here about this say it is OK to add criterions like that.
In my case, I assign the generated images as fake and want to force the generator to produce real (high-resolution)-like images.
When I ran the code there were some gradients, but I can’t say it is working well.
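The pattern I copied from those repos is roughly this (a sketch with stand-in tensors, not my real discriminator):

```python
import torch
import torch.nn as nn

# Sketch of the ACGAN-style discriminator loss pattern: the adversarial
# and auxiliary-class criterions are summed per batch, then the real and
# fake parts are averaged into one scalar. All names are illustrative.
adv_criterion = nn.BCELoss()           # validity: real vs. fake
aux_criterion = nn.CrossEntropyLoss()  # class label

batch, n_classes = 4, 10
# stand-ins for the discriminator outputs (hence requires_grad=True)
val_real = torch.rand(batch, 1, requires_grad=True)
val_fake = torch.rand(batch, 1, requires_grad=True)
aux_real = torch.randn(batch, n_classes, requires_grad=True)
aux_fake = torch.randn(batch, n_classes, requires_grad=True)
labels = torch.randint(0, n_classes, (batch,))
valid = torch.ones(batch, 1)
fake = torch.zeros(batch, 1)

loss_real = (adv_criterion(val_real, valid) + aux_criterion(aux_real, labels)) / 2
loss_fake = (adv_criterion(val_fake, fake) + aux_criterion(aux_fake, labels)) / 2
loss_d = (loss_real + loss_fake) / 2
loss_d.backward()                 # one backward covers both parts
print(val_real.grad is not None)  # True
```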