Fine-tuning an image colorization model

Hi, I was wondering if anyone could offer their help.
I am using someone else’s repo that I want to fine-tune; it is an image colorization network. I want to utilise the models and introduce my own dataset to colorize various natural scenery.
I am having difficulty fine-tuning, as most of the tutorials are based on the pretrained models. I am currently struggling with loading their models and training on my dataset.
The GitHub repo is called “InstColorization” by ericsujw. https://github.com/ericsujw/InstColorization

Thanks in advance! :slight_smile:

Where are you stuck and what is not working? :slight_smile:


Hi @ptrblck, where do I begin :sweat_smile:. I solved some of my earlier issues, but the first issue I had is described in this thread: [solved] KeyError: 'unexpected key "module.encoder.embedding.weight" in state_dict'.
I originally tried wrapping my model in nn.DataParallel, but I ran into all sorts of attribute errors, so I eventually used ‘load_state_dict(state_dict, strict=False)’. My question now is: will strict=False affect my model, or should I go with the solution in the linked thread and remove the ‘module.’ prefix?

Cheers :slightly_smiling_face:

Go for the solution that removes the module. prefix. Otherwise strict=False will silently ignore the mismatched keys, and you could end up with a randomly initialized model.
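A minimal sketch of the prefix fix, assuming the checkpoint was saved from an nn.DataParallel-wrapped model so that its keys start with "module.". The helper name strip_module_prefix is hypothetical; it works on any mapping, such as the OrderedDict that torch.load returns for a saved state_dict.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key.

    Checkpoints saved from an nn.DataParallel-wrapped model store keys such as
    "module.model1.0.weight", while the plain (unwrapped) model expects
    "model1.0.weight". Keys without the prefix are kept unchanged.
    """
    new_state_dict = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        new_state_dict[key] = value
    return new_state_dict


# Usage (hypothetical file name and model object):
# state_dict = torch.load("checkpoint_net_G.pth", map_location="cpu")
# netG.load_state_dict(strip_module_prefix(state_dict))  # strict=True by default
```

Keeping the default strict=True means any key that still does not match raises an error, instead of leaving that layer at its random initialization.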


Thanks, I will definitely give that a shot!
I have another question regarding fine-tuning. I am following this tutorial: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

I have a param_to_update tensor which I want to feed into my optimizer. Currently my optimizer is created as:

self.optimizer_G = torch.optim.Adam(self.netG.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))

If I want to call that function, how do I set netG.parameters() to param_to_update?
Sorry if my explanation is a bit confusing.

If param_to_update is not part of the netG.parameters(), you could pass it additionally to the optimizer as:

self.optimizer_G = torch.optim.Adam(list(self.netG.parameters()) + [param_to_update], lr=opt.lr, betas=(opt.beta1, 0.999))

In the tutorial, they pass params_to_update directly to the optimizer: optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9).

In my case, I was hoping to pass my param_to_update into my optimizer by only calling the function. The function is in a different script file, so I thought the only way to call the function with my param_to_update tensor was to update self.netG.parameters(). So is there a way to set self.netG.parameters() = param_to_update?

Sorry for the silly questions and thanks for the help!

I don’t understand why you would need to add this parameter to the model parameters.
Wouldn’t it work, if you pass the parameter with netG.parameters() directly to the optimizer?
If you want to add the parameter later, you could still use optimizer.add_param_group.
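A minimal sketch of optimizer.add_param_group, using a tiny hypothetical stand-in for netG and a hypothetical extra tensor extra_param that is not registered inside the model. The learning rate and betas are placeholder values standing in for opt.lr and opt.beta1.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for netG, only used to illustrate the API.
netG = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
)

# Optimizer created from the model parameters, as in the original script.
optimizer_G = torch.optim.Adam(netG.parameters(), lr=1e-4, betas=(0.5, 0.999))

# An extra trainable tensor that lives outside netG (hypothetical).
extra_param = nn.Parameter(torch.zeros(2))

# add_param_group appends a new parameter group; options not given here
# (betas, eps, ...) are filled in from the values passed to the constructor.
optimizer_G.add_param_group({"params": [extra_param], "lr": 1e-4})
```

The existing group for netG is left untouched, so this works even after training has started.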


I was hoping to pass param_to_update into the optimizer instead of netG.parameters(), but I'm not entirely sure how to achieve this.

I will try to follow your method though, thanks for the help :slight_smile:
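The tutorial's approach does not require changing netG.parameters() at all: collect the parameters that still require gradients into a list and pass that list where netG.parameters() was passed. A sketch with a hypothetical stand-in model; which layers are frozen (here the first conv) is an assumption for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for netG.
netG = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # treated as "pretrained", frozen below
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),  # layer to fine-tune
)

# Freeze the first conv layer (the choice of layers is an assumption).
for p in netG[0].parameters():
    p.requires_grad = False

# Same idea as params_to_update in the torchvision fine-tuning tutorial:
# keep only the parameters that will actually be trained.
params_to_update = [p for p in netG.parameters() if p.requires_grad]

# Pass the list in place of netG.parameters(); netG itself is unchanged.
optimizer_G = torch.optim.Adam(params_to_update, lr=1e-4, betas=(0.5, 0.999))
```

In the original script this would correspond to something like self.optimizer_G = torch.optim.Adam(params_to_update, lr=opt.lr, betas=(opt.beta1, 0.999)), with params_to_update passed into the function that builds the optimizer.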

Hi @ptrblck, I was hoping you could help me with another problem. I am currently getting this error: RuntimeError: element 0 of variables does not require grad and does not have a grad_fn, and I have seen your previous solutions to it. The error occurs when .backward() is called, and I believe it is because the loss tensors possibly have requires_grad = False?

Here are my optimize_parameters() and forward() functions:

def forward(self):
    if self.opt.stage == 'full' or self.opt.stage == 'instance':
        (_, self.fake_B_reg) = self.netG(self.real_A, self.hint_B, self.mask_B)
    else:
        print('Error! Wrong stage selection!')
        exit()

def optimize_parameters(self, optimize):
    self.forward()
    optimize.zero_grad()

    if self.opt.stage == 'full' or self.opt.stage == 'instance':
        self.loss_L1 = torch.mean(self.criterionL1(self.fake_B_reg.type(torch.cuda.FloatTensor),
                                                   self.real_B.type(torch.cuda.FloatTensor)))
        # Same value as evaluating the criterion a second time, scaled by 10
        self.loss_G = 10 * self.loss_L1
    else:
        print('Error! Wrong stage selection!')
        exit()

    self.loss_G.backward()
    optimize.step()

Any help is much appreciated :slight_smile:

The computation graph seems to be detached at one point.
Could you check if the model output and the loss tensors have a valid .grad_fn?
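A minimal sketch of that check. The model, input, and target below are hypothetical stand-ins; in the real script the tensors to inspect would be self.fake_B_reg and self.loss_G right before self.loss_G.backward(). The helper name report_graph is also hypothetical.

```python
import torch
import torch.nn as nn


def report_graph(name, t):
    """Print whether tensor `t` is attached to the autograd graph."""
    print(f"{name}: requires_grad={t.requires_grad}, grad_fn={t.grad_fn}")
    return t.grad_fn is not None


# Hypothetical stand-ins for netG, real_A and real_B.
model = nn.Conv2d(1, 2, kernel_size=3, padding=1)
x = torch.randn(1, 1, 8, 8)
target = torch.randn(1, 2, 8, 8)

out = model(x)
loss = (out - target).abs().mean()

out_ok = report_graph("model output", out)
loss_ok = report_graph("loss", loss)
# If the output is attached but the loss is not, the graph is cut inside the
# loss computation; if both show grad_fn=None, look at the model or grad mode.
```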

I have tried printing .grad_fn and .requires_grad for the model output and the loss tensors. In both cases, nothing is printed for the loss tensors. I'm not entirely sure if I am doing this right.

I tried printing the loss tensor in the first iteration and it displayed tensor(2.155, device='cuda:0')

Are you wrapping the forward pass in a with torch.no_grad() block or disabling the gradient calculation globally?
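Both cases can be checked directly, sketched here on a hypothetical tensor w: torch.is_grad_enabled() reports the global switch (it returns False after torch.set_grad_enabled(False)), and anything computed inside a torch.no_grad() block ends up with grad_fn set to None.

```python
import torch

w = torch.randn(3, requires_grad=True)

# Global switch: False here would mean torch.set_grad_enabled(False) was called.
print(torch.is_grad_enabled())

with torch.no_grad():
    y_no_grad = (w * 2).sum()  # computed without recording the graph

y = (w * 2).sum()  # recorded normally

print(y_no_grad.grad_fn, y.grad_fn)
```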
If not, could you post the model definition, please?

I have not used torch.no_grad(). Here’s the model that I am working on:

import torch
import torch.nn as nn

class SIGGRAPHGenerator(nn.Module):
    def __init__(self, input_nc, output_nc, norm_layer=nn.BatchNorm2d, use_tanh=True, classification=True):
        super(SIGGRAPHGenerator, self).__init__()
        self.input_nc = input_nc
        self.output_nc = output_nc
        self.classification = classification
        use_bias = True

        # Conv1
        model1=[nn.Conv2d(input_nc, 64, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model1+=[nn.ReLU(True),]
        model1+=[nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model1+=[nn.ReLU(True),]
        model1+=[norm_layer(64),]
        # add a subsampling operation

        # Conv2
        model2=[nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model2+=[nn.ReLU(True),]
        model2+=[nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model2+=[nn.ReLU(True),]
        model2+=[norm_layer(128),]
        # add a subsampling layer operation

        # Conv3
        model3=[nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model3+=[nn.ReLU(True),]
        model3+=[nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model3+=[nn.ReLU(True),]
        model3+=[nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model3+=[nn.ReLU(True),]
        model3+=[norm_layer(256),]
        # add a subsampling layer operation

        # Conv4
        model4=[nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model4+=[nn.ReLU(True),]
        model4+=[nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model4+=[nn.ReLU(True),]
        model4+=[nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model4+=[nn.ReLU(True),]
        model4+=[norm_layer(512),]

        # Conv5
        model5=[nn.Conv2d(512, 512, kernel_size=3, dilation=2, stride=1, padding=2, bias=use_bias),]
        model5+=[nn.ReLU(True),]
        model5+=[nn.Conv2d(512, 512, kernel_size=3, dilation=2, stride=1, padding=2, bias=use_bias),]
        model5+=[nn.ReLU(True),]
        model5+=[nn.Conv2d(512, 512, kernel_size=3, dilation=2, stride=1, padding=2, bias=use_bias),]
        model5+=[nn.ReLU(True),]
        model5+=[norm_layer(512),]

        # Conv6
        model6=[nn.Conv2d(512, 512, kernel_size=3, dilation=2, stride=1, padding=2, bias=use_bias),]
        model6+=[nn.ReLU(True),]
        model6+=[nn.Conv2d(512, 512, kernel_size=3, dilation=2, stride=1, padding=2, bias=use_bias),]
        model6+=[nn.ReLU(True),]
        model6+=[nn.Conv2d(512, 512, kernel_size=3, dilation=2, stride=1, padding=2, bias=use_bias),]
        model6+=[nn.ReLU(True),]
        model6+=[norm_layer(512),]

        # Conv7
        model7=[nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model7+=[nn.ReLU(True),]
        model7+=[nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model7+=[nn.ReLU(True),]
        model7+=[nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model7+=[nn.ReLU(True),]
        model7+=[norm_layer(512),]

        # Conv8
        model8up=[nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1, bias=use_bias)]
        model3short8=[nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model8=[nn.ReLU(True),]
        model8+=[nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model8+=[nn.ReLU(True),]
        model8+=[nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model8+=[nn.ReLU(True),]
        model8+=[norm_layer(256),]

        # Conv9
        model9up=[nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=use_bias),]
        model2short9=[nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        # add the two feature maps above
        model9=[nn.ReLU(True),]
        model9+=[nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        model9+=[nn.ReLU(True),]
        model9+=[norm_layer(128),]

        # Conv10
        model10up=[nn.ConvTranspose2d(128, 128, kernel_size=4, stride=2, padding=1, bias=use_bias),]
        model1short10=[nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, bias=use_bias),]
        # add the two feature maps above
        model10=[nn.ReLU(True),]
        model10+=[nn.Conv2d(128, 128, kernel_size=3, dilation=1, stride=1, padding=1, bias=use_bias),]
        model10+=[nn.LeakyReLU(negative_slope=.2),]

        # classification output - possibly change this output
        model_class=[nn.Conv2d(256, 529, kernel_size=1, padding=0, dilation=1, stride=1, bias=use_bias),]

        # regression output
        model_out=[nn.Conv2d(128, 2, kernel_size=1, padding=0, dilation=1, stride=1, bias=use_bias),]
        if(use_tanh):
            model_out+=[nn.Tanh()]

        self.model1 = nn.Sequential(*model1)
        self.model2 = nn.Sequential(*model2)
        self.model3 = nn.Sequential(*model3)
        self.model4 = nn.Sequential(*model4)
        self.model5 = nn.Sequential(*model5)
        self.model6 = nn.Sequential(*model6)
        self.model7 = nn.Sequential(*model7)
        self.model8up = nn.Sequential(*model8up)
        self.model8 = nn.Sequential(*model8)
        self.model9up = nn.Sequential(*model9up)
        self.model9 = nn.Sequential(*model9)
        self.model10up = nn.Sequential(*model10up)
        self.model10 = nn.Sequential(*model10)
        self.model3short8 = nn.Sequential(*model3short8)
        self.model2short9 = nn.Sequential(*model2short9)
        self.model1short10 = nn.Sequential(*model1short10)
        self.model_class = nn.Sequential(*model_class)
        self.model_out = nn.Sequential(*model_out)
        self.upsample4 = nn.Sequential(*[nn.Upsample(scale_factor=4, mode='nearest'),])
        self.softmax = nn.Sequential(*[nn.Softmax(dim=1),])

    def forward(self, input_A, input_B, mask_B):
        conv1_2 = self.model1(torch.cat((input_A,input_B,mask_B),dim=1))
        conv2_2 = self.model2(conv1_2[:,:,::2,::2])
        conv3_3 = self.model3(conv2_2[:,:,::2,::2])
        conv4_3 = self.model4(conv3_3[:,:,::2,::2])
        conv5_3 = self.model5(conv4_3)
        conv6_3 = self.model6(conv5_3)
        conv7_3 = self.model7(conv6_3)
        conv8_up = self.model8up(conv7_3) + self.model3short8(conv3_3)
        conv8_3 = self.model8(conv8_up)

        if(self.classification):
            out_class = self.model_class(conv8_3)
            # Inputs are detached here, so out_reg only backpropagates into model9up,
            # model2short9, model9, model10up, model1short10, model10 and model_out
            conv9_up = self.model9up(conv8_3.detach()) + self.model2short9(conv2_2.detach())
            conv9_3 = self.model9(conv9_up)
            conv10_up = self.model10up(conv9_3) + self.model1short10(conv1_2.detach())
            conv10_2 = self.model10(conv10_up)
            out_reg = self.model_out(conv10_2)
        else:
            out_class = self.model_class(conv8_3.detach())
            conv9_up = self.model9up(conv8_3) + self.model2short9(conv2_2)
            conv9_3 = self.model9(conv9_up)
            conv10_up = self.model10up(conv9_3) + self.model1short10(conv1_2)
            conv10_2 = self.model10(conv10_up)
            out_reg = self.model_out(conv10_2)

        return (out_class, out_reg)

Your model definition works and the output tensors have valid grad_fns, so I’m unsure why they are None in your script:

import torch

model = SIGGRAPHGenerator(3, 1)
x = torch.randn(1, 1, 24, 24)
out = model(x, x, x)  # input_A, input_B and mask_B are concatenated to 3 channels
print(out[0].grad_fn)
> <ThnnConv2DBackward object at ...>

print(out[1].grad_fn)
> <TanhBackward object at ...>
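A freshly constructed model like the one above has every parameter trainable, so this check can pass even if the netG in the training script behaves differently. One possible cause worth ruling out (an assumption, not confirmed in this thread): with classification=True the regression branch only receives detached features (conv8_3.detach(), conv2_2.detach(), conv1_2.detach()), so out_reg is attached to the graph only through layers model9up to model_out. If those layers were frozen with requires_grad = False, as the tutorial's feature-extraction mode does for all parameters before replacing the final layer, out_reg ends up with no grad_fn. The toy module below is hypothetical and only reproduces that pattern.

```python
import torch
import torch.nn as nn


class ToyTwoHead(nn.Module):
    """Hypothetical miniature of the SIGGRAPHGenerator pattern: the regression
    head only sees a detached copy of the shared features."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.reg_head = nn.Conv2d(4, 2, kernel_size=1)

    def forward(self, x):
        feat = self.trunk(x)
        # Graph is cut here, as in model9up(conv8_3.detach())
        return self.reg_head(feat.detach())


model = ToyTwoHead()
x = torch.randn(1, 1, 8, 8)

# Fresh model: the head's own weights require grad, so the output is attached.
fresh_out = model(x)
print(fresh_out.grad_fn)

# Freeze the head, e.g. with a feature-extraction style loop.
for p in model.reg_head.parameters():
    p.requires_grad = False

out = model(x)
print(out.grad_fn)  # None: nothing upstream of the output requires grad any more
```

Printing [p.requires_grad for p in netG.parameters()] in the training script would confirm or rule this out.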

Does this mean that the loss tensors do not have requires_grad=True? If so, how do I set this attribute on them?
I saw in another post that you said setting loss.requires_grad = True will lead to undesirable behavior. Thanks!

Could you verify, that your model outputs also have valid .grad_fns?
From your last post I understood that’s not the case.

Apologies for the late reply. I followed your steps, obtained similar results, and was able to print out
<ThnnConv2DBackward object at …>

Ah OK, could you post the loss function then?
If the model output contains a valid grad_fn, while the loss doesn’t, the loss function might detach the graph.

Hi ptrblck, I have found that when I use self.loss_G = Variable(self.loss_G, requires_grad=True), the error doesn’t occur, so I assume it is due to the loss tensors, but if there is some other reason for the error, please let me know! Again, thanks for all the help!
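Wrapping the loss this way silences the error but does not reconnect it to netG: since the loss already had no grad_fn, the wrapped tensor is just a new leaf, backward() stops there, and the model parameters receive no gradients, so optimize.step() has nothing to apply. The sketch reproduces this with a hypothetical frozen layer and uses loss.detach().requires_grad_() as a stand-in for Variable(self.loss_G, requires_grad=True).

```python
import torch
import torch.nn as nn

# Hypothetical frozen layer standing in for a netG whose output has no grad_fn.
layer = nn.Linear(4, 1)
for p in layer.parameters():
    p.requires_grad = False

loss = layer(torch.randn(2, 4)).pow(2).mean()
print(loss.grad_fn)  # None, so loss.backward() would raise the RuntimeError

# Forcing requires_grad on the loss turns it into a new leaf tensor.
forced = loss.detach().requires_grad_()
forced.backward()  # no error any more...

# ...but no gradient reached the model parameters.
print([p.grad for p in layer.parameters()])
```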

However, here are the loss functions if you are still interested:

import torch
import torch.nn as nn

# In the model setup:
self.criterionL1 = networks.HuberLoss(delta=1. / opt.ab_norm)

# In the networks module:
class HuberLoss(nn.Module):
    def __init__(self, delta=.01):
        super(HuberLoss, self).__init__()
        self.delta = delta

    # forward() instead of overriding __call__, so nn.Module's call machinery stays intact
    def forward(self, in0, in1):
        mask = torch.zeros_like(in0)
        mann = torch.abs(in0 - in1)
        eucl = .5 * (mann**2)
        mask[...] = mann < self.delta  # 1 where |in0 - in1| < delta, else 0

        # loss = eucl*mask + self.delta*(mann-.5*self.delta)*(1-mask)
        loss = eucl*mask/self.delta + (mann-.5*self.delta)*(1-mask)
        return torch.sum(loss, dim=1, keepdim=True)
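The formula above is the same piecewise function as PyTorch's smooth_l1_loss with beta=delta (0.5·x²/delta below delta, |x| − 0.5·delta above), summed over the channel dimension. A sketch that checks this numerically and confirms the result keeps a grad_fn when the prediction requires grad, which suggests the loss itself does not detach the graph. The function name huber_like_repo, delta=0.5, and the tensor shapes are arbitrary choices; the beta argument of F.smooth_l1_loss assumes PyTorch 1.7 or newer.

```python
import torch
import torch.nn.functional as F


def huber_like_repo(in0, in1, delta):
    # Same arithmetic as HuberLoss above, written as a plain function.
    mann = torch.abs(in0 - in1)
    eucl = .5 * (mann ** 2)
    mask = (mann < delta).to(in0.dtype)
    loss = eucl * mask / delta + (mann - .5 * delta) * (1 - mask)
    return torch.sum(loss, dim=1, keepdim=True)


torch.manual_seed(0)
delta = 0.5
pred = torch.randn(2, 2, 4, 4, requires_grad=True)  # stands in for fake_B_reg
target = torch.randn(2, 2, 4, 4)                     # stands in for real_B

ours = huber_like_repo(pred, target, delta)
builtin = F.smooth_l1_loss(pred, target, reduction="none", beta=delta).sum(dim=1, keepdim=True)

print(torch.allclose(ours, builtin))
print(ours.grad_fn is not None)  # the loss keeps the graph attached

torch.mean(ours).backward()
print(pred.grad is not None)
```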