How to add layers to a pre-trained model (the model was built and pre-trained by me)

So I have this:

class Net1(nn.Module):
    ...  # defined it

Model1 = Net1()
optimizer1 = ...  # defined accordingly
# pretraining over a loss function, say loss_fn1
for i, (img, label) in enumerate(train_loader):
    ...

What is the best way to add a few more layers to this and train over a new loss function (say loss_fn2)?
Please let me know if I am being unclear.

One way I tried was to define the new layers as a second model with its own optimizer (say Model2 and optimizer2) and do this:

for i, (img, label) in enumerate(train_loader):
    optimizer1.zero_grad()  # clear stale gradients on the pre-trained model
    optimizer2.zero_grad()  # and on the new layers
    out = Model1(img)
    out = Model2(out)
    loss = loss_fn2(out, label)
    loss.backward(retain_graph=True)
    optimizer2.step()
    optimizer1.step()

This looks fundamentally wrong to me now. Are the gradients of the loss function backpropagated, via the chain rule, to the pre-trained model’s parameters? I don’t think the two models are connected.

I don’t know how to connect these two models, with the first model’s parameters coming from pretraining. Any lead or hint is appreciated. Thanks!!

Why do you think the computation graph is not connected? Did you check the gradients in Model1 or do you see any other issues?

Your code looks alright, and you can chain different models just as you chain different layers.
Since Model1 and Model2 are implemented as nn.Modules, there is no difference from “layers” such as nn.Conv2d.
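
If you want to confirm that the graph is connected, one sanity check (a minimal sketch, reusing Model1, Model2, loss_fn2, and a single (img, label) batch from the loop above) is to look at the .grad attributes of Model1's parameters right after the backward call; they stay None if no gradient reached them:

out = Model2(Model1(img))
loss = loss_fn2(out, label)
loss.backward()

# if backpropagation reaches the pre-trained model, every parameter gets a .grad tensor
for name, param in Model1.named_parameters():
    if param.grad is None:
        print(name, "received no gradient -> graph is not connected here")
    else:
        print(name, "grad norm:", param.grad.norm().item())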


Thank you for responding. I think I got it now: the computational graph should be connected. (Is there any way you suggest to confirm that? It would be a good check to make while designing any model.)

So loss.backward() is solely responsible for computing the gradients of the loss w.r.t. all parameters in the computational graph, and optimizer.step() just picks up these gradient values to make an update.

Sorry, I thought optimizer.step() also had a role in deciding which gradients get computed and how (based on the parameters passed to the optimizer).
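
For anyone finding this later, this is how I understand it in code: a minimal sketch (the nn.Sequential wrapper, the single optimizer, and the learning rate are just placeholders, not taken from the posts above), where backward() fills the .grad fields and step() only updates the parameters that were passed to the optimizer:

import torch.nn as nn
import torch.optim as optim

combined = nn.Sequential(Model1, Model2)               # chain the pre-trained model and the new layers
optimizer = optim.SGD(combined.parameters(), lr=1e-3)  # placeholder lr; covers both parameter sets

for i, (img, label) in enumerate(train_loader):
    optimizer.zero_grad()
    out = combined(img)
    loss = loss_fn2(out, label)
    loss.backward()   # computes .grad for every parameter in the connected graph
    optimizer.step()  # applies the update, but only to the parameters given to this optimizer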