Loading parameters into an extended model

Hi, I’m considering building a network like this:

  • Each linear layer A and B receives the 128-dim output from the CNN (Φ).
  • The CNN’s parameters were trained on another task and saved as model.pth.
  • Now I want to train layers A and B together with the CNN on the new task (a rough sketch of the modules is below).
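
To make the setup concrete, here is a rough sketch of what I mean by CNN, LinearUnitA and LinearUnitB (the conv layers and output sizes are just placeholders; the only thing that matters is the 128-dim feature output):

import torch
import torch.nn as nn

class CNN(nn.Module):                      # Φ: the pretrained feature extractor
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, 128)       # 128-dim feature output

    def forward(self, x):
        x = self.conv(x).flatten(1)
        return self.fc(x)

class LinearUnitA(nn.Module):              # head A on top of the 128-dim features
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 10)       # output size is a placeholder

    def forward(self, x):
        return self.fc(x)

class LinearUnitB(nn.Module):              # head B, same input, different target
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 5)        # output size is a placeholder

    def forward(self, x):
        return self.fc(x)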

In this case, will the following pseudocode work properly?

import torch

model = CNN()                        # class CNN() defines Φ's architecture
model.load_state_dict(torch.load("model.pth"))

modelA = LinearUnitA()
modelB = LinearUnitB()
optim = optimizer_definition()       # some operations to define the optimizer

for i in range(N_training):

    optim.zero_grad()                # clear gradients from the previous iteration

    f_128 = model.forward(input_tensor)
    outputA = modelA.forward(f_128)
    outputB = modelB.forward(f_128)

    lossA = loss_computation_A(outputA)
    lossB = loss_computation_B(outputB)

    lossA.backward(retain_graph=True)   # keep the shared graph so lossB can also backprop through the CNN
    lossB.backward()
    optim.step()
    

Thanks.

Your code looks alright. Note that the gradients from both losses will be accumulated in the CNN.
A small side note: you shouldn’t call forward directly, but rather call the model itself: model(input_tensor).
This makes sure all hooks (if there are any) are registered and called properly.
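
For example, reusing the placeholders from your snippet (the Adam optimizer and learning rate are just examples), the loop could look like this; summing the losses before backward() accumulates the same gradients in the CNN and avoids retain_graph=True:

import torch

# a single optimizer over all parameters, so the pretrained CNN is fine-tuned as well
optim = torch.optim.Adam(
    list(model.parameters()) + list(modelA.parameters()) + list(modelB.parameters()),
    lr=1e-4,
)

for i in range(N_training):
    optim.zero_grad()                 # clear gradients from the previous iteration

    f_128 = model(input_tensor)       # call the modules directly instead of .forward()
    outputA = modelA(f_128)
    outputB = modelB(f_128)

    lossA = loss_computation_A(outputA)
    lossB = loss_computation_B(outputB)

    (lossA + lossB).backward()        # one backward pass; both losses contribute gradients to the shared CNN
    optim.step()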


Hi @ptrblck, I appreciate your advice, and the last note was useful for me.
Thanks :blush:
