Hi,
All of the parameters (of the encoder and of any other layers) should be leaf tensors in the computation graphs of the losses.
Backpropagation: you simply need to call loss.backward() to compute the gradient of the loss wrt the model parameters (more precisely, wrt every leaf tensor in the computation graph of loss).
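For instance, a minimal sketch (the tensors here are purely illustrative, not from your model):

import torch

w = torch.randn(3, requires_grad=True)  # a leaf tensor, like a model parameter
x = torch.randn(3)                      # plain input data, no grad needed
loss = (w * x).sum() ** 2               # scalar loss built from w
loss.backward()                         # populates w.grad with d(loss)/d(w)
print(w.is_leaf, w.grad)                # True, tensor([...])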
Parameter updates: after the loss has been backpropagated, call optimizer.step() to update the model parameters.
For your use case, the following code sketch should work.
import itertools
import torch

# One optimizer over the parameters of all three sub-networks.
params = [encoder.parameters(), fc1.parameters(), fc2.parameters()]
optimizer = torch.optim.Adam(itertools.chain(*params), lr=0.01)

for batch_idx, batch in enumerate(dataloader_instance):
    # compute lcce and lwd from batch here

    # Step 1: update all parameters by descending lcce.
    # If lcce and lwd share part of the graph (e.g., both flow through
    # the encoder), use lcce.backward(retain_graph=True) instead.
    lcce.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Step 2: negate lwd so that fc1/fc2 ascend it ...
    lwd = -1 * lwd
    lwd.backward()
    # ... then rescale the encoder's gradients so that the encoder
    # instead descends beta * lwd (beta is your scaling hyperparameter).
    for param in encoder.parameters():
        param.grad = -beta * param.grad
    optimizer.step()
    optimizer.zero_grad()
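To be explicit about what the two sign flips do: with a single optimizer.step(), fc1 and fc2 take a gradient-ascent step on lwd (their gradients are those of -lwd), while the encoder takes a gradient-descent step on beta * lwd (its gradients are flipped back and scaled by beta). Here I am assuming beta is the scaling factor from your objective.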