Hi there!
I want to train a model for face swapping, which is basically a shared encoder feeding two decoders (one for each face). Instead of wrapping everything in a single model, I kept the pieces modular, as shown in the code below. My question is: if I don't include the encoder's weights in either optimizer, do they ever get updated during training?
import copy

import torch
import torch.nn as nn

# Take the shared encoder and two independent deep copies of the decoder
# from the base model (accessed through model.module).
encoder = model.module.encoder
decoder_potter = copy.deepcopy(model.module.decoder)
decoder_chua = copy.deepcopy(model.module.decoder)

# Loss and optimizers -- note that only the decoder parameters are passed in.
criterion = nn.MSELoss()
params_chua = list(decoder_chua.parameters())
params_potter = list(decoder_potter.parameters())
optimizer_chua = torch.optim.SGD(
    filter(lambda p: p.requires_grad, params_chua),
    lr=args.lr,
    weight_decay=args.weight_decay)
optimizer_potter = torch.optim.SGD(
    filter(lambda p: p.requires_grad, params_potter),
    lr=args.lr,
    weight_decay=args.weight_decay)
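If it turns out the encoder never gets updated this way, I assume the fix would be to pass its parameters to both optimizers as well, something like the sketch below (just what I have in mind, untested):

# Sketch (untested): give both optimizers the shared encoder's parameters
# in addition to their own decoder's parameters.
params_chua = list(encoder.parameters()) + list(decoder_chua.parameters())
params_potter = list(encoder.parameters()) + list(decoder_potter.parameters())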
Right now, though, these are the only two optimizers I have during training. My training loop is below:
from itertools import cycle

from tqdm import tqdm

total_chua_loss = 0.0
total_potter_loss = 0.0

# cycle() repeats the Potter loader so it can be zipped with the (possibly longer) Chua loader.
for i, (img_chua, img_potter) in tqdm(enumerate(zip(train_loader_chua, cycle(train_loader_potter)))):
    img_chua, img_potter = img_chua[0].cuda(), img_potter[0].cuda()

    # Reconstruct the Chua face through the shared encoder and its decoder.
    img_chua_recon = decoder_chua(encoder(img_chua))
    chua_loss = criterion(img_chua_recon, img_chua)
    total_chua_loss += chua_loss.item()  # .item() so the graph isn't kept alive
    optimizer_chua.zero_grad()
    chua_loss.backward()
    optimizer_chua.step()

    # Same procedure for the Potter face with the second decoder.
    img_potter_recon = decoder_potter(encoder(img_potter))
    potter_loss = criterion(img_potter_recon, img_potter)
    total_potter_loss += potter_loss.item()
    optimizer_potter.zero_grad()
    potter_loss.backward()
    optimizer_potter.step()
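To make the question concrete, this is how I was planning to sanity-check it (just a sketch, I haven't run it yet): snapshot one encoder weight, run a single optimization step, and compare.

# Sanity-check sketch (untested), run outside the training loop:
# compare one encoder weight before and after a single step.
w_before = next(encoder.parameters()).detach().clone()

batch = next(iter(train_loader_chua))
img = batch[0].cuda()
loss = criterion(decoder_chua(encoder(img)), img)
optimizer_chua.zero_grad()
loss.backward()        # this should populate .grad on the encoder as well
optimizer_chua.step()  # but only decoder_chua's parameters are in this optimizer

w_after = next(encoder.parameters()).detach().clone()
print(torch.equal(w_before, w_after))  # True would mean the encoder never moved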
Thank you very much.