I want to train an encoder that contains batch-normalization layers with two different losses. loss1 is computed on all samples in the minibatch, whereas loss2 can only be computed on a portion of them because it is computationally heavy. More specifically, loss2 is a variant of the noise-contrastive loss, and I want to use only a subset of the samples as its negative samples. In summary:
- the encoder, which has (sync) batch-normalization layers, maps N samples to an Nx2048 feature matrix (2048 is an arbitrary feature dimension)
- loss1 uses all samples (Nx2048) to compute the loss
- loss2 uses only a portion of the minibatch, e.g. (N/16)x2048, to compute the loss
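import torch.nn as nn
from torchvision.models import resnet50


class Model_with_losses(nn.Module):
    def __init__(self, loss1, loss2):
        super().__init__()  # must run before submodules are assigned
        self.encoder = resnet50()
        self.loss1 = loss1
        self.loss2 = loss2
        ...

    def forward(self, img):
        feature = self.encoder(img)  # the whole minibatch passes through the encoder
        l1 = self.loss1(feature)  # all N samples
        l2 = self.loss2(feature[:8])  # first 8 samples only; feature.size(0) >> 8, e.g., 64
        loss = l1 + l2
        return loss  # which will be backpropagated, e.g., loss.backward()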
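
To make the setup easier to check empirically, here is a minimal sketch that isolates the loss2 path. Everything in it is an assumption made for illustration: the tiny conv + BatchNorm2d `encoder` stands in for ResNet-50, `toy_loss2` is a placeholder rather than the real contrastive loss, and the batch and image sizes are arbitrary. It only inspects where gradients end up when the features are sliced after the encoder.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the ResNet-50 encoder (assumption, not the real model).
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
encoder.train()  # BN uses batch statistics in training mode


def toy_loss2(feature):
    # Placeholder for loss2 (hypothetical): it only sees the sliced subset.
    return feature.mean()


img = torch.randn(64, 3, 8, 8, requires_grad=True)
feature = encoder(img)   # all 64 samples go through BN together
feature.retain_grad()    # keep the gradient of this non-leaf tensor

l2 = toy_loss2(feature[:8])  # slicing happens after the encoder
l2.backward()

# Only the 8 sliced rows receive a direct gradient at the feature level.
rows_with_grad = int((feature.grad.abs().sum(dim=1) > 0).sum())
grad_on_unused_rows = float(feature.grad[8:].abs().sum())

# Through the shared BN statistics, the other 56 inputs still affect l2.
grad_on_unused_inputs = float(img.grad[8:].abs().sum())

# The encoder weights receive a gradient from the sliced loss.
conv_weight_grad = float(encoder[0].weight.grad.abs().sum())

print(rows_with_grad, grad_on_unused_rows)
print(grad_on_unused_inputs > 0, conv_weight_grad > 0)
```

In this toy version the slice does not break backpropagation: the encoder weights get a gradient from loss2, and the samples outside the slice only influence loss2 indirectly through the batch statistics. Whether that indirect coupling matters for the real loss2 is not something this sketch can settle.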
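Will this cause a problem, for example with the batch-normalization statistics or with the gradients, given that loss2 only sees a slice of the features?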
Thank you!!