Can using only a portion of samples within a mini-batch be a problem?

I want to train an encoder that contains batch-normalization layers using two different losses. I use all the samples in the mini-batch to compute loss1, but I can only use a portion of the samples to compute loss2 (because loss2 is computationally heavy). More specifically, loss2 is a variant of the noise contrastive loss, and I want to use only a portion of the samples as its negative samples. In summary:

  1. the encoder with (sync) batch normalization layers converts N samples into an Nx2048 feature tensor (2048 is an arbitrary feature dimension)
  2. loss1 uses all samples (Nx2048) to compute the loss
  3. loss2 uses only a portion of the mini-batch, e.g. (N/16)x2048, to compute the loss

    import torch.nn as nn
    from torchvision.models import resnet50

    class Model_with_losses(nn.Module):
        def __init__(self, loss1, loss2):
            super().__init__()  # must run before submodules are assigned
            self.encoder = resnet50()
            self.loss1 = loss1
            self.loss2 = loss2
        ...
        def forward(self, img):
            feature = self.encoder(img)
            l1 = self.loss1(feature)      # uses all N samples
            l2 = self.loss2(feature[:8])  # first 8 samples only; feature.size(0) >> 8, e.g., 64
            loss = l1 + l2
            return loss  # which will be backpropagated, e.g., loss.backward()

Will there be a problem?
Thank you!!

I don’t think you should expect any problems beyond the obvious limitation: the gradients from loss2 are computed only from the indexed features, and the remaining features do not contribute to loss2. That seems to fit your use case.
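
To illustrate the point, here is a minimal sketch you can run on CPU. It is not your actual model: the small `nn.Sequential` encoder and the squared-sum loss are hypothetical stand-ins for `resnet50()` and loss2, chosen only to keep the example short. It inspects the gradient that arrives at the feature tensor when the loss is computed on the first 8 rows only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the encoder: a linear layer followed by batch norm.
encoder = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))
encoder.train()

x = torch.randn(16, 4)   # a mini-batch of N = 16 samples
feature = encoder(x)     # shape (16, 3)
feature.retain_grad()    # keep the gradient of this non-leaf tensor for inspection

# Placeholder for loss2 (an assumption): computed on the first 8 rows only.
l2 = feature[:8].pow(2).sum()
l2.backward()

# Rows that were indexed receive a gradient from this loss.
print(feature.grad[:8].abs().sum().item() > 0)
# Rows that were not indexed receive an all-zero gradient from this loss.
print(torch.count_nonzero(feature.grad[8:]).item())
# The encoder parameters are still updated through the indexed rows.
print(encoder[0].weight.grad is not None)
```

In your setup the rows beyond the slice would still receive gradients from loss1, since that loss uses the full feature tensor.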
