Selective masking when computing the gradient

Eweler · July 8, 2020, 11:53am

Is it possible to compute the gradient of a model's loss with respect to only certain batch elements?

To be concrete, let’s say we have some model, with loss = model(x), where x has size [batch_size, dim]. There is a boolean tensor mask whose entries are True at the batch indices where the tensor should be evaluated and False otherwise. Is there a way to have the model compute the gradient only for these entries?

While it is possible to compute the loss on the unmasked elements only, loss = model(x[mask]), I would still like to make use of the loss of the masked elements in x, just without allowing these elements to affect the model.

chetan_patil · July 8, 2020, 12:58pm

Suppose you have a simple classifier whose input has size [20, 5], with 20 being the batch_size. After you forward the input through the model, it will give an output of size [20, num_classes].
Now, when you calculate the loss using nn.CrossEntropyLoss() or nn.MSELoss(), there is an argument reduction. If you pass its value as 'none', the loss is not reduced: with nn.CrossEntropyLoss() the resulting loss tensor is a vector of size 20, where each entry is the loss for the corresponding sample in the batch (with nn.MSELoss() you get an element-wise loss with the same shape as the output).
You can apply the Boolean mask here and then average it or sum it up.
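
A minimal sketch of this approach, under some assumptions: the nn.Linear model, num_classes = 3, the random inputs and targets, and the randomly drawn mask are all placeholders for illustration, not part of the original question. requires_grad is set on x only so the effect of the mask can be inspected afterwards.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, dim, num_classes = 20, 5, 3   # num_classes is an assumed value
model = nn.Linear(dim, num_classes)       # placeholder for the real classifier

# requires_grad on x is only for inspecting per-sample gradients below
x = torch.randn(batch_size, dim, requires_grad=True)
target = torch.randint(0, num_classes, (batch_size,))

# True where a sample should contribute to the gradient
mask = torch.rand(batch_size) > 0.5
mask[0], mask[1] = True, False            # guarantee both groups are non-empty

criterion = nn.CrossEntropyLoss(reduction="none")
per_sample_loss = criterion(model(x), target)   # shape: [batch_size]

# Only the selected samples enter the reduced loss, so only they
# send gradients back to the model parameters.
train_loss = per_sample_loss[mask].mean()
train_loss.backward()

# The losses of the remaining samples are still available for logging
# or other uses; detach() keeps them out of the autograd graph.
excluded_loss = per_sample_loss[~mask].detach()
```

After backward(), the rows of x.grad for samples where mask is False are all zeros, which is one way to check that those samples did not influence the update. Whether to use mean() or sum() over the selected entries depends on how you want the loss scaled when the number of selected samples varies between batches.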