I have video features of shape (Batch, Time, Spatial, Feature_dim), e.g. (32, 15, 4, 2048),
and I first need a dimension transform using an nn.Linear(2048, 512).
But some of the features are masked according to an associated mask of shape (B, T, S) (True means valid, False means masked).
How do I avoid updating this layer with the unwanted features?
How is the mask applied? How do you update the parameters of this layer? Is there a loss function?
For example:
- in attention models, the mask is applied to the attention scores (typically by setting masked positions to -inf before the softmax) so that they are not taken into account in the output: everything is managed at the model level;
- in classification, it is also common to pass a weight parameter to the loss function (binary_cross_entropy_with_logits, cross_entropy, …) to avoid updating from some of the outputs: everything is managed at the loss level.
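To illustrate the attention case above, here is a minimal sketch (shapes and tensors are made up for the example) where a boolean padding mask, using the same True = valid convention as the question, is applied to the scores before the softmax:

```python
import torch

B, T, d = 2, 5, 8
torch.manual_seed(0)
q = torch.randn(B, T, d)
k = torch.randn(B, T, d)
v = torch.randn(B, T, d)

# True = valid, False = masked (same convention as in the question)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 0]]).bool()  # (B, T)

scores = q @ k.transpose(-2, -1) / d ** 0.5          # (B, T, T)
# set masked key positions to -inf so softmax gives them weight 0
scores = scores.masked_fill(~mask[:, None, :], float('-inf'))
attn = scores.softmax(dim=-1)                         # masked keys get weight 0
out = attn @ v                                        # masked features never reach the output
```

Since the masked positions contribute nothing to the output, they also receive no gradient during the backward pass.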
Or, if we have to deal with a regression, we can handle it at the loss level like this:
import torch
import torch.nn.functional as F

B, n = 4, 6
torch.manual_seed(0)
y = torch.empty((B, n)).uniform_(-10, 10)
torch.manual_seed(1)
y_pred = torch.empty((B, n)).uniform_(-10, 10)

if True:
    # if the mask is batch-wise
    mask = torch.empty(B).random_(2).bool()
    loss = F.mse_loss(y_pred, y, reduction='none').sum(dim=1) * mask
else:
    # if the mask is element-wise
    mask = torch.empty((B, n)).random_(2).bool()
    loss = (F.mse_loss(y_pred, y, reduction='none') * mask).sum(dim=1)

loss = loss.mean()  # or loss.sum()
loss
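Applied to the video-feature setup from the question, the same element-wise idea looks like the sketch below (the regression target is made up for illustration). The loss is zeroed at masked positions, so those features produce no gradient through the nn.Linear; dividing by mask.sum() instead of taking a plain mean avoids counting masked entries in the denominator:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, S, F_in, F_out = 32, 15, 4, 2048, 512
proj = nn.Linear(F_in, F_out)

x = torch.randn(B, T, S, F_in)          # video features
mask = torch.rand(B, T, S) > 0.3        # True = valid feature (illustrative mask)
target = torch.randn(B, T, S, F_out)    # dummy regression target

out = proj(x)                                                  # (B, T, S, 512)
loss = F.mse_loss(out, target, reduction='none').sum(dim=-1)   # (B, T, S)
loss = (loss * mask).sum() / mask.sum().clamp(min=1)           # mean over valid positions only

loss.backward()  # gradients only flow from unmasked positions
```

Because the masked positions are multiplied by zero before the reduction, their contribution to the gradient of proj.weight is exactly zero, which is one way to answer "avoid updating this layer with unwanted features".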
If possible, provide more details about your task.