I have data batches of shape (100, 20, 768) that I pass through a two-layer neural network. For each batch I also have a mask of shape (100, 20, 1) that tells which of the samples are relevant for training. How can I stop the model from updating its weights for the samples where the mask is 0?
The reason I am doing this is that the original samples have a varying number of rows. I stacked zero tensors underneath to bring every sample to the same number of rows so I can support batching, because evaluation took a lot of time when done sample by sample.
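For concreteness, here is roughly what my padding looks like (a PyTorch sketch; the shapes match my real data but the tensor names are made up):

```python
import torch

# Each sample has a varying number of rows, all with 768 features.
samples = [torch.randn(n, 768) for n in (7, 20, 13)]  # toy stand-ins

max_rows = 20
padded, masks = [], []
for s in samples:
    pad = max_rows - s.shape[0]
    # Stack zero rows underneath so every sample has max_rows rows.
    padded.append(torch.cat([s, torch.zeros(pad, 768)]))
    # Mask is 1 for real rows, 0 for the padded ones.
    masks.append(torch.cat([torch.ones(s.shape[0], 1),
                            torch.zeros(pad, 1)]))

batch = torch.stack(padded)  # (batch_size, 20, 768)
mask = torch.stack(masks)    # (batch_size, 20, 1)
```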
Or maybe there is a better approach for handling inputs with a variable number of rows?
Quick fix: why don't you just ignore the inputs with mask 0 if you aren't training on them?
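A minimal sketch of what I mean, assuming a regression-style objective (`criterion`, `pred`, and `target` are placeholder names): compute the per-element loss, multiply it by the mask, then reduce over the real elements only.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss(reduction="none")  # keep the per-element losses

def masked_loss(pred, target, mask):
    # pred, target: (batch, 20, 768); mask: (batch, 20, 1)
    elementwise = criterion(pred, target)  # per-element loss
    elementwise = elementwise * mask       # zero out the padded rows
    # Average over the real (unmasked) elements only.
    return elementwise.sum() / (mask.sum() * pred.shape[-1])
```

Since the padded rows are multiplied by 0 before the reduction, they contribute nothing to the gradient, so the weights are never updated on their account.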
Are you using a plain ANN or something else? With a CNN, you can use adaptive average pooling, which converts variable-length input into a fixed-length output.
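A sketch of the pooling idea, assuming you treat the rows as the length dimension (PyTorch's `nn.AdaptiveAvgPool1d` pools over the last dimension, so the 768 features go in the channel slot):

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(output_size=10)  # always 10 output positions

x = torch.randn(1, 768, 57)  # (batch, features as channels, variable rows)
y = pool(x)                  # (1, 768, 10), regardless of the input length
```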
You can also split your input into chunks, e.g. with chunk length 20 and slide length 10, as sketched below. This works well for CNNs and LSTMs; I'm not sure about plain ANNs.
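For example, with `Tensor.unfold` (a sketch; rows past the last full window are dropped, so pad first if you need them):

```python
import torch

x = torch.randn(57, 768)          # one sample with 57 rows
chunks = x.unfold(0, 20, 10)      # chunk length 20, slide 10 -> (4, 768, 20)
chunks = chunks.permute(0, 2, 1)  # (num_chunks, 20, 768), ready to batch
```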
That way the samples will have a varying number of rows, and I cannot create batches. I would like to keep batching, since it increases the speed by two orders of magnitude.
Is it possible to set the gradients of some specific rows in the layer matrix to 0? Say the gradient then becomes [0, 0, 0, 0, 0, 0.5, 0.04] for a 7x1 layer. I'm trying to use this to mask the embedding layers of a language model.
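One way to do this in PyTorch is a gradient hook on the parameter; a minimal sketch, assuming you already know which rows to freeze (the row indices here are made up):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(7, 1)                # the 7x1 "layer matrix"
frozen = torch.tensor([0, 1, 2, 3, 4])  # rows whose gradient should be 0

grad_mask = torch.ones_like(emb.weight)
grad_mask[frozen] = 0.0

# The hook runs on every backward pass and zeroes the selected rows,
# so the optimizer never receives a gradient for them.
emb.weight.register_hook(lambda grad: grad * grad_mask)
```

One caveat: an optimizer with weight decay can still move the masked rows even when their gradient is zero, so you may want to exclude those parameters from decay.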