Using masking during training

Hello everyone,

I have data batches with dimensions (100,20,768) that I pass through a two-layer neural network. For each batch I also have a mask with dimensions (100,20,1) that indicates which rows are relevant for training. How can I stop the model from updating its weights for the rows where the mask is 0?

The reason I am doing this is that the original samples have a varying number of rows. I stacked zero tensors underneath each sample to pad them all to the same number of rows so that I can batch them, because evaluation took a long time when done sample by sample.

Or maybe there is a better approach for handling inputs with a variable number of rows?

Thanks in advance.

Quick fix: why don't you ignore the inputs with mask 0 if you aren't training on them?

Are you using an ANN or something else? With a CNN, you can use adaptive average pooling, which converts a variable-length input into a fixed-length output.
You can also split your input into chunks, e.g. with chunk length 20 and slide length 10. This works well for CNNs and LSTMs; I'm not sure about ANNs.
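
For reference, the two suggestions above could look roughly like this. This is only a sketch assuming PyTorch (the thread does not name the framework); the sample size of 47 rows and the pooled length of 20 are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical variable-length sample: 47 rows of 768 features.
sample = torch.randn(47, 768)

# Option 1: adaptive average pooling to a fixed number of rows.
# AdaptiveAvgPool1d pools over the last dimension, so the row
# dimension is moved last first: (47, 768) -> (1, 768, 47).
pool = nn.AdaptiveAvgPool1d(20)
pooled = pool(sample.t().unsqueeze(0))   # (1, 768, 20)
pooled = pooled.squeeze(0).t()           # (20, 768)

# Option 2: sliding-window chunks, chunk length 20, slide length 10.
# unfold appends the window dimension at the end: (num_chunks, 768, 20).
chunks = sample.unfold(0, 20, 10)
chunks = chunks.permute(0, 2, 1)         # (num_chunks, 20, 768)
# Rows that do not fill a complete final window are dropped here.

print(pooled.shape, chunks.shape)
```

Note that pooling averages neighbouring rows together, so it changes the data the network sees, while chunking keeps the rows as they are but can drop a short tail unless it is padded.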

It is a simple two-layer NN that tries to learn a transformation.

Is multiplying the input by the mask sufficient? It will nullify the samples, but I think it will still calculate the gradient for them.
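
A small check of this concern, as a sketch assuming PyTorch and a hypothetical two-layer model with an MSE loss (neither is stated in the thread). Zeroing a row of the input removes its contribution to the first layer's weight gradient, but the biases still produce an output for that row, so the loss and the remaining parameters still receive gradient from it. Multiplying the per-element loss by the mask removes the row from every gradient. To isolate the effect, the mask below marks every row as padding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer network; the hidden size 64 is an assumption.
model = nn.Sequential(nn.Linear(768, 64), nn.ReLU(), nn.Linear(64, 768))

x = torch.randn(4, 20, 768)
target = torch.randn(4, 20, 768)
mask = torch.zeros(4, 20, 1)  # every row is treated as padding


def total_grad(loss):
    """Backpropagate and return the summed absolute gradient of all parameters."""
    model.zero_grad()
    loss.backward()
    return sum(p.grad.abs().sum().item() for p in model.parameters())


# (a) Mask only the input: padded rows still reach the loss through the biases.
loss_input_masked = ((model(x * mask) - target) ** 2).mean()
grad_input_masked = total_grad(loss_input_masked)

# (b) Mask the per-element loss: padded rows contribute nothing.
per_elem = (model(x) - target) ** 2
denom = mask.sum().clamp(min=1) * per_elem.size(-1)  # avoid dividing by zero
loss_masked = (per_elem * mask).sum() / denom
grad_loss_masked = total_grad(loss_masked)

print("input masked:", grad_input_masked)
print("loss masked: ", grad_loss_masked)
```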

I meant to say: why don't you ignore the data points with mask 0 if you don't want to update the weights for them? I edited my reply accordingly.

That way the samples will have a varying number of rows, and I cannot create batches. I would like to support batching, since it increases the speed by two orders of magnitude.
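
One way to keep batching and still exclude the padded rows is to pad as before but apply the mask to the loss rather than to the input. The following is a minimal sketch, assuming PyTorch and an MSE loss; the sample lengths, hidden size, and optimizer settings are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

torch.manual_seed(0)

# Hypothetical variable-length samples and targets (rows x 768 features).
lengths = [20, 7, 13]
inputs = [torch.randn(n, 768) for n in lengths]
targets = [torch.randn(n, 768) for n in lengths]

# Zero-pad to the longest sample so the batch is one tensor.
x = pad_sequence(inputs, batch_first=True)    # (3, 20, 768)
y = pad_sequence(targets, batch_first=True)   # (3, 20, 768)

# mask[i, j, 0] is 1 for real rows and 0 for padded rows.
positions = torch.arange(x.size(1))[None, :]            # (1, 20)
mask = (positions < torch.tensor(lengths)[:, None])     # (3, 20)
mask = mask.float().unsqueeze(-1)                       # (3, 20, 1)

# Hypothetical two-layer network; the hidden size 256 is an assumption.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 768))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def masked_mse(pred, target, mask):
    """Mean squared error over real rows only; padded rows are zeroed out."""
    per_elem = (pred - target) ** 2 * mask
    return per_elem.sum() / (mask.sum() * pred.size(-1))


# One training step on the padded batch.
optimizer.zero_grad()
loss = masked_mse(model(x), y, mask)
loss.backward()
optimizer.step()
```

Because the padded rows are multiplied by zero before the reduction, they add nothing to the loss or to any gradient, and dividing by the number of real elements keeps the loss scale independent of how much padding a batch contains.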