I have data batches of shape (100, 20, 768) that I pass through a two-layer neural network. For each batch I also have a mask of shape (100, 20, 1) that tells which of the samples are relevant for training. How can I stop the model from updating its weights for the samples where the mask is 0?
The reason I am doing this is that the original samples have a varying number of rows. I stacked zero tensors underneath to bring every sample to the same number of rows so I can support batching, because evaluation took a lot of time when done sample by sample.
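For concreteness, here is roughly what my padding looks like (a PyTorch sketch; the shapes match my real data but the tensor names are made up):

```python
import torch

# Each sample has a varying number of rows, all with 768 features.
samples = [torch.randn(n, 768) for n in (7, 20, 13)]  # toy stand-ins

max_rows = 20
padded, masks = [], []
for s in samples:
    pad = max_rows - s.shape[0]
    # Stack zero rows underneath so every sample has max_rows rows.
    padded.append(torch.cat([s, torch.zeros(pad, 768)]))
    # Mask is 1 for real rows, 0 for the padded ones.
    masks.append(torch.cat([torch.ones(s.shape[0], 1),
                            torch.zeros(pad, 1)]))

batch = torch.stack(padded)  # (batch_size, 20, 768)
mask = torch.stack(masks)    # (batch_size, 20, 1)
```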
Or maybe there is a better approach for handling inputs with a variable number of rows?
Quick fix: why don't you just ignore the inputs with mask 0 if you aren't training on them?
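A minimal sketch of what I mean, assuming a regression-style objective (`criterion`, `pred`, and `target` are placeholder names): compute the per-element loss, multiply it by the mask, then reduce over the real elements only.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss(reduction="none")  # keep the per-element losses

def masked_loss(pred, target, mask):
    # pred, target: (batch, 20, 768); mask: (batch, 20, 1)
    elementwise = criterion(pred, target)  # per-element loss
    elementwise = elementwise * mask       # zero out the padded rows
    # Average over the real (unmasked) elements only.
    return elementwise.sum() / (mask.sum() * pred.shape[-1])
```

Since the padded rows are multiplied by 0 before the reduction, they contribute nothing to the gradient, so the weights are never updated on their account.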
Are you using a plain ANN or something else? With a CNN, you can use adaptive average pooling, which converts variable-length input into a fixed-length output.
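A sketch of the pooling idea, assuming you treat the rows as the length dimension (PyTorch's `nn.AdaptiveAvgPool1d` pools over the last dimension, so the 768 features go in the channel slot):

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(output_size=10)  # always 10 output positions

x = torch.randn(1, 768, 57)  # (batch, features as channels, variable rows)
y = pool(x)                  # (1, 768, 10), regardless of the input length
```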
You can also split your input into chunks, e.g. with chunk length 20 and slide length 10, as sketched below. This works well for CNNs and LSTMs; I'm not sure about plain ANNs.
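For example, with `Tensor.unfold` (a sketch; rows past the last full window are dropped, so pad first if you need them):

```python
import torch

x = torch.randn(57, 768)          # one sample with 57 rows
chunks = x.unfold(0, 20, 10)      # chunk length 20, slide 10 -> (4, 768, 20)
chunks = chunks.permute(0, 2, 1)  # (num_chunks, 20, 768), ready to batch
```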
That way the samples will have a varying number of rows, and I cannot create batches. I would like to keep batching, since it increases the speed by two orders of magnitude.
Is it possible to set the gradients of some specific rows in the layer matrix to 0? Say the gradient then becomes [0, 0, 0, 0, 0, 0.5, 0.04] for a 7x1 layer. I'm trying to use this to mask the embedding layers of a language model.
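One way to do this in PyTorch is a gradient hook on the parameter; a minimal sketch, assuming you already know which rows to freeze (the row indices here are made up):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(7, 1)                # the 7x1 "layer matrix"
frozen = torch.tensor([0, 1, 2, 3, 4])  # rows whose gradient should be 0

grad_mask = torch.ones_like(emb.weight)
grad_mask[frozen] = 0.0

# The hook runs on every backward pass and zeroes the selected rows,
# so the optimizer never receives a gradient for them.
emb.weight.register_hook(lambda grad: grad * grad_mask)
```

One caveat: an optimizer with weight decay can still move the masked rows even when their gradient is zero, so you may want to exclude those parameters from decay.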