Forcing loss to 0 for padding characters?

I’m training a sequence model where it’s much more efficient to pass in multiple items at once, à la backprop through time. I’ve got a special padding character that I’m using to split the sequences up, which triggers resets within the network. But when I looked at how the embedding layer handles padding characters, I think it only prevents the padding embedding itself from changing, not the rest of the network. I’d like the network to be left entirely untouched for padding steps, as these don’t occur in my live data.
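(For reference, here’s a minimal sketch of the behavior I mean, assuming a PyTorch setup with `nn.Embedding` and its `padding_idx` argument: only the padding row’s gradient is pinned to zero, while the layers downstream still receive gradients.)

```python
import torch
import torch.nn as nn

# Assumed setup: index 0 is the padding character.
emb = nn.Embedding(10, 4, padding_idx=0)
linear = nn.Linear(4, 1)

tokens = torch.tensor([0, 3, 0, 7])  # contains padding positions
loss = linear(emb(tokens)).sum()
loss.backward()

print(emb.weight.grad[0])   # all zeros: the padding embedding never updates
print(linear.weight.grad)   # nonzero: the rest of the network still updates
```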

I’m wondering if I can create a simple layer whose forward pass is a passthrough that sets a flag when it encounters a padding character, and whose backward pass is either a passthrough or a zeroing of the gradient, depending on that flag. The chain rule should take care of the rest, preventing the entire network from updating for those steps. See the sketch below for what I have in mind.
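Here’s a rough sketch of that layer as a custom `torch.autograd.Function`, assuming padding is marked by a hypothetical `padding_idx` in the token ids; the mask is built in the forward and applied to the gradient in the backward:

```python
import torch

class MaskGrad(torch.autograd.Function):
    """Identity in the forward pass; zeroes the incoming gradient at
    positions flagged as padding, so nothing upstream of this layer
    receives a gradient contribution from those steps."""

    @staticmethod
    def forward(ctx, x, keep_mask):
        # keep_mask: 1.0 for real tokens, 0.0 for padding positions,
        # broadcastable against x (e.g. shape [batch, seq_len, 1]).
        ctx.save_for_backward(keep_mask)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (keep_mask,) = ctx.saved_tensors
        # Pass the gradient through for real tokens, zero it for padding.
        # Second return value is the (non-existent) gradient for keep_mask.
        return grad_output * keep_mask, None


def mask_grad(x, tokens, padding_idx=0):
    # Build the flag from the token ids; padding positions get 0.
    keep = (tokens != padding_idx).to(x.dtype).unsqueeze(-1)
    return MaskGrad.apply(x, keep)
```

Since the zeroed gradient propagates through the chain rule, this should have the same effect as masking the per-position loss terms to zero before summing, which is another way to get the same result.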

Has this been done before? I don’t see why it wouldn’t work, but I’ve never seen this kind of layer used anywhere.