Padding for variable-length sequences and a batch norm layer

Hello,

I want to apply batch norm over zero-padded variable-length sequences, and I am curious how to do it properly so that the padding zeros are not taken into account when calculating the mean/std. Or is the only way to solve this to implement a batch norm layer myself?
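For reference, here is a minimal sketch of what "ignoring the padding" means numerically, written in plain NumPy rather than any particular framework's layer API. It assumes inputs of shape `(batch, time, features)`, right-padded with zeros, plus a `lengths` array giving each sequence's true length; the function name `masked_batch_norm` is hypothetical. A boolean mask built from `lengths` restricts the per-feature mean and variance to real time steps only.

```python
import numpy as np


def masked_batch_norm(x, lengths, gamma=None, beta=None, eps=1e-5):
    """Hypothetical helper: batch norm over (batch, time, features) input
    where statistics are computed only over non-padded time steps.

    Assumes sequences are right-padded and lengths[b] is the true length
    of sequence b. Returns (normalized x, per-feature mean, per-feature var).
    """
    x = np.asarray(x, dtype=np.float64)
    batch, time, features = x.shape
    # mask[b, t] is True for real time steps, False for padding
    mask = np.arange(time)[None, :] < np.asarray(lengths)[:, None]
    m = mask[:, :, None].astype(x.dtype)  # shape (batch, time, 1)
    count = m.sum()  # number of valid (b, t) positions

    # Per-feature mean and biased variance over valid positions only,
    # matching what batch norm uses for normalization during training
    mean = (x * m).sum(axis=(0, 1)) / count
    var = (((x - mean) ** 2) * m).sum(axis=(0, 1)) / count

    y = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:
        y = y * gamma  # optional learnable scale, shape (features,)
    if beta is not None:
        y = y + beta  # optional learnable shift, shape (features,)
    # Zero out padded positions again so padding stays padding
    return y * m, mean, var


# Usage: two sequences of lengths 3 and 2, one feature
x = np.zeros((2, 3, 1))
x[0, :, 0] = [1.0, 2.0, 3.0]
x[1, :2, 0] = [4.0, 5.0]
y, mean, var = masked_batch_norm(x, [3, 2])
print(mean, var)  # statistics of [1, 2, 3, 4, 5] only
```

Here the mean is 3.0 and the variance is 2.0, computed from the five real values; an unmasked mean over all six positions would be 2.5 because the padding zero drags it down. This sketch covers only the batch-statistics (training-time) path: a full layer would also keep running averages of `mean` and `var` for inference. An equivalent approach is to gather only the valid positions into a 2-D `(num_valid, features)` array, normalize that with an ordinary batch norm, and scatter the result back into the padded tensor.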

Thank you in advance!