I have variable-length input for a neural network (e.g., a CNN or MLP). The tensor has shape [b, n, d], where b is the batch size and [n, d] is the feature matrix for one sample, but only part of each sample is valid. For example, with n=3 and d=1, a sample may look like [[1], [0], [0]], [[1], [1], [0]], or [[1], [1], [1]], where 0 denotes padding.

How can I process this with a CNN or MLP? (I can flatten the tensor to 1D.) I think I need some operation like masking?

One difference between the two approaches is that pooling always reduces the dimensions, whereas filling with 0s to make all inputs the same size increases them.
So if we add a pooling layer in the getitem function, we get the same general advantages as adding a pooling layer inside the model (e.g., fewer trainable parameters in the layers that follow, a more robust model), whereas padding with 0s has no such effect.
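
To make the contrast concrete, here is a minimal sketch in NumPy. It is an illustration under my own assumptions, not a definitive implementation. The function names (`pad_with_mask`, `masked_mean_pool`, `adaptive_avg_pool`) are hypothetical, NumPy stands in for whatever framework you actually use, and the floor/ceil binning rule in the pooling function is just one reasonable choice. It shows zero padding with a validity mask, a masked mean that ignores the padded rows, and pooling each sample to a fixed length as one might do per sample inside a getitem function.

```python
import numpy as np


def pad_with_mask(sample, n_max):
    """Zero-pad a [n_valid, d] sample to [n_max, d]; also return a [n_max] validity mask."""
    n_valid, d = sample.shape
    padded = np.zeros((n_max, d), dtype=sample.dtype)
    padded[:n_valid] = sample
    mask = np.arange(n_max) < n_valid  # True for real rows, False for padding
    return padded, mask


def masked_mean_pool(batch, mask):
    """Average over valid rows only. batch: [b, n, d], mask: [b, n] -> [b, d]."""
    weights = mask[:, :, None].astype(batch.dtype)  # [b, n, 1]
    total = (batch * weights).sum(axis=1)           # padded rows contribute 0
    count = np.maximum(weights.sum(axis=1), 1.0)    # guard against empty samples
    return total / count


def adaptive_avg_pool(sample, n_out):
    """Pool a [n_valid, d] sample (n_valid >= 1) to a fixed [n_out, d] by bin averaging."""
    n_valid = sample.shape[0]
    rows = []
    for i in range(n_out):
        start = (i * n_valid) // n_out              # floor of bin start
        end = -((-(i + 1) * n_valid) // n_out)      # ceil of bin end, so no bin is empty
        rows.append(sample[start:end].mean(axis=0))
    return np.stack(rows)


# Three samples with 1, 2, and 3 valid rows (d = 1), as in the question.
samples = [np.ones((1, 1)), np.ones((2, 1)), np.ones((3, 1))]

padded, masks = zip(*(pad_with_mask(s, 3) for s in samples))
batch = np.stack(padded)  # shape [3, 3, 1]
mask = np.stack(masks)    # shape [3, 3]

pooled = masked_mean_pool(batch, mask)  # padding is ignored
plain = batch.mean(axis=1)              # padding drags the mean toward 0
fixed = np.stack([adaptive_avg_pool(s, 2) for s in samples])  # shape [3, 2, 1]
```

With padding, the network sees the full [n, d] input and needs the mask to avoid treating the 0s as data. With per-sample pooling, every sample already has the same, smaller shape before it reaches the model, so no mask is needed downstream.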