I have videos of variable length, where each frame is represented by a 1000-dimensional feature vector that I compute. With a batch size of 16, my tensors have shape (16, 1000, LEN), where LEN is the length of the longest video in the batch.
If I instantiate a batch norm that treats the 1000 feature dimensions as channels:
self.batch_norm = nn.BatchNorm1d(1000)
and call self.batch_norm(tensor) in the forward pass, my training loss goes down, but my validation loss stays stagnant, so I get 0% accuracy on my task.
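For reference, a minimal sketch of this setup (the shapes are taken from the description above, the data is random): in training mode, BatchNorm1d on an (N, C, L) tensor normalizes each of the C channels using every time step of every video in the batch, so padded frames contribute both to the batch statistics and to the running mean/variance that are used at validation time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, LEN = 16, 1000, 20  # batch, feature dims (channels), longest video
bn = nn.BatchNorm1d(C)

x = torch.randn(N, C, LEN)
y = bn(x)  # train mode: each channel normalized over all N * LEN positions

# After normalization (gamma=1, beta=0 at init), each channel's mean over
# the batch AND time axes is ~0 -- padded positions are included in this.
per_channel_mean = y.mean(dim=(0, 2))
```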
Without the batch norm, my model trains well: the validation loss goes down, and accuracy is around 65%.
Perhaps the problem stems from the padded sequences: if I batch a 10-frame video with a 20-frame video, I pad the shorter one with 10 frames of zeros.
To check this, I tried padding with the last real frame of each video instead of zeros, so every padded frame is a valid frame. However, the results are the same.
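One way to test the padding hypothesis more directly is to exclude the padded positions from the statistics entirely. Below is a minimal sketch of a masked batch norm; `masked_batch_norm` and its `lengths` argument (number of valid frames per video) are hypothetical names, and the sketch omits affine parameters and running statistics for brevity.

```python
import torch

def masked_batch_norm(x, lengths, eps=1e-5):
    """Normalize each channel using only the valid (unpadded) frames.

    x: (N, C, L) padded batch of per-frame feature vectors.
    lengths: (N,) number of valid frames in each video.
    """
    N, C, L = x.shape
    # mask[n, 0, t] is 1.0 where frame t of video n is real, 0.0 where padded
    mask = (torch.arange(L)[None, :] < lengths[:, None]).float().unsqueeze(1)
    n_valid = mask.sum()  # total number of real frames in the batch

    # Per-channel mean/variance over valid positions only
    mean = (x * mask).sum(dim=(0, 2), keepdim=True) / n_valid
    var = (((x - mean) * mask) ** 2).sum(dim=(0, 2), keepdim=True) / n_valid

    x_norm = (x - mean) / torch.sqrt(var + eps)
    return x_norm * mask  # keep padded positions at zero
```

If accuracy recovers with statistics computed this way, that would confirm the padding frames were corrupting the batch-norm statistics.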