How to use binary cross entropy with logits with a binary target and a 3D output

Below is the link to the author’s I3D network. In their case they do frame-wise multi-label classification.
GitHub - piergiaj/pytorch-i3d (I3D network for the Charades dataset)

I’m using Visual-Tactile dataset and

I3D was designed for the Kinetics dataset, and I haven’t changed the default architecture in the file “pytorch_i3d.py” from the link above.

I’m also new to this. According to the author, the network takes 64 input frames, so I have converted each video to 64 frames. I do not need to do multi-label classification.

I probably need to change the final layer, since I don’t want multi-label classification.

I still have the dimension issue: the target and the output clearly do not have the same shape, and the loss function expects them to match.
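For reference, here is a minimal sketch of two ways the shapes could be made to agree. It assumes the network returns per-frame logits of shape (batch, classes, time) and that there is one binary label per video. The tensor sizes, the random stand-in for the network output, and all variable names are made up for illustration and are not taken from the linked repository.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 4 videos, 1 logit channel, 7 temporal steps.
batch_size, num_classes, num_steps = 4, 1, 7

# Random stand-in for the network output (assumed shape: batch, classes, time).
per_frame_logits = torch.randn(batch_size, num_classes, num_steps)

# One binary label per video, shape (batch,). The loss expects float targets.
video_labels = torch.tensor([0, 1, 1, 0], dtype=torch.float32)

criterion = nn.BCEWithLogitsLoss()

# Option 1: average the logits over time to get one logit per video,
# then drop the singleton class dimension so the shape is (batch,).
video_logits = per_frame_logits.mean(dim=2).squeeze(1)
loss_pooled = criterion(video_logits, video_labels)

# Option 2: keep the per-frame output and repeat each video label
# for every frame, so the target has shape (batch, classes, time).
frame_labels = video_labels.view(batch_size, 1, 1).expand(-1, num_classes, num_steps)
loss_per_frame = criterion(per_frame_logits, frame_labels)

print(video_logits.shape, loss_pooled.item())
print(frame_labels.shape, loss_per_frame.item())
```

Either way, the key point is that `nn.BCEWithLogitsLoss` needs the input and the target to have the same shape and the target to be a float tensor. Whether to pool over time (mean or max) or to supervise every frame is a modelling choice that depends on the task.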