I have a batched audio signal of shape [N, C, L] (e.g. N=4, C=4, L=2048). I want to apply a sliding window with overlap to it and later reconstruct the original shape using overlap-add.
Example:
Input Shape → [4, 4, 2048]
After sliding window with window size 256 and overlap 128 (hop size 128) → [4, 4, 15, 256]
Overlap and Add → [4, 4, 2048]
Using Tensor.unfold() works well for applying the sliding window, but I haven’t found a way to overlap and add the frames again afterwards.
I also played around with nn.Fold and nn.Unfold, but if I understand correctly, those transformations change the channel dimension, which I don’t want.
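To make the goal concrete, here is a minimal sketch of the framing step with Tensor.unfold() followed by a manual overlap-add loop. The helper names `frame` and `overlap_add` are made up for illustration. Note that plain overlap-add sums the overlapping samples, so the sketch divides by the per-sample overlap count to get the original values back; with a suitable analysis window that normalisation might not be needed.

```python
import torch

def frame(x, win=256, hop=128):
    # [N, C, L] -> [N, C, n_frames, win]; only the last dimension is windowed
    return x.unfold(-1, win, hop)

def overlap_add(frames, hop=128):
    # frames: [N, C, n_frames, win] -> [N, C, (n_frames - 1) * hop + win]
    n, c, n_frames, win = frames.shape
    length = (n_frames - 1) * hop + win
    out = frames.new_zeros(n, c, length)
    for i in range(n_frames):
        # add frame i at its original offset; overlapping samples accumulate
        out[..., i * hop : i * hop + win] += frames[:, :, i]
    return out

x = torch.randn(4, 4, 2048)
frames = frame(x)                              # [4, 4, 15, 256]
y = overlap_add(frames)                        # [4, 4, 2048], overlaps summed
counts = overlap_add(torch.ones_like(frames))  # how many frames cover each sample
x_rec = y / counts                             # undo the summation
```

The loop runs once per frame (15 iterations here), so it is fine for short signals but may be slow for very long ones.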
Thank you for your reply! When using nn.Fold, the channel dimension of the output is the input channel dimension divided by the product of kernel_size. Since I only have 4 channels, I can only use very small kernels, so I don’t see how I could get, for example, a window size of 256 with an overlap of 128.
I want the channels to stay the same and only “cut” along the signal length.
This snippet does exactly what I want but I haven’t found a way to achieve the same behavior with nn.Fold/Unfold.
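One possible way to get this behavior from fold, sketched under the assumption that the frames come from Tensor.unfold() as above: merge N and C into the batch dimension so that fold sees a single channel, and let the window axis take the place of C × kernel_size. The kernel is then (1, win) on a 1 × L "image". The function name `overlap_add_fold` is hypothetical.

```python
import torch
import torch.nn.functional as F

def overlap_add_fold(frames, hop=128):
    # frames: [N, C, n_frames, win]
    n, c, n_frames, win = frames.shape
    length = (n_frames - 1) * hop + win
    # fold expects [batch, channels * prod(kernel_size), n_blocks];
    # with batch = N * C and one channel this becomes [N*C, win, n_frames]
    cols = frames.reshape(n * c, n_frames, win).transpose(1, 2).contiguous()
    out = F.fold(cols, output_size=(1, length),
                 kernel_size=(1, win), stride=(1, hop))  # [N*C, 1, 1, length]
    return out.reshape(n, c, length)

x = torch.randn(4, 4, 2048)
frames = x.unfold(-1, 256, 128)                     # [4, 4, 15, 256]
y = overlap_add_fold(frames)                        # overlaps are summed
counts = overlap_add_fold(torch.ones_like(frames))  # frames covering each sample
x_rec = y / counts                                  # [4, 4, 2048]
```

Because the channels are moved into the batch dimension before folding, the kernel size is no longer limited by the channel count.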