Applying convolution across batched sequences


I have batched and sequenced 2d data, meaning I am working with tensors of size (batch_size, max_seq_len, d1, d2). I want to use convolution to reduce it to (batch_size, max_seq_len, new_dim). The layer that does this doesn’t care about sequences, meaning all elements of the sequence are passed through the same weights. The issue is that Conv1d and Conv2d do not accept 4d inputs, so I would have to first reshape them into (batch_size * max_seq_len, d1, d2).
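Here is a minimal sketch of the reshape approach described above. It assumes Conv1d treats d1 as the channel axis and d2 as the length axis. The layer name PerStepConv, the kernel size, and the use of AdaptiveAvgPool1d to collapse the length axis into new_dim features are all hypothetical choices made for illustration. Batch and sequence are folded into one dimension, so every sequence element passes through the same weights, and the sequence dimension is restored afterwards.

```python
import torch
import torch.nn as nn


class PerStepConv(nn.Module):
    """Hypothetical layer: applies the same conv to every sequence element.

    Assumes d1 is used as the Conv1d channel axis and d2 as its length axis;
    the length axis is then pooled away so each element becomes new_dim values.
    """

    def __init__(self, d1: int, new_dim: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(d1, new_dim, kernel_size, padding=kernel_size // 2)
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, max_seq_len, d1, d2)
        b, s, d1, d2 = x.shape
        # Fold batch and sequence into one "sample" dimension: (b * s, d1, d2)
        flat = x.reshape(b * s, d1, d2)
        out = self.pool(self.conv(flat))  # (b * s, new_dim, 1)
        # Restore the sequence dimension: (b, s, new_dim)
        return out.reshape(b, s, -1)


x = torch.randn(4, 600, 8, 16)  # 4 * 600 = 2400 flattened samples
layer = PerStepConv(d1=8, new_dim=32)
y = layer(x)
print(y.shape)  # torch.Size([4, 600, 32])
```

Because the weights are shared across the folded dimension, running a single sequence step alone should give the same result (up to floating-point noise) as slicing it out of the full output.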

My question is: is there a more efficient way to do this, or is this approach OK? My worry is that batch_size * max_seq_len ends up equal to 2400, which seems too big for a batch and might make my computation run too slowly.

I would just try running your computation as-is and check whether the throughput you get is acceptable given, e.g., your hardware’s peak FLOP throughput. I would be more concerned about the small channel dimension (a single input channel) if you intend to interpret your data as (N, 1, d1, d2), since that is a fairly uncommon setup.
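One rough way to follow this advice is a wall-clock throughput check like the sketch below. The function name measure_throughput, the layer sizes (d1 = 8, d2 = 16, 32 output channels, kernel size 3), and the iteration count are all assumptions. The FLOP estimate uses the usual 2 * N * C_out * L_out * C_in * K count for a Conv1d forward pass, which you can compare against your GPU's advertised peak.

```python
import time

import torch
import torch.nn as nn


def measure_throughput(model: nn.Module, x: torch.Tensor, iters: int = 20) -> float:
    """Hypothetical helper: return forward-pass samples/second for model on x.

    Synchronizes CUDA around the timed loop so queued kernels are counted.
    """
    model.eval()
    with torch.no_grad():
        model(x)  # warm-up pass (allocations, kernel selection)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return iters * x.shape[0] / elapsed


device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed sizes: N = batch_size * max_seq_len = 2400, d1 = 8, d2 = 16
n, c_in, length, c_out, k = 2400, 8, 16, 32, 3
x = torch.randn(n, c_in, length, device=device)
conv = nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2).to(device)

samples_per_s = measure_throughput(conv, x)
# Multiply-adds counted as 2 FLOPs; padding keeps L_out == length here
flops_per_sample = 2 * c_out * length * c_in * k
print(f"{samples_per_s:.0f} samples/s, "
      f"~{samples_per_s * flops_per_sample / 1e9:.2f} GFLOP/s")
```

If the achieved GFLOP/s is a small fraction of the hardware peak, the layer is likely memory- or launch-bound rather than limited by the 2400-sample batch itself.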

There is no (N, 1, d1, d2). I squeeze it to (N, d1, d2), where N = batch_size * max_seq_len. (N, 1, d1, d2) is impossible because, as I said, Conv1d/Conv2d don’t accept 4d input, which is why I have to squeeze in the first place.
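For reference, the PyTorch docs list the batched input shape of nn.Conv1d as (N, C_in, L) and that of nn.Conv2d as (N, C_in, H, W). Folding batch and sequence into N therefore works with either layout. Passing the original (batch_size, max_seq_len, d1, d2) tensor straight into nn.Conv2d would be accepted, but max_seq_len would be read as the channel axis and sequence elements would get mixed, which is why the fold is needed. A minimal sketch comparing the two layouts, with hypothetical sizes and output channel count:

```python
import torch
import torch.nn as nn

b, s, d1, d2 = 4, 600, 8, 16  # hypothetical sizes; N = b * s = 2400
x = torch.randn(b, s, d1, d2)
n = b * s

# Conv1d: d1 as the channel axis, d2 as the length axis -> input (N, C_in, L)
conv1d = nn.Conv1d(in_channels=d1, out_channels=32, kernel_size=3, padding=1)
out1 = conv1d(x.reshape(n, d1, d2))
print(out1.shape)  # torch.Size([2400, 32, 16])

# Conv2d: one input channel over a d1 x d2 grid -> input (N, 1, H, W)
conv2d = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
out2 = conv2d(x.reshape(n, 1, d1, d2))
print(out2.shape)  # torch.Size([2400, 32, 8, 16])
```

The Conv1d route convolves only along d2 and mixes all d1 rows at each step. The Conv2d route slides a kernel over both d1 and d2 but starts from a single input channel, which is the setup the reply above calls uncommon.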