Parallelize Conv1D on text

I have an input of size [B, D, T*L], where B is the batch size, D is the embedding dimension, T is the size of the label space, and L is the length of the input per label. This means that each label has its own word-embedding sequence.
I also have a Conv1D layer with in_channels=D, kernel_size=9, and 64 filters (feature maps).
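
For concreteness, here is a minimal sketch of this setup in PyTorch; the concrete values of B, D, T, and L are placeholders I picked for illustration, not from the question:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
B, D, T, L = 8, 128, 10, 50   # batch, embedding dim, label count, length per label

x = torch.randn(B, D, T * L)  # input: [B, D, T*L], labels laid out side by side
conv = nn.Conv1d(in_channels=D, out_channels=64, kernel_size=9)
```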

I now want to run Conv1D on each label's embedding with the same Conv1D layer. To avoid the kernel straddling two adjacent embedding matrices, one naive way is to use a for loop over the T targets, as sketched below. Is there a faster way to do this? (For example, copying the Conv1D layer T times?)
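
A sketch of the naive loop I have in mind, continuing the setup above (output shapes assume no padding, so each segment of length L yields L - 8 positions with kernel_size=9):

```python
# Naive approach: slice out each label's [B, D, L] segment and convolve it
# separately, so the kernel never crosses the boundary between two labels.
outs = []
for t in range(T):
    seg = x[:, :, t * L : (t + 1) * L]  # segment for label t: [B, D, L]
    outs.append(conv(seg))              # [B, 64, L - 8]
out = torch.stack(outs, dim=1)          # [B, T, 64, L - 8]
```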