How can I run a CNN encoder on tensors of different shapes?

I want to have a CNN encoder for my texts.
Here is what I have now:

import torch
from torch import nn

class CNNEncoder(nn.Module):
    def __init__(self, embed_dim: int, vocab_size: int):
        super(CNNEncoder, self).__init__()

        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv1 = nn.Conv1d(embed_dim, 120, 1)
        self.pool = nn.AvgPool1d(1, 1)

    def forward(self, x):
        x = self.embed(x)
        if len(x.shape) == 2:
            x = x.unsqueeze_(1).permute([0, 2, 1])
        x = self.conv1(x)
        x = torch.tanh(x)
        x = self.pool(x)
        return x

When I pass it a tensor of size [120] it works fine.
But I need to encode an N×K matrix of sentences, which has size [5, 20, 120].
How can I deal with that?
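
For reference, this is how I currently call it for a single sentence (embed_dim=300 here; vocab_size=10000 is just a placeholder):

encoder = CNNEncoder(embed_dim=300, vocab_size=10000)
sentence = torch.randint(0, 10000, (120,))   # 120 token indices
print(encoder(sentence).shape)               # torch.Size([120, 120, 1])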

nn.Embedding will output a tensor in the shape [*, H], where * is the input shape and H the embedding_dim.
Given your input shape, you’ll end up with a 4-dimensional tensor.
Could you explain a bit in which dimension the conv layer should operate?
I.e., should the filters convolve over the “sentence dimension” or the “word dimension”?
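
For example (arbitrary sizes, just to show the dimensions):

import torch
from torch import nn

embed = nn.Embedding(num_embeddings=1000, embedding_dim=32)   # arbitrary sizes
x = torch.randint(0, 1000, (5, 20, 120))                      # [K, N, sentence_length]
print(embed(x).shape)   # torch.Size([5, 20, 120, 32]) -> 4-dimensional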

If we talk about one sentence, i.e. x.shape = [batch_size, sentence_length], then after embedding I get x.shape = [batch_size, sentence_length, embedding_dim].
I want to convolve over the words in this sentence.
So, as one of the PyTorch CNN tutorials says, I need to permute and end up with [batch_size, embedding_dim, sentence_length] before the convolution.
After the convolution I'm getting a strangely shaped tensor [1, 100, 118]. 1 is the batch_size in my current setting, but I don't get where the other dims came from, as I have embedding_dim=300 and sentence_length=120.
It seems like I'm using it wrong. I know what a convolutional layer is, as I have worked with it in TF, but the torch API seems hard to me.
Correct me if I'm using it wrong, please.
Also, here's my updated code.

class CNNEncoder(nn.Module):
    def __init__(self, embed_dim: int, vocab_size: int):
        super(CNNEncoder, self).__init__()

        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv1 = nn.Conv1d(embed_dim, 100, 3)
        self.pool = nn.AvgPool1d(1, 1)

    def forward(self, x):
        x.unsqueeze_(0)
        x = self.embed(x)
        x = x.permute(0, 2, 1)
        x = self.conv1(x)
        x = torch.tanh(x)
        x = self.pool(x)
        x.unsqueeze_(1)
        return x

P.S.
The main question is actually about using the same model for a sentence (shape [batch_size, sentence_length]) and for a matrix of sentences (shape [K, N, sentence_length]).
But I think I first need to figure out whether I'm using this model correctly just for a sentence, and then deal with the matrix question.

Actually, I think I solved the main question by doing this:

        encoded_y_query = []

        for classs in y_query.split(1):
            encoded_sents = []
            for sentence in classs.squeeze_().split(1):
                encoded_sents.append(self.cnn_encoder.forward(sentence.squeeze_()))
            encoded_y_query.append(encoded_sents)
        y_tensors = []
        for i in range(y_query.shape[0]):
            temp_tens = []
            for j in range(y_query.shape[1]):
                temp_tens.append(torch.tensor(encoded_y_query[i][j]))

            y_tensors.append(torch.cat(temp_tens))

        y_query = torch.cat(y_tensors)

This code does a forward pass for each sentence in the [K, N, sentence_length] tensor.
It still needs a little fine-tuning due to batch_size, but I'm sure I'll handle that.
But I would still like to know if my CNN works the way I want it to (described in the previous post).
I'd be really glad if you could help me with that.

You'll get this output since you are passing an input of [batch_size=5, emb_dim=300, seq_len=120] to your conv layer, which will output [5, 100, 118].
dim1=100 is defined by the number of filters (out_channels), while the sequence dimension shrinks because no padding is used (120 - 3 + 1 = 118).
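
A quick check with random data (same sizes as in your post):

import torch
from torch import nn

conv = nn.Conv1d(in_channels=300, out_channels=100, kernel_size=3)
x = torch.randn(5, 300, 120)      # [batch_size, emb_dim, seq_len]
print(conv(x).shape)              # torch.Size([5, 100, 118])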

I'm not completely sure what your last code snippet does.
Are you splitting the input into sentences and passing each sentence separately?

Yes, I split the [4, 5, 120] tensor and pass each sentence to the encoder.
Here's the final version of the correct encoder:

import torch
from torch import nn
import torch.nn.functional as F

class CNNEncoder(nn.Module):
    def __init__(self, embed_dim: int, vocab_size: int):
        super(CNNEncoder, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv1 = nn.Conv2d(1, 100, (2, embed_dim))

    def forward(self, x):
        x = self.embed(x)
        x = x.unsqueeze(0)
        x = x.unsqueeze(0)
        x = self.conv1(x)
        x = x.squeeze(3)
        x = torch.tanh(x)
        x = F.avg_pool1d(x, x.size()[2]).squeeze(2)
        return x
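
On a single sentence this now gives one 100-dimensional vector, e.g. (vocab_size=10000 is just a placeholder):

encoder = CNNEncoder(embed_dim=300, vocab_size=10000)
sentence = torch.randint(0, 10000, (120,))   # [sentence_length]
print(encoder(sentence).shape)               # torch.Size([1, 100])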

I'm not sure if this method of splitting the tensor is OK.
Here it is (latest update):

encoded_support_set = []
for class_row in support_set.split(1):
    encoded_sents = []
    for sentence in class_row.squeeze_().split(1):
        encoded_sents.append(self.cnn_encoder.forward(sentence.squeeze_()))
    encoded_support_set.append(encoded_sents)

class_tensors = []
for i in range(support_set.shape[0]):
    sentence_tensors = []
    for j in range(support_set.shape[1]):
        sentence_tensors.append(encoded_support_set[i][j].clone().detach().requires_grad_(True))
    class_tensors.append(torch.cat(sentence_tensors))

support_set = torch.stack(class_tensors)
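
For comparison, here is a rough sketch of how the whole [K, N, sentence_length] tensor could be encoded in one pass by flattening the leading dimensions instead of looping over sentences in Python. This assumes a forward that accepts a batch of sentences, i.e. a [batch_size, sentence_length] input; the sizes are placeholders:

import torch
from torch import nn
import torch.nn.functional as F

class BatchedCNNEncoder(nn.Module):
    # Batched variant of the encoder above: forward expects [batch_size, sentence_length].
    def __init__(self, embed_dim: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv1 = nn.Conv2d(1, 100, (2, embed_dim))

    def forward(self, x):                          # x: [batch_size, sentence_length]
        x = self.embed(x)                          # [batch_size, sentence_length, embed_dim]
        x = x.unsqueeze(1)                         # [batch_size, 1, sentence_length, embed_dim]
        x = self.conv1(x).squeeze(3)               # [batch_size, 100, sentence_length - 1]
        x = torch.tanh(x)
        x = F.avg_pool1d(x, x.size(2)).squeeze(2)  # [batch_size, 100]
        return x

# support_set: [K, N, sentence_length] of token indices (placeholder sizes)
K, N, sentence_length = 4, 5, 120
support_set = torch.randint(0, 10000, (K, N, sentence_length))
encoder = BatchedCNNEncoder(embed_dim=300, vocab_size=10000)

flat = support_set.view(K * N, sentence_length)   # merge class and sentence dims into one batch
encoded = encoder(flat)                           # [K * N, 100]
encoded = encoded.view(K, N, -1)                  # [K, N, 100]
print(encoded.shape)                              # torch.Size([4, 5, 100])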