Conv1d layer for text classification

Nouf · July 23, 2023, 1:46pm

Hello everyone,
I have a question regarding Conv1d in torch.
The simple model below, used for text classification, has a ModuleList containing three Conv1d layers (each one dedicated to a specific filter size):

import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, num_classes):
        super(TextClassifier, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # One Conv1d per filter size, each producing 100 feature maps
        self.conv_layers = nn.ModuleList([
            nn.Conv1d(embedding_dim, 100, kernel_size=2),
            nn.Conv1d(embedding_dim, 100, kernel_size=3),
            nn.Conv1d(embedding_dim, 100, kernel_size=4)
        ])
        # 3 conv layers x 100 filters = 300 input features
        self.fc = nn.Linear(300, num_classes)

My question is: is it necessary to use a ModuleList with a separate Conv1d layer for each filter size? In other words, would it work to have just one Conv1d layer that covers the different filter sizes (2, 3, 4)?

J_Johnson · July 23, 2023, 2:03pm

A ModuleList is just one way of organizing layers and is especially useful if you plan to use a for loop. What you do with those layers is determined in the forward pass (e.g. apply them sequentially, or have them process the same input in parallel).

Suppose you wanted 100 Conv1d layers with kernel sizes from 1 to 100, each taking the same embedded text input. You could do:

self.kernels = nn.ModuleList()
for i in range(100):
    self.kernels.append(nn.Sequential(
        nn.Conv1d(embedding_dim, 100, kernel_size=i + 1),
        nn.BatchNorm1d(100),
        nn.ReLU(),
    ))
...

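# in the forward pass, with x as input of shape (batch_size, embedding_dim, seq_len)
contexts = []
for kernel in self.kernels:
    # every branch sees the same input; output length is seq_len - kernel_size + 1
    y = kernel(x)
    contexts.append(y)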
...

An individual Conv1d layer can only have one kernel_size.
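
To make the parallel-branch idea concrete, here is a minimal sketch with only three branches instead of 100. The sizes (embedding_dim=16, seq_len=20, batch_size=4) are assumptions picked for illustration and are not taken from the posts above. With the default stride and no padding, each branch shortens the sequence to seq_len - kernel_size + 1, so the outputs have different lengths:

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from the thread)
embedding_dim, seq_len, batch_size = 16, 20, 4
kernel_sizes = [2, 3, 4]

# One Conv1d per kernel size; each layer takes a single kernel_size value
branches = nn.ModuleList(
    [nn.Conv1d(embedding_dim, 100, kernel_size=k) for k in kernel_sizes]
)

# Conv1d expects input of shape (batch, channels, length)
x = torch.randn(batch_size, embedding_dim, seq_len)

# Every branch processes the same input independently
out_shapes = [tuple(branch(x).shape) for branch in branches]
print(out_shapes)  # [(4, 100, 19), (4, 100, 18), (4, 100, 17)]
```

Because the output lengths differ, the branch outputs cannot be concatenated along the channel dimension directly; they first need pooling over the length dimension or padding that equalizes the lengths.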

Nouf · July 23, 2023, 2:49pm

Thank you @J_Johnson for your reply.
This is the forward function:

def forward(self, x):
    embedded = self.embedding(x)  # shape: (batch_size, seq_len, embedding_dim)
    
    conv_outputs = []
    for conv in self.conv_layers:
        conv_out = torch.relu(conv(embedded.transpose(1, 2)))  # shape: (batch_size, num_filters_i, seq_len - kernel_size + 1)
        pooled_out = torch.max_pool1d(conv_out, conv_out.size(2)).squeeze(2)  # shape: (batch_size, num_filters_i)
        conv_outputs.append(pooled_out)
    
    concat_output = torch.cat(conv_outputs, dim=1)  # shape: (batch_size, sum(num_filters))
    
    logits = self.fc(concat_output)  # shape: (batch_size, num_classes)
    
    return logits

If I understand your answer correctly, we cannot have a single Conv1d layer containing different kernel sizes. Is that right?
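
For anyone who wants to run the model end to end, the constructor and forward function from this thread can be assembled into one self-contained sketch. The vocabulary size, embedding size, class count, and batch shape below are assumed values used only for the shape check:

```python
import torch
import torch.nn as nn


class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.conv_layers = nn.ModuleList(
            [nn.Conv1d(embedding_dim, 100, kernel_size=k) for k in (2, 3, 4)]
        )
        self.fc = nn.Linear(300, num_classes)

    def forward(self, x):
        # (batch, seq_len, emb) -> (batch, emb, seq_len), as Conv1d expects
        embedded = self.embedding(x).transpose(1, 2)
        pooled = []
        for conv in self.conv_layers:
            conv_out = torch.relu(conv(embedded))
            # Max over the whole length dimension: (batch, 100)
            pooled.append(torch.max_pool1d(conv_out, conv_out.size(2)).squeeze(2))
        return self.fc(torch.cat(pooled, dim=1))


# Hypothetical sizes chosen only for this check
model = TextClassifier(vocab_size=1000, embedding_dim=50, num_classes=5)
tokens = torch.randint(0, 1000, (8, 20))  # batch of 8 sequences, 20 token ids each
logits = model(tokens)
logits_shape = tuple(logits.shape)
print(logits_shape)  # (8, 5)
```

One thing to watch for: without padding, the input sequence must be at least as long as the largest kernel size (4 here), otherwise the convolution raises an error.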

J_Johnson · July 24, 2023, 4:50am

Right. If you want multiple kernel sizes, use a ModuleList to store the convolutional layers, each with the specific kernel size, padding, stride, etc. you want.
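
As a rough illustration of per-layer settings, the sketch below gives each branch its own kernel_size and padding. The `branch_configs` list and all sizes are hypothetical; odd kernel sizes with padding = kernel_size // 2 are chosen here so that every branch keeps the input length, which lets the outputs be concatenated along the channel dimension without pooling first:

```python
import torch
import torch.nn as nn

# Hypothetical per-branch settings (not from the thread)
branch_configs = [
    {"kernel_size": 3, "padding": 1},
    {"kernel_size": 5, "padding": 2},
    {"kernel_size": 7, "padding": 3},
]

# Each Conv1d gets its own kernel_size/padding via keyword arguments
convs = nn.ModuleList([nn.Conv1d(16, 100, **cfg) for cfg in branch_configs])

x = torch.randn(4, 16, 20)  # (batch, channels, length)

# All branches preserve length 20, so channel-wise concatenation works
features = torch.cat([conv(x) for conv in convs], dim=1)
print(tuple(features.shape))  # (4, 300, 20)
```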

Nouf · July 25, 2023, 6:23am

Thank you @J_Johnson

vdw · July 27, 2023, 6:46am

If you want to look at another code example, I have an old implementation of the architecture shown below.

The only difference is that I don’t use 1-max pooling but normal max-pooling. This implementation uses an nn.ModuleDict for a change :).

Please note that the code is rather verbose in order to make the model very configurable, by specifying different conv_kernel_sizes as an input parameter.
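
The linked implementation is not reproduced here. As a separate, minimal sketch of the same idea (an nn.ModuleDict plus a configurable `conv_kernel_sizes` parameter), something like the following could work. The class name, the `num_filters` argument, and the key naming scheme are assumptions, and this sketch uses max-over-time pooling as in the earlier posts rather than the normal max-pooling mentioned above:

```python
import torch
import torch.nn as nn


class ConfigurableTextCNN(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, embedding_dim, num_classes,
                 conv_kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # ModuleDict keys must be strings, so name each branch after its kernel size
        self.convs = nn.ModuleDict({
            f"conv_{k}": nn.Conv1d(embedding_dim, num_filters, kernel_size=k)
            for k in conv_kernel_sizes
        })
        # The classifier input grows with the number of branches
        self.fc = nn.Linear(num_filters * len(conv_kernel_sizes), num_classes)

    def forward(self, x):
        embedded = self.embedding(x).transpose(1, 2)  # (batch, emb, seq_len)
        # Max over the length dimension for each branch: (batch, num_filters)
        pooled = [torch.relu(conv(embedded)).amax(dim=2)
                  for conv in self.convs.values()]
        return self.fc(torch.cat(pooled, dim=1))


# Four kernel sizes instead of three, without touching the class body
model = ConfigurableTextCNN(1000, 50, 5, conv_kernel_sizes=(2, 3, 4, 5))
logits = model(torch.randint(0, 1000, (8, 20)))
print(tuple(logits.shape))  # (8, 5)
```

Compared with a ModuleList, the ModuleDict mainly gives each branch a readable name in `state_dict()` and in the printed model; the forward logic is otherwise the same.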