I have a question regarding Conv1d in torch.
The simple model below, which does text classification, has a ModuleList containing three Conv1d layers (each one dedicated to a specific filter size):
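import torch
import torch.nn as nn

class TextCNN(nn.Module):  # class name assumed; the original post omits the class line
    def __init__(self, vocab_size, embedding_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # one Conv1d per filter size, all reading the same embedded input
        self.conv_layers = nn.ModuleList([
            nn.Conv1d(embedding_dim, 100, kernel_size=2),
            nn.Conv1d(embedding_dim, 100, kernel_size=3),
            nn.Conv1d(embedding_dim, 100, kernel_size=4),
        ])
        # 3 conv layers x 100 filters each = 300 input features
        self.fc = nn.Linear(300, num_classes)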
My question is: is it necessary to use a ModuleList and have a separate Conv1d layer for each filter size? In other words, would it work to have just one Conv1d layer that has the different filter sizes (2, 3, 4)?
A ModuleList is just one way of organizing layers and is especially useful if you plan to use a for loop. What you do with those layers is determined in the forward pass (e.g. chain them sequentially, or have them process the same input in parallel).
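One practical reason to prefer a ModuleList over a plain Python list is parameter registration. The snippet below is a minimal sketch of that difference; the class names and the channel sizes (8 in, 4 out) are made up for illustration and are not from the model in the question.

```python
import torch.nn as nn

class WithPlainList(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        # A plain Python list: the layers are NOT registered as submodules.
        self.convs = [nn.Conv1d(8, 4, kernel_size=k) for k in (2, 3, 4)]

class WithModuleList(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer, so parameters() can see them.
        self.convs = nn.ModuleList(
            [nn.Conv1d(8, 4, kernel_size=k) for k in (2, 3, 4)]
        )

n_plain = len(list(WithPlainList().parameters()))
n_module_list = len(list(WithModuleList().parameters()))
print(n_plain, n_module_list)  # 0 6 (a weight and a bias per Conv1d)
```

With the plain list, an optimizer built from `model.parameters()` would never update the conv layers, and `model.to(device)` would not move them.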
Suppose you wanted 100x Conv1d layers with kernel sizes of 1 to 100, each taking the same embedded text input. You could do:
self.kernels = nn.ModuleList()
for i in range(100):
    # kernel sizes 1 to 100
    self.kernels.append(nn.Conv1d(embedding_dim, 100, kernel_size=i + 1))

# in the forward pass, with x as input
contexts = []
for i in range(100):
    y = self.kernels[i](x)
    contexts.append(y)
An individual Conv1d layer can only have one kernel_size.
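To make that concrete, here is a small sketch (toy sizes are assumptions, not taken from the question) that runs the same input through three Conv1d layers. Each layer stores a single kernel size, its weight has shape `(out_channels, in_channels, kernel_size)`, and with no padding the output length is `seq_len - kernel_size + 1`, so the three outputs do not even share a shape.

```python
import torch
import torch.nn as nn

batch_size, embedding_dim, seq_len = 2, 16, 10  # assumed toy sizes
x = torch.randn(batch_size, embedding_dim, seq_len)  # (N, C_in, L)

convs = nn.ModuleList(
    [nn.Conv1d(embedding_dim, 100, kernel_size=k) for k in (2, 3, 4)]
)

for conv in convs:
    out = conv(x)
    # kernel_size is stored as a one-element tuple, e.g. (2,)
    print(conv.kernel_size, tuple(conv.weight.shape), tuple(out.shape))

# Output lengths with stride 1 and no padding: seq_len - kernel_size + 1
out_lengths = [conv(x).shape[-1] for conv in convs]
print(out_lengths)  # [9, 8, 7]
```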
Thank you @J_Johnson for your reply,
This is the forward function:
def forward(self, x):
    embedded = self.embedding(x)  # shape: (batch_size, seq_len, embedding_dim)
    conv_outputs = []
    for conv in self.conv_layers:
        # Conv1d expects (batch_size, channels, seq_len), hence the transpose
        conv_out = torch.relu(conv(embedded.transpose(1, 2)))  # shape: (batch_size, num_filters_i, seq_len - kernel_size + 1)
        pooled_out = torch.max_pool1d(conv_out, conv_out.size(2)).squeeze(2)  # shape: (batch_size, num_filters_i)
        conv_outputs.append(pooled_out)
    concat_output = torch.cat(conv_outputs, dim=1)  # shape: (batch_size, sum(num_filters))
    logits = self.fc(concat_output)  # shape: (batch_size, num_classes)
    return logits
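As a quick check of the pooling step in that forward function, the sketch below uses random stand-in tensors (assumed lengths 9, 8 and 7, i.e. kernel sizes 2, 3, 4 on a sequence of length 10) rather than real conv outputs. It shows that pooling over the full length is the same as taking the maximum over the last dimension, and that this is what makes the three branches concatenable into the 300 features that `nn.Linear(300, num_classes)` expects.

```python
import torch

batch_size, num_filters = 4, 100
# Stand-ins for the three conv outputs; lengths differ per kernel size.
conv_outs = [torch.randn(batch_size, num_filters, length) for length in (9, 8, 7)]

pooled = []
for conv_out in conv_outs:
    # A pooling window spanning the whole length keeps one value per filter.
    a = torch.max_pool1d(conv_out, conv_out.size(2)).squeeze(2)
    # Equivalent: take the maximum over the length dimension directly.
    b = conv_out.max(dim=2).values
    assert torch.equal(a, b)
    pooled.append(a)

# After pooling every branch is (batch_size, num_filters), so they can be joined.
concat_output = torch.cat(pooled, dim=1)
print(concat_output.shape)  # torch.Size([4, 300])
```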
As I understand from your answer, we cannot have a single Conv1d layer containing different kernel sizes, is that right?
Right. If you want multiple kernel sizes, use a ModuleList to store the convolutional layers with the specific kernel/padding/stride/etc. you want for each.
If you want to look at another code example, I have an old implementation of the architecture shown below.
The only difference is that I don't use 1-max pooling but normal max-pooling. This implementation uses an nn.ModuleDict for a change :).
Please note that the code is rather verbose in order to make the model very configurable, by specifying different conv_kernel_sizes as an input parameter.
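For readers who just want the gist of the ModuleDict idea, here is a minimal sketch. It is not the implementation mentioned above: the class name, the `num_filters` argument and the `conv_{k}` key format are assumptions, and it uses max-over-time pooling to stay short. Only the `conv_kernel_sizes` parameter name is borrowed from the post.

```python
import torch
import torch.nn as nn

class ConfigurableTextCNN(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, embedding_dim, num_classes,
                 conv_kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # ModuleDict keys must be strings, so each layer is named by its size.
        self.convs = nn.ModuleDict({
            f"conv_{k}": nn.Conv1d(embedding_dim, num_filters, kernel_size=k)
            for k in conv_kernel_sizes
        })
        self.fc = nn.Linear(num_filters * len(conv_kernel_sizes), num_classes)

    def forward(self, x):
        embedded = self.embedding(x).transpose(1, 2)  # (batch, emb_dim, seq_len)
        # One pooled feature vector per kernel size, then concatenate.
        pooled = [torch.relu(conv(embedded)).max(dim=2).values
                  for conv in self.convs.values()]
        return self.fc(torch.cat(pooled, dim=1))

model = ConfigurableTextCNN(vocab_size=50, embedding_dim=16, num_classes=3,
                            conv_kernel_sizes=(1, 2, 3, 5))
logits = model(torch.randint(0, 50, (4, 12)))  # 4 sequences of 12 token ids
print(list(model.convs.keys()), logits.shape)
```

Sizing the final linear layer from `len(conv_kernel_sizes)` is what lets the kernel sizes change without touching the rest of the model. Note that every input sequence still has to be at least as long as the largest kernel size.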