CNN: Batch width changes with every training loop iteration

Hi! I’m inexperienced with PyTorch and I’m having the following problem. I’m doing word vector processing very similar to this:

I’m working on my training loop right now. I cannot post my entire program as it is part of an assignment, but I will post the relevant parts and explain the rest. My training set consists of 6400 sentences. I apply the SpaCy tokenizer to the whole set and then split it into batches of 64 sentences (args.batch_size) using the BucketIterator in the following way:

train_iter, val_iter, test_iter = data.BucketIterator.splits(
      (train_data, val_data, test_data),
      batch_sizes=(args.batch_size, args.batch_size, args.batch_size),
      sort_key=lambda x: len(x.text), device=None, sort_within_batch=True, repeat=False)

Since the BucketIterator groups sentences into batches according to the number of tokens per sentence, I end up with batches of variable width. This is giving me a lot of problems in my training loop when it comes to constructing the CNN model.

In my CNN init, I first convert my tokens into word vectors using the GloVe pretrained word vectors from glove.6B.100d.txt in an embedding layer. I then have a convolutional layer like so:

        self.conv1 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[0], embedding_dim))

where n_filters = 50, filter_sizes[0]=2 and embedding_dim is the size of the word vectors, 100. I also have a max pooling layer like so:

        batch_width = filter_sizes[2]
        filter_size_2 = batch_width - filter_sizes[0] + 1
        self.pool1 = nn.MaxPool2d(kernel_size=(filter_size_2, 1), stride=1)

where filter_sizes[2] is the width of the batch (i.e. the number of tokens in the longest sentence of the batch). I only know the value of filter_sizes[2] once I am inside the training loop:

    for epoch in range(args.epochs):
        for i, batch in enumerate(train_iter):
            text, text_length = batch.text

            filter_sizes = [2, 4, text_length.max().item()]
            model = CNN(emb_dim, vocab, num_filt, filter_sizes)

text_length.max().item() gives me the width of the current batch while I’m training on it. I cannot move the model construction outside the two for loops, because I don’t know the batch widths in advance. But constructing the model inside the loop is wrong too: the model is re-initialized for every batch, so nothing it learns is kept and my loss does not change from epoch to epoch.

Please help, what am I doing wrong?

Is there any specific reason why you want your filter size to equal the maximum length of your sentences?
With such a filter, you end up with a one-dimensional vector as the input to the max pooling layer, which we don’t want, and we might not be able to extract useful information from the input sentence.

As far as I am aware, the optimal filter size is usually less than 10. Please correct me if I am wrong.

It’s the requirement of the assignment.

Per batch, the output of the convolutional layer has shape 64 (batch size) x 50 (kernels) x (tokens per sentence - 1) x 1. So for each kernel I get a grid of 64 sentences by (tokens per sentence - 1) values. I want to pick the largest value in each sentence’s vector of convolutional outputs, but because the sentence length varies from batch to batch I don’t know how to handle this.
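The shape described above can be checked with a short sketch. It assumes the embedded batch is laid out batch-first as (batch, 1, tokens, embedding_dim), which the post does not state; the helper conv_output_shape and the two batch widths (7 and 23 tokens) are made up for illustration.

```python
import torch
import torch.nn as nn

# Sizes from the post: 50 filters, kernel height 2, 100-dimensional embeddings.
n_filters, kernel_height, embedding_dim = 50, 2, 100
conv1 = nn.Conv2d(in_channels=1, out_channels=n_filters,
                  kernel_size=(kernel_height, embedding_dim))

def conv_output_shape(batch_size, num_tokens):
    """Run a random (batch, 1, tokens, embedding_dim) tensor through conv1."""
    x = torch.randn(batch_size, 1, num_tokens, embedding_dim)
    return tuple(conv1(x).shape)

# Two hypothetical batch widths; the same layer handles both.
shape_short = conv_output_shape(batch_size=64, num_tokens=7)
shape_long = conv_output_shape(batch_size=64, num_tokens=23)
print(shape_short)  # (64, 50, 6, 1)
print(shape_long)   # (64, 50, 22, 1)
```

Note that the convolution layer itself never needs to know the batch width; only the third output dimension changes with it.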

The problem is that the batches are of variable width. The width of each batch is the length of its longest sentence (shorter sentences get padded by the BucketIterator). To handle the variable width, the MaxPool is meant to pick out the largest value for each sentence, so that each kernel produces one value per sentence: a vector whose length is the number of sentences and whose width is 1.
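One possible way to get this "largest value per sentence" without knowing the batch width in advance is to read the pooling height from the tensor inside forward, so the model can be constructed once, before the training loop. The following is only a sketch under assumptions: TextCNN is a hypothetical stand-in for the CNN class in the post, the embedding is randomly initialised rather than loaded from GloVe, the input is assumed batch-first, and the final linear layer is invented to give the model an output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Hypothetical stand-in for the CNN class in the post."""

    def __init__(self, vocab_size, embedding_dim, n_filters, kernel_height):
        super().__init__()
        # Randomly initialised here; the post loads GloVe vectors instead.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.conv1 = nn.Conv2d(1, n_filters,
                               kernel_size=(kernel_height, embedding_dim))
        # Assumed output layer, not described in the post.
        self.fc = nn.Linear(n_filters, 1)

    def forward(self, text):
        # text: (batch, tokens) of word indices, assumed batch-first.
        x = self.embedding(text).unsqueeze(1)   # (batch, 1, tokens, emb)
        x = self.conv1(x)                       # (batch, filters, tokens - k + 1, 1)
        # Pool over the full height of this batch, read from the tensor itself.
        x = F.max_pool2d(x, kernel_size=(x.shape[2], 1))  # (batch, filters, 1, 1)
        return self.fc(x.flatten(1))            # (batch, 1)

torch.manual_seed(0)
model = TextCNN(vocab_size=1000, embedding_dim=100, n_filters=50, kernel_height=2)

# Batches of different widths go through the same model instance.
out_a = model(torch.randint(0, 1000, (64, 9)))
out_b = model(torch.randint(0, 1000, (64, 31)))
print(out_a.shape, out_b.shape)
```

Because the pooling size is computed per call, nothing in __init__ depends on text_length, and the weights persist across batches and epochs.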

Okay, have you tried setting include_lengths=True when declaring the text Field? That will give you the actual length of each sentence.

With this, batch.text will return both the text and the text lengths; pass these two values into the forward method. Hope this helps.
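The reply does not say what forward should do with the lengths. One possible use, shown here purely as an assumption, is to stop padded positions from winning the max: convolution windows that overlap padding are set to -inf before pooling. The function masked_max_over_time and the toy tensors are hypothetical.

```python
import torch

def masked_max_over_time(conv_out, lengths, kernel_height):
    """Max over the token axis, ignoring windows that touch padding.

    conv_out: (batch, filters, positions) convolution output, with the
              trailing width-1 axis already squeezed away.
    lengths:  (batch,) true token counts, as returned alongside the text.
    """
    positions = conv_out.shape[2]
    # A sentence of n tokens has n - kernel_height + 1 windows free of padding;
    # keep at least one so very short sentences still produce a value.
    valid = (lengths - kernel_height + 1).clamp(min=1)
    idx = torch.arange(positions, device=conv_out.device)
    mask = idx.unsqueeze(0) < valid.unsqueeze(1)            # (batch, positions)
    masked = conv_out.masked_fill(~mask.unsqueeze(1), float("-inf"))
    return masked.max(dim=2).values                         # (batch, filters)

# Toy input: batch=2, filters=1, positions=3 (padded width 4, kernel height 2).
conv_out = torch.tensor([[[1.0, 5.0, 9.0]],
                         [[2.0, 3.0, 4.0]]])
lengths = torch.tensor([4, 2])   # the second sentence has 2 real tokens
pooled = masked_max_over_time(conv_out, lengths, kernel_height=2)
print(pooled)  # tensor([[9.], [2.]])
```

In the toy example the second sentence only has one padding-free window, so its pooled value is 2.0 rather than 4.0.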

Yes, I did include them. However, the BucketIterator returns the batches in random order, so when I call “text, text_length = batch.text” there’s no way I can predict beforehand which batch I will get. I am able to extract the sentence lengths, but only once I call “text, text_length = batch.text”. However, I need to initialize the model outside of the two for loops, so I would need to know beforehand what the order of the batches is.