Hi! I’m inexperienced with PyTorch and I’m having the following problem. I’m doing word vector processing very similar to this:
https://www.cs.toronto.edu/~lczhang/aps360_20191/hw/a5/a5.html
I’m working on my training loop right now. I can’t post my entire program since it’s part of an assignment, but I’ll post the relevant parts and explain the rest. My training set is a dataset of 6,400 sentences. I tokenize the whole thing with the SpaCy tokenizer, then split it into batches of 64 sentences (args.batch_size) each using the BucketIterator, like so:
train_iter, val_iter, test_iter = data.BucketIterator.splits(
    (train_data, val_data, test_data),
    batch_sizes=(args.batch_size, args.batch_size, args.batch_size),
    sort_key=lambda x: len(x.text),
    device=None, sort_within_batch=True, repeat=False)
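For context, my fields are defined roughly like this (a sketch, not my exact assignment code; include_lengths=True is what makes batch.text come back as a (text, lengths) pair in the training loop below):

from torchtext import data

TEXT = data.Field(tokenize='spacy', include_lengths=True)  # include_lengths: batch.text -> (tensor, lengths)
TEXT.build_vocab(train_data, vectors='glove.6B.100d')      # attaches the pretrained 100-d GloVe vectors to TEXT.vocab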
Since the BucketIterator groups the sentences into batches according to the number of tokens per sentence, my batches have variable width (each batch is as wide as its longest sentence). This is causing me a lot of problems in my training loop when it comes to constructing the CNN model.
In my CNN’s __init__, I first convert my tokens into word vectors using the pretrained GloVe word vectors from glove.6B.100d.txt in an embedding layer.
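The embedding layer itself is set up along these lines (again a sketch; vocab.vectors is the GloVe matrix that build_vocab attaches, and the exact variable names are illustrative):

# initialize the embedding from the GloVe matrix attached to the vocab
self.embedding = nn.Embedding.from_pretrained(vocab.vectors, freeze=True)

I then have a convolutional layer like so: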
self.conv1 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[0], embedding_dim))
where n_filters = 50, filter_sizes[0] = 2, and embedding_dim = 100, the size of the word vectors. I also have a max pooling layer like so:
batch_width = filter_sizes[2]                      # number of tokens in the longest sentence of the batch
filter_size_2 = batch_width - filter_sizes[0] + 1  # height of conv1's output for that batch width
self.pool1 = nn.MaxPool2d(kernel_size=(filter_size_2, 1), stride=1)
where filter_sizes[2] is the width of the batch (i.e. the number of tokens in the longest sentence of the batch).
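To double-check the layer arithmetic, here is a standalone sketch of the shape math for an imaginary batch of width 20 (illustrative numbers, not from my actual run):

import torch

x = torch.randn(64, 1, 20, 100)  # fake batch: 64 sentences, 20 tokens wide, 100-d vectors
conv = torch.nn.Conv2d(in_channels=1, out_channels=50, kernel_size=(2, 100))
pool = torch.nn.MaxPool2d(kernel_size=(20 - 2 + 1, 1), stride=1)
print(conv(x).shape)             # torch.Size([64, 50, 19, 1])
print(pool(conv(x)).shape)       # torch.Size([64, 50, 1, 1])

So the pool kernel has to match each batch’s width exactly. The only time I know the value of filter_sizes[2] is once I reach this point in my training loop: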
for epoch in range(args.epochs):
    for i, batch in enumerate(train_iter):
        text, text_length = batch.text
        filter_sizes = [2, 4, text_length.max().item()]
        model = CNN(emb_dim, vocab, num_filt, filter_sizes)
text_length.max().item() gives me the width of whichever batch I’m currently training on. I found I can’t move the model construction outside the two for loops, because I don’t know the batch widths ahead of time. But constructing the model inside the loop is wrong too: the model gets re-initialized on every batch, so its weights never accumulate any training and my loss doesn’t change from epoch to epoch.
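One workaround I’ve been considering is to drop the fixed pool layer and compute the pool size in forward() from the current batch, so the model could be constructed once before the loops. A rough sketch of what I mean (assuming batch-first token indices, i.e. batch_first=True on the Field; I’m not sure this is the right approach):

import torch.nn.functional as F

def forward(self, x):                       # x: [batch, seq_len] token indices
    emb = self.embedding(x).unsqueeze(1)    # [batch, 1, seq_len, 100]
    c = F.relu(self.conv1(emb)).squeeze(3)  # [batch, 50, seq_len - 1]
    p = F.max_pool1d(c, c.shape[2])         # pool over whatever width this batch has
    return p.squeeze(2)                     # [batch, 50]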
Please help, what am I doing wrong?