Could you post an executable code snippet to reproduce this issue, please?
You don’t need to provide the real data. Just random values with the right type and shape should be sufficient to run the code.
Thank you for your response. I tried the following code to reproduce the error:
import pandas as pd
import pickle
import torch
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm, tqdm_notebook
import torch.optim as optim
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

vocab_size = 257945
EMBEDDING_SIZE = 100

# random token indices with the same shape and dtype as the real data
sequences = np.random.randint(0, vocab_size, size=(1188946, 200))
sequences = torch.from_numpy(sequences).type(torch.long)

# random binary labels with the same shape and dtype as the real data
labels = np.random.randint(0, 2, size=(1188946, 1))
labels = torch.from_numpy(labels).type(torch.float)

train = list(zip(sequences, labels))
But this code runs without showing the error. However, when I run my original code, I still get the error, even though all of the data dimensions are the same. My original code is as follows:
In your second code snippet you are not using the model, so where does the pooling operation come from?
Anyway, could you post the shape of drop1 in your forward method?
Also, could you update to the latest nightly build? We've seen this error caused by an invalid grid size when calling into the pooling kernel, and that should already be fixed.
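For reference, a minimal way to check this (a sketch of your forward with a print added; the expected shapes assume batch_size=32 and MAX_LEN=200):

def forward(self, sentence):
    embedding = self.word_embeddings(sentence).permute(0, 2, 1)  # [32, 100, 200]
    conv1 = F.relu(self.conv1(embedding))                        # [32, 64, 198]
    drop1 = self.drop1(conv1)
    print(drop1.shape)  # expected: torch.Size([32, 64, 198])
    max_pool1 = self.max_pool1(drop1)                            # [32, 64, 99]
    ...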
Thank you @ptrblck for your help. I reduced the vocab_size and it worked for me. However, with large values of vocab_size I still get the same error.
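One thing worth ruling out when the failure depends on vocab_size (a quick sanity check, sketched against the sequences and vocab_size defined above): nn.Embedding(vocab_size, ...) only accepts indices in [0, vocab_size - 1], and an out-of-range index on the GPU can surface as an error in a later kernel such as the pooling op.

# all token indices must satisfy 0 <= idx < vocab_size
assert sequences.min().item() >= 0
# note: Keras' tokenizer.word_index starts at 1, so the largest index can
# equal len(tokenizer.word_index), which is out of range for an embedding
# built with vocab_size = len(tokenizer.word_index)
assert sequences.max().item() < vocab_size, "token index out of range for nn.Embedding"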
I had the same error and struggled with it for a while.
In the end, I figured out a solution.
In my case, the RuntimeError: max_pool2d_with_indices_out_cuda_frame failed with error code 0 error occurred because the input data was allocated on the CPU.
I allocated it to CUDA and the problem is fixed now.
So, if you are using a GPU, I guess you can try allocating your input data to the GPU device like below.
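A minimal sketch of the idea (assuming the model itself is already on the GPU):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)      # model parameters on the GPU
inputs = inputs.to(device)    # move each input batch to the same device
output = model(inputs)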
@YongWookHa I allocated the inputs to the GPU device, but I still get the same error. I have updated the code below. The only way it works is when I reduce the vocab_size to 50000.
import pandas as pd
import pickle
import torch
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm, tqdm_notebook
import torch.optim as optim
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

MAX_LEN = 200
OOV_TOKEN = '<OOV>'
TRUNCATE_MODE = 'post'
PADDING_MODE = 'post'
EMBEDDING_SIZE = 100

# sentences is the list of raw text strings from the real data
tokenizer = Tokenizer(oov_token=OOV_TOKEN)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
sequences = pad_sequences(sequences, maxlen=MAX_LEN, padding=PADDING_MODE, truncating=TRUNCATE_MODE)
sequences = torch.from_numpy(sequences).type(torch.long)
vocab_size = len(tokenizer.word_index)

labels = np.array([some formula to find the labels])
labels = torch.from_numpy(np.reshape(labels, (-1, 1))).type(torch.float)

train = list(zip(sequences, labels))
class Net(nn.Module):
    def __init__(self, vocab_size, embedding_size):
        torch.manual_seed(0)
        super(Net, self).__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)
        self.conv1 = nn.Conv1d(embedding_size, 64, 3)
        self.drop1 = nn.Dropout(0.5)
        self.max_pool1 = nn.MaxPool1d(2)
        self.flat1 = nn.Flatten()
        self.fc1 = nn.Linear(64 * 99, 100)
        self.fc2 = nn.Linear(100, 1)

    def forward(self, sentence):
        embedding = self.word_embeddings(sentence).permute(0, 2, 1)
        conv1 = F.relu(self.conv1(embedding))
        drop1 = self.drop1(conv1)
        max_pool1 = self.max_pool1(drop1)
        flat1 = self.flat1(max_pool1)
        fc1 = F.relu(self.fc1(flat1))
        fc2 = torch.sigmoid(self.fc2(fc1))
        return fc2
net = Net(vocab_size, EMBEDDING_SIZE)
EPOCHS = 10
net.cuda()
criterion = nn.BCELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
loader = DataLoader(train, batch_size=32)

net.train()
for epoch in range(EPOCHS):
    progress = tqdm_notebook(loader, leave=False)
    for inputs, target in progress:
        net.zero_grad()
        output = net(inputs.to(device))
        loss = criterion(output, target.to(device))
        loss.backward()
        optimizer.step()
    print(loss)
Thank you for sharing this.
I'd been struggling with this problem for two days and downgraded torch as you described.
The problem is solved for now. I hope this gets patched soon.