I’m using CUDA 11 and an RTX 2070 graphics card. I have trained on a few smaller data sets without problems using similar code. This time I have a data set of size 200,000 × 17 and got this error message: CUDA out of memory. Tried to allocate 98.07 GiB (GPU 0; 8.00 GiB total capacity; 161.76 MiB already allocated; 6.29 GiB free; 188.00 MiB reserved in total by PyTorch)

My code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from sklearn.model_selection import train_test_split

    # n, nodes, l, X and Y are defined elsewhere (not shown in the post)
    class NNTrain(nn.Module):
        def __init__(self):
            super(NNTrain, self).__init__()
            self.l1 = nn.Linear(n, nodes)
            self.l5 = nn.Linear(nodes, 1)

        def forward(self, inputs):
            x = F.relu(self.l1(inputs))
            x = self.l5(x)
            return x

    for j in range(l):
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, shuffle=True, random_state=j)
        X_train_tensor = torch.from_numpy(X_train).float().cuda()
        Y_train_tensor = torch.from_numpy(Y_train).float().cuda()
        X_test_tensor = torch.from_numpy(X_test).float().cuda()
        Y_test_tensor = torch.from_numpy(Y_test).float().cuda()
        epochs = 10
        LEARNING_RATE = 0.001
        model = NNTrain()
        model.cuda()
        optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
        model.train()
        for i in range(epochs):
            optimizer.zero_grad()
            Y_pred = model(X_train_tensor)
            loss = nn.L1Loss()(Y_pred, Y_train_tensor)
            loss.backward()
            optimizer.step()
            print(loss)

It looks like you’re trying to put your whole training dataset into GPU memory. Networks are usually trained in batches of 16, 32, 64, … samples, depending on your GPU memory and other factors; the batch size doesn’t have to be a power of 2 either :).

You might want to use batches, and only put each batch onto the GPU. Something like

    for i in range(epochs):
        for X_batch, Y_batch in get_next_batch(X_train_tensor, Y_train_tensor):
            X_batch, Y_batch = X_batch.cuda(), Y_batch.cuda()
            Y_batch_pred = model(X_batch)

get_next_batch() is just a placeholder function that creates the batches.
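For illustration, here is a minimal sketch of what such a placeholder could look like. The `batch_size` parameter and its default are assumptions (the snippet above only passes two arguments), and the demo uses plain lists so it runs without a GPU. The same slicing works on tensors. To actually save GPU memory you would keep the full tensors on the CPU and call `.cuda()` only on each batch.

```python
def get_next_batch(X, Y, batch_size=1024):
    """Yield consecutive (X_batch, Y_batch) slices; the last batch may be smaller.

    Hypothetical helper: the name comes from the snippet above, the
    batch_size argument is an assumption added for this sketch.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of samples")
    for start in range(0, len(X), batch_size):
        end = start + batch_size
        yield X[start:end], Y[start:end]


# Tiny stand-in data; in the thread these would be the training tensors.
X_demo = list(range(10))
Y_demo = [2 * x for x in X_demo]

batches = list(get_next_batch(X_demo, Y_demo, batch_size=4))
batch_sizes = [len(x_batch) for x_batch, _ in batches]
print(batch_sizes)  # [4, 4, 2]
```

PyTorch's own `torch.utils.data.TensorDataset` and `DataLoader` cover the same need (including shuffling) and are probably the better choice in real code. The generator above is only meant to show the idea.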

Thanks for the reply! I tried your method, and the original memory allocation error is gone. Now I get an out-of-memory error halfway through the first epoch. I may need to study the issue more.

Can you check if this helps? I cannot see anything else immediately that would cause a memory issue.

I also assume that the 16 in range(16) is handcrafted to fit your dataset. But it seems odd, given that you have 200k samples. What’s your batch_size? Because currently you go only through 16*batch_size samples. Maybe just let the inner for loop run indefinitely and break if len(X_batch) == 0.
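To make the coverage point concrete, here is a small arithmetic sketch. The training-set size of 160,000 is an assumption based on the roughly 200k samples and the 20% hold-out in the original code, and the batch sizes are example values only.

```python
import math

n_train = 160_000   # assumed: about 200k samples minus a 20% hold-out
fixed_steps = 16    # the hard-coded range(16) discussed above


def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def samples_seen(batch_size, steps, n_samples):
    """Samples visited when the inner loop runs a fixed number of steps."""
    return min(steps * batch_size, n_samples)


needed_1k = batches_per_epoch(n_train, 1_000)
needed_10k = batches_per_epoch(n_train, 10_000)
seen_1k = samples_seen(1_000, fixed_steps, n_train)
seen_10k = samples_seen(10_000, fixed_steps, n_train)

print(needed_1k, seen_1k)    # 160 16000
print(needed_10k, seen_10k)  # 16 160000
```

So with 16 fixed steps, a batch size of 1,000 would touch only a tenth of the training set per epoch, while 10,000 would cover all of it. Deriving the step count from the data size avoids having to keep the two numbers in sync by hand.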

Thank you! It works! I have about 200K data points and am holding out 20%, so there are about 160K in the training set. I also changed the batch size to 10K; I was using 1K, which was wrong.
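One more thing that might be worth checking, since the inputs themselves are small: the model returns `Y_pred` with shape (N, 1). If `Y_train_tensor` happens to be 1-D with shape (N,), which is an assumption because the thread does not show the shape of `Y`, then PyTorch broadcasting turns the elementwise difference inside the loss into an (N, N) tensor. The arithmetic below is a rough sketch under that assumption, using the approximate sizes mentioned in the thread.

```python
BYTES_PER_FLOAT32 = 4
n_train = 160_000   # approximate training-set size mentioned in the thread
n_features = 17

# Memory for the training inputs themselves.
dataset_bytes = n_train * n_features * BYTES_PER_FLOAT32
dataset_mib = dataset_bytes / 2**20


def pairwise_bytes(n):
    """Bytes for an (n, n) float32 tensor, i.e. (n, 1) broadcast against (n,)."""
    return n * n * BYTES_PER_FLOAT32


full_gib = pairwise_bytes(n_train) / 2**30
batch_mib = pairwise_bytes(10_000) / 2**20

print(f"training inputs:               {dataset_mib:.1f} MiB")
print(f"(N, N) tensor, full set:       {full_gib:.1f} GiB")
print(f"(N, N) tensor, 10K-row batch:  {batch_mib:.1f} MiB")
```

The inputs come to roughly 10 MiB, while an (N, N) intermediate for the full training set comes to about 95 GiB, which is in the same range as the 98.07 GiB in the error message given that the dataset size is only approximate. With 10K-row batches the same intermediate would fit in memory, but the loss would then average over all pairs rather than matching rows. Comparing `Y_pred.shape` with `Y_batch.shape` would confirm or rule this out; if they differ, `Y_batch.unsqueeze(1)` or `Y_pred.squeeze(1)` would align them.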