PyTorch tries to allocate a huge amount of memory on the GPU

I’m using CUDA 11 and an RTX 2070 graphics card. I have trained a few smaller datasets without problems using similar code. This time I have a dataset of size 200,000 × 17 and got this error message:
CUDA out of memory. Tried to allocate 98.07 GiB (GPU 0; 8.00 GiB total capacity; 161.76 MiB already allocated; 6.29 GiB free; 188.00 MiB reserved in total by PyTorch)

My code:

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import train_test_split

class NNTrain(nn.Module):
    def __init__(self):
        super(NNTrain, self).__init__()
        # n (number of input features) and nodes (hidden layer width) are defined elsewhere
        self.l1 = nn.Linear(n, nodes)
        self.l5 = nn.Linear(nodes, 1)

    def forward(self, inputs):
        x = F.relu(self.l1(inputs))
        x = self.l5(x)
        return x

for j in range(l):
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, shuffle=True, random_state=j)
    # convert the full training and test splits to tensors and move them onto the GPU all at once
    X_train_tensor = torch.from_numpy(X_train).float().cuda()
    Y_train_tensor = torch.from_numpy(Y_train).float().cuda()
    X_test_tensor = torch.from_numpy(X_test).float().cuda()
    Y_test_tensor = torch.from_numpy(Y_test).float().cuda()
    
    epochs = 10
    LEARNING_RATE = 0.001

    model = NNTrain()
    model.cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

    model.train()
    for i in range(epochs):
        optimizer.zero_grad()
        Y_pred = model(X_train_tensor)
        loss = nn.L1Loss()(Y_pred, Y_train_tensor)
        loss.backward()
        optimizer.step()
    print(loss)

Any help is greatly appreciated!

It looks like you’re trying to put your whole training dataset into GPU memory. Networks are usually trained on batches of, say, 16, 32, 64, … samples, depending on your GPU memory and other factors; the batch size doesn’t have to be a power of two either :).

You might want to use batches, and only put each batch onto the GPU. Something like

for i in range(epochs):
    for X_batch, Y_batch in get_next_batch(X_train_tensor, Y_train_tensor):
        X_batch, Y_batch = X_batch.cuda(), Y_batch.cuda()
        Y_batch_pred = model(X_batch)

get_next_batch() is just a placeholder function that creates the batches.
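
For example, a minimal sketch of such a helper (assuming X and Y are CPU tensors with the same number of rows, so that only each batch gets moved to the GPU inside the loop) could look like this:

import torch

def get_next_batch(X, Y, batch_size=64, shuffle=True):
    # yield (X_batch, Y_batch) slices of two CPU tensors with matching first dimensions
    n_samples = X.shape[0]
    order = torch.randperm(n_samples) if shuffle else torch.arange(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], Y[idx]

torch.utils.data.TensorDataset combined with DataLoader gives you the same batching and shuffling out of the box.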


Thanks for the reply! I tried your method, and the original allocation error is gone. Now I get an out-of-memory error halfway through the first epoch. I may need to study the issue more.

Can you post your training loop? Note that you now have to call zero_grad() etc. for each batch. Maybe you forgot to move something into the inner loop.

Here is my loop. Thanks for helping a noob!

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, shuffle = True, random_state=0)
    
epochs = 1
LEARNING_RATE = 0.001
batch_size = 1000

model = NNTrain()
model.cuda()
optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

model.train()
for i in range(epochs):
    for j in range(16):
        X_batch = torch.from_numpy(X_train[batch_size*j:batch_size*(j+1),:]).float().cuda()
        Y_batch = torch.from_numpy(Y_train[batch_size*j:batch_size*(j+1),:]).float().cuda()
        Y_batch_pred = model(X_batch)
        optimizer.zero_grad()
        loss = nn.L1Loss()(Y_pred, Y_train_tensor)
        loss.backward()
        optimizer.step()
    print(loss)

This line presumably needs to be

loss = nn.L1Loss()(Y_batch_pred, Y_batch)

Can you check if this helps? I cannot see anything else immediately that would cause a memory issue.

I also assume that the 16 in range(16) is handcrafted to fit your dataset, but that seems odd given that you have 200k samples. What’s your batch_size? Currently you only go through 16 * batch_size samples. Maybe just let the inner loop step through the whole training set and break once len(X_batch) == 0.
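
Putting the two fixes together, the inner loop could look roughly like this (just a sketch, assuming X_train and Y_train are the NumPy arrays from your snippet):

model.train()
for i in range(epochs):
    for start in range(0, X_train.shape[0], batch_size):
        # move only the current batch onto the GPU
        X_batch = torch.from_numpy(X_train[start:start + batch_size, :]).float().cuda()
        Y_batch = torch.from_numpy(Y_train[start:start + batch_size, :]).float().cuda()
        optimizer.zero_grad()
        Y_batch_pred = model(X_batch)
        loss = nn.L1Loss()(Y_batch_pred, Y_batch)
        loss.backward()
        optimizer.step()
    print(loss.item())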


Thank you! It works! I have about 200K data points and am holding out 20%, so the training set has about 160K samples. I also changed the batch size to 10K; I was using 1K, which was wrong.