I am trying to train a neural network on a very large input (5 × 100,000,000), and it requires much more memory than expected.
Here is a minimal example:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import time

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # one filter spanning the whole input: 5 * 100,000,000 = 500M weights
        self.conv1 = nn.Conv1d(in_channels=5, out_channels=1, kernel_size=100000000, stride=10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.sigmoid(x)
        return x

model = Net().cuda()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.BCELoss()

data = torch.normal(torch.zeros(1, 5, 100000000), torch.ones(1, 5, 100000000))
data = data.cuda()
label = torch.ones(1, 1, 1)
label = label.cuda()

for epoch in range(10):
    output = model(data)
    loss = criterion(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print("Epoch :", epoch)
The input is just random data; it uses approximately 2 GB, as expected (32-bit floats × 5 × 100,000,000 ≈ 1.86 GiB). This tensor does not require a gradient.
The network consists of a single convolutional layer with one filter of the same size as the input, so it has 500M weights, which is another 2 GB.
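For reference, here is a small sketch of how these sizes can be checked directly from the tensors (assuming everything is float32, i.e. 4 bytes per element):

# rough size check: float32 = 4 bytes per element
data_bytes = data.element_size() * data.nelement()
param_count = sum(p.numel() for p in model.parameters())
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(data_bytes / 1024**3)   # ~1.86 GiB for the input
print(param_count)            # ~500,000,000 weights in conv1
print(param_bytes / 1024**3)  # ~1.86 GiB for the filter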
After the forward pass another 2 GB are used. After loss.backward() 8 GB are used, and after optimizer.step() 12 GB are used, which is all the available memory.
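Here is roughly how the per-step numbers can be reproduced for one iteration of the loop (a sketch; torch.cuda.memory_allocated() only reports memory held by PyTorch tensors, while nvidia-smi also includes the caching allocator's reserved pool):

def report(tag):
    # memory currently occupied by tensors, in GiB
    print(tag, torch.cuda.memory_allocated() / 1024**3)

report("after moving data to GPU")
output = model(data)
report("after forward")
loss = criterion(output, label)
optimizer.zero_grad()
loss.backward()
report("after backward")
optimizer.step()
report("after optimizer.step()")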
During the second epoch the forward pass runs fine, but during backpropagation I get RuntimeError: CUDA error: out of memory.
What exactly is kept in GPU memory during an epoch? Why is the memory not released after the optimization step finishes? How can I reduce memory usage in this case?