Hello,
Here’s some simple code:
import torch
import torch.nn as nn
from torchvision import models


class Identity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x


class CNNLSTM(nn.Module):
    def __init__(self):
        super(CNNLSTM, self).__init__()
        self.BackBone = models.resnet18(pretrained=True)
        num_ftrs = self.BackBone.fc.in_features
        self.BackBone.fc = Identity()  # drop the classifier so the backbone returns 512-d features
        self.lstm = nn.LSTM(512, 512, batch_first=True)
        self.fc = nn.Linear(512, 1)

    def forward(self, video):
        # video: (batch, time, C, H, W) -> fold time into the batch dimension for the CNN
        batch_size, time_steps, C, H, W = video.size()
        c_in = video.view(batch_size * time_steps, C, H, W)
        c_out = self.BackBone(c_in)
        # unfold back to (batch, time, features) for the LSTM
        r_in = c_out.view(batch_size, time_steps, -1)
        r_out, _ = self.lstm(r_in)
        # binary prediction from the last time step
        output = torch.sigmoid(self.fc(r_out[:, -1, :]))
        return output.squeeze()
If I run the training loop without actually using the model, i.e. only fetching the batches and moving them to the GPU for a number of epochs, my GPU uses 1.1 GB out of 4.0 GB. That is, I only do this:
for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch + 1, num_epochs))
    for i, (inputs, labels) in enumerate(dataloader):
        inputs = inputs.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
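The 1.1 GB figure is what the GPU reports overall; inside the loop I can also probe what PyTorch itself has allocated with something like this (a sketch; I'm assuming torch.cuda.memory_allocated and max_memory_allocated report the caching allocator's current and peak usage):

# Sketch: report PyTorch's current and peak allocations inside the loop.
print(torch.cuda.memory_allocated(device) / 1024**3, 'GB currently allocated')
print(torch.cuda.max_memory_allocated(device) / 1024**3, 'GB peak allocated')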
If I then call the model on the batch, however, I get a CUDA out of memory error. That is, I only add these lines to the inner loop:
optimizer.zero_grad()
outputs = model(inputs)
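For comparison, this is the variant I would try with the autograd graph disabled, to see how much of the extra memory is activations kept for backward (a sketch; I haven't measured this myself):

# Sketch: the same forward pass without storing activations for backward.
with torch.no_grad():
    outputs = model(inputs)
print(torch.cuda.max_memory_allocated(device) / 1024**3, 'GB peak with no_grad')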
Where do the 3 GB go? Do the gradients occupy memory? I thought that once you create the model and move it to the GPU with:
model = CNNLSTM()
model.to(device)
the graph and the gradients would already have allocated their memory?
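To sanity-check that assumption, this is roughly how I would estimate the memory taken by the parameters themselves (a sketch; it ignores optimizer state and gradients):

# Rough estimate of parameter memory: float32 = 4 bytes per element.
n_params = sum(p.numel() for p in model.parameters())
print(n_params, 'parameters,', n_params * 4 / 1024**3, 'GB in float32')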
Thank you