Hi, I am using the latest PyTorch master branch, and I am iteratively sending images to a neural net and saving the output predictions to a video file.
Due to the iterative nature, after a few iterations my puny GPU runs out of memory, so I tried calling this in each iteration to free the cache:

```python
torch.cuda.empty_cache()
```
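A quick way to see what that call actually does is to print the allocator counters around it: `empty_cache()` can only return reserved-but-unused (cached) blocks to the driver, while memory backing live tensors stays counted as allocated. A minimal check (on older PyTorch builds `memory_reserved()` is named `memory_cached()`):

```python
import torch

def report(tag):
    # empty_cache() only shrinks the reserved (cached) pool; memory that
    # live tensors still reference remains in "allocated".
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20  # memory_cached() on older builds
    print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")

report("before empty_cache")
torch.cuda.empty_cache()
report("after empty_cache")
```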
But that had no effect. The only solution that has worked is recreating the model in each loop iteration, which is very slow:
```python
import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision
from torch import optim
from torchvision import transforms

# ENet, utils, ext_transforms, class_encoding, etc. come from my project.
while True:
    # Slow workaround: rebuild and reload the model on every frame.
    model = ENet(num_classes).to(device)
    optimizer = optim.Adam(model.parameters())
    model = utils.load_checkpoint(model, optimizer, model_path, model_name)[0]
    model.eval()

    # Unrelated code then pulls camera images as my "input".
    with torch.no_grad():
        predictions = model(input)

    # Predictions is one-hot encoded with "num_classes" channels.
    # Convert it to a single int using the indices where the maximum (1) occurs.
    _, predictions = torch.max(predictions, 1)

    label_to_rgb = transforms.Compose([
        ext_transforms.LongTensorToRGBPIL(class_encoding),
        transforms.ToTensor()
    ])
    color_predictions = utils.batch_transform(predictions.cpu(), label_to_rgb)

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 7))
    ax1.imshow(np.transpose(torchvision.utils.make_grid(input.cpu()).numpy(), (1, 2, 0)))
    ax2.imshow(np.transpose(torchvision.utils.make_grid(color_predictions).numpy(), (1, 2, 0)))
```
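For reference, the structure I am trying to get working simply hoists the model construction out of the loop. This is a sketch, with `get_camera_image()` as a hypothetical stand-in for my capture code and the plotting elided:

```python
# Desired structure: build and load the model once, then loop over frames.
model = ENet(num_classes).to(device)
optimizer = optim.Adam(model.parameters())
model = utils.load_checkpoint(model, optimizer, model_path, model_name)[0]
model.eval()

while True:
    input = get_camera_image()  # hypothetical stand-in for my capture code
    with torch.no_grad():
        predictions = model(input)
    _, predictions = torch.max(predictions, 1)
    # ... same colorization and plotting as above ...
    plt.close("all")            # open figures also accumulate (host RAM)
    del input, predictions      # drop references before the next frame
    torch.cuda.empty_cache()    # now the freed blocks can actually be returned
```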
So how can I aggressively drop all consumed memory without having to recreate the model from scratch? Could this be indicative of a memory leak?
Hi, and thank you, but that doesn’t solve it. Why does my approach of recreating the model fix the issue? Could there be some state or cache the model is holding onto?
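To try to narrow that down, I am dumping the CUDA tensors that are still reachable between iterations with this gc-based snippet (it only sees tensors Python still references, but whatever shows up here is what is keeping the memory alive):

```python
import gc
import torch

def dump_cuda_tensors():
    """List every CUDA tensor the garbage collector can still reach."""
    total = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                size = obj.element_size() * obj.nelement()
                total += size
                print(type(obj).__name__, tuple(obj.size()), f"{size / 2**20:.2f} MiB")
        except Exception:
            continue  # some tracked objects raise on attribute access
    print(f"total reachable CUDA tensor memory: {total / 2**20:.2f} MiB")
```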