When I train a CNN on my dataset, it takes about 7 GB of GPU memory during the (epoch 0, training) phase. However, it jumps to 11 GB from the (epoch 0, testing) phase and stays at 11 GB from then on. It seems that some memory is not released when switching between training and testing.
Since my GPU memory is limited, can I release this unused memory so that I can use a larger batch_size?
Here is my training loop (adapted from the official example):
for epoch in range(args.start_epoch, args.epochs):
    if args.distributed:
        train_sampler.set_epoch(epoch)
    adjust_learning_rate(optimizer, epoch)

    # train for one epoch
    train(train_loader, model, criterion, optimizer, epoch)

    # evaluate on validation set
    prec1 = validate(val_loader, model, criterion)

    # remember best prec@1 and save checkpoint
    is_best = False  # prec1 > best_prec1
    best_prec1 = max(prec1, best_prec1)
    if epoch % args.save_freq == 0:
        save_checkpoint({
            'epoch': epoch + 1,
            'arch': args.arch,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
            'optimizer': optimizer.state_dict(),
            'loss_acc1': plot_statistic,
        }, is_best)
I suspect this is a PyTorch issue. The version I am using at present is 0.4.0. When I downgrade to 0.3.0.post4, training uses about 7 GB and testing only about 2 GB. Does anyone know what changed between these two versions?
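For reference, my validate() still builds volatile Variables the way the 0.3-era example did, roughly like this (a simplified sketch, not my exact code):

# Inner loop of validate() in the 0.3-era example (simplified).
# In 0.3, volatile=True prevented autograd from recording a graph,
# so validation needed very little memory.
for input, target in val_loader:
    input_var = torch.autograd.Variable(input.cuda(), volatile=True)
    target_var = torch.autograd.Variable(target.cuda(), volatile=True)

    output = model(input_var)
    loss = criterion(output, target_var)

As far as I can tell from the migration notes, 0.4.0 deprecates volatile and ignores it, so this code would build the full autograd graph during validation.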
I replaced volatile=True with torch.no_grad(), and loss.data[0] with loss.item(), but testing still uses about the same amount of memory.
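For clarity, here is roughly the pattern I applied (a simplified sketch, not my exact code):

# Validation loop after the 0.4 changes (simplified sketch).
# The whole loop runs under torch.no_grad() so no autograd graph
# is recorded, and losses are read out as Python numbers with
# .item() instead of being kept as tensors that reference the graph.
model.eval()
total_loss = 0.0
with torch.no_grad():
    for input, target in val_loader:
        input = input.cuda(non_blocking=True)
        target = target.cuda(non_blocking=True)

        output = model(input)
        loss = criterion(output, target)

        total_loss += loss.item()  # .item(), not loss.data[0]

If any tensor from inside the loop were stored without .item() or .detach(), it would keep the graph and its memory alive, but as far as I can see I am not doing that.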
Are you running out of memory? Maybe the memory is just cached and looks like it’s used in nvidia-smi.
You can find more information about the memory management here.
If you want to release the cached memory to the OS, you could call torch.cuda.empty_cache().
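For example, something like this lets you see how much of the number reported by nvidia-smi is just cache (a minimal sketch; memory_allocated() and memory_cached() are available since 0.4, and memory_cached() was renamed memory_reserved() in later releases):

import torch

# Memory actually held by tensors vs. memory kept in PyTorch's cache.
# nvidia-smi reports the sum of both, so a large cached number can
# look like a leak even though the memory is reusable by PyTorch.
print('allocated: %.1f MB' % (torch.cuda.memory_allocated() / 1024**2))
print('cached:    %.1f MB' % (torch.cuda.memory_cached() / 1024**2))

# Release unoccupied cached memory so other GPU applications can use it.
torch.cuda.empty_cache()
print('cached after empty_cache: %.1f MB' % (torch.cuda.memory_cached() / 1024**2))

Note that empty_cache() does not reduce the memory PyTorch actually needs; it only returns the unused cached blocks, so it won't by itself allow a larger batch size.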