After training a model, I tried to use it to compute the gradient of the loss on test data, but the allocated CUDA memory kept growing as I iterated over the test loader, until it finally hit CUDA out of memory.

```
import gpustat
import torch
from torch.autograd import grad


def show_mem_usage(device=1):
    gpu_stats = gpustat.GPUStatCollection.new_query()
    item = gpu_stats.jsonify()["gpus"][device]
    print("Used/total: {}/{}".format(item["memory.used"], item["memory.total"]))


def compute_grad_testdata(model, test_loader):
    model.eval()
    batch = 0  # number of batches seen so far
    loss = 0
    show_mem_usage()
    for x_test, y_test in test_loader:
        show_mem_usage()
        with torch.cuda.device(CUDA_VISIBLE_DEVICES):  # CUDA_VISIBLE_DEVICES is set elsewhere
            x_test = x_test.cuda()
            y_test = y_test.cuda()
            score, feature, _ = model(x_test)
            batch_loss = calc_loss(score, y_test)  # calc_loss is defined elsewhere
            loss += batch_loss  # accumulated loss still requires grad
            batch += 1
            show_mem_usage()
    model.zero_grad()
    show_mem_usage()
    loss = loss / batch
    params = [p for p in model.parameters() if p.requires_grad]
    return list(grad(loss, params, create_graph=True))
```

```
Used/total: 3362/24268
Used/total: 3362/24268
Used/total: 3362/24268
Used/total: 3362/24268
Used/total: 3366/24268
Used/total: 3366/24268
Used/total: 3370/24268
Used/total: 3370/24268
Used/total: 3376/24268
Used/total: 3376/24268
Used/total: 3380/24268
Used/total: 3380/24268
Used/total: 3384/24268
Used/total: 3384/24268
Used/total: 3390/24268
Used/total: 3390/24268
Used/total: 3526/24268
Used/total: 3526/24268
Used/total: 3680/24268
Used/total: 3680/24268
........
Used/total: 16276/24268
Used/total: 16276/24268
Used/total: 16288/24268
Used/total: 16288/24268
........
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.70 GiB total capacity; 22.07 GiB already allocated; 7.69 MiB free; 22.34 GiB reserved in total by PyTorch)
```
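(Each value appears twice because show_mem_usage is called once at the top of the loop and once after the forward pass, so the end-of-iteration value matches the start of the next iteration.)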

I know that for testing we usually use `with torch.no_grad()` so that new tensors are created with requires_grad=False. I have tried this, and it is true that it doesn't use more memory, but then I cannot use autograd.grad to compute the gradient. Is there any way to compute the gradient on test data without the allocated CUDA memory growing? Thanks.
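My understanding is that because `loss` accumulates tensors that require grad, the computation graph of every batch is kept alive until the final autograd.grad call, which is what exhausts the memory. Would something like the sketch below be an acceptable workaround? It computes the gradient per batch, so each batch's graph can be freed immediately, and averages the per-batch gradients; by linearity of the gradient, this should equal the gradient of the averaged loss. Note that it drops create_graph=True, so it would only apply if the gradient values themselves are enough and a differentiable gradient is not needed (calc_loss and the model's output signature are the same as in my code above):

```
def compute_grad_testdata_streaming(model, test_loader):
    model.eval()
    params = [p for p in model.parameters() if p.requires_grad]
    grad_sum = [torch.zeros_like(p) for p in params]
    num_batches = 0
    for x_test, y_test in test_loader:
        x_test = x_test.cuda()
        y_test = y_test.cuda()
        score, feature, _ = model(x_test)
        batch_loss = calc_loss(score, y_test)
        # Differentiate through this batch's graph only; with the default
        # create_graph=False, the graph is freed right after this call.
        batch_grads = torch.autograd.grad(batch_loss, params)
        for g_sum, g in zip(grad_sum, batch_grads):
            g_sum += g
        num_batches += 1
    # grad of the mean of the batch losses == mean of the per-batch grads
    return [g / num_batches for g in grad_sum]
```

If this is equivalent, memory should stay flat across batches since no graph outlives its own iteration, but I would appreciate confirmation or a better approach.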