Hi all,
I recently tried `torch.set_deterministic(True)` and observed that it reduces the GPU memory usage of the backward pass!
However, if I use mixed precision, `torch.set_deterministic(True)` no longer reduces the memory.
Can anyone tell me why this happens?
(My torch version: 1.7.0)
The code to reproduce my result is below. You can toggle `use_amp` and `deterministic` on or off:
```python
import os

import torch
import torchvision

device = 'cuda'
use_amp = True         # toggle mixed precision
deterministic = False  # toggle deterministic algorithms

# Set deterministic
# Deterministic behavior of torch.addmm. Please refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
# (must be set before the first cuBLAS call)
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
torch.set_deterministic(deterministic)

# Initialize model
model = torchvision.models.resnet50().to(device)
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# Forward
x = torch.rand(16, 3, 800, 800).to(device)
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = model(x).sum()

# Backward (GradScaler passes everything through unchanged when enabled=False)
optim.zero_grad()
scaler.scale(loss).backward()
scaler.step(optim)
scaler.update()

# Peak GPU memory occupied by tensors, in MiB
print(torch.cuda.max_memory_allocated() / 1024 ** 2)
```
Thx!