Today I encountered something very strange. Running the exact same training command on a 16GB V100 GPU and on a 32GB V100 GPU, the former reports max mem: 11229 while the latter reports max mem: 20109. The command used for printing was
Yes, this behavior can be expected, as different libraries may pick different kernels that use different internal workspace sizes.
E.g. cuDNN will either use its heuristics or profile the available kernels (depending on whether torch.backends.cudnn.benchmark is False or True, respectively) to select the fastest kernel that meets the memory requirements, and the selection can differ between devices.
You could try disabling cuDNN (via torch.backends.cudnn.enabled = False) and check whether the difference in memory usage is still visible.
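A minimal sketch of such a comparison, run separately on each GPU. The small conv workload here is a hypothetical stand-in for your actual training command; the cuDNN flags and the `torch.cuda` memory queries are standard PyTorch APIs:

```python
import torch
import torch.nn as nn

def peak_mem_mb(cudnn_enabled: bool, benchmark: bool) -> float:
    """Run a small forward/backward pass and return peak allocated memory in MB."""
    torch.backends.cudnn.enabled = cudnn_enabled
    torch.backends.cudnn.benchmark = benchmark
    torch.cuda.reset_peak_memory_stats()
    # Hypothetical workload standing in for the real training step
    model = nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
    x = torch.randn(32, 64, 128, 128, device="cuda")
    model(x).sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2

if torch.cuda.is_available():
    for enabled, benchmark in [(True, False), (True, True), (False, False)]:
        mb = peak_mem_mb(enabled, benchmark)
        print(f"cudnn.enabled={enabled}, cudnn.benchmark={benchmark}: {mb:.0f} MB")
else:
    print("CUDA not available; run this on both GPUs to compare the numbers.")
```

If the gap between the two devices disappears with `cudnn.enabled = False`, the difference was coming from cuDNN's kernel/workspace selection rather than from your model itself.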