GPU memory footprint differs between 16GB and 32GB V100

Today I encountered something strange. Running the exact same training command on a 16GB V100 and on a 32GB V100, the former reports max mem: 11229 while the latter reports max mem: 20109. The line used for printing was:

MB = 1024 ** 2  # assuming MB was defined as bytes per MiB
print(f'max mem: {torch.cuda.max_memory_allocated() / MB:.0f}')

Is this behavior normal?
Thank you in advance for your replies!

Yes, this behavior can be expected, since the libraries may pick different kernels on the two devices, and those kernels can use differently sized internal workspaces.
E.g. cuDNN either uses its heuristics or profiles different kernels (depending on whether torch.backends.cudnn.benchmark is False or True, respectively) to select the best kernel while meeting the memory requirements.
You could try disabling cuDNN and check whether the difference in memory usage is still visible, e.g. along the lines of the sketch below.
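A rough sketch of such a check could look like this; the model, input shapes, and the MB constant below are just placeholder assumptions for illustration, while the torch.backends.cudnn flags and the CUDA memory-stats calls are the actual PyTorch APIs involved:

import torch
import torch.nn as nn

MB = 1024 ** 2  # assumption: report peak memory in MiB

def peak_mem_for_one_step(use_cudnn: bool) -> float:
    # Placeholder model and input, only to illustrate the measurement.
    model = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()
    x = torch.randn(8, 3, 224, 224, device='cuda')

    torch.backends.cudnn.enabled = use_cudnn   # enable/disable cuDNN
    torch.backends.cudnn.benchmark = False     # use heuristics, no kernel profiling

    torch.cuda.reset_peak_memory_stats()       # restart peak tracking before the step
    out = model(x)
    out.sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / MB

print(f"with cuDNN:    {peak_mem_for_one_step(True):.0f} MB")
print(f"without cuDNN: {peak_mem_for_one_step(False):.0f} MB")

If the gap between the two GPUs shrinks once cuDNN is disabled, the difference most likely comes from the kernel/workspace selection described above.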

Interesting. Thanks for the answer, @ptrblck!