OutOfMemoryError on A100 GPU

Why do you think it's not possible?
The memory usage is determined not only by the inputs and the parameters of the model, but also by the intermediate forward activations. Conv layers in particular often use a tiny filter kernel while producing potentially huge outputs, which need to be stored for the gradient calculation.

This post gives you a simple example.
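
As a rough back-of-the-envelope sketch of the point about conv layers (not taken from the linked post), the snippet below compares the size of a conv layer's parameters with the size of its output activation. The helper name `conv2d_memory`, the layer shape, the batch size and the float32 assumption are all made up for illustration, and the real footprint on your GPU will also include other buffers (gradients, optimizer state, cuDNN workspace, allocator caching).

```python
def conv2d_memory(batch, in_ch, out_ch, height, width,
                  kernel=3, padding=1, stride=1, bytes_per_elem=4):
    """Estimate parameter vs. output-activation memory of one 2D conv layer.

    Hypothetical helper for illustration; assumes float32 (4 bytes per element)
    and a square kernel with symmetric padding.
    """
    # Parameters: one (in_ch x kernel x kernel) filter per output channel, plus bias
    param_count = out_ch * in_ch * kernel * kernel + out_ch

    # Standard conv output size formula
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1

    # Output activation: one value per sample, output channel and spatial position
    act_count = batch * out_ch * out_h * out_w

    return param_count * bytes_per_elem, act_count * bytes_per_elem


# Assumed example: batch of 16 RGB images at 1024x1024, 64 output channels
param_bytes, act_bytes = conv2d_memory(batch=16, in_ch=3, out_ch=64,
                                       height=1024, width=1024)
print(f"parameters:  {param_bytes / 1024**2:.4f} MiB")
print(f"activations: {act_bytes / 1024**3:.2f} GiB")
```

With these assumed shapes the layer has only about 7 KiB of parameters, but its output alone takes 4 GiB, and that is a single layer. Stack a few of these and even an A100 fills up quickly.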