The gradients are the same size as the model parameters, because there is one gradient value per parameter, normally stored at the same precision. So if you have 10 GB of parameters, you need another 10 GB for gradients (and possibly extra for intermediate calculations).
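As a rough back-of-the-envelope sketch of that arithmetic, the snippet below estimates the memory for parameters plus gradients. The function names and the 2.5-billion-parameter, 4-bytes-per-value (fp32) figures are illustrative assumptions chosen to give 10 GB of parameters. The estimate also assumes gradients are stored at the same precision as the parameters, and it deliberately ignores the extra memory for intermediate calculations.

```python
GB = 10**9  # decimal gigabyte, used here for simplicity


def gradient_memory_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Estimate gradient memory: one gradient value per parameter,
    assumed to be stored at the same precision as the parameter."""
    return num_params * bytes_per_param


def training_memory_estimate(num_params: int, bytes_per_param: int = 4) -> dict:
    """Rough lower bound: parameters plus gradients only.
    Intermediate calculations are NOT included."""
    params = num_params * bytes_per_param
    grads = gradient_memory_bytes(num_params, bytes_per_param)
    return {
        "params_bytes": params,
        "grads_bytes": grads,
        "total_bytes": params + grads,
    }


# Hypothetical model: 2.5 billion fp32 parameters = 10 GB of parameters.
est = training_memory_estimate(2_500_000_000, bytes_per_param=4)
print(est["params_bytes"] / GB, est["grads_bytes"] / GB, est["total_bytes"] / GB)
# prints: 10.0 10.0 20.0
```

Treat the result as a floor rather than a precise figure, since the actual peak usage depends on whatever else is kept in memory during the computation.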