The memory usage might be expected. You could take a look at e.g. this post to get some information about how to estimate the size of the model parameters as well as the intermediate forward activations, since both contribute to the total.
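As a rough illustration of that kind of estimate, here is a minimal back-of-the-envelope sketch in plain Python. It assumes a hypothetical stack of fully connected layers stored in float32; the layer widths, the batch size, and the helper names `linear_param_count` and `estimate_memory` are made up for the example and are not taken from the linked post or from your model.

```python
# Rough memory estimate for a hypothetical stack of linear layers.
# All sizes below are illustrative assumptions, not values from the post.
BYTES_PER_FLOAT32 = 4


def linear_param_count(in_features, out_features):
    # A linear layer holds a weight matrix plus a bias vector.
    return in_features * out_features + out_features


def estimate_memory(layer_sizes, batch_size, bytes_per_element=BYTES_PER_FLOAT32):
    # Consecutive widths define each layer: (in_features, out_features).
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    n_params = sum(linear_param_count(i, o) for i, o in pairs)
    # Each layer produces one output tensor of shape (batch_size, out_features).
    n_activations = sum(batch_size * o for _, o in pairs)
    return {
        "param_bytes": n_params * bytes_per_element,
        "activation_bytes": n_activations * bytes_per_element,
    }


est = estimate_memory([1024, 4096, 4096, 10], batch_size=64)
print(f"parameters:  {est['param_bytes'] / 1024**2:.1f} MiB")
print(f"activations: {est['activation_bytes'] / 1024**2:.1f} MiB")
```

Note that in this sketch the parameter size is fixed, while the activation size grows linearly with the batch size, so a larger batch or larger intermediate outputs can dominate the total. A real model will likely need more than this estimate suggests, since it ignores everything besides the parameters and the layer outputs.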