Help with CUDA memory allocation during a Linear forward pass

Hey,

I have a follow-up question regarding the cuBLAS workspace size. When I observe GPU memory usage while fine-tuning a BERT-large model, I initially see the expected 8.125 MiB allocation for the cuBLAS workspace on my GPU. However, after the first forward pass of fine-tuning completes, an additional 8.125 MiB is allocated, bringing the total cuBLAS workspace memory usage to 16.25 MiB.
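
For context on where the 8.125 MiB figure comes from: my understanding is that PyTorch's default cuBLAS workspace configuration is `:4096:2:16:8` (the same format accepted by the `CUBLAS_WORKSPACE_CONFIG` environment variable, i.e. pairs of `size_in_KiB:count`), which works out to exactly 8.125 MiB per workspace. A small sketch to check that arithmetic:

```python
def cublas_workspace_bytes(config: str) -> int:
    """Compute the total workspace size implied by a CUBLAS_WORKSPACE_CONFIG
    string such as ":4096:2:16:8", interpreted as (size_KiB, count) pairs."""
    parts = [int(p) for p in config.strip(":").split(":")]
    pairs = zip(parts[::2], parts[1::2])  # (size_KiB, count) pairs
    return sum(size_kib * 1024 * count for size_kib, count in pairs)

# Assumed PyTorch default config: 2 buffers of 4096 KiB + 8 buffers of 16 KiB
total = cublas_workspace_bytes(":4096:2:16:8")
print(total / 2**20)  # → 8.125 (MiB)
```

So the jump to 16.25 MiB looks like a second workspace of the same size being allocated, rather than the existing one growing.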

Could this indicate that the cuBLAS workspace size is adjusted dynamically during fine-tuning? If so, what factors might contribute to this increase? Your insights would be greatly appreciated. Thank you for your assistance!