I am new to using half-precision tensors in PyTorch, so I have a basic question: is it possible for my neural network model to keep some variables as half-precision tensors and others as normal full-precision tensors?
Basically, my model is taking too much memory, so instead of decreasing the batch size I wanted to check whether I can make some variables half-precision.
Yes, you can change the dtype of tensors manually, as long as you make sure the operations involved receive the input types they expect.
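A minimal sketch of this manual approach, assuming a made-up two-layer model where a large layer is kept in float16 and a small head stays in float32 (the layer names and sizes are just placeholders):

```python
import torch
import torch.nn as nn

class MixedPrecisionNet(nn.Module):
    # hypothetical example: keep the large layer's weights in float16,
    # the small classifier head in float32
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(1024, 256).half()  # float16 weights
        self.head = nn.Linear(256, 10)            # float32 weights

    def forward(self, x):
        h = self.embed(x.half())     # cast input to match the float16 layer
        return self.head(h.float())  # cast back before the float32 layer

model = MixedPrecisionNet()
if torch.cuda.is_available():
    # float16 matmuls are well supported on the GPU
    out = model.cuda()(torch.randn(4, 1024, device="cuda"))
    print(out.dtype)  # torch.float32
```

The explicit `.half()` / `.float()` casts at the boundaries are exactly the "make sure the operations use the expected input types" part.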
While this manual approach is feasible, I would recommend having a look at automatic mixed-precision (AMP) training, which provides utilities such as automatic casting and gradient scaling.
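A minimal AMP training-step sketch; the model, optimizer, and tensor shapes below are placeholders, not a recommendation:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

if torch.cuda.is_available():
    model = model.cuda()
    scaler = torch.cuda.amp.GradScaler()
    data = torch.randn(8, 1024, device="cuda")
    target = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        # matmul-heavy ops run in float16; precision-sensitive ops stay float32
        loss = nn.functional.cross_entropy(model(data), target)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()
```

Note that AMP keeps the parameters themselves in float32 and only casts the operations inside the `autocast` region.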
I tried automatic mixed-precision training with autocast, but directly plugging in autocast doesn’t significantly reduce memory usage. I think this is because autocast only lowers the precision of certain operations.
I also tried model.half().to('cuda'), which also doesn’t show significant memory savings. Is it supposed to significantly reduce the memory used by the model?
Yes, calling .half() on the model as well as on the inputs would give you an estimate of how much memory would be saved in the extreme case of using float16 for all operations. Make sure to check the memory usage via torch.cuda.memory_summary(), since nvidia-smi also shows the cached memory, not only the allocated memory.
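A rough way to compare the two, assuming a made-up stack of wide linear layers so that weights and activations dominate the memory (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

def peak_forward_memory(dtype):
    # hypothetical benchmark: measure peak allocated memory for one forward pass
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda().to(dtype)
    x = torch.randn(256, 4096, device="cuda", dtype=dtype)
    with torch.no_grad():
        model(x)
    return torch.cuda.max_memory_allocated()

if torch.cuda.is_available():
    fp32 = peak_forward_memory(torch.float32)
    fp16 = peak_forward_memory(torch.float16)
    print(f"fp32 peak: {fp32 / 1e6:.1f} MB, fp16 peak: {fp16 / 1e6:.1f} MB")
```

torch.cuda.max_memory_allocated() tracks the peak of the allocated (not cached) memory since the last reset, so the float16 run should report roughly half the peak of the float32 run here.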
I tried with and without .half() on my model, and the outputs of torch.cuda.memory_summary() seem to show no difference in Cur Usage | Peak Usage | Tot Alloc | Tot Freed. Could you provide an example model that shows a significant memory reduction after setting the data type to half?
Also, are there any documents explaining the meaning of allocated memory, active memory, GPU reserved memory, and non-releasable memory? Thanks.