I am new to using half-precision tensors in PyTorch, so I have a basic question: is it possible for my neural network model to keep some variables as half-precision tensors and others as normal full-precision tensors?
Basically, my model is taking too much memory, so instead of decreasing the batch size I wanted to check whether I can make some variables half-precision.
Yes, you can change the dtype of tensors manually, as long as you make sure the operations involved receive the input types they expect.
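A minimal sketch of this manual approach, assuming a made-up two-layer model where a large layer is kept in float16 and a small head stays in float32 (the layer names and sizes are just placeholders):

```python
import torch
import torch.nn as nn

class MixedPrecisionNet(nn.Module):
    # hypothetical example: keep the large layer's weights in float16,
    # the small classifier head in float32
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(1024, 256).half()  # float16 weights
        self.head = nn.Linear(256, 10)            # float32 weights

    def forward(self, x):
        h = self.embed(x.half())     # cast input to match the float16 layer
        return self.head(h.float())  # cast back before the float32 layer

model = MixedPrecisionNet()
if torch.cuda.is_available():
    # float16 matmuls are well supported on the GPU
    out = model.cuda()(torch.randn(4, 1024, device="cuda"))
    print(out.dtype)  # torch.float32
```

The explicit `.half()` / `.float()` casts at the boundaries are exactly the "make sure the operations use the expected input types" part.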
While this manual approach is feasible, I would recommend having a look at automatic mixed-precision (AMP) training, which provides utilities such as automatic casting and gradient scaling.
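A minimal AMP training-step sketch; the model, optimizer, and tensor shapes below are placeholders, not a recommendation:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

if torch.cuda.is_available():
    model = model.cuda()
    scaler = torch.cuda.amp.GradScaler()
    data = torch.randn(8, 1024, device="cuda")
    target = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        # matmul-heavy ops run in float16; precision-sensitive ops stay float32
        loss = nn.functional.cross_entropy(model(data), target)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()
```

Note that AMP keeps the parameters themselves in float32 and only casts the operations inside the `autocast` region.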
I tried automatic mixed-precision training with autocast, but directly plugging in autocast doesn’t significantly reduce memory usage. I think this is because autocast only lowers the precision of certain operations.
I also tried model.half().to('cuda'), which also doesn’t show significant memory savings. Is it supposed to significantly reduce the memory used by the model?
Yes, calling .half() on the model as well as on the inputs would give you an estimate of how much memory would be saved in the extreme case of using float16 for all operations. Make sure to check the memory usage via torch.cuda.memory_summary(), since nvidia-smi also shows the cached memory, not only the allocated memory.
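A rough way to compare the two, assuming a made-up stack of wide linear layers so that weights and activations dominate the memory (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

def peak_forward_memory(dtype):
    # hypothetical benchmark: measure peak allocated memory for one forward pass
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).cuda().to(dtype)
    x = torch.randn(256, 4096, device="cuda", dtype=dtype)
    with torch.no_grad():
        model(x)
    return torch.cuda.max_memory_allocated()

if torch.cuda.is_available():
    fp32 = peak_forward_memory(torch.float32)
    fp16 = peak_forward_memory(torch.float16)
    print(f"fp32 peak: {fp32 / 1e6:.1f} MB, fp16 peak: {fp16 / 1e6:.1f} MB")
```

torch.cuda.max_memory_allocated() tracks the peak of the allocated (not cached) memory since the last reset, so the float16 run should report roughly half the peak of the float32 run here.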
I tried with and without .half() on my model, and the outputs of torch.cuda.memory_summary() seem to show no difference in Cur Usage | Peak Usage | Tot Alloc | Tot Freed. Could you provide an example model that shows a significant memory reduction after setting the data type to half?
Also, are there any documents explaining the meaning of allocated memory, active memory, GPU reserved memory, and non-releasable memory? Thanks.