Use and Abuse of .register_buffer( )

mbp28 · June 19, 2017, 9:31am

Hi,

I have some trouble understanding the use of register_buffer().
I found just a little bit of explanation in the docs, mentioning “running_mean” in BatchNorm.

My questions are:

1. When should I register a buffer? For which sorts of Variables, and for which not?
2. Could someone provide a simple example and code snippet using register_buffer()?
3. At the moment, I’m running some tests on an implementation of a custom gradient, which I subsequently modify. I record the gradients before and after modification in two separate lists. Is that something I should consider register_buffer() for, to make the code cleaner? I guess not, if the buffer only holds one state at a time…

Any help much appreciated.
Many thanks,
Max

smth · June 22, 2017, 5:12am

You use register_buffer when:

you want a stateful part of your model that is not a parameter, but you still want it in your state_dict. BatchNorm registers its running statistics this way:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py#L23-L24
Note that registered buffers are Tensors (not Variables).
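
To make the distinction above concrete, here is a minimal sketch. The `RunningMean` module and its `scale` and `momentum` names are hypothetical, made up for illustration, and not taken from BatchNorm's actual implementation. It only shows that a buffer lands in state_dict without being listed as a parameter.

```python
import torch
import torch.nn as nn


class RunningMean(nn.Module):
    """Hypothetical module that tracks a running mean of its inputs."""

    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        self.momentum = momentum
        # Learnable: listed by parameters(), so an optimizer would update it.
        self.scale = nn.Parameter(torch.ones(num_features))
        # Not learnable, but still part of the module's saved state.
        self.register_buffer("running_mean", torch.zeros(num_features))

    def forward(self, x):
        if self.training:
            # Update the statistic in place, outside of autograd.
            with torch.no_grad():
                batch_mean = x.mean(dim=0)
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
        return (x - self.running_mean) * self.scale


module = RunningMean(3)
param_names = [name for name, _ in module.named_parameters()]
buffer_names = [name for name, _ in module.named_buffers()]
state_keys = list(module.state_dict().keys())

print(param_names)   # only the parameter
print(buffer_names)  # only the buffer
print(state_keys)    # both the parameter and the buffer
```

Because the buffer is registered on the module, it should also follow the module through calls such as `module.to(device)` and be restored by `load_state_dict`, which a plain Python attribute holding a tensor would not.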

igreen · June 22, 2018, 9:45pm

Hi, when using DataParallel to train a model in multi-GPU mode, BN is computed per device. How are the running mean and running variance estimated? Also per device, or with one master and multiple replicas? Thanks.

lumin_liu · March 26, 2019, 2:07pm

A question about register_buffer: can we delete a buffer after registering it, and if so, how?

mingzhe_lv · August 19, 2022, 8:13am

I have the same question.
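
On deleting a buffer: one approach that appears to work is a plain `del` on the attribute, since nn.Module's attribute deletion also covers registered buffers. The sketch below is only an illustration; the buffer name `scratch` is hypothetical.

```python
import torch
import torch.nn as nn

module = nn.Identity()
module.register_buffer("scratch", torch.zeros(2))
before = "scratch" in module.state_dict()

# Deleting the attribute removes the entry from the module's buffers.
del module.scratch

after = "scratch" in module.state_dict()
has_attr = hasattr(module, "scratch")

print(before, after, has_attr)
```

After the deletion the name can be registered again, or assigned as an ordinary attribute, if needed.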

Saleh · February 15, 2023, 10:22am

I have a question regarding multi-GPU training of a transformer model in which I have a registered buffer for causal attention masking. In the forward pass all 4 GPUs are being used (checked with nvidia-smi); however, I get an error when it reaches loss.backward(): “RuntimeError: You are trying to call the hook of a dead Module!”. Could this issue be related to the registered buffer or the masking operation?

My code is working properly with 1 GPU or only on a CPU.
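
For readers unfamiliar with the setup described here, a causal mask held in a buffer might look like the sketch below. This is not the poster's code and does not diagnose the error. The `CausalMask` module, its `max_len` argument, and the choice of `persistent=False` (which keeps the mask out of state_dict) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CausalMask(nn.Module):
    """Hypothetical helper that masks out attention to future positions."""

    def __init__(self, max_len):
        super().__init__()
        # Lower-triangular matrix: position i may attend to positions <= i.
        mask = torch.ones(max_len, max_len).tril().bool()
        # Non-persistent: moves with the module but is not saved in state_dict.
        self.register_buffer("causal_mask", mask, persistent=False)

    def forward(self, scores):
        # scores: (batch, seq_len, seq_len) attention logits
        seq_len = scores.size(-1)
        mask = self.causal_mask[:seq_len, :seq_len]
        return scores.masked_fill(~mask, float("-inf"))


masker = CausalMask(max_len=8)
masked = masker(torch.zeros(2, 4, 4))
print(masked[0])
```

Registering the mask as a buffer, rather than storing it as a plain attribute, means it is moved along with the module by calls such as `.to(device)`.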