I would like to add a tensor to a model so that model.to(device) also moves the tensor to the device. register_buffer seems to do this; however, I don't want the tensor to be inside the model's state_dict. Is there a way to do this?
My specific application: I have a model that uses a transformer encoder. For convenience, its constructor creates a causal mask and precomputes a positional encoding matrix, both of which are used in forward. I want the mask and positional encoding to move to whatever device the model is moved to, but when I save the model to disk with torch.save(model.state_dict()) I don't want those tensors wasting space.
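For concreteness, here is a minimal sketch of the setup described above (class name, dimensions, and the placeholder positional encoding are illustrative, not from my actual model):

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, d_model=16, nhead=4, max_len=32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Precomputed causal mask and positional encoding, registered as
        # buffers so that model.to(device) moves them with the parameters.
        mask = torch.triu(torch.full((max_len, max_len), float("-inf")), diagonal=1)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1) / max_len
        self.register_buffer("causal_mask", mask)
        self.register_buffer("pos_encoding", pos.expand(max_len, d_model).clone())

    def forward(self, x):
        n = x.size(1)
        x = x + self.pos_encoding[:n]
        return self.encoder(x, mask=self.causal_mask[:n, :n])

model = TinyEncoder()
# The buffers show up in state_dict, which is exactly what I want to avoid:
print([k for k in model.state_dict() if not k.startswith("encoder")])
```

Running this prints the two buffer names alongside the encoder parameters, so they get written to disk by torch.save even though they are cheap to recompute.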