This is a newbie question (sorry…). I’m trying to speed up my GPU training. The docs recommend pin_memory(), which works with tensors but not with Variables (torch version 0.3.0):
import torch
from torch.autograd import Variable

my_tensor = torch.FloatTensor(8, 8)  # example tensor
my_variable = Variable(my_tensor)

my_tensor.cuda()          # → works
my_tensor.pin_memory()    # → works
my_variable.cuda()        # → works
my_variable.pin_memory()  # → does not work
The last line gives me the error:
Traceback (most recent call last):
File "check.py", line 14, in <module>
my_variable.pin_memory() # -> does not work
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 67, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Variable' object has no attribute 'pin_memory'
I wonder why this is… As my_variable.cuda() allows me to send Variables to the GPU (just like tensors), I would expect pin_memory() to work for Variables too.
Is my understanding fundamentally wrong here?
What’s the correct way of pinning memory (a) for all of a model’s variables/tensors, and (b) when pinning only some of the variables? (A web search only gives me examples in combination with DataLoader(), which does not apply in my case.)
You can do my_variable.data.pin_memory() for now. But do make sure you understand why you are pinning memory; it’s easy to fall into a trap there. Pinning memory is only useful for CPU Tensors that have to be moved to the GPU, so pinning all of a model’s variables/tensors doesn’t make sense at all.
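To illustrate, here is a minimal sketch of the intended use of pinned memory: staging a CPU input batch so the host-to-device copy can run asynchronously. It uses the modern spelling non_blocking=True; in torch 0.3.x the same flag was spelled cuda(async=True). The shapes are made up for the example.

```python
import torch

# A CPU batch that gets shipped to the GPU every iteration (example shape).
batch = torch.randn(64, 3, 32, 32)

if torch.cuda.is_available():
    # Page-locked (pinned) host memory lets the CUDA driver DMA directly,
    # which is what makes the copy below truly asynchronous.
    batch = batch.pin_memory()
    # non_blocking=True lets the copy overlap with GPU compute
    # (torch 0.3.x spelled this batch.cuda(async=True)).
    batch_gpu = batch.cuda(non_blocking=True)
```

Pinning only pays off for tensors that cross the CPU→GPU boundary repeatedly, such as input batches; a tensor that is moved once, or never moved, gains nothing.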
Pinning memory is only useful for CPU Tensors that have to be moved to the GPU.
OK, I understand now that this is about moving input data to the GPU.
But what about the model? Am I correct that the model (its parameters and operations) resides and runs on the GPU anyway?
And if not: how can I tailor which operations are carried out on the GPU? (Sorry if this is a stupid question, but I haven’t been able to dig up proper documentation on that…)
So, pinning all of a model’s variables/tensors doesn’t make sense at all.
Yes, it does. I have a complex application with a number of Tensors, and I have no idea which ones are copied back and forth from the GPU, or how often. Before I go through tens of thousands of lines of code I didn’t write to see if any improvements can be made, I’d like to just test and see what happens. Do I realize the risks of pinning a lot of memory? Yes. I am a perf expert and can judge the footprint of the process relative to the free memory I have, and at worst I risk crashing my system.
If I have to, I’ll binary-patch the assembly code using “gdb -w” on the function at::_ops::is_pinned::call and see what happens. Is it a bad thing if I suppress the pin/extra copy and the memory page gets swapped out? Yes, it is bad and will probably segfault. But there’s a 99% chance it won’t, and I’ll have some idea whether ANY speed-up is possible without spending three days wading through large amounts of app code only to find it doesn’t help.
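Before resorting to patching binaries, a far less invasive experiment is possible from Python. The helper below, pin_everything(), is a hypothetical sketch (not an existing PyTorch API): it re-allocates every CPU parameter of a module in page-locked memory, which is enough to measure whether blanket pinning changes anything.

```python
import torch
import torch.nn as nn

def pin_everything(module):
    """Hypothetical experiment helper: re-allocate every CPU parameter of
    `module` in page-locked (pinned) memory. For quick profiling only;
    pinning large amounts of memory can starve the rest of the system."""
    if not torch.cuda.is_available():
        return  # pin_memory() needs a CUDA-capable build and driver
    for p in module.parameters():
        if not p.is_cuda:
            # Swap the underlying storage for a pinned copy in place.
            p.data = p.data.pin_memory()
```

Usage would be pin_everything(model) right after model construction, followed by timing a few training iterations with and without the call.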