I’m very confused by all of this, and have some comments
below.
I have been able to reproduce and “fix” your issue, but I don’t really
understand it.
Also, a disclaimer: My particular pytorch-gpu setup requires the
old pytorch version 0.3.0. So part or all of my analysis might be
a red herring, and specific to 0.3.0.
First, for reasons I don’t understand, I had to explicitly move the
parameters in param_need_l1_penalty_case_2 to the gpu
(but not those in param_need_l1_penalty_case_1).
Second, when running on the gpu, I had to convert the “penalty loss”
to a python scalar before adding it to loss in order to get rid of your
specific error:
if scalarPenalty:
    penalty = 0.1 * torch.norm (param, 1).data[0]
    loss += penalty
else:
    loss += 0.1 * torch.norm (param, 1)
Here is a complete, runnable pytorch-version-0.3.0 test program,
modelled after yours:
import torch
print (torch.__version__)

torch.manual_seed (0)

gpu = True
print ('gpu = ' + str (gpu))
scalarPenalty = True
print ('scalarPenalty = ' + str (scalarPenalty))

class mySubModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter (torch.rand (n, 2), requires_grad=True)
        self.register_buffer('mask_choice', torch.autograd.Variable (torch.Tensor([[1.], [0.]])))
    def forward(self, x):
        out = torch.matmul(
            torch.matmul(x, self.my_param),
            self.mask_choice)
        return out

class myModule (torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.subModule = mySubModule(n)
        self.linear = torch.nn.Linear(n, 1)
        self.param_need_l1_penalty_case_1 = [self.subModule.my_param]
        #### Here's where Error happens
        self.param_need_l1_penalty_case_2 = [self.subModule.my_param[:,0]]
        ####
    def forward(self, x):
        return self.linear(x) + self.subModule(x)

model = myModule(n=2)
my_data = torch.autograd.Variable (torch.ones (4,2))
label = torch.autograd.Variable (torch.ones (4), requires_grad = False)

if gpu:
    model.cuda()
    my_data = my_data.cuda()
    label = label.cuda()
    for i in range (len (model.param_need_l1_penalty_case_2)):
        model.param_need_l1_penalty_case_2[i] = model.param_need_l1_penalty_case_2[i].cuda()

pred = model (my_data).squeeze(1)
criteria = torch.nn.MSELoss()
# loss = criteria (label, pred)
loss = criteria (pred, label)   # probably an 0.3.0 requirement
print ('loss (pre-penalty) = ...\n', loss)

for param in model.param_need_l1_penalty_case_2:
    if scalarPenalty:
        penalty = 0.1 * torch.norm (param, 1).data[0]
        loss += penalty
    else:
        loss += 0.1 * torch.norm (param, 1)

print ('loss (post-penalty) = ...\n', loss)
print ('calling loss.backward()...')
loss.backward()
Here is the output:
0.3.0b0+591e73e
gpu = True
scalarPenalty = True
loss (pre-penalty) = ...
Variable containing:
0.2621
[torch.cuda.FloatTensor of size 1 (GPU 0)]
loss (post-penalty) = ...
Variable containing:
0.3206
[torch.cuda.FloatTensor of size 1 (GPU 0)]
calling loss.backward()...
When I turn off the scalar-penalty modification:
scalarPenalty = False
I get what appears to be your error:
0.3.0b0+591e73e
gpu = True
scalarPenalty = False
loss (pre-penalty) = ...
Variable containing:
0.2621
[torch.cuda.FloatTensor of size 1 (GPU 0)]
loss (post-penalty) = ...
Variable containing:
0.3206
[torch.cuda.FloatTensor of size 1 (GPU 0)]
calling loss.backward()...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 66, in <module>
  File "C:\Users\LisaBrown\Documents\admin\programs\Miniconda3\lib\site-packages\torch\autograd\variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "C:\Users\LisaBrown\Documents\admin\programs\Miniconda3\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #3 'other'
(line 66, in <module> is loss.backward().)
So, something fishy is definitely going on here. (Whether pytorch 0.3.0
and pytorch 1.x.x share the same fishiness, I don’t know.)
As a practical matter, it sounds like you have your program working.
But, for the greater good, it would be nice to see if my results are
reproducible on an up-to-date version of pytorch.
And if any experts have an idea of what is going on under the hood,
please chime in.
Take this “debugging” with a grain of salt, but this would be my best guess: param_need_l1_penalty_case_1 holds the nn.Parameter itself, just wrapped in a list.
Iterating this list yields the parameter, which was properly pushed to the device by calling model.to('cuda'), since it was also properly registered inside the module.
However, for param_need_l1_penalty_case_2 an operation (the slicing op) was executed before storing the result in the list. The result of this operation is a tensor with a grad_fn (SelectBackward in this case), not a registered parameter.
Since this operation was performed inside the __init__ function of the module, and thus before pushing the parameters to the device, the slice stays on the CPU and you would need to push it manually to the device afterwards.
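The guess above can be checked without a GPU. The following is a minimal sketch, assuming a recent PyTorch (1.x or later, not tested on 0.3.0), that uses a dtype conversion (model.double()) as a stand-in for the device move; the class name Sub and the attributes case_1 and case_2 are made up for illustration.

```python
import torch

class Sub(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.rand(n, 2))
        # Case 1: the Parameter object itself, stored in a plain list.
        self.case_1 = [self.my_param]
        # Case 2: a slice taken right now, which is a new non-leaf tensor.
        self.case_2 = [self.my_param[:, 0]]

model = Sub(2)
model.double()  # stand-in for model.cuda(): converts registered parameters only

stale = model.case_2[0]
print("slice grad_fn:", stale.grad_fn)         # a backward node, so not a leaf
print("case_1 dtype: ", model.case_1[0].dtype)  # follows the module
print("case_2 dtype: ", stale.dtype)            # keeps the pre-conversion dtype
```

If the same holds for device moves, the slice stored in __init__ would stay where it was created while the registered parameter follows the module.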
I doubt this will yield the desired behavior, since you are detaching the tensor with the .data call.
I would consider the usage of .data dangerous, as you might get rid of error messages, but in fact create unwanted behavior.
Is the code working without the .data call and by manually using cuda() on the slice?
If so, I would stick to this solution.
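A possible alternative to moving the slice by hand, not taken from this thread, is to defer the slicing until the penalty is computed, so that the slice is always taken from wherever the parameter currently lives. This is only a sketch under that assumption; the method name l1_targets is hypothetical, and it assumes a recent PyTorch.

```python
import torch

torch.manual_seed(0)

class SubModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.rand(n, 2))

    def l1_targets(self):
        # Slice lazily: taken from the parameter's current device at call time.
        return [self.my_param[:, 0]]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SubModule(2).to(device)

loss = torch.zeros((), device=device)
for param in model.l1_targets():
    loss = loss + 0.1 * torch.norm(param, 1)

loss.backward()
grad = model.my_param.grad.cpu()
print(grad)  # only column 0 receives the penalty gradient
```

Because no tensor is created before the move, there is nothing left behind on the CPU, and the penalty stays attached to the graph.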
That was my thinking as well. Anyway, explicitly moving the list
elements did work (for me).
A couple of odd things: According to Fly’s original post, he apparently didn’t see this error. For me, this error occurred before calling loss.backward(), and was, understandably, of the form “expected type torch.cuda.FloatTensor” (instead of the other way around).
Also I think (I don’t remember everything I tried) that it involved
the list in an essential way. I think I tried getting rid of the list:
and the (pre-loss.backward()) error went away, even though I was
slicing before moving the model (and hence my_param) to the gpu.
(Maybe model.cuda() knows to move Tensor variables of the
model, but not lists. And since Fly didn’t see it, maybe it’s an 0.3.0
thing.)
I’m sure you’re right about this – I should have realized. The forward
calculation works, but (presumably) the backward calculation will fail
to include the penalty-loss gradient in its overall gradient (defeating
the purpose).
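The effect on the gradient can be seen in a small CPU-only sketch. It assumes a recent PyTorch, where .item() is the rough counterpart of .data[0] on a one-element result; the helper names base_loss and grad_of are made up for illustration.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, -2.0, 3.0]))

def base_loss():
    # Stand-in for the MSE loss: its gradient with respect to w is 2 * w.
    return (w ** 2).sum()

def grad_of(loss):
    # Reset, backpropagate, and return a copy of w's gradient.
    w.grad = None
    loss.backward()
    return w.grad.clone()

# Penalty converted to a Python float: it becomes a constant in the graph.
g_scalar = grad_of(base_loss() + 0.1 * torch.norm(w, 1).item())

# Penalty kept as a tensor: its gradient, 0.1 * sign(w), is included.
g_tensor = grad_of(base_loss() + 0.1 * torch.norm(w, 1))

print(g_scalar)  # base gradient only
print(g_tensor)  # base gradient plus the L1 term
```

The forward value of the loss is the same in both cases; only the tensor version lets the penalty influence the parameter update.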
No, two different errors:
Explicitly moving the slice gets rid of the reasonably-understandable “expected type torch.cuda.FloatTensor” error.
The error I tried to address with my incorrect modification was the “expected type torch.FloatTensor” error that Fly originally reported, and that I was able to reproduce.
To me it’s baffling. I probed all of the tensors in sight to make sure
that they were torch.cuda.FloatTensors, so I couldn’t figure out
where (in the bowels of loss.backward()) somebody was expecting
a torch.FloatTensor. And it seems like this isn’t just 0.3.0 weirdness
because Fly sees it with whatever his version is, as well.
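One way to probe beyond the tensors that are directly in sight is to walk the autograd graph behind the loss and report where each leaf lives. This is a rough sketch assuming a recent PyTorch, where graph nodes expose next_functions and accumulation nodes expose the leaf as .variable; whether these attributes behave the same on 0.3.0 has not been checked here, and walk_graph is a made-up helper name.

```python
import torch

def walk_graph(fn):
    """Yield (node type name, leaf tensor or None) for each node reachable from fn."""
    if fn is None:
        return
    # Accumulation nodes expose the leaf they accumulate into as .variable.
    yield type(fn).__name__, getattr(fn, "variable", None)
    for next_fn, _ in fn.next_functions:
        yield from walk_graph(next_fn)

w = torch.nn.Parameter(torch.rand(2, 2))
x = torch.ones(4, 2)
loss = (x @ w).sum() + 0.1 * torch.norm(w[:, 0], 1)

report = list(walk_graph(loss.grad_fn))
for name, leaf in report:
    where = "" if leaf is None else f" -> leaf on {leaf.device}, {leaf.dtype}"
    print(name + where)
```

Running the same walk on the failing program's loss might show whether some node or leaf in the backward path still refers to a CPU tensor.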