How to add a customized L1/L2 penalty on a parameter slice?

I’ve defined a submodule containing a parameter I want to penalize. To be specific, I want to penalize only the first column of this parameter (my_param[:,0])!

Here is a demo of my implementation. It works on the CPU but throws a RuntimeError on ‘cuda’.

import torch
import numpy as np

class mySubModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter(
            torch.empty(n, 2).uniform_(0.0, 1.0), requires_grad=True)
        self.register_buffer('mask_choice', torch.tensor([[1.], [0.]]))

    def forward(self, x):
        out = torch.matmul(
            torch.matmul(x, self.my_param),
            self.mask_choice)
        return out

class myModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.subModule = mySubModule(n)
        self.linear = torch.nn.Linear(n, 1)

        self.param_need_l1_penalty_case_1 = [self.subModule.my_param]
        #### Here's where Error happens
        self.param_need_l1_penalty_case_2 = [self.subModule.my_param[:,0]]
        ####

    def forward(self, x):
        return self.linear(x) + self.subModule(x)

A typical usage during training looks like this:

# demo of trainer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
np.random.seed(999)

model = myModule(n=2)
model.to(device)

my_data = torch.from_numpy(np.random.random((4,2))).to(torch.float).to(device)
label = torch.from_numpy(np.ones((4,))).to(torch.float).to(device)

pred = model(my_data).squeeze(1)
criteria = torch.nn.MSELoss()
loss = criteria(label, pred)
# add penalty
for param in model.param_need_l1_penalty_case_2:
    loss += 0.1 * torch.norm(param, 1)
print(loss)
loss.backward()

When CUDA is available, it raises this error:

RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.cuda.FloatTensor

Any suggestions on how to add an L1/L2 penalty in this scenario?

Appreciated~

I’ve found a way to avoid it: making param_need_l1_penalty_case_2 a property of my class.

    @property
    def param_need_l1_penalty_case_2(self):
        return [self.subModule.my_param[:,0]]

The slice is then computed whenever the property is accessed, i.e. after model.to(device),
so the trainer code does not need any modification.
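A minimal, self-contained sketch of the property workaround, runnable on the CPU or a GPU. The helper penalized_loss and its l1_weight argument are illustrative names, not part of the original code. The L1 term is written as param.abs().sum(), which equals torch.norm(param, 1); param.pow(2).sum() would give a squared L2 penalty instead.

```python
import torch


class mySubModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.empty(n, 2).uniform_(0.0, 1.0))
        self.register_buffer("mask_choice", torch.tensor([[1.0], [0.0]]))

    def forward(self, x):
        return torch.matmul(torch.matmul(x, self.my_param), self.mask_choice)


class myModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.subModule = mySubModule(n)
        self.linear = torch.nn.Linear(n, 1)

    @property
    def param_need_l1_penalty_case_2(self):
        # Slice on every access, so the result always views the parameter's
        # current data, on whatever device/dtype the model uses right now.
        return [self.subModule.my_param[:, 0]]

    def forward(self, x):
        return self.linear(x) + self.subModule(x)


def penalized_loss(model, x, label, l1_weight=0.1):
    # Hypothetical helper: MSE loss plus an L1 penalty on each listed slice.
    pred = model(x).squeeze(1)
    loss = torch.nn.functional.mse_loss(pred, label)
    for param in model.param_need_l1_penalty_case_2:
        loss = loss + l1_weight * param.abs().sum()  # L1; param.pow(2).sum() for L2
    return loss


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)
model = myModule(n=2).to(device)
my_data = torch.rand(4, 2, device=device)
label = torch.ones(4, device=device)

loss = penalized_loss(model, my_data, label)
loss.backward()
print(loss.item())
```

Because the penalty is built from the live parameter, its gradient lands only in column 0 of my_param.grad, and nothing cached in __init__ can be left behind by model.to(device).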

Hi Fly!

First, I’m glad you found a solution.

Second, I’m very confused by all of this, and have some comments
below.

I have been able to reproduce and “fix” your issue, but I don’t really
understand it.

Also, a disclaimer: My particular pytorch-gpu setup requires the
old pytorch version 0.3.0. So part or all of my analysis might be
a red herring, and specific to 0.3.0.

First, for reasons I don’t understand, I had to explicitly move the
parameters in param_need_l1_penalty_case_2 to the gpu
(but not those in param_need_l1_penalty_case_1).

Second, when running on the gpu, I had to convert the “penalty loss”
to a python scalar before adding it to loss in order to get rid of your
specific error:

    if  scalarPenalty:
        penalty = 0.1 * torch.norm (param, 1).data[0]
        loss += penalty
    else:            
        loss += 0.1 * torch.norm (param, 1)

Here is a complete, runnable pytorch-version-0.3.0 test program,
modelled after yours:

import torch
print (torch.__version__)

torch.manual_seed (0)

gpu = True
print  ('gpu = ' + str (gpu))

scalarPenalty = True
print  ('scalarPenalty = ' + str (scalarPenalty))

class mySubModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter (torch.rand (n, 2), requires_grad=True)
        self.register_buffer('mask_choice', torch.autograd.Variable (torch.Tensor([[1.], [0.]])))

    def forward(self, x):
        out = torch.matmul(
            torch.matmul(x, self.my_param),
            self.mask_choice)
        return out

class myModule (torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.subModule = mySubModule(n)
        self.linear = torch.nn.Linear(n, 1)

        self.param_need_l1_penalty_case_1 = [self.subModule.my_param]
        #### Here's where Error happens
        self.param_need_l1_penalty_case_2 = [self.subModule.my_param[:,0]]
        ####

    def forward(self, x):
        return self.linear(x) + self.subModule(x)

model = myModule(n=2)

my_data = torch.autograd.Variable (torch.ones (4,2))
label = torch.autograd.Variable (torch.ones (4), requires_grad = False)

if  gpu:
    model.cuda()
    my_data = my_data.cuda()
    label = label.cuda()
    for  i in range (len (model.param_need_l1_penalty_case_2)):
        model.param_need_l1_penalty_case_2[i] = model.param_need_l1_penalty_case_2[i].cuda()


pred = model (my_data).squeeze(1)
criteria = torch.nn.MSELoss()
# loss = criteria (label, pred)
loss = criteria (pred, label)  # probably an 0.3.0 requirement

print ('loss (pre-penalty) = ...\n', loss)
for param in model.param_need_l1_penalty_case_2:
    if  scalarPenalty:
        penalty = 0.1 * torch.norm (param, 1).data[0]
        loss += penalty
    else:            
        loss += 0.1 * torch.norm (param, 1)
print ('loss (post-penalty) = ...\n', loss)

print ('calling loss.backward()...')
loss.backward()

Here is the output:

0.3.0b0+591e73e
gpu = True
scalarPenalty = True
loss (pre-penalty) = ...
 Variable containing:
 0.2621
[torch.cuda.FloatTensor of size 1 (GPU 0)]

loss (post-penalty) = ...
 Variable containing:
 0.3206
[torch.cuda.FloatTensor of size 1 (GPU 0)]

calling loss.backward()...

When I turn off the scalar-penalty modification:

scalarPenalty = False

I get what appears to be your error:

0.3.0b0+591e73e
gpu = True
scalarPenalty = False
loss (pre-penalty) = ...
 Variable containing:
 0.2621
[torch.cuda.FloatTensor of size 1 (GPU 0)]

loss (post-penalty = ...
 Variable containing:
 0.3206
[torch.cuda.FloatTensor of size 1 (GPU 0)]

calling loss.backward()...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 66, in <module>
  File "C:\Users\LisaBrown\Documents\admin\programs\Miniconda3\lib\site-packages\torch\autograd\variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "C:\Users\LisaBrown\Documents\admin\programs\Miniconda3\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #3 'other'

(Line 66 in <module> is the loss.backward() call.)

So, something fishy is definitely going on here. (Whether pytorch 0.3.0
and pytorch 1.x.x share the same fishiness, I don’t know.)

As a practical matter, it sounds like you have your program working.
But, for the greater good, it would be nice to see if my results are
reproducible on an up-to-date version of pytorch.

And if any experts have an idea of what is going on under the hood,
please chime in.

Best.

K. Frank

Take this “debugging” with a grain of salt, but here is my best guess:
param_need_l1_penalty_case_1 holds the nn.Parameter itself, just wrapped in a list.
Iterating this list yields that parameter, which was properly pushed to the device by calling model.to('cuda'), since it is also properly registered inside the module. (By default, model.to() replaces the parameter's .data in place, so the list still references the same, now moved, Parameter object.)
However, for param_need_l1_penalty_case_2 an operation (the slicing op) was executed before storing the result in the list. The result of this operation is a new tensor with a grad_fn (SelectBackward in this case).
Since this operation was performed inside the __init__ function of the module, and thus before pushing the parameters to the device, the slice will stay on the CPU and you would need to push it manually to the device afterwards.
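This explanation can be checked without a GPU. The sketch below assumes, as holds in recent PyTorch releases, that model.double() goes through the same Module._apply machinery as model.to('cuda'), so a dtype change stands in for the device move. It shows the registered Parameter in case 1 being converted in place, while the slice cached in case 2 keeps its old dtype and keeps viewing the old storage.

```python
import torch


class mySubModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.rand(n, 2))


class myModule(torch.nn.Module):
    def __init__(self, n):
        super().__init__()
        self.subModule = mySubModule(n)
        # Case 1: the registered Parameter object itself, wrapped in a list.
        self.param_need_l1_penalty_case_1 = [self.subModule.my_param]
        # Case 2: a new tensor produced by slicing, computed once in __init__.
        self.param_need_l1_penalty_case_2 = [self.subModule.my_param[:, 0]]


torch.manual_seed(0)
model = myModule(n=3)
cached_slice = model.param_need_l1_penalty_case_2[0]
print("slice grad_fn:", type(cached_slice.grad_fn).__name__)  # a Select backward node

# Stand-in for model.to("cuda"): .double() also goes through Module._apply,
# which converts only registered parameters and buffers.
model.double()
param = model.subModule.my_param
print(param.dtype)                                      # converted to float64
print(model.param_need_l1_penalty_case_1[0] is param)   # same Parameter object
print(cached_slice.dtype)                               # the old float32 slice

# The cached slice views the *old* storage, so later updates never reach it.
before = cached_slice.clone()
with torch.no_grad():
    param.zero_()
print(torch.equal(cached_slice, before))  # unchanged, unlike param[:, 0]
```

If this reading is right, a slice copied to the GPU once with .cuda(), as in the 0.3.0 test program, would likewise be frozen at its initial values, which is one argument for recomputing the slice when the loss is built (as Fly's property does).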

I doubt this will yield the desired behavior, since you are detaching the tensor with the .data call.
I would consider the usage of .data dangerous, as it might get rid of error messages while in fact creating unwanted behavior.
Is the code working without the .data call and by manually calling cuda() on the slice?
If so, I would stick to this solution. 🙂
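The concern about .data can be checked on the CPU with a toy parameter w (an illustrative stand-in for my_param). In current PyTorch, .item() plays the role of 0.3.0's .data[0]: it turns the penalty into a plain Python float, so the printed loss changes but the penalty contributes no gradient.

```python
import torch

torch.manual_seed(0)
w = torch.nn.Parameter(torch.rand(3, 2))
x = torch.rand(4, 3)
y = torch.ones(4)


def grad_with(penalty_mode):
    """Return dL/dw for penalty_mode in {"none", "scalar", "tensor"}."""
    w.grad = None
    loss = torch.nn.functional.mse_loss((x @ w)[:, 0], y)
    penalty = 0.1 * w[:, 0].abs().sum()
    if penalty_mode == "scalar":
        # Modern equivalent of .data[0]: a Python float, cut off from autograd.
        loss = loss + penalty.item()
    elif penalty_mode == "tensor":
        # The penalty stays in the graph and contributes its own gradient.
        loss = loss + penalty
    loss.backward()
    return w.grad.clone()


g_none = grad_with("none")
g_scalar = grad_with("scalar")
g_tensor = grad_with("tensor")
print(torch.allclose(g_none, g_scalar))  # the "scalar" penalty changed nothing
print(g_tensor - g_none)                 # only the tensor penalty adds 0.1 * sign(w[:, 0])
```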

Hi @ptrblck!

That was my thinking as well. Anyway, explicitly moving the list
elements did work (for me).

A couple of odd things: According to Fly’s original post, he apparently
didn’t see this error. For me, this error occurred before calling
loss.backward(), and was, understandably, of the form
expected type torch.cuda.FloatTensor (instead of the
other way around).

Also I think (I don’t remember everything I tried) that it involved
the list in an essential way. I think I tried getting rid of the list:

self.param_need_l1_penalty_case_2 = self.subModule.my_param[:,0]

and the (pre-loss.backward()) error went away, even though I was
slicing before moving the model (and hence my_param) to the gpu.
(Maybe model.cuda() knows to move Tensor variables of the
model, but not lists. And since Fly didn’t see it, maybe it’s a 0.3.0
thing.)
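For what it's worth, in recent PyTorch releases a bare tensor attribute and a list are treated the same by .to(): Module._apply converts registered parameters and buffers and recurses into submodules, but leaves every other attribute alone. Whether 0.3.0 behaved differently for the bare-tensor case, as observed above, is not checked here. The sketch again uses .double() as a CPU stand-in for the device move; Demo and its attribute names are hypothetical.

```python
import torch


class Demo(torch.nn.Module):
    # Hypothetical module, reduced to the attributes that matter here.
    def __init__(self):
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.rand(3, 2))               # registered parameter
        self.register_buffer("mask_choice", torch.tensor([[1.0], [0.0]]))  # registered buffer
        self.bare_slice = self.my_param[:, 0]      # plain tensor attribute
        self.listed_slice = [self.my_param[:, 0]]  # plain list attribute


m = Demo().double()  # stand-in for .to("cuda"); both use Module._apply
for name, t in [
    ("my_param", m.my_param),
    ("mask_choice", m.mask_choice),
    ("bare_slice", m.bare_slice),
    ("listed_slice[0]", m.listed_slice[0]),
]:
    print(f"{name:16s} {t.dtype}")
```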

I’m sure you’re right about this – I should have realized. The forward
calculation works, but (presumably) the backward calculation will fail
to include the penalty-loss gradient in its overall gradient (defeating
the purpose).

No, two different errors:

Explicitly moving the slice gets rid of the reasonably-understandable
expected type torch.cuda.FloatTensor error.

The error I tried to address with my incorrect modification was the
expected type torch.FloatTensor that Fly originally reported,
and that I was able to reproduce.

To me it’s baffling. I probed all of the tensors in sight to make sure
that they were torch.cuda.FloatTensors, so I couldn’t figure out
where (in the bowels of loss.backward()) somebody was expecting
a torch.FloatTensor. And it seems like this isn’t just 0.3.0 weirdness
because Fly sees it with whatever his version is, as well.
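One possible explanation for the AddBackward0 error, offered as a guess rather than a verified diagnosis: because the cached slice stays on the CPU, 0.1 * torch.norm(param, 1) is a 0-dim CPU tensor. PyTorch allows a 0-dim CPU tensor to be combined with CUDA tensors (it is treated much like a Python scalar), so loss += ... succeeds in the forward pass and the printed loss is a CUDA tensor. In the backward pass, AddBackward0 then hands a CUDA gradient to its second input (index 1), which is that CPU penalty, and that would fit the "expected type torch.FloatTensor but got torch.cuda.FloatTensor" message. If that reading is right, a cheap safeguard is to check devices before adding penalties, as in the hypothetical helper add_penalties below (not a PyTorch API).

```python
import torch


def add_penalties(loss, penalties):
    """Add each penalty tensor to ``loss``, failing early on a device mismatch.

    Hypothetical helper. Without the check, a 0-dim penalty left on the CPU
    can be accepted next to a CUDA loss in the forward pass, and any problem
    would only surface later, if at all, inside loss.backward().
    """
    for i, penalty in enumerate(penalties):
        if penalty.device != loss.device:
            raise RuntimeError(
                f"penalty {i} is on {penalty.device}, but the loss is on {loss.device}"
            )
        loss = loss + penalty
    return loss


# Usage: build the penalties from the live parameter, then add them.
w = torch.nn.Parameter(torch.rand(3, 2))
loss = (w ** 2).mean()
loss = add_penalties(loss, [0.1 * w[:, 0].abs().sum()])
loss.backward()
```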

Remaining baffled,

K. Frank