I do not want a fixed lambda in the Softshrink function; I want the network to learn lambda as a parameter. Do I need to write a new function? Can anybody help me with it? Thanks so much!
You have to write a new Softshrink function that takes lambda as a variable. Softshrink zeroes out the interval [-lambda, lambda] and shrinks everything outside it toward zero by lambda, so two masks are enough. Example:
import torch
from torch.autograd import Variable
import torch.nn as nn

def softshrink(x, lambd):
    # Masks for the two non-zero regions of softshrink
    mask1 = x > lambd
    mask2 = x < -lambd
    out = torch.zeros_like(x)
    out += mask1.float() * (x - lambd)  # x - lambd where x > lambd
    out += mask2.float() * (x + lambd)  # x + lambd where x < -lambd
    return out
x = Variable(torch.randn(2, 2, 2), requires_grad=True)
l = Variable(torch.Tensor([0.5]), requires_grad=True)
out = softshrink(x, l)
# do things to out
y = out.sum()
y.backward()
x.grad  # exists
l.grad  # also exists
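To actually learn lambda, hand l to an optimizer like any other leaf Variable. A minimal sketch reusing x, l, and softshrink from above (the SGD choice and learning rate are arbitrary):

optimizer = torch.optim.SGD([l], lr=0.1)
optimizer.zero_grad()
out = softshrink(x, l)
out.sum().backward()
optimizer.step()
print(l)  # no longer 0.5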
Thanks so much! I really appreciate it.
Am I doing what you taught me correctly? When I try model.state_dict(), there is still no l in it. I think I am doing something wrong. How can I fix it?
# Learnable softshrink func
def softshrink(x, lambd):
    mask1 = x > lambd
    mask2 = x < -lambd
    out = torch.zeros_like(x)
    out += mask1.float() * (x - lambd)
    out += mask2.float() * (x + lambd)
    return out
# A Neural Network
class LISTA(torch.nn.Module):
    def __init__(self, D_in, D_out):
        super(LISTA, self).__init__()
        self.We = torch.nn.Linear(D_in, D_out)
        self.S = torch.nn.Linear(D_out, D_out)
        self.l = Variable(torch.Tensor([0.5]), requires_grad=True)

    def forward(self, x):
        out = softshrink(self.We(x), self.l)
        for i in range(20):
            out = softshrink(self.We(x) + self.S(out), self.l)
        out = nn.functional.sigmoid(out)
        return out
I think the key is to use Parameter instead of Variable. Then l shows up in model.parameters() (though it doesn't seem to show up in model.state_dict(); I'm not sure whether it's supposed to).
>>> import torch
>>> import torch.nn as nn
>>> from torch.autograd import Variable
>>> from torch.nn import Parameter
>>>
>>> # Learnable softshrink func
... def softshrink(x, lambd):
...     mask1 = x > lambd
...     mask2 = x < -lambd
...     out = torch.zeros_like(x)
...     out += mask1.float() * (x - lambd)
...     out += mask2.float() * (x + lambd)
...     return out
...
>>> # A Neural Network
... class LISTA(torch.nn.Module):
...     def __init__(self, D_in, D_out):
...         super(LISTA, self).__init__()
...         self.We = torch.nn.Linear(D_in, D_out)
...         self.S = torch.nn.Linear(D_out, D_out)
...         self.l = Parameter(torch.Tensor([0.5]))
...     def forward(self, x):
...         out = softshrink(self.We(x), self.l)
...         for i in range(20):
...             out = softshrink(self.We(x) + self.S(out), self.l)
...         out = nn.functional.sigmoid(out)
...         return out
...
>>> model = LISTA(1, 1)
>>> list(model.parameters())
[Parameter containing:
0.5000
[torch.FloatTensor of size 1]
, Parameter containing:
0.6897
[torch.FloatTensor of size 1x1]
, Parameter containing:
-0.3733
[torch.FloatTensor of size 1]
, Parameter containing:
0.4282
[torch.FloatTensor of size 1x1]
, Parameter containing:
-0.5218
[torch.FloatTensor of size 1]
]
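Since l registers with the module, a standard training step updates it along with the weights. A sketch (Adam, the learning rate, and the batch shape are arbitrary choices):

>>> optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
>>> x = Variable(torch.randn(8, 1))
>>> loss = model(x).sum()
>>> loss.backward()
>>> optimizer.step()  # model.l is updated too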
That helps. I really appreciate it.
Just some follow-up: l does appear in model.state_dict() after all, and I can save the model with torch.save(model.state_dict(), PATH).
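For reference, a minimal save/load round trip, assuming the LISTA class above (the file name is just a placeholder):

model = LISTA(1, 1)
torch.save(model.state_dict(), 'lista.pth')

model2 = LISTA(1, 1)
model2.load_state_dict(torch.load('lista.pth'))
print('l' in model2.state_dict())  # True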
Why do you add a sigmoid output to your LISTA model?
Just a different task. In LISTA, we do not need this sigmoid layer.
I am currently working on a variant of LISTA. Can I ask what task you are working on with LISTA?
The masked version is not very memory efficient. I trained a LISTA using the following softshrink instead:

def softshrink(x, lambd):
    return nn.functional.relu(x - lambd) - nn.functional.relu(-x - lambd)

It saved me 40% memory.
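A quick numerical check that the two versions agree (the names softshrink_mask and softshrink_relu are mine, just to keep both in one script):

import torch
import torch.nn as nn

def softshrink_mask(x, lambd):
    mask1 = x > lambd
    mask2 = x < -lambd
    out = torch.zeros_like(x)
    out += mask1.float() * (x - lambd)
    out += mask2.float() * (x + lambd)
    return out

def softshrink_relu(x, lambd):
    return nn.functional.relu(x - lambd) - nn.functional.relu(-x - lambd)

x = torch.randn(4, 4)
l = torch.Tensor([0.5])
print((softshrink_mask(x, l) - softshrink_relu(x, l)).abs().max())  # ~0.0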