My problem concerns a masked forward pass in which I ignore zeros in the input. I believe what is happening is that the optimizer (torch.optim.Adam below) is not applying any parameter updates when masking is enabled.
If I have a simple layer that has an option to ignore zeros in the input:
import torch
import torch.nn as nn
import torch.optim as onn
import torch.autograd as ann
import torch.nn.functional as fnn
torch.manual_seed(123)
class SimpleLayer(nn.Module):
    def __init__(self, size_in, size_out, ignore_zero=True):
        super(SimpleLayer, self).__init__()
        # Small random weights so the initial output is near zero.
        self.weight = nn.Parameter(
            torch.randn(size_in, size_out) * 1e-5,
            requires_grad=True
        )
        self.ignore_zero = ignore_zero

    def forward(self, input_var):
        if self.ignore_zero:
            # Column indices of the nonzero entries in the (1 x size_in) input.
            nz_inds = input_var.data.nonzero()[:, 1]
            # Multiply only the nonzero input columns against the matching weight rows.
            return input_var[:, nz_inds].mm(self.weight[nz_inds])
        else:
            return input_var.mm(self.weight)
And I create a training stack and some sparse data:
layer = SimpleLayer(10, 5, ignore_zero=True)
loss_func = fnn.smooth_l1_loss
optimizer = onn.Adam(layer.parameters())

# A 1 x 10 input with nonzero entries only at indices 2 and 5.
sparse_input = torch.zeros(1, 10)
sparse_input[0][2] = 0.2
sparse_input[0][5] = 0.3
sparse_input = ann.Variable(sparse_input)
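For reference, a quick check (my own snippet, not part of the training stack) of which column indices the masking branch picks out for this input:

print(sparse_input.data.nonzero()[:, 1])
# prints indices 2 and 5, so the masked forward pass multiplies only
# columns 2 and 5 of the input against rows 2 and 5 of self.weight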
The output is identical whether ‘ignore_zero’ is set or not:
layer.ignore_zero = True
print(layer(sparse_input))
layer.ignore_zero = False
print(layer(sparse_input))
Outputs:
Variable containing:
1.00000e-06 *
-2.2359 5.9174 3.7352 -3.4771 1.3588
[torch.FloatTensor of size 1x5]
Variable containing:
1.00000e-06 *
-2.2359 5.9174 3.7352 -3.4771 1.3588
[torch.FloatTensor of size 1x5]
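This is expected, since the zero columns contribute nothing to the matrix product. A small sanity check (my own snippet, not part of the stack above) confirms the two branches agree:

layer.ignore_zero = True
out_masked = layer(sparse_input)
layer.ignore_zero = False
out_full = layer(sparse_input)
# maximum absolute difference between the two forward passes; prints ~0
print((out_masked - out_full).data.abs().max())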
On the other hand, results start to diverge after some training steps:
layer.ignore_zero = False
print('ignore_zero False:')
for i in range(5):
    outp = layer(sparse_input)
    loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
    loss.backward()  # note: gradients are never zeroed here, so they accumulate
    optimizer.step()
    print(loss.data[0])
Gives:
ignore_zero False:
0.297815024853
0.872213542461
0.316926777363
0.0565339252353
0.746583342552
layer.ignore_zero = True
print('ignore_zero True:')
for i in range(5):
    outp = layer(sparse_input)
    loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
    loss.backward()
    optimizer.step()
    print(loss.data[0])
Gives:
ignore_zero True:
0.297815024853
0.871760487556
0.316960245371
0.056279104203
0.747062385082
The parameters in optimizer.param_groups do not update at all in the layer.ignore_zero = True case.
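A minimal sketch of how this can be verified (the before snapshot is my own, not part of the stack above): snapshot the weights, take one optimizer step, and compare:

before = layer.weight.data.clone()
outp = layer(sparse_input)
loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
loss.backward()
optimizer.step()
# with ignore_zero = True this prints 0.0: the weights never change
print((layer.weight.data - before).abs().max())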
Is there some way to get the optimizer to agree with the masking step in the module’s forward pass?
Thanks!