Masking optimizers for sparse input data?

My problem involves a masked forward pass in which I ignore zeros in the input. I believe the issue is that the optimizer (torch.optim.Adam below) never applies any parameter updates when masking is enabled.

If I have a simple layer that has an option to ignore zeros in the input:

import torch
import torch.nn as nn
import torch.optim as onn
import torch.autograd as ann
import torch.nn.functional as fnn

torch.manual_seed(123)

class SimpleLayer(nn.Module):

    def __init__(self, size_in, size_out, ignore_zero=True):
        super(SimpleLayer, self).__init__()
        self.weight = nn.Parameter(
            torch.randn(size_in, size_out) * 1e-5,
            requires_grad=True
        )
        self.ignore_zero = ignore_zero

    def forward(self, input_var):
        if self.ignore_zero:
            # Column indices of the nonzero entries in the input.
            nz_inds = input_var.data.nonzero()[:, 1]
            # Multiply only the nonzero input columns with the matching weight rows.
            return input_var[:, nz_inds].mm(self.weight[nz_inds])
        else:
            return input_var.mm(self.weight)

And I create a training stack and some sparse data:

layer = SimpleLayer(10, 5, ignore_zero=True)
loss_func = fnn.smooth_l1_loss
optimizer = onn.Adam(layer.parameters())

sparse_input = torch.zeros(1, 10)
sparse_input[0][2] = 0.2
sparse_input[0][5] = 0.3
sparse_input = ann.Variable(sparse_input)
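
For reference, the masking in forward() works off the column indices of the nonzero entries of the input; a quick check (just a sketch, using the setup above) shows which indices get selected here:

nz_inds = sparse_input.data.nonzero()[:, 1]
print(nz_inds)  # expected: a LongTensor containing indices 2 and 5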

The output is identical whether ‘ignore_zero’ is set or not, as expected, since the zero entries of the input contribute nothing to the matrix product:

layer.ignore_zero = True
print(layer.forward(sparse_input))

layer.ignore_zero = False
print(layer.forward(sparse_input))

Outputs:

Variable containing:
1.00000e-06 *
 -2.2359  5.9174  3.7352 -3.4771  1.3588
[torch.FloatTensor of size 1x5]

Variable containing:
1.00000e-06 *
 -2.2359  5.9174  3.7352 -3.4771  1.3588
[torch.FloatTensor of size 1x5]

On the other hand, results start to diverge after some training steps:

layer.ignore_zero = False
print('ignore_zero False:')
for i in range(5):
    outp = layer.forward(sparse_input)
    loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
    loss.backward()
    optimizer.step()
    print(loss.data[0])

Gives:

ignore_zero False:
0.297815024853
0.872213542461
0.316926777363
0.0565339252353
0.746583342552

And with ‘ignore_zero’ set to True:

layer.ignore_zero = True
print('ignore_zero True:')
for i in range(5):
    outp = layer.forward(sparse_input)
    loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
    loss.backward()
    optimizer.step()
    print(loss.data[0])

Gives:

ignore_zero True:
0.297815024853
0.871760487556
0.316960245371
0.056279104203
0.747062385082

The parameters in optimizer.param_groups do not update at all in the layer.ignore_zero = True case.
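
One way to see this (a minimal check, just a sketch against the setup above) is to look at layer.weight.grad after a backward pass with masking enabled; if the masked indexing detaches the weight from the graph, the gradient never reaches it:

layer = SimpleLayer(10, 5, ignore_zero=True)
outp = layer.forward(sparse_input)
loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
loss.backward()
# If gradients flowed through the masked multiply, the gradient rows at
# indices 2 and 5 should be nonzero; here the gradient appears not to reach
# self.weight at all, so Adam has nothing to apply in step().
print(layer.weight.grad)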

Is there some way to get the optimizer to agree with the masking step in the module’s forward pass?

Thanks!

I found a solution and will leave it here for other users. First I updated PyTorch, and then realized that NumPy-style indexing is not allowed in the newer release. Instead of NumPy-style indexing I switched to .index_select, and with that change the optimizer’s parameters do update when .step() is called.

On a side note, NumPy-style indexing is a super-cool feature; hopefully it’s on the horizon :slight_smile:

class SimpleLayer(nn.Module):

    def __init__(self, size_in, size_out, ignore_zeros=True):
        super(SimpleLayer, self).__init__()
        self.weight = nn.Parameter(
            torch.randn(size_in, size_out) * 1e-5,
            requires_grad=True
        )
        self.ignore_zeros = ignore_zeros

    def forward(self, input_var):
        if self.ignore_zeros:
            # Column indices of the nonzero entries in the input.
            nz_inds = ann.Variable(input_var.data.nonzero()[:, 1])
            # Select only the nonzero input columns and the matching weight rows.
            # Using index_select (instead of NumPy-style indexing) keeps the
            # operation in the autograd graph, so gradients reach self.weight.
            inp_nz = input_var.index_select(1, nz_inds)
            weight_nz = self.weight.index_select(0, nz_inds)
            out = inp_nz.mm(weight_nz)
            return out
        else:
            return input_var.mm(self.weight)
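
As a quick sanity check (a sketch reusing the pieces above), the weight should now actually change after a single optimizer step, which was not the case with the old indexing:

layer = SimpleLayer(10, 5, ignore_zeros=True)
optimizer = onn.Adam(layer.parameters())
before = layer.weight.data.clone()
outp = layer.forward(sparse_input)
loss = fnn.smooth_l1_loss(outp, ann.Variable(torch.randn(1, 5)))
loss.backward()
optimizer.step()
# Only the weight rows selected by index_select (rows 2 and 5 for this input)
# receive gradient, so only those rows should differ from the saved copy.
print((layer.weight.data - before).abs().max())  # expect a nonzero value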

Yes, we’re definitely planning to add it. Good to hear your problem is fixed in the newer version!
