Learning a binary matrix

learner47 · January 10, 2021, 9:03pm

Hi,

I am currently working on code in which I need a trainable matrix (I am currently using nn.Parameter). However, every entry of that matrix must be either 0 or 1. How can I constrain the weight updates so that an entry flips from 0 to 1, or from 1 to 0, depending on whether backpropagation calls for that weight to be increased or decreased? It is fine if the solution uses something other than nn.Parameter.

PS: I have also tried nn.Linear and nn.Conv1d layers, but I can’t get the matrix/kernel to contain only zeros and ones.

Any help is appreciated.
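The update rule described in the question can be sketched literally: store the matrix as a float tensor of 0s and 1s, let autograd compute the gradient of the loss with respect to it, and flip an entry only when the gradient sign says moving it toward the other value would lower the loss. This is a hypothetical greedy rule (`flip_step` is an illustrative name, not a PyTorch API), not a method proposed in this thread, and it has no convergence guarantee; flipping many entries at once can oscillate.

```python
import torch

def flip_step(B: torch.Tensor) -> None:
    """Hypothetical greedy update: 0 -> 1 where the gradient is negative
    (increasing the entry would lower the loss), 1 -> 0 where it is positive."""
    with torch.no_grad():
        up = (B == 0) & (B.grad < 0)
        down = (B == 1) & (B.grad > 0)
        B[up] = 1.0
        B[down] = 0.0
    B.grad = None  # clear the gradient before the next step

# Binary matrix stored as floats so autograd can differentiate the loss w.r.t. it.
B = torch.tensor([[0.0, 1.0], [1.0, 0.0]], requires_grad=True)
x = torch.tensor([[1.0, 1.0]])
target = torch.tensor([[2.0, 0.0]])

loss = ((x @ B - target) ** 2).sum()
loss.backward()  # B.grad = [[-2., 2.], [-2., 2.]]
flip_step(B)

new_loss = ((x @ B - target) ** 2).sum()
print(B.detach(), loss.item(), new_loss.item())
```

Note that the loss is differentiable with respect to B itself because B enters the matmul linearly; the entries only stay binary because the update is a flip rather than a gradient step.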

CedricLy · January 10, 2021, 9:32pm

Off the top of my head, I guess the closest thing would be to use an activation function like sigmoid, maybe one that is not as soft as sigmoid.
If you want 0 and 1 to be the only acceptable values and the parameter must remain trainable, then this question is very tricky.
Backpropagation only works on float variables. If all values are exactly 0 or 1, then you cannot calculate a well-defined derivative.

I hope I am wrong and someone gives a solution here, because this could solve my reverse embedding layer problem.
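To illustrate the derivative problem: if the weights are pushed through a hard step such as `torch.round`, PyTorch defines the gradient of that step as zero, so nothing reaches the underlying parameters. A minimal check, with arbitrary example values:

```python
import torch

w = torch.tensor([-2.0, -0.3, 0.4, 1.5], requires_grad=True)

# Hard 0/1 values: sigmoid squashes to (0, 1), round snaps to {0, 1}.
b = torch.round(torch.sigmoid(w))
b.sum().backward()

print(b.detach())  # every entry is exactly 0. or 1.
print(w.grad)      # all zeros: round() passes no gradient back
```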

learner47 · January 10, 2021, 9:36pm

Hi @CedricLy. I have tried using Sigmoid. I even went one step further and used Sigmoid(50*input) to get a near-hard threshold. But this messes up my training: the weight updates do not seem to happen, and my network stays stuck in its current state.
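One likely reason a steep sigmoid stalls training is that its derivative, k * s * (1 - s) for s = sigmoid(k * w), is large only in a narrow band around w = 0 and collapses once |k * w| is large. A minimal sketch with k = 50 as in the post (the weight values are arbitrary examples):

```python
import torch

k = 50.0  # steepness used in the post: Sigmoid(50 * x)
w = torch.tensor([0.0, 0.05, 0.5, -0.5], requires_grad=True)

s = torch.sigmoid(k * w)
s.sum().backward()

# d/dw sigmoid(k*w) = k * s * (1 - s): 12.5 at w = 0, but it vanishes
# once |k*w| is large because s saturates at 0 or 1.
print(s.detach())
print(w.grad)
```

Weights that start (or drift) a little away from 0 therefore receive almost no gradient, which matches the "stuck" behaviour described.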

The reason I need the matrix to contain strictly 0s and 1s is that I am working in a binary field (where the only available digits are 0 and 1).

Thank you for your time!

CedricLy · January 10, 2021, 9:49pm

Your threshold idea is good, but if you change the input this won’t change much, because the weights of the layer will adapt.

caonv · January 11, 2021, 1:51am

I don’t know why the matrix values need to be binary. You can apply the sequence sigmoid -> minus 0.9 -> relu to obtain hard-binarized values in a learnable way, but the problem is that your model may not converge.

Anyway, to follow the above process, I recommend registering a tensor as a parameter as follows:

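import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(BinaryLayer, self).__init__()
        # Real-valued latent weights that the optimizer updates
        self.weight = nn.Parameter(torch.randn(in_channels, out_channels), requires_grad=True)
    def forward(self, x):
        # sigmoid -> minus 0.9 -> relu: entries with sigmoid(w) <= 0.9 become 0
        bin_weight = F.relu(torch.sigmoid(self.weight) - 0.9)
        y = torch.mm(x, bin_weight)
        return y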

There may be some errors in the code snippet, but the flow is clear. After training is complete, you can get a binary matrix by computing it the same way bin_weight is computed above.
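One practical consequence of the sigmoid -> minus 0.9 -> relu flow: relu passes no gradient wherever sigmoid(w) <= 0.9, i.e. w <= ln 9 ≈ 2.197, so with `torch.randn` initialization the large majority of entries start with zero gradient. That is one plausible reason the model may not converge. The sketch below checks this on hand-picked values and reads off a 0/1 matrix by treating every positive `bin_weight` entry as 1; that readout rule is an assumption, since the post only says to compute it "the same way".

```python
import torch
import torch.nn.functional as F

w = torch.tensor([-1.0, 0.0, 2.0, 3.0], requires_grad=True)

# Same transform as in forward() above: values land in [0, 0.1).
bin_weight = F.relu(torch.sigmoid(w) - 0.9)
bin_weight.sum().backward()

# relu blocks the gradient wherever sigmoid(w) <= 0.9 (w <= ln 9 ~= 2.197).
print(w.grad)

# Assumed readout after training: any positive entry counts as a 1.
binary = (bin_weight > 0).to(torch.int64)
print(binary)
```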

learner47 · January 11, 2021, 5:49am

Hi @caonv, thank you for your reply.

I have a question. torch.sigmoid ranges over (0, 1), and the operation torch.sigmoid(self.weight) - 0.9 shifts the range to (-0.9, 0.1). After relu, the range is [0, 0.1). The threshold on the 0 side is perfectly fine, but what about the threshold on the 1 side? Am I missing something?
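The range argument can be checked numerically: sweeping the raw weight over a wide interval, relu(sigmoid(w) - 0.9) bottoms out at exactly 0 but tops out at about 0.1, so no entry ever reaches 1. (In float32, sigmoid saturates to exactly 1.0 for large inputs, so the observed maximum is 0.1 up to rounding rather than strictly below it.)

```python
import torch
import torch.nn.functional as F

# Sweep the raw (pre-sigmoid) weight over a wide range.
w = torch.linspace(-20.0, 20.0, steps=10001)
out = F.relu(torch.sigmoid(w) - 0.9)

lo, hi = out.min().item(), out.max().item()
print(lo, hi)  # lo is 0.0; hi is about 0.1, far from 1
```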

caonv · January 11, 2021, 6:09am

Actually, that is a kind of thresholding with a threshold of 0.9, and 0.9 is very close to 1. I have just found torch.nn.Hardsigmoid, which can give you exact 0s and 1s in some cases, but not for all values.
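For reference, `torch.nn.Hardsigmoid` is piecewise linear: it returns exactly 0 for x <= -3, exactly 1 for x >= 3, and x/6 + 1/2 in between. Its gradient is zero in the saturated regions, so an entry that has already become an exact 0 or 1 stops receiving gradient through it, much like the steep sigmoid. A minimal check with arbitrary inputs:

```python
import torch
import torch.nn as nn

hs = nn.Hardsigmoid()
x = torch.tensor([-4.0, 0.0, 1.5, 4.0], requires_grad=True)

y = hs(x)  # 0 for x <= -3, 1 for x >= 3, x/6 + 1/2 in between
y.sum().backward()

print(y.detach())  # only the saturated inputs give exact 0s and 1s
print(x.grad)      # 1/6 inside (-3, 3), 0 in the saturated regions
```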