# Create a linear layer with binary weights

Erm, I’m probably being very dumb, but I can’t see a simple way of doing this without defining a new layer class?

That is, what is the simplest way to constrain the weights of a linear layer to `[0,1]` or `[-1,1]` - I guess either will work?


You can clip the weights after each optimizer update, or you can override the forward call of the Linear layer to do it before multiplying with the input. If you are doing the latter, you could also apply tanh or sigmoid to the weights before doing the dot product.
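A minimal sketch of the first suggestion, clamping the weights into `[-1, 1]` right after the optimizer step (the layer sizes, learning rate, and dummy loss are illustrative, not from the thread):

```python
import torch

# Toy layer and optimizer; any model/optimizer works the same way.
linear = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(linear.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = linear(x).pow(2).mean()  # dummy loss, just to get gradients

opt.zero_grad()
loss.backward()
opt.step()

with torch.no_grad():
    linear.weight.clamp_(-1.0, 1.0)  # in-place clip after the update
```

This keeps the weights bounded without touching the layer class at all.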

A smooth way of achieving this would be to simply include an L1 or L2 penalty on the weights.
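A sketch of that smooth alternative, adding an L1 penalty on the weights to the task loss (the coefficient `1e-3` and the MSE task loss are arbitrary choices for illustration):

```python
import torch

linear = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)
target = torch.randn(8, 2)

# Task loss plus an L1 penalty on this layer's weights.
mse = torch.nn.functional.mse_loss(linear(x), target)
l1_penalty = 1e-3 * linear.weight.abs().sum()
loss = mse + l1_penalty
loss.backward()
```

The penalty only nudges the weights toward small magnitudes; it does not hard-constrain them, which is why it is the "smooth" option.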

This is a very specific requirement, so I don’t think there is a built-in way to do it.


Hi @pranav, thank you very much!

I’m experimenting with different ways to implement something like XNOR-Net layers:

http://allenai.org/plato/xnornet/

It seems to me now that there are a few different plausible ways to do this, so I just wanted to try a few things very quickly before actually doing lots of experiments using hard-coded layers. My aim is to have a graph composed of mostly binary layers, which gives fast, low-cost inference.

Here’s the Torch module in case you’re interested:

https://github.com/allenai/XNOR-Net/blob/master/newLayers/BinActiveZ.lua

Cheers,

Aj

Hi @AjayTalati,
Cool!
If you want a FloatTensor, you could use

``torch.rand(1000).round()``

If you want a ByteTensor, this would work:

``(torch.rand(1000)>=0.5)``

Best regards

Thomas

Hey @tom, here are some snippets to initialise weights and convert a real-valued `data_vec` to `-1` or `1`, as used in the paper above.

a) Randomly initialise weights as -1 or 1

```python
import numpy as np

weights = np.random.randint(2, size=10)  # 0 or 1
weights = 2 * weights                    # 0 or 2
weights = weights - 1                    # -1 or 1
```

b) Convert data vectors to -1 or 1

```python
import torch

out_features, in_features = 2, 4  # example sizes, not in the original snippet

data_vec = torch.randn(out_features, in_features)
weights_ge0 = torch.ge(data_vec, 0)  # greater than or equal to zero
weights_le0 = torch.le(data_vec, 0)  # less than or equal to zero
weights_le0xmin1 = torch.mul(weights_le0.float(), -1)  # flip 1 to -1
# Note: exact zeros appear in both masks and cancel to 0; with randn
# inputs this has probability zero, so in practice the result is -1/+1.
data_vec_for_xnor_layers = weights_ge0.float() + weights_le0xmin1
```

Cheers,

Aj

Looking at the article and code, I think the trick with the weights during training is to binarize them just before the forward pass (lines 1-6 in Algorithm 1 in the article, or https://github.com/allenai/XNOR-Net/blob/master/util.lua#L111-L122 in the code).
The linear/convolution layers themselves are left unchanged as far as I can see.
The new binarization layer is later used for the inputs.
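That scheme can be sketched roughly like this (all names and sizes are ours, not from the XNOR-Net code): keep full-precision weights, swap in a binarized copy for the forward pass, then restore the real weights so the optimizer updates them.

```python
import torch

linear = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(linear.parameters(), lr=0.1)

x = torch.randn(8, 4)

real_weight = linear.weight.data.clone()  # save full-precision weights
bin_weight = real_weight.sign()           # -1/+1 (exact zeros stay 0)
linear.weight.data = bin_weight           # use binarized weights in forward

loss = linear(x).pow(2).mean()  # dummy loss for illustration
opt.zero_grad()
loss.backward()                 # gradients computed with binarized weights

linear.weight.data = real_weight  # restore before the parameter update
opt.step()
```

The unchanged Linear layer never knows its weights were swapped, which matches the observation that the layers themselves are untouched.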

Best regards

Thomas
