I am trying to implement an algorithm for a classification task that initializes and updates the weights stochastically. More specifically, the weights are initialized and updated as follows:
```python
w = C.mul(z) + mu
```

where

- mu: a D x K matrix of means
- C: a D x K matrix of scales
- z: a D x K matrix sampled from a standard normal distribution
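Concretely, I understand the sampling step to be something like this (a minimal standalone sketch; D and K are placeholder sizes):

```python
import torch

D, K = 4, 3                # placeholder dimensions
mu = torch.zeros(D, K)     # means
C = torch.ones(D, K)       # scales
z = torch.randn(D, K)      # standard-normal sample, redrawn for every forward pass
w = C.mul(z) + mu          # elementwise: w ~ N(mu, C**2)
```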
I have a simple net:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, inputfeatures, outputfeatures):
        super(Net, self).__init__()
        self.inputfeatures = inputfeatures
        self.outputfeatures = outputfeatures
        # mu and C are the learnable parameters; the weights themselves
        # are resampled from N(mu, C^2) on every forward pass
        self.C = nn.Parameter(torch.FloatTensor(outputfeatures, inputfeatures))
        self.mu = nn.Parameter(torch.FloatTensor(outputfeatures, inputfeatures))
        self.softmax = nn.Softmax(dim=1)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, t, X, mu, C):
        # fresh standard-normal sample; no gradient flows into z itself
        z = torch.randn(self.outputfeatures, self.inputfeatures)
        self.weight = C.mul(z) + mu      # stochastic weights, kept as a plain tensor
        y = F.linear(X, self.weight)     # apply the sampled weights to the input
        s = self.softmax(y)
        loss = self.criterion(y, t)      # CrossEntropyLoss applies log-softmax internally
        return loss, s
```
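For context, I construct and call it roughly like this (the sizes and batch here are made up):

```python
net = Net(inputfeatures=20, outputfeatures=5)  # made-up sizes
X = torch.randn(16, 20)                        # a batch of 16 inputs
t = torch.randint(0, 5, (16,))                 # integer class targets for CrossEntropyLoss
loss, s = net(t, X, net.mu, net.C)             # mu and C still need proper initialization
```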
The parameters are then updated as follows:
```python
# Stochastic gradient update of the parameters
C2 = C.mul(C)
Cmu = C2 + torch.pow(mu, 2)
dmu = dg - torch.div(mu, Cmu)
dC = dg.mul(z) + 1 / C - torch.div(C, Cmu)
mu = mu + ro * dmu
C = C + (0.1 * ro) * dC
C[C <= 1e-4] = 1e-4  # clamp the scales away from zero
```
where dg is the gradient produced by backpropagation, which I would like to obtain from autograd, and ro is the learning rate.
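My best guess so far is to do this update manually after loss.backward(), reading the gradients from .grad inside torch.no_grad(). Here is a sketch of what I have in mind (net, X, and t are as above; the ro value is made up), though I am not sure it is correct:

```python
ro = 0.01  # made-up learning rate

loss, s = net(t, X, net.mu, net.C)
net.zero_grad()
loss.backward()

# Because the forward pass computes weight = C.mul(z) + mu with autograd ops,
# backward() should already give mu.grad = dL/dw (since dw/dmu = 1) and
# C.grad = dL/dw * z (since dw/dC = z), i.e. mu.grad plays the role of dg
# and C.grad the role of dg.mul(z) in the formulas above.
with torch.no_grad():
    Cmu = net.C.mul(net.C) + torch.pow(net.mu, 2)
    dmu = net.mu.grad - torch.div(net.mu, Cmu)
    dC = net.C.grad + 1 / net.C - torch.div(net.C, Cmu)
    net.mu += ro * dmu
    net.C += (0.1 * ro) * dC
    net.C.clamp_(min=1e-4)  # same effect as C[C <= 1e-4] = 1e-4
```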
Is there a way to implement this custom update of the parameters? I am not quite sure about the initialization either.
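For the initialization, the best I have come up with is small random means and small positive scales, along these lines (the values are just my guess, not from a reference):

```python
with torch.no_grad():
    nn.init.normal_(net.mu, mean=0.0, std=0.01)  # small random means
    net.C.fill_(0.1)                             # small positive scales, above the 1e-4 floor
```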
Appreciate all the help you can give me.