I am trying to implement an algorithm for a classification task that initializes and updates the weights stochastically. More specifically, the weights are initialized and updated as follows:
```python
w = C.mul(z) + mu
```

where

- mu: a D x K matrix of means
- C: a D x K matrix of scales
- z: a D x K matrix sampled from a standard normal distribution
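Concretely, I understand the sampling step to be something like this (a minimal standalone sketch; D and K are placeholder sizes):

```python
import torch

D, K = 4, 3                # placeholder dimensions
mu = torch.zeros(D, K)     # means
C = torch.ones(D, K)       # scales
z = torch.randn(D, K)      # standard-normal sample, redrawn for every forward pass
w = C.mul(z) + mu          # elementwise: w ~ N(mu, C**2)
```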
I have a simple net:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, inputfeatures, outputfeatures):
        super(Net, self).__init__()
        self.inputfeatures = inputfeatures
        self.outputfeatures = outputfeatures
        # mu and C are the learnable parameters; the weights themselves
        # are resampled from N(mu, C^2) on every forward pass
        self.C = nn.Parameter(torch.FloatTensor(outputfeatures, inputfeatures))
        self.mu = nn.Parameter(torch.FloatTensor(outputfeatures, inputfeatures))
        self.softmax = nn.Softmax(dim=1)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, t, X, mu, C):
        # fresh standard-normal sample; no gradient flows into z itself
        z = torch.randn(self.outputfeatures, self.inputfeatures)
        self.weight = C.mul(z) + mu      # stochastic weights, kept as a plain tensor
        y = F.linear(X, self.weight)     # apply the sampled weights to the input
        s = self.softmax(y)
        loss = self.criterion(y, t)      # CrossEntropyLoss applies log-softmax internally
        return loss, s
```
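For context, I construct and call it roughly like this (the sizes and batch here are made up):

```python
net = Net(inputfeatures=20, outputfeatures=5)  # made-up sizes
X = torch.randn(16, 20)                        # a batch of 16 inputs
t = torch.randint(0, 5, (16,))                 # integer class targets for CrossEntropyLoss
loss, s = net(t, X, net.mu, net.C)             # mu and C still need proper initialization
```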
The parameters are then updated as follows:
```python
# Stochastic gradient update of the parameters
C2 = C.mul(C)
Cmu = C2 + torch.pow(mu, 2)
dmu = dg - torch.div(mu, Cmu)
dC = dg.mul(z) + 1 / C - torch.div(C, Cmu)
mu = mu + ro * dmu
C = C + (0.1 * ro) * dC
C[C <= 1e-4] = 1e-4  # clamp the scales away from zero
```
where dg is the gradient produced by backpropagation, which I would like to obtain from autograd, and ro is the learning rate.
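My best guess so far is to do this update manually after loss.backward(), reading the gradients from .grad inside torch.no_grad(). Here is a sketch of what I have in mind (net, X, and t are as above; the ro value is made up), though I am not sure it is correct:

```python
ro = 0.01  # made-up learning rate

loss, s = net(t, X, net.mu, net.C)
net.zero_grad()
loss.backward()

# Because the forward pass computes weight = C.mul(z) + mu with autograd ops,
# backward() should already give mu.grad = dL/dw (since dw/dmu = 1) and
# C.grad = dL/dw * z (since dw/dC = z), i.e. mu.grad plays the role of dg
# and C.grad the role of dg.mul(z) in the formulas above.
with torch.no_grad():
    Cmu = net.C.mul(net.C) + torch.pow(net.mu, 2)
    dmu = net.mu.grad - torch.div(net.mu, Cmu)
    dC = net.C.grad + 1 / net.C - torch.div(net.C, Cmu)
    net.mu += ro * dmu
    net.C += (0.1 * ro) * dC
    net.C.clamp_(min=1e-4)  # same effect as C[C <= 1e-4] = 1e-4
```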
Is there a way to implement this custom update of the parameters? I am not quite sure about the initialization either.
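For the initialization, the best I have come up with is small random means and small positive scales, along these lines (the values are just my guess, not from a reference):

```python
with torch.no_grad():
    nn.init.normal_(net.mu, mean=0.0, std=0.01)  # small random means
    net.C.fill_(0.1)                             # small positive scales, above the 1e-4 floor
```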
Appreciate all the help you can give me.