Stochastic gradient update of parameters

Hello everyone,

I am trying to implement an algorithm for a classification task that initializes and updates its weights stochastically. More specifically, the weights are initialized and updated as follows:

w = C.mul(z) + mu

mu: D x K matrix of means
C: D x K matrix of scales
z: D x K matrix sampled from a standard normal distribution
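
To make the sampling step concrete, here is a minimal sketch of how the weights could be drawn and how autograd could return the gradient of the loss with respect to them. It assumes a recent PyTorch version (plain tensors instead of Variable). The sizes D, K, N and the data X and t are made-up values for illustration only, and using F.cross_entropy on a bias-free linear map is my assumption, not a fixed part of the algorithm:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

D, K = 4, 3   # hypothetical sizes: D input features, K classes
N = 8         # hypothetical batch size

mu = torch.zeros(D, K)          # mean of the weights
C = torch.full((D, K), 0.1)     # scale of the weights

X = torch.randn(N, D)           # dummy inputs
t = torch.randint(0, K, (N,))   # dummy class labels

# Sample z ~ N(0, I) and build the weights. mu, C and z carry no graph,
# so w is a leaf tensor and can be marked as requiring a gradient.
z = torch.randn(D, K)
w = (C * z + mu).requires_grad_()

logits = X @ w                      # (N, D) @ (D, K) -> (N, K)
loss = F.cross_entropy(logits, t)   # applies log-softmax internally

# Gradient of the loss with respect to the sampled weights.
(dg,) = torch.autograd.grad(loss, w)
```

Here dg has the same D x K shape as w, so it can be combined elementwise with mu, C and z.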

I have a simple net:

class Net(nn.Module):
    def __init__(self, inputfeatures, outputfeatures):
        super(Net, self).__init__()
        self.inputfeatures = inputfeatures
        self.outputfeatures = outputfeatures
        self.C = nn.Parameter(torch.FloatTensor(self.outputfeatures, self.inputfeatures))
        self.weight = nn.Parameter(torch.FloatTensor(self.outputfeatures, self.inputfeatures))

        self.linear = nn.Linear(self.inputfeatures, self.outputfeatures)
        self.softmax = nn.Softmax()
        self.criterion = nn.CrossEntropyLoss()
    def forward(self, t, X, mu, C):
        z = Variable(torch.FloatTensor(np.random.standard_normal(size=(self.outputfeatures, 
                           self.inputfeatures))), requires_grad=False)
        self.weight = C.mul(z) + mu
        y = self.linear(X)
        s = self.softmax(y)
        loss = self.criterion(y, t)

        return loss, s

The parameters are then updated as follows:

        C2 = C.mul(C)
        Cmu = C2 + torch.pow(mu, 2)

        """Stochastic gradient update of the parameters"""
        dmu = dg - torch.div(mu, Cmu)
        dC = (dg.mul(z)) + 1/C - torch.div(C, Cmu)

        mu = mu + ro*dmu
        C = C + (0.1*ro)*dC
        C[C <= 1e-4] = 1e-4

Here dg is the gradient produced by backpropagation, which I would like to obtain from autograd, and ro is the learning rate.
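
For reference, this is a minimal sketch of how I imagine the update step as a standalone function. It assumes dg is already available as a plain tensor. The function name stochastic_update, the c_min argument and the example values are hypothetical, and clamp is used in place of the masked assignment:

```python
import torch

def stochastic_update(mu, C, z, dg, ro, c_min=1e-4):
    """One step of the update rule above; c_min is the floor applied to C."""
    with torch.no_grad():  # plain tensor arithmetic, no graph is needed
        Cmu = C * C + mu.pow(2)
        dmu = dg - mu / Cmu
        dC = dg * z + 1.0 / C - C / Cmu
        new_mu = mu + ro * dmu
        # Same effect as C[C <= 1e-4] = 1e-4, without in-place indexing.
        new_C = (C + 0.1 * ro * dC).clamp(min=c_min)
    return new_mu, new_C

# Hypothetical values, only to exercise the function.
torch.manual_seed(0)
D, K = 4, 3
mu = torch.zeros(D, K)
C = torch.full((D, K), 0.1)
z = torch.randn(D, K)
dg = torch.randn(D, K)  # stand-in for the gradient returned by autograd
new_mu, new_C = stochastic_update(mu, C, z, dg, ro=0.01)
```

Since the rule adds ro*dmu, dg may need to be the gradient of the quantity being maximized rather than of a loss being minimized. I have not verified which sign is correct here.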
Is there a recommended way to implement this custom update of the parameters? I am also not sure about the initialization.
I appreciate any help you can give me.