for param in net.parameters():
    loss += weight_decay * 0.5 * torch.pow(param, 2).sum()

instead of using the weight_decay option of the optimizer. Depending on the weight_decay value,
I notice different updates after some epochs compared to the optimizer-based method.
These differences ultimately lead to a higher variance when evaluating a model over multiple random seeds.
Where do these differences come from, and how can I get rid of them?
Thanks in advance!
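For reference, here is a hypothetical sketch of the two approaches being compared; the model, learning rate, and weight_decay value are stand-ins for illustration, mirroring the names in the snippet above:

```python
import torch

net = torch.nn.Linear(4, 2)   # stand-in model for illustration
weight_decay = 1e-4

# (a) manual L2 penalty added to the loss
loss = torch.zeros(())
for param in net.parameters():
    loss = loss + weight_decay * 0.5 * torch.pow(param, 2).sum()

# (b) letting the optimizer apply the decay during the update step instead
opt = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=weight_decay)
```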

Using the optimizer's weight_decay option will be more numerically stable and accumulate fewer floating point errors, because the decay is applied directly during the update step instead of being differentiated through the whole backward pass.
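A toy sketch of why the two paths differ numerically (hypothetical example, not the author's code): SGD's weight_decay adds wd * param directly to the gradient, whereas the manual penalty reaches the gradient only through an extra autograd pass.

```python
import torch

w = torch.randn(3, requires_grad=True)
wd = 1e-2

# manual path: differentiate the penalty through autograd
(wd * 0.5 * w.pow(2).sum()).backward()
grad_manual = w.grad.clone()

# optimizer path: decay applied directly to the gradient, no backward needed
grad_decay = wd * w.detach()

# mathematically identical, but computed by different sequences of float ops
same = torch.allclose(grad_manual, grad_decay)
```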

So the two are expected to be different, but they should not lead to a big change in convergence unless your network is large enough that the sum of the squared weights runs into floating point precision problems.
Given what you describe, this might actually be happening.
You can quickly check by doing the accumulation in double precision:
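A minimal sketch of that check, assuming the same net and weight_decay names as in the question; the idea is to accumulate the penalty in float64 and cast back, so any remaining difference cannot come from float32 accumulation:

```python
import torch

net = torch.nn.Linear(10, 1)   # stand-in model for illustration
weight_decay = 1e-4

# accumulate the L2 penalty in double precision
reg = torch.zeros((), dtype=torch.float64)
for param in net.parameters():
    reg = reg + 0.5 * torch.pow(param.double(), 2).sum()

# cast back to float32 before adding it to the training loss
loss_penalty = weight_decay * reg.float()
```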