Hello

I was reading the paper *Learning explanations that are hard to vary* and found the accompanying GitHub repo. In short: before the parameter update `theta = theta - lr * final_grads`, PyTorch (on CUDA) by default combines the per-environment gradients with an arithmetic mean, whereas I want to combine them with a geometric mean, or to apply a sign-agreement mask as shown in the code.
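
To make the reduction I am after concrete, here is a small NumPy sketch (the values and the two-environment setup are made up for illustration) contrasting the arithmetic mean with the sign-masked elementwise geometric mean of per-environment gradients:

```python
import numpy as np

# Hypothetical gradients of the loss w.r.t. 3 parameters,
# one row per training environment
env_grads = np.array([[0.2, -0.1, 0.3],
                      [0.4,  0.1, 0.1]])
n_envs = env_grads.shape[0]

# Default reduction: arithmetic mean across environments
arith = env_grads.mean(axis=0)

# Desired reduction: elementwise geometric mean of the magnitudes,
# zeroed out wherever the environments disagree on the sign
signs = np.sign(env_grads)
agree = np.abs(signs.sum(axis=0)) == n_envs
geom = agree * signs[0] * np.abs(env_grads).prod(axis=0) ** (1.0 / n_envs)
```

Here the second component has conflicting signs across environments, so the masked geometric mean zeroes it out while the arithmetic mean averages it away.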

Is there a way to do this with PyTorch autograd + CUDA, without writing a custom training loop?

Code taken from the notebook in the linked repo:

```python
import torch

# loss_fn and n_agreement_domains are defined elsewhere in the notebook;
# the body of the if/elif below is reconstructed from the paper's scheme.
def opt(x, y, method, lr, weight_decay, n_iters, verbose=False):
    thetas, iters, losses = [], [0], []
    theta = torch.randn(5, requires_grad=True) * 0.1
    thetas.append(theta.data.numpy())

    loss = loss_fn(x, theta, y)
    losses.append(loss.item())

    for i in range(n_iters + 1):
        lr *= 0.9995

        # gradient of the loss for each environment separately
        env_grads = []
        for e in range(x.shape[0]):
            loss_e = loss_fn(x[e], theta, y[e])
            env_grads.append(torch.autograd.grad(loss_e, theta)[0])
        env_grads = torch.stack(env_grads)

        if method == 'geom_mean':
            # elementwise geometric mean of the gradient magnitudes,
            # masked to components where all environments agree in sign
            signs = torch.sign(env_grads)
            mask = (signs.sum(0).abs() == n_agreement_domains).float()
            final_grads = mask * signs[0] * \
                torch.abs(env_grads).prod(0) ** (1. / n_agreement_domains)
        elif method == 'arithm_mean':
            final_grads = env_grads.mean(0)
        else:
            raise ValueError()

        theta = theta - lr * final_grads

        # weight decay
        theta = theta - weight_decay * lr * theta

        if not i % (n_iters // 200):
            thetas.append(theta.data.numpy())
            iters.append(i)
            loss = loss_fn(x, theta, y)
            losses.append(loss.item())

        if not i % (n_iters // 5):
            print(".", end="")

    return thetas, iters, losses
```