I am trying to maximize a real function with n arguments:
f : \mathbb R^n \to \mathbb R
f(X) = \frac{1}{n} \sum_{i=1}^{n} f_i(X_i), \quad X_i \in \mathbb R
However, I want to maximize it in a batch fashion, meaning that I first maximize over the first batch_size components, then over the next batch_size, and so on.
Basically, I want to do this:
```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)

def get_batches(X):
    for i in range(3):
        yield X[i*10:(i+1)*10]

optim = torch.optim.Rprop([X], lr=0.1)

for j in range(3):
    for batch in get_batches(X):
        l = torch.norm(batch)
        l.backward()
        optim.step()
        optim.zero_grad()
```
Here, I want to drive all the components of X to 0, but in a batch manner: I take the first batch_size components of X and do a gradient ascent only on those, then take the next batch_size components and do a gradient ascent only on those, and so on.
Rprop stores a step size for each of the n components. For each batch, I want Rprop to update the step sizes only for the components in that batch, i.e. the batch_size components I selected. The script above does that selection, but I think Rprop does not update the step sizes correctly. Is there a way to tell it which step sizes to update?
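If it helps pin down the issue, here is how I understand Rprop's bookkeeping (based on reading `torch/optim/rprop.py`, so this may be off): the optimizer keeps its state as one tensor per parameter, covering all 30 components at once. So even when the loss only touches a slice, the gradient on the untouched components is simply zero, and `optim.step()` still runs the state update over the full parameter:

```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)
opt = torch.optim.Rprop([X], lr=0.1)

# One step on a slice: the graph still reaches X, so X.grad is zero
# everywhere outside the slice, and Rprop touches its per-component
# state for all 30 entries based on that mostly-zero gradient.
l = torch.norm(X[:10])
l.backward()
opt.step()

print(opt.state[X]["step_size"].shape)  # → torch.Size([30])
```

So the per-component step sizes for the components outside the current batch see a zero gradient rather than being left alone, which is what I suspect breaks the sign-change logic across batches.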
Basically, I want the script above to behave the same as the script below:
```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)

optim = torch.optim.Rprop([X], lr=0.1)

for j in range(3):
    l = torch.norm(X)
    l.backward()
    optim.step()
    optim.zero_grad()
```
If you print X, you can see that the non-batched gradient ascent is faster, since Rprop updates the step sizes correctly.
I have already tried setting `requires_grad = False` on the right components, but it throws an error:

```
local variable 'step_size_min' referenced before assignment
```
Thanks for your answer!
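In case it clarifies what I am after, here is a workaround sketch I have been considering (variable names are just illustrative): split X into independent leaf tensors, one per batch, and give each its own Rprop instance, so that each chunk keeps its own step-size state and the chunks never pollute each other:

```python
import torch

# Split the 30 components into 3 independent leaf tensors of 10 each,
# so each chunk gets its own Rprop state (step sizes, previous grads).
chunks = [
    (100 + torch.arange(i * 10, (i + 1) * 10, dtype=torch.float32)).requires_grad_(True)
    for i in range(3)
]
optims = [torch.optim.Rprop([c], lr=0.1) for c in chunks]

for j in range(3):
    for c, opt in zip(chunks, optims):
        l = torch.norm(c)
        l.backward()
        opt.step()
        opt.zero_grad()

X = torch.cat(chunks)  # reassemble the full vector when needed
```

This avoids the shared-state problem, but it feels clumsy when f couples the components, so I am hoping there is a cleaner way to restrict which step sizes a single Rprop instance updates.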