I am trying to maximize a real function with n arguments:
f : \mathbb R^n \to \mathbb R
f(X) = \frac{1}{n} \sum_{i=1}^{n} f_i(X_i), \quad X_i \in \mathbb R
However, I want to maximize it in a batch fashion, meaning that I first maximize over the first batch_size components, then over the next batch_size, and so on.
Basically, I want to do this:
```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)

def get_batches(X):
    for i in range(3):
        yield X[i*10:(i+1)*10]

optim = torch.optim.Rprop([X], lr=0.1)

for j in range(3):
    for batch in get_batches(X):
        l = torch.norm(batch)
        l.backward()
        optim.step()
        optim.zero_grad()
```
Here, I want to drive all the components of X to 0, but in a batch manner: I take the first batch_size components of X and do a gradient ascent only on those, then take the next batch_size components and do a gradient ascent only on those, and so on.
Rprop stores a step size for each of the n components. For each batch, I want Rprop to update the step sizes only for the components in that batch, i.e. the batch_size components I selected. The script above does that selection, but I think Rprop does not update the step sizes correctly. Is there a way to tell it which step sizes to update?
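If it helps pin down the issue, here is how I understand Rprop's bookkeeping (based on reading `torch/optim/rprop.py`, so this may be off): the optimizer keeps its state as one tensor per parameter, covering all 30 components at once. So even when the loss only touches a slice, the gradient on the untouched components is simply zero, and `optim.step()` still runs the state update over the full parameter:

```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)
opt = torch.optim.Rprop([X], lr=0.1)

# One step on a slice: the graph still reaches X, so X.grad is zero
# everywhere outside the slice, and Rprop touches its per-component
# state for all 30 entries based on that mostly-zero gradient.
l = torch.norm(X[:10])
l.backward()
opt.step()

print(opt.state[X]["step_size"].shape)  # → torch.Size([30])
```

So the per-component step sizes for the components outside the current batch see a zero gradient rather than being left alone, which is what I suspect breaks the sign-change logic across batches.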
Basically, I want the script above to behave the same as the script below:
```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)

optim = torch.optim.Rprop([X], lr=0.1)

for j in range(3):
    l = torch.norm(X)
    l.backward()
    optim.step()
    optim.zero_grad()
```
If you print X, you can see that the non-batched gradient ascent is faster, since Rprop updates the step sizes correctly.
I have already tried setting `requires_grad = False` on the right components, but it throws an error:

```
local variable 'step_size_min' referenced before assignment
```
Thanks for your answer!
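In case it clarifies what I am after, here is a workaround sketch I have been considering (variable names are just illustrative): split X into independent leaf tensors, one per batch, and give each its own Rprop instance, so that each chunk keeps its own step-size state and the chunks never pollute each other:

```python
import torch

# Split the 30 components into 3 independent leaf tensors of 10 each,
# so each chunk gets its own Rprop state (step sizes, previous grads).
chunks = [
    (100 + torch.arange(i * 10, (i + 1) * 10, dtype=torch.float32)).requires_grad_(True)
    for i in range(3)
]
optims = [torch.optim.Rprop([c], lr=0.1) for c in chunks]

for j in range(3):
    for c, opt in zip(chunks, optims):
        l = torch.norm(c)
        l.backward()
        opt.step()
        opt.zero_grad()

X = torch.cat(chunks)  # reassemble the full vector when needed
```

This avoids the shared-state problem, but it feels clumsy when f couples the components, so I am hoping there is a cleaner way to restrict which step sizes a single Rprop instance updates.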