I am trying to maximize a real function with n arguments:
f : \mathbb R^n \to \mathbb R
f(X) = \frac{1}{n} \sum_{i=1}^{n} f_i(X_i), \quad X_i \in \mathbb R
However, I want to maximize it in a batch fashion, meaning that I first maximize over the first batch_size components, then over the next batch_size, and so on.
Basically, I want to do this:
```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)

def get_batches(X):
    for i in range(3):
        yield X[i*10:(i+1)*10]

optim = torch.optim.Rprop([X], lr=0.1)

for j in range(3):
    for batch in get_batches(X):
        l = torch.norm(batch)
        l.backward()
        optim.step()
        optim.zero_grad()
```
Here, I want to drive all the components of X to 0, but in a batch manner: I take the first batch_size components of X and do a gradient ascent only on those, then take the next batch_size components and do a gradient ascent only on those, and so on.
Rprop stores a step size for each of the n components. For each batch, I want Rprop to update the step sizes only for the components in that batch, i.e. the batch_size components I selected. The script above does that selection, but I think Rprop does not update the step sizes correctly. Is there a way to tell it which step sizes to update?
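If it helps pin down the issue, here is how I understand Rprop's bookkeeping (based on reading `torch/optim/rprop.py`, so this may be off): the optimizer keeps its state as one tensor per parameter, covering all 30 components at once. So even when the loss only touches a slice, the gradient on the untouched components is simply zero, and `optim.step()` still runs the state update over the full parameter:

```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)
opt = torch.optim.Rprop([X], lr=0.1)

# One step on a slice: the graph still reaches X, so X.grad is zero
# everywhere outside the slice, and Rprop touches its per-component
# state for all 30 entries based on that mostly-zero gradient.
l = torch.norm(X[:10])
l.backward()
opt.step()

print(opt.state[X]["step_size"].shape)  # → torch.Size([30])
```

So the per-component step sizes for the components outside the current batch see a zero gradient rather than being left alone, which is what I suspect breaks the sign-change logic across batches.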
Basically, I want the script above to behave the same as the script below:
```python
import torch

X = 100 + torch.arange(30, dtype=torch.float32)
X.requires_grad_(True)

optim = torch.optim.Rprop([X], lr=0.1)

for j in range(3):
    l = torch.norm(X)
    l.backward()
    optim.step()
    optim.zero_grad()
```
If you print X, you can see that the non-batched gradient ascent is faster, since Rprop updates the step sizes correctly.
I have already tried setting `requires_grad = False` on the right components, but it throws an error:

```
local variable 'step_size_min' referenced before assignment
```
Thanks for your answer!
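In case it clarifies what I am after, here is a workaround sketch I have been considering (variable names are just illustrative): split X into independent leaf tensors, one per batch, and give each its own Rprop instance, so that each chunk keeps its own step-size state and the chunks never pollute each other:

```python
import torch

# Split the 30 components into 3 independent leaf tensors of 10 each,
# so each chunk gets its own Rprop state (step sizes, previous grads).
chunks = [
    (100 + torch.arange(i * 10, (i + 1) * 10, dtype=torch.float32)).requires_grad_(True)
    for i in range(3)
]
optims = [torch.optim.Rprop([c], lr=0.1) for c in chunks]

for j in range(3):
    for c, opt in zip(chunks, optims):
        l = torch.norm(c)
        l.backward()
        opt.step()
        opt.zero_grad()

X = torch.cat(chunks)  # reassemble the full vector when needed
```

This avoids the shared-state problem, but it feels clumsy when f couples the components, so I am hoping there is a cleaner way to restrict which step sizes a single Rprop instance updates.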