Theano library function equivalent in PyTorch

ahmadreza9 · June 6, 2020, 9:18am

Hello,
As the documentation (http://deeplearning.net/software/theano/library/compile/function.html) shows, Theano can create compiled functions. I am sure PyTorch can do something similar, but what is the PyTorch equivalent of the updates parameter of theano.function? (I saw Porting code from Thano/Lasagne to PyTorch, but they didn’t talk about updates.)

Let’s look at an example:

def rmsprop_updates(grads, params, stepsize, rho=0.9, epsilon=1e-9):

    updates = []

    for param, grad in zip(params, grads):
        accum = theano.shared(np.zeros(param.get_value(borrow=True).shape, dtype=param.dtype))
        accum_new = rho * accum + (1 - rho) * grad ** 2
        updates.append((accum, accum_new))
        updates.append((param, param + (stepsize * grad / T.sqrt(accum_new + epsilon))))
        # lasagne has '-' after param
    return updates

updates = rmsprop_updates(grads, params, self.lr_rate, self.rms_rho, self.rms_eps)

N = states.shape[0]
loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N

self._train_fn = theano.function([states, actions, values], loss, updates=updates)

iffiX · June 7, 2020, 5:11am

Instead of compiling a function from a symbolic graph, PyTorch uses modules as building blocks to create a function. There is only one important object in PyTorch, the tensor: it is the container for data, the interface for low-level device handling, and the gradient flow component. Therefore, instead of defining a graph up front, you construct it dynamically with ordinary arithmetic operators such as +, -, *, /

e.g.:

f = theano.function([x], 2*x)

is equivalent to:

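import torch

# suppose you have some tensor ``x``
# or create it (falling back to the CPU when no GPU is available)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
x = torch.zeros([100, 100], device=device, dtype=torch.int)

def f(inp):
    return inp * 2

print(f(x))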

Moreover, because PyTorch is dynamic, users sometimes need just-in-time compilation to reduce the Python overhead of rebuilding the graph on every call. You can do that with torch.jit.
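As a rough illustration of the torch.jit remark above, here is a minimal sketch that traces the doubling function with torch.jit.trace, one of the entry points in torch.jit (torch.jit.script is the other). The names double, example and traced_double are made up for this example, and whether tracing helps at all depends on the model.

```python
import torch

def double(x):
    # Same computation as f in the example above.
    return x * 2

# Tracing runs the function once on an example input and records the ops.
example = torch.zeros(3)
traced_double = torch.jit.trace(double, example)

# The traced function is called like the original Python function.
result = traced_double(torch.ones(3))
print(result)
```

The traced object can also be saved with its save method and loaded without the original Python source, which is the closest PyTorch gets to a "compiled function" in the Theano sense.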

ahmadreza9 · June 7, 2020, 9:52am

I understand the paradigm, and your simple example is already covered by the first link in my post. I want to create a custom optimizer like this (Custom Optimizer in PyTorch), so I should update the weights like this:

weight_update = smth_with_good_dimensions
param.data.sub_(weight_update * learning_rate)

Now, how can we write a function in PyTorch that works like theano.function for my rms_prop update?

Nikronic · June 7, 2020, 10:11am

Hi,

As @iffiX mentioned, we do not compile a graph in PyTorch; it is constructed on the fly during the forward pass. So, for an optimizer, you just define a class that accepts the parameters of a model (an nn.Module) and implements a step function. Based on your code, you essentially only need to drop the last line, self._train_fn = ..., as there is no compile stage in PyTorch.
Here is the SGD implementation:
https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
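To make the "no compile stage" point concrete, here is a rough sketch of what the Theano training function from the first post might look like as a plain Python function. The two-layer policy network, the tensor shapes and the name train_fn are assumptions made up for this example. It uses the built-in torch.optim.RMSprop, which adds eps outside the square root, so it is close to but not identical to the rmsprop_updates rule in the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical policy: 4 state features -> probabilities over 2 actions.
policy = nn.Sequential(nn.Linear(4, 2), nn.Softmax(dim=-1))
optimizer = torch.optim.RMSprop(policy.parameters(), lr=1e-3, alpha=0.9, eps=1e-9)

def train_fn(states, actions, values):
    """Plays the role of theano.function(..., updates=updates)."""
    prob_act = policy(states)
    n = states.shape[0]
    objective = torch.log(prob_act[torch.arange(n), actions]).dot(values) / n
    optimizer.zero_grad()
    # The Theano code adds the step (ascent); optimizers minimize,
    # so we backpropagate the negated objective instead.
    (-objective).backward()
    optimizer.step()
    return objective.item()

# Made-up batch of 8 transitions.
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
values = torch.randn(8)
before = [p.detach().clone() for p in policy.parameters()]
objective_value = train_fn(states, actions, values)
print(objective_value)
```

Each call computes the objective and applies one parameter update, which is what the updates argument does in Theano.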

Bests

iffiX · June 7, 2020, 10:12am

From the Theano documentation, I guess you are asking for manual control over gradient updates?
If so:
For simple modules, use register_backward_hook
Or control the gradient of each of your input tensors directly using: register_hook
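A small sketch of the tensor-level option, assuming you only want to rescale a gradient before it is stored. The tensor w and the 0.5 scaling factor are made up for this example; Tensor.register_hook itself is the PyTorch call mentioned above.

```python
import torch

w = torch.ones(3, requires_grad=True)

# The hook receives the gradient and may return a replacement for it.
handle = w.register_hook(lambda grad: grad * 0.5)

loss = (w * torch.tensor([1.0, 2.0, 3.0])).sum()
loss.backward()

# Without the hook w.grad would be [1, 2, 3]; the hook halves it.
print(w.grad)

# Remove the hook once it is no longer needed.
handle.remove()
```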

Optimizers are implemented in torch.optim, including RMSProp, Adam, SGD, etc.

iffiX · June 7, 2020, 10:14am

And indeed, if you want a completely new optimizer rather than controlling gradients by hand, you should inherit from torch.optim.Optimizer and implement the step method (remember to decorate it with @torch.no_grad()).
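As a sketch of that approach, the class below ports the rmsprop_updates rule from the first post, keeping epsilon inside the square root and the '+' sign (gradient ascent). The class name TheanoStyleRMSprop and the state key "accum" are made up for this example; treat it as a starting point, not a tested replacement for torch.optim.RMSprop.

```python
import torch
from torch.optim import Optimizer

class TheanoStyleRMSprop(Optimizer):
    """Hypothetical port of rmsprop_updates from the first post."""

    def __init__(self, params, lr=1e-3, rho=0.9, eps=1e-9):
        super().__init__(params, dict(lr=lr, rho=rho, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "accum" not in state:
                    # Plays the role of the theano.shared accumulator.
                    state["accum"] = torch.zeros_like(p)
                accum = state["accum"]
                # accum_new = rho * accum + (1 - rho) * grad ** 2
                accum.mul_(group["rho"]).add_((1 - group["rho"]) * p.grad ** 2)
                # '+' as in the question: this ascends the objective.
                p.add_(group["lr"] * p.grad / torch.sqrt(accum + group["eps"]))

# Tiny check with a hand-set gradient.
w = torch.tensor([1.0, 2.0], requires_grad=True)
opt = TheanoStyleRMSprop([w], lr=0.1)
w.grad = torch.tensor([1.0, -1.0])
opt.step()
print(w)
```

Switching the final add_ to sub_ gives the usual descent variant, matching the Lasagne convention noted in the question's comment.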

ahmadreza9 · June 8, 2020, 9:09am

Thanks @iffiX. This is the most relevant answer to my question. I will try the register_backward_hook function to manually control the gradients of my network (grads).

N = states.shape[0]

loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N  # call it "loss"

grads = T.grad(loss, params)
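For the T.grad(loss, params) line specifically, the closest PyTorch call is probably torch.autograd.grad, which returns the gradients as a tuple instead of writing them into .grad. A minimal sketch with made-up parameters w and b and a toy loss:

```python
import torch

# Stand-ins for the network parameters.
w = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)
params = [w, b]

# Toy scalar loss: d(loss)/dw = 2 * w, d(loss)/db = 3.
loss = (w ** 2).sum() + 3 * b

# One gradient tensor per entry of params, in the same order.
grads = torch.autograd.grad(loss, params)
print(grads)
```

The resulting tuple can then be passed to a hand-written update rule, much like grads is passed to rmsprop_updates in the Theano code.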