If you look at theano.function (http://deeplearning.net/software/theano/library/compile/function.html), Theano can compile functions. I am sure PyTorch has something similar, but what is the equivalent of the `updates` parameter of `theano.function` in PyTorch? (I saw this "Porting code from Theano/Lasagne to PyTorch" post, but it does not cover updates.)
Let's take an example:
def rmsprop_updates(grads, params, stepsize, rho=0.9, epsilon=1e-9):
    updates = []
    for param, grad in zip(params, grads):
        accum = theano.shared(np.zeros(param.get_value(borrow=True).shape, dtype=param.dtype))
        accum_new = rho * accum + (1 - rho) * grad ** 2
        updates.append((accum, accum_new))  # persist the running average
        updates.append((param, param + (stepsize * grad / T.sqrt(accum_new + epsilon))))
        # lasagne has '-' after param
    return updates
updates = rmsprop_updates(grads, params, self.lr_rate, self.rms_rho, self.rms_eps)
N = states.shape[0]
loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N
self._train_fn = theano.function([states, actions, values], loss, updates=updates)
Instead of compiling a function from a graph description, the programming paradigm used by PyTorch is to combine modules as building blocks. There is only one central object in PyTorch: the tensor. A tensor is at once the container for data, the interface for low-level device handling, and the carrier of gradient flow. Therefore, instead of defining a graph up front, you construct the graph dynamically with ordinary operators like `+`, `-`, `*`, `/`.
f = theano.function([x], 2*x)
is equivalent to:

# suppose you already have some tensor ``x``
# or create it
x = torch.zeros([100, 100], device="cuda:0", dtype=torch.int)

def f(x):
    return x * 2
Moreover, since PyTorch is dynamic, you may sometimes want a just-in-time compilation utility to remove the cost of constructing the graph in Python on every call; you can do that with torch.jit.
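A minimal sketch of that JIT utility, assuming the `2*x` example from above (`double` is just an illustrative name):

```python
import torch

@torch.jit.script  # compiles the function to TorchScript once
def double(x):
    return x * 2

x = torch.zeros([4, 4], dtype=torch.int)
y = double(x)  # runs the compiled graph
```

Note that `torch.jit.script` infers `torch.Tensor` for unannotated arguments, so the decorated function can only be called with tensors afterwards.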
I know the paradigm, and your simple example is explained in the first hyperlink of my comment. I want to create a custom optimizer like this (Custom Optimizer in PyTorch), so I should update the weights like this:
weight_update = smth_with_good_dimensions
param.data.sub_(weight_update * learning_rate)
Now, how can we have a function in PyTorch like theano.function for my rmsprop update?
As @iffiX mentioned, we do not compile a graph in PyTorch; it is constructed as data flows through the forward pass. So, for an optimizer, you just define a class that accepts the parameters of a model (an nn.Module) and implements a step function for them. Literally, based on your code, you only need to remove the last line (self._train_fn = ...), as there is no compile stage in PyTorch.
Here is the SGD implementation:
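Stripped of bookkeeping, an SGD step boils down to a sketch like this (assuming gradients were already populated by a `backward()` call; the model and `lr` value are illustrative):

```python
import torch
from torch import nn

model = nn.Linear(2, 1)  # any nn.Module works
loss = model(torch.ones(1, 2)).sum()
loss.backward()          # populates p.grad for each parameter

lr = 0.1
with torch.no_grad():    # the update itself must not be tracked
    for p in model.parameters():
        p -= lr * p.grad # in-place SGD step
        p.grad = None    # reset for the next iteration
```

This loop is exactly what `theano.function`'s `updates` list was doing for you, except it runs eagerly instead of being compiled.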
From the documentation of Theano, I guess you are asking for manual control over gradient updates?
For simple modules, use register_backward_hook.
Or control the gradients of each of your inputs directly using Tensor.register_hook.
Optimizers are implemented in torch.optim, including RMSprop, Adam, SGD, etc.
And indeed, if you want a completely new optimizer rather than controlling gradients by hand, you should inherit from torch.optim.Optimizer and implement the step method (remember to wrap it with @torch.no_grad()).
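A sketch of your rmsprop update rewritten as such a subclass (the class name and hyperparameter defaults are illustrative; this mirrors the accumulator logic from the Theano snippet above, using per-parameter state instead of shared variables):

```python
import torch

class SimpleRMSprop(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-2, rho=0.9, eps=1e-9):
        defaults = dict(lr=lr, rho=rho, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()  # the update must not build a graph
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "accum" not in state:  # lazily create the running average
                    state["accum"] = torch.zeros_like(p)
                accum = state["accum"]
                # accum = rho * accum + (1 - rho) * grad ** 2
                accum.mul_(group["rho"]).addcmul_(p.grad, p.grad, value=1 - group["rho"])
                # p = p - lr * grad / sqrt(accum + eps)
                p.addcdiv_(p.grad, (accum + group["eps"]).sqrt(), value=-group["lr"])
```

Usage is the same as any built-in optimizer: `opt = SimpleRMSprop(model.parameters())`, then `loss.backward()` followed by `opt.step()` and `opt.zero_grad()`.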
Thanks @iffiX. This comment is the most relevant answer to my question. I will test the register_backward_hook function to manually control the gradients of my network (grads).
N = states.shape[0]
loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N # call it "loss"
grads = T.grad(loss, params)
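For the per-tensor route mentioned above, a minimal sketch of manual gradient control with `Tensor.register_hook` (the scaling factor is just an example):

```python
import torch

x = torch.ones(3, requires_grad=True)
# scale gradients flowing into x by 0.5 during backward
handle = x.register_hook(lambda grad: grad * 0.5)

y = (x * 2).sum()
y.backward()
# without the hook x.grad would be [2., 2., 2.]; with it, [1., 1., 1.]
handle.remove()  # detach the hook when no longer needed
```

The hook receives the incoming gradient and whatever it returns replaces it, which is the closest eager-mode analogue to rewriting `grads` before your update step.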