Instead of compiling a function from a string, the programming paradigm used by PyTorch treats modules as building blocks for constructing a function. There is really only one central concept in PyTorch: the tensor. A tensor is the container for data, the interface for low-level device handling, and the carrier of gradient flow. Therefore, instead of defining a graph up front, you construct a graph dynamically using ordinary arithmetic operators like +, -, *, /
e.g.:
f = theano.function([x], 2*x)
is equivalent to:
# suppose you have some tensor ``x``
# or create it
x = torch.zeros([100,100], device="cuda:0", dtype=torch.int)
def f(input):
    return input * 2
print(f(x))
Moreover, since PyTorch is dynamic, users sometimes need a just-in-time compilation utility to remove the Python-side tensor construction overhead; you can do that with torch.jit
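As a minimal sketch of what that looks like, torch.jit.script can compile a plain Python function (the function name ``double`` here is illustrative, not from the original post):

```python
import torch

# Compile a simple function to TorchScript; the compiled version skips
# Python interpreter overhead on each call.
@torch.jit.script
def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2

x = torch.zeros(100, 100, dtype=torch.int)
y = double(x)
```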
I know the paradigm, and your simple example is explained in the first hyperlink of my comment. I want to create a custom optimizer like this (Custom Optimizer in PyTorch), so I should update weights like this:
As @iffiX mentioned, we do not compile a graph in PyTorch; it is constructed as data flows through the forward pass. So, for an optimizer, you just define a class that accepts the parameters of a model (an nn.Module) and implements a step function for them. Literally, based on your code, you only need to remove the last line self._train_fn = ..., as there is no compile stage in PyTorch.
Here is the SGD implementation: https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
From the documentation of Theano, I guess you are asking for manual control over gradient updates?
then:
For simple modules, use register_backward_hook
Or control gradients on each of your input directly using: register_hook
Optimizers are implemented in torch.optim, including RMSProp, Adam, SGD, etc.
And indeed, if you want a completely new optimizer rather than controlling gradients by hand, you should inherit from torch.optim.Optimizer and implement the step method (remember to wrap it with @torch.no_grad()).
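A minimal sketch of that structure, assuming plain gradient descent (this is not the library's SGD implementation, just the required skeleton; the class name PlainSGD is made up):

```python
import torch

class PlainSGD(torch.optim.Optimizer):
    """Bare-bones optimizer: p <- p - lr * grad."""

    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()  # step must not be tracked by autograd
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # In-place update using the stored learning rate
                    p.add_(p.grad, alpha=-group["lr"])

# Usage: one gradient-descent step on a toy parameter
w = torch.ones(2, requires_grad=True)
opt = PlainSGD([w], lr=0.1)
loss = (w ** 2).sum()
loss.backward()     # grad of sum(w^2) is 2*w = 2.0
opt.step()          # w becomes 1.0 - 0.1 * 2.0 = 0.8
```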
Thanks @iffiX. This comment is the most relevant answer to my question. I will test the register_backward_hook function to manually control the gradients of my network.
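For reference, the tensor-level variant mentioned above (register_hook) can be sketched like this; the halving factor is just an arbitrary example of modifying a gradient in flight:

```python
import torch

x = torch.ones(3, requires_grad=True)
# The hook receives the incoming gradient and may return a replacement
x.register_hook(lambda grad: grad * 0.5)

y = (x * 2).sum()
y.backward()
# Raw gradient would be 2.0 per entry; the hook halves it to 1.0
```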
N = states.shape[0]
loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N # call it "loss"
grads = T.grad(loss, params)
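The Theano snippet above can be translated to PyTorch roughly as follows; all shapes and the ``policy`` tensor are hypothetical stand-ins for the original ``params``, ``prob_act``, and ``values``:

```python
import torch

# Hypothetical stand-in data: N samples, A discrete actions
N, A = 4, 3
policy = torch.randn(N, A, requires_grad=True)  # stand-in for params
prob_act = torch.softmax(policy, dim=1)
actions = torch.tensor([0, 1, 2, 0])
values = torch.randn(N)

# Same expression as the Theano version, with T.arange -> torch.arange
loss = torch.log(prob_act[torch.arange(N), actions]).dot(values) / N

# Analogue of T.grad: gradients of loss w.r.t. the parameters
grads = torch.autograd.grad(loss, [policy])
```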