Is it possible to just use arbitrary differentiable/supported functions to create other functions without having to implement their backward as described in examples?
One thing I really liked about TF is how you can just create an arbitrary compute graph of differentiable pieces. It’s not obvious how to do that here, unless I’m missing something?
Let’s say I want to quickly implement a GELU: y = 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
Can I just do that using the differentiable tanh() and pow() ? Or will I have to create a special class and describe backward()?
You need to define backward() when you implement your own autograd.Function classes, not for your own Module classes. The difference is that code in Module.forward operates on Variables using differentiable operations like F.tanh and other Modules, so autograd can derive the gradients itself; you only need a new autograd.Function subclass if you want a totally new operation that can’t be written in terms of existing differentiable ops. Defining an autograd.Function instead of composing existing operations can also be worthwhile when the forward or backward pass would see a major performance benefit from a custom C implementation.
Ultimately everything you use in a module is defined in terms of autograd.Functions (e.g. F.tanh implements forward and backward), but you rarely have to define one yourself.
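As a concrete sketch of the first case (written against the current torch API, where torch.tanh operates directly on tensors; on old versions you’d use F.tanh on Variables instead), GELU can be just a plain Python function built from differentiable ops, and autograd composes the backward pass for you:

```python
import math
import torch

def gelu(x):
    # tanh-approximation GELU, composed only of differentiable torch ops;
    # no custom backward() is needed -- autograd derives the gradient.
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))

x = torch.randn(4, requires_grad=True)
y = gelu(x).sum()
y.backward()  # gradients flow through mul/add/pow/tanh automatically
print(x.grad.shape)  # same shape as x
```

Note that math.sqrt returns a plain Python float, which sidesteps the numpy scalar issue discussed further down this thread.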
^CProcess Process-1:
Traceback (most recent call last):
  File "/home/rrr/anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/rrr/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 26, in _worker_loop
    r = index_queue.get()
  File "/home/rrr/anaconda/lib/python2.7/multiprocessing/queues.py", line 378, in get
    return recv()
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 21, in recv
    buf = self.recv_bytes()
KeyboardInterrupt
Traceback (most recent call last):
  File "main.py", line 121, in <module>
    output = net(input)
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 79, in forward
    x = gelu(self.fc1(x))
  File "main.py", line 61, in gelu
    return 0.5 * x * (1 + F.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x*x*x)))
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/autograd/variable.py", line 818, in __iter__
    return iter(map(lambda i: self[i], range(self.size(0))))
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/autograd/variable.py", line 818, in <lambda>
    return iter(map(lambda i: self[i], range(self.size(0))))
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/autograd/variable.py", line 68, in __getitem__
    return Index(key)(self)
  File "/home/rrr/anaconda/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
    result = i.index(self.index)
KeyboardInterrupt
I think that’s an unrelated problem. You sent a tensor from one process to another, but the sender died before the receiver managed to take it out of the queue. You either have to ensure that the sender stays alive as long as its tensors are in a queue, or switch to the file_system sharing strategy (not really recommended).
I looked into the snippet, and it seems numpy is trying to be overly smart. np.sqrt(2 / np.pi) is a numpy.float64 object, not a regular float. Since it’s the first operand of the multiplication and implements __mul__, numpy gets to decide what happens, and it starts treating the Variable like a sequence. But Variables aren’t regular sequences, because you can keep indexing them as many times as you want, so numpy keeps adding dims until it hits its nesting limit and returns a very deeply nested list of single-element Variables.
If you create a regular float object out of it, or reverse the order (i.e. put the constant after the expression with x), the result should be ok.
I think the only real fix we can do is to add scalar types to torch. We’ve been talking about that for some time now, and it’s probably going to happen, but likely only in the somewhat distant future.