Is there anything in PyTorch similar to theano.function()?
```python
>>> import theano
>>> x = theano.tensor.dscalar()
>>> f = theano.function([x], 2*x)
>>> f(4)
array(8.0)
```
```python
>>> import torch
>>> def f(x):
...     return 2 * torch.DoubleTensor([x])
...
>>> f(4)
tensor([8.], dtype=torch.float64)
```
@Tony-Y thank you for answering my question, but your answer is not quite what I was expecting, maybe because I used too simple an example to present theano.function(). A more detailed explanation follows:
```
def function(inputs, outputs=None, mode=None, updates=None, …):
    """
    Return a callable object that will calculate outputs from inputs.

    Parameters
    ----------
    inputs : list of either Variable or In instances.
        Function parameters, these are not allowed to be shared variables.
    outputs : list or dict of Variables or Out instances.
        If it is a dict, the keys must be strings. Expressions to compute.
    mode : string or Mode instance.
        Compilation mode.
    updates : iterable over pairs (shared_variable, new_expression). List, tuple or OrderedDict.
        Updates the values for Shared Variable inputs according to these expressions.
```
```python
>>> import torch
>>> import torch.nn as nn
>>> class F(nn.Module):
...     def __init__(self):
...         super(F, self).__init__()
...     def forward(self, x):
...         return 2 * x
...
>>> f = F()
>>> a = torch.DoubleTensor([4])
>>> f(a)
tensor([8.], dtype=torch.float64)
```
@Tony-Y thank you again for the answer, but what you gave is a static solution, while theano.function() is a dynamic one in the following sense:
We have data and a list of expressions ['expression_01', 'expression_02', 'expression_03', …], for example:
expression_01 = 2*x
expression_02 = 3^x
expression_03 = 3x^2 - 5
…
When the following is given:
```python
x = theano.tensor.dscalar()
list_of_expression = ['expression_01']
f = theano.function([x], list_of_expression, …)
f(2)
array(4.0)
```
But in another scenario we have:
```python
x = theano.tensor.dscalar()
list_of_expression = ['expression_01', 'expression_02', 'expression_03']
f = theano.function([x], list_of_expression, …)
f(2)
[array(4.0), array(9.0), array(7.0)]
```
This is how I understand theano.function(): you just adjust list_of_expression to your needs, and theano.function() does the rest for you dynamically. I hope this makes it a little clearer for you too.
```python
>>> expressions = {
...     'expression_01': lambda x: 2*x,
...     'expression_02': lambda x: 3**x,
...     'expression_03': lambda x: 3*x**2 - 5}
>>>
>>> import torch
>>> import torch.nn as nn
>>> class F(nn.Module):
...     def __init__(self, expression):
...         super(F, self).__init__()
...         self.expression = expression
...     def forward(self, x):
...         return self.expression(x)
...
>>> a = torch.DoubleTensor([4])
>>> f = F(expressions['expression_01'])
>>> f(a)
tensor([8.], dtype=torch.float64)
>>> f = F(expressions['expression_02'])
>>> f(a)
tensor([81.], dtype=torch.float64)
>>> f = F(expressions['expression_03'])
>>> f(a)
tensor([43.], dtype=torch.float64)
```
@Tony-Y thank you very much for your time and help. This last answer is the closest to what I was expecting. The only requirement not yet fulfilled is that the expressions need to be fed one by one and not en bloc, but I believe that could be managed somehow in the init part with a for loop, looping over all the expressions and assigning them in the following fashion:
```python
for i, expression in enumerate(list_of_expressions):
    self.expression_0i = list_of_expression['expression_0i']
```
and then in the return part we have:
```python
return self.expression_0i(x)
```
I mean something like this in general. Do you think it is doable?
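One way to sketch the loop idea is to keep the selected expressions in a list and evaluate all of them in `forward`, which mirrors passing a list of outputs to theano.function. This is a minimal sketch, assuming the expressions are plain Python callables stored in a dict as in the previous reply; the class name `MultiF` and the argument `names` are hypothetical.

```python
import torch
import torch.nn as nn

# The expressions from the previous reply, as plain Python callables.
expressions = {
    'expression_01': lambda x: 2 * x,
    'expression_02': lambda x: 3 ** x,
    'expression_03': lambda x: 3 * x ** 2 - 5,
}


class MultiF(nn.Module):
    """Hypothetical module that evaluates every selected expression on the same input."""

    def __init__(self, names):
        super().__init__()
        # The lambdas hold no learnable parameters, so a plain Python list
        # is enough here (nn.ModuleList is only needed for sub-modules).
        self.exprs = [expressions[name] for name in names]

    def forward(self, x):
        # Evaluate each selected expression in turn and return all results.
        return [expr(x) for expr in self.exprs]


x = torch.tensor(2.0, dtype=torch.float64)
print(MultiF(['expression_01'])(x))  # a list with one tensor
print(MultiF(['expression_01', 'expression_02', 'expression_03'])(x))  # three tensors
```

Choosing the expressions is then just a matter of changing the list of names passed to the constructor.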
The core difference between PyTorch and Theano you're wondering about here is that in Theano you create a symbolic graph that you then feed into `function` to have it compiled into a function you can call, while in PyTorch you write your calculation and PyTorch runs it as you write.
Modules are decidedly only there to hold learnable parameters / state - see Jeremy Howard’s recently added tutorial.
Now, you could assemble lines of Python and then `eval` them to form your function, but quite likely you're not making the best use of PyTorch that way. One of the things people like about PyTorch is that you don't have the "create graph -> compile -> run" workflow.
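To illustrate the difference, here is a small sketch using expression_03 (3x^2 - 5) from earlier: the value exists as soon as the line runs, and autograd records the operations as they happen, so the derivative is available without any compile step.

```python
import torch

# No symbolic variable and no compile step: x holds a concrete value.
x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)

# The expression is evaluated immediately, line by line.
y = 3 * x ** 2 - 5
print(y.item())  # 7.0

# Autograd recorded the operations above, so dy/dx = 6x
# can be computed right away.
y.backward()
print(x.grad.item())  # 12.0
```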
Best regards
Thomas
@tom Hi Thomas, and thank you very much for reinforcing my knowledge about the differences between Theano/Lasagne and PyTorch.
I am an engineer and my main goal is to find a solution for my project. I am not writing code from scratch in PyTorch. As you have noticed, my problem now is porting code from Theano/Lasagne to PyTorch. I am doing this because PyTorch is easier to debug and has more support, which I am experiencing myself right now by communicating with you. During the port I would prefer to change the original code as little as possible, only as much as needed to make it easier to debug.
I understand that the workflow philosophy behind PyTorch is different from that of Theano/Lasagne. To tell you the truth, the PyTorch workflow is the one I am used to, and it took me a while to understand the Theano/Lasagne workflow. Sometimes (I mean most of the time :-)) people do not have the luxury of being picky; they just have to go with the flow. In my case, too, I do not have the luxury of being picky in the sense of "no, this is not exactly PyTorch, that is half Theano, half PyTorch". I mean no offense to anyone who is strict about not crossing the borders between the two libraries. My main goal is to complete my project by any reasonable means.
I am trying to explain that I know this is not the best way to write code.
Please know that I very much appreciate the help and advice offered by all of you.
Cheers.
Ergnoor
Oh, sorry, I didn't mean to give the impression of telling you how or how not to use PyTorch. I had the impression that you might be looking for something more elaborate, because the typical PyTorch translation of that type of code can look suspiciously simple.
When I last did similar things, I just wrote all the steps between the input variables (the ones later passed to the Theano `function` call) and the output as one regular Python function using PyTorch arithmetic. This looks a lot like Tony's first example (except that the DoubleTensor constructor is probably not a good idea; you'd just use x there). In a way this is very similar to what Theano does, except that Python's function declaration takes the place of Theano's `function` call.
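Applied to the multi-output example from earlier in the thread, that approach might look like the following minimal sketch: the argument plays the role of Theano's inputs and the returned list plays the role of the list of outputs. The Theano lines are only in comments for comparison.

```python
import torch


# Theano, for comparison:
#   x = theano.tensor.dscalar()
#   f = theano.function([x], [2*x, 3**x, 3*x**2 - 5])
# In PyTorch the Python function definition takes the place of the
# theano.function call: arguments are the inputs, the return value the outputs.
def f(x):
    return [2 * x, 3 ** x, 3 * x ** 2 - 5]


# Pass a tensor directly instead of wrapping a number in DoubleTensor([...]).
x = torch.tensor(2.0, dtype=torch.float64)
print([t.item() for t in f(x)])  # [4.0, 9.0, 7.0]
```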
If there is code you find particularly difficult to translate, I’m sure we’ll try to help you out.
Best regards
Thomas
@tom Hi Thomas, I hope you are doing all right.
I have a couple of questions regarding porting the code.
The Theano/Lasagne version:
```python
from collections import OrderedDict

import theano.tensor as T
from lasagne.updates import get_or_compute_grads


def geoSGD(loss_or_grads, params, learning_rate):
    """Geodesic Stochastic Gradient Descent (geoSGD) updates

    Generates update expressions of the form:

    * param := param - learning_rate * gradient

    Parameters
    ----------
    loss_or_grads : symbolic expression or list of expressions
        A scalar loss expression, or a list of gradient expressions
    params : list of shared variables
        The variables to generate update expressions for (in our case they are: hh_W_u and hh_W_v)
    learning_rate : float or symbolic scalar
        The learning rate controlling the size of update steps

    Returns
    -------
    OrderedDict
        A dictionary mapping each parameter to its update expression
    """
    grads = get_or_compute_grads(loss_or_grads, params)
    updates = OrderedDict()
    lr = learning_rate
    for param, grad in zip(params, grads):
        W = param.get_value(borrow=True)
        G = grad
        A = T.dot(G, W.T) - T.dot(W, G.T)  # A = G * M.T - M * G.T
        I = T.identity_like(A)
        cayley = T.dot(T.nlinalg.matrix_inverse(I + (lr/2.)*A), I - (lr/2.)*A)  # (I + eta/2 * A)**(-1) * (I - eta/2 * A)
        updates[param] = T.dot(cayley, W)  # cayley * M = (I + eta/2 * A)**(-1) * (I - eta/2 * A) * M
    return updates
```
My PyTorch version:
```python
from collections import OrderedDict

import torch


def geoSGD(outputs, params, learning_rate):
    # had to add retain_graph=True
    grads_ = torch.autograd.grad(outputs, params, retain_graph=True, allow_unused=True)
    updates = OrderedDict()
    lr = learning_rate
    for param, grad_ in zip(params, grads_):
        W = param.double()
        G = grad_.double()
        A = torch.mm(G, W.transpose(0, 1)) - torch.mm(W, G.transpose(0, 1))
        # sanity check: A should be skew-symmetric (A.T == -A)
        if torch.all(torch.eq(A.transpose(0, 1), -A)):
            if torch.sum(abs(A.transpose(0, 1) + A)) == 0:
                print('The matrix A is skew symmetric')
        else:
            print('The matrix A is NOT skew symmetric')
        I = torch.eye(A.size()[0], A.size()[1]).double()
        cayley = torch.mm(torch.inverse(I + (lr/2.)*A), I - (lr/2.)*A)  # (I + eta/2 * A)**(-1) * (I - eta/2 * A)
        updates[param] = torch.mm(cayley, W)  # cayley * M = (I + eta/2 * A)**(-1) * (I - eta/2 * A) * M
    return updates
```
What I do not fully understand is whether loss_or_grads has to be replaced by outputs and, assuming my port is correct, how the grad() function in the PyTorch version knows which loss to use for its calculations, and what exactly outputs is in PyTorch.
Also, only if I do the following:
F(U, S, V) = A * (U * S * V) + B
is dF(U, S, V)/dU possible. Is there any way to get dF/dW, dF/dU, dF/dS and dF/dV simultaneously ('d' always means partial derivative)?
Thank you very much in advance for your time, help and understanding.
Cheers.
Ergnoor
Hello Ergnoor,
great.
If you call `backward` first, each parameter gets its `.grad` filled in; then you replace
```python
for param, grad_ in zip(params, grads_):
    W = param.double()
    G = grad_.double()
    ...
    updates[...] = ...
```
with
```python
with torch.no_grad():  # we don't actually want autograd in the gradient step
    for param in params:
        W = param.double()
        G = param.grad.double()
        ...
        param.copy_(torch.mm(cayley, W))  # instead of updates[param] = ...; copy_ writes in place and casts back to param's dtype
```
For several derivatives, passing all the variables to `grad` at the same time or using `backward` will do the right thing, too. Note that you can only take derivatives of scalar functions.
Best regards
Thomas
P.S.: If you use triple backticks ``` before and after your code, you’ll get all your code formatted. That makes it much nicer to look at.
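For the question about getting several partial derivatives at once: passing a list of tensors to `torch.autograd.grad` returns one gradient per tensor from a single backward pass. This is a sketch with made-up shapes and random values; `A`, `B`, `U`, `S` and `V` stand in for the matrices in the question, and `.sum()` is an assumed reduction that turns F into a scalar, since only scalar functions can be differentiated this way.

```python
import torch

torch.manual_seed(0)

# Hypothetical shapes, chosen only so that the matrix products line up.
A = torch.randn(4, 3, requires_grad=True)
U = torch.randn(3, 3, requires_grad=True)
S = torch.randn(3, 3, requires_grad=True)
V = torch.randn(3, 2, requires_grad=True)
B = torch.randn(4, 2)

# F(U, S, V) = A (U S V) + B, reduced to a scalar so it can be differentiated.
F = (A @ (U @ S @ V) + B).sum()

# One call returns dF/dA, dF/dU, dF/dS and dF/dV together.
dA, dU, dS, dV = torch.autograd.grad(F, [A, U, S, V])
print(dA.shape, dU.shape, dS.shape, dV.shape)

# Alternatively, F.backward() (on a freshly built F) would store the same
# gradients in A.grad, U.grad, S.grad and V.grad instead of returning them.
```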
@tom Hi Thomas, and YES I am trying to implement Orthogonal / Unitary RNN.
What do you mean by "backward"? Do you mean that the code should already have called `backward` before calling this optimizer, so that each `param` has its `.grad` computed? By the way, now I understand why only the `param`s need to be passed to a PyTorch optimizer.
I thank you very much again for your support.
Cheers.
Ergnoor
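Putting Thomas's reply together with this question, a minimal sketch of the sequence: compute a scalar loss, call `backward()` so each parameter gets its `.grad`, then apply the Cayley update in place under `torch.no_grad()`. The function name `geo_sgd_step`, the toy data and loss, and the 4×4 weight are assumptions for illustration, not the original project's code.

```python
import torch


def geo_sgd_step(params, lr):
    """Hypothetical in-place geoSGD step based on the Cayley transform.

    Assumes loss.backward() has already been called, so each param has .grad.
    """
    with torch.no_grad():  # the update itself should not be recorded by autograd
        for param in params:
            if param.grad is None:
                continue  # this parameter did not contribute to the loss
            W = param.double()
            G = param.grad.double()
            A = torch.mm(G, W.t()) - torch.mm(W, G.t())  # skew-symmetric: A.T == -A
            I = torch.eye(A.size(0), dtype=torch.float64)
            # Cayley transform: (I + lr/2 * A)^(-1) (I - lr/2 * A)
            cayley = torch.mm(torch.inverse(I + (lr / 2.0) * A), I - (lr / 2.0) * A)
            # copy_ writes the result into the parameter in place and casts
            # it back to the parameter's own dtype (float32 here)
            param.copy_(torch.mm(cayley, W))
            param.grad = None  # clear the gradient before the next backward()


# Usage with a toy problem (made-up data and loss, for illustration only).
torch.manual_seed(0)
W = torch.nn.Parameter(torch.nn.init.orthogonal_(torch.empty(4, 4)))
x = torch.randn(8, 4)
target = torch.randn(8, 4)
W_before = W.detach().clone()

loss = (torch.mm(x, W) - target).pow(2).sum()
loss.backward()            # fills W.grad
geo_sgd_step([W], lr=0.1)  # W is updated in place

# W should still be orthogonal up to float32 round-off
print((torch.mm(W, W.t()) - torch.eye(4)).abs().max().item())
```

Because the Cayley transform of a skew-symmetric matrix is orthogonal, multiplying an orthogonal W by it keeps W orthogonal, which is why only the parameters (with their `.grad`) need to be handed to the step.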
@tom Hi Thomas, I have another question. I tried torch.nn.init.orthogonal_() like this:
```python
import torch


# Python code to check
# whether a matrix is
# orthogonal or not
def isOrthogonalN(a, m, n) :
    if (m != n) :
        return False
    # Multiply A*A^t
    for i in range(0, n) :
        for j in range(0, n) :
            sum = 0
            for k in range(0, n) :
                # Since we are multiplying
                # with transpose of itself.
                # We use a[j][k] instead
                # of a[k][j]
                sum = sum + (a[i][k] *
                             a[j][k])
            if (i == j and sum != 1) :
                return False
            if (i != j and sum != 0) :
                return False
    return True


a = torch.empty(3, 3)
a = torch.nn.init.orthogonal_(a)
if (isOrthogonalN(a, len(a), len(a[0]))) :
    print ("Yes")
else :
    print ("No")
```
The answer I got was `No`.
When I tried it with:
```python
a = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
```
the answer was `Yes`.
What do you think I am doing wrong here? I did not write the `isOrthogonal` code myself; it is from the GeeksforGeeks site.
Is there any function in PyTorch that I can use to check for orthogonality of matrices?
Thank you in advance for your help.
Cheers.
Ergnoor
@tom Sorry Thomas, the code is as follows:
```python
import torch


# Python code to check
# whether a matrix is
# orthogonal or not
def isOrthogonalN(a, m, n) :
    if (m != n) :
        return False
    # Multiply A*A^t
    for i in range(0, n) :
        for j in range(0, n) :
            sum = 0
            for k in range(0, n) :
                # Since we are multiplying
                # with transpose of itself.
                # We use a[j][k] instead
                # of a[k][j]
                sum = sum + (a[i][k] *
                             a[j][k])
            if (i == j and sum != 1) :
                return False
            if (i != j and sum != 0) :
                return False
    return True


a = torch.empty(3, 3)
torch.nn.init.orthogonal_(a)
if (isOrthogonalN(a, len(a), len(a[0]))) :
    print ("Yes")
else :
    print ("No")
```
I think you're seeing numerical precision effects:
```python
import torch

a = torch.empty(3, 3)
a = torch.nn.init.orthogonal_(a)
almost_eye = torch.mm(a, a.t())
print((almost_eye - torch.eye(3)).abs().max().item())
```
gives something < 1e-6 or so.
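Since the earlier question asked for a PyTorch way to check orthogonality, the triple loop can be replaced by one comparison with `torch.allclose`, which tolerates exactly this kind of round-off. A minimal sketch; the helper name `is_orthogonal` and the default `atol=1e-6` are assumptions to adjust to your dtype.

```python
import torch


def is_orthogonal(a, atol=1e-6):
    """Hypothetical helper: True if the square matrix a satisfies a @ a.T ≈ I."""
    if a.dim() != 2 or a.size(0) != a.size(1):
        return False
    eye = torch.eye(a.size(0), dtype=a.dtype, device=a.device)
    # allclose accepts the ~1e-8 round-off visible in a @ a.T
    return torch.allclose(a @ a.t(), eye, atol=atol)


torch.manual_seed(0)
a = torch.nn.init.orthogonal_(torch.empty(3, 3))
print(is_orthogonal(a))                 # True
print(is_orthogonal(torch.ones(3, 3)))  # False
```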
Yes, I think so too, because when I ran this test:
```python
print(torch.mm(a, a.t()))
```
I got the following output:
```
tensor([[ 1.0000e+00,  3.7639e-08, -7.4154e-08],
        [ 3.7639e-08,  1.0000e+00, -2.8114e-08],
        [-7.4154e-08, -2.8114e-08,  1.0000e+00]])
```
@tom I replaced the line
```python
if (i != j and sum != 0) :
```
with
```python
if (i != j and sum > 1e-6) :
```
and it seems to work all right.
Thank you very much Thomas.
Cheers.
Ergnoor
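One caveat with the replaced line: `sum > 1e-6` only rejects positive off-diagonal values, so, for example, the singular matrix [[1, 0, 0], [-1, 0, 0], [0, 0, 1]] still passes, and the diagonal test `sum != 1` remains an exact comparison. Comparing the absolute deviation from the identity in both places avoids this; a sketch of the loop version, where the `tol` parameter is an assumption:

```python
import torch


def isOrthogonalN(a, m, n, tol=1e-6):
    """Loop check that the m x n matrix a satisfies a @ a.T == I up to tol."""
    if m != n:
        return False
    for i in range(n):
        for j in range(n):
            # entry (i, j) of a @ a.T, accumulated as a Python float
            s = sum(float(a[i][k]) * float(a[j][k]) for k in range(n))
            expected = 1.0 if i == j else 0.0
            # abs() catches negative deviations as well as positive ones
            if abs(s - expected) > tol:
                return False
    return True


torch.manual_seed(0)
a = torch.nn.init.orthogonal_(torch.empty(3, 3))
print(isOrthogonalN(a, len(a), len(a[0])))  # True

bad = torch.tensor([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(isOrthogonalN(bad, 3, 3))  # False
```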