Porting code from Theano/Lasagne to PyTorch

@tom Hi Thomas,

one strange question: is there a function in PyTorch that calculates the square root of the sum of squares of the elements of a vector / matrix?

Thank you very much in advance.

Cheers.

Ergnoor

Either you spell it out yourself or you could use torch.norm.
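For example, a minimal sketch (both lines compute the same thing - the square root of the sum of squares of all elements, i.e. the Frobenius / 2-norm):

import torch

x = torch.randn(3, 4)
n1 = torch.norm(x)                  # built-in norm
n2 = torch.sqrt(torch.sum(x ** 2))  # spelled out by hand
print(torch.allclose(n1, n2))       # True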

Best regards

Thomas

@tom Hi Thomas and thank you very much. Cheers. Ergnoor

@tom Hi Thomas, two more questions:

  1. Is there any way to label parameters, let us say for example ‘orthogonal’, ‘spectral’, ‘basis’, in order to be able to filter them later based on these labels, for example when you would like to optimize them separately or add losses to the cost function? I have used the names of the parameters for filtering purposes, but I am not sure whether that is efficient and elegant. I could send you the code to have a look, but not here, because it is quite a long one.
  2. I know that L2 regularization of the model parameters is already included in most optimizers; you only need to tune weight_decay, which is 0 by default.
    I also know that, on the other hand, L1 regularization is not included in the optimizers, and you have to compute it manually as follows (example given by Francisco Massa in another, similar question):
l1_crit = nn.L1Loss(size_average=False)
reg_loss = 0
for param in model.parameters():
    # L1Loss needs a target, so compare each parameter against zeros
    reg_loss += l1_crit(param, torch.zeros_like(param))

factor = 0.0005
loss += factor * reg_loss 

I also know that if you have to optimize two different types of parameters, let us say for example spectral and basis parameters, with the same optimizer (for example SGD) but with different learning rates, you can use the following:

optim.SGD([
                {'params': basis},
                {'params': spectral, 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)

Now my question is: what happens if you have a specific penalty for some parameters, let us say for the orthogonal parameters,

def orthogonality(x):
    '''
    Penalty for deviation from orthogonality:

    ||dot(x.T, x) - I||**2
    '''
    xTx = torch.mm(x.t(), x)
    # sum of squared deviations from the identity
    # (identity_like was the Theano/Lasagne helper; torch.eye is the PyTorch equivalent)
    eye = torch.eye(xTx.size(0), dtype=xTx.dtype, device=xTx.device)
    return torch.sum((xTx - eye) ** 2)

should I use the code given above for L1 to calculate the penalty and then add it to the loss function?

lorth_crit = orthogonality(x)
reg_loss = 0
...

If you have a better, more elegant solution, I would be very pleased to have it, provided that it does not create any problem for you.

Thank you in advance for your time and support.

Cheers.

Ergnoor

Hello Ergnoor,

re 1. There is nothing wrong with matching on parameter names, even if it isn’t the fastest op in the world - likely the time spent there is dwarfed by that spent running your network. You could also keep a list of them around (global, or in each module) if you preferred that.
re 2. There is nothing special about using L1 (or any other loss) with only part of your parameters. Just write what seems intuitive to you.
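As a minimal sketch combining both points (the ‘orthogonal’ substring is just an assumed naming convention, the factor is a placeholder value, and orthogonality is your penalty function from above):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # the label is simply made part of the parameter's attribute name
        self.orthogonal_weight = nn.Parameter(torch.randn(5, 5))
        self.basis_weight = nn.Parameter(torch.randn(5, 3))

    def forward(self, x):
        return x @ self.orthogonal_weight @ self.basis_weight

model = Net()
# filter by label: every parameter whose name contains 'orthogonal'
orthogonal_params = [p for n, p in model.named_parameters() if 'orthogonal' in n]

# data loss plus the orthogonality penalty on just those parameters
loss = model(torch.randn(2, 5)).pow(2).mean()
factor = 0.0005  # penalty weight, a placeholder value
loss = loss + factor * sum(orthogonality(p) for p in orthogonal_params)
loss.backward()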

Best regards

Thomas

@tom Hi Thomas and thank you very much for your suggestions.

I would like to have your opinion regarding the general structure of my code (approx. 100 lines), with some details only for the ‘Penalty’ part of it.

If you are willing to have a look at it, how can I present it to you?

Thank you in advance for your time and understanding.

Cheers.

Ergnoor

@tom

Hi Thomas, and hope you are doing alright.

I would like to have your opinion also regarding the PyTorch version of the geoSGD optimizer I just coded.

And as for the general structure of my code mentioned earlier, I forgot to let you know that I am mainly interested in your opinion on only roughly 20 lines; the rest of the code is there to give you a clearer idea of what is going on.

Thank you again for your time and understanding.

Cheers.

Ergnoor

Hello Ergnoor,

As a rule, I don’t do individual reviews of non-published code (on the forums).
If you have a GitHub link, I might take a look. Also, there might be others who have better hints for you, too, so I’d recommend just posting a link here…

Best regards

Thomas

@tom

Hi Thomas,

and thank you very much for your advice and suggestions. I appreciate them very much and I will make use of them.

Cheers.

Ergnoor

One question in regard to the activation functions in RNNs.

In the source code for torch.nn.modules.rnn there is this part:

_VF = torch._C._VariableFunctions
_rnn_impls = {
    'LSTM': _VF.lstm,
    'GRU': _VF.gru,
    'RNN_TANH': _VF.rnn_tanh,
    'RNN_RELU': _VF.rnn_relu,
}

Is there any way to add another, new activation function, like:

_VF = torch._C._VariableFunctions
_rnn_impls = {
    'LSTM': _VF.lstm,
    'GRU': _VF.gru,
    'RNN_TANH': _VF.rnn_tanh,
    'RNN_RELU': _VF.rnn_relu,
    'RNN_OPLU': _VF.rnn_oplu,
} 

Another question is: where can I find the source code for torch._C._VariableFunctions?

Cheers.

Ergnoor

Hi Ergnoor,

if you want to capture more people’s attention, I’d recommend starting a new forum topic when the focus of your questions moves to a new subject.

It won’t work quite as easily. The function you found dispatches to ATen’s C++ code, which is bound to torch._C._VariableFunctions with a fair amount of “magic” (a while ago I wrote a short guide on how to find the C++ source for a given function).
That said, we’re trying to make sure that the PyTorch JIT enables you to code RNNs yourself and have them run fast. To this end, you could use the RNN implementations used for benchmarking as inspiration for how to code up your own.
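As a rough illustration of that route, a minimal sketch of a hand-written RNN cell where you can plug in any activation (the activation here is just a stand-in for your own nonlinearity, not an existing _VF function):

import torch
import torch.nn as nn

class CustomRNNCell(nn.Module):
    '''Plain Elman-style cell with a user-chosen activation.'''
    def __init__(self, input_size, hidden_size, activation=torch.tanh):
        super().__init__()
        self.weight_ih = nn.Parameter(torch.randn(hidden_size, input_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.activation = activation

    def forward(self, x, h):
        # one time step: h' = act(x @ W_ih.T + h @ W_hh.T + b)
        return self.activation(x @ self.weight_ih.t() + h @ self.weight_hh.t() + self.bias)

cell = CustomRNNCell(4, 8)
h = torch.zeros(1, 8)
for x in torch.randn(5, 1, 4):  # loop over a sequence of length 5
    h = cell(x, h)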

Best regards

Thomas

@tom

Hi Thomas,

and thank you very much for your suggestion and help.

I will have a look at the links you provided, and if I have any uncertainties I will ask for help again.

Cheers.

Ergnoor