Autograd Function vs nn.Module?

Hi, I am new to PyTorch. I want to implement a customized layer and insert it between two LSTM layers within an RNN network.
The layer should take input h and do the following:

parameters = W*h + bias # W and bias are the layer's weights
a = parameters[0:x]
b = parameters[x:2*x]
k = parameters[2*x:]
return some_complicated_function(a, b, k)

It seems that both autograd Function and nn.Module can be used to build customized layers.
My questions are:

  1. What are the differences between them in a single-layer case?
  2. An autograd Function usually takes weights as input arguments. Can it store weights internally?
  3. Which one should I pick for my implementation?
  4. When do I need to specify a backward function if gradients are all computed automatically?

This post, "Difference of methods between torch.nn and functional", should answer most of your questions.

2: I would say nn.Module, since you have parameters to store.
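For instance, here is a minimal sketch of your layer as an nn.Module. This is just one way to write it: `some_complicated_function` is a placeholder assumed to be built from differentiable torch ops, and the `hidden_size`/`x` names are my own. The W and bias you described are stored internally via nn.Linear, which answers your question 2.

```python
import torch
import torch.nn as nn

def some_complicated_function(a, b, k):
    # Placeholder for your real function; assumed to consist
    # of differentiable torch operations.
    return torch.tanh(a) * b + k

class MyLayer(nn.Module):
    def __init__(self, hidden_size, x):
        super().__init__()
        self.x = x
        # W and the bias are stored internally as parameters of the module
        self.linear = nn.Linear(hidden_size, 3 * x)

    def forward(self, h):
        parameters = self.linear(h)              # W*h + bias
        a = parameters[..., 0:self.x]
        b = parameters[..., self.x:2 * self.x]
        k = parameters[..., 2 * self.x:]
        return some_complicated_function(a, b, k)

layer = MyLayer(hidden_size=8, x=4)
out = layer(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```

Since every operation inside forward is differentiable, you never write a backward function here; autograd handles it.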
4: You need to specify the backward function if you implement a Function, because it works directly on Tensors outside of autograd's tracking. nn.Modules, on the other hand, are composed of operations autograd already knows how to differentiate, so their backward pass is computed automatically.
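To illustrate when you do write a backward, here is a toy autograd Function (using the current staticmethod-style API, which differs slightly from the older one this thread predates):

```python
import torch
from torch.autograd import Function

class Square(Function):
    """Toy example: y = x**2, with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        # Save the input; we need it to compute the gradient later.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: dL/dx = dL/dy * dy/dx = grad_output * 2x
        return grad_output * 2 * x

x = torch.tensor([3.0], requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor([6.])
```

For your use case, if `some_complicated_function` can be expressed with existing torch operations, an nn.Module alone is enough and no backward is needed.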