I have a follow-up question regarding this topic.
Suppose I define the parameters using paramDic as shown below. During backpropagation, does the error properly propagate from the nth parameter back toward the 1st parameter in the dictionary, or is the gradient of the error w.r.t. each parameter computed independently?
What I want is to define a tensor parameter for each layer so that, during `loss.backward()`, the gradient of the error w.r.t. these parameters is propagated from the last parameter back toward the earlier ones.
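For reference, here is a minimal sketch of what I mean. The names `MyModel` and the `"w{i}"` keys are just placeholders; I'm assuming `paramDic` is an `nn.ParameterDict` whose entries are applied one after another:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, dims):
        super().__init__()
        # One tensor parameter per layer, keyed "w0", "w1", ...
        self.paramDic = nn.ParameterDict({
            f"w{i}": nn.Parameter(torch.randn(dims[i], dims[i + 1]))
            for i in range(len(dims) - 1)
        })

    def forward(self, x):
        # The parameters are used sequentially: the output of one
        # matmul feeds into the next one
        for i in range(len(self.paramDic)):
            x = x @ self.paramDic[f"w{i}"]
        return x
```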
The gradient calculation depends on the actual computation: if your computation graph uses the parameters sequentially (as seems to be the case in your example), Autograd will calculate the gradients in the backward pass via the chain rule, multiplying through the downstream parameters rather than treating each parameter independently.
Your example would thus behave as if you were calling e.g. nn.Linear layers sequentially.
Each internal weight (and bias) parameter would receive its corresponding gradient, and Autograd would apply the chain rule in the backward pass.
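You could verify this with a quick backward pass and check that every parameter received a gradient (a minimal sketch reusing the hypothetical `MyModel` from the question above):

```python
model = MyModel([4, 8, 8, 2])
x = torch.randn(3, 4)
loss = model(x).sum()
loss.backward()

# Every parameter in the dict gets a gradient; Autograd chained the
# matmuls backward from the last parameter toward the first one
for name, p in model.paramDic.items():
    print(name, p.grad.shape)
```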