Collection of PyTorch Tips for Implementing Fundamental Architectures

Hi,

Thanks for creating PyTorch!

I want to use PyTorch to implement different kinds of architectures.

It seems that feed-forward nets are simple to implement.

However, RNNs require a separate hidden-state variable apart from the layers themselves. Could you provide references for this design? What needs to be done with those hidden variables, and why not just pass the output of a layer straight back in as the next input?
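To make the question concrete, here is a minimal sketch of the pattern I mean, assuming a single-layer `nn.RNNCell`; the class and variable names are my own, not from any real codebase:

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.RNNCell(input_size, hidden_size)

    def forward(self, seq, hidden):
        # seq: (seq_len, batch, input_size); hidden: (batch, hidden_size)
        outputs = []
        for x_t in seq:
            # the hidden state is threaded through each time step explicitly,
            # separate from the layer's own parameters
            hidden = self.cell(x_t, hidden)
            outputs.append(hidden)
        return torch.stack(outputs), hidden
```

It is this explicit threading of `hidden` that I would like to understand the rationale for.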

Do you have any other references, e.g. why the gradients need to be set to zero in the training loop, or other specific details?
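For context, this is the pattern I am asking about, as a minimal sketch with a placeholder model and data:

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()        # .grad buffers accumulate, so they are cleared each step
    loss = loss_fn(model(x), y)
    loss.backward()              # accumulates gradients into each parameter's .grad
    optimizer.step()
```

I understand *that* `zero_grad()` is needed because `backward()` accumulates rather than overwrites, but I would like a reference explaining why accumulation was chosen as the default.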

How does PyTorch handle autodiff when taking views of a tensor?
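As a concrete example of the case I mean (a sketch I wrote, not taken from any codebase):

```python
import torch

x = torch.randn(4, 4, requires_grad=True)
v = x.view(16)         # v shares storage with x
loss = (v ** 2).sum()
loss.backward()        # gradients flow back through the view into x
print(x.grad.shape)    # torch.Size([4, 4])
```

I am curious how the autograd graph tracks the view relationship here, especially when the view is later modified in place.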

I am looking at several PyTorch code bases, and almost always there is some niche feature that hasn't been introduced anywhere.

Even loosely related links are much appreciated. PyTorch only has one autodiff-specific paper; is there a paper that explains the design plus its internals?

Thanks for taking the time to help me!

Cheers :slight_smile: