How to backpropagate a Transformer?

Hello, everyone.

In every source file implementing attention and Transformers that I’ve looked at,
there is no backward function inside the Transformer class.

Why is there no backward method in the Transformer class?

Also, how is the Transformer backpropagated?
Will the PyTorch autograd engine backpropagate through the encoder and the decoder together, up through all layers?
As I mentioned, I wonder how this works without a backward method being implemented in the Transformer class.

Thank you in advance

Autograd will use the backward methods of each submodule called in nn.Transformer.forward to calculate the gradients, so the nn.Transformer module doesn’t need to implement a custom backward method itself.
You could check these submodules to see how their backward methods are defined (i.e. whether they use custom ones, or are composed of other PyTorch operations whose backwards are already defined).
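To illustrate, here is a minimal sketch (hyperparameters chosen arbitrarily for the example) showing that calling `loss.backward()` on an `nn.Transformer` output populates gradients for all parameters, with no custom backward anywhere in the module:

```python
import torch
import torch.nn as nn

# Small Transformer; autograd records every operation in forward,
# so no custom backward method is needed on the module itself.
model = nn.Transformer(d_model=16, nhead=2,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=32)

# Default layout is (seq_len, batch, d_model)
src = torch.randn(5, 3, 16)
tgt = torch.randn(4, 3, 16)

out = model(src, tgt)
loss = out.sum()
loss.backward()  # autograd traverses encoder and decoder together

# Every parameter now has a gradient
print(all(p.grad is not None for p in model.parameters()))
```

The single `loss.backward()` call walks the recorded graph back through the decoder layers, the cross-attention into the encoder output, and then the encoder layers, using the backward of each elementary operation along the way.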

Dear ptrblck:

Thank you for your help and your kind explanation.
Have a nice week and see you again.
Take care