I am starting to use nn.TransformerEncoder for some experiments and was wondering if there is a way to obtain the outputs and attention weights from the intermediate layers?
For example, Table 7 in the BERT paper studies the feature-extraction capabilities of BERT and uses outputs from intermediate layers. The attention weights would also be helpful for analyzing the results.
Hence, I was wondering if there is an easy way to obtain both.
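
For what it's worth, the closest I have come up with is to loop over encoder.layers manually and call each layer's self_attn submodule a second time with need_weights=True. This is just a rough sketch (it assumes no masks, eval mode, and that the duplicated attention call matches what the layer computes internally), so I am not sure it is correct or idiomatic:

```python
import torch
import torch.nn as nn

d_model, nhead, num_layers = 512, 8, 6
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
encoder.eval()  # disable dropout so the extra attention call is deterministic

src = torch.rand(10, 32, d_model)  # (seq_len, batch, d_model)

hidden_states = []  # output of every intermediate layer
attn_weights = []   # self-attention weights of every layer

with torch.no_grad():
    x = src
    for layer in encoder.layers:
        # Call the layer's nn.MultiheadAttention submodule directly so we
        # can request the weights; this re-runs attention purely for
        # inspection and does not pass any masks.
        _, weights = layer.self_attn(x, x, x, need_weights=True)
        attn_weights.append(weights)  # (batch, seq_len, seq_len), averaged over heads
        x = layer(x)                  # the layer's normal forward pass
        hidden_states.append(x)
```

Alternatively, I suppose registering forward hooks on each layer would capture the intermediate outputs, and a hook on layer.self_attn might capture the weights too, depending on whether the layer requests them internally. Is there a cleaner, supported way to do this?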