Here’s the traceback that I’ve been able to get:
Traceback (most recent call last):
  File "./main.py", line 27, in <module>
    main()
  File "./main.py", line 20, in main
    output = model(data.long())
  File "C:\Users\user1\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\user1\Desktop\github\transformer_implementation\transformer.py", line 25, in forward
    enc_output = self.encoder(x)
  File "C:\Users\user1\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\user1\Desktop\github\transformer_implementation\encoder.py", line 54, in forward
    output = self.encoder_stacked(x)
  File "C:\Users\user1\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\user1\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 97, in forward
    raise NotImplementedError
NotImplementedError
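For reference, the NotImplementedError itself just means that nn.Module.forward was reached without being overridden (module.py line 97 in the traceback). With that version of PyTorch, calling any nn.ModuleList directly reproduces the same error, since ModuleList is only a container and doesn't define forward. A minimal, standalone reproduction, independent of my model:

import torch
import torch.nn as nn

# nn.ModuleList has no forward() of its own, so calling it directly
# falls through to nn.Module.forward and raises NotImplementedError.
layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
x = torch.randn(2, 4)
output = layers(x)  # raises NotImplementedError, same as in the traceback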
I didn’t initially include many details in the original post, but the model I’m using has an nn.ModuleList inside of it. After going through parts of the code, I found that this is the problem. More specifically, the stack is built like this:
self.encoder_stacked = nn.ModuleList([Encoder(self.config) for _ in range(self.N)])
and, as the traceback shows, encoder.py then calls self.encoder_stacked(x) directly. According to this Stack Overflow question there are cases where indentation can cause this error, but that’s not the case for me.
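One way to avoid the error is to iterate over the ModuleList instead of calling it. Here's a hedged sketch of what the stacked encoder's forward could look like under that change (the rest of encoder.py is omitted; this assumes the stack only needs to chain the layers):

# Sketch of a forward for the module that owns self.encoder_stacked.
# Assumes each Encoder takes a single tensor and returns a single tensor,
# which matches the Encoder class shown below.
def forward(self, x):
    for encoder_layer in self.encoder_stacked:  # iterate; don't call the ModuleList itself
        x = encoder_layer(x)
    return x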
The Encoder module looks like this:
class Encoder(nn.Module):
    def __init__(self, config, dropout=True):
        super().__init__()
        self.config = config
        self.d_model = self.config.d_model
        self.attention = MultiheadAttention(self.config)
        # Dropout can be disabled by passing dropout=False
        if dropout:
            self.dropout = nn.Dropout(p=self.config.dropout_rate)
        else:
            self.dropout = nn.Dropout(p=0.0)
        self.norm = Normalization(features=self.d_model)
        self.ffnn = FeedforwardNN(self.config)

    def forward(self, x):
        # Sublayer 1: self-attention with a residual connection, then norm and dropout
        attn_output = self.attention(x, x, x)
        sublayer1_output = self.norm(x + attn_output)
        sublayer1_output = self.dropout(sublayer1_output)
        # Sublayer 2: position-wise feedforward with a residual connection, then norm and dropout
        ffnn_output = self.ffnn(sublayer1_output)
        sublayer2_output = self.norm(sublayer1_output + ffnn_output)
        sublayer2_output = self.dropout(sublayer2_output)
        return sublayer2_output
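Since Encoder.forward takes a single tensor and returns a single tensor, another option (if I wanted to keep calling the stack directly) would be to build it as an nn.Sequential instead of an nn.ModuleList. This is only a sketch of that alternative, not what my code currently does:

# Alternative sketch: nn.Sequential implements forward, so it can be called directly.
# This only works because each Encoder here maps a single tensor to a single tensor.
self.encoder_stacked = nn.Sequential(*[Encoder(self.config) for _ in range(self.N)])

# ... then, in the outer forward:
output = self.encoder_stacked(x)  # no longer raises NotImplementedError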