Hi,
I made a custom layer that has a parameter `weight` of size `(Cin, Cout, rank)`.
In the `forward` method, I need to permute `weight` as follows:

```python
weight_col = self.weight.permute(1, 0, 2).reshape(self.in_channels, self.out_channels * self.rank)
```
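For context, here is a minimal sketch of the layer (simplified; the constructor signature and the final matmul are just illustrative, and padding etc. are omitted):

```python
import torch
import torch.nn as nn

class MyLayer(nn.Module):
    # Simplified sketch of the custom layer; details are omitted.
    def __init__(self, in_channels, out_channels, rank):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.rank = rank
        # weight has shape (Cin, Cout, rank)
        self.weight = nn.Parameter(torch.randn(in_channels, out_channels, rank))

    def forward(self, x):
        # permute to (Cout, Cin, rank), then reshape to (Cin, Cout * rank)
        weight_col = self.weight.permute(1, 0, 2).reshape(
            self.in_channels, self.out_channels * self.rank
        )
        # placeholder for the real computation that consumes weight_col
        return x @ weight_col
```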
During training, the model raises some odd errors involving NaN values. From my reading, I thought it may be related to the `permute` function.
So I added an assertion as follows:

```python
assert not torch.isnan(weight_col).any(), "weight_col tensor is nan"
```
During the training loop, the assertion sometimes fires.
To debug this, I enabled anomaly detection:

```python
torch.autograd.set_detect_anomaly(True)

epoch = 0
while epoch < 100:
    train(epoch, train_loader, model, criterion, optimizer, scheduler)
    _, valid_top1_acc, valid_top5_acc = validate(val_loader, model, criterion)
    epoch += 1
```
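A gradient hook on the weight can also help pinpoint whether the NaN first appears during backward (a sketch; `model.my_layer` is just an illustrative name for the actual module):

```python
def nan_hook(grad):
    # called during backward with the gradient w.r.t. the weight
    if torch.isnan(grad).any():
        print("NaN detected in the gradient of weight")
    return grad

# register on the actual parameter of the custom layer
model.my_layer.weight.register_hook(nan_hook)
```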
Here’s the log:
So my questions are:
- Does the `permute` function possibly yield NaN in the autograd flow? If yes, how can I fix it?
- In my custom layer, I only need to permute the `weight` once, so I'm thinking of moving the permutation outside of the `forward` method. E.g., before instantiating the custom layer, I would permute the `weight`, as in the snippet below. Is this good practice?

```python
# permute before instantiating
weight = weight.permute(1, 0, 2)
myLayer = MyLayer(Cin, Cout, rank, padding, weight)
```
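On the layer side, the change I have in mind would look something like this (a sketch; the constructor just stores the pre-permuted tensor, so `forward` only needs the reshape and no permute appears in the autograd graph):

```python
import torch
import torch.nn as nn

class MyLayer(nn.Module):
    # Sketch: weight arrives already permuted to (Cout, Cin, rank).
    def __init__(self, in_channels, out_channels, rank, padding, weight):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.rank = rank
        self.padding = padding
        self.weight = nn.Parameter(weight.contiguous())

    def forward(self, x):
        # no permute needed anymore, only the reshape
        weight_col = self.weight.reshape(
            self.in_channels, self.out_channels * self.rank
        )
        # placeholder for the real computation that consumes weight_col
        return x @ weight_col
```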
Any recommendation is appreciated!