I convolve my input twice, then concatenate the raw input with the two convolved tensors (all 3-dimensional) and feed the result through an LSTM. I found that when I change the concatenation order, the model converges to completely different results as training progresses and the loss decreases.
For example:
for idx, (inputs, tlabels) in enumerate(train_loader):
    optimizer.zero_grad()
    inputs = inputs.float().permute(1, 0, 2, 3).to(device)
    hinputs = model.conv1(inputs)    # 2D conv, padding='same'
    hinputs2 = model.conv2(hinputs)  # 2D conv, padding='same'
    tlabels = tlabels.float().to(device)
    hlabels = model.conv1(tlabels)
    hlabels2 = model.conv2(hlabels)
    linputs = torch.cat((hinputs, hinputs2, inputs), 1)  # order: firstConv, secondConv, raw
    llabels = torch.cat((hlabels, hlabels2, tlabels), 1)
    # linputs = torch.cat((inputs, hinputs, hinputs2), 1)  # order: raw, firstConv, secondConv
    # llabels = torch.cat((tlabels, hlabels, hlabels2), 1)
    lpreds = model.low_level(linputs)  # LSTM
    loss = criterion(lpreds, llabels.reshape(1, 3, -1).permute(1, 0, 2))
    loss.backward()
    optimizer.step()
In the code I `cat` the three tensors; the alternative `cat` order is commented out. These two orderings lead to completely different predictions from the model. Roughly speaking, the result ends up closer to what I get when training on the first concatenated component alone.
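To make concrete what the two orderings change, here is a toy illustration (stand-in tensors, not the actual model): after a channel-wise `cat`, the order decides which input occupies which channel slice, so any later reshape/permute that assumes a fixed channel layout sees different data under each ordering.

```python
import torch

# Stand-ins for raw inputs, first conv output, second conv output.
raw = torch.zeros(1, 1, 2, 2)
c1 = torch.ones(1, 1, 2, 2)
c2 = torch.full((1, 1, 2, 2), 2.0)

order_a = torch.cat((c1, c2, raw), 1)  # firstConv, secondConv, raw
order_b = torch.cat((raw, c1, c2), 1)  # raw, firstConv, secondConv

# Channel 0 holds different data depending on the cat order.
print(order_a[0, 0].mean().item())  # 1.0
print(order_b[0, 0].mean().item())  # 0.0
```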
The only difference between the two variants is the order of `next_functions` in `grad_fn`. At first I suspected that the evaluation order in the computation graph would make the gradients of the leaf nodes differ, but after testing with a few custom tensors I found that it does not.
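A minimal sketch of the kind of check described above (my assumption of how the test looked, with arbitrary tensor shapes and a symmetric sum-of-squares loss): the concatenation order does not change the leaf gradients.

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, 3, requires_grad=True)

# First ordering: cat (a, b) along dim 1.
torch.cat((a, b), 1).pow(2).sum().backward()
ga, gb = a.grad.clone(), b.grad.clone()

# Reset gradients and repeat with the swapped ordering.
a.grad = None
b.grad = None
torch.cat((b, a), 1).pow(2).sum().backward()

# The leaf gradients are identical under both orderings.
print(torch.allclose(a.grad, ga), torch.allclose(b.grad, gb))  # True True
```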
How can this behavior be explained? I would be grateful if anyone could answer.