This question is not strictly about PyTorch itself. If it is not appropriate to ask it here, I will delete it right away.
However, I couldn’t think of anywhere else with so many experts who are familiar with neural network structure and design.
Recently, I tried to implement VINet, a visual-inertial odometry system built with a neural network, and I open-sourced it on GitHub: HTLife/VINet
I have already completed the whole network structure, but the network does not converge properly during training.
How could I fix this problem?
Possible problems & solutions
The dataset is too challenging:
I’m using the EuRoC MAV dataset, which is more challenging than the KITTI VO dataset used by DeepVO and VINet (because the images from the KITTI vehicle do not shake up and down). The network may not be able to learn the camera movement correctly.
An L1 loss is used, and it is meant to be identical to the design in the VINet paper (Clark et al., 2017). (I’m not very confident that I currently understand the loss design in that paper correctly.) Related code
Other hyperparameter problems
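One way to separate the possibilities above is to check whether the training loop can overfit a single small batch. This is only a minimal sketch, not the code from my repository: `TinyPoseNet`, the 6-DoF pose layout (3 translation + 3 rotation values), the `rot_weight` parameter, and the random data are all hypothetical stand-ins, and the loss is a plain L1 loss, not necessarily the exact loss design from the VINet paper. If the loss on a fixed batch does not drop close to zero, the problem is more likely in the loss, the optimizer settings, or the data pipeline than in the difficulty of the dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # make the sketch reproducible


class TinyPoseNet(nn.Module):
    """Hypothetical stand-in for the real network: feature vector -> 6-DoF pose."""

    def __init__(self, in_dim=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 6),  # assumed layout: [tx, ty, tz, rx, ry, rz]
        )

    def forward(self, x):
        return self.net(x)


def pose_l1_loss(pred, target, rot_weight=1.0):
    """Plain L1 loss on translation plus a weighted L1 loss on rotation.

    rot_weight is an assumed hyperparameter for balancing the two terms.
    """
    trans_loss = F.l1_loss(pred[:, :3], target[:, :3])
    rot_loss = F.l1_loss(pred[:, 3:], target[:, 3:])
    return trans_loss + rot_weight * rot_loss


def overfit_one_batch(model, x, y, steps=500, lr=1e-2):
    """Train repeatedly on the same batch and return the loss at every step."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = pose_l1_loss(model(x), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Random tensors standing in for one small batch of features and poses.
x = torch.randn(8, 32)
y = torch.randn(8, 6)

model = TinyPoseNet()
losses = overfit_one_batch(model, x, y)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

With the real network, `x` and `y` would be one fixed batch from the EuRoC data loader. If the loss falls on that batch but not on the full training set, the learning rate, batch size, or input normalization are the next things I would look at.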
-  Clark, Ronald, et al. “VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem.” AAAI. 2017.