Hi,
I ran into a really peculiar situation here.
I have a pretrained network (weights frozen during training) that takes a plane sweep volume (essentially a stack of warped images) as input and produces intermediate results for other modules in the pipeline.
My plan is to run this network twice in a training step.
However, for some unknown reason, the second pass of the network produces an all-zero output.
Here is a code snippet showing what happens.
# Initialization
self.net = Model()
load_ckpts(self.net, path)
self.net.eval()
...
# Training step
with torch.no_grad():
    result1 = self.gen_result(self.net, input_imgs1, input_exts, input_ints, depths)
    result2 = self.gen_result(self.net, input_imgs2, input_exts, input_ints, depths)  # result2 is all zero for some reason
Some scenarios:
- Run only result2: if I comment out the first line, which generates result1, then result2 contains nonzero values as expected.
- Run both result1 and result2: if I add a print call between the two lines and print input_imgs2, then result2 is nonzero as expected. However, if I move the print call to after result2, then result2 becomes all zeros.
I have never run into similar issues before. It does not make sense to me how printing out the variables would ever change the results.
I stepped through the self.gen_result function line by line and found that this line appears to trigger the issue:
trnfs = torch.matmul(src_exts, torch.inverse(tgt_exts))
If I print input_imgs2 or src_exts before this line, the output is correct.
Without the print, the output is all zeros.
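One guess I can test: printing a CUDA tensor copies it to the host, which forces a wait on pending GPU work, so the print may only matter as a synchronization point. Below is a minimal sketch for isolating that line outside the pipeline. It assumes the tensors live on a CUDA device (I have not stated that above), and the names relative_transforms and is_all_zero are made up for this post. The explicit torch.cuda.synchronize() is a diagnostic, not a confirmed fix.

```python
import torch


def relative_transforms(src_exts, tgt_exts, sync=True):
    """Compute src_exts @ inverse(tgt_exts), the line from gen_result.

    sync=True is an assumption for diagnosis: it waits for queued GPU work
    before the inverse, which a print of a CUDA tensor also does implicitly.
    """
    if sync and src_exts.is_cuda:
        torch.cuda.synchronize()
    return torch.matmul(src_exts, torch.inverse(tgt_exts))


def is_all_zero(t):
    """Return True if every element of the tensor is exactly zero."""
    return bool((t == 0).all())


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Hypothetical stand-ins for batched 4x4 extrinsics; near-identity so they are invertible.
    src = (torch.eye(4) + 0.1 * torch.rand(8, 4, 4)).to(device)
    tgt = (torch.eye(4) + 0.1 * torch.rand(8, 4, 4)).to(device)

    with torch.no_grad():
        for sync in (False, True):
            first = relative_transforms(src, tgt, sync=sync)
            second = relative_transforms(src, tgt, sync=sync)
            print(f"sync={sync}: first all zero? {is_all_zero(first)}, "
                  f"second all zero? {is_all_zero(second)}")
```

If the second call is all zeros only with sync=False, that would point at a synchronization problem around torch.inverse rather than at the network itself. On CPU the sync flag has no effect, so this only says something when run on the GPU.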
Edit:
Environment: Windows 10
PyTorch 1.7.1 (also tried 1.7.0)