Dear PyTorch Community,
I am encountering an issue related to the usage of a 1x1 convolutional layer in my script. The problem is comprehensively described in the attached screenshot:
I appreciate any assistance or insights the community can provide to help resolve this issue.
I can also provide all the pickle files loaded in the script, but I currently cannot find an option to attach files. Please advise on the preferred method for sharing files so the problem can be fully reproduced.
Could you describe what exactly the issue is?
I run inference on my validation dataset using the CNN. The results should be independent of the batch size, but when experimenting with different batch sizes I observed inconsistent outputs. After careful debugging, I found that the first discrepancy appears after the first 1x1 convolution layer, so I localized the problem to this layer and ran inference on the 1x1 convolution alone, not the entire CNN.
In the attached screenshot, the feature map “x” has four dimensions: (batch=2, channels=144, spatial_x=1, spatial_y=1). I expected that applying a 1x1 convolution to “x” would yield the same result as applying the same layer to “x” and “x” separately and then concatenating the outputs; that is, the “out0” and “out1” tensors in the screenshot should be identical. Unfortunately, a small discrepancy (~1e-7) exists, although torch/np.isclose still returns True for these tensors. This discrepancy accumulates through the subsequent layers of the CNN, reaching approximately ~1e-4 by the end of inference.
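Since the screenshot is not visible here, a minimal sketch of the setup described above might look like the following (the output channel count and the exact comparison are assumptions; only the input shape (2, 144, 1, 1) and the 1x1 kernel come from the description):

```python
import torch

torch.manual_seed(0)

# One 1x1 convolution over 144 input channels, in eval mode for inference.
conv = torch.nn.Conv2d(144, 8, kernel_size=1, bias=True).eval()

x = torch.randn(1, 144, 1, 1)

with torch.no_grad():
    # Same sample processed as a batch of 2 ...
    out0 = conv(torch.cat([x, x], dim=0))
    # ... versus processed twice as a batch of 1 and concatenated.
    out1 = torch.cat([conv(x), conv(x)], dim=0)

# Depending on the backend kernel chosen for each batch size, the two
# results may differ by a tiny float32 rounding error (on the order of 1e-7),
# while still comparing equal under torch.allclose.
print((out0 - out1).abs().max())
print(torch.allclose(out0, out1))
```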
In the second part of the code in the screenshot, I demonstrate that replacing the 1x1 convolution with a linear layer resolves the issue, so I have implemented a workaround for myself. Still, I believe this discrepancy in the 1x1 convolution’s output is indicative of a potential bug that merits attention and resolution.
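The workaround relies on the fact that a 1x1 convolution over a (N, C, 1, 1) feature map is mathematically a linear layer, so the conv weights can be copied into an nn.Linear. A sketch of that equivalence (the channel counts are assumptions carried over from the description above):

```python
import torch

torch.manual_seed(0)

conv = torch.nn.Conv2d(144, 8, kernel_size=1, bias=True).eval()
lin = torch.nn.Linear(144, 8).eval()

with torch.no_grad():
    # Conv2d weight has shape (out, in, 1, 1); drop the trailing 1x1 dims.
    lin.weight.copy_(conv.weight.view(8, 144))
    lin.bias.copy_(conv.bias)

x = torch.randn(2, 144, 1, 1)
with torch.no_grad():
    out_conv = conv(x).view(2, 8)
    out_lin = lin(x.view(2, 144))

# The two layers compute the same function up to float32 rounding.
print(torch.allclose(out_conv, out_lin, atol=1e-6))
```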
I appreciate your time and consideration in addressing this matter. If further clarification or additional information is required, please do not hesitate to reach out.
Thank you for your assistance.
I wish to extend my apologies for an error I previously made in my code related to the comparison of out0 and out1 tensors. I have now rectified this mistake and am including a corrected screenshot for your reference. Upon reviewing the updated output, it appears that the issue also extends to a linear layer.
It is also worth mentioning that, although a linear layer and a 1x1 convolution with identical weights would be expected to behave identically, the absolute magnitude of the discrepancy differs between their outputs.
These small numerical mismatches are expected and are caused by a different order of operations used to compute the result combined with the limited floating-point precision. Neither of the two outputs is “more correct”; both should show a similar error when compared against a wider-precision (e.g. float64) reference.
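The point above can be illustrated with a plain reduction: summing the same float32 values in a different order changes the result slightly, and comparing each order against a float64 reference shows both carry a similar rounding error (the chunking scheme here is an arbitrary illustration, not what PyTorch's conv kernels actually do):

```python
import torch

torch.manual_seed(0)

v = torch.randn(10000)

# Two different accumulation orders over the same float32 data.
s_fwd = v.sum()
s_chunked = torch.stack([c.sum() for c in v.chunk(7)]).sum()

# Errors of both orders relative to a wider-precision (float64) reference.
ref = v.double().sum()
print(float((s_fwd.double() - ref).abs()))
print(float((s_chunked.double() - ref).abs()))
```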
OK, I see, thanks for the response anyway