I am encountering an issue related to the usage of a 1x1 convolutional layer in my script. The problem is comprehensively described in the attached screenshot:
I can also provide all the pickle files loaded in the script, but I could not find an option to attach files. Please advise on the preferred way to share files so that the problem can be fully reproduced.
I run inference on my validation dataset using the CNN. I expected the results to be independent of the batch size, but when I experimented with different batch sizes, I observed inconsistent outputs. After careful debugging, I found that the first discrepancy appears after the first 1x1 convolution layer. I isolated the problem to this specific layer and ran inference on this 1x1 convolution alone, not the entire CNN.
In the attached screenshot, the feature map “x” has four dimensions: (batch=2, channels=144, spatial_x=1, spatial_y=1). I expected that applying a 1x1 convolution to “x” would yield the same result as applying the same layer to “x[0]” and “x[1]” separately and then concatenating the outputs. In other words, the “out0” and “out1” tensors in the screenshot should ideally be identical. Regrettably, a small discrepancy (~1e-7) exists, although torch.isclose and np.isclose both return True for these tensors. This discrepancy accumulates through the subsequent layers of the CNN, reaching ~1e-4 by the end of inference.
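A minimal sketch of the comparison described above, for readers without the screenshot. The layer, its 32 output channels, the seed, and the random input are hypothetical stand-ins, since the original weights and pickle files are not available; only the input shape (2, 144, 1, 1) comes from the post. Whether a nonzero difference appears depends on the backend and hardware.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: the real layer and feature map come from the
# author's script. Only the input shape (2, 144, 1, 1) is taken from the post.
conv = torch.nn.Conv2d(144, 32, kernel_size=1).eval()
x = torch.randn(2, 144, 1, 1)

with torch.no_grad():
    # One forward pass over the whole batch.
    out_batched = conv(x)
    # One forward pass per sample (keeping the batch dim), then concatenate.
    out_per_sample = torch.cat(
        [conv(x[i:i + 1]) for i in range(x.shape[0])], dim=0
    )

max_abs_diff = (out_batched - out_per_sample).abs().max().item()
print("max abs difference:", max_abs_diff)
print("allclose:", torch.allclose(out_batched, out_per_sample, atol=1e-6))
```

The maximum absolute difference may be exactly zero on some setups and around float32 rounding error on others.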
In the second part of the code in the screenshot, I demonstrate that replacing the 1x1 convolution with a linear layer resolves the issue, so I have implemented a workaround for myself. However, I believe this discrepancy in the 1x1 convolution’s output indicates a potential bug that merits attention and resolution.
I appreciate your time and consideration in addressing this matter. If further clarification or additional information is required, please do not hesitate to reach out.
I wish to extend my apologies for an error I previously made in my code related to the comparison of out0 and out1 tensors. I have now rectified this mistake and am including a corrected screenshot for your reference. Upon reviewing the updated output, it appears that the issue also extends to a linear layer.
It is also worth mentioning that, although linear and 1x1 conv layers with equivalent weight data are expected to behave identically, the magnitude of the discrepancy observed in their outputs differs between the two.
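The comparison between a 1x1 convolution and a linear layer with equivalent weights could be set up roughly as follows. This is a sketch under assumptions: the output channel count (32), the seed, and the random input are hypothetical, and the weights are freshly initialized instead of being loaded from the original pickle files.

```python
import torch

torch.manual_seed(0)

in_ch, out_ch = 144, 32  # out_ch is an assumed value, not from the post

conv = torch.nn.Conv2d(in_ch, out_ch, kernel_size=1).eval()
linear = torch.nn.Linear(in_ch, out_ch).eval()

with torch.no_grad():
    # A 1x1 conv weight has shape (out_ch, in_ch, 1, 1); dropping the two
    # trailing singleton dims gives the (out_ch, in_ch) linear weight.
    linear.weight.copy_(conv.weight.reshape(out_ch, in_ch))
    linear.bias.copy_(conv.bias)

    x = torch.randn(2, in_ch, 1, 1)
    out_conv = conv(x).flatten(1)      # (2, out_ch)
    out_linear = linear(x.flatten(1))  # (2, out_ch)

max_abs_diff = (out_conv - out_linear).abs().max().item()
print("conv vs. linear, max abs difference:", max_abs_diff)
```

The two layers compute the same mathematical function here, but they may dispatch to different kernels, so their outputs need not match bit for bit.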
These small numerical mismatches are expected. They are caused by a different order of operations used to calculate the result, combined with limited floating-point precision. Neither of the two outputs is “more correct”; both should show a similar error relative to a result computed in a wider dtype.
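One way to check this is to compute a float64 reference and measure how far each float32 result is from it. The sketch below assumes a hypothetical layer with 32 output channels and a random input of the shape given in the question; the names `conv64`, `err_batched`, and `err_per_sample` are illustrative only.

```python
import copy

import torch

torch.manual_seed(0)

# Hypothetical layer and input; only the input shape is from the question.
conv = torch.nn.Conv2d(144, 32, kernel_size=1).eval()
x = torch.randn(2, 144, 1, 1)

# Same weights, promoted to float64, used as the wider-dtype reference.
conv64 = copy.deepcopy(conv).double()

with torch.no_grad():
    reference = conv64(x.double())
    out_batched = conv(x)
    out_per_sample = torch.cat(
        [conv(x[i:i + 1]) for i in range(x.shape[0])], dim=0
    )

# Error of each float32 result relative to the float64 reference.
err_batched = (out_batched.double() - reference).abs().max().item()
err_per_sample = (out_per_sample.double() - reference).abs().max().item()
print("batched    vs. float64:", err_batched)
print("per-sample vs. float64:", err_per_sample)
```

If both errors are of the same order of magnitude, neither evaluation order can be singled out as the accurate one.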