Module output slightly off on Android

I’m learning how to port a model to an Android app, but I’ve found that the output on Android differs slightly from the Python script.

Here’s my model:

import torch
import torch.nn as nn

class Dummy(nn.Module):
    def __init__(self):
        super(Dummy, self).__init__()
        self.fc = nn.Linear(416 * 416 * 3, 10)

    def forward(self, x):
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        return x

And the model convert script:

model = Dummy()
model.load_state_dict(torch.load("dummy.pt"))
model.eval()  # disable any train-time behavior (dropout, batchnorm) before tracing

example = torch.randn(1, 3, 416, 416)

traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("dummy-android.pt")

In the Python script, the output to my example image was:
[-0.1566336453, 0.0841832012, 0.0985929519, 0.0115705878, -0.3650150299, 0.3889884949, 0.0307857171, -0.1416997164, 0.2296864390, -0.2360394597]
And on Android:
[-0.15890475, 0.07862158, 0.09994077, 0.015047294, -0.3655298, 0.38745546, 0.03088907, -0.13880219, 0.23389207, -0.24074274]

Is this behavior normal?

Thanks in advance!

I tried to reproduce the issue with the code you posted - the results I got on desktop and on an Android device were identical given the same input tensor.
Could you please double-check that the input tensor is identical? Did you try a fixed tensor like torch.ones(1, 3, 416, 416), or did you decode it from an image/text file?
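For reference, here's a minimal sketch of that fixed-tensor sanity check, re-creating the Dummy model from the first post; on a single machine the eager and traced modules should agree:

```python
import torch
import torch.nn as nn

class Dummy(nn.Module):
    def __init__(self):
        super(Dummy, self).__init__()
        self.fc = nn.Linear(416 * 416 * 3, 10)

    def forward(self, x):
        return self.fc(x.reshape(x.shape[0], -1))

model = Dummy().eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 416, 416))

# Feed both modules the same fixed tensor and compare the outputs.
x = torch.ones(1, 3, 416, 416)
diff = (model(x) - traced(x)).abs().max().item()
print(diff)
```

If this prints a non-negligible difference on one machine, the problem isn't the Android side at all.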

The results I got above were from an image.

When I try torch.ones(1, 3, 416, 416) the results are closer:

  • Python:
    [-0.2647885382, 0.1896176785, 0.5414127707, -0.1591370702, -0.4948869348, 0.9562743902, 0.2362334579, -0.3959787488, 0.4024670124, -0.4271552861]
  • Android:
    [-0.26478815, 0.18961564, 0.5414131, -0.15913847, -0.49488664, 0.956275, 0.23623104, -0.39597753, 0.40246645, -0.4271558]

The L2 norm of the difference between these vectors was 3.8e-06, compared to 1e-02 for my first results.
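For the record, a quick sketch of that norm computation, using the torch.ones outputs quoted above:

```python
import torch

# Outputs copied from the two lists above.
py = torch.tensor([-0.2647885382, 0.1896176785, 0.5414127707, -0.1591370702,
                   -0.4948869348, 0.9562743902, 0.2362334579, -0.3959787488,
                   0.4024670124, -0.4271552861], dtype=torch.float64)
android = torch.tensor([-0.26478815, 0.18961564, 0.5414131, -0.15913847,
                        -0.49488664, 0.956275, 0.23623104, -0.39597753,
                        0.40246645, -0.4271558], dtype=torch.float64)

# L2 norm of the element-wise difference.
print(torch.norm(py - android).item())  # ~3.8e-06
```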
@ljk53 What do your results look like?

Yeah, the new results show a difference similar to what I saw; 1e-02 seems too large. What’s the format of the image? Do you use the same library to decode it on both platforms? Did you scale the input to 0~255 or 0~1? I’d use random input at the same scale to double-check. Can you please verify that the raw input tensors decoded from the image are about the same?


I did convert the image to 0~1.
My image decoding code on Android:

Bitmap bitmap = BitmapFactory.decodeStream(getAssets().open("dog_processed.jpg"));
float[] zeros = new float[]{0f, 0f, 0f};
float[] ones = new float[]{1f, 1f, 1f};  // no-op normalization, since the returned values are already 0~1
Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(bitmap, zeros, ones);

I also tried a smaller input of size 3 x 2 x 2 (with a correspondingly smaller model), and the norm drops to 9.57e-08.
So my inputs were correct, and the error grows with the amount of computation.
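That scaling matches how float32 accumulation error behaves. A rough illustration (random weights, not the actual model), comparing a dot product the size of the small input against one the size of the full 416x416x3 input:

```python
import numpy as np

rng = np.random.default_rng(0)
errors = {}
for n in (3 * 2 * 2, 416 * 416 * 3):
    w = rng.standard_normal(n)
    x = rng.standard_normal(n)
    exact = np.dot(w, x)  # float64 reference
    approx = np.dot(w.astype(np.float32), x.astype(np.float32))
    errors[n] = abs(exact - float(approx))
print(errors)  # the longer dot product accumulates far more error
```

The longer sum accumulates more rounding error, and different platforms may also sum in a different order, so their results can diverge further.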

Could you please share the Python code you used to convert the image to a tensor as well? I’ll try to repro on my side. Thanks!

Sure, here it is:

import torch
import imageio

img = imageio.imread("data/small.jpg")
# HWC uint8 -> NCHW float in [0, 1]
dummy_input = torch.tensor(img).permute([2, 0, 1]).unsqueeze(0) / 255.

Hi @minhduc0711 @ljk53, I remember somebody posted a very similar issue for iOS a couple of months ago; not sure it’s related, but worth looking into: https://github.com/pytorch/pytorch/issues/27813. Short answer: try .png or .bmp instead of .jpg.


Thanks, I tried .png and the norm dropped to 2e-6.

Still, this behavior seems pretty strange. I hope there will be a workaround for .jpg files in the future.
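One way to sidestep the JPEG decoder mismatch is to re-encode the asset losslessly before shipping it to the app. A sketch (using a synthetic array as a stand-in for the real photo):

```python
import numpy as np
import imageio

# Synthetic stand-in for the real image; any HxWx3 uint8 array works.
img = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)

# PNG round-trips losslessly, so desktop and Android decode identical pixels.
imageio.imwrite("small.png", img)
back = imageio.imread("small.png")
print((img == np.asarray(back)).all())
```

JPEG decoding, by contrast, is allowed to differ slightly between decoder implementations, which is consistent with the issue linked above.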

We discussed this internally, and one other issue that came up was the size of your linear layer. Each output element sums more than 500,000 products (416 × 416 × 3 inputs), which can accumulate a lot of floating-point error (and that error can differ between platforms). You should get better results with a more traditional model that uses some convolutional and pooling layers before the linear layer.
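As a rough sketch of what that might look like (the layer sizes here are made up for illustration, not a recommendation):

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super(SmallNet, self).__init__()
        # Conv/pool layers shrink the 416x416x3 input down to 16 features,
        # so the final linear layer sums 16 products instead of ~519k.
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(4),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.fc(x)

model = SmallNet().eval()
out = model(torch.randn(1, 3, 416, 416))
print(out.shape)
```

Each individual sum is much shorter, so the platform-dependent rounding differences have far less room to accumulate.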