Value changed after loading a saved tensor

Xiaoyu_Liu · March 16, 2017, 11:48pm

Hello, I’m trying to use torch.save and torch.load in my script.

Something strange happened: I saved an 800*3*480*640 FloatTensor A, and after re-loading it, all values from A[582][2] onward were 0. I believe 582 and 2 are special numbers, because I tested this on another 1200*3*480*640 FloatTensor B, and all values of the re-loaded B from B[582][2] onward were 0 as well.

I’m confused by this. Is there any restriction on using torch.save and torch.load? Here’s how I used them:

A = torch.from_numpy(A_array)
checkEmpty(A) # passed

torch.save(A, 'A_tensor')
A = torch.load('A_tensor')
checkEmpty(A) # failed

Then to find the first ZERO map:

for i in range(A.size()[0]):
    for j in range(A.size()[1]):
        if torch.max(A[i][j]) == 0.0:
            print(i, j)

The first (i, j) printed is (582, 2).
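
One way to see why the same index shows up for both A and B: the two tensors share the trailing dimensions 3*480*640, so map [582][2] starts at the same byte offset in either one. The arithmetic below is a sketch that assumes contiguous storage and 4 bytes per element; the helper name `map_byte_offset` is made up for illustration.

```python
FLOAT_BYTES = 4                       # a FloatTensor holds 32-bit floats
CHANNELS, HEIGHT, WIDTH = 3, 480, 640

def map_byte_offset(i, j):
    """Byte offset where map [i][j] starts in a contiguous float32 tensor."""
    maps_before = i * CHANNELS + j
    return maps_before * HEIGHT * WIDTH * FLOAT_BYTES

two_gib = 2 ** 31
print(map_byte_offset(582, 1) < two_gib)  # map [582][1] starts below 2 GiB
print(map_byte_offset(582, 2) > two_gib)  # map [582][2] starts above 2 GiB
```

Both comparisons print True, so (582, 2) is the first map that lies entirely beyond the 2 GiB mark. That points toward a size limit somewhere in the save or load path rather than anything special about the data, though the thread itself does not confirm the cause.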

smth · March 17, 2017, 12:14am

Are you on version 0.1.6 or earlier? We fixed a bug in 0.1.7 that affected serialization of very large tensors.

Xiaoyu_Liu · March 17, 2017, 12:23am

Thank you. But how do I check which version of PyTorch is currently in use?

And I installed it through:
pip install https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl

It looks like version 0.1.10.
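
The version can be read off the wheel name because wheel filenames follow a fixed field order: name, version, optional build tag, Python tag, ABI tag, platform tag. The snippet below is a small sketch of that parsing; `wheel_version` is a hypothetical helper, not part of pip or PyTorch.

```python
def wheel_version(url):
    """Extract the version field from a wheel URL or filename."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    # Field order: name-version(-build)-python-abi-platform.whl
    fields = filename[:-len(".whl")].split("-")
    return fields[1]

url = ("https://download.pytorch.org/whl/cu75/"
       "torch-0.1.10.post2-cp27-none-linux_x86_64.whl")
print(wheel_version(url))
```

For this URL it prints 0.1.10.post2. That is only the version the wheel declares; it does not prove which installation Python actually imports at run time.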

smth · March 17, 2017, 12:52am

What OS are you on?

I just tried this small snippet on Linux (CentOS7) and on OSX:

import torch

a = torch.ones(800*3*480*640)
print(a.eq(0).sum())
torch.save(a, 'a.pth')
b = torch.load('a.pth')

print(b.eq(0).sum())

On Linux it works fine; on OSX I get an error, which I am investigating:

0
Traceback (most recent call last):
  File "a.py", line 5, in <module>
    torch.save(a, 'a.pth')
  File "/Users/soumith/code/pytorch/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/Users/soumith/code/pytorch/torch/serialization.py", line 192, in _save
    serialized_storages[key]._write_file(f)
RuntimeError: Unknown error: -1
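
The traceback ends inside _write_file, where the tensor's storage is written out, and the buffer in this test is close to 3 GB. A single low-level write call is not guaranteed to accept that many bytes at once, so a common defensive pattern is to write in bounded chunks and loop on the returned byte count. The sketch below illustrates that general pattern only. It is not PyTorch's code, and `write_all` and `chunk_size` are names invented for this example.

```python
import os
import tempfile

def write_all(fd, data, chunk_size=1 << 20):
    """Write every byte of `data` to file descriptor `fd`, tolerating short writes."""
    view = memoryview(data).cast("B")   # view the buffer as plain bytes
    written = 0
    while written < len(view):
        # os.write returns how many bytes were actually written,
        # which may be fewer than the number requested.
        written += os.write(fd, view[written:written + chunk_size])
    return written

# Small demonstration with a temporary file and a deliberately tiny chunk size.
fd, path = tempfile.mkstemp()
payload = bytes(range(256)) * 100
count = write_all(fd, payload, chunk_size=1000)
os.close(fd)
with open(path, "rb") as f:
    round_trip = f.read()
os.remove(path)
print(count == len(payload), round_trip == payload)
```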

Xiaoyu_Liu · March 17, 2017, 3:04am

Thank you, I’m working on Ubuntu 14.04.1

Xiaoyu_Liu · March 17, 2017, 5:39pm

And by running your script, I got 200410112 as the result of
print(b.eq(0).sum())
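
The reported count can be turned into a byte figure. The arithmetic below assumes 4-byte floats and that the surviving data is the leading part of the tensor, which matches the earlier observation that everything before A[582][2] was intact.

```python
total_elements = 800 * 3 * 480 * 640   # size of the test tensor
zeros_after_reload = 200410112         # count reported above
bytes_per_float = 4

intact_bytes = (total_elements - zeros_after_reload) * bytes_per_float
print(intact_bytes)           # bytes that survived the round trip
print(hex(intact_bytes))      # same figure in hexadecimal
print(2 ** 31 - intact_bytes) # distance below 2 GiB
```

This gives 2147479552 bytes, or 0x7ffff000, which is 4096 bytes short of 2 GiB. That is exactly the maximum number of bytes Linux transfers in a single write() or read() call, so the result would be consistent with one call being cut short and the remainder never written or read. That reading is an inference from the numbers, not something confirmed in this thread.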

smth · March 18, 2017, 3:02am

I tried it on Ubuntu 14.04 as well, but couldn’t reproduce the issue.

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:        14.04
Codename:       trusty

However, the OSX failure is a good lead. I am tracking it here and trying to find the cause: https://github.com/pytorch/pytorch/issues/1031

Can you tell me about your OS? Do you have any locale set, or do you just use the EN locale?
You can find your current locale with the command locale:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I’m not sure locale matters; I am just trying to eliminate variables.

Also, can you give me your kernel version with uname -a:

$ uname -a
Linux fatbox 3.16.0-37-generic #51~14.04.1-Ubuntu SMP Wed May 6 15:23:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

And lastly, can you check whether you have enough free space on your machine? df -h will give the answer:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       355G  302G   35G  90% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            5.9G  4.0K  5.9G   1% /dev
tmpfs           1.2G  1.7M  1.2G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            5.9G  124M  5.8G   3% /run/shm
none            100M  152K  100M   1% /run/user
/dev/sda4        96M   29M   68M  30% /boot/efi
/dev/sdb2       2.7T  2.0T  609G  77% /media/hdd2
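
The same free-space check can be done from Python before saving. This is a minimal sketch using shutil.disk_usage from the standard library (Python 3.3 and later, so not the Python 2.7 build installed above); `has_room_for` is a hypothetical helper name, and the size assumes 4-byte floats.

```python
import shutil

def has_room_for(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes

tensor_bytes = 800 * 3 * 480 * 640 * 4   # raw size of the 800*3*480*640 FloatTensor
print(tensor_bytes)
print(has_room_for(".", tensor_bytes))
```

The raw tensor data is 2949120000 bytes, a little under 3 GB, and the saved file will be slightly larger because of serialization overhead.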

Xiaoyu_Liu · March 21, 2017, 12:02am

Sorry for the late reply. I used np.save and np.load to work around the problem in the end. I just tried torch.save and torch.load again, and the problem is still there. I ran the commands you mentioned; the results are:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.4 LTS
Release:	14.04
Codename:	trusty

$ locale
LANG=en_CA.UTF-8
LANGUAGE=en_CA:en
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=

$ uname -a
Linux sengled-gpu-1 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G  4.0K   32G   1% /dev
tmpfs           6.3G  1.9M  6.3G   1% /run
/dev/sda2       854G  578G  233G  72% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none             32G   76K   32G   1% /run/shm
none            100M   56K  100M   1% /run/user
/dev/sda1       511M  3.4M  508M   1% /boot/efi

Though I don’t understand them at all.

Xiaoyu_Liu · March 21, 2017, 12:31am

Hi, I suddenly realized that my version is perhaps not the latest one, because I installed it a day before I checked it again. Also, I can’t use torch.__version__, which seems to confirm that. Really sorry for the interruption!

But when I try to install the latest version, this error happens:

SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

If you have experience with this, can you tell me how to solve it? If not, I’ll figure it out somewhere else :smiley:

apolis · March 21, 2017, 1:09am

Hello,

do

pip install pyopenssl
pip install ndg-httpsclient
pip install pyasn1

and try installing the package again.

Best,
Alex

Xiaoyu_Liu · March 21, 2017, 4:39pm

Thanks! But after installing these, the problem still wasn’t solved… Then I refreshed the PyTorch website and tried installing again, and it worked. Magical…

smth · March 21, 2017, 9:09pm

I changed the commands on the website to not use https; that’s why it started working.