Value changed after loading a saved tensor


(Xiaoyu Liu) #1

Hello, I’m trying to use torch.save and torch.load in my script.

Something strange happened when I saved an 800*3*480*640 FloatTensor A and reloaded it: all values from A[582][2] onward became 0. I believe 582 and 2 are special numbers, because I tested this on another 1200*3*480*640 FloatTensor B and, again, all values of the reloaded B from B[582][2] onward became 0.

I’m confused by this. Is there any restriction on using torch.save and torch.load? Here’s how I used them:

A = torch.from_numpy(A_array)
checkEmpty(A) # passed

torch.save(A, 'A_tensor')
A = torch.load('A_tensor')
checkEmpty(A) # failed

Then, to find the first all-zero map:

for i in range(A.size()[0]):
    for j in range(A.size()[1]):
        if torch.max(A[i][j]) == 0.0:
            print(i, j)

The first (i, j) printed is (582, 2).


#2

Are you on version 0.1.6 or earlier? We fixed a bug in serializing very large tensors in 0.1.7.


(Xiaoyu Liu) #3

Thank you. But how do I check the version of PyTorch I’m currently using?

I installed it with:
pip install https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl

It looks like version 0.1.10.
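
One way to check from Python is to read the module's `__version__` attribute. The helper below is only a sketch: `installed_version` is a name made up for illustration, and it returns None both when the module cannot be imported and when the build does not define `__version__`.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, or None if it cannot be determined."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the module is not installed
    # Some builds do not define __version__ at all.
    return getattr(module, "__version__", None)


print(installed_version("torch"))
```

A result of None for an importable module suggests a build that predates the attribute, so the wheel file name is the better guide in that case.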


#4

What OS are you on?

I just tried this small snippet on Linux (CentOS7) and on OSX:

import torch

a = torch.ones(800*3*480*640)
print(a.eq(0).sum())
torch.save(a, 'a.pth')
b = torch.load('a.pth')

print(b.eq(0).sum())

On Linux it works fine; on OSX I get an error, which I am investigating:

0
Traceback (most recent call last):
  File "a.py", line 5, in <module>
    torch.save(a, 'a.pth')
  File "/Users/soumith/code/pytorch/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/Users/soumith/code/pytorch/torch/serialization.py", line 192, in _save
    serialized_storages[key]._write_file(f)
RuntimeError: Unknown error: -1

(Xiaoyu Liu) #5

Thank you. I’m working on Ubuntu 14.04.1.


(Xiaoyu Liu) #6

Running your script, I got 200410112 as the result of
print(b.eq(0).sum())
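
A quick arithmetic check ties this count to the (582, 2) index from the first post. It rests on two assumptions the thread does not confirm: that the tensor holds 4-byte floats, and that the surviving values form one contiguous prefix. Under those assumptions the intact prefix is exactly 0x7ffff000 bytes, which is the most a single write() call transfers on Linux, so a single short write is one plausible explanation rather than an established cause.

```python
# Arithmetic only; no tensors are created.
total = 800 * 3 * 480 * 640      # elements in the tensor
zeros = 200410112                # count reported by b.eq(0).sum()
map_size = 480 * 640             # elements in one [i][j] map
bytes_per_float = 4              # assumption: float32 storage

intact = total - zeros           # surviving elements (assumed to be a prefix)
intact_bytes = intact * bytes_per_float

# First map that lies entirely inside the zeroed tail (ceiling division).
first_zero_map = -(-intact // map_size)
i, j = divmod(first_zero_map, 3)

print(intact_bytes, hex(intact_bytes))   # 2147479552 0x7ffff000
print(i, j)                              # 582 2
```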


#7

I tried it on Ubuntu 14.04 as well, but couldn’t reproduce the issue.

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:        14.04
Codename:       trusty

However, the OSX failure is a useful lead. I am tracking it here and trying to find the cause: https://github.com/pytorch/pytorch/issues/1031

Can you tell me more about your OS? Do you have any locale set, or do you just use the EN locale?
You can find your current locale with the command locale:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I’m not sure the locale matters; I am just trying to eliminate variables.

Also, can you give me your kernel version with uname -a:

$ uname -a
Linux fatbox 3.16.0-37-generic #51~14.04.1-Ubuntu SMP Wed May 6 15:23:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

And lastly, can you check if you have enough free space on your machine? df -h will give the answer:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       355G  302G   35G  90% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            5.9G  4.0K  5.9G   1% /dev
tmpfs           1.2G  1.7M  1.2G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            5.9G  124M  5.8G   3% /run/shm
none            100M  152K  100M   1% /run/user
/dev/sda4        96M   29M   68M  30% /boot/efi
/dev/sdb2       2.7T  2.0T  609G  77% /media/hdd2
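
The same free-space check can be done from Python before saving. This is a sketch under stated assumptions: `has_room_for` is a hypothetical helper, 4 bytes per element assumes a FloatTensor, and the small header that the serialized file also carries is ignored.

```python
import shutil


def has_room_for(num_elements, path=".", bytes_per_element=4):
    """Return True if the filesystem holding `path` can fit the raw data."""
    needed = num_elements * bytes_per_element
    free = shutil.disk_usage(path).free  # free bytes on that filesystem
    return free >= needed


# The 800*3*480*640 float tensor from this thread needs about 2.95 GB.
needed_bytes = 800 * 3 * 480 * 640 * 4
print(needed_bytes)
print(has_room_for(800 * 3 * 480 * 640))
```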

(Xiaoyu Liu) #8

Sorry for the late reply. In the end I used np.save and np.load to work around the problem. I just tried torch.save and torch.load again and still see the same problem. I ran the commands you mentioned; the results are:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.4 LTS
Release:	14.04
Codename:	trusty
$ locale
LANG=en_CA.UTF-8
LANGUAGE=en_CA:en
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
$ uname -a
Linux sengled-gpu-1 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G  4.0K   32G   1% /dev
tmpfs           6.3G  1.9M  6.3G   1% /run
/dev/sda2       854G  578G  233G  72% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none             32G   76K   32G   1% /run/shm
none            100M   56K  100M   1% /run/user
/dev/sda1       511M  3.4M  508M   1% /boot/efi

Though I don’t understand them at all. :slight_smile:
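
For reference, the np.save / np.load workaround can be sketched as below. The array is a small stand-in rather than the full 800*3*480*640 data, and the file name is made up for illustration. Comparing the reloaded array against the original catches the kind of silent truncation described in this thread.

```python
import os
import tempfile

import numpy as np

# Small stand-in array; the real data in this thread was far larger.
A_array = np.random.rand(4, 3, 8, 8).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "A_array.npy")  # hypothetical file name
np.save(path, A_array)
restored = np.load(path)

# Verify the round trip before trusting the file.
round_trip_ok = np.array_equal(A_array, restored)
print(round_trip_ok)

# With PyTorch installed, torch.from_numpy(restored) gives a tensor again.
```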


(Xiaoyu Liu) #9

Hi, I suddenly realized that my version may not be the latest one, because I installed it a day before I checked again. Also, I can’t use torch.__version__, which seems to confirm that. Really sorry about the interruption!

But when I try to install the latest version, I get this error:

SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

If you are familiar with it, can you tell me how to solve it? If not, I’ll figure it out somewhere else :smiley:


(Alexander Polis) #10

Hello,

Run

pip install pyopenssl
pip install ndg-httpsclient
pip install pyasn1

and try installing the package again.

Best,
Alex


(Xiaoyu Liu) #11

Thanks! After installing these, the problem was still not solved… Then I refreshed the PyTorch website and installed again, and it worked. Magical…


#12

I changed the commands on the website to not use https; that’s why it started working :slight_smile: