Hello, I’m trying to use torch.save and torch.load in my script.
Something strange happened when I saved an 800*3*480*640 FloatTensor A and re-loaded it: all values after A[582][2] became 0. I believe 582 and 2 are two special numbers, because I tested this on another 1200*3*480*640 FloatTensor B, and all values of the re-loaded B after B[582][2] still became 0.
I’m confused by this. Is there any restriction on using torch.save and torch.load? Here’s how I used them:
A = torch.from_numpy(A_array)
checkEmpty(A) # passed
torch.save(A, 'A_tensor')
A = torch.load('A_tensor')
checkEmpty(A) # failed
Then to find the first ZERO map:
for i in range(A.size()[0]):
    for j in range(A.size()[1]):
        if torch.max(A[i][j]) == 0.0:
            print(i, j)
The first (i, j) is (582, 2).
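A quick arithmetic check, offered as an observation rather than a confirmed diagnosis: treating each [i][j] slice as one 480*640 float32 map (4 bytes per element, assuming the tensor is contiguous), the map at (582, 2) is the first one that starts past 2**31 bytes in the tensor data. The names below are illustrative only.

```python
# Where does the map A[582][2] start, in bytes, inside a contiguous
# N*3*480*640 FloatTensor? (Assumes 4 bytes per float32 element.)
CHANNELS, HEIGHT, WIDTH = 3, 480, 640
BYTES_PER_FLOAT = 4

map_bytes = HEIGHT * WIDTH * BYTES_PER_FLOAT      # size of one [i][j] map

def map_start_byte(i, j):
    """Byte offset of map [i][j] from the start of the tensor data."""
    return (i * CHANNELS + j) * map_bytes

two_gib = 2 ** 31
first_map_past_limit = -(-two_gib // map_bytes)   # ceiling division
i, j = divmod(first_map_past_limit, CHANNELS)

print(map_start_byte(582, 2))   # byte offset where map (582, 2) begins
print(two_gib)
print(i, j)                     # first map lying entirely past 2**31 bytes
```

The result does not depend on the size of the first dimension, which fits the same index showing up for the 1200*3*480*640 tensor B.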
smth
March 17, 2017, 12:14am
Are you on version 0.1.6 or earlier? We fixed a bug in 0.1.7 that affected serialization of very large tensors.
Thank you. But how do I check which version of PyTorch I am currently using?
I installed it with:
pip install https://download.pytorch.org/whl/cu75/torch-0.1.10.post2-cp27-none-linux_x86_64.whl
It looks like version 0.1.10.
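One way to check from Python, as a minimal sketch: it assumes the installed build exposes torch.__version__ (very old builds may not), and pytorch_version is a hypothetical helper name, not part of PyTorch.

```python
# Minimal sketch: report the installed PyTorch version, if any.
# getattr() is used because very old builds may lack torch.__version__.
def pytorch_version():
    try:
        import torch
    except ImportError:
        return None                       # PyTorch is not installed
    return getattr(torch, "__version__", "unknown (no __version__ attribute)")

print(pytorch_version())
```

Running pip show torch from the shell should report the same information for a pip-installed wheel.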
smth
March 17, 2017, 12:52am
What OS are you on?
I just tried this small snippet on Linux (CentOS7) and on OSX:
import torch
a = torch.ones(800*3*480*640)
print(a.eq(0).sum())
torch.save(a, 'a.pth')
b = torch.load('a.pth')
print(b.eq(0).sum())
On Linux it works fine; on OSX I get an error, which I am investigating:
0
Traceback (most recent call last):
  File "a.py", line 5, in <module>
    torch.save(a, 'a.pth')
  File "/Users/soumith/code/pytorch/torch/serialization.py", line 120, in save
    return _save(obj, f, pickle_module, pickle_protocol)
  File "/Users/soumith/code/pytorch/torch/serialization.py", line 192, in _save
    serialized_storages[key]._write_file(f)
RuntimeError: Unknown error: -1
Thank you, I’m working on Ubuntu 14.04.1.
Running your script, I got 200410112 as the result of print(b.eq(0).sum())
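As a rough cross-check of that number (an observation, not a confirmed diagnosis): the script saves 800*3*480*640 ones as float32, so the elements that did not come back as zero account for the bytes that survived the round trip. Assuming 4 bytes per element, that works out to 0x7ffff000 bytes, which is the most a single write() call transfers on Linux.

```python
# Cross-check of the reported zero count for torch.ones(800*3*480*640).
# Assumes float32 storage, i.e. 4 bytes per element.
total_elements = 800 * 3 * 480 * 640
reported_zeros = 200410112            # value reported in this thread
BYTES_PER_FLOAT = 4

surviving_elements = total_elements - reported_zeros
surviving_bytes = surviving_elements * BYTES_PER_FLOAT

print(surviving_elements)
print(surviving_bytes, hex(surviving_bytes))
print(2 ** 31 - surviving_bytes)      # how far below 2 GiB the data stops
```

This would be consistent with the tensor data being written in one write() call whose short count was not retried, though nothing in this thread confirms that as the cause.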
smth
March 18, 2017, 3:02am
I tried it on Ubuntu 14.04 as well, but couldn’t reproduce the issue.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty
However, the OSX failure is a useful find. I am tracking it here and trying to find the cause: https://github.com/pytorch/pytorch/issues/1031
Can you tell me about your OS: do you have any locale set, or do you just use the EN locale?
You can find your current locale with the command locale:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
I’m not sure locale matters; I am just trying to eliminate variables.
Also, can you give me your kernel version with uname -a:
$ uname -a
Linux fatbox 3.16.0-37-generic #51~14.04.1-Ubuntu SMP Wed May 6 15:23:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
And lastly, can you check if you have enough free space on your machine? df -h will give the answer:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       355G  302G   35G  90% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            5.9G  4.0K  5.9G   1% /dev
tmpfs           1.2G  1.7M  1.2G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            5.9G  124M  5.8G   3% /run/shm
none            100M  152K  100M   1% /run/user
/dev/sda4        96M   29M   68M  30% /boot/efi
/dev/sdb2       2.7T  2.0T  609G  77% /media/hdd2
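The same free-space check can be sketched from Python with shutil.disk_usage, for example before calling torch.save. has_room is a hypothetical helper, and the 4 bytes per element assumes a FloatTensor; the small pickle overhead of the saved file is ignored here.

```python
import shutil

# Size of an 800*3*480*640 float32 tensor (4 bytes per element, assumed).
needed_bytes = 800 * 3 * 480 * 640 * 4

def has_room(path, needed):
    """Return True if the filesystem holding `path` has `needed` free bytes."""
    return shutil.disk_usage(path).free >= needed

print(needed_bytes / 2 ** 30)      # tensor size in GiB
print(has_room(".", needed_bytes))
```

Pass the directory the file will be written to as path, since different mount points can have very different amounts of free space.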
Sorry for the late reply. I used np.save and np.load to solve the problem in the end. I just tried torch.save and torch.load again, and the problem is still the same. I ran the commands you mentioned; the results are:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty
$ locale
LANG=en_CA.UTF-8
LANGUAGE=en_CA:en
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
$ uname -a
Linux sengled-gpu-1 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G  4.0K   32G   1% /dev
tmpfs           6.3G  1.9M  6.3G   1% /run
/dev/sda2       854G  578G  233G  72% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none             32G   76K   32G   1% /run/shm
none            100M   56K  100M   1% /run/user
/dev/sda1       511M  3.4M  508M   1% /boot/efi
Though I don’t understand them at all.
Hi, I suddenly realized that perhaps my version is not the latest one, because I installed it one day before I checked it again. Also, I can’t use torch.__version__, which seems to confirm that thought. Really sorry about the interruption!
But when I try to install the latest version, this error happens:
SSLError: [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
If you are experienced with this, can you tell me how to solve it? If not, I’ll figure it out somewhere else :)
apolis
(Alexander Polis)
March 21, 2017, 1:09am
Hello,
do
pip install pyopenssl
pip install ndg-httpsclient
pip install pyasn1
and try installing the package again.
Best,
Alex
Thanks! But after installing these, the problem still wasn’t solved… Then I refreshed the PyTorch website and installed again, and it worked. Magical…
smth
March 21, 2017, 9:09pm
I changed the commands on the website to not use https; that’s why it started working.