ByteTensor casting boolean to underlying byte?

naifrec · June 7, 2018, 12:59pm

Describe the issue

If you use torch.ByteTensor as a constructor and pass it a NumPy array with dtype np.bool, the returned tensor contains what looks like the value of the underlying bytes rather than 0s and 1s, which is what I would expect. Am I missing part of the picture?

Reproduce the issue

Python 3.6, torch 0.3.1

In [7]: array = np.random.rand(30) > .5

In [8]: torch.ByteTensor(array)
Out[8]: 

   0
   0
   0
   0
   0
   0
   0
  16
 103
 209
 153
 179
 255
   7
   0
  80
   2
   0
  78
  55
 251
 127
   0
   0
 237
 204
 134
 124
  63
  63
[torch.ByteTensor of size 30]

In [9]: array
Out[9]: 
array([ True, False,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True, False, False,  True,  True, False,
        True,  True,  True,  True, False,  True, False,  True,  True,
       False, False,  True])

albanD · June 7, 2018, 1:03pm

Hi,

You should use torch.from_numpy() to create tensors from NumPy arrays to avoid such issues.
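
A minimal sketch of this suggestion, for reference. The explicit cast to np.uint8 is an assumption added here, not part of the reply above: it keeps the example working on releases where torch.from_numpy() may not accept boolean arrays directly, and it makes the intended 0/1 byte values explicit.

```python
import numpy as np
import torch

# Boolean mask, fixed here so the result is reproducible
array = np.array([True, False, True, True, False])

# Cast explicitly to uint8 so the dtype maps onto a byte tensor,
# then wrap the NumPy data without going through Python objects
tensor = torch.from_numpy(array.astype(np.uint8))

print(tensor.tolist())  # [1, 0, 1, 1, 0]
```

Note that torch.from_numpy() shares memory with the array it is given; here that is the temporary uint8 copy, so the original boolean array is not affected by later writes to the tensor.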

naifrec · June 7, 2018, 1:05pm

I understand that this is a solution, but do we really want to keep this behavior? It can never be intended, and the documentation does not make clear that torch.ByteTensor should not be used this way. Instead of casting unsafely, shouldn't we just raise an error when a boolean array is passed to torch.ByteTensor?

albanD · June 7, 2018, 1:11pm

Hi,

The problem is that NumPy arrays behave like sequences. So if the constructor supports things like torch.ByteTensor([0, 1, 2]), the NumPy array will be treated as a sequence and a Tensor will be created from it. Unfortunately, types can change during this conversion because it goes through Python objects.
I’m not sure we can easily change this behavior.
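
To illustrate the point about Python objects, here is a small NumPy-only sketch. It does not reproduce what the torch.ByteTensor constructor does internally (that is not shown in this thread). It only demonstrates that the elements obtained by treating a boolean array as a sequence are NumPy scalars rather than Python bool or int objects, which is one way a generic sequence path could misread them.

```python
import numpy as np

array = np.array([True, False, True])

# Iterating the array as a sequence yields NumPy scalar objects
elements = list(array)
print(isinstance(elements[0], np.bool_))  # True
print(isinstance(elements[0], bool))      # False: not a Python bool

# Explicit conversions that leave no ambiguity about the values
as_python = array.tolist()         # plain Python bools
as_bytes = array.astype(np.uint8)  # 0/1 stored as uint8
```

Either explicit form removes the guesswork before the data reaches a tensor constructor.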