# Why does each torch.bool value take up an entire byte?

``````In : import torch
In : t = ~torch.empty(100000000, dtype=torch.bool, device="cuda")
In : print(torch.cuda.memory_allocated())
100663296
In : print(100663296/100000000)
1.00663296
``````
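For what it's worth, the measured number doesn't look arbitrary: 100663296 bytes is exactly 96 MiB, which is what you get by rounding the 100,000,000-byte request up to a multiple of 2 MiB. My understanding (treat this as an assumption) is that the CUDA caching allocator rounds large allocations to 2 MiB blocks, which would explain the extra ~0.66%:

```python
# Sketch: the reported 100663296 bytes equals the 100,000,000-byte
# request rounded up to a multiple of 2 MiB. The 2 MiB granularity is
# my reading of the CUDA caching allocator, not something this thread
# verifies directly.
request = 100_000_000          # one byte per bool element
granularity = 2 * 1024 * 1024  # assumed 2 MiB block size
rounded = -(-request // granularity) * granularity  # ceiling division
print(rounded)                 # 100663296
```

So the per-element cost really is 1 byte; the remainder is allocator rounding, not extra per-bool overhead.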

As shown above, a large `torch.bool` tensor takes up roughly 1 byte per element (at least on CUDA; I don't know how to measure this on CPU).

Why is that? I would have assumed that a bool needs only 1 bit to represent, not 8 (making up a byte).

The size of a bool is implementation-dependent, and 1 byte is a fairly common choice. See also: torch.cuda.BoolTensor uses 8 bits per element, not 1 bit as reported by element_size() · Issue #41571 · pytorch/pytorch · GitHub

As a quick summary (and to verify I got this right): it is currently implemented with one byte per element to keep in-memory addressing simple, but some experts are already looking into fixing this by reusing the machinery developed for the newer sub-byte (2-bit/4-bit) dtypes.

In the meantime, as a workaround, given that PyTorch has a set of bitwise logical operators that work on uint8, we could conceivably reorganize an 8*N-sized bool tensor into an N-sized uint8 tensor, with the following benefits:

1. Using less memory
2. More efficient computations (maybe?)

And the following cons:

1. Indexing/slicing the compressed tensor at the sub-byte level is complicated without decompressing first
2. Some more complex operations, such as masking, will not be supported
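To illustrate the idea in plain Python first (no torch; `pack_bits`/`unpack_bits` are hypothetical helper names, not PyTorch API), the scheme packs 8 bools into one integer, least significant bit first:

```python
# Plain-Python sketch of the same LSB-first packing scheme.
def pack_bits(bits):
    """Pack up to 8 bools into one integer, least significant bit first."""
    return sum(int(b) << i for i, b in enumerate(bits))

def unpack_bits(value, n=8):
    """Recover n bools by right-shifting and testing the low bit."""
    return [bool((value >> i) & 1) for i in range(n)]

bits = [True, True, False, False, False, False, True, False]
packed = pack_bits(bits)
print(packed)                       # 67
print(unpack_bits(packed) == bits)  # True

# The efficiency argument: an elementwise AND/OR over 8 bools
# becomes a single integer operation on the packed bytes.
a, b = 67, 27
print(a & b, a | b)                 # 3 91
```

This is exactly the arithmetic the tensor version below vectorizes with `>>`, `%`, `<<`, and `sum`.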

As for the conversion method, my current best idea is:

• converting from "uint8bool" to bool: right-shift the values by their bit index, then check the result mod 2
• converting from bool to "uint8bool": cast the bools to integers, left-shift by the bit index, and sum
``````In : uint8_tensor = torch.randint(0, 255, (1, 2), dtype=torch.uint8)
In : uint8_tensor
tensor([[67, 27]], dtype=torch.uint8)

In : indexes = torch.arange(8).unsqueeze(0).unsqueeze(0)
In : indexes
tensor([[[0, 1, 2, 3, 4, 5, 6, 7]]])

In : bool_tensor = (uint8_tensor.unsqueeze(-1)>>indexes)%2==1
In : bool_tensor
tensor([[[ True,  True, False, False, False, False,  True, False],
         [ True,  True, False,  True,  True, False, False, False]]])

In : reconverted = (bool_tensor.byte() << indexes).sum(dim=-1)
In : reconverted
tensor([[67, 27]])

In : uint8_tensor[0,0] | uint8_tensor[0,1]
tensor(91, dtype=torch.uint8)
In : 1+2+8+16+64
91
In : uint8_tensor[0,0] & uint8_tensor[0,1]
tensor(3, dtype=torch.uint8)
``````

The "uint8bool" stores the most significant bit at the end, which might not be what most people expect. In that case, reversing `indexes` (e.g. `torch.arange(7, -1, -1)`) should be enough.
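To make the bit-order point concrete (again in plain Python rather than torch): reversing the shift amounts turns the LSB-first packing into MSB-first packing, which yields the bit-reversed byte for the same bool sequence:

```python
# Sketch: MSB-first packing is the LSB-first scheme with reversed
# shift amounts (index i shifts by 7 - i instead of i).
bits = [True, True, False, False, False, False, True, False]  # LSB-first bits of 67

lsb_first = sum(int(b) << i for i, b in enumerate(bits))
msb_first = sum(int(b) << (7 - i) for i, b in enumerate(bits))

print(lsb_first)  # 67   (0b01000011)
print(msb_first)  # 194  (0b11000010, the bit-reversal of 67)
```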