Why are `torch.bool`'s elements 1 byte and not 1 bit?

For example, `torch.rand(10, 10).round().bool().element_size()` returns 1 byte (not 0.125 bytes = 1 bit).

Is 1 byte PyTorch’s smallest per-element storage size?

This is a pretty good answer:

In PyTorch, torch.bool elements occupy 1 byte (8 bits) instead of 1 bit due to the way memory is managed in modern computing architectures. There are several reasons for this design choice:

  1. Alignment and Access Efficiency:

    • Modern processors are optimized for accessing data aligned to byte boundaries. Accessing single bits would require additional operations to extract the bit from a byte, which would slow down computation.
    • Byte-aligned data ensures that memory accesses are efficient and can be performed quickly.
  2. Simplicity in Implementation:

    • Using 1 byte per boolean simplifies the implementation of tensor operations. Bitwise operations on arrays of bits would require more complex logic to handle individual bit manipulation.
    • Storing booleans as bytes allows the use of existing memory and data handling mechanisms without needing special cases for bit-level operations.
  3. Compatibility:

    • Many libraries and hardware interfaces expect data to be byte-aligned. Using 1 byte per boolean ensures compatibility with other systems and libraries.
    • This design choice aligns with how other programming languages and libraries often handle boolean arrays, providing consistency across different tools and environments.
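To make the access-efficiency point concrete, here is a minimal sketch (plain Python, not PyTorch internals) contrasting what a read costs under the two layouts: a hypothetical bit-packed layout needs an index computation, a shift, and a mask per element, while byte-aligned storage is a single direct load.

```python
# Hypothetical bit-packed layout: 8 booleans in one byte (LSB first).
packed = bytearray([0b11001010])

def get_bit(buf, i):
    # Three operations per read: byte index, shift, mask.
    return (buf[i // 8] >> (i % 8)) & 1

# Byte-aligned layout, as torch.bool actually stores it: 1 byte per boolean.
unpacked = bytearray([0, 1, 0, 1, 0, 0, 1, 1])

def get_byte(buf, i):
    # One direct load per read.
    return buf[i]

assert all(get_bit(packed, i) == get_byte(unpacked, i) for i in range(8))
```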

Example

To illustrate the memory usage difference, here is an example with boolean and uint8 tensors:

```python
import torch

# Boolean tensor representing 16 bits
bool_vector = torch.tensor([True, True, False, False, True, False, True, False,
                            False, True, True, True, False, True, True, False], dtype=torch.bool)

print("Boolean tensor:", bool_vector)
print("Size of boolean tensor elements:", bool_vector.element_size())
print("Total memory size of boolean tensor:", bool_vector.element_size() * bool_vector.nelement(), "bytes")

# `uint8` tensor representing the same 16 bits
uint8_vector = torch.tensor([0b11001010, 0b01110110], dtype=torch.uint8)

print("\nuint8 tensor:", uint8_vector)
print("Size of uint8 tensor elements:", uint8_vector.element_size())
print("Total memory size of uint8 tensor:", uint8_vector.element_size() * uint8_vector.nelement(), "bytes")
```

Output

```
Boolean tensor: tensor([ True,  True, False, False,  True, False,  True, False, False,  True,  True,  True, False,  True,  True, False])
Size of boolean tensor elements: 1
Total memory size of boolean tensor: 16 bytes

uint8 tensor: tensor([202, 118], dtype=torch.uint8)
Size of uint8 tensor elements: 1
Total memory size of uint8 tensor: 2 bytes
```

In this example, the boolean tensor uses 16 bytes of memory (one byte per boolean), while the uint8 tensor uses only 2 bytes to store the same number of bits.

Conclusion

The choice to use 1 byte per torch.bool element is a trade-off between memory efficiency and practical performance considerations. By using byte-aligned memory, PyTorch ensures that tensor operations remain fast and compatible with modern hardware and software ecosystems.

FWIW, this is also true of `bool` in C++, where `sizeof(bool)` is 1 byte, though you can bit-pack 8 bools into a single byte yourself (and `std::vector<bool>` does exactly that): https://www.youtube.com/watch?v=LRJclCVrvQI
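That packing trick can also be sketched in Python. PyTorch has no built-in bit-packing for bool tensors, but NumPy's `np.packbits`/`np.unpackbits` (assuming NumPy is available; it is not part of PyTorch) do exactly the 8-to-1 packing described:

```python
import numpy as np

# The same 16 booleans as in the example above.
flags = np.array([True, True, False, False, True, False, True, False,
                  False, True, True, True, False, True, True, False])

packed = np.packbits(flags)  # 16 booleans -> 2 bytes (MSB-first within each byte)
print(packed)                # same values as the uint8 tensor above

# The packing is lossless: unpack to recover the original booleans.
recovered = np.unpackbits(packed).astype(bool)
assert (recovered == flags).all()
```

The memory saving is real (2 bytes instead of 16 here), but every element access now pays the shift-and-mask cost described earlier, which is why PyTorch keeps one byte per boolean.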