For example, `torch.rand(10,10).round().bool().element_size()`

returns 1 byte (not 0.25 bytes = 1 bit).

Is 1 byte PyTorch’s smallest quantization amount?

This is a pretty good answer:

In PyTorch, `torch.bool` elements occupy 1 byte (8 bits) instead of 1 bit because of the way memory is managed on modern computing architectures. There are several reasons for this design choice:

Alignment and Access Efficiency:

- Modern processors are optimized for accessing data aligned to byte boundaries. Accessing single bits would require additional operations to extract the bit from a byte, which would slow down computation.
- Byte-aligned data ensures that memory accesses are efficient and can be performed quickly.
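To make the read-side cost concrete, here is a plain-Python sketch of what bit-level access would involve (the `get_bit` helper is illustrative only, not a PyTorch API):

```python
data = bytes([0b11001010, 0b01110110])

# Byte-aligned read: a single indexed load.
second_byte = data[1]

# Bit-packed read: load the byte, then shift and mask to isolate the bit.
def get_bit(data: bytes, bit_index: int) -> int:
    byte_index, offset = divmod(bit_index, 8)
    return (data[byte_index] >> (7 - offset)) & 1

print(get_bit(data, 0))  # -> 1 (the leading bit of 0b11001010)
```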

Simplicity in Implementation:

- Using 1 byte per boolean simplifies the implementation of tensor operations. Bitwise operations on arrays of bits would require more complex logic to handle individual bit manipulation.
- Storing booleans as bytes allows the use of existing memory and data handling mechanisms without needing special cases for bit-level operations.
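The write side is similar: setting one flag in a bit-packed buffer needs a read-modify-write, whereas a byte-per-bool store is a single assignment. A hypothetical sketch (not how PyTorch is actually implemented):

```python
packed = bytearray([0b00000000])  # 8 flags packed into one byte
unpacked = bytearray(8)           # one byte per flag

# Byte-per-bool: a single store.
unpacked[3] = 1

# Bit-packed: read the byte, modify one bit, write it back.
def set_bit(buf: bytearray, bit_index: int, value: bool) -> None:
    byte_index, offset = divmod(bit_index, 8)
    mask = 1 << (7 - offset)
    if value:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF

set_bit(packed, 3, True)
print(bin(packed[0]))  # -> 0b10000
```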

Compatibility:

- Many libraries and hardware interfaces expect data to be byte-aligned. Using 1 byte per boolean ensures compatibility with other systems and libraries.
- This design choice aligns with how other programming languages and libraries often handle boolean arrays, providing consistency across different tools and environments.
## Example

To illustrate the memory usage difference, here is an example with boolean and `uint8` tensors:

```python
import torch

# Boolean tensor representing 16 bits
bool_vector = torch.tensor([True, True, False, False, True, False, True, False,
                            False, True, True, True, False, True, True, False],
                           dtype=torch.bool)
print("Boolean tensor:", bool_vector)
print("Size of boolean tensor elements:", bool_vector.element_size())
print("Total memory size of boolean tensor:",
      bool_vector.element_size() * bool_vector.nelement(), "bytes")

# uint8 tensor representing the same 16 bits
uint8_vector = torch.tensor([0b11001010, 0b01110110], dtype=torch.uint8)
print("\nuint8 tensor:", uint8_vector)
print("Size of uint8 tensor elements:", uint8_vector.element_size())
print("Total memory size of uint8 tensor:",
      uint8_vector.element_size() * uint8_vector.nelement(), "bytes")
```

## Output

```
Boolean tensor: tensor([ True,  True, False, False,  True, False,  True, False,
        False,  True,  True,  True, False,  True,  True, False])
Size of boolean tensor elements: 1
Total memory size of boolean tensor: 16 bytes

uint8 tensor: tensor([202, 118], dtype=torch.uint8)
Size of uint8 tensor elements: 1
Total memory size of uint8 tensor: 2 bytes
```

In this example, the boolean tensor uses 16 bytes of memory (one byte per boolean), while the `uint8` tensor uses only 2 bytes to store the same 16 bits.

## Conclusion

The choice to use 1 byte per `torch.bool` element is a trade-off between memory efficiency and practical performance considerations. By using byte-aligned memory, PyTorch ensures that tensor operations remain fast and compatible with modern hardware and software ecosystems.
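If memory for large boolean masks does matter, NumPy's `packbits`/`unpackbits` can round-trip a byte-per-bool array to a bit-packed `uint8` array at the cost of the pack/unpack step (PyTorch itself has no built-in equivalent, as far as I know):

```python
import numpy as np

bools = np.array([True, True, False, False, True, False, True, False,
                  False, True, True, True, False, True, True, False])
packed = np.packbits(bools)   # 16 bools -> 2 bytes
print(packed)                 # -> [202 118], matching the uint8 tensor above
restored = np.unpackbits(packed).astype(bool)
assert (restored == bools).all()
```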

FWIW, this is also true for `bool` in C++, though you can bit-pack 8 bools into a single byte: https://www.youtube.com/watch?v=LRJclCVrvQI