For example, torch.rand(10,10).round().bool().element_size()
returns 1 byte (not 0.25 bytes = 1 bit).
Is 1 byte PyTorch’s smallest quantization amount?
This is a pretty good answer:
In PyTorch, `torch.bool` elements occupy 1 byte (8 bits) instead of 1 bit due to the way memory is managed in modern computing architectures. There are several reasons for this design choice:
Alignment and Access Efficiency:
- Modern processors are optimized for accessing data aligned to byte boundaries. Accessing single bits would require additional operations to extract the bit from a byte, which would slow down computation.
- Byte-aligned data ensures that memory accesses are efficient and can be performed quickly.
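To make that cost concrete, here is a small sketch (plain Python; the helper name is illustrative) of the extra shift-and-mask work a bit-level read would need, compared with a byte-aligned read that can use the value directly:

```python
byte_value = 0b11001010  # one byte holding 8 packed booleans

# Byte-aligned access: the stored value is usable as-is.
whole_byte = byte_value

# Bit-level access: every read costs an extra shift plus a mask.
def get_bit(byte, index):
    """Extract bit `index` (0 = least significant) from a byte."""
    return (byte >> index) & 1

bits = [get_bit(byte_value, i) for i in range(8)]
print(bits)  # least-significant bit first
```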
Simplicity in Implementation:
- Using 1 byte per boolean simplifies the implementation of tensor operations. Bitwise operations on arrays of bits would require more complex logic to handle individual bit manipulation.
- Storing booleans as bytes allows the use of existing memory and data handling mechanisms without needing special cases for bit-level operations.
Compatibility:
- Many libraries and hardware interfaces expect data to be byte-aligned. Using 1 byte per boolean ensures compatibility with other systems and libraries.
- This design choice aligns with how other programming languages and libraries often handle boolean arrays, providing consistency across different tools and environments.
Example
To illustrate the memory usage difference, here is an example with boolean and `uint8` tensors:

```python
import torch

# Boolean tensor representing 16 bits
bool_vector = torch.tensor([True, True, False, False, True, False, True, False,
                            False, True, True, True, False, True, True, False],
                           dtype=torch.bool)
print("Boolean tensor:", bool_vector)
print("Size of boolean tensor elements:", bool_vector.element_size())
print("Total memory size of boolean tensor:",
      bool_vector.element_size() * bool_vector.nelement(), "bytes")

# `uint8` tensor representing the same 16 bits
uint8_vector = torch.tensor([0b11001010, 0b01110110], dtype=torch.uint8)
print("\nuint8 tensor:", uint8_vector)
print("Size of uint8 tensor elements:", uint8_vector.element_size())
print("Total memory size of uint8 tensor:",
      uint8_vector.element_size() * uint8_vector.nelement(), "bytes")
```
Output
```
Boolean tensor: tensor([ True,  True, False, False,  True, False,  True, False,
        False,  True,  True,  True, False,  True,  True, False])
Size of boolean tensor elements: 1
Total memory size of boolean tensor: 16 bytes

uint8 tensor: tensor([202, 118], dtype=torch.uint8)
Size of uint8 tensor elements: 1
Total memory size of uint8 tensor: 2 bytes
```
In this example, the boolean tensor uses 16 bytes of memory (one byte per boolean), while the `uint8` tensor uses only 2 bytes to store the same number of bits.

Conclusion
The choice to use 1 byte per `torch.bool` element is a trade-off between memory efficiency and practical performance considerations. By using byte-aligned memory, PyTorch ensures that tensor operations remain fast and compatible with modern hardware and software ecosystems.
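If memory really matters, a bool tensor can be packed into `uint8` by hand with bitwise ops. A sketch (PyTorch has no built-in packbits; the helper names `pack_bool`/`unpack_bool` are made up here, MSB-first ordering assumed):

```python
import torch

def pack_bool(bools: torch.Tensor) -> torch.Tensor:
    """Pack a 1-D bool tensor (length a multiple of 8) into uint8, MSB first."""
    bits = bools.to(torch.uint8).reshape(-1, 8)
    weights = torch.tensor([128, 64, 32, 16, 8, 4, 2, 1], dtype=torch.uint8)
    # torch.sum promotes integral inputs to int64, so cast back to uint8
    return (bits * weights).sum(dim=1).to(torch.uint8)

def unpack_bool(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_bool: expand each uint8 back into 8 booleans."""
    shifts = torch.arange(7, -1, -1, dtype=torch.uint8)
    return ((packed.unsqueeze(1) >> shifts) & 1).reshape(-1).bool()

v = torch.tensor([True, True, False, False, True, False, True, False,
                  False, True, True, True, False, True, True, False])
packed = pack_bool(v)
print(packed)  # 2 bytes of storage instead of 16
print(torch.equal(unpack_bool(packed), v))  # round-trips back to the original
```

The trade-off from the answer above shows up immediately: the packed form is 8x smaller, but every logical operation now needs shift-and-mask work instead of a direct elementwise op.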
FWIW this is also true for bools in C++, for that matter, though you can bit-pack 8 bools into a single byte: https://www.youtube.com/watch?v=LRJclCVrvQI