Error "baddbmm_with_gemm" not implemented for 'Half' when running scaled_dot_product_attention

Hello, I'm new to PyTorch and learning through the documentation.
I am running the CPU build of PyTorch on Windows 10 with Python 3. My laptop only has an Intel Iris graphics card.

While learning about scaled dot product attention, I tried to reproduce the example from its documentation page. Since my PyTorch build does not support CUDA, I ran the following in Jupyter:

query = torch.rand(32, 8, 128, 64, dtype=torch.float16)
key = torch.rand(32, 8, 128, 64, dtype=torch.float16)
value = torch.rand(32, 8, 128, 64, dtype=torch.float16)
atten1 = F.scaled_dot_product_attention(query,key,value)

And I get the following error:
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15000\ in
2 key = torch.rand(32, 8, 128, 64, dtype=torch.float16)
3 value = torch.rand(32, 8, 128, 64, dtype=torch.float16)
----> 4 atten1 = F.scaled_dot_product_attention(query,key,value)

RuntimeError: "baddbmm_with_gemm" not implemented for 'Half'

Any help is greatly appreciated. Thanks!


PyTorch version: 2.0.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Name=11th Gen Intel(R) Core™ i5-1145G7 @ 2.60GHz

Versions of relevant libraries:
[pip3] flake8==4.0.1
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.5
[pip3] numpydoc==1.4.0
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.21.5 py39h7a0a035_3
[conda] numpy-base 1.21.5 py39hca35cd5_3
[conda] numpydoc 1.4.0 py39haa95532_0
[conda] torch 2.0.1 pypi_0 pypi
[conda] torchaudio 2.0.2 pypi_0 pypi
[conda] torchvision 0.15.2 pypi_0 pypi

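As the error message says, the CPU kernel behind this call is not implemented for half precision (torch.float16) in your PyTorch build (2.0.1+cpu), so you need a GPU to run it in float16. If you keep the default dtype (torch.float32), it will work fine on the CPU.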
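For reference, here is a minimal sketch of the CPU-friendly version of the snippet from the question. It assumes the same tensor shapes and simply drops the dtype argument so the tensors use the default torch.float32.

```python
import torch
import torch.nn.functional as F

# Same shapes as in the question, but with the default dtype (torch.float32),
# which the CPU implementation supports.
query = torch.rand(32, 8, 128, 64)
key = torch.rand(32, 8, 128, 64)
value = torch.rand(32, 8, 128, 64)

atten1 = F.scaled_dot_product_attention(query, key, value)

# The output has the same shape and dtype as the query.
print(atten1.shape, atten1.dtype)
```

If the tensors already exist in float16, converting them with .float() before the call should have the same effect.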

Hello Suraj,

Thanks for your reply! I noticed that I should specify:
device = "cuda" if torch.cuda.is_available() else "cpu"
and pass device=device when defining query, key, and value. After that, it works.
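A sketch of how the device selection above might be combined with the dtype advice from the earlier reply. The dtype variable is an assumption added for illustration: on a CPU-only machine the device resolves to "cpu", so this version also falls back to float32 there and only uses float16 when CUDA is available.

```python
import torch
import torch.nn.functional as F

# Pick the GPU when one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption for this sketch: use half precision only on the GPU,
# and fall back to float32 on the CPU.
dtype = torch.float16 if device == "cuda" else torch.float32

query = torch.rand(32, 8, 128, 64, dtype=dtype, device=device)
key = torch.rand(32, 8, 128, 64, dtype=dtype, device=device)
value = torch.rand(32, 8, 128, 64, dtype=dtype, device=device)

atten1 = F.scaled_dot_product_attention(query, key, value)
print(atten1.shape, atten1.dtype, atten1.device)
```

Written this way, the same cell can run unchanged on both a CUDA machine and a CPU-only laptop.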