Error "baddbmm_with_gemm" not implemented for 'Half' when running scaled_dot_product_attention

Hello, I'm new to PyTorch and learning through the documentation.
I am using the CPU build of PyTorch on Windows 10 with Python 3. My laptop only has an Intel Iris graphics card.

While learning about scaled dot product attention at:
https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
I tried to reproduce the example on that page. Since my PyTorch build does not support CUDA, I ran the following in Jupyter:

query = torch.rand(32, 8, 128, 64, dtype=torch.float16)
key = torch.rand(32, 8, 128, 64, dtype=torch.float16)
value = torch.rand(32, 8, 128, 64, dtype=torch.float16)
atten1 = F.scaled_dot_product_attention(query,key,value)

And I get the following error:
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_15000\311139239.py in
2 key = torch.rand(32, 8, 128, 64, dtype=torch.float16)
3 value = torch.rand(32, 8, 128, 64, dtype=torch.float16)
----> 4 atten1 = F.scaled_dot_product_attention(query,key,value)

RuntimeError: "baddbmm_with_gemm" not implemented for 'Half'

Any help is greatly appreciated. Thanks!

Versions

PyTorch version: 2.0.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2611
DeviceID=CPU0
Family=205
L2CacheSize=5120
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2611
Name=11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==4.0.1
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.5
[pip3] numpydoc==1.4.0
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.21.5 py39h7a0a035_3
[conda] numpy-base 1.21.5 py39hca35cd5_3
[conda] numpydoc 1.4.0 py39haa95532_0
[conda] torch 2.0.1 pypi_0 pypi
[conda] torchaudio 2.0.2 pypi_0 pypi
[conda] torchvision 0.15.2 pypi_0 pypi

PyTorch 2.0.1 does not implement this operation for half precision (torch.float16) on the CPU, so you need a GPU to use it. If you keep the default dtype (torch.float32), it will work fine.
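
For reference, here is a minimal sketch of the CPU version of the example from the question, with the dtype argument simply dropped so the tensors use the default torch.float32. The shapes are the ones from the original post; the variable name out is only illustrative.

```python
import torch
import torch.nn.functional as F

# torch.rand returns float32 tensors by default, so no dtype argument is needed on CPU.
query = torch.rand(32, 8, 128, 64)
key = torch.rand(32, 8, 128, 64)
value = torch.rand(32, 8, 128, 64)

# The output has the same shape and dtype as the query tensor.
out = F.scaled_dot_product_attention(query, key, value)
print(out.shape, out.dtype)
```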

Hello Suraj,

Thanks for your reply! I noticed that I should specify:
device = "cuda" if torch.cuda.is_available() else "cpu"
and pass device=device when creating query, key, and value. After that, it works.
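
A possible way to combine the device check with the dtype advice is sketched below. It assumes the failure in PyTorch 2.0.1 comes from float16 tensors living on the CPU, so it picks float16 only when CUDA is available and falls back to float32 otherwise. The dtype selection line is an addition for illustration and was not part of the thread.

```python
import torch
import torch.nn.functional as F

# Pick the GPU when one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: use half precision only on the GPU, since the CPU build
# in the question raised "not implemented for 'Half'".
dtype = torch.float16 if device == "cuda" else torch.float32

query = torch.rand(32, 8, 128, 64, dtype=dtype, device=device)
key = torch.rand(32, 8, 128, 64, dtype=dtype, device=device)
value = torch.rand(32, 8, 128, 64, dtype=dtype, device=device)

out = F.scaled_dot_product_attention(query, key, value)
print(out.shape, out.dtype, out.device)
```

On a CPU-only machine, device resolves to "cpu", so it is the dtype fallback rather than the device argument that avoids the error.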