Trying to pass too many CPU scalars to CUDA kernel!

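My guess is that two of the inputs are 0-dim Tensors that live on the CPU. The kernel then tries to pass both of them as direct arguments, and it apparently accepts fewer than that, hence the error.
I’m sure @ngimel will know for certain.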
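
If that guess is right, one way to check is to look for 0-dim CPU Tensors among the inputs and move them to the GPU before calling the op. Below is a rough sketch, not a confirmed fix. The helper names `find_cpu_scalars` and `move_cpu_scalars` are made up for illustration and are not part of PyTorch.

```python
import torch


def _is_cpu_scalar(t):
    # A "CPU scalar" here means a 0-dim Tensor stored on the CPU.
    return isinstance(t, torch.Tensor) and t.dim() == 0 and t.device.type == "cpu"


def find_cpu_scalars(*tensors):
    """Return the positions of the inputs that are 0-dim CPU Tensors."""
    return [i for i, t in enumerate(tensors) if _is_cpu_scalar(t)]


def move_cpu_scalars(device, *tensors):
    """Move every 0-dim CPU Tensor to `device`; leave other inputs untouched."""
    return tuple(t.to(device) if _is_cpu_scalar(t) else t for t in tensors)


# Example inputs (assumed, not taken from the original report).
a = torch.tensor(1.0)   # 0-dim, CPU
b = torch.tensor(2.0)   # 0-dim, CPU
c = torch.ones(3)       # 1-dim, CPU

print(find_cpu_scalars(a, b, c))  # positions of the 0-dim CPU Tensors

# Only move things when a GPU is actually available.
if torch.cuda.is_available():
    a, b, c = move_cpu_scalars("cuda", a, b, c.cuda())
```

If `find_cpu_scalars` reports more than one position for the inputs of the failing op, that would be consistent with the guess above. Calling `.item()` on the 0-dim Tensors to get plain Python numbers may also work, depending on the op.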