Quadruple precision complex float

I am dealing with very large and very small numbers that can overflow or underflow torch.complex128. Specifically, I am computing something resembling a discrete Laplace transform over a large window:

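import torch

t = torch.arange(10_000).reshape(1, -1)   # sample indices, shape (1, 10000)
s = torch.normal(torch.zeros(5, 1), 1.0)  # transform parameters, shape (5, 1)
torch.cumsum(f(x) * torch.exp(-t * s), dim=-1)  # f(x): the sampled signal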
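
To make the range problem concrete, here is a minimal check. It assumes real float64 values for simplicity, since complex128 stores its real and imaginary parts as float64. The exponents 710 and -746 are just illustrative values slightly outside the float64 range; with t running up to 9999, the exponent in my transform can go far beyond them.

```python
import torch

# complex128 keeps two float64 components, so its usable range is that of
# float64: about 1.8e308 at the top, about 4.9e-324 at the bottom.
big = torch.exp(torch.tensor(710.0, dtype=torch.float64))     # above the float64 maximum
small = torch.exp(torch.tensor(-746.0, dtype=torch.float64))  # below the smallest subnormal

print(big, small)
```

Once a single term hits either limit, every later entry of the cumulative sum is affected.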

I’m aware that torch does not currently support quadruple precision floats. What would be the easiest way to do this operation with autograd support? Would I need to write it up in CUDA – does CUDA even support complex quads? Should I do some trickery and explicitly represent the mantissa and exponent as torch floats, rewriting basic operations like addition/multiplication?
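
To illustrate the mantissa/exponent idea, here is a rough sketch of what I have in mind, for real float64 values only. The helper names (normalize, from_float, mul, add, log2_abs) are made up for this post and are not part of torch. I have not checked how this behaves under autograd, and the complex case would need more work.

```python
import math
import torch

def normalize(m, e):
    # Re-split the mantissa so |m| is in [0.5, 1) and fold the shift into e.
    m2, de = torch.frexp(m)
    e2 = e + de.to(torch.int64)
    # Keep exact zeros in a canonical form with exponent 0.
    return m2, torch.where(m2 == 0, torch.zeros_like(e2), e2)

def from_float(x):
    # Encode an ordinary tensor as a pair (m, e) with x = m * 2**e.
    return normalize(x, torch.zeros_like(x, dtype=torch.int64))

def mul(a, b):
    # (ma * 2**ea) * (mb * 2**eb) = (ma * mb) * 2**(ea + eb)
    return normalize(a[0] * b[0], a[1] + b[1])

def add(a, b):
    (ma, ea), (mb, eb) = a, b
    # Exact zeros must never set the common scale, so give them a tiny exponent.
    floor = torch.full_like(ea, -2**62)
    ea = torch.where(ma == 0, floor, ea)
    eb = torch.where(mb == 0, floor, eb)
    e = torch.maximum(ea, eb)
    # Shift both mantissas to the larger exponent; shifts are <= 0, so no overflow.
    ma = ma * torch.exp2((ea - e).to(ma.dtype))
    mb = mb * torch.exp2((eb - e).to(mb.dtype))
    return normalize(ma + mb, e)

def log2_abs(a):
    # log2|x| stays finite even when x itself does not fit in a float64.
    return torch.log2(a[0].abs()) + a[1].to(a[0].dtype)

x = from_float(torch.tensor([1e300], dtype=torch.float64))
y = mul(mul(x, x), x)  # about 1e900, far outside the float64 range
z = add(y, y)          # about 2e900

print(log2_abs(y).item(), 900 * math.log2(10))
```

This keeps the precision of float64 and only widens the exponent range, so it would address overflow and underflow but not give me extra significant digits the way a true quad type would.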

What options do I have?