Hello. I am currently attempting to run a fully autocasted forward and backward pass over a mostly sparse model (each sparse weight is used only in torch.sparse.mm() calls). However, if I store the weights in COO format, matrix multiplication simply isn't implemented at half precision, and for CSR, the backward fails with an unhelpful error message:
RuntimeError: sampled_addmm: Expected mat1 and mat2 to have the same dtype, but got Half and Float
Is there anything I can do to utilize autocast with this setup? Note that full-precision sparse training and half-precision dense training both work.
A simple reproduction of the printed error is below.
import torch

weight = torch.rand(5, 4, device="cuda", dtype=torch.float32)
# keep only entries below 0.2, so the matrix is mostly zeros
weight = (weight * (weight < 0.2)).to_sparse_csr().requires_grad_(True)
inp = torch.rand(4, 3, device="cuda", dtype=torch.float32, requires_grad=False)

with torch.autocast("cuda", dtype=torch.float16, enabled=True):
    loss = torch.sparse.mm(weight, inp).sum()
loss.backward()
With autocast's enabled=False, it works fine. With the weight stored in COO format instead, it fails already on the matmul itself.
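For what it's worth, the only workaround I have found so far (not verified across torch versions, so treat it as a sketch) is to carve out a full-precision island with a nested, disabled autocast context and cast the incoming activations back to float32 there, so the sparse matmul never sees a Half tensor. Since everything in that region is float32, either sparse layout should work; the sketch below uses COO and picks the device/dtype at runtime so it also runs on CPU:

```python
import torch

# pick a device so the sketch runs with or without a GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

weight = torch.rand(5, 4, device=device)
# keep only entries below 0.2, so the matrix is mostly zeros (COO layout)
weight = (weight * (weight < 0.2)).to_sparse().requires_grad_(True)
inp = torch.rand(4, 3, device=device)

with torch.autocast(device, dtype=amp_dtype):
    # ... dense parts of the model run under autocast as usual ...
    with torch.autocast(device, enabled=False):
        # full-precision island: the sparse matmul only ever sees
        # float32 tensors, so no Half/Float mismatch in backward.
        # inp.float() matters when the activation arrives in half precision.
        out = torch.sparse.mm(weight, inp.float())
    loss = out.sum()

loss.backward()
```

The obvious downside is that the sparse matmuls run in float32 and you lose the autocast speedup for exactly the ops that dominate this model, so this is a stopgap rather than an answer to the question.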
cc @ptrblck