How to efficiently calculate the product along a dimension of a sparse matrix

For example, I have a 400x800 tensor A with about 10 non-zero elements in each row. Here’s how I do it:

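A[A == 0] = 1  # replace zeros with ones so they do not affect the product
result = torch.prod(A, dim=1)  # product along each row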

I think it’s not very efficient because most of the multiplications are by 1 and therefore redundant. Is there a faster way to do that on GPU?
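
To make the question concrete, here is a minimal self-contained sketch of the setup. The way the test matrix is built (random columns, values in [0.5, 1.5), exactly 10 non-zeros per row) and the names `col_idx`, `values`, `dense_result` and `reference` are assumptions for illustration only, not my real data. It uses `torch.where` instead of the in-place assignment so that `A` is left unchanged, and it checks the dense result against a product taken over the non-zero values alone.

```python
import torch

torch.manual_seed(0)

rows, cols, nnz_per_row = 400, 800, 10  # sizes from the example above

# Hypothetical test matrix: 10 non-zero entries per row at random, distinct columns.
A = torch.zeros(rows, cols)
col_idx = torch.rand(rows, cols).argsort(dim=1)[:, :nnz_per_row]
values = torch.rand(rows, nnz_per_row) + 0.5  # in [0.5, 1.5), so never zero
A.scatter_(1, col_idx, values)

# Dense approach: treat zeros as ones, then multiply along dim=1.
# This performs 800 multiplications per row although only 10 matter.
dense_result = torch.prod(torch.where(A == 0, torch.ones_like(A), A), dim=1)

# Reference: multiply only the non-zero values of each row.
reference = values.prod(dim=1)
```

Both results should agree up to floating-point rounding, which suggests that an approach touching only the non-zero entries would do the same job with far fewer multiplications.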