Help needed: Specific function call causing a 20x slowdown in computation

Hi, I am looking for help with the following code:

import torch

SRAM_rows = 256
step_size = 1
prob_table = torch.eye(SRAM_rows*2 + 1).cuda()
levels = torch.tensor([i - SRAM_rows for i in range(SRAM_rows*2 + 1)]).cuda().float()
cmprob = 0
Expected_outputs = torch.tensor([0]*257).cuda().float()

def quant_XNORSRAM(x, prob_table, levels, step_size, lower_bound):
    # integer division so x_ind can be used as an index into prob_table
    x_ind = (x.type(torch.int64) - lower_bound) // step_size
    num_levels = len(levels)
    # per-element CDF over the output levels
    x_cdf = prob_table[x_ind, 0:num_levels-1].cumsum(dim=-1)
    # one uniform sample per element, replicated across the level dimension
    x_rand = torch.rand(x.shape, device='cuda:0')
    x_rand = torch.stack([x_rand] * (num_levels-1), dim=-1)
    # count how many CDF entries each sample exceeds -> sampled level index
    x_comp = (x_rand > x_cdf).type(torch.int64).sum(dim=-1)
    #import pdb; pdb.set_trace()

    # here if cmprob is set, we add the ideal value + the noise
    if cmprob:
        y = Expected_outputs[x_ind] + levels[x_comp]
    else:
        y = levels[x_comp]
    return y
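
For reference, a call looks roughly like this (the shape and value range here are just for illustration, with lower_bound set to -SRAM_rows so the indices stay inside prob_table):

x = torch.randint(-SRAM_rows, SRAM_rows + 1, (1024,), device='cuda:0').float()
y = quant_XNORSRAM(x, prob_table, levels, step_size, lower_bound=-SRAM_rows)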

Just by calling the quant_XNORSRAM function, my compute time increases 20x. Can you please help me identify the issue causing this and how to fix it?

Thanks

This doesn’t seem related to quantization. Computation time increased 20x with respect to what baseline?

This is a kind of quantization, since I am transforming x to y using a specific quantization scheme. The scenario is that compute time increases 20x if I use the function shown above, compared to the case where I don’t use it (no quantization).

This is not related to PyTorch Quantization.
To get help faster, I think it is better to add a different tag. Maybe “CUDA”?

Meanwhile, @Sai_Kiran, can you explain what the 20x slowdown is compared to (as Daya asked: what’s the baseline)?
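
For instance, a timing harness along these lines would make the two numbers comparable (a minimal sketch; the input shape, the lower_bound=-SRAM_rows argument, and the baseline op are assumptions on my part). The torch.cuda.synchronize() calls matter: CUDA kernels launch asynchronously, so timing without synchronizing mostly measures launch overhead rather than actual compute.

import time
import torch

x = torch.randint(-SRAM_rows, SRAM_rows + 1, (1024,), device='cuda:0').float()

def timed(fn, iters=100):
    torch.cuda.synchronize()  # finish pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()  # wait for every launched kernel to complete
    return (time.perf_counter() - start) / iters

baseline = timed(lambda: x * 1.0)  # stand-in for the no-quantization path
quantized = timed(lambda: quant_XNORSRAM(x, prob_table, levels, step_size, -SRAM_rows))
print(f'baseline: {baseline:.6f} s/call, quantized: {quantized:.6f} s/call')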

Also, CUDA doesn’t really benefit from quantization unless you are using TensorRT (see this for throughputs in CUDA)
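
One thing that stands out in the code itself (just a guess without profiling): torch.stack([x_rand] * (num_levels-1), dim=-1) materializes a tensor num_levels-1 times larger than x on every call. Broadcasting the comparison instead should be equivalent and avoids that copy, e.g. replacing the two x_rand lines inside the function with:

x_rand = torch.rand(x.shape, device=x.device).unsqueeze(-1)  # shape (*x.shape, 1)
x_comp = (x_rand > x_cdf).sum(dim=-1)  # broadcasts against x_cdf over the last dim

Whether that accounts for the full 20x is hard to say without measuring, though.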