PyTorch tensor division hangs in a multiprocess setup on Linux


I’m experiencing a hang in one of my processes. It happens in the following code:

samping_priorities = (self.priority_memory[0:upper_bound] / self.priority_memory[0:upper_bound].sum()).cpu().detach().numpy()

batch_idx = T.LongTensor(np.random.choice(upper_bound, batch_size, p=samping_priorities[0:upper_bound]))

samping_priorities is a 2000000*1 tensor.
upper_bound is the upper end of the range I’m interested in, and it increases by 1 on each iteration (upper_bound += 1).

At the beginning everything is okay. Then I noticed that when upper_bound exceeds 32768, the process hangs between the first line and the second line.

It works fine on my Windows workstation but hangs on the Linux cluster. What could be the cause, and how can I fix it?

This sounds like an int16_t overflow bug (or the larger size might hit a different code branch). Could you please create an issue in the PyTorch repo? Thanks!
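
For context on why 32768 looks suspicious: it is one past the largest value a signed 16-bit integer can hold (32767). The sketch below only illustrates that wraparound using the Python standard library. It does not reproduce the hang and says nothing about where in PyTorch such an overflow might occur; as_int16 is a hypothetical helper name made up for this illustration.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest signed 16-bit value


def as_int16(n: int) -> int:
    """Reinterpret n as a signed 16-bit integer, wrapping on overflow as C does.

    Hypothetical helper for illustration only; not part of PyTorch or NumPy.
    """
    return ctypes.c_int16(n).value


# Sizes around the threshold reported in the question.
for n in (32767, 32768, 32769):
    print(n, "->", as_int16(n))
```

If a size or index were stored in an int16_t somewhere, any count above 32767 would wrap to a negative number like this, which is the kind of thing that can send a loop or a kernel launch down an unexpected path. Whether that is what happens here is still a guess, so a minimal script that triggers the hang would be the most useful thing to attach to the issue.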