Matrix manipulation in PyTorch


I am trying to implement a loss function that takes a ground-truth (GT) vector which is valid at some points and invalid at others (invalid entries are set to -1), and uses this data to “squeeze” an output matrix.
In my example the GT vector has 200 numbers per image, so it is 4x200 for a batch of 4 images, and the network output has shape 4x68x200.

My current implementation, which is not working, looks like:

relvancy_mask =                                                    # set a 0-1 mask
nonz_indices =  torch.nonzero(              #get the relevant indices
masked_output = torch.index_select(output,2,nonz_indices)

But it does not work: converting the 0/1 mask to indices does not behave as nicely as it did in a small experiment I ran separately. From a 0/1 mask of [torch.cuda.ByteTensor of size 4x200 (GPU 0)] I get a [torch.cuda.LongTensor of size 591x2 (GPU 0)]. I guess the 591 is the number of ones among the 800 mask entries, but I don’t understand the meaning of the 2 columns. Surprisingly, the last number in nonz_indices is 198, which makes me suspicious.
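
For reference, here is a minimal sketch of what torch.nonzero returns for a 2-D mask. The tiny 2x3 mask is a made-up stand-in for the real 4x200 one, and the sketch assumes a recent PyTorch version with boolean tensors. For an N-dimensional input, torch.nonzero returns one row per nonzero element and one column per dimension, so a 2-D mask gives a (num_nonzero, 2) tensor of (row, column) pairs. Under that reading, a final value of 198 would simply be the column of the last valid entry in the last image.

```python
import torch

# Hypothetical small mask standing in for the real 4x200 validity mask.
mask = torch.tensor([[True, False, True],
                     [False, False, True]])

# One row per nonzero element, one column per mask dimension.
idx = torch.nonzero(mask)
print(idx.shape)  # (number of True entries, 2)
print(idx)

# Column 0 is the row (image) index, column 1 is the position within that row.
rows, cols = idx[:, 0], idx[:, 1]
```

Because the index tensor is 2-D, it cannot be passed directly to torch.index_select, which expects a 1-D index and applies the same indices to every image in the batch.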

Any idea what I can do to solve this issue?
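
One possible direction, sketched under assumptions rather than offered as a definitive fix: since each image can have a different set of valid positions, a single index_select along dimension 2 cannot express the selection, but a boolean mask can. The tensors below are random stand-ins with the shapes from the post, the invalid positions are chosen arbitrarily, and a recent PyTorch version with boolean indexing is assumed.

```python
import torch

batch, channels, length = 4, 68, 200  # shapes taken from the post

# Hypothetical stand-ins for the real ground truth and network output.
gt = torch.randint(0, 5, (batch, length)).float()
gt[:, ::3] = -1.0                      # arbitrarily mark every third position invalid
output = torch.randn(batch, channels, length)

valid = gt != -1                       # boolean mask of shape (4, 200)

# Option 1: keep the 4x68x200 shape and zero out the invalid positions
# by broadcasting the mask over the 68-sized dimension.
masked_output = output * valid.unsqueeze(1).to(output.dtype)

# Option 2: gather only the valid positions. Moving the 200-sized dimension
# next to the batch dimension lets the (4, 200) mask index both at once,
# giving a (num_valid, 68) tensor.
selected = output.permute(0, 2, 1)[valid]
print(masked_output.shape, selected.shape)
```

Option 1 is convenient when the loss is a sum over elements, since the zeroed entries can be made to contribute nothing. Option 2 drops the invalid entries entirely, at the cost of losing the per-image layout.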