Query regarding DataLoader in PyTorch utils

satwik_bh · August 14, 2017, 12:28am

I have an input_matrix, which is a SciPy sparse matrix in CSR format. It is a binary representation and consists only of 1’s and 0’s.

> input_matrix
<1500x24995 sparse matrix of type '<type 'numpy.float32'>'
        with 1068434 stored elements in Compressed Sparse Row format>

I load it into a DataLoader using the code below:

import torch
from torch.utils.data import DataLoader

cuda = torch.cuda.is_available()
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}

input_loader = DataLoader(input_matrix.toarray(), batch_size=32, shuffle=True, **kwargs)

Now, when I check the input_loader in the interpreter, I see 0’s and 1’s, but also other values such as 2’s.

> input_loader
    1     1     1  ...      0     0     0
    0     0     0  ...      0     0     0
    0     1     1  ...      0     0     0
       ...          ⋱          ...       
    0     2     2  ...      0     0     0
    0     0     0  ...      0     0     0
    1     1     1  ...      0     0     0
[torch.FloatTensor of size 32x24995]

If it helps, when I convert the csr_matrix into a tensor using torch.from_numpy(input_matrix), I do not see any values other than 0’s and 1’s.

    0     1     1  ...      0     0     0
    0     0     0  ...      0     0     0
    0     1     0  ...      0     0     0
       ...          ⋱          ...       
    1     1     1  ...      0     0     0
    0     0     0  ...      0     0     0
    1     1     1  ...      0     0     0
[torch.FloatTensor of size 1500x24995]
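
To make the comparison above reproducible, here is a minimal sketch of a sanity check that collects the distinct values in the dense array and in every batch the loader yields. It assumes a recent PyTorch release (for torch.unique) and uses a small random binary matrix as a hypothetical stand-in for input_matrix. The names dense, dataset, loader and batch_values are illustrative and not part of the original code.

```python
import numpy as np
import scipy.sparse as sp
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for input_matrix: a small random binary CSR matrix.
rng = np.random.RandomState(0)
input_matrix = sp.csr_matrix((rng.rand(100, 50) < 0.3).astype(np.float32))

# Densify once and record which distinct values the array contains.
dense = input_matrix.toarray()
dense_values = set(np.unique(dense).tolist())
print("values in dense array:", sorted(dense_values))

# Wrap the dense array in a TensorDataset so each sample is one row.
dataset = TensorDataset(torch.from_numpy(dense))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Collect every distinct value seen across all batches.
batch_values = set()
for (batch,) in loader:
    batch_values.update(torch.unique(batch).tolist())
print("values in batches:", sorted(batch_values))
```

If the two printed sets differ on the real data, the extra values are introduced somewhere between toarray() and batching. If they match, the values are already present in the dense array.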

Is the method I used to load the data correct? If not, how can I correctly load the data into a DataLoader?