Problems with the training loop for a CNN that uses a dataset of .npy files

I’m trying to train a CNN on a set of .npy files that describe images. I load the files using DatasetFolder so that each file is labeled with the name of its subdirectory, similar to the Cats vs. Dogs training setup, but with astronomical images.

import numpy as np
import torch
from torchvision import datasets

def npy_loader(path):
    sample = torch.from_numpy(np.load(path))
    return sample
    

train_ds = datasets.DatasetFolder(
    root='/content/NPYTrainData_S1617_S1618',
    loader=npy_loader,
    extensions=('.npy',)
)

test_ds = datasets.DatasetFolder(
    root='/content/NPYTestData_17185',
    loader=npy_loader,
    extensions=('.npy',)
)
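The training loop below uses a train_loader that isn’t defined in the snippet above; a minimal sketch of creating it, assuming a batch size of 64 (the size implied by the later error messages), could be:

from torch.utils.data import DataLoader

# Hypothetical loader setup; batch_size=64 matches the batch size
# implied by the later error messages.
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=64, shuffle=False)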

When I try to use a training loop similar to the one in the PyTorch documentation, I get this error:

ValueError: given numpy array has byte order different from the native byte order. Conversion between byte orders is currently not supported.

The loop is this: 

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


I'm fairly new to all this programming stuff and don't know what's going on. Any advice would be appreciated!

Based on the raised error message I think the endianness of the loaded numpy array doesn’t match the native byte order (little-endian, if I’m not mistaken).
Trigger a copy of the numpy array so that it uses the native byte order and it should work:

def npy_loader(path):
    arr = np.load(path)
    # astype returns a copy of the data in the native byte order (as float32)
    arr = arr.astype(np.float32)
    sample = torch.from_numpy(arr)
    return sample
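Alternatively, if you want to keep the stored precision instead of forcing float32, a minimal sketch of an explicit conversion to the native byte order (assuming the array’s dtype is simply byte-swapped, e.g. '>f4') could be:

def npy_loader(path):
    arr = np.load(path)
    # astype with a native-byte-order dtype copies the data into the byte
    # order torch.from_numpy expects, without changing the precision
    arr = arr.astype(arr.dtype.newbyteorder('='))
    return torch.from_numpy(arr)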

Hi, @ptrblck!
Thanks for the response, the solution you gave me solved the endianness problem, but now the issue is that my tensors do not all have the same dimensions; the following error is raised (with some variation in the "at entry n" part):

RuntimeError: stack expects each tensor to be equal size, but got [126, 126] at entry 0 and [46, 126] at entry 11

To solve that, I modified the code to this:

def npy_loader(path):
    arr_load = np.load(path)
    # force every sample to the same (126, 126) shape
    arr = np.resize(arr_load, (126, 126))
    arr = arr.astype(np.float32)
    sample = torch.from_numpy(arr)
    return sample

But when I run the code again, the following error emerges:

Expected 4-dimensional input for 4-dimensional weight [6, 3, 5, 5], but got 3-dimensional input of size [64, 126, 126] instead

Well, my next shot was to modify the npy_loader code again, like this:

def npy_loader(path):
    arr_load = np.load(path)
    # np.resize fills the (3, 126, 126) shape by repeating the flattened data
    arr = np.resize(arr_load, (3, 126, 126))
    arr = arr.astype(np.float32)
    sample = torch.from_numpy(arr)
    return sample

I used np.resize(arr_load, (3, 126, 126)) to solve the problem with the three channels in "weight [6, 3, 5, 5]", but again, another error was raised:


RuntimeError Traceback (most recent call last)
in ()
10
11 # forward + backward + optimize
—> 12 outputs = net(inputs)
13 loss = criterion(outputs, labels)
14 loss.backward()

4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1846 if has_torch_function_variadic(input, weight, bias):
1847 return handle_torch_function(linear, (input, weight, bias), input, weight, bias=bias)
→ 1848 return torch._C._nn.linear(input, weight, bias)
1849
1850

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x12544 and 400x120)

I’m afraid I’m stuck again; your help would be super appreciated, thanks a lot!

This error:

Expected 4-dimensional input for 4-dimensional weight [6, 3, 5, 5], but got 3-dimensional input of size [64, 126, 126] instead

is raised because your samples seem to be missing a channel dimension.
In your next approach it seems you were trying to clone the data to create 3 input channels, which then ran into a shape mismatch in a linear layer:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x12544 and 400x120)

Based on this error, a linear layer is using in_features=400 while the input activation to this layer has 12544 features. Set the in_features argument of that layer to 12544 and rerun your code.
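For illustration, assuming the model follows the CIFAR-10 tutorial architecture (which would explain the [6, 3, 5, 5] conv weight and the 400x120 linear weight), the adjusted layer could look like the sketch below; with a 126x126 input, the activation reaching fc1 has 16 * 28 * 28 = 12544 features:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)          # weight shape [6, 3, 5, 5]
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 126 -> conv1 -> 122 -> pool -> 61 -> conv2 -> 57 -> pool -> 28,
        # so the flattened activation has 16 * 28 * 28 = 12544 features
        self.fc1 = nn.Linear(16 * 28 * 28, 120)  # was nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)             # 10 output classes assumed

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)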