Session crashes when making a DataLoader from a numpy array

This is the shape of my numpy array: (35628, 1, 16000)

  1. I train/test split it:
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test = train_test_split(datan, label, test_size=0.2, shuffle=True, random_state=40)
    (this works fine)

  2. Make datasets from the split arrays:
    import torch
    import torch.utils.data as utils

    # stack the training samples into one (N, 1, 16000) tensor and wrap it in a TensorDataset
    tensor_x = torch.stack([torch.Tensor(i) for i in list(x_train)])
    tensor_y = torch.Tensor(y_train)
    my_dataset = utils.TensorDataset(tensor_x, tensor_y)
    trainloader = utils.DataLoader(my_dataset, batch_size=1)

    # same for the test split
    tensor_xte = torch.stack([torch.Tensor(i) for i in list(x_test)])
    tensor_yte = torch.Tensor(y_test)
    my_datasette = utils.TensorDataset(tensor_xte, tensor_yte)
    testloader = utils.DataLoader(my_datasette, batch_size=1)

ERROR HERE!!! (the session crashes in Google Colab)

This works well for all my other datasets, but it is not working for this one.

Is it because the data is large? If yes, how do I handle it?

Try to lower the size of your Dataset and run the code again.
I haven’t used Colab that often, but do you get any error message?

To lower the size, try to slice both numpy arrays:

datan = datan[:100]
label = label[:100]

Hi @ptrblck, yes, it works normally with a numpy array of shape (100, 1, 16000). How do I load a large numpy array into a DataLoader?

It should work with any size as long as your system can handle it properly.
If you are using np.float32 values, the data should take approx. 2GB of RAM (35628 * 1 * 16000 elements * 4 bytes ≈ 2.3 GB); the default np.float64 would need twice that.
Are you using multiple workers in your DataLoaders?
I'm not sure what limitations Colab has on the RAM.

Also, could you try to use torch.from_numpy instead of the list comprehension with torch.stack?
The former approach would avoid a copy of the data.
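
A minimal sketch of that suggestion, reusing the variable names from the post above (x_train, y_train from the train_test_split call); note that from_numpy keeps the array's dtype:

import torch
import torch.utils.data as utils

# torch.from_numpy shares memory with the numpy array instead of copying it,
# while torch.stack([torch.Tensor(i) for i in ...]) builds a second in-memory copy.
tensor_x = torch.from_numpy(x_train)   # keeps the array's dtype (float64 unless converted)
tensor_y = torch.from_numpy(y_train)

my_dataset = utils.TensorDataset(tensor_x, tensor_y)
trainloader = utils.DataLoader(my_dataset, batch_size=1)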

I converted the numpy array to float32 and used torch.from_numpy, and it worked.
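
Roughly, the change was along these lines (a simplified sketch with the variable names from the earlier posts):

import numpy as np
import torch
import torch.utils.data as utils

# converting float64 -> float32 halves the memory footprint (~2.3 GB instead of ~4.6 GB);
# astype itself makes one copy at the smaller dtype
x_train = x_train.astype(np.float32)
y_train = y_train.astype(np.float32)

# from_numpy then wraps the converted arrays without another copy
my_dataset = utils.TensorDataset(torch.from_numpy(x_train), torch.from_numpy(y_train))
trainloader = utils.DataLoader(my_dataset, batch_size=1)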

Thank you @ptrblck
