shangeth
(Shangeth Rajaa)
February 24, 2019, 4:05pm
1
This is the shape of the numpy array: (35628, 1, 16000).

First I train/test split it (this works fine):

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(datan, label, test_size=0.2, shuffle=True, random_state=40)

Then I make a dataset from the split arrays:
import torch
import torch.utils.data as utils

# training set
tensor_x = torch.stack([torch.Tensor(i) for i in list(x_train)])
tensor_y = torch.Tensor(y_train)
my_dataset = utils.TensorDataset(tensor_x, tensor_y)
trainloader = utils.DataLoader(my_dataset, batch_size=1)

# test set
tensor_xte = torch.stack([torch.Tensor(i) for i in list(x_test)])
tensor_yte = torch.Tensor(y_test)
my_datasette = utils.TensorDataset(tensor_xte, tensor_yte)
testloader = utils.DataLoader(my_datasette, batch_size=1)
ERROR HERE!!! (the session crashes in Google Colab)

This works fine for all my other datasets, but it is not working for this one. Is it because the data is large? If so, how do I handle it?
ptrblck
February 24, 2019, 9:08pm
2
Try to lower the size of your Dataset and run the code again. I haven't used Colab that often, but do you get any error message?

To lower the size, try to slice both numpy arrays:
datan = datan[:100]
label = label[:100]
shangeth
(Shangeth Rajaa)
February 25, 2019, 12:41pm
3
Hi @ptrblck, yes, it works normally with a numpy array of shape (100, 1, 16000). How do I load a large numpy array into the DataLoader?
ptrblck
February 25, 2019, 1:10pm
4
It should work with any size as long as your system can handle it properly. If you are using np.float32 values, the data should take approx. 2 GB of RAM.
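For reference, a quick back-of-the-envelope check of that estimate (just a sketch; it assumes the array really is stored as float32, i.e. 4 bytes per element):

import numpy as np

# shape from the original post: (35628, 1, 16000), float32 = 4 bytes/element
n_bytes = 35628 * 1 * 16000 * np.dtype(np.float32).itemsize
print(n_bytes / 1e9)  # ~2.28 GB for the raw data alone

Note that the list comprehension with torch.stack materializes at least one extra full copy on top of that, so the peak usage will be higher.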
Are you using multiple workers in your DataLoaders? I'm not sure what limitations Colab has on the RAM.
Also, could you try to use torch.from_numpy instead of the list comprehension with torch.stack? The former approach would avoid a copy of the data.
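A minimal sketch of that approach (assuming x_train and y_train are the numpy arrays from the original post; the astype call is only needed if they are not float32 yet):

import numpy as np
import torch
import torch.utils.data as utils

# Cast once if needed; torch.from_numpy then shares memory with the
# numpy array instead of copying it row by row via torch.stack.
x_train = x_train.astype(np.float32, copy=False)
tensor_x = torch.from_numpy(x_train)
tensor_y = torch.from_numpy(y_train)

my_dataset = utils.TensorDataset(tensor_x, tensor_y)
trainloader = utils.DataLoader(my_dataset, batch_size=1)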
shangeth
(Shangeth Rajaa)
February 25, 2019, 1:31pm
5
I converted the numpy array to float32, used torch.from_numpy, and it worked.

Thank you @ptrblck!