I want to train a model from a folder of (hundreds of) .csv files. How can I load this data and feed it to my model without loading all of it into memory at once?
You can define a custom Dataset class that reads only the data needed at each step, then wrap it in a DataLoader. For more information, see the tutorial below:
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
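As a rough sketch of what such a class might look like (this is not taken from the tutorial): the dataset below stores only the file paths and reads one .csv file each time it is indexed, so the whole folder is never loaded at once. The names CsvFolderDataset and make_loader are hypothetical, and the sketch assumes every file is purely numeric and has the same shape.

```python
import glob
import os

import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class CsvFolderDataset(Dataset):
    """One sample per .csv file; a file is read only when it is indexed."""

    def __init__(self, folder):
        # Only the file paths are kept in memory, not the file contents.
        self.paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Assumption: all files are numeric and share one shape, so the
        # default collate function can stack the samples into a batch.
        df = pd.read_csv(self.paths[idx])
        return torch.tensor(df.to_numpy(dtype="float32"))


def make_loader(folder, batch_size=32, num_workers=0):
    # Hypothetical helper: wraps the dataset in a shuffling DataLoader.
    return DataLoader(CsvFolderDataset(folder), batch_size=batch_size,
                      shuffle=True, num_workers=num_workers)
```

Setting num_workers above 0 reads files in parallel worker processes. If the files differ in shape, a custom collate_fn would be needed to build batches.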
Best wishes.
Thanks Jindong. I was reading through that tutorial, but is it possible to do something like this:
import pandas as pd
import torch
from torchvision import datasets

def get_data(path):
    df = pd.read_csv(path)
    return df.as_matrix

data_sets = datasets.DatasetFolder(path_to_datasets,
                                   loader=get_data, extensions=['.csv'])
train_loader = torch.utils.data.DataLoader(data_sets,
                                           batch_size=32,
                                           shuffle=False,
                                           num_workers=4)
to read from all .csv files in the folder and train the model? (This gives an error telling me data_sets is a method, but is anything along these lines possible?)
Hi,
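That is because you forgot the () after as_matrix, so get_data returns the method itself instead of the array.
Try
df.as_matrix()
Note that as_matrix was removed in pandas 1.0; on newer versions, use df.to_numpy() instead.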
Best wishes.