I want to train a model from a folder of (hundreds of) .csv files. How can I load this data and feed it to my model without loading all of it into memory at once?

You can define a custom Dataset class that reads only the data needed for each step, and let the DataLoader request those samples batch by batch, so the whole folder never has to be in memory. For more information, you can read the tutorial below:
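Here is a minimal sketch of such a class. It assumes PyTorch is installed and that every file is purely numeric with no header row. The class name CsvFolderDataset is hypothetical. Only the list of file paths is kept in memory, and each file is parsed inside __getitem__ when the DataLoader asks for that index:

```python
import csv
import glob
import os

import torch
from torch.utils.data import DataLoader, Dataset


class CsvFolderDataset(Dataset):
    """Hypothetical map-style dataset: one .csv file per sample, read on demand."""

    def __init__(self, folder):
        # Store only the paths; sort them so index -> file is deterministic.
        self.paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Assumption: numeric values only, no header row.
        # The file is opened here, so only the requested samples are in memory.
        with open(self.paths[idx], newline="") as f:
            rows = [[float(x) for x in row] for row in csv.reader(f) if row]
        return torch.tensor(rows, dtype=torch.float32)
```

Shuffling and batching then come from the DataLoader, e.g. `DataLoader(CsvFolderDataset("data/"), batch_size=1, shuffle=True)`. With batch_size > 1 the default collate function stacks the samples, so every file must then have the same shape.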

Thanks, Jindong. I was reading through that tutorial; is it possible to do something like this:

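import pandas as pd
import torch
from torchvision import datasets

def get_data(path):
    df = pd.read_csv(path)
    return df.as_matrix

data_sets = datasets.DatasetFolder(path_to_datasets,
                                   loader=get_data, extensions=['.csv'])
train_loader = torch.utils.data.DataLoader(data_sets)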

to read all the .csv files in the folder and train the model? (This gives an error telling me data_sets is a method, but is anything along these lines possible?)


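That is because you forgot the () after as_matrix. Without them, get_data returns the bound method df.as_matrix instead of calling it, so each sample is a method object rather than an array. Also note that DataFrame.as_matrix() was deprecated and then removed in pandas 1.0; df.to_numpy() (or df.values) is the current equivalent.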



