How should I organize my dataset to use it as training data in PyTorch?

VictorVidigal · October 31, 2024, 12:21pm

Hi everyone,

I would appreciate some tips on organizing my dataset or guidance on using it as input for my model.

The dataset contains intraday stock values, and I want to extract input and output data for model training as shown in the figure:

Do I need to reorganize my data to use it as a parameter for my model? If so, what should the data format look like?
If reorganizing isn’t necessary, how can I load this data as PyTorch tensors to pass it to my model?

ptrblck · November 2, 2024, 12:32am

Based on the screenshot, you could load and process the data with a library such as pandas. Once that is done, you can create PyTorch tensors from the NumPy arrays backing the pandas data via torch.from_numpy.
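A minimal sketch of that workflow, under stated assumptions: the screenshot is not reproduced here, so the column name `close`, the synthetic values, and the "5 past values in, next value out" layout are all hypothetical placeholders rather than the actual dataset format. In practice the DataFrame would likely come from something like pd.read_csv on your own file.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the real intraday data (assumed single "close" column).
df = pd.DataFrame({"close": np.arange(20, dtype=np.float64)})

# Expose the column as a NumPy array, cast to float32 for typical model inputs.
values = df["close"].to_numpy(dtype=np.float32)

# torch.from_numpy wraps the array without copying it.
series = torch.from_numpy(values)

# Assumed layout: the previous `window` values are the input, the next one is the target.
window = 5
windows = series.unfold(0, window + 1, 1)  # shape: (num_samples, window + 1)
inputs = windows[:, :window]               # shape: (num_samples, window)
targets = windows[:, window]               # shape: (num_samples,)

# Wrap the tensors so they can be batched during training.
dataset = TensorDataset(inputs, targets)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

for batch_inputs, batch_targets in loader:
    # A model call such as model(batch_inputs) would go here.
    pass
```

If your real input/output split differs from this sliding-window assumption, only the slicing step needs to change; the pandas to NumPy to tensor conversion stays the same.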
Depending on your use case, model inputs would usually be floating point tensors, or integer tensors if the first layer is e.g. an nn.Embedding.
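To illustrate the dtype point, here is a small sketch. The price values, the ticker IDs, and the layer sizes are made up for illustration. Note that pandas typically reads numeric columns as float64, while layers such as nn.Linear use float32 parameters by default, so a cast is usually needed before the forward pass.

```python
import numpy as np
import torch
import torch.nn as nn

# Continuous features (hypothetical prices): float32 matches the default
# parameter dtype of layers such as nn.Linear.
prices = torch.from_numpy(np.array([[10.0, 10.5, 11.0]], dtype=np.float32))
linear = nn.Linear(3, 1)
price_out = linear(prices)  # shape: (1, 1)

# Categorical features (hypothetical ticker IDs): nn.Embedding expects
# integer indices, here int64.
ticker_ids = torch.from_numpy(np.array([0, 2, 1], dtype=np.int64))
embedding = nn.Embedding(num_embeddings=3, embedding_dim=4)
ticker_vecs = embedding(ticker_ids)  # shape: (3, 4), dtype float32
```

The embedding output is itself a floating point tensor, so it can be fed into the rest of the model alongside the continuous features.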