Formatting Dataset and labels (Dataframes). Please Help:)

Benjamin_Spiegel · September 9, 2021, 6:18am

Hi,
I am new to PyTorch, and it is unclear to me how I should format my dataset before feeding it into a model. In the videos and most of the posts I have found, the datasets were images, whereas I am working with numerical dataframes to build an RNN. Currently I have a list of NumPy arrays, one per dataframe, and a list of targets/labels in the same order, so each target corresponds to one dataframe.

What is the best way to turn this data into a dataset for PyTorch? Should I use a dictionary with targets as keys and arrays as values, or should I keep the lists as I currently have them? If I do use lists, is there a specific function I will want to call? Also, will I want to turn the whole dataset into one tensor, turn each dataframe into a separate tensor, or create batches? Lastly, if I turn each dataframe into its own tensor, what should I do with the corresponding target? I believe it takes more than one number to make a tensor.

I imagine this is a beginner question, but I have searched extensively for answers and I don’t think there is much in the way of detailed explanation for this. I would be very appreciative of any answers you could give. Thanks!

andreys42 · September 9, 2021, 2:20pm

What kind of information is in each NumPy array or dataframe? Why did you choose an RNN for that? If you have a classification problem, I would recommend trying classical ML algorithms such as decision trees, random forests, and gradient boosting.
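
On the formatting question itself, here is a minimal sketch of one possible way to wrap a list of NumPy arrays plus one label per array in a PyTorch `Dataset`. Everything specific here is an assumption, not something stated in the thread: the class name `SequenceDataset`, the helper `collate`, integer class labels, four features per row, and sequences of different lengths (hence the padding). Note that a single label is a perfectly valid zero-dimensional tensor, so each target can stay one number.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence


class SequenceDataset(Dataset):
    """Hypothetical wrapper: one (sequence, label) pair per original dataframe."""

    def __init__(self, arrays, labels):
        assert len(arrays) == len(labels), "need exactly one label per array"
        # Each array has shape (seq_len, n_features); seq_len may differ.
        self.arrays = [torch.as_tensor(a, dtype=torch.float32) for a in arrays]
        # Assumption: labels are integer class ids.
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, idx):
        # The label comes back as a 0-dim tensor holding a single number.
        return self.arrays[idx], self.labels[idx]


def collate(batch):
    """Pad the sequences in a batch to the same length and stack the labels."""
    seqs, labels = zip(*batch)
    lengths = torch.tensor([s.shape[0] for s in seqs])
    padded = pad_sequence(seqs, batch_first=True)  # (batch, max_len, n_features)
    return padded, lengths, torch.stack(labels)


# Toy data standing in for the real dataframes (made up for illustration).
rng = np.random.default_rng(0)
arrays = [rng.normal(size=(n, 4)) for n in (5, 7, 3)]
labels = [0, 1, 0]

dataset = SequenceDataset(arrays, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=collate)
padded, lengths, targets = next(iter(loader))
```

With this layout the lists stay as they are, each dataframe becomes its own tensor, and the `DataLoader` handles batching. If the arrays come from pandas, `df.to_numpy()` gives the array to pass in. The `lengths` tensor is returned so that padded positions can be ignored later, for example with `torch.nn.utils.rnn.pack_padded_sequence`.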