Working with a pickle file and custom data

I create a custom dataset with torchvision datasets and I have a pickle file that I want to load.
The size of the pickle file is 79600x1x30 (batch, class, length of the vector).
I work with the __getitem__ function; I tried to divide this pickle file, but it didn't work.
Is there a function that slices the pickle file or takes a batch from it?

I assume you are loading a tensor or e.g. a numpy array from the pickle file?
If so, what kind of error are you seeing in __getitem__ when indexing this object?

The size of my pickle file is 19700x4.
I'm trying to work with a batch size of 32, but it does not divide the number of rows in the pickle file, so each batch takes all the rows, e.g. (32, 19700, 4), when it needs to be (32, 4).
I don't know where I'm wrong; I think I need to define a sampler.

import os
import pickle

from torch.utils.data import Dataset


class ROboDataset(Dataset):
    def __init__(self, root, path, train=True, transform=None):
        self.root = root
        self.path = path
        self.train = train
        self.transform = transform

    def __getitem__(self, index):
        root = self.root
        path = self.path
        data = pickle.load(file=open(os.path.join(root, path), "rb"))

        features = data[0]
        target = data[1]

        if self.transform is not None:
            features = self.transform(features)

        return features, target

    def __len__(self):
        data = pickle.load(file=open(os.path.join(self.root, self.path), "rb"))
        return len(data[0])

In your code snippet you are not using the index; you load the complete pickle file and split it into the features and target tensors, so every call to __getitem__ returns the whole dataset. I assume you could load the file once in __init__ and index it in __getitem__ using the passed index value.
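A minimal sketch of that approach, assuming the pickle file stores a (features, targets) pair of array-like objects (that layout is an assumption based on the `data[0]` / `data[1]` indexing in your snippet):

```python
import os
import pickle

from torch.utils.data import Dataset


class ROboDataset(Dataset):
    def __init__(self, root, path, train=True, transform=None):
        self.root = root
        self.path = path
        self.train = train
        self.transform = transform
        # Load the pickle file once; assumed to contain a
        # (features, targets) pair of array-like objects.
        with open(os.path.join(root, path), "rb") as f:
            data = pickle.load(f)
        self.features = data[0]
        self.targets = data[1]

    def __getitem__(self, index):
        # Return a single sample instead of the whole file.
        features = self.features[index]
        target = self.targets[index]
        if self.transform is not None:
            features = self.transform(features)
        return features, target

    def __len__(self):
        return len(self.features)
```

This also avoids re-reading the pickle file on every __getitem__ and __len__ call, which would otherwise be a significant I/O overhead during training.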

Yes, I tried not to load it in __init__, but I see there is no other option.
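For reference, once each __getitem__ call returns a single (features, target) sample, the standard DataLoader handles the batching, so no custom sampler is needed; when the batch size does not divide the dataset length, the last batch is simply smaller (or can be dropped with drop_last=True). A sketch with random stand-in data of the shapes from this thread:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real data: 19700 samples with 4 features each.
features = torch.randn(19700, 4)
targets = torch.randint(0, 2, (19700,))
dataset = TensorDataset(features, targets)

loader = DataLoader(dataset, batch_size=32, shuffle=True)
x, y = next(iter(loader))
print(x.shape)  # torch.Size([32, 4])
```

Since 19700 = 615 * 32 + 20, this loader yields 615 full batches of shape (32, 4) and one final batch of shape (20, 4).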