Batch sample from the dataset

Hi all!!
I am new to PyTorch. My task is to train a model using batch samples from the dataset. I am not allowed to use loops to collect samples into a batch; I can only iterate over batches of the dataset. So my question is: how do I create these batches from the dataset under the restrictions mentioned above?

This is how I did it before, where get_batch is a helper function that collects samples into a batch, and I want to replace it.

I wrote this piece of code for another topic. It does use a loop to collect a batch from a dataset, so I'm not sure if it will be useful to you. If not, have you taken a look at PyTorch's BatchSampler?
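A minimal sketch of the BatchSampler approach, using fake data of the same shape as the snippet below (sizes and names here are just placeholders): BatchSampler wraps a sampler and yields lists of indices, so no hand-written collection loop is needed.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, BatchSampler

# Hypothetical stand-in data: 100 fake 3x2x2 "images" with dummy labels.
images = torch.randn(100, 3, 2, 2)
labels = torch.ones(100, 1)
dataset = TensorDataset(images, labels)

# RandomSampler reshuffles the indices each epoch; BatchSampler groups
# them into lists of batch_size -- the DataLoader does the collecting.
batch_sampler = BatchSampler(RandomSampler(dataset), batch_size=10, drop_last=False)
loader = DataLoader(dataset, batch_sampler=batch_sampler)

for batch_images, batch_labels in loader:
    # batch_images has shape (10, 3, 2, 2), batch_labels has shape (10, 1)
    pass
```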

Edit: What data structures are X_train & y_train? NumPy arrays? Tensors?

import random

import torch
import more_itertools
from torch.utils.data import TensorDataset

def load_data():
  # Fake data. You can also load your images and convert them into tensors.
  number_images = 100
  images = torch.randn(number_images, 3, 2, 2)
  labels = torch.ones(number_images, 1)
  return TensorDataset(images, labels)

def get_batch(dataset, batch_idx):
  ''' Returns the data items for the given batch indexes '''

  # Set up the datastructures
  im_size = dataset[0][0].size()
  batch_size = len(batch_idx)
  batch_data = torch.empty((batch_size, *im_size))
  batch_labels = torch.empty((batch_size, 1))
  # Add data to datastructures
  for i, data_idx in enumerate(batch_idx):
    data, label = dataset[data_idx]
    batch_data[i] = data
    batch_labels[i] = label

  return batch_data, batch_labels

dataset = load_data()
data_length = len(dataset)

batch_size = 10
n_epochs = 10
for epoch in range(n_epochs):
  # Create indexes, shuffle them, and split them into batches
  indexes = list(range(data_length))
  random.shuffle(indexes)
  indexes = more_itertools.chunked(indexes, batch_size)

  for batch_idx in indexes:
    images, labels = get_batch(dataset, batch_idx)
    # You can now work with your data
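For comparison, the epoch loop above can be written without any manual batching at all by letting a plain DataLoader do the collecting (shuffle=True reshuffles the indices at the start of every epoch). The data sizes below just mirror the fake data from load_data():

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Same fake data as load_data() above: 100 3x2x2 "images" with dummy labels.
dataset = TensorDataset(torch.randn(100, 3, 2, 2), torch.ones(100, 1))

# DataLoader collects samples into batches internally, replacing both
# get_batch() and the more_itertools.chunked() index splitting.
loader = DataLoader(dataset, batch_size=10, shuffle=True)

n_epochs = 10
for epoch in range(n_epochs):
    for images, labels in loader:
        # images: (10, 3, 2, 2), labels: (10, 1)
        pass
```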