How can I create a single DataLoader for my two CSV files? I have tried this:

import pandas as pd
import torch
from torch.utils.data import TensorDataset, DataLoader

data_file1 = "/Data_train_A.csv"
rnaseq = pd.read_csv(data_file1, index_col=0, header=0)
rnaseq_tensor1 = torch.FloatTensor(rnaseq.values)
#print(rnaseq.shape)

data_file2 = "/Data_train_B.csv"
rnaseq = pd.read_csv(data_file2, index_col=0, header=0)
rnaseq_tensor2 = torch.FloatTensor(rnaseq.values)
#print(rnaseq.shape)

dataset = TensorDataset(rnaseq_tensor1, rnaseq_tensor2)
dataloader = DataLoader(dataset, batch_size=2)

for batch_idx, (a, b) in enumerate(dataloader):
    print(a.shape, b.shape)

What does the print statement output, and what is your expectation? 🙂

Do you want this as one big list, or do you want them in parallel?

I want them as inputs to my CycleGAN, where we have data from domain A and domain B. It is used for mapping from domain A to B, so both inputs are fed to the CycleGAN at the same time.

And do you get any error using this approach?
We would need the error message or some more information on what's not working to help with debugging. 😉

Yeah, it shows me this:


AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
     10 #print(rnaseq.shape)
     11
---> 12 dataset = TensorDataset(rnaseq_tensor1, rnaseq_tensor2)
     13 dataloader = DataLoader(dataset, batch_size=2)
     14

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py in __init__(self, *tensors)
    156
    157     def __init__(self, *tensors):
--> 158         assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
    159         self.tensors = tensors
    160

AssertionError:

This error points to different lengths of the two input tensors.
TensorDataset indexes all passed tensors with the same index internally, and dim0 is used for this indexing.
If one tensor contains more samples than the other, this assertion is raised.

You could e.g. crop the larger tensor to the length of the smaller one, or duplicate samples of the smaller one.
If you want to apply a more complicated sampling strategy, I would recommend writing a custom Dataset and creating the pairs in __getitem__, as in the sketch below.
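For example, here is a minimal sketch of such a Dataset, which reuses samples of the smaller tensor via modulo indexing so that no sample of the larger tensor is dropped (the class name and pairing rule are just placeholders for your own strategy):

import torch
from torch.utils.data import Dataset, DataLoader

class PairedDomainDataset(Dataset):
    # Pairs two tensors of different lengths by wrapping the index
    # of the smaller one around with the modulo operator.
    def __init__(self, tensor_a, tensor_b):
        self.tensor_a = tensor_a
        self.tensor_b = tensor_b

    def __len__(self):
        # iterate over the larger domain
        return max(self.tensor_a.size(0), self.tensor_b.size(0))

    def __getitem__(self, index):
        a = self.tensor_a[index % self.tensor_a.size(0)]
        b = self.tensor_b[index % self.tensor_b.size(0)]
        return a, b

dataset = PairedDomainDataset(torch.arange(10).view(-1, 1),
                              torch.arange(20).view(-1, 1))
loader = DataLoader(dataset, batch_size=5)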

Does that mean I have to create separate DataLoaders then?

No. You should pass the same number of samples or create some correspondence between both tensors in your custom Dataset (or a custom collate_fn etc.).

Could you please provide a complete example?

Here is a small example that reproduces this error and shows how to slice the larger tensor:

import torch
from torch.utils.data import TensorDataset, DataLoader

# Works, since a and b have the same length
a = torch.arange(10).view(-1, 1)
b = torch.arange(10).view(-1, 1)

dataset = TensorDataset(a, b)
loader = DataLoader(
    dataset,
    batch_size=5
)

for idx, (data1, data2) in enumerate(loader):
    print('Idx ', idx)
    print(data1)
    print(data2)

# Use different lengths
a = torch.arange(10).view(-1, 1)
b = torch.arange(20).view(-1, 1)

# TensorDataset(a, b) would fail here with the same AssertionError
# Slice b to the same length as a instead
dataset = TensorDataset(a, b[:a.size(0)])
loader = DataLoader(
    dataset,
    batch_size=5
)

for idx, (data1, data2) in enumerate(loader):
    print('Idx ', idx)
    print(data1)
    print(data2)

Note that this approach might not be the best fit for your use case, so you should adapt it to yield corresponding pairs from both input tensors.
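If your two domains are unpaired (which is usually the case for CycleGAN training data), one common alternative is to draw a random partner from domain B for each sample of domain A. Here is a rough sketch of that idea; the random pairing is an assumption about your use case, and UnpairedDataset is just an illustrative name:

import torch
from torch.utils.data import Dataset

class UnpairedDataset(Dataset):
    # Returns each sample from domain A together with a randomly
    # drawn sample from domain B, so the pairing changes every epoch.
    def __init__(self, tensor_a, tensor_b):
        self.tensor_a = tensor_a
        self.tensor_b = tensor_b

    def __len__(self):
        return self.tensor_a.size(0)

    def __getitem__(self, index):
        a = self.tensor_a[index]
        rand_idx = torch.randint(self.tensor_b.size(0), (1,)).item()
        return a, self.tensor_b[rand_idx]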

Thanks, it really helped. Thanks again!