ConcatDataset giving an error

Hello everyone, I have two datasets and want to use them simultaneously while training. The first is a regression dataset containing 5,000 images, while the second is a classification dataset containing 25,000 images. The target of the regression dataset is a list of four numbers, while for the classification dataset it is a single value denoting the class. I want to train my model on both datasets simultaneously. I tried creating two dataloaders and iterating through them as follows:
for index, data in enumerate(zip(cycle(dataloader1), dataloader2)):
I also tried reducing the batch size of the smaller dataset so that both dataloaders execute the same number of times. But both approaches were very slow and took more than an hour to process a single epoch.
After that I tried using ConcatDataset to concatenate the two datasets, but when I iterate through the dataloader created from the concatenated dataset, it throws the following error:
only one element tensors can be converted to Python scalars.
Please guide me if you have any solutions.

If your data and target tensors have different shapes in the two Datasets, note that the DataLoader using the ConcatDataset would try to stack these tensors and would thus assume they all have the same shape.
E.g. this example works, because the batch size is chosen such that each batch has a valid shape:

import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader


class MyDataset1(Dataset):
    def __init__(self):
        pass
    
    def __getitem__(self, index):
        x = torch.randn(10)
        y = torch.randn(4)
        return x, y
    
    def __len__(self):
        return 10


class MyDataset2(Dataset):
    def __init__(self):
        pass
    
    def __getitem__(self, index):
        x = torch.randn(3, 24, 24)
        y = torch.randint(0, 10, (1,))
        return x, y
    
    def __len__(self):
        return 20
    

dataset1 = MyDataset1()
dataset2 = MyDataset2()
dataset = ConcatDataset((dataset1, dataset2))
loader = DataLoader(dataset, batch_size=10)

for data, target in loader:
    print(data.shape)
    print(target.shape)

If you use shuffle=True, however, the DataLoader will raise an error.
You could potentially avoid it by using a custom collate_fn.
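A minimal sketch of such a collate_fn (assuming you simply want to keep the samples as lists instead of stacking them into a single tensor) could look like this:

def my_collate(batch):
    # keep the samples as lists, so mismatched shapes from the two datasets are fine
    data = [item[0] for item in batch]
    target = [item[1] for item in batch]
    return data, target

loader = DataLoader(dataset, batch_size=10, shuffle=True, collate_fn=my_collate)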

However, the first approach (using zip) might be more suitable.
How many workers are you using in the DataLoaders? Could you reduce them and check if this helps the performance?

Thank you for your reply.

The code I am using is shown below.
I ran the model previously without specifying num_workers and it was still very slow. My workstation has 4 RTX 2080 Ti GPUs, 512 GB of memory, and an Intel Xeon processor.
An average epoch is taking about an hour to complete.

data_loader1_train = DataLoader(train_cls, batch_size=32, shuffle=False, sampler=sampler, num_workers=8, pin_memory=True)
data_loader1_valid = DataLoader(valid_cls, batch_size=32, shuffle=True, num_workers=8, pin_memory=True)
data_loader2_train = DataLoader(train_reg, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)
data_loader2_valid = DataLoader(valid_reg, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)

for x in range(epochs):
    model.train()
    print(model.training)
    for index, data in enumerate(zip(data_loader1_train, cycle(data_loader2_train))):
        x1, y1, x2, y2 = data[0][0], data[0][1], data[1][0], data[1][1]
        x1, y1, x2, y2 = x1.to(device), y1.to(device), x2.to(device), y2.to(device)
        x1, y1, x2, y2 = x1.double(), y1, x2.double(), y2
        #print(x1.shape, y1.shape, x2.shape, y2.shape)
        pred1, _ = model(x1)
        _, pred2 = model(x2)

        loss_1 = loss1(pred1, y1).double()
        loss_2 = loss2(pred2, y2).double()
        total_loss = loss_1 + loss_2
        print(f"Loss1:{loss_1} Loss2:{loss_2} TotalLoss:{total_loss}")
        total_loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Could you profile the data loading loop separately (remove the actual training) and compare the epoch time for the ConcatDataset and the “zip approach” using a different number of workers?
Note that too many workers might also decrease your performance.
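As a rough sketch (reusing the loaders and the cycle import from your code), timing the loading alone could look like this:

import time

# iterate both loaders without any model forward/backward to measure pure loading time
start = time.time()
for index, data in enumerate(zip(data_loader1_train, cycle(data_loader2_train))):
    pass
print(f"Data loading for one epoch took {time.time() - start:.2f} seconds")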

PS: Not sure if that’s your use case, but do you really need DoubleTensors? You will get a performance hit if you are using a GPU, as float64 is slower than float32.
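E.g. as a rough sketch, staying in the default float32 would just mean dropping the .double() calls:

# keep the model and inputs in float32 (the default) instead of casting to float64
model = model.float()            # or simply never call model.double()
x1, x2 = x1.float(), x2.float()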

Sir, I don’t want to use DoubleTensors, but without them the code breaks when I call loss.backward() with the following error.

RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    196                 products. Defaults to ``False``.
    197         """
--> 198         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    199 
    200     def register_hook(self, hook):

~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     98     Variable._execution_engine.run_backward(
     99         tensors, grad_tensors, retain_graph, create_graph,
--> 100         allow_unreachable=True)  # allow_unreachable flag
    101 
    102 

RuntimeError: expected dtype Float but got dtype Long (validate_dtype at /opt/conda/conda-bld/pytorch_1587428266983/work/aten/src/ATen/native/TensorIterator.cpp:143)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7ff72c78db5e in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::TensorIterator::compute_types() + 0xce3 (0x7ff70bacb793 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::TensorIterator::build() + 0x44 (0x7ff70bace174 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::native::mse_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x193 (0x7ff70b91c1a3 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xdf46a7 (0x7ff6e11616a7 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::mse_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x172 (0x7ff70b9248e2 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0xdf495f (0x7ff6e116195f in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xe22286 (0x7ff70bd54286 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x27fd2fb (0x7ff70d72f2fb in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0xe22286 (0x7ff70bd54286 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::generated::MseLossBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x1f7 (0x7ff70d536d97 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x2ae8215 (0x7ff70da1a215 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7ff70da17513 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7ff70da182f2 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::thread_init(int) + 0x39 (0x7ff70da10969 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7ff72d0c6558 in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #16: <unknown function> + 0xc819d (0x7ff748e1b19d in /home/thsticore/anaconda3/envs/pytorch/lib/python3.7/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6)
frame #17: <unknown function> + 0x76db (0x7ff74c86b6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x3f (0x7ff74c59488f in /lib/x86_64-linux-gnu/libc.so.6)

The error message points to a LongTensor. Could you check the .type() of the model inputs, outputs, loss etc.?

The types of the tensors, in the order x1, x2, y1, y2, loss1, loss2, are shown below.

torch.cuda.FloatTensor torch.cuda.FloatTensor torch.cuda.LongTensor torch.cuda.LongTensor torch.cuda.FloatTensor torch.cuda.FloatTensor

I changed the type of y2 from LongTensor to FloatTensor; the error is gone, and the training speed also seems to have improved.
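For reference, the change just amounts to casting the regression target to float32 before computing the loss (a minimal sketch based on the training loop above):

y2 = y2.float()            # MSELoss expects a floating point target
loss_2 = loss2(pred2, y2)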

Now the average epoch time is around 7 minutes, which is way better than before.