I have two pathways, augmented1 and augmented2, each a set of convolutional layers. I want to feed two datasets (Dataset A and Dataset B) to my base model, whose output is then passed on to these two pathways (augmented1 and augmented2). This can only be done with batches, so with a batch size of 64, 32 samples from Dataset A should go through pathway 1 (base model -> pathway1) and the other 32 samples from Dataset B should go through pathway 2 (base model -> pathway2). The base model is common to both pathways. I also want to pick up the batches in the same order when setting shuffle=True, since I will be extracting feature maps from both pathways.
You’re absolutely right. What I mean by the last part is that when the samples from both datasets are shuffled (the DataLoader will do that), I don’t want them shuffled separately; both should be shuffled in one go, since the batch contains images from different sources and the sequence of samples from the sources matters. I will be extracting feature maps afterwards.
Assuming you maintain some index of which samples are A and which are B, you can simply index into the outputs and calculate the loss on each output only with respect to the right population.
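A minimal sketch of what that could look like (my own toy example, not code from this thread): keep a boolean mask marking which rows of the batch came from dataset A, then index the output and compute each loss only on the matching rows.

```python
# Sketch: per-population losses via a boolean index mask.
# `output`, `target`, and the even/odd split are dummy assumptions.
import torch
import torch.nn.functional as F

output = torch.randn(64, 2)             # model output for a mixed batch
target = torch.randint(0, 2, (64,))     # dummy labels
is_A = torch.zeros(64, dtype=torch.bool)
is_A[::2] = True                        # e.g. even rows came from dataset A

loss_A = F.cross_entropy(output[is_A], target[is_A])    # loss on A rows only
loss_B = F.cross_entropy(output[~is_A], target[~is_A])  # loss on B rows only
loss = loss_A + loss_B
```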
OK, I think I’ve understood it.
I assume both datasets have the same length.
Here is a small example. I’ve modified your model to work with dummy data:
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import Dataset, DataLoader

class hybrid_cnn(nn.Module):
    def __init__(self, **kwargs):
        super(hybrid_cnn, self).__init__()
        resnet = torchvision.models.resnet50(pretrained=False)
        self.base = nn.Sequential(*list(resnet.children())[:-2])
        setattr(self, "fc0", nn.Linear(100352, 2))
        setattr(self, "fc1", nn.Linear(100352, 2))

    def forward(self, x):
        x = self.base(x)
        clf_outputs = {}
        num_fcs = 2
        x = x.view(x.size(0), -1)
        # The batch interleaves A and B samples (A, B, A, B, ...),
        # so the strided slices recover the two halves.
        xs = [x[::2], x[1::2]]
        for i in range(num_fcs):
            clf_outputs["fc%d" % i] = getattr(self, "fc%d" % i)(xs[i])
        return clf_outputs
class MyDatasetA(Dataset):
    def __init__(self):
        self.data = torch.randn(640, 3, 224, 224)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class MyDatasetB(Dataset):
    def __init__(self):
        self.data = torch.randn(640, 3, 224, 224)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)
class MyDatasetC(Dataset):
    def __init__(self):
        self.datasetA = MyDatasetA()
        self.datasetB = MyDatasetB()

    def __getitem__(self, index):
        dataA = self.datasetA[index].unsqueeze(0)
        dataB = self.datasetB[index].unsqueeze(0)
        data = torch.cat((dataA, dataB), 0)  # stack the A and B sample as a pair
        return data

    def __len__(self):
        return len(self.datasetA)
dataset = MyDatasetC()
x = dataset[0]
print(x.shape)  # each sample is an (A, B) pair

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=1
)
# Your training routine (just one iteration)
loader_iter = iter(loader)
x = next(loader_iter)         # shape [64, 2, 3, 224, 224]
x = x.view(-1, 3, 224, 224)   # flatten the pairs; samples interleave A, B, A, B, ...
model = hybrid_cnn()
output = model(x)
Let me know, if this works for you.
EDIT: I think the datasets are currently interleaved. Let me check it real quick.
EDIT2: Should work now.
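To make the interleaving concrete, here is a small check with toy tensors (my own example, not part of the thread's code): stacking (A, B) pairs in the Dataset and flattening the batch dimension interleaves the samples, so the strided slices `x[::2]` / `x[1::2]` recover the A and B halves in loader order.

```python
# Toy demonstration of the interleave/de-interleave pattern used above.
import torch

A = torch.arange(4).float().view(4, 1)           # 4 tiny "images" from dataset A
B = (torch.arange(4).float() + 100).view(4, 1)   # 4 tiny "images" from dataset B

# Mimic MyDatasetC: each sample is the (A, B) pair stacked along dim 0.
pairs = torch.stack([torch.cat([a.unsqueeze(0), b.unsqueeze(0)]) for a, b in zip(A, B)])
x = pairs.view(-1, 1)                            # flattened: A0, B0, A1, B1, ...

print(x[::2].squeeze())   # the A samples, in original order
print(x[1::2].squeeze())  # the B samples, in original order
```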
I didn’t get you. You used a Tensor in self.data, but my dataset is an object which, when wrapped under ImageDataset, gives an iterator. How can I make it usable as self.data in datasetA and datasetB?
Ok, so is your code working already? I posted another approach using two separate Datasets while you are apparently sampling the two classes from one.
If your code is not working properly, could you try to adapt it to mine?
I want to adapt to yours only; mine doesn’t work the way I have mentioned. While adapting to yours, I wanted to know how I can feed my datasets into DatasetA and DatasetB, since my form is different. I am not sure how to add both of my datasets in the __init__ of datasetA and datasetB. I just wanted you to know how I was performing training earlier, and I want to transition to your approach completely.
Ok, got it.
It seems that your datasets are somehow loaded using __img_factory. Do you just need to pass the name to your dataset_manager and it will create the appropriate dataset for the passed class?
If so, can you create two separate datasets for your two classes?
That’s where I’m stuck: using those two classes (datasetA and datasetB) with DataLoaders.
I was iterating earlier like this:
for batch, (imgs, pids, camids) in enumerate(trainloader):
Now I want to be able to feed half of the batch to augmented1 (datasetA) and the other half to augmented2 (datasetB). Your approach (self.data) expects a Tensor; instead I want to use your approach with the form I have, i.e.
dataset_ = dataset_manager.init_img_dataset(
    root='data', name=dataset_name  # Can be datasetA or datasetB
)
As per your approach, I want to implement this in class datasetA and class datasetB, so that it becomes of the form self.data = dataset_. How can I possibly do that?
Just try to assign your dataset to self.data.
My classes currently use tensors, but you basically need a class which returns tensors when indexing. Your Dataset should be just fine.