How do I average photo feature outputs for later concatenation?


I have the following model schema: one set of photos goes through an encoder, a second set of photos goes through another encoder, and both outputs are concatenated with a tabular dataset as input to a final model that predicts a binary target.

How do I average the features of the image encoders? I want to average by ID within each model. See schema below.

There are 1 to 5 photos per ID. The ID appears in both image datasets and in the tabular dataset.

My dataset class returns the image, label and ID (called policy). Here is the code for that class:

import os
import pandas as pd
import torch
from torch.utils.data import Dataset

class image_Dataset(Dataset):
    """Roof data class."""
    def __init__(self, csv_file, transform = None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.roof_frame = pd.read_csv(csv_file)
        self.transform = transform
    def __len__(self):
        return len(self.roof_frame)
    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        root_dir = self.roof_frame.iloc[idx,4]
        pic = self.roof_frame.iloc[idx, 1]
        label = self.roof_frame.iloc[idx, 3]
        img_name = os.path.join(root_dir, pic)
        image = ...  # image-loading call missing from the original post
        policy = self.roof_frame.iloc[idx, 0]
        sample = {'image': image, 'policy': policy, 'label':label}

        if self.transform:
            image = self.transform(image)

        return image, label, policy

I took the following code snippet from this forum, but I’m not sure how to get the mean of each feature set per ID. Could anyone help with that code, using the snippet below as an example? (Note: I have not added anything from the tabular dataset to it yet.)

class SuperEncoder(nn.Module):
    def __init__(self):
        super(SuperEncoder, self).__init__()
        self.roofEncoder = nn.Sequential(
            nn.Conv2d(3, 6, 3, 1, 1),
            nn.Conv2d(6, 12, 3, 1, 1),
        )
        self.dwellingEncoder = nn.Sequential(
            nn.Conv2d(1, 6, 3, 1, 1),
            nn.Conv2d(6, 12, 3, 1, 1),
        )
        # in_features must equal the flattened size of each encoder's output
        self.fc1 = nn.Linear(54*54*16, 1000)
        self.fc2 = nn.Linear(54*54*16, 1000)
        # fc_out is used in forward() but was not defined in the snippet;
        # 2000 -> 2 is an assumption (two 1000-d feature sets, binary target)
        self.fc_out = nn.Linear(2000, 2)

    def forward(self, x1, x2):
        x1 = self.roofEncoder(x1)
        x1 = x1.view(x1.size(0), -1)
        x1 = F.relu(self.fc1(x1))
        x2 = self.dwellingEncoder(x2)
        x2 = x2.view(x2.size(0), -1)
        x2 = F.relu(self.fc2(x2))

        # Concatenate in dim1 (feature dimension)
        x = torch.cat((x1, x2), 1)
        x = self.fc_out(x)
        return F.log_softmax(x, dim=1)


I’m not sure what you want to do here.
Do you have a pseudo code that shows what you want to do?
What is the link between the diagram and the code you posted?

Sorry for the confusion. The diagram is a concept of what I’m trying to do. The code is the custom dataset class that returns the transformed image, the target label and the policy (which is the ID). When an ID has more than one image, I need to average the image features down to one feature vector per ID, then join that with the tabular dataset to predict a binary outcome.

Does that make sense? Let me know if it doesn’t. I’m newish to Pytorch.

Just so I understand better, do you want to do something like this?

for input, target, ID in dataloader:
  # Assume batch_size == input.size(0)
  features = SuperEncoder()(input) # Is that right? You encode everything here?
  # We have features.size(0) == batch_size
  avg_features = ... # Some ops that average all the features with the same ID?
  # But now avg_features.size(0) == ??
  # What shall we do with the targets? Should they be averaged as well?

  # More code

Yes but the targets do not necessarily need anything done with them here. I just want to join the features to the tabular dataset later.

I was hoping to get a feature size of 1000 out.

Do you want avg_features.size(0) == batch_size?
But what should each of these contain? For each sample, should its feature be the average of all the features with this ID? That would mean that many samples will have the same features, right?

I don’t think so. For IDs with many images, the images are very different from each other, though maybe I’m thinking about this wrong?

I think my confusion may be coming from my novice skill set. I’m trying to predict a binary outcome per ID, where I have a tabular dataset with one observation per ID and two sets of images, each with a relationship of 1 ID to N photos. I was thinking this concept may be like Uber’s Ludwig model, but now I think I’m way off. Any guidance at this point would be greatly appreciated. Thank you for sticking with me.


I’m not sure I understand the task you’re trying to accomplish. Are your IDs labels or not? What is the link between IDs and labels?

Labels are the target labels. IDs are the level I ultimately want to predict at, which is why I want to average by ID.
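For anyone reading along, one possible way to average encoder features per ID is sketched below. It is only a sketch under assumptions not stated in the thread: the policies are integers (so the DataLoader collates them into a tensor), the features are already flattened to shape (num_images, feature_dim), and every photo of a given ID is in the same batch. The function name `average_features_by_id` is made up for illustration.

```python
import torch

def average_features_by_id(features, ids):
    """Average the rows of `features` that share the same ID.

    features: float tensor of shape (num_images, feature_dim)
    ids:      integer tensor of shape (num_images,)
    Returns (unique_ids, mean_features) with one row per unique ID.
    """
    # inverse[i] is the row of `unique_ids` that image i belongs to
    unique_ids, inverse = torch.unique(ids, return_inverse=True)
    sums = torch.zeros(unique_ids.size(0), features.size(1),
                       dtype=features.dtype, device=features.device)
    sums.index_add_(0, inverse, features)  # per-ID sum of feature rows
    counts = torch.bincount(inverse, minlength=unique_ids.size(0)).unsqueeze(1)
    return unique_ids, sums / counts.to(features.dtype)


# Toy example (made-up values): 5 images, 3 features each
features = torch.tensor([[1., 1., 1.],
                         [2., 2., 2.],
                         [3., 3., 3.],
                         [4., 4., 4.],
                         [6., 6., 6.]])
ids = torch.tensor([10, 20, 10, 30, 20])
unique_ids, mean_features = average_features_by_id(features, ids)
```

The result has one row per ID, so the averaged roof features and averaged dwelling features could then be lined up by ID with the tabular rows and concatenated along dim 1. If the photos of one ID can land in different batches, the sums and counts would have to be accumulated across batches instead.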


So I’ve made headway, I think. Starting with my test loop:

# Run the testing batches

with torch.no_grad(): # no gradient tracking during evaluation
    for b, (X_test, y_test, policy) in enumerate(test_loader):
        # X_test = ...  (right-hand side missing from the original post)
        # y_test = ...  (right-hand side missing from the original post)
        if b == max_tst_batch:
            break

        # Apply the model
        y_val = AlexNetmodel(X_test)
        # Tally the number of correct predictions
        predicted = torch.max(y_val, 1)[1]
        tst_corr += (predicted == y_test).sum()

loss = criterion(y_val, y_test)

My batch size is 100 and there are 15,600 samples, so I get a list of 156 tensors with 100 policies each and another list of 156 tensors of predictions. Here is a printout:

I’m guessing I would need to unpack this list and the policy list and join them in a dataframe. Then average the predictions by policy?

I guess you can compute these averages on the fly.
Accumulate the values for each policy and the number of samples per policy. At the end just divide the accumulated value by the number of samples for each policy.
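A minimal sketch of that accumulation, assuming the policies and predictions have already been converted to plain Python lists for each batch. The `accumulate` helper and the example values are hypothetical, not taken from the thread.

```python
from collections import defaultdict

sums = defaultdict(float)   # policy -> running sum of values
counts = defaultdict(int)   # policy -> number of samples seen so far

def accumulate(policies, values):
    """Add one batch of (policy, value) pairs to the running totals."""
    for p, v in zip(policies, values):
        sums[p] += v
        counts[p] += 1

# Inside the test loop this would be called once per batch, e.g.
#   accumulate(policy.tolist(), predicted.tolist())
# Two made-up batches for illustration:
accumulate([101, 102, 101], [1, 0, 0])
accumulate([103, 102], [1, 1])

# After the last batch, divide each sum by its count
averages = {p: sums[p] / counts[p] for p in sums}
```

Because only one sum and one count are kept per policy, this works even when the images of a policy are spread across several batches.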

Yep! Is there some syntax to convert these from tensors? The policies are coming back as tensors as well.

I’m using this to unpack the policy and prediction list.

policies = []
while policy_list:


This is what I get back…each policy is still a tensor.

Tensors have a tolist() method that does that.
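As a small illustration, assuming `policy_list` is a list holding one integer tensor per batch (the values here are made up):

```python
import torch

# Hypothetical stand-in for the list of per-batch policy tensors
policy_list = [torch.tensor([101, 102]), torch.tensor([103])]

policies = []
for batch in policy_list:
    # tolist() converts a tensor into plain Python numbers (nested lists for >1-D)
    policies.extend(batch.tolist())

# A single-element tensor can also be converted with .item()
first = policy_list[0][0].item()
```

After this, `policies` holds ordinary Python integers, which can be used as dictionary keys or put straight into a DataFrame column.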