RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 5, 5], but got 1-dimensional input of size [128] instead

I am getting this error with the following neural network. According to the traceback, it is raised inside SiameseNet, which is defined in the network architecture below. Any suggestions are welcome.
Thanks in advance.

File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sharad/Few-shot-classification-siamese/siamese-triplet-master/", line 68, in forward
    output2 = self.embedding_net(x2)
  File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sharad/Few-shot-classification-siamese/siamese-triplet-master/", line 21, in forward
    output = self.convnet(x)
  File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 119, in forward
    input = module(input)
  File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/sharad/miniconda3/lib/python3.8/site-packages/torch/nn/modules/", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 5, 5], but got 1-dimensional input of size [128] instead

import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    def __init__(self):
        super(EmbeddingNet, self).__init__()
        self.convnet = nn.Sequential(nn.Conv2d(3, 32, 5), nn.PReLU(),
                                     nn.MaxPool2d(2, stride=2),
                                     nn.Conv2d(32, 64, 5), nn.PReLU(),
                                     nn.MaxPool2d(2, stride=2))

        self.fc = nn.Sequential(nn.Linear(64 * 61 * 61, 256),
                                nn.Linear(256, 256),
                                nn.Linear(256, 2))

    def forward(self, x):
        output = self.convnet(x)
        output = output.view(output.size()[0], -1)
        output = self.fc(output)
        return output

    def get_embedding(self, x):
        return self.forward(x)

class EmbeddingNetL2(EmbeddingNet):
    def __init__(self):
        super(EmbeddingNetL2, self).__init__()

    def forward(self, x):
        output = super(EmbeddingNetL2, self).forward(x)
        output /= output.pow(2).sum(1, keepdim=True).sqrt()
        return output

    def get_embedding(self, x):
        return self.forward(x)

class ClassificationNet(nn.Module):
    def __init__(self, embedding_net, n_classes):
        super(ClassificationNet, self).__init__()
        self.embedding_net = embedding_net
        self.n_classes = n_classes
        self.nonlinear = nn.PReLU()
        self.fc1 = nn.Linear(2, n_classes)

    def forward(self, x):
        output = self.embedding_net(x)
        output = self.nonlinear(output)
        scores = F.log_softmax(self.fc1(output), dim=-1)
        return scores

    def get_embedding(self, x):
        return self.nonlinear(self.embedding_net(x))

class SiameseNet(nn.Module):
    def __init__(self, embedding_net):
        super(SiameseNet, self).__init__()
        self.embedding_net = embedding_net

    def forward(self, x1, x2):
        output1 = self.embedding_net(x1)
        output2 = self.embedding_net(x2)
        return output1, output2

    def get_embedding(self, x):
        return self.embedding_net(x)

class TripletNet(nn.Module):
    def __init__(self, embedding_net):
        super(TripletNet, self).__init__()
        self.embedding_net = embedding_net

    def forward(self, x1, x2, x3):
        output1 = self.embedding_net(x1)
        output2 = self.embedding_net(x2)
        output3 = self.embedding_net(x3)
        return output1, output2, output3

    def get_embedding(self, x):
        return self.embedding_net(x)

Based on the error message it seems you are passing a 1-dimensional input to EmbeddingNet while the internal convnet expects a 4-dimensional input.
Check the shape of x inside the forward method and make sure it contains 4 dimensions as [batch_size, channels, height, width].
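
A minimal sketch of such a check, assuming a hypothetical helper named `assert_nchw` (not part of PyTorch) that could be called on `x` at the top of `EmbeddingNet.forward`. It only relies on the `.shape` attribute, which `torch.Tensor` exposes; the `FakeTensor` class is a stand-in so the snippet runs without any data.

```python
def assert_nchw(x, name="x"):
    """Raise a descriptive error unless x looks like [batch, channels, height, width]."""
    shape = tuple(x.shape)
    if len(shape) != 4:
        raise ValueError(
            f"{name} has shape {list(shape)}; expected 4 dimensions "
            "[batch_size, channels, height, width]"
        )
    return shape


class FakeTensor:
    """Stand-in for torch.Tensor that only carries a shape (illustration only)."""
    def __init__(self, *shape):
        self.shape = shape


# A batch of images passes the check ...
print(assert_nchw(FakeTensor(128, 3, 256, 256), "x1"))

# ... while a 1-dimensional input, like the [128] one in the error, is rejected.
try:
    assert_nchw(FakeTensor(128), "x2")
except ValueError as err:
    print(err)
```

Raising early with the argument name makes it easier to see which of several inputs has the wrong shape.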

Thanks for the reply @ptrblck. I have checked the size of the output at each stage of the forward method of EmbeddingNet. It is as follows:

1: torch.Size([128, 64, 61, 61])
2: torch.Size([128, 238144])
3: torch.Size([128, 2])

As I am quite new to PyTorch, I do not understand how to convert it to 4D as you suggest. Should I use unsqueeze? As far as I know, it only adds one dimension at a time.

Any suggestions or solutions are welcome; it would be a great help.
Thanks in advance.
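
On the `unsqueeze` question: `unsqueeze` inserts a single dimension of size 1 and never creates new data, so applying it repeatedly to a `[128]` tensor gives shapes such as `[128, 1, 1, 1]`, not a batch of images. The sketch below mimics that shape rule in plain Python; `unsqueeze_shape` is a hypothetical helper written for illustration, not a PyTorch function.

```python
def unsqueeze_shape(shape, dim):
    """Return the shape that results from inserting a size-1 axis at position dim."""
    shape = list(shape)
    if dim < 0:
        dim += len(shape) + 1  # negative positions count from the end
    if not 0 <= dim <= len(shape):
        raise IndexError(f"dim {dim} is out of range for shape {shape}")
    return shape[:dim] + [1] + shape[dim:]


shape = [128]
for _ in range(3):
    shape = unsqueeze_shape(shape, -1)
print(shape)  # [128, 1, 1, 1]
```

The element count stays at 128, so the result still cannot match the 3-channel images that the first convolution expects.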

That’s a bit unexpected, as the error claims a 1D activation is used at one point:

but got 1-dimensional input of size [128] instead

Could you post the shape of the input to the model as well as the setup you are using (model initialization etc.) so that we could reproduce and debug it?
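
One way to collect that information is to print the nested structure of a single batch before it reaches the model. The sketch below assumes a hypothetical helper `describe`; it recurses through tuples and lists and reports the `.shape` of anything that has one (as `torch.Tensor` does). The `SimpleNamespace` objects stand in for a real batch from the loader.

```python
from types import SimpleNamespace


def describe(obj):
    """Summarize nested tuples/lists, replacing array-like leaves by their shape."""
    if isinstance(obj, (tuple, list)):
        return [describe(item) for item in obj]
    if hasattr(obj, "shape"):
        return "shape" + str(list(obj.shape))
    return type(obj).__name__


# Hypothetical batch laid out as (data, target); only the shapes matter here.
batch = (SimpleNamespace(shape=(128, 3, 256, 256)), SimpleNamespace(shape=(128,)))
print(describe(batch))  # ['shape[128, 3, 256, 256]', 'shape[128]']
```

With a real loader, the same call could be applied to the first batch, for example `describe(next(iter(train_loader)))`.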

Thanks for the reply @ptrblck.
Here is my classifier module, which is responsible for loading my dataset. The .img images in the dataset are 1728*2393 pixels. I have 21600 such images in a single folder, and I split them into training, validation, and testing sets using .csv files.

ROOT_PATH = '/home/kumar/iter3/materials/'
Root_path1 = '/home/kumar/dataset/'

class Classifier(Dataset):

    def __init__(self, setname,train=True):
        csv_path = osp.join(ROOT_PATH, setname + '.csv')
        lines = [x.strip() for x in open(csv_path, 'r').readlines()][1:]
        data = []
        label = []
        lb = -1

        self.wnids = []
        for l in lines:
            name, wnid = l.split(',')
            path = osp.join(Root_path1, 'images', name)
            if wnid not in self.wnids:
                self.wnids.append(wnid)  # first occurrence of this class
                lb += 1
            data.append(path)
            label.append(lb)

        self.data = data
        self.label = label

        self.transform = transforms.Compose([
            #transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 #std=[0.229, 0.224, 0.225])
            transforms.Normalize(mean=[0.9439, 0.9439, 0.9439],
                                 std=[0.208, 0.208, 0.208])
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        path, label = self.data[i], self.label[i]
        image = self.transform(Image.open(path).convert('RGB'))
        return image, label

Below is my main module:

#!/usr/bin/env python
# coding: utf-8

# In[ ]:

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
from classifier import Classifier

plt.ion()   # interactive mode

# In[ ]:

train_dataset = Classifier('train')
test_dataset = Classifier('val')

# In[ ]:

import torch
from torchvision.datasets import FashionMNIST
from torchvision import transforms

mean, std = 0.28604059698879553, 0.35302424451492237
batch_size = 256

cuda = torch.cuda.is_available()
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, **kwargs)

n_classes = 27

# In[ ]:

import torch
from torch.optim import lr_scheduler
import torch.optim as optim
from torch.autograd import Variable

from trainer import fit
import numpy as np
cuda = torch.cuda.is_available()
get_ipython().run_line_magic('matplotlib', 'inline')
import matplotlib
import matplotlib.pyplot as plt

classes = [................]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728',
              '#9467bd', '#8c564b', '#e377c2', '#7f7f7f',
              '#bcbd22', '#17becf','#00FFFF','#7FFFD4']
f_classes = classes

def plot_embeddings(embeddings, targets, xlim=None, ylim=None):
    for i in range(10):
        inds = np.where(targets==i)[0]
        plt.scatter(embeddings[inds,0], embeddings[inds,1], alpha=0.5, color=colors[i])
    if xlim:
        plt.xlim(xlim[0], xlim[1])
    if ylim:
        plt.ylim(ylim[0], ylim[1])

def extract_embeddings(dataloader, model):
    with torch.no_grad():
        embeddings = np.zeros((len(dataloader.dataset), 2))
        labels = np.zeros(len(dataloader.dataset))
        k = 0
        for images, target in dataloader:
            if cuda:
                images = images.cuda()
            embeddings[k:k+len(images)] = model.get_embedding(images).data.cpu().numpy()
            labels[k:k+len(images)] = target.numpy()
            k += len(images)
    return embeddings, labels

# In[ ]:

# Set up data loaders
batch_size = 256
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
#for i_batch, sample_batched in enumerate(train_loader):
    #print(i_batch, sample_batched)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, **kwargs)

# Set up the network and training parameters
from networks import EmbeddingNet, ClassificationNet
from metrics import AccumulatedAccuracyMetric

embedding_net = EmbeddingNet()
model = ClassificationNet(embedding_net, n_classes=n_classes)
if cuda:
    model.cuda()
loss_fn = torch.nn.NLLLoss()
lr = 1e-2
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.StepLR(optimizer, 8, gamma=0.1, last_epoch=-1)
n_epochs = 100
log_interval = 10

# In[ ]:

#fit(train_loader, test_loader, model, loss_fn, optimizer, scheduler, n_epochs, cuda, log_interval, metrics=[AccumulatedAccuracyMetric()])

# In[ ]:

train_embeddings_baseline, train_labels_baseline = extract_embeddings(train_loader, model)
plot_embeddings(train_embeddings_baseline, train_labels_baseline)
val_embeddings_baseline, val_labels_baseline = extract_embeddings(test_loader, model)
plot_embeddings(val_embeddings_baseline, val_labels_baseline)

# In[ ]:

# Set up data loaders
from datasets import Siamese

# Step 1
siamese_train_dataset = Siamese(train_dataset) # Returns pairs of images and target same/different
siamese_test_dataset = Siamese(test_dataset)
batch_size = 128
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
siamese_train_loader = torch.utils.data.DataLoader(siamese_train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
siamese_test_loader = torch.utils.data.DataLoader(siamese_test_dataset, batch_size=batch_size, shuffle=False, **kwargs)

# Set up the network and training parameters
from networks import EmbeddingNet, SiameseNet
from losses import ContrastiveLoss

# Step 2
embedding_net = EmbeddingNet()
# Step 3
model = SiameseNet(embedding_net)
if cuda:
    model.cuda()
# Step 4
margin = 1.
loss_fn = ContrastiveLoss(margin)
lr = 1e-3
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.StepLR(optimizer, 8, gamma=0.1, last_epoch=-1)
n_epochs = 100
log_interval = 10

# In[ ]:

fit(siamese_train_loader, siamese_test_loader, model, loss_fn, optimizer, scheduler, n_epochs, cuda, log_interval)

# In[ ]:

train_embeddings_cl, train_labels_cl = extract_embeddings(train_loader, model)
plot_embeddings(train_embeddings_cl, train_labels_cl)
val_embeddings_cl, val_labels_cl = extract_embeddings(test_loader, model)
plot_embeddings(val_embeddings_cl, val_labels_cl)


This is the module where I define my Siamese dataset:

class Siamese(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        # We need approx 50 % of  images of the same class
        same_class = random.randint(0, 1)
        img_0, label_0 = self.dataset[index]
        if same_class:
            while True:
                # keep looping till the same class image is found
                index_1 = random.randint(0, self.__len__()-1)
                img_1, label_1 = self.dataset[index_1]

                if label_0 == label_1:
                    break
        else:
            while True:
                # keep looping till a different class image is found
                index_1 = random.randint(0, self.__len__()-1)
                img_1, label_1 = self.dataset[index_1]
                if label_0 != label_1:
                    break

        return (img_0, label_0), (img_1, label_1)

    def __len__(self):
        return len(self.dataset)

Here is the module with all the loss functions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveLoss(nn.Module):
    """
    Contrastive loss
    Takes embeddings of two samples and a target label == 1 if samples are from the same class and label == 0 otherwise
    """

    def __init__(self, margin):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin
        self.eps = 1e-9

    def forward(self, output1, output2, target, size_average=True):
        distances = (output2 - output1).pow(2).sum(1)  # squared distances
        losses = 0.5 * (target.float() * distances +
                        (1 + -1 * target).float() * F.relu(self.margin - (distances + self.eps).sqrt()).pow(2))
        return losses.mean() if size_average else losses.sum()

class TripletLoss(nn.Module):
    """
    Triplet loss
    Takes embeddings of an anchor sample, a positive sample and a negative sample
    """

    def __init__(self, margin):
        super(TripletLoss, self).__init__()
        self.margin = margin

    def forward(self, anchor, positive, negative, size_average=True):
        distance_positive = (anchor - positive).pow(2).sum(1)  # .pow(.5)
        distance_negative = (anchor - negative).pow(2).sum(1)  # .pow(.5)
        losses = F.relu(distance_positive - distance_negative + self.margin)
        return losses.mean() if size_average else losses.sum()

class OnlineContrastiveLoss(nn.Module):
    """
    Online Contrastive loss
    Takes a batch of embeddings and corresponding labels.
    Pairs are generated using pair_selector object that take embeddings and targets and return indices of positive
    and negative pairs
    """

    def __init__(self, margin, pair_selector):
        super(OnlineContrastiveLoss, self).__init__()
        self.margin = margin
        self.pair_selector = pair_selector

    def forward(self, embeddings, target):
        positive_pairs, negative_pairs = self.pair_selector.get_pairs(embeddings, target)
        if embeddings.is_cuda:
            positive_pairs = positive_pairs.cuda()
            negative_pairs = negative_pairs.cuda()
        positive_loss = (embeddings[positive_pairs[:, 0]] - embeddings[positive_pairs[:, 1]]).pow(2).sum(1)
        negative_loss = F.relu(
            self.margin - (embeddings[negative_pairs[:, 0]] - embeddings[negative_pairs[:, 1]]).pow(2).sum(
                1).sqrt()).pow(2)
        loss = torch.cat([positive_loss, negative_loss], dim=0)
        return loss.mean()

class OnlineTripletLoss(nn.Module):
    """
    Online Triplets loss
    Takes a batch of embeddings and corresponding labels.
    Triplets are generated using triplet_selector object that take embeddings and targets and return indices of
    triplets
    """

    def __init__(self, margin, triplet_selector):
        super(OnlineTripletLoss, self).__init__()
        self.margin = margin
        self.triplet_selector = triplet_selector

    def forward(self, embeddings, target):

        triplets = self.triplet_selector.get_triplets(embeddings, target)

        if embeddings.is_cuda:
            triplets = triplets.cuda()

        ap_distances = (embeddings[triplets[:, 0]] - embeddings[triplets[:, 1]]).pow(2).sum(1)  # .pow(.5)
        an_distances = (embeddings[triplets[:, 0]] - embeddings[triplets[:, 2]]).pow(2).sum(1)  # .pow(.5)
        losses = F.relu(ap_distances - an_distances + self.margin)

        return losses.mean(), len(triplets)

Metrics module:

import numpy as np

class Metric:
    def __init__(self):
        pass

    def __call__(self, outputs, target, loss):
        raise NotImplementedError

    def reset(self):
        raise NotImplementedError

    def value(self):
        raise NotImplementedError

    def name(self):
        raise NotImplementedError

class AccumulatedAccuracyMetric(Metric):
    """
    Works with classification model
    """

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, outputs, target, loss):
        pred = outputs[0].data.max(1, keepdim=True)[1]
        self.correct += pred.eq(target[0].data.view_as(pred)).cpu().sum()
        self.total += target[0].size(0)
        return self.value()

    def reset(self):
        self.correct = 0
        self.total = 0

    def value(self):
        return 100 * float(self.correct) / self.total

    def name(self):
        return 'Accuracy'

class AverageNonzeroTripletsMetric(Metric):
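    """
    Counts average number of nonzero triplets found in minibatches
    """

    def __init__(self):
        self.values = []

    def __call__(self, outputs, target, loss):
        # loss is assumed to be the (mean_loss, n_triplets) tuple returned by OnlineTripletLoss
        self.values.append(loss[1])
        return self.value()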

    def reset(self):
        self.values = []

    def value(self):
        return np.mean(self.values)

    def name(self):
        return 'Average nonzero triplets'

Trainer module:

import torch.nn as nn
import torch.nn.functional as F

import torch
import numpy as np

def fit(train_loader, val_loader, model, loss_fn, optimizer, scheduler, n_epochs, cuda, log_interval, metrics=[],
        start_epoch=0):
    """
    Loaders, model, loss function and metrics should work together for a given task,
    i.e. The model should be able to process data output of loaders,
    loss function should process target output of loaders and outputs from the model

    Examples: Classification: batch loader, classification model, NLL loss, accuracy metric
    Siamese network: Siamese loader, siamese model, contrastive loss
    Online triplet learning: batch loader, embedding model, online triplet loss
    """
    # Fast-forward the learning-rate schedule when resuming from start_epoch
    for epoch in range(0, start_epoch):
        scheduler.step()

    for epoch in range(start_epoch, n_epochs):

        # Train stage
        train_loss, metrics = train_epoch(train_loader, model, loss_fn, optimizer, cuda, log_interval, metrics)

        message = 'Epoch: {}/{}. Train set: Average loss: {:.4f}'.format(epoch + 1, n_epochs, train_loss)
        for metric in metrics:
            message += '\t{}: {}'.format(metric.name(), metric.value())

        val_loss, metrics = test_epoch(val_loader, model, loss_fn, cuda, metrics)
        val_loss /= len(val_loader)

        message += '\nEpoch: {}/{}. Validation set: Average loss: {:.4f}'.format(epoch + 1, n_epochs,
                                                                                 val_loss)
        for metric in metrics:
            message += '\t{}: {}'.format(metric.name(), metric.value())

        print(message)

def train_epoch(train_loader, model, loss_fn, optimizer, cuda, log_interval, metrics):
    for metric in metrics:
        metric.reset()

    losses = []
    total_loss = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        target = target if len(target) > 0 else None
        if not type(data) in (tuple, list):
            data = (data,)
        if cuda:
            data = tuple(d.cuda() for d in data)
            if target is not None:
                target = (t.cuda() for t in target)

        optimizer.zero_grad()
        outputs = model(*data)

        if type(outputs) not in (tuple, list):
            outputs = (outputs,)

        loss_inputs = outputs
        if target is not None:
            target = (target,)
            loss_inputs += target

        loss_outputs = loss_fn(*loss_inputs)
        loss = loss_outputs[0] if type(loss_outputs) in (tuple, list) else loss_outputs
        losses.append(loss.item())
        total_loss += loss.item()
        # Backpropagate and update the weights
        loss.backward()
        optimizer.step()

        for metric in metrics:
            metric(outputs, target, loss_outputs)

        if batch_idx % log_interval == 0:
            message = 'Train: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                batch_idx * len(data[0]), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), np.mean(losses))
            for metric in metrics:
                message += '\t{}: {}'.format(metric.name(), metric.value())

            print(message)
            losses = []

    total_loss /= (batch_idx + 1)
    return total_loss, metrics

def test_epoch(val_loader, model, loss_fn, cuda, metrics):
    with torch.no_grad():
        for metric in metrics:
            metric.reset()
        val_loss = 0
        for batch_idx, (data, target) in enumerate(val_loader):
            target = target if len(target) > 0 else None
            if not type(data) in (tuple, list):
                data = (data,)
            if cuda:
                data = tuple(d.cuda() for d in data)
                if target is not None:
                    target = target.cuda()

            outputs = model(*data)

            if type(outputs) not in (tuple, list):
                outputs = (outputs,)
            loss_inputs = outputs
            if target is not None:
                target = (target,)
                loss_inputs += target

            loss_outputs = loss_fn(*loss_inputs)
            loss = loss_outputs[0] if type(loss_outputs) in (tuple, list) else loss_outputs
            val_loss += loss.item()

            for metric in metrics:
                metric(outputs, target, loss_outputs)

    return val_loss, metrics

from itertools import combinations

import numpy as np
import torch

def pdist(vectors):
    # Squared Euclidean distances: -2 * x.y + |x|^2 + |y|^2
    distance_matrix = -2 * vectors.mm(torch.t(vectors)) + vectors.pow(2).sum(dim=1).view(1, -1) + vectors.pow(2).sum(
        dim=1).view(-1, 1)
    return distance_matrix

class PairSelector:
    """
    Implementation should return indices of positive pairs and negative pairs that will be passed to compute
    Contrastive Loss
    return positive_pairs, negative_pairs
    """

    def __init__(self):
        pass

    def get_pairs(self, embeddings, labels):
        raise NotImplementedError

class AllPositivePairSelector(PairSelector):
    """
    Discards embeddings and generates all possible pairs given labels.
    If balance is True, negative pairs are a random sample to match the number of positive samples
    """
    def __init__(self, balance=True):
        super(AllPositivePairSelector, self).__init__()
        self.balance = balance

    def get_pairs(self, embeddings, labels):
        labels = labels.cpu().data.numpy()
        all_pairs = np.array(list(combinations(range(len(labels)), 2)))
        all_pairs = torch.LongTensor(all_pairs)
        positive_pairs = all_pairs[(labels[all_pairs[:, 0]] == labels[all_pairs[:, 1]]).nonzero()]
        negative_pairs = all_pairs[(labels[all_pairs[:, 0]] != labels[all_pairs[:, 1]]).nonzero()]
        if self.balance:
            negative_pairs = negative_pairs[torch.randperm(len(negative_pairs))[:len(positive_pairs)]]

        return positive_pairs, negative_pairs

class HardNegativePairSelector(PairSelector):
    """
    Creates all possible positive pairs. For negative pairs, pairs with smallest distance are taken into consideration,
    matching the number of positive pairs.
    """

    def __init__(self, cpu=True):
        super(HardNegativePairSelector, self).__init__()
        self.cpu = cpu

    def get_pairs(self, embeddings, labels):
        if self.cpu:
            embeddings = embeddings.cpu()
        distance_matrix = pdist(embeddings)

        labels = labels.cpu().data.numpy()
        all_pairs = np.array(list(combinations(range(len(labels)), 2)))
        all_pairs = torch.LongTensor(all_pairs)
        positive_pairs = all_pairs[(labels[all_pairs[:, 0]] == labels[all_pairs[:, 1]]).nonzero()]
        negative_pairs = all_pairs[(labels[all_pairs[:, 0]] != labels[all_pairs[:, 1]]).nonzero()]

        negative_distances = distance_matrix[negative_pairs[:, 0], negative_pairs[:, 1]]
        negative_distances = negative_distances.cpu().data.numpy()
        top_negatives = np.argpartition(negative_distances, len(positive_pairs))[:len(positive_pairs)]
        top_negative_pairs = negative_pairs[torch.LongTensor(top_negatives)]

        return positive_pairs, top_negative_pairs

class TripletSelector:
    """
    Implementation should return indices of anchors, positive and negative samples
    return np array of shape [N_triplets x 3]
    """

    def __init__(self):
        pass

    def get_triplets(self, embeddings, labels):
        raise NotImplementedError

class AllTripletSelector(TripletSelector):
    """
    Returns all possible triplets
    May be impractical in most cases
    """

    def __init__(self):
        super(AllTripletSelector, self).__init__()

    def get_triplets(self, embeddings, labels):
        labels = labels.cpu().data.numpy()
        triplets = []
        for label in set(labels):
            label_mask = (labels == label)
            label_indices = np.where(label_mask)[0]
            if len(label_indices) < 2:
                continue
            negative_indices = np.where(np.logical_not(label_mask))[0]
            anchor_positives = list(combinations(label_indices, 2))  # All anchor-positive pairs

            # Add all negatives for all positive pairs
            temp_triplets = [[anchor_positive[0], anchor_positive[1], neg_ind] for anchor_positive in anchor_positives
                             for neg_ind in negative_indices]
            triplets += temp_triplets

        return torch.LongTensor(np.array(triplets))

def hardest_negative(loss_values):
    hard_negative = np.argmax(loss_values)
    return hard_negative if loss_values[hard_negative] > 0 else None

def random_hard_negative(loss_values):
    hard_negatives = np.where(loss_values > 0)[0]
    return np.random.choice(hard_negatives) if len(hard_negatives) > 0 else None

def semihard_negative(loss_values, margin):
    semihard_negatives = np.where(np.logical_and(loss_values < margin, loss_values > 0))[0]
    return np.random.choice(semihard_negatives) if len(semihard_negatives) > 0 else None

class FunctionNegativeTripletSelector(TripletSelector):
    """
    For each positive pair, takes the hardest negative sample (with the greatest triplet loss value) to create a triplet
    Margin should match the margin used in triplet loss.
    negative_selection_fn should take array of loss_values for a given anchor-positive pair and all negative samples
    and return a negative index for that pair
    """

    def __init__(self, margin, negative_selection_fn, cpu=True):
        super(FunctionNegativeTripletSelector, self).__init__()
        self.cpu = cpu
        self.margin = margin
        self.negative_selection_fn = negative_selection_fn

    def get_triplets(self, embeddings, labels):
        if self.cpu:
            embeddings = embeddings.cpu()
        distance_matrix = pdist(embeddings)
        distance_matrix = distance_matrix.cpu()

        labels = labels.cpu().data.numpy()
        triplets = []

        for label in set(labels):
            label_mask = (labels == label)
            label_indices = np.where(label_mask)[0]
            if len(label_indices) < 2:
                continue
            negative_indices = np.where(np.logical_not(label_mask))[0]
            anchor_positives = list(combinations(label_indices, 2))  # All anchor-positive pairs
            anchor_positives = np.array(anchor_positives)

            ap_distances = distance_matrix[anchor_positives[:, 0], anchor_positives[:, 1]]
            for anchor_positive, ap_distance in zip(anchor_positives, ap_distances):
                loss_values = ap_distance - distance_matrix[torch.LongTensor(np.array([anchor_positive[0]])), torch.LongTensor(negative_indices)] + self.margin
                loss_values = loss_values.data.cpu().numpy()
                hard_negative = self.negative_selection_fn(loss_values)
                if hard_negative is not None:
                    hard_negative = negative_indices[hard_negative]
                    triplets.append([anchor_positive[0], anchor_positive[1], hard_negative])

        if len(triplets) == 0:
            triplets.append([anchor_positive[0], anchor_positive[1], negative_indices[0]])

        triplets = np.array(triplets)

        return torch.LongTensor(triplets)

def HardestNegativeTripletSelector(margin, cpu=False): return FunctionNegativeTripletSelector(margin=margin,
                                                                                 negative_selection_fn=hardest_negative,
                                                                                 cpu=cpu)

def RandomNegativeTripletSelector(margin, cpu=False): return FunctionNegativeTripletSelector(margin=margin,
                                                                                negative_selection_fn=random_hard_negative,
                                                                                cpu=cpu)

def SemihardNegativeTripletSelector(margin, cpu=False): return FunctionNegativeTripletSelector(margin=margin,
                                                                                  negative_selection_fn=lambda x: semihard_negative(x, margin),
                                                                                  cpu=cpu)

if __name__ == '__main__':

I am still unsure why I am getting this error. Your help would be greatly appreciated.

@ptrblck Thank you for the prompt response. I have posted all the modules I am using above. I cannot figure out where the mistake is, because the shapes I get inside the convnet are 4D.

Thanks for posting the code. Unfortunately, it’s unclear to me how to reproduce the issue, as you’ve posted >800 lines of code.
The model that seems to raise the issue works fine for an input of shape [batch_size, 3, 256, 256]:

model = EmbeddingNet()
x = torch.randn(1, 3, 256, 256)
out = model(x)

so my guess would be that some of your inputs do not have this expected shape.
In case you are still stuck, try to reduce the number of lines of code and post an executable code snippet that raises the same error.
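
The `64 * 61 * 61` input size of the first linear layer is consistent with a 256 x 256 input, which can be checked by hand. The sketch below applies the standard output-size rule for an unpadded convolution or pooling layer, floor((size - kernel) / stride) + 1, to the four layers of `convnet`; `out_size` is a helper name chosen for this example.

```python
def out_size(size, kernel, stride=1):
    """Output length of an unpadded, undilated conv/pool layer along one axis."""
    return (size - kernel) // stride + 1


size = 256
for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:  # conv, pool, conv, pool
    size = out_size(size, kernel, stride)

print(size)              # 61
print(64 * size * size)  # 238144
```

The result matches the `[128, 64, 61, 61]` and `[128, 238144]` shapes reported earlier in the thread, so the image branch of the model appears to be sized correctly.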

Hello @ptrblck, I have reduced the code and reproduced the same error. I have also analyzed the code and its output at the point where the error is raised.
What I observed is

class SiameseNet(nn.Module):
    def __init__(self, embedding_net):
        super(SiameseNet, self).__init__()
        self.embedding_net = embedding_net

    def forward(self, x1, x2):
        output1 = self.embedding_net(x1)
        output2 = self.embedding_net(x2)
        return output1, output2

    def get_embedding(self, x):
        return self.embedding_net(x)

the data passed to forward as x1 and x2 is different: x1 is a 4D tensor of pixel values, but x2 contains only the class labels, which are 1D, as shown in the figure below.

What I do not understand is how to load the pixel values of the second image into x2 so that it is 4D.
P.S.: I am using a Siamese network, so I return two images, as defined in class Siamese(Dataset) in the following code, which I have reduced to fewer than 400 lines.

Could you please have a look and let me know how I can load the 4D values into x2?

Thanks for your time and consideration.
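
A possible explanation, based only on the code posted in this thread: the training loop unpacks each batch as `(data, target)` and calls `model(*data)`, while `Siamese.__getitem__` returns `(img_0, label_0), (img_1, label_1)`. `data` is then `(img_0, label_0)`, so `x2` receives the labels. A minimal sketch of a pair dataset that returns `(img_0, img_1), target` instead is shown here. `PairDataset` and the toy `samples` list are hypothetical names for illustration, the lookup by label replaces the `while True` search, and `target` is 1 for a same-class pair, following the convention in the `ContrastiveLoss` description.

```python
import random


class PairDataset:
    """Wraps a dataset of (image, label) items and yields ((img_0, img_1), target)."""

    def __init__(self, dataset, labels):
        self.dataset = dataset
        # Group indices by label once, so a partner can be drawn without a retry loop.
        self.by_label = {}
        for idx, label in enumerate(labels):
            self.by_label.setdefault(label, []).append(idx)

    def __getitem__(self, index):
        img_0, label_0 = self.dataset[index]
        same_class = random.randint(0, 1)  # roughly 50 % same-class pairs
        if same_class:
            partner_label = label_0
        else:
            # Assumes the dataset contains at least two classes.
            partner_label = random.choice([l for l in self.by_label if l != label_0])
        index_1 = random.choice(self.by_label[partner_label])
        img_1, _ = self.dataset[index_1]
        # (img_0, img_1) is the data passed to the model; target goes to the loss.
        return (img_0, img_1), same_class

    def __len__(self):
        return len(self.dataset)


# Toy stand-in for the image dataset: strings instead of image tensors.
samples = [("a0", 0), ("a1", 0), ("b0", 1), ("b1", 1)]
pairs = PairDataset(samples, [label for _, label in samples])
(img_0, img_1), target = pairs[0]
print(img_0, img_1, target)
```

With the `Classifier` dataset from this thread, the labels could presumably be passed as `train_dataset.label`, which avoids loading every image just to read its class. In this layout `target` is a single value per sample, so the training loop could move the batched target to the GPU with `target.cuda()`, as the validation loop already does.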

import os.path as osp
from torch.utils.data import Dataset
from torchvision import transforms
import numpy as np
from PIL import Image
import pandas as pd
from torch.utils.data.sampler import BatchSampler
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
import random
import torch.nn.functional as F

ROOT_PATH = '/home/kumar/iter3/materials/'
Root_path1 = '/home/kumar/dataset/'

class Classifier(Dataset):

    def __init__(self, setname,train=True):
        csv_path = osp.join(ROOT_PATH, setname + '.csv')
        lines = [x.strip() for x in open(csv_path, 'r').readlines()][1:]
        data = []
        label = []
        lb = -1

        self.wnids = []
        for l in lines:
            name, wnid = l.split(',')
            path = osp.join(Root_path1, 'images', name)
            if wnid not in self.wnids:
                self.wnids.append(wnid)  # first occurrence of this class
                lb += 1
            data.append(path)
            label.append(lb)

        self.data = data
        self.label = label

        self.transform = transforms.Compose([
            #transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 #std=[0.229, 0.224, 0.225])
            transforms.Normalize(mean=[0.9439, 0.9439, 0.9439],
                                 std=[0.208, 0.208, 0.208])
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        path, label = self.data[i], self.label[i]
        image = self.transform(Image.open(path).convert('RGB'))
        return image, label

class Siamese(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        # We need approx 50 % of  images of the same class
        same_class = random.randint(0, 1)
        img_0, label_0 = self.dataset[index]
        if same_class:
            while True:
                # keep looping till the same class image is found
                index_1 = random.randint(0, self.__len__()-1)
                img_1, label_1 = self.dataset[index_1]

                if label_0 == label_1:
                    break
        else:
            while True:
                # keep looping till a different class image is found
                index_1 = random.randint(0, self.__len__()-1)
                img_1, label_1 = self.dataset[index_1]
                if label_0 != label_1:
                    break

        return (img_0, label_0), (img_1, label_1)

    def __len__(self):
        return len(self.dataset)

class EmbeddingNet(nn.Module):
    def __init__(self):
        super(EmbeddingNet, self).__init__()
        self.convnet = nn.Sequential(nn.Conv2d(3, 32, 5), nn.PReLU(),
                                     nn.MaxPool2d(2, stride=2),
                                     nn.Conv2d(32, 64, 5), nn.PReLU(),
                                     nn.MaxPool2d(2, stride=2))

        self.fc = nn.Sequential(nn.Linear(64 * 61 * 61, 256),
                                nn.PReLU(),
                                nn.Linear(256, 256),
                                nn.PReLU(),
                                nn.Linear(256, 2)
                                )

    def forward(self, x):
        output = self.convnet(x)
        output = output.view(output.size()[0], -1)
        output = self.fc(output)
        return output

    def get_embedding(self, x):
        return self.forward(x)

class ClassificationNet(nn.Module):
    def __init__(self, embedding_net, n_classes):
        super(ClassificationNet, self).__init__()
        self.embedding_net = embedding_net
        self.n_classes = n_classes
        self.nonlinear = nn.PReLU()
        self.fc1 = nn.Linear(2, n_classes)

    def forward(self, x):
        output = self.embedding_net(x)
        output = self.nonlinear(output)
        scores = F.log_softmax(self.fc1(output), dim=-1)
        return scores

    def get_embedding(self, x):
        return self.nonlinear(self.embedding_net(x))

class ContrastiveLoss(nn.Module):
    def __init__(self, margin):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin
        self.eps = 1e-9

    def forward(self, output1, output2, target, size_average=True):
        distances = (output2 - output1).pow(2).sum(1)  # squared distances
        losses = 0.5 * (target.float() * distances +
                        (1 + -1 * target).float() * F.relu(self.margin - (distances + self.eps).sqrt()).pow(2))
        return losses.mean() if size_average else losses.sum()

def fit(train_loader, val_loader, model, loss_fn, optimizer, scheduler, n_epochs, cuda, log_interval, metrics=[],
        start_epoch=0):
    for epoch in range(0, start_epoch):
        scheduler.step()

    for epoch in range(start_epoch, n_epochs):
        scheduler.step()

        # Train stage
        train_loss, metrics = train_epoch(train_loader, model, loss_fn, optimizer, cuda, log_interval, metrics)

        message = 'Epoch: {}/{}. Train set: Average loss: {:.4f}'.format(epoch + 1, n_epochs, train_loss)
        for metric in metrics:
            message += '\t{}: {}'.format(metric.name(), metric.value())

        val_loss, metrics = test_epoch(val_loader, model, loss_fn, cuda, metrics)
        val_loss /= len(val_loader)

        message += '\nEpoch: {}/{}. Validation set: Average loss: {:.4f}'.format(epoch + 1, n_epochs,
                                                                                 val_loss)
        for metric in metrics:
            message += '\t{}: {}'.format(metric.name(), metric.value())

        print(message)


def train_epoch(train_loader, model, loss_fn, optimizer, cuda, log_interval, metrics):
    for metric in metrics:
        metric.reset()

    model.train()
    losses = []
    total_loss = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        target = target if len(target) > 0 else None
        if not type(data) in (tuple, list):
            data = (data,)
        if cuda:
            data = tuple(d.cuda() for d in data)
            if target is not None:
                target = (t.cuda() for t in target)

        optimizer.zero_grad()
        outputs = model(*data)

        if type(outputs) not in (tuple, list):
            outputs = (outputs,)

        loss_inputs = outputs
        if target is not None:
            target = (target,)
            loss_inputs += target

        loss_outputs = loss_fn(*loss_inputs)
        loss = loss_outputs[0] if type(loss_outputs) in (tuple, list) else loss_outputs
        losses.append(loss.item())
        total_loss += loss.item()
        loss.backward()
        optimizer.step()

        for metric in metrics:
            metric(outputs, target, loss_outputs)

        if batch_idx % log_interval == 0:
            message = 'Train: [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                batch_idx * len(data[0]), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), np.mean(losses))
            for metric in metrics:
                message += '\t{}: {}'.format(metric.name(), metric.value())

            print(message)
            losses = []

    total_loss /= (batch_idx + 1)
    return total_loss, metrics

def test_epoch(val_loader, model, loss_fn, cuda, metrics):
    with torch.no_grad():
        for metric in metrics:
            metric.reset()
        model.eval()
        val_loss = 0
        for batch_idx, (data, target) in enumerate(val_loader):
            target = target if len(target) > 0 else None
            if not type(data) in (tuple, list):
                data = (data,)
            if cuda:
                data = tuple(d.cuda() for d in data)
                if target is not None:
                    target = target.cuda()

            outputs = model(*data)

            if type(outputs) not in (tuple, list):
                outputs = (outputs,)
            loss_inputs = outputs
            if target is not None:
                target = (target,)
                loss_inputs += target

            loss_outputs = loss_fn(*loss_inputs)
            loss = loss_outputs[0] if type(loss_outputs) in (tuple, list) else loss_outputs
            val_loss += loss.item()

            for metric in metrics:
                metric(outputs, target, loss_outputs)

    return val_loss, metrics

train_dataset = Classifier('train')
test_dataset = Classifier('val')

mean, std = 0.28604059698879553, 0.35302424451492237
batch_size = 256

cuda = torch.cuda.is_available()
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, **kwargs)
n_classes = 27

from torch.autograd import Variable
cuda = torch.cuda.is_available()

# Set up data loaders
batch_size = 256
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, **kwargs)

embedding_net = EmbeddingNet()
model = ClassificationNet(embedding_net, n_classes=n_classes)
if cuda:
    model.cuda()
loss_fn = torch.nn.NLLLoss()
lr = 1e-2
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.StepLR(optimizer, 8, gamma=0.1, last_epoch=-1)
n_epochs = 100
log_interval = 10

# Step 1
siamese_train_dataset = Siamese(train_dataset) # Returns pairs of images and target same/different
siamese_test_dataset = Siamese(test_dataset)
batch_size = 128
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
siamese_train_loader = torch.utils.data.DataLoader(siamese_train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
siamese_test_loader = torch.utils.data.DataLoader(siamese_test_dataset, batch_size=batch_size, shuffle=False, **kwargs)

# Step 2
embedding_net = EmbeddingNet()
# Step 3
model = SiameseNet(embedding_net)
if cuda:
    model.cuda()
# Step 4
margin = 1.
loss_fn = ContrastiveLoss(margin)
lr = 1e-3
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.StepLR(optimizer, 8, gamma=0.1, last_epoch=-1)
n_epochs = 100
log_interval = 10

fit(siamese_train_loader, siamese_test_loader, model, loss_fn, optimizer, scheduler, n_epochs, cuda, log_interval)

Based on the code snippet and your explanation it seems that the data loading is wrong.
The Siamese dataset returns:

return (img_0, label_0), (img_1, label_1)

so I assume that this unpacking is wrong:

    for batch_idx, (data, target) in enumerate(train_loader):
        target = target if len(target) > 0 else None
        if not type(data) in (tuple, list):
            data = (data,)
        if cuda:
            data = tuple(d.cuda() for d in data)
            if target is not None:
                target = (t.cuda() for t in target)

as it seems you are assigning the img_0 and label_0 as the data and img_1 and label_1 as the target.
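
To see where the 1-dimensional tensor of size [128] comes from, it may help to look at what the default collate function yields for a dataset that returns nested tuples. The sketch below is only an illustration: `ToyPairs` is a hypothetical stand-in that uses random tensors instead of the real images, and the batch size of 4 is arbitrary.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyPairs(Dataset):
    """Hypothetical stand-in that returns items shaped like the Siamese dataset."""

    def __init__(self, n=8):
        self.images = torch.randn(n, 3, 32, 32)
        self.labels = torch.randint(0, 5, (n,))

    def __getitem__(self, index):
        other = (index + 1) % len(self)
        return (self.images[index], self.labels[index]), (self.images[other], self.labels[other])

    def __len__(self):
        return len(self.images)


loader = DataLoader(ToyPairs(), batch_size=4)

# Unpacking as (data, target) binds the whole first pair to `data`,
# i.e. an image batch followed by a label batch.
data, target = next(iter(loader))
first_shapes = [tuple(t.shape) for t in data]
print(first_shapes)
```

With this unpacking, `model(*data)` passes the label batch as the second image input, which would be consistent with the reported 1-dimensional input whose size equals the batch size.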

Thank you very much for taking the time to reply, @ptrblck.
I analyzed it as well and realized that I am making a mistake in the unpacking. Thank you for your help.

This may be a basic question, but I am quite new to PyTorch and to data loaders.
I am trying to unpack the batch, but it gives me a runtime error or a shape mismatch error.

If possible, could you give me a hint on how I should unpack the 2 images from the data loader in the following loop?

for batch_idx, (data, target) in enumerate(train_loader):
        target = target if len(target) > 0 else None
        if not type(data) in (tuple, list):
            data = (data,)
        if cuda:
            data = tuple(d.cuda() for d in data)
            if target is not None:
                target = (t.cuda() for t in target)

This unpacking should work:

class Siamese(Dataset):
    def __init__(self):
        self.data0 = torch.randn(10, 3, 224, 224)
        self.target0 = torch.randint(0, 10, (10,))
        self.data1 = torch.randn(10, 3, 224, 224)
        self.target1 = torch.randint(0, 10, (10,))

    def __getitem__(self, index):
        img_0 = self.data0[index]
        label_0 = self.target0[index]
        img_1 = self.data1[index]
        label_1 = self.target1[index]
        return (img_0, label_0), (img_1, label_1)

    def __len__(self):
        return len(self.data0)

dataset = Siamese()
loader = DataLoader(dataset, batch_size=2)

for idx, (batch0, batch1) in enumerate(loader):
    data0, target0 = batch0
    data1, target1 = batch1
    print(data0.shape, target0.shape)
    print(data1.shape, target1.shape)

Thank you so much, @ptrblck. Once again, thank you for your time; I really appreciate your help.
However, with the above solution I am now facing an issue in train_epoch, where I call

        outputs = model(*data)

to fit the model and plot the accuracy and loss curves.

I did not understand how I am supposed to pass a dataset object to the Siamese class, or how it will pick 2 random images. I have a single folder that contains 21600 images, and CSV files that contain the labels (training, validation, and testing CSV files).

In my code above, in class Siamese(Dataset), I am choosing the 2 random images as you did.

Is there a problem with the dataset I am loading through class Classifier(Dataset), where I return only the image and the label?

I assume model is an object of SiameseNet, which expects two inputs.
In that case you can pass the data inputs directly to the model via:

outputs = model(data0, data1)
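
As a rough illustration of that call, here is a sketch of a two-input model. The `SiameseNet` shown is an assumption based on the traceback (a shared `embedding_net` applied to `x1` and `x2`), and `TinyEmbedding` is a hypothetical small network used only so the example runs quickly on random 32x32 inputs.

```python
import torch
import torch.nn as nn


class TinyEmbedding(nn.Module):
    """Hypothetical small embedding network, used only to keep the example fast."""

    def __init__(self):
        super().__init__()
        self.convnet = nn.Sequential(nn.Conv2d(3, 8, 5), nn.PReLU(),
                                     nn.MaxPool2d(2, stride=2))
        # 32x32 input -> 28x28 after the 5x5 convolution -> 14x14 after pooling
        self.fc = nn.Linear(8 * 14 * 14, 2)

    def forward(self, x):
        output = self.convnet(x)
        output = output.view(output.size(0), -1)
        return self.fc(output)


class SiameseNet(nn.Module):
    """Assumed structure: one shared embedding network applied to both inputs."""

    def __init__(self, embedding_net):
        super().__init__()
        self.embedding_net = embedding_net

    def forward(self, x1, x2):
        output1 = self.embedding_net(x1)
        output2 = self.embedding_net(x2)
        return output1, output2


model = SiameseNet(TinyEmbedding())
data0 = torch.randn(4, 3, 32, 32)
data1 = torch.randn(4, 3, 32, 32)

# Both arguments must be 4-dimensional image batches, not label tensors.
output1, output2 = model(data0, data1)
print(output1.shape, output2.shape)
```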

I tried it, @ptrblck, but that raises another error, saying that a tensor with more than one value is ambiguous.

In train_epoch it also gives the same error for

if target is not None:
            target = (target,)
            loss_inputs += target

if I replace target with (target0, target1).

Based on the error message it seems that one input is used as the size_average argument, so you would have to make sure to pass the appropriate number of inputs to this loss function.

Yes, I got your point, but I believe I am passing the appropriate number of inputs to the loss function.

        outputs = model(data0,data1)

        if type(outputs) not in (tuple, list):
            outputs = (outputs,)

        loss_inputs = outputs
        if (target0,target1) is not None:
            target = (target0,target1)
            loss_inputs += target

        loss_outputs = loss_fn(*loss_inputs)
        loss = loss_outputs[0] if type(loss_outputs) in (tuple, list) else loss_outputs
        total_loss += loss.item()

        for metric in metrics:
            metric(outputs, target, loss_outputs)

        loss_outputs = loss_fn(*loss_inputs)

This line is causing the bug. I also checked the loss function, which I defined as:

class ContrastiveLoss(nn.Module):

    def __init__(self, margin):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin
        self.eps = 1e-9

    def forward(self, output1, output2, target, size_average=False):
        distances = (output2 - output1).pow(2).sum(1)  # squared distances
        losses = 0.5 * (target.float() * distances +
                        (1 + -1 * target).float() * F.relu(self.margin - (distances + self.eps).sqrt()).pow(2))
        return losses.mean() if size_average else losses.sum()

Contrastive loss: it takes the embeddings of two samples and a target label == 1 if the samples are from the same class and label == 0 otherwise.

I tried setting size_average=False in it, but I am still getting this bug.

Is there any other problem that may cause this bug?

Based on your previous post, your model would return 2 tensors:

return output1, output2

so outputs will contain both of these tensors.
Later you are appending both targets:

        loss_inputs = outputs
        if (target0,target1) is not None:
            target = (target0,target1)
            loss_inputs += target

so loss_inputs would contain 4 tensors, which will raise the error, since ContrastiveLoss will map them to:

def forward(self, output1, output2, target, size_average=False):

such that target1 will be used as size_average.
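
The argument mapping can be reproduced in isolation. The sketch below reuses the ContrastiveLoss from this thread with made-up embeddings and labels; collapsing the two label batches into one binary target via `target0 == target1` is one possible fix under the "label == 1 if same class" convention, not necessarily the change that was eventually applied.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):
    def __init__(self, margin):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin
        self.eps = 1e-9

    def forward(self, output1, output2, target, size_average=True):
        distances = (output2 - output1).pow(2).sum(1)  # squared distances
        losses = 0.5 * (target.float() * distances +
                        (1 + -1 * target).float() * F.relu(self.margin - (distances + self.eps).sqrt()).pow(2))
        return losses.mean() if size_average else losses.sum()


# Made-up embeddings and class labels for a batch of 4 pairs.
output1 = torch.randn(4, 2)
output2 = torch.randn(4, 2)
target0 = torch.tensor([0, 1, 2, 3])
target1 = torch.tensor([0, 5, 2, 7])

loss_fn = ContrastiveLoss(margin=1.0)

# Four positional tensors: target1 lands in size_average and is evaluated as a bool.
try:
    loss_fn(output1, output2, target0, target1)
    ambiguous_error = ""
except RuntimeError as err:
    ambiguous_error = str(err)
print(ambiguous_error)

# One binary target per pair: 1 where both labels agree, 0 otherwise.
pair_target = (target0 == target1).long()
loss = loss_fn(output1, output2, pair_target)
print(pair_target, loss.item())
```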

Thanks for the explanation, @ptrblck. I have made the changes and it worked. Just one last question. In the code below

class Siamese(Dataset):
    def __init__(self):
        self.data0 = torch.randn(10, 3, 224, 224)
        self.target0 = torch.randint(0, 10, (10,))
        self.data1 = torch.randn(10, 3, 224, 224)
        self.target1 = torch.randint(0, 10, (10,))

you used the torch.randn function.

Previously, I tried passing my own dataset as an object, as shown below:

   def __init__(self, dataset):
        self.dataset = dataset

Now, since I want to pass my own dataset here, what am I supposed to assign to
self.data0, self.target0, self.data1, and self.target1?
What instances should I create to make it work?

You can keep your initial indexing of the self.dataset and sample all 4 tensors (2 data and 2 target tensors). I just used torch.randn as an example as I don’t have your dataset.
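
A minimal sketch of that idea, assuming the wrapped dataset returns (image, label) for an index. The `TensorDataset` with random images is a hypothetical stand-in for `Classifier('train')`, and the pair selection is a compact variant of the two `while` loops from the original `Siamese` class.

```python
import random

import torch
from torch.utils.data import Dataset, DataLoader, TensorDataset


class Siamese(Dataset):
    """Wraps any dataset that yields (image, label) and returns pairs."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        img_0, label_0 = self.dataset[index]
        # Roughly half of the pairs should come from the same class.
        want_same = random.randint(0, 1) == 1
        while True:
            index_1 = random.randint(0, len(self.dataset) - 1)
            img_1, label_1 = self.dataset[index_1]
            if (int(label_0) == int(label_1)) == want_same:
                break
        return (img_0, label_0), (img_1, label_1)

    def __len__(self):
        return len(self.dataset)


# Hypothetical stand-in for Classifier('train'): 12 random images, 3 classes.
base = TensorDataset(torch.randn(12, 3, 32, 32), torch.arange(12) % 3)
loader = DataLoader(Siamese(base), batch_size=4)

# Each batch holds two (data, target) pairs, i.e. the 4 tensors mentioned above.
(data0, target0), (data1, target1) = next(iter(loader))
print(data0.shape, target0.shape, data1.shape, target1.shape)
```

In this form nothing has to be precomputed in `__init__`; the 2 data and 2 target tensors are produced per batch by the DataLoader.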

Hello @ptrblck ,
I kept the initial indexing of self.dataset, but I did not understand how to sample the dataset into 4 tensors (2 data and 2 target tensors). How should I do that in PyTorch?
I have 21600 images in a single folder and the respective train.csv, val.csv, and test.csv.

   def __init__(self, dataset):
        self.dataset = dataset

After this, how should I sample?

class Siamese(Dataset):

    def __init__(self,dataset):

        self.data0 = ??
        self.data1 = ??
        self.target0 = ??
        self.target1 = ??

Also, if I sample it into 4 tensors, then returning

    def __len__(self):
        return len(self.data0)

gives me an error:

TypeError: object of type 'module' has no len()