RuntimeError: Expected object of scalar type Long but got scalar type Float when using CrossEntropyLoss

I have a neural network that ends with the following linear layer:

dense = nn.Linear(input_size, 1)

If I use CrossEntropyLoss as the loss function (since y is supposed to be the class number), I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-39-72a754e03ca3> in <module>()
      1 lr = 2e-2
      2 learner = SimpleLearner([train_dl, test_dl], model, loss_func)
----> 3 history = learner.fit(10)

<ipython-input-37-121ec7440a76> in fit(self, epochs, lr)
     26             losses = []
     27             for x,y in self.data[0]:
---> 28                 losses.append(self.update(x, y , lr))
     29             history['losses'].append(np.mean(losses))
     30         return history

<ipython-input-37-121ec7440a76> in update(self, x, y, lr)
     10         for p in model.parameters(): w2 += (p**2).sum()
     11         # add to regular loss
---> 12         loss = loss_func(y_hat, y) + w2 * self.wd
     13         loss.backward()
     14         with torch.no_grad():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    477             result = self._slow_forward(*input, **kwargs)
    478         else:
--> 479             result = self.forward(*input, **kwargs)
    480         for hook in self._forward_hooks.values():
    481             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    865     def forward(self, input, target):
    866         return F.cross_entropy(input, target, weight=self.weight,
--> 867                                ignore_index=self.ignore_index, reduction=self.reduction)
    868 
    869 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   1778     if size_average is not None or reduce is not None:
   1779         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1780     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   1781 
   1782 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1623                          .format(input.size(0), target.size(0)))
   1624     if dim == 2:
-> 1625         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1626     elif dim == 4:
   1627         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'

What loss function (and, similarly, accuracy function) should I be using? If CrossEntropyLoss is the right one, what should I do with the output of the linear layer?

The output layer should have the number of classes as out_features.
Currently your output layer only returns one neuron, which corresponds to class 0.
For a binary use case, this should work:

batch_size = 5
nb_classes = 2
in_features = 10

model = nn.Linear(in_features, nb_classes)
criterion = nn.CrossEntropyLoss()

x = torch.randn(batch_size, in_features)
target = torch.empty(batch_size, dtype=torch.long).random_(nb_classes)

output = model(x)
loss = criterion(output, target)
loss.backward()

However, this doesn’t seem to be the error you are seeing here.
As you can see in my example, target should be of type torch.long. Try to fix the shapes and call target = target.long() to transform the data type.
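
A minimal sketch of the dtype fix described above, using made-up shapes and values (the names `logits` and `float_target` are illustrative and not from the original post). The exact error message differs between PyTorch versions, so the sketch only prints it:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(5, 2)                               # (batch_size, nb_classes)
float_target = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])  # class indices stored as float

try:
    criterion(logits, float_target)                      # float class indices are not accepted
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

target = float_target.long()                             # convert to torch.int64
loss = criterion(logits, target)
print(target.dtype, loss.item())
```

Calling `.long()` returns a converted copy, so the result has to be assigned or passed directly to the criterion.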

Alternatively, you could return just one neuron and use nn.BCEWithLogitsLoss as your criterion.
This would also work with a float target.
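
The single-neuron alternative could look roughly like this. It is a sketch with random data; the sizes and the 0.5 threshold used for the accuracy are assumptions for illustration:

```python
import torch
import torch.nn as nn

batch_size, in_features = 5, 10
model = nn.Linear(in_features, 1)                      # a single output logit per sample
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(batch_size, in_features)
target = torch.randint(0, 2, (batch_size, 1)).float()  # float 0/1 labels, same shape as the output

output = model(x)                                      # raw logits; the sigmoid is applied inside the loss
loss = criterion(output, target)
loss.backward()

# Accuracy: turn logits into probabilities and threshold them at 0.5
preds = (torch.sigmoid(output) > 0.5).long()
accuracy = (preds == target.long()).float().mean()
print(loss.item(), accuracy.item())
```

Note that the target here has the same shape as the model output, `(batch_size, 1)`, unlike the 1D class-index target used with nn.CrossEntropyLoss.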


I figured out the problem: I was creating the target tensor and passing float as the dtype.
The following fixed the issue:

y_tensor = torch.tensor(y_train, dtype=torch.long, device=device)

target should be of type torch.long.

Why does the type of target have to be torch.long?
Thank you!

The target should be a LongTensor when using nn.CrossEntropyLoss (or nn.NLLLoss), since it is used to index the output logit (or log probability) for the current target class, as shown in the formula in the docs (note the indexing in x[class]).
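
To make the indexing concrete, here is a small sketch that recomputes the loss by hand and compares it with the built-in function. The tensors are random and the variable names are illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                 # (batch_size, nb_classes)
target = torch.tensor([0, 2, 1, 2])        # class indices, dtype torch.int64

log_probs = F.log_softmax(logits, dim=1)
# Pick the log probability of the target class in each row.
# Indexing a tensor like this requires integer indices.
picked = log_probs[torch.arange(logits.size(0)), target]
manual_loss = -picked.mean()

builtin_loss = F.cross_entropy(logits, target)
print(manual_loss.item(), builtin_loss.item())
```

Both values should agree up to floating-point error, which shows that the target is only ever used as an index and never as a numeric quantity.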


I’ve specified the dtype to be torch.long, and the error still appears, though this time not in nn.CrossEntropyLoss but in the prediction line. Here is the error:

RuntimeError                              Traceback (most recent call last)
in
----> 1 train(model,train_loader,valid_loader)

in train(model, train_dataloader, valid_dataloader, epochs, lr)
     18 #print(len(target[0]['label']))
     19 optimizer.zero_grad()
---> 20 output = model(data,target)
     21 print('check')
     22 loss = criterion(output,target)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491 result = self._slow_forward(*input, **kwargs)
    492 else:
--> 493 result = self.forward(*input, **kwargs)
    494 for hook in self._forward_hooks.values():
    495 hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     49 if isinstance(features, torch.Tensor):
     50 features = OrderedDict([(0, features)])
---> 51 proposals, proposal_losses = self.rpn(images, features, targets)
     52 detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
     53 detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491 result = self._slow_forward(*input, **kwargs)
    492 else:
--> 493 result = self.forward(*input, **kwargs)
    494 for hook in self._forward_hooks.values():
    495 hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/rpn.py in forward(self, images, features, targets)
    413 losses = {}
    414 if self.training:
--> 415 labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
    416 regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
    417 loss_objectness, loss_rpn_box_reg = self.compute_loss(

/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/rpn.py in assign_targets_to_anchors(self, anchors, targets)
    272 for anchors_per_image, targets_per_image in zip(anchors, targets):
    273 gt_boxes = targets_per_image["boxes"]
--> 274 match_quality_matrix = self.box_similarity(gt_boxes, anchors_per_image)
    275 matched_idxs = self.proposal_matcher(match_quality_matrix)
    276 # get the targets corresponding GT for each proposal

/opt/conda/lib/python3.6/site-packages/torchvision/ops/boxes.py in box_iou(boxes1, boxes2)
    130 area2 = box_area(boxes2)
    131
--> 132 lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2]
    133 rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2]
    134

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'other'

I’m not familiar with your code, but it seems boxes1 is passed as a LongTensor, while boxes2 is a FloatTensor. Could this be the case?
If so, you could call float() on the first or long() on the second argument, depending on your expected result.
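
As a sketch of the float() option: the torchvision detection models document their targets as float boxes of shape [N, 4] and int64 labels of shape [N]. The annotation values below are hypothetical:

```python
import torch

# Hypothetical annotations for one image: two boxes given as integer pixel coordinates
raw_boxes = [[10, 20, 50, 80], [30, 40, 90, 120]]   # [x1, y1, x2, y2]
raw_labels = [1, 2]

target = {
    "boxes": torch.tensor(raw_boxes, dtype=torch.float32),  # box coordinates as float
    "labels": torch.tensor(raw_labels, dtype=torch.int64),  # class indices as long
}
print(target["boxes"].dtype, target["labels"].dtype)
```

Only the class labels are long here. The box coordinates stay float so that they have the same dtype as the anchors they are compared against.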


Thanks for your response. I was wondering why there are boxes1 and boxes2. My input is an image with a lot of labels in it, and each label has 4 values for its box, so each label has only one box. What is boxes2 for, if you have any idea in general?

This was helpful for me.

I solved my problem with:

loss = criterion(output, target.long())

But I don’t know yet why the data becomes float when I convert a list of ints to a tensor. I’m doing the conversion with torch.Tensor:

target = torch.Tensor(Target_list)

Is there a way to specify the dtype of the conversion?


It shows me this TypeError:

new() received an invalid combination of arguments

In my case y_train is a list of ints

Use torch.tensor with the lowercase t: torch.tensor([0, 1, 2]).
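
A short comparison of the two constructors, assuming the default dtype has not been changed from float32 (`target_list` is an illustrative name):

```python
import torch

target_list = [0, 1, 2]

legacy = torch.Tensor(target_list)                      # constructor of the default (float) tensor type
inferred = torch.tensor(target_list)                    # dtype inferred from the data
explicit = torch.tensor(target_list, dtype=torch.long)  # dtype stated explicitly

print(legacy.dtype, inferred.dtype, explicit.dtype)
```

torch.tensor accepts a `dtype` argument, which answers the earlier question about specifying the dtype during the conversion.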


Why does PyTorch require the label to be of type torch.long? Keras did not have this problem. A label should be what the user says it is…

For example, say I’m creating an LSTM model for the following sequence:
[0.08574425157113574, 0.08615396453867125, 0.08615396453867125, 0.08624176097991065, 0.08632949541005121, 0.08641732567552636, 0.08647585287810422, 0.08650518412786463, 0.08752938198611403, 0.08805613808405999]

And the sequence length is 3, then the label for
0.08574425157113574, 0.08615396453867125, 0.08615396453867125

is 0.08624176097991065

The above should not be a problem.

nn.CrossEntropyLoss expects class indices to index the logits directly (as given in the formula in the docs).
If you want to provide a continuous target distribution, have a look at nn.KLDivLoss.
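
A minimal sketch of nn.KLDivLoss with a target given as a probability distribution per sample. The data is random, and the choice of `reduction="batchmean"` is an assumption for this example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                               # model output, (batch_size, nb_classes)
target_probs = torch.softmax(torch.randn(4, 3), dim=1)   # float targets, each row sums to 1

criterion = nn.KLDivLoss(reduction="batchmean")
# KLDivLoss expects log probabilities as input and, by default, probabilities as target
loss = criterion(F.log_softmax(logits, dim=1), target_probs)
print(loss.item())
```

Here the target is a float tensor with the same shape as the model output, so no conversion to long is involved.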

Is it feasible to allow it to accept a char instead of a long and still index into the logits as mentioned? I’m working on a segmentation problem with around 10 classes, so my label fits comfortably into a single byte. But requiring a long means the mask has to be of type int64, using 7 extra bytes per pixel. This seems non-trivial since then each pixel requires 20 bytes (total across the image and mask) when uint8 mask pixels would mean a total of only 13 bytes, meaning for this problem int64 labels makes a batch require over 50% more memory than with uint8 labels.
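
One possible workaround, sketched under the assumption that the masks can stay uint8 in the dataset and only be cast right before the loss is computed. This does not remove the int64 requirement; it only limits the larger copy to the current batch. Shapes and names are made up:

```python
import torch
import torch.nn.functional as F

nb_classes = 10
# Masks kept in compact uint8 form, e.g. in the dataset or on the way to the device
mask_u8 = torch.randint(0, nb_classes, (2, 64, 64), dtype=torch.uint8)
logits = torch.randn(2, nb_classes, 64, 64)              # (batch, classes, height, width)

target = mask_u8.long()                                  # temporary int64 copy for the loss call
loss = F.cross_entropy(logits, target)

print(mask_u8.element_size(), target.element_size())     # bytes per mask pixel before and after the cast
```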


I had the same thought. Do you have any ideas now?

The problem was the dtype of the target. For labels greater than 1, I found we should use a LongTensor target, not a FloatTensor, for nn.CrossEntropyLoss.

If you see this error, you can also try using a numpy ndarray but with int64 types instead of float types.

In my code, the float part was causing problems. I fixed it with:
y = data['outcome'].values.astype('int')
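
A sketch of the NumPy route. The array below is a stand-in for `data['outcome'].values`. Requesting `np.int64` explicitly is a suggestion on top of the fix above, since a plain 'int' has mapped to a 32-bit integer on some platforms and NumPy versions:

```python
import numpy as np
import torch

outcome = np.array([0.0, 1.0, 2.0, 1.0])   # stand-in for data['outcome'].values (float labels)
y = outcome.astype(np.int64)               # explicit 64-bit integers
y_tensor = torch.from_numpy(y)             # keeps the NumPy dtype

print(y.dtype, y_tensor.dtype)
```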

NumPy arrays seem to be supported. See the PyTorch Lightning Bolts Scikit-Learn example.

All right, this has worked for me whenever I got an error along similar lines (I am going to keep it here for reference):

  1. When calling your model on the input, ensure that your x is float.
  2. However, keep your target as long.
  3. Call the loss on the model output and y without any further typecasting.

from torch import nn

# in_features, out_classes, device, X and y are assumed to be defined elsewhere
net = nn.Linear(in_features, out_classes)
loss_criterion = nn.CrossEntropyLoss()

net = net.to(device)
X = X.to(device).float()  # model input as float
y = y.to(device).long()   # class-index target as long

y_hat = net(X)
l = loss_criterion(y_hat, y)

The code runs for a random number of iterations each time I run it and then gives the following error:

Traceback (most recent call last):
  File "c:/Users/MY/Documents/slicing_dqn/envs/simple_dqn_torch_main.py", line 53, in <module>
    action = agent.choose_action(observation)
  File "c:\Users\MY\Documents\slicing_dqn\envs\simple_dqn_torch.py", line 73, in choose_action
    actions = self.Q_eval.forward(state)
  File "c:\Users\MY\Documents\slicing_dqn\envs\simple_dqn_torch.py", line 26, in forward
    x = F.relu(self.fc1(state))
  File "C:\Users\MY\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\MY\anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\MY\anaconda3\lib\site-packages\torch\nn\functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: expected scalar type Float but found Long

I am using a DQN model, and here is the observation space from my environment:

low = np.array([1, 1, 3, 2]).astype(np.float32) 
high = np.array([10, 10, 30, 20]).astype(np.float32)
self.observation_space = spaces.Box(low, high)

Here is the agent and model code:

class Agent:
    def __init__(self, gamma, epsilon, lr, input_dims, batch_size, n_actions,
                 max_mem_size=100000, eps_end=0.05, eps_dec=5e-4):
        self.gamma = gamma
        self.epsilon = epsilon
        self.eps_min = eps_end
        self.eps_dec = eps_dec
        self.lr = lr
        self.action_space = [i for i in range(n_actions)]
        self.mem_size = max_mem_size
        self.batch_size = batch_size
        self.mem_cntr = 0
        self.iter_cntr = 0
        self.replace_target = 100

        self.Q_eval = DeepQNetwork(lr, n_actions=n_actions,
                                   input_dims=input_dims,
                                   fc1_dims=256, fc2_dims=256)
        self.state_memory = np.zeros((self.mem_size, *input_dims), dtype=np.float32)
        self.new_state_memory = np.zeros((self.mem_size, *input_dims), dtype=np.float32)
        self.action_memory = np.zeros(self.mem_size, dtype=np.int32)
        self.reward_memory = np.zeros(self.mem_size, dtype=np.float32)
        self.terminal_memory = np.zeros(self.mem_size, dtype=bool)  # np.bool was removed in recent NumPy releases

    def store_transition(self, state, action, reward, state_, terminal):
        index = self.mem_cntr % self.mem_size
        self.state_memory[index] = state
        self.new_state_memory[index] = state_
        self.reward_memory[index] = reward
        self.action_memory[index] = action
        self.terminal_memory[index] = terminal

        self.mem_cntr += 1

    def choose_action(self, observation):
        if np.random.random() > self.epsilon:
            state = T.tensor([observation]).to(self.Q_eval.device)
            actions = self.Q_eval.forward(state)
            action = T.argmax(actions).item()
        else:
            action = np.random.choice(self.action_space)

        return action

    def learn(self):
        if self.mem_cntr < self.batch_size:
            return

        self.Q_eval.optimizer.zero_grad()

        max_mem = min(self.mem_cntr, self.mem_size)

        batch = np.random.choice(max_mem, self.batch_size, replace=False)
        batch_index = np.arange(self.batch_size, dtype=np.int32)

        state_batch = T.tensor(self.state_memory[batch]).to(self.Q_eval.device)
        new_state_batch = T.tensor(
                self.new_state_memory[batch]).to(self.Q_eval.device)
        action_batch = self.action_memory[batch]
        reward_batch = T.tensor(
                self.reward_memory[batch]).to(self.Q_eval.device)
        terminal_batch = T.tensor(
                self.terminal_memory[batch]).to(self.Q_eval.device)

        q_eval = self.Q_eval.forward(state_batch)[batch_index, action_batch]
        q_next = self.Q_eval.forward(new_state_batch)
        q_next[terminal_batch] = 0.0

        q_target = reward_batch + self.gamma*T.max(q_next, dim=1)[0]

        loss = self.Q_eval.loss(q_target, q_eval).to(self.Q_eval.device)
        loss.backward()
        self.Q_eval.optimizer.step()

        self.iter_cntr += 1
        self.epsilon = self.epsilon - self.eps_dec \
            if self.epsilon > self.eps_min else self.eps_min

I don’t understand what the issue is. Is it in the model or in my observation space? If it is in the observation space, why does the code run for some time and then crash? If anyone can assist, it would be highly appreciated!
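
One possible explanation, which the thread does not confirm: in choose_action the network is only called when `np.random.random() > self.epsilon`, and the state tensor is built without a dtype, so it inherits the dtype of the observation. If the environment returns integer observations, the first greedy step would fail, which would explain the random number of iterations before the crash. The replay buffers are float32, so learn would not be affected. The sketch below reproduces this with a stand-in layer and a hypothetical integer observation:

```python
import numpy as np
import torch
import torch.nn as nn

fc1 = nn.Linear(4, 256)                                  # stand-in for the network's first layer
observation = np.array([1, 1, 3, 2], dtype=np.int64)     # hypothetical integer observation

state = torch.tensor(np.array([observation]))            # dtype follows the data: int64
try:
    fc1(state)                                           # float weights, long input
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

# Stating the dtype explicitly makes the input independent of what the environment returns
state = torch.tensor(np.array([observation]), dtype=torch.float32)
out = fc1(state)
print(state.dtype, out.shape)
```

If this is the cause, passing `dtype=T.float32` when creating `state` in choose_action, or having the environment return float32 observations, should avoid the error.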