RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

rjsdebug · July 26, 2018, 7:36pm

I am getting the above error (mentioned in title). I was wondering what this issue even means. My guess is that I am using the cpu and gpu together, which is causing errors. However, how can the problem only lie in the weight data? Does this mean I have to convert all of my tensors to tensor.cuda()?

justusschock · July 26, 2018, 7:47pm

It seems that your input tensors are already on the GPU, but your model isn’t. Did you push your model to the GPU as well?
If you did not, this might be causing the error. If you did, you should probably show the code, since we can only guess without it.
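
To relate the two halves of the error message to actual objects, a minimal sketch follows. The layer and tensor are placeholders, not the code from this thread. "Input type" is the device of the tensor passed in, and "weight type" is the device of the layer's parameters. Both start on the CPU here; on a GPU machine, calling `x = x.cuda()` without moving the layer would make the two differ.

```python
import torch
import torch.nn as nn

# Placeholder layer and input; they stand in for the real model and batch.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)

# "weight type" in the error refers to where the layer's parameters live.
weight_device = model.weight.device
# "Input type" refers to where the tensor passed to the layer lives.
input_device = x.device

print("weight on:", weight_device, "| input on:", input_device)

# The forward pass only works when both report the same device.
same_device = weight_device == input_device
if same_device:
    out = model(x)
    print(out.shape)
```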

rjsdebug · July 26, 2018, 7:52pm

Editing… I need to format this correctly, sorry.

rjsdebug · July 26, 2018, 7:55pm

Here is the code. I feel like the error might be caused when I say layers+=[RLSTM()], since this has not been pushed to the GPU yet? Sorry for the length, but the error could be anywhere, so here it all is.

import math
import torch
import torch.nn as nn
import torch.nn.init as init
__all__ = [
  'VGG', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn',
  'vgg19_bn', 'vgg19',
]


class VGG(nn.Module):
  '''
  VGG model 
  '''
  def __init__(self, features): # features represents the layers array
      super(VGG, self).__init__()
      self.features = features
      self.classifier = nn.Sequential(
          nn.Dropout(),
          nn.Linear(512,512),
          nn.ReLU(True),
          nn.Dropout(),
          nn.Linear(512, 512),
          nn.ReLU(True),
          nn.Linear(512, 10),
      )
       # Initialize weights
      for m in self.modules():
          if isinstance(m, nn.Conv2d):
              n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
              m.weight.data.normal_(0, math.sqrt(2. / n))
              m.bias.data.zero_()


  def forward(self, x): # x is the image, we run x through the layers
      print(x.size())
      x = self.features(x) # runs through all features, where each feature is a function
      x = x.view(x.size(0), -1) 
      # after running through features, does sequential steps to finally classify
      x = self.classifier(x)
      # print(x)
      return x


def make_layers(cfg, batch_norm=False):
 # print("Making layers!")
  layers = []
  in_channels = 3
  for v in cfg:
      if v == 'M':
          layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
      else:
          conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
          if batch_norm:
              layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
          else:
              layers += [conv2d, nn.ReLU(inplace=True)]
          in_channels = v
          layers+=[RLSTM()]

  return nn.Sequential(*layers)

class RLSTM(nn.Module):
  def __init__(self):
      super(RLSTM,self).__init__()



  def forward(self, image):
      print("going in rowlstm")
      global current
      global _layer
      global isgates
      size = image.size()
      b = size[0]
      indvs = list(image.split(1,0)) # split up the batch into individual images
      #print(indvs[0].size())
      tensor_array = []
      for i in range(b):
          current = 0
          _layer = []
          isgates = []
          tensor_array.append(self.RowLSTM(indvs[i]))

      seq=tuple(tensor_array)
      trans = torch.cat(seq,0)
      return trans.cuda() # trying to make floattensor error go away 
  def RowLSTM(self, image): 
      global current
      global _layer
      global isgates


      # input-to-state (K_is * x_i) : 3x1 convolution. generate 4h x n x n tensor. 4hxnxn tensor contains all i -> s info

  # the input to state convolution should only be computed one time 
      if current==0:
          n = image.size()[2]
          ch=image.size()[1]
          input_to_state = torch.nn.Conv2d(ch,4*ch,kernel_size=(1,3),padding=(0,1))
          isgates = self.splitIS(input_to_state(image)) # convolve, then split into gates (4 per row)
          cell=RowLSTMCell(0,torch.randn(ch,n,1),torch.randn(ch,n,1),torch.randn(ch,n,1),torch.randn(ch,n,1),torch.randn(ch,n,1),torch.randn(ch,n,1))
          # now have dummy, learnable variables for first row
          _layer.append(cell)

      else:   
          Cell_prev = _layer[current-1] # access previous row
          hidPrev = Cell_prev.getHiddenState() 
          ch = image.size()[1] 
      #   print("about to apply conv1d")
          state_to_state = torch.nn.Conv2d(ch,4*ch,kernel_size=(1,3),padding=(0,1)) # error is here: hidPrev is an array - not a valid number of input channel
      #   print("applied conv1d") 
          prevHid=Cell_prev.getHiddenState()
          ssgates = self.splitSS(state_to_state(prevHid.unsqueeze(0))) #need to unsqueeze (Ex: currently 16x5, need to make 1x16x5)
          gates = self.addGates(isgates,ssgates,current)
          # split gates
          ig, og, fg, gg = gates[0], gates[1], gates[2], gates[3] # into four, ADD SIGMOID!
          cell = RowLSTMCell(Cell_prev,ig,og,fg,gg,0,0)
          cell.compute()
          _layer.append(cell)
      # attempting to eliminate requirement of getting size

      #print(current)
      try:
          
          current+=1
          y=(isgates[0][0][1][current])
          return self.RowLSTM(image) 
      except Exception as error:
          concats=[]
          for cell in _layer:
              tensor=torch.unsqueeze(cell.h,0)
              
              concats.append(tensor)
          seq=tuple(concats)
          tensor=torch.cat(seq,3)
          return tensor

  def splitIS(self, tensor): # always going to be splitting into 4 pieces, so no need to add extra parameters
      inputStateGates={}
      size=tensor.size() # 1 x 4h x n x n
      out_ft=size[1] # get 4h for the nxnx4h tensor
      num=size[2] # get n for the nxn image
      hh=out_ft//4 # integer division: we want to split the tensor into 4, for the gates
      tensor = torch.squeeze(tensor) # 4h x n x n

      # First, split by row: Creates n tensors of 4h x n x 1
      rows = list(tensor.split(1,2))

      for i in range(num):
          # Each row is a tensor of 4h x n x 1, split it into 4 of h x n x 1
          row=rows[i]
          inputStateGates[i]=list(row.split(hh,0))
          
      return inputStateGates 


  def splitSS(self, tensor): # 1 x 4h x n x 1, create 4 of 1 x h x n x 1 
      size=tensor.size() 
      out_ft=size[1] # get 4h for the 1x4hxn tensor
      num=size[2] # get n for the 1xhxn row
      hh=out_ft//4 # integer division: we want to split the tensor into 4, for the gates
      tensor = tensor.squeeze(0) # 4h x n x 1
      splitted=list(tensor.split(hh,0))
      return splitted 


  def addGates(self, i2s,s2s,key):
      """ these dictionaries are of form {key : [[i], [o], [f], [g]]}
          we want to add pairwise elements """

      # i2s is of form key: [[i], [o], [f], [g]] where each gate is hxn
      # s2s is of form [[h,n],[h,n],[h,n], [h,n]]
      gateSum = []
      for i in range(4): # always of length 4, representing the gates
          gateSum.append(torch.sigmoid(i2s[key][i] + s2s[i]))

      return gateSum
cfg = {
  'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
  'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
  'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
  'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 
        512, 512, 512, 512, 'M'],
}
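
One possible reading of the code above, not confirmed anywhere in this thread: the `Conv2d` layers are constructed inside `RowLSTM` on every call, and the `torch.randn` tensors are created without a device, so all of them start on the CPU and `model.cuda()` never reaches them. A minimal sketch of the usual pattern follows, with a hypothetical `RowConv` module standing in for `RLSTM`. The class name and the `channels` argument are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RowConv(nn.Module):
    """Hypothetical stand-in for RLSTM: the conv is built once, in __init__."""

    def __init__(self, channels):
        super().__init__()
        # Assigning the layer to self registers it, so .to()/.cuda() moves it.
        self.input_to_state = nn.Conv2d(
            channels, 4 * channels, kernel_size=(1, 3), padding=(0, 1)
        )

    def forward(self, image):
        gates = self.input_to_state(image)
        # New tensors are created on the input's device, not the default CPU.
        hidden = torch.randn(gates.shape, device=image.device)
        return torch.sigmoid(gates + hidden)


# Falls back to the CPU so the sketch runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = RowConv(channels=8).to(device)
out = layer(torch.randn(2, 8, 5, 5, device=device))
print(out.shape)
```

In `make_layers` the channel count `v` is already known when the layer is appended, so it could be passed to the constructor in the same way.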

Noosh_Nabi · December 20, 2018, 11:15pm

I have faced a similar error as well, and would be thankful for any help or advice.

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

justusschock · December 21, 2018, 8:08am

Can you post a minimal example to reproduce it? Typically something hasn’t been pushed to the correct device (usually the GPU).

Hung_Nguyen · December 21, 2018, 9:25am

You need to convert your input to cuda before feeding it to the model.
Let’s say:

model = VGG16()
model.cuda()

for inp in dataset:
    x = inp.cuda()
    y = model(x)
    ...

sun · December 23, 2018, 8:58am

You need to move your model to the GPU:

model.cuda()

zhjikoshlizhzc · March 12, 2019, 7:10am

Thanks very much! Your idea fixed my problem!

shwinshaker · January 9, 2020, 3:25am

I had the same problem. I later found that nn.ModuleList should be used to hold a list of modules in PyTorch; otherwise the modules in a plain Python list are not registered, so they are not moved along with the model. Hope this helps.
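
A small sketch of that difference, using made-up class names (`PlainList`, `WithModuleList`) and counting parameters as a proxy for "registered". Anything that does not show up in `parameters()` is also skipped by `.cuda()` and `.to(device)`.

```python
import torch.nn as nn


class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list hides these layers from nn.Module.
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 4)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer as a submodule.
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])


plain_count = len(list(PlainList().parameters()))
registered_count = len(list(WithModuleList().parameters()))
print(plain_count, registered_count)  # 0 4
```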

AlittleBrave · April 2, 2020, 4:03am

Thanks. Fixed my problem.

zongoalbert · June 9, 2020, 3:16pm

I have the same issue, please help.

ptrblck · June 10, 2020, 7:59am

The error is raised if only one of your data and your model has been pushed to the device, while the other has not.
You should make sure that both are on the GPU using:

model.to('cuda')
data = data.to('cuda')

as explained in this topic.
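
The two lines above differ for a reason: `nn.Module.to` changes the module's parameters in place, while `Tensor.to` returns a new tensor and leaves the original untouched, so its result has to be assigned. The sketch below illustrates this with a dtype change instead of a device change, purely so that it runs on a machine without a GPU. The variable names are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2)
data = torch.zeros(1, 2)

# Module.to modifies the module in place; reassigning is optional.
model.to(torch.float64)
print(model.weight.dtype)  # torch.float64

# Tensor.to returns a converted copy; without assignment nothing changes.
data.to(torch.float64)
unassigned_dtype = data.dtype
print(unassigned_dtype)  # torch.float32

# Assigning the result is what actually replaces the tensor.
data = data.to(torch.float64)
print(data.dtype)  # torch.float64

out = model(data)
```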

devin · December 8, 2020, 12:22pm

Thanks a lot! Worked.

jacob_williams · December 17, 2020, 8:33pm

Thanks for this suggestion, this was my issue.

Shyam_Gupta196 · May 13, 2021, 4:56pm

model.cuda()  # not a good approach

is not what I would tell you to do …
You can use

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

and then push the model to device rather than calling .cuda() directly:

model = model.to(device)  # this is the more robust approach
# otherwise the code will fail on machines without a GPU

Sajedeh_Molavi · September 22, 2021, 6:39pm

Thank you so much. Great.

senitent_signal · August 4, 2022, 11:53am

One of the elements of the network is not registered with the model, so it is not moved to the GPU.

CNN_net(
  (conv1): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 24, kernel_size=(5, 5), stride=(1, 1))
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=384, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=10, bias=True)
  (prelu): PReLU(num_parameters=1)
)

Example 2

CNN_net(
  (conv1): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 24, kernel_size=(5, 5), stride=(1, 1))
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=384, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=10, bias=True)
  
)

The solution is to define every element of the network in the __init__ function; otherwise torch won't register it, and it won't be transferred to the GPU with the rest of the model.
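
The two printouts above can be reproduced in miniature. In this sketch the class names are invented for illustration: a `PReLU` defined in `__init__` shows up among the model's children, while one created inside `forward` does not and would stay on the CPU.

```python
import torch.nn as nn


class WithPReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.prelu = nn.PReLU()  # defined in __init__, so it is registered

    def forward(self, x):
        return self.prelu(self.fc(x))


class WithoutPReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # Created on every call: never registered, always starts on the CPU.
        return nn.PReLU()(self.fc(x))


registered_names = [name for name, _ in WithPReLU().named_children()]
unregistered_names = [name for name, _ in WithoutPReLU().named_children()]
print(registered_names)    # ['fc', 'prelu']
print(unregistered_names)  # ['fc']
```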

mna · May 9, 2024, 1:44pm

I have an error:
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
454 weight, bias, self.stride,
455 _pair(0), self.dilation, self.groups)
→ 456 return F.conv2d(input, weight, bias, self.stride,
457 self.padding, self.dilation, self.groups)
458

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

It runs the first epoch fine, but after that I get the error above.

ptrblck · May 9, 2024, 1:51pm

Try to narrow down exactly which layer fails and make sure its parameters are on the GPU, since the error message claims they are still on the CPU.
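
One way to narrow it down is to list every registered parameter and buffer that is not on the expected device. The helper below is a hypothetical sketch, not a PyTorch API: the function name `tensors_not_on` is made up, and the small `nn.Sequential` only stands in for a real model.

```python
import torch
import torch.nn as nn


def tensors_not_on(model, device):
    """Return names of parameters and buffers whose device type differs."""
    device = torch.device(device)
    named = list(model.named_parameters()) + list(model.named_buffers())
    return [name for name, t in named if t.device.type != device.type]


# Placeholder model, freshly built and therefore entirely on the CPU.
model = nn.Sequential(nn.Conv2d(1, 4, 3), nn.BatchNorm2d(4))

on_cpu_check = tensors_not_on(model, "cpu")
print(on_cpu_check)  # [] because everything is on the CPU

# Asking about "cuda" lists every tensor, since nothing was moved yet.
wrong = tensors_not_on(model, "cuda")
print(wrong)
```

Layers that were never registered will not appear in this list at all, so a layer that is used in `forward` but missing from the output is itself a hint.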