How can I make a torchvision classification model accept NumPy inputs?

I am experimenting with a CNN+LSTM model that uses a torchvision classification model. The original model accepts images as inputs, and I have been trying for days to make it accept NumPy arrays instead. I understand that the two have different dimension orders, as the NumPy data is given as

[batch_size, depth, height, width, channels] 

instead of

[batch_size, channels, depth, height, width]. 

Based on this answer, I can use the permute function to change the order of the dimensions. However, I can’t find any solution or leads on how to do this in a torchvision classification model.
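For reference, here is a minimal sketch of that reordering on its own, outside any model. The array name and sizes are made up for illustration; only torch.from_numpy() and permute() are the point.

```python
import numpy as np
import torch

# Hypothetical batch of clips: [batch_size, depth, height, width, channels]
clips = np.zeros((2, 8, 32, 32, 3), dtype=np.float32)

x = torch.from_numpy(clips)      # shares memory with the NumPy array
x = x.permute(0, 4, 1, 2, 3)     # -> [batch_size, channels, depth, height, width]
x = x.contiguous()               # lay the permuted tensor out contiguously in memory

print(x.shape)  # torch.Size([2, 3, 8, 32, 32])
```

permute() only changes how the dimensions are indexed, so the call to contiguous() is there for later operations such as view() that need a contiguous layout.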

Here is the part of model.py that I think is relevant to the question:

        elif arch.startswith('resnet50'):
            self.features = nn.Sequential(*list(original_model.children())[:-1]) 
            # Freeze the pretrained feature extractor
            for param in self.features.parameters():
                param.requires_grad = False
            self.fc_pre = nn.Sequential(nn.Linear(2048, fc_size), nn.Dropout())
            self.rnn = nn.LSTM(input_size = fc_size,
                        hidden_size = hidden_size,
                        num_layers = lstm_layers,
                        batch_first = True)
            self.fc = nn.Linear(hidden_size, num_classes)
            self.modelName = 'resnet50_lstm'

Thank you.

Have you tried copying the NumPy array into a torch tensor?

I tried this a while ago; however, I got the following error, which I also sometimes get when I use np.load():

RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same

Is it possible to convert the NumPy array to a torch tensor with float type?
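A short sketch of one way to do this, assuming the array is float64 (NumPy's default), which is what produces a DoubleTensor. The variable names are illustrative only.

```python
import numpy as np
import torch

arr = np.random.rand(4, 4, 3)                # NumPy defaults to float64

t_double = torch.from_numpy(arr)             # torch.float64, i.e. a DoubleTensor
t_float = torch.from_numpy(arr).float()      # torch.float32, matches default model weights

# Alternative: cast on the NumPy side before converting
t_float2 = torch.from_numpy(arr.astype(np.float32))

print(t_double.dtype, t_float.dtype, t_float2.dtype)
```

Either cast should avoid the "Input type ... and weight type ... should be the same" error, as long as the model weights are the default float32.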

I converted the NumPy arrays to torch tensors and saved them as “.pt” files.

What did you mean by the depth dimension?
Usually you can try ToTensor(). Most people put it in the transform step of the Dataset, as in the example below:

import torch
from pathlib import Path
from torchvision import transforms

class ImageDataset(torch.utils.data.Dataset):
  '''This class is the image dataset used in inferences
    Args:
    ----------------------------------------------------------------------------
    default_conf, default_preprocessing: 
      - conf: standard stuff
      - preprocessing: resize, center crop, to Tensor (from numpy), and normalize
    self.root: Database folder, in which there are images folder 
      -> each folder would be a class
    self.names: array of Paths of image relative to root -> used to read images
  '''
  default_conf = {
    'globs': ['*.jpg', '*.png', '*.jpeg', '*.JPG', '*.PNG'],
    'grayscale': False,
    'interpolation': 'cv2_area'
  }
  default_preprocessing = transforms.Compose([
      transforms.Resize(256),
      transforms.CenterCrop(256),
      transforms.ToTensor(),
      transforms.Normalize(mean = [0.485, 0.456, 0.406],
                            std = [0.229, 0.224, 0.225]) 
  ])
  def __init__(self, root, input_transforms = default_preprocessing):
      super().__init__()
      self.root = Path(root)  # ensure a Path so `self.root / name` works in __getitem__
      self.input_transforms = input_transforms
      
      # Iterate over directories to get Images' path
      paths = []
      for g in self.default_conf['globs']:
          paths += list(Path(root).glob('**/'+g)) #read paths with file following 'globs'
      paths = sorted(list(set(paths))) 
      self.names = [i.relative_to(root).as_posix() for i in paths]
  
  def __getitem__(self, idx):
    ''' Required by PyTorch's map-style dataset.
    Currently returns one element:
      - query: the transformed image TENSOR at the given index
    (No label is returned, since this dataset is used for inference.)
    '''
    query =  read_image(self.root/self.names[idx])
    if self.input_transforms:
      query = self.input_transforms(query)         
    return query
    
  def __len__(self):
      return len(self.names)

read_image() simply reads the file using PIL.Image (you can also use OpenCV).

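from pathlib import Path
from PIL import Image

def read_image(path: Path):
    """Read an image from a path.
    The read is performed with PIL.Image, which torchvision transforms accept directly.
    """
    # Image.open() raises on failure rather than returning None,
    # so the error is caught here and re-raised with the path.
    try:
        image = Image.open(path)
    except OSError as err:
        raise ValueError(f'Cannot read image {path}.') from err
    return image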

Can you please share the code where you try to load NumPy arrays into torch? Note that “.pt” files are generally used to save torch models, and you may not even need to save the tensors in the first place.

My inputs are NumPy arrays. What I am trying to do is action recognition where, instead of video frames, my inputs are NumPy files. I have the transform step and used ToTensor() to convert these NumPy files. Below is the relevant part of the code:

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.339, 0.224, 0.225])

    transform = (transforms.Compose([
                                    # transforms.Resize(224),
                                    # transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    normalize]
                                    ),
                transforms.Compose([
                                    # transforms.Resize(224),
                                    # transforms.CenterCrop(224),
                                    transforms.ToTensor()]
                                    )
                )

However, I get this error

RuntimeError: Given groups=1, weight of size 64 3 7 7, expected input[1, 6, 75, 3] to have 3 channels, but got 6 channels instead

Hence, I would like to use the permute function, but I can’t figure out how to do it with a torchvision classification model.

I tried two ways. The first is through the transform step:

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.339, 0.224, 0.225])

    transform = (transforms.Compose([
                                    # transforms.Resize(224),
                                    # transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    normalize]
                                    ),
                transforms.Compose([
                                    # transforms.Resize(224),
                                    # transforms.CenterCrop(224),
                                    transforms.ToTensor()]
                                    )
                )

For the second, I tried converting the arrays to tensors first and then saving them as “.pt” files with torch.save():

        tensor1 = torch.from_numpy(np.load(NPY))
        tensor1 = tensor1.type(torch.FloatTensor)

But I get this error:

RuntimeError: Given groups=1, weight of size 64 3 7 7, expected input[1, 11, 75, 3] to have 3 channels, but got 11 channels instead

I think the permute function will help me solve this error, since the model expects a different dimension order than my NumPy arrays have, but I don’t know how to do it with a torchvision classification model.
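One possible way to handle this, sketched under assumptions: do the permute inside the model's forward(), then fold the depth (frame) dimension into the batch so a 2D CNN sees [N, 3, H, W]. The class below is hypothetical and uses a tiny conv stack as a stand-in for the frozen ResNet-50 features; the forward() of the original model.py is not shown in this thread, so this is not that code.

```python
import torch
import torch.nn as nn

class TinyCnnLstm(nn.Module):
    """Hypothetical stand-in: a small conv stack replaces the ResNet-50 features."""
    def __init__(self, fc_size=16, hidden_size=8, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # -> [N, 4, 1, 1], pooled like ResNet
        )
        self.fc_pre = nn.Linear(4, fc_size)
        self.rnn = nn.LSTM(fc_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: [batch, depth, height, width, channels], as loaded from NumPy
        b, d, h, w, c = x.shape
        x = x.permute(0, 1, 4, 2, 3)            # -> [batch, depth, channels, height, width]
        x = x.reshape(b * d, c, h, w)           # fold frames into the batch for the 2D CNN
        f = self.features(x).flatten(1)         # -> [batch * depth, 4]
        f = self.fc_pre(f).view(b, d, -1)       # -> [batch, depth, fc_size]
        out, _ = self.rnn(f)                    # -> [batch, depth, hidden_size]
        return self.fc(out[:, -1])              # classify from the last time step

model = TinyCnnLstm()
clips = torch.zeros(2, 6, 32, 32, 3)            # hypothetical float32 batch
logits = model(clips)
print(logits.shape)  # torch.Size([2, 5])
```

The same permute and reshape steps should carry over when self.features is a real torchvision backbone, provided the last dimension of the input really is the 3-channel axis.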

ToTensor() converts each image (represented by a NumPy array) from [Height, Width, Channels] to [Channels, Height, Width].

This seems to suggest something went wrong elsewhere. Do you have a Colab or Jupyter notebook where I can see your whole code?

I have fixed the error. The first thing I did was convert the NumPy arrays into torch tensors, as suggested by ArchiGertsman, saving them as “.pt” files. Since I couldn’t reorder the arrays from [Height, Width, Channels] to [Channels, Height, Width] using ToTensor(), I used torch.einsum() before saving the tensors.
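For anyone landing here later, this is roughly what the einsum reordering looks like. It is a sketch with a made-up tensor, not the exact code from my project, and it gives the same result as permute().

```python
import torch

# Hypothetical tensor in [H, W, C] layout
hwc = torch.arange(2 * 3 * 4, dtype=torch.float32).reshape(2, 3, 4)

chw_einsum = torch.einsum('hwc->chw', hwc)    # reorder axes by name
chw_permute = hwc.permute(2, 0, 1)            # equivalent reordering by index

print(chw_einsum.shape)                       # torch.Size([4, 2, 3])
print(torch.equal(chw_einsum, chw_permute))   # True
```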

Congrats!!

Just a suggestion: as an improvement, I recommend trying to figure out why ToTensor() doesn’t work.
This is just to follow the normal convention.
