I am experimenting with a CNN+LSTM model that uses a torchvision classification model. The original model accepts images as inputs, and I have been trying for days to make it accept NumPy inputs. I understand the two have different dimension orders, as the NumPy data is given as
[batch_size, depth, height, width, channels]
instead of
[batch_size, channels, depth, height, width].
Based on this answer, I can use the permute function to change the order of the dimensions. However, I can’t find any solution or leads on how to do this in a torchvision classification model.
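For reference, the reordering described above could be sketched roughly as follows. The sizes are made up for illustration, and folding the depth axis into the batch axis is only one possible way to feed a 2D classifier, not something taken from the original model.py.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch_size, depth, height, width, channels = 2, 4, 32, 32, 3

# Data laid out as in the question: [batch_size, depth, height, width, channels].
x = torch.randn(batch_size, depth, height, width, channels)

# Reorder to [batch_size, channels, depth, height, width].
x_cdhw = x.permute(0, 4, 1, 2, 3).contiguous()
print(x_cdhw.shape)  # torch.Size([2, 3, 4, 32, 32])

# A 2D classifier expects [N, channels, height, width], so one option
# (an assumption here) is to fold the depth axis into the batch axis.
frames = x.permute(0, 1, 4, 2, 3).reshape(batch_size * depth, channels, height, width)
print(frames.shape)  # torch.Size([8, 3, 32, 32])
```

permute only changes how the axes are indexed; contiguous() or reshape() then produces a tensor with the new memory layout.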
Here is the part of the model.py that I think is relevant to the question
What do you mean by the depth dimension?
Usually you can try ToTensor(). Most people put it in the transform step of the Dataset that feeds the DataLoader, as in the example below:
from pathlib import Path

import torch
from torchvision import transforms


class ImageDataset(torch.utils.data.Dataset):
    '''Image dataset used for inference.

    Class attributes:
        default_conf: glob patterns and read options
        default_preprocessing: resize, center crop, to Tensor, and normalize
    Instance attributes:
        self.root: database folder containing image folders
                   -> each folder would be a class
        self.names: image paths relative to root -> used to read images
    '''
    default_conf = {
        'globs': ['*.jpg', '*.png', '*.jpeg', '*.JPG', '*.PNG'],
        'grayscale': False,
        'interpolation': 'cv2_area'
    }
    default_preprocessing = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

    def __init__(self, root, input_transforms=default_preprocessing):
        super().__init__()
        # Store root as a Path so that `self.root / name` also works for str input
        self.root = Path(root)
        self.input_transforms = input_transforms
        # Iterate over directories to collect the image paths
        paths = []
        for g in self.default_conf['globs']:
            paths += list(self.root.glob('**/' + g))  # files matching 'globs'
        paths = sorted(set(paths))
        self.names = [i.relative_to(self.root).as_posix() for i in paths]

    def __getitem__(self, idx):
        '''Required by PyTorch's map-style dataset.

        Currently returns one element:
        - query: TENSOR image at the given index
        (the label of the image is not returned yet)
        '''
        query = read_image(self.root / self.names[idx])
        if self.input_transforms:
            query = self.input_transforms(query)
        return query

    def __len__(self):
        return len(self.names)
read_image() simply reads the file using PIL.Image (you can also use OpenCV).
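from pathlib import Path

from PIL import Image


def read_image(path: Path):
    """Read an image from a path.

    The read is performed with PIL.Image, which torchvision transforms accept.
    """
    try:
        image = Image.open(path)
    except OSError as err:
        # Image.open raises on a missing or unreadable file; it never returns None
        raise ValueError(f'Cannot read image {path}.') from err
    return image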
Can you please share the code where you try to load the NumPy arrays into torch? Note that “.pt” files are generally used to save torch models, and you may not even need to save the tensors in the first place.
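To illustrate the point about not needing to save tensors first, here is a minimal sketch of a dataset that reads .npy files and converts them on the fly. NumpyClipDataset, the file names, and the assumption that each file holds one [H, W, C] array are all hypothetical, not taken from the original code.

```python
import tempfile
from pathlib import Path

import numpy as np
import torch


class NumpyClipDataset(torch.utils.data.Dataset):
    """Hypothetical dataset: each .npy file holds one [H, W, C] array."""

    def __init__(self, root):
        self.paths = sorted(Path(root).glob('*.npy'))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        array = np.load(self.paths[idx]).astype(np.float32)
        # from_numpy keeps the NumPy layout, so reorder [H, W, C] -> [C, H, W].
        return torch.from_numpy(array).permute(2, 0, 1).contiguous()


# Usage with throwaway files; the 11 x 75 x 3 shape is arbitrary.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(4):
        np.save(Path(tmp) / f'sample_{i}.npy', np.random.rand(11, 75, 3))
    loader = torch.utils.data.DataLoader(NumpyClipDataset(tmp), batch_size=2)
    batch = next(iter(loader))

print(batch.shape)  # torch.Size([2, 3, 11, 75])
```

With this approach the conversion happens at load time, so no intermediate “.pt” files are required.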
My inputs are NumPy arrays. What I am trying to do is action recognition where, instead of video frames, my inputs are NumPy files. I have the transform step and used ToTensor() to convert these NumPy files. Below is the part of the code:
RuntimeError: Given groups=1, weight of size 64 3 7 7, expected input[1, 11, 75, 3] to have 3 channels, but got 11 channels instead
I think the permute function will help me solve this error, since the tensor and the NumPy array have different dimension orders, but I don’t know how to do it with a torchvision classification model.
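A minimal sketch of how permute could address this kind of error, assuming the last axis of the input is the channel axis. The Conv2d below is only a stand-in with the same weight shape as in the error message (a ResNet-style first layer); it is not the actual model from this thread.

```python
import torch
import torch.nn as nn

# Stand-in first layer with weight shape [64, 3, 7, 7], as in the error message.
first_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Channels-last input with the shape reported in the error.
x = torch.randn(1, 11, 75, 3)

try:
    first_conv(x)
    failed = False
except RuntimeError:
    # Axis 1 (size 11) is interpreted as the channel axis, hence the mismatch.
    failed = True
print(failed)  # True

# Move channels to position 1: [N, H, W, C] -> [N, C, H, W].
x_nchw = x.permute(0, 3, 1, 2).contiguous()
out = first_conv(x_nchw)
print(out.shape)  # torch.Size([1, 64, 6, 38])
```

The permute can live in the dataset's `__getitem__`, in a custom transform, or at the top of the model's `forward`, since the torchvision model itself only sees the tensor it is handed.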
I have fixed the error. The first step was to convert the NumPy arrays into torch tensors, as suggested by ArchiGertsman, which I did by saving them as “.pt” files. Since I couldn’t reorder the NumPy arrays from [Height, Width, Channels] to [Channels, Height, Width] using ToTensor(), I used torch.einsum() before saving the tensors.
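The einsum step could look roughly like the sketch below. The array shape is made up, and an in-memory buffer stands in for the “.pt” file so the example leaves nothing on disk; the exact code used in the fix was not posted.

```python
import io

import numpy as np
import torch

# Hypothetical sample laid out as [height, width, channels].
array = np.random.rand(11, 75, 3).astype(np.float32)

# An einsum that only relabels indices acts as a transpose: 'hwc->chw'.
tensor = torch.einsum('hwc->chw', torch.from_numpy(array)).contiguous()
print(tensor.shape)  # torch.Size([3, 11, 75])

# permute gives the same result.
same_as_permute = torch.equal(tensor, torch.from_numpy(array).permute(2, 0, 1))

# Save and reload; the buffer stands in for a path such as "sample.pt".
buffer = io.BytesIO()
torch.save(tensor, buffer)
buffer.seek(0)
restored = torch.load(buffer)
print(torch.equal(restored, tensor))  # True
```

For a plain axis reordering, einsum and permute are interchangeable, so either can be used before saving.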