Loading model not working with device


I’m trying to load a model and run my test and validation data through it. I’m getting the following error:

  File "C:\Users\JORDAN.HOWELL.GITDIR\Documents\GitHub\Inspection_Photo_Pytorch_Model\model_load.py", line 494, in <module>
    test_outputs = model(image, numerical_data, categorical_data)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)

  File "C:\Users\JORDAN.HOWELL.GITDIR\Documents\GitHub\Inspection_Photo_Pytorch_Model\model_load.py", line 463, in forward
    e = torch.cat(e, 1)

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 1 and 2 in dimension 2 at C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src\THC/generic/THCTensorMath.cu:71

Below is the model and how I loaded it:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Image_Model(nn.Module):
    def __init__(self, embedding_size):
        super().__init__()  # required before any submodule is assigned
        # One embedding per categorical column: (num_categories, embedding_dim)
        self.all_embeddings = nn.ModuleList([nn.Embedding(categories, size) for categories, size in embedding_size])
        n_emb = sum(e.embedding_dim for e in self.all_embeddings)
        self.n_emb = n_emb
        self.embedding_dropout = nn.Dropout(p = 0.04)
        self.cnn = models.vgg19(pretrained=True)
        # Freeze the pretrained VGG19 weights
        for param in self.cnn.parameters():
            param.requires_grad = False

        # 1043 = 1000 (VGG19 output) + 37 (summed embedding dims) + 6 numerical columns
        self.fc2 = nn.Sequential(nn.Linear(1043, 512))
        self.fc3 = nn.Dropout(p=.04)
        self.fc4 = nn.Sequential(nn.Linear(512, 256))
        self.fc5 = nn.Dropout(p = 0.04)
        self.fc9 = nn.Sequential(nn.Linear(256, 2))

    def forward(self, image, numerical_columns, cat_columns):
        x = self.cnn(image)
        # Embed each categorical column separately, then concatenate along dim 1
        e = [e(cat_columns[:,i]) for i,e in enumerate(self.all_embeddings)]
        e = torch.cat(e, 1)
        x = torch.cat((x, numerical_columns), dim = 1)
        x = torch.cat((x, e), dim = 1)
        x = F.leaky_relu(self.fc2(x))
        x = F.leaky_relu(self.fc3(x))
        x = F.leaky_relu(self.fc4(x))
        x = F.leaky_relu(self.fc5(x))
        x = F.leaky_relu(self.fc9(x))
        x = F.log_softmax(x, dim=1)  # explicit class dimension
        return x

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = Image_Model(embedding_size=[(3, 2), (2, 1), (13, 7), (5, 3), (3, 2), (7, 4), (33, 17), (2, 1)])
model = model.to(device)
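The post does not show the save/load step itself. For reference, a common pattern is to save the `state_dict`, rebuild the same architecture, load the weights with `map_location` so they land on the chosen device, and call `eval()` so dropout is disabled during testing. A minimal sketch, assuming a hypothetical small model `TinyModel` and file name `checkpoint.pt` (neither is from the original post; `TinyModel` just avoids downloading VGG19):

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    # Hypothetical stand-in for Image_Model.
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(3, 2)
        self.fc = nn.Linear(2, 2)

    def forward(self, cat):
        return self.fc(self.emb(cat))


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Save only the learned parameters, not the whole pickled module.
model = TinyModel()
path = os.path.join(tempfile.gettempdir(), "checkpoint.pt")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the weights onto `device`.
loaded = TinyModel()
loaded.load_state_dict(torch.load(path, map_location=device))
loaded = loaded.to(device)
loaded.eval()  # disables dropout for validation/testing

with torch.no_grad():
    out = loaded(torch.tensor([0, 2], device=device))
print(out.shape)  # torch.Size([2, 2])
```

Saving the `state_dict` rather than the whole module keeps the checkpoint independent of where the class definition lives, at the cost of having to rebuild the model before loading.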

The same data worked before, when I ran testing directly after training in the same script. It only fails after I load the saved model, and I'm not sure what is happening.

Did you change the data somehow, e.g. by removing the batch dimension?
Based on the error message, e seems to contain tensors of different shapes.
Could you add a print statement before torch.cat(e, 1) and print the shapes of all tensors in e for both runs?
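As an illustration of what such a print could reveal: `nn.Embedding` appends an embedding dimension to whatever shape its index tensor has. If `cat_columns[:, i]` is 1-D (shape `[batch]`), every embedding is 2-D and `torch.cat(e, 1)` works. If the categorical data carries an extra dimension, each embedding becomes 3-D, the embedding sizes end up in dimension 2, and the concatenation fails. That is consistent with "Got 1 and 2 in dimension 2", since the first two embeddings have sizes 2 and 1, but it is only one possible explanation. A minimal sketch with made-up data, using the first two entries of `embedding_size` from the post:

```python
import torch
import torch.nn as nn

embedding_size = [(3, 2), (2, 1)]  # first two entries from the post
embeddings = nn.ModuleList([nn.Embedding(n, d) for n, d in embedding_size])


def embed_and_cat(cat_columns):
    e = [emb(cat_columns[:, i]) for i, emb in enumerate(embeddings)]
    # The debugging print suggested above: shapes of all tensors in e.
    print([tuple(t.shape) for t in e])
    return torch.cat(e, 1)


# Expected layout: [batch, num_categorical_columns]
good = torch.tensor([[0, 1], [2, 0]])
print(embed_and_cat(good).shape)  # shapes [(2, 2), (2, 1)] -> torch.Size([2, 3])

# Hypothetical extra dimension: [batch, num_columns, 1]
bad = good.unsqueeze(-1)
try:
    embed_and_cat(bad)  # shapes [(2, 1, 2), (2, 1, 1)], then cat fails
except RuntimeError as err:
    print("torch.cat failed:", err)
```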

I didn’t change the data in any way.

Adding the print statement before torch.cat(e, 1) now throws this error:

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\tensor.py", line 159, in __repr__
    return torch._tensor_str._str(self)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\_tensor_str.py", line 311, in _str
    tensor_str = _tensor_str(self, indent)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\_tensor_str.py", line 209, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\_tensor_str.py", line 87, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))

RuntimeError: CUDA error: device-side assert triggered

Maybe I should rethink how to load this data in the first place.

Could you rerun the script with CUDA_LAUNCH_BLOCKING=1 python script.py args?
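It seems that a CUDA error was raised before the line of code in question. CUDA kernels run asynchronously, so the stack trace can point at a later line (here, the tensor printing) rather than the operation that actually failed.
Also, if you are using an older version, could you update to PyTorch 1.5.1 (or the nightly binaries)?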
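The environment variable can also be set inside the script, as long as that happens before CUDA is initialized. A frequent cause of a device-side assert in a model with embeddings is an index that is negative or not smaller than the embedding's `num_embeddings`; checking the categorical data on the CPU turns that into a readable error. A minimal sketch; the helper name `check_embedding_indices` is hypothetical, and out-of-range indices are only one possible cause of the assert:

```python
import os

# Must be set before the first CUDA call (safest: before importing torch).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
import torch.nn as nn


def check_embedding_indices(cat_columns, embeddings):
    """Raise a readable error if any categorical index is out of range.

    cat_columns: integer tensor of shape [batch, num_columns], kept on the CPU.
    embeddings: iterable of nn.Embedding, one per column.
    """
    for i, emb in enumerate(embeddings):
        col = cat_columns[:, i]
        lo, hi = int(col.min()), int(col.max())
        if lo < 0 or hi >= emb.num_embeddings:
            raise ValueError(
                f"column {i}: indices span [{lo}, {hi}] but the embedding "
                f"only has {emb.num_embeddings} rows"
            )


embeddings = nn.ModuleList([nn.Embedding(3, 2), nn.Embedding(2, 1)])
check_embedding_indices(torch.tensor([[0, 1], [2, 0]]), embeddings)  # passes
try:
    # Index 2 in column 1 is invalid for nn.Embedding(2, 1).
    check_embedding_indices(torch.tensor([[0, 2]]), embeddings)
except ValueError as err:
    print(err)  # column 1: indices span [2, 2] but the embedding only has 2 rows
```

Running this check on the test and validation data before moving it to the GPU would show whether any category code exceeds the sizes the model was built with.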