Why is a tensor non-contiguous on one machine, but contiguous on another?

I’m running the exact same code on two virtual machines, both with the same OS and the same Python package versions.

from multimodal_transformers.model import BertWithTabular, TabularConfig
from transformers import BertTokenizer, BertConfig

if __name__ == "__main__":
    pretrained_model_name = "SZTAKI-HLT/hubert-base-cc"
    tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)
    bert_config = BertConfig.from_pretrained(pretrained_model_name)
    tabular_config = TabularConfig(
        combine_feat_method="text_only",
        cat_feat_dim=0,
        numerical_feat_dim=0,
        num_labels=7,
    )
    bert_config.tabular_config = tabular_config

    model = BertWithTabular.from_pretrained(pretrained_model_name, config=bert_config)

When I check model.state_dict(), some of the tensors are always non-contiguous on one machine, while on the other machine all of them are contiguous, which causes an error when I try to save the model. I can’t figure out what determines whether a tensor is contiguous or not. Do I need to convert the tensors to contiguous before saving? That doesn’t feel normal to me.
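For reference, a minimal version of the check described above, listing which state_dict entries are non-contiguous (a small `nn.Linear` stands in for the actual `BertWithTabular` model, which needs the pretrained weights downloaded):

```python
import torch
from torch import nn

# Stand-in model; the real case uses BertWithTabular.from_pretrained(...).
model = nn.Linear(4, 3)

# Collect the names of all non-contiguous tensors in the state dict.
non_contiguous = [
    name for name, tensor in model.state_dict().items()
    if not tensor.is_contiguous()
]
print(non_contiguous)  # empty list when every tensor is contiguous
```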

I would also like to mention that the first machine didn’t have this problem a few weeks ago; I have no idea what changed.

This is indeed strange. Does the error show which tensors are non-contiguous? If so, could you check how these tensors were initialized and possibly manipulated in your code?

They come from a pretrained model (SZTAKI-HLT/hubert-base-cc · Hugging Face), so I don’t know how its tensors were initialized or manipulated. The problem is definitely tied to that model: when I load another one (google-bert/bert-base-uncased · Hugging Face), the issue doesn’t appear and all the tensors are contiguous.

So I guess there is nothing we can do about it.