How can I modify some parts of the saved model?

My original model is as follows:

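import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        x = self.encoder(x)
        # Pass training=self.training so dropout is disabled in eval mode
        x = F.dropout(x, p=0.2, training=self.training)
        x = self.decoder(x)
        return x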

Note that the encoder and the decoder each consist of multiple modules.
After saving and reloading the model, I want to build a new model by removing the decoder and adding a classifier.

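net = torch.load('net.pth')

class classifier(nn.Module):
    def __init__(self):
        super(classifier, self).__init__()
        # Reuse the trained encoder layers from the loaded model
        self.encoder = nn.Sequential(*list(net.encoder.children()))
        self.fc = nn.Linear(enc_dim, num_class)

    def forward(self, x):
        x = self.encoder(x)
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.fc(x)
        # log_softmax needs an explicit dim; dim=1 is the class dimension
        x = F.log_softmax(x, dim=1)
        return x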

When I execute the script, I get the following error:
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=80 error=10 : invalid device ordinal

Can you please tell me how I can solve this error?

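Also, when I use DataParallel, I think

self.encoder = nn.Sequential(*list(net.encoder.children()))

needs to be changed to

self.encoder = nn.Sequential(*list(net.module.encoder.children()))

Is that right?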

"invalid device ordinal" means you are trying to use a CUDA device id that doesn't exist. For example, trying to use id=2 on a machine with only 2 GPUs (0, 1).

I do not understand your explanation. Where would I specify a GPU id? I have 4 GPUs, but at the moment I am using only one of them via CUDA_VISIBLE_DEVICES.

you must’ve tried to do:

CUDA_VISIBLE_DEVICES is 0-indexed, so if you have 4 GPUs, the valid values are 0, 1, 2, or 3.

PyTorch (actually CUDA) is complaining that you gave it an invalid device id to use.
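
To make the indexing concrete, here is a small pure-Python sketch of how the ids listed in CUDA_VISIBLE_DEVICES get renumbered inside the process. The helper names `logical_device_map` and `is_valid_ordinal` are made up for illustration, and the sketch assumes the variable holds a plain comma-separated list of integer ids (it does not handle GPU UUIDs).

```python
def logical_device_map(visible_devices):
    """Map logical CUDA indices (as seen by the process) to physical GPU ids.

    Assumes a comma-separated list of integer ids, e.g. "2,3".
    """
    ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


def is_valid_ordinal(ordinal, visible_devices):
    """Return True if `ordinal` is a usable device index inside the process."""
    return ordinal in logical_device_map(visible_devices)


# With CUDA_VISIBLE_DEVICES=1 the process sees exactly one GPU, and it is
# addressed as device 0 even though it is physical GPU 1.
print(logical_device_map("1"))          # {0: 1}
print(is_valid_ordinal(1, "1"))         # False
print(is_valid_ordinal(1, "0,1,2,3"))   # True
```

So with a single visible GPU, any reference to device 1 or higher inside the process would be out of range, which is one way to end up with "invalid device ordinal".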

@Seungyoung_Park yes, you need to add an additional .module if you saved a model wrapped in DataParallel. BTW, why are you doing nn.Sequential(*list(net.encoder.children()))? This creates a Sequential containing all the modules that are already in net.encoder, so why not simply use self.encoder = net.encoder directly?

No, I ran it with CUDA_VISIBLE_DEVICES=1. But maybe the model was saved on a different GPU from the one I used when loading it.
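
If the checkpoint was indeed saved from a GPU index that is not visible at load time, one common way around it is to pass `map_location` to `torch.load` so the tensors are remapped to a device that exists. The following is a minimal sketch under assumptions: it saves and loads a `state_dict` of a small stand-in `nn.Linear` rather than the whole pickled model from the thread, and the helper name `load_on_available_device` and the temporary file name are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_on_available_device(path):
    """Load a checkpoint onto cuda:0 if a GPU is visible, otherwise the CPU."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # map_location remaps stored tensors regardless of the device they were
    # saved from, so a stale index such as cuda:3 is never requested.
    state = torch.load(path, map_location=device)
    return state, device


# Stand-in model for the demo; the real model would be Net from the question.
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "demo.pth")
torch.save(model.state_dict(), path)

state, device = load_on_available_device(path)
restored = nn.Linear(4, 2)
restored.load_state_dict(state)
restored.to(device)
```

The same `map_location` argument can also be given as the string `'cpu'` to load everything onto the CPU first and move the model afterwards.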

When I use DataParallel, should I write self.encoder = net.module.encoder?

@Seungyoung_Park yes, you need to unpack the original module from the data parallel
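
Putting both suggestions together (unwrap DataParallel, then reuse the encoder directly), a minimal sketch could look like this. The tiny `Encoder` and `Decoder` are stand-ins invented for the example, as are the sizes `enc_dim=4` and `num_class=5` and the helper name `unwrap`; the real modules from the question are not shown in the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Sequential):  # hypothetical stand-in
    def __init__(self):
        super().__init__(nn.Linear(8, 4), nn.ReLU())


class Decoder(nn.Sequential):  # hypothetical stand-in
    def __init__(self):
        super().__init__(nn.Linear(4, 8))


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        return self.decoder(self.encoder(x))


def unwrap(model):
    """Return the underlying module if the model is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


class Classifier(nn.Module):
    def __init__(self, encoder, enc_dim, num_class):
        super().__init__()
        self.encoder = encoder  # reuse the trained encoder as-is
        self.fc = nn.Linear(enc_dim, num_class)

    def forward(self, x):
        x = self.encoder(x)
        x = F.dropout(x, p=0.5, training=self.training)
        return F.log_softmax(self.fc(x), dim=1)


wrapped = nn.DataParallel(Net())   # what a DataParallel checkpoint would hold
net = unwrap(wrapped)              # equivalent to wrapped.module here
clf = Classifier(net.encoder, enc_dim=4, num_class=5)
clf.eval()
out = clf(torch.randn(3, 8))       # log-probabilities, one row per sample
```

Because `clf.encoder` is the same object as `net.encoder`, the pretrained weights are shared rather than copied, so fine-tuning the classifier also updates the encoder held by `net`.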

As you suggested, it works!