@ptrblck Respected sir, I am looking for a brief and detailed tutorial for nn.TransformerDecoder. Can you help me find one?
Please don't tag users, as it could discourage others from posting a valid answer, and don't double post the same question. This tutorial might be a good starter.
Thank you sir, I only saw nn.TransformerEncoder in that tutorial. Also, can you help me find a way to load a CNN model without defining an nn.Module class? Is there any way to do this? When I asked ChatGPT, it directed me to load it into resnet18.
Hey, if you're looking for a vanilla CNN, you would have to build that using PyTorch's torch.nn module. If you'd like a pre-trained model (that is, a model already trained on a collection of images), then you would have to load a model like ResNet. This link offers a useful guide to loading pre-trained models. It gives you a suite of models and their weights for their respective tasks. If you need a CNN to be fine-tuned for a specific task, for example classifying planes, you may be able to fine-tune a model using a dataset already in torchvision's Datasets or one that's open source online.
Usually it's easier to fine-tune a pre-trained model because you don't have to worry as much about training data and overfitting, but you could always train from scratch.
@Andrew_Holmes I used this code to load a pretrained model:
model = torchvision.models.resnet18(pretrained=True)
model.load_state_dict(torch.load("/content/aI_genrtd_img_dectrx1.pth"))
but it is showing an error:
----------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
in <cell line: 2>()
1 model = torchvision.models.resnet18(pretrained=True)
----> 2 model.load_state_dict(torch.load("/content/aI_genrtd_img_dectrx1.pth"))
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
2039
2040 if len(error_msgs) > 0:
-> 2041 raise RuntimeError("Error(s) in loading state_dict for {}:\n\t{}".format(
2042 self.__class__.__name__, "\n\t".join(error_msgs)))
2043 return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.conv2.weight", "layer2.0.bn2.weight", "layer2.0.bn2.bias", "layer2.0.bn2.running_mean", "layer2.0.bn2.running_var", "layer2.0.downsample.0.weight", "layer2.0.downsample.1.weight", "layer2.0.downsample.1.bias", "layer2.0.downsample.1.running_mean", "layer2.0.downsample.1.running_var", "layer2.1.conv1.weight", "layer2.1.bn1.weight", "layer2.1.bn1.bias", "layer2.1.bn1.running_mean", "layer2.1.bn1.running_var", "layer2.1.conv2.weight", "layer2.1.bn2.weight", "layer2.1.bn2.bias", "layer2.1.bn2.running_mean", "layer2.1.bn2.running_var", "layer3.0.conv1.weight", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.conv2.weight", "layer3.0.bn2.weight", "layer3.0.bn2.bias", "layer3.0.bn2…
Unexpected key(s) in state_dict: "Conv_block1.0.weight", "Conv_block1.0.bias", "Conv_block1.1.weight", "Conv_block1.1.bias", "Conv_block1.1.running_mean", "Conv_block1.1.running_var", "Conv_block1.1.num_batches_tracked", "Conv_block1.4.weight", "Conv_block1.4.bias", "Conv_block1.5.weight", "Conv_block1.5.bias", "Conv_block1.5.running_mean", "Conv_block1.5.running_var", "Conv_block1.5.num_batches_tracked", "Conv_block2.0.weight", "Conv_block2.0.bias", "Conv_block2.1.weight", "Conv_block2.1.bias", "Conv_block2.1.running_mean", "Conv_block2.1.running_var", "Conv_block2.1.num_batches_tracked", "Conv_block2.4.weight", "Conv_block2.4.bias", "Conv_block2.5.weight", "Conv_block2.5.bias", "Conv_block2.5.running_mean", "Conv_block2.5.running_var", "Conv_block2.5.num_batches_tracked", "Conv_block3.0.weight", "Conv_block3.0.bias", "Conv_block3.1.weight", "Conv_block3.1.bias", "Conv_block3.1.running_mean", "Conv_block3.1.running_var", "Conv_block3.1.num_batches_tracked", "Conv_block3.4.weight", "Conv_block3.4.bias", "Conv_block3.5.weight", "Conv_block3.5.bias", "Conv_block3.5.running_mean", "Conv_block3.5.running_var", "Conv_block3.5.num_batches_tracked", "Conv_block4.0.weight", "Conv_block4.0.bias", "Conv_block4.1.weight", "Conv_block4.1.bias", "Conv_block4.1.running_mean", "Conv_block4.1.running_var", "Conv_block4.1.num_batches_tracked", "Conv_block4.4.weight", "Conv_block4.4.bias", "Conv_block4.5.weight", "Conv_block4.5.bias", "Conv_block4.5.running_mean", "Conv_block4.5.running_va…
You are still running into the same error as described here and, as already explained, you won't be able to load the state_dict of your custom module into a resnet. ChatGPT is still wrong, and you should load the state_dict into an object of your custom model instead.
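To make the mismatch concrete, here is a minimal sketch that compares the parameter names stored in a checkpoint with the names a model expects. CustomNet and OtherNet are hypothetical stand-ins (for the custom model and for resnet18 respectively), and the file path is a temporary one created just for the illustration.

```python
import os
import tempfile

import torch
from torch import nn


class CustomNet(nn.Module):
    """Hypothetical stand-in for the custom model that produced the checkpoint."""

    def __init__(self):
        super().__init__()
        self.Conv_block1 = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(8),
        )
        self.classifier = nn.Linear(8, 2)


class OtherNet(nn.Module):
    """Hypothetical different architecture, playing the role of resnet18."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)
        self.fc = nn.Linear(8, 2)


def compare_keys(model, state_dict):
    """Return (missing, unexpected) key lists, mirroring the error message."""
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(state_dict.keys())
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


# Save a state_dict from the custom model, then read it back from disk.
path = os.path.join(tempfile.mkdtemp(), "custom.pth")
torch.save(CustomNet().state_dict(), path)
state_dict = torch.load(path, map_location="cpu", weights_only=True)

# Wrong architecture: every name differs, so loading would raise RuntimeError.
missing, unexpected = compare_keys(OtherNet(), state_dict)
print("missing:", missing[:3])
print("unexpected:", unexpected[:3])

# Matching architecture: no differences, so load_state_dict succeeds.
missing_ok, unexpected_ok = compare_keys(CustomNet(), state_dict)
CustomNet().load_state_dict(state_dict)
```

The state_dict only stores tensors under their attribute names, which is why the names (and shapes) have to line up with the class the weights are loaded into.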
@ptrblck So, is there no way to load and train the pretrained model on another page without defining the nn.Module class? Right now I upload the data, run the original model code, then load the pretrained model and train it. What I am asking is how I can load and train the model without running the original code, for example on another computer that doesn't have any of the model's source code.
You need to provide the source code and create a model instance before loading the state_dict. There are approaches to script the model and to torch.jit.load it, but I would not recommend this approach and would rather stick to the recommended approach.
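For reference, the scripting route mentioned above could look roughly like the sketch below. It is only an illustration of the alternative (not the recommended approach), and it assumes a small nn.Sequential as a stand-in for the real model; the layer sizes and the temporary file name are made up for the example.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in model (an assumption for this sketch, not the model from the thread).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),
)
model.eval()

# Compile the model to TorchScript and save code + weights in one file.
scripted = torch.jit.script(model)
path = os.path.join(tempfile.mkdtemp(), "model_scripted.pt")
scripted.save(path)

# On the other machine: no Python class definition is needed to load it.
restored = torch.jit.load(path, map_location="cpu")
restored.eval()

# Check that the restored module reproduces the original outputs.
x = torch.randn(1, 3, 4, 4)
with torch.no_grad():
    same_output = torch.allclose(model(x), restored(x), atol=1e-5)
print(same_output)

# The restored module still exposes parameters, so an optimizer can use them.
num_params = len(list(restored.parameters()))
```

The saved file bundles the serialized program with the weights, which is what removes the need for the Python class on the loading side.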
I'm not fully comprehending what you're trying to do, so please correct me if I'm wrong. I believe you're trying to load weights into a pre-trained model from a .pth file. However, this simply won't work if the state_dict you're trying to load doesn't match the state_dict of the pre-trained model. If you specifically need to use those weights, you would have to define a vanilla model using the torch.nn library that has the same modules as the state_dict you're trying to load; otherwise you'll get the error above.
But if you want to use the weights of the pre-trained model, I would retrain it on the data responsible for the weights you're trying to load from the .pth file. For example, if the weights in the .pth file came from a model for classifying dogs, I would load ResNet and then retrain it on a dataset for dog image classification.
It's hard to fully understand your intentions, so if you could clarify that, maybe I could provide an example that could help.
@Andrew_Holmes I want to load a model I trained earlier and resume training. Currently, I first run the original model class from the source code, and then I load the model like this:
model = tinyvgg1(input=3, hidden=76, output=2).to(device)
saved_model_path = "/content/aI_genrtd_img_dectrx1.pth"
model.load_state_dict(torch.load(saved_model_path))
model.to(device)
Before executing this code, I need to run the original model class:
class tinyvgg1(nn.Module):
    def __init__(self, hidden, input, output):
        super().__init__()
        self.Conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels=input, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.Conv_block2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.Conv_block3 = nn.Sequential(
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.Conv_block4 = nn.Sequential(
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Conv2d(in_channels=hidden, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Four 2x2 max-pools reduce a 224x224 input to 14x14 feature maps.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden * 14 * 14, out_features=output),
        )

    def forward(self, x):
        x = self.Conv_block1(x)
        # print(x.shape)
        x = self.Conv_block2(x)
        # print(x.shape)
        x = self.Conv_block3(x)
        # print(x.shape)
        x = self.Conv_block4(x)
        # print(x.shape)
        x = self.classifier(x)
        return x
I meant this one from the source code. What I need is to load the model without running the model class tinyvgg1(nn.Module) (the model above). Is that possible? For example, on another computer that doesn't have any source code, I only have train_model.pth with me. Can I upload the .pth file and train the model without running any source code?
Example like this:
saved_model_path = "/content/aI_genrtd_img_dectrx1.pth"
model.load_state_dict(torch.load(saved_model_path))
model.to(device)
That is, load the model and train it.
Okay now I understand!
Yes, you would need the tinyvgg1 source code in order to use those weights on a different computer. There are two ways you could go about this:
- Save it to a GitHub repository.
I'd make a GitHub repo for the tinyvgg1 model, then push the source code and any dependencies to that repo. Then you could clone the repo on a different machine and load the weights into a vanilla tinyvgg1, as long as you have access to the .pth file for the weights. It's common practice to use GitHub for projects, and GitHub actually hosts the PyTorch library, in case you ever wondered.
- Save the entire model instead.
You could just save the entire model with torch.save() and load it with torch.load(). The only difference is that you don't pass the state_dict of the model and instead pass the entire model itself. You can see a better explanation here under the Save/Load Entire Model section. There are some reasons this isn't usually done, but for your need it could help.
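As a rough sketch of the second option, the snippet below saves a whole module, loads it back, and runs one training step. It uses a small nn.Sequential as a stand-in so that it runs on its own; the layer sizes, file name, and dummy data are assumptions for the example. One caveat worth knowing: this relies on pickle, which stores a reference to the model's class rather than its code, so for a custom class such as tinyvgg1 the class definition would still need to be importable on the machine that loads the file.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in model built only from torch.nn classes (an assumption for this sketch).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),
)

# Save the entire module object instead of only its state_dict.
path = os.path.join(tempfile.mkdtemp(), "entire_model.pth")
torch.save(model, path)

# Newer PyTorch releases default to weights_only=True, which refuses to
# unpickle full modules; only pass False for files you trust.
restored = torch.load(path, map_location="cpu", weights_only=False)

# Resume training: one optimizer step on dummy data.
optimizer = torch.optim.SGD(restored.parameters(), lr=0.01)
x = torch.randn(4, 3, 4, 4)
y = torch.tensor([0, 1, 0, 1])
loss = nn.functional.cross_entropy(restored(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

If training should continue exactly where it stopped, the optimizer's state_dict would normally be saved alongside the model as well.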