Can I use pretrained weights with a different embedding size?

Hello,
I’m new to this field and still learning, and I have a question about pretrained weights. I’m trying to replace the conventional 3D convolution in PatchEmbed3d of the Video Swin Transformer with my custom depthwise separable convolution.

I trained both the original Video Swin Transformer and my custom variant starting from the pretrained weights, but the proposed method achieved lower accuracy than the original Video Swin Transformer. Is this because my custom PatchEmbed3d no longer matches the shapes of the pretrained Video Swin Transformer weights?

    from typing import Callable, Optional

    import torch.nn.functional as F
    from torch import Tensor, nn

    # DepthwiseSeparableConv is my custom module; _compute_pad_size_3d and
    # _log_api_usage_once are the torchvision helpers.

    class PatchEmbed3d(nn.Module):
        """Video to Patch Embedding.

        Args:
            patch_size (list[int]): Patch token size.
            in_channels (int): Number of input channels. Default: 3.
            embed_dim (int): Number of linear projection output channels. Default: 96.
            norm_layer (nn.Module, optional): Normalization layer. Default: None.
        """

        def __init__(
            self,
            patch_size: list[int],
            in_channels: int = 3,
            embed_dim: int = 96,
            norm_layer: Optional[Callable[..., nn.Module]] = None,
        ) -> None:
            super().__init__()
            _log_api_usage_once(self)
            self.tuple_patch_size = (patch_size[0], patch_size[1], patch_size[2])

            self.proj = DepthwiseSeparableConv(
                in_channels,
                embed_dim,
                kernel_size=self.tuple_patch_size,
                stride=self.tuple_patch_size,
            )
            if norm_layer is not None:
                self.norm = norm_layer(embed_dim)
            else:
                self.norm = nn.Identity()

        def forward(self, x: Tensor) -> Tensor:
            """Forward function."""
            # Pad T/H/W up to multiples of the patch size
            _, _, t, h, w = x.size()
            pad_size = _compute_pad_size_3d((t, h, w), self.tuple_patch_size)
            x = F.pad(x, (0, pad_size[2], 0, pad_size[1], 0, pad_size[0]))
            x = self.proj(x)  # B C T Wh Ww
            x = x.permute(0, 2, 3, 4, 1)  # B T Wh Ww C
            # self.norm is always set (nn.Identity when norm_layer is None)
            x = self.norm(x)
            return x
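For context, a depthwise separable 3D convolution in the standard depthwise + pointwise form can be sketched like this (a generic sketch, not necessarily identical to my implementation; shapes shown for the default Video Swin patch size (2, 4, 4)):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable 3D conv: a per-channel (depthwise) conv
    followed by a 1x1x1 (pointwise) conv that mixes channels."""

    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv3d(
            in_channels, in_channels,
            kernel_size=kernel_size, stride=stride, groups=in_channels,
        )
        # Pointwise: 1x1x1 conv projects to the embedding dimension
        self.pointwise = nn.Conv3d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# With patch size (2, 4, 4): (B, 3, 8, 64, 64) -> (B, 96, 4, 16, 16)
proj = DepthwiseSeparableConv(3, 96, kernel_size=(2, 4, 4), stride=(2, 4, 4))
out = proj(torch.randn(1, 3, 8, 64, 64))
print(out.shape)  # torch.Size([1, 96, 4, 16, 16])
```

Note that this layer has far fewer parameters than the original `nn.Conv3d(3, 96, ...)` projection, which is one reason the pretrained patch-embed weights cannot simply be reused.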

Besides, I also tried deleting the PatchEmbed3d keys from the pretrained state dict with the code below:

    state_dict = torch.load(weights_path, map_location="cpu")
    for k in list(state_dict.keys()):
        if "patch_embed" in k:
            del state_dict[k]
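After deleting those keys, the remaining weights can be loaded with `strict=False`, so only the patch embedding keeps its random initialization while the rest of the backbone still gets the pretrained values. A minimal sketch with a toy stand-in model (the attribute name `patch_embed` matches torchvision's Swin3D; an actual checkpoint may nest its weights differently):

```python
import torch
import torch.nn as nn

# Toy stand-in for the full model: only the key structure matters here.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Linear(4, 8)  # stands in for PatchEmbed3d
        self.head = nn.Linear(8, 2)

model = TinyModel()
state_dict = TinyModel().state_dict()  # stands in for the loaded checkpoint

# Drop every patch-embed entry so the custom projection keeps its
# random initialization instead of receiving mismatched weights.
for k in list(state_dict.keys()):
    if "patch_embed" in k:
        del state_dict[k]

# strict=False tolerates the now-missing keys; inspect what was skipped.
result = model.load_state_dict(state_dict, strict=False)
print(result.missing_keys)  # ['patch_embed.weight', 'patch_embed.bias']
```

Checking `result.missing_keys` is a quick way to confirm that only the patch-embed parameters were left uninitialized.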

Looking forward to your answers. Thank you!