When can you NOT deepcopy a model?

The answer to the question “Can I deepcopy a model?” doesn’t describe when or why this process sometimes fails.

I have two different nn.Module objects, Encoder and Decoder (the code for these is very long so I’ll save that for the moment). To me they look very similar, but only the first can be deepcopied.

encoder = Encoder(args1)
decoder = Decoder(args2)

For the former, new_enc = copy.deepcopy(encoder) works with no problem. If I try to deepcopy the decoder, I get the error

    raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Both modules define __init__, call super().__init__(), and have a forward method. So… how might I figure out what’s wrong with the second one? (I don’t think I did anything ‘fancy’ with the Decoder!)

Thanks.

The error is raised if you try to deepcopy non-leaf tensors as seen here:

import copy
import torch

x = torch.randn(1)
print(x.is_leaf)
# True
copy.deepcopy(x) # works

y = x + 1
print(y.is_leaf)
# True (x does not require grad, so y has no grad_fn and still counts as a leaf)
copy.deepcopy(y) # works

x = torch.nn.Parameter(torch.randn(1))
print(x.is_leaf)
# True
copy.deepcopy(x) # works

y = x + 1
print(y.is_leaf)
# False
copy.deepcopy(y) # fails
# RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Could this be the case in the Decoder, i.e. are you creating new tensors from leaf tensors in its __init__?

Thanks Piotr! That explanation makes sense. And yet the answer is… maybe… but I don’t see how? There are some non-tensor scalar variables being assigned based on other non-tensors, and there are lists of network modules that are being appended to, but I’m not aware of any tensor being changed in the __init__ to make it a non-leaf tensor. But it’s modules-within-modules. I guess I could try deepcopy on each of those and see where the error appears.
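One way to do that "try each submodule" check, as a rough sketch rather than anything official: walk named_modules(), attempt a deepcopy on each, and separately list any non-leaf tensors stored as plain attributes or buffers. The Inner and Toy classes and both helper names are made up for illustration; they are not part of RAVE or PyTorch.

```python
import copy
import torch
import torch.nn as nn


def submodules_failing_deepcopy(model):
    """Names of all (sub)modules for which copy.deepcopy raises RuntimeError."""
    failing = []
    for name, mod in model.named_modules():
        try:
            copy.deepcopy(mod)
        except RuntimeError:
            failing.append(name or "<root>")
    return failing


def find_non_leaf_tensors(model):
    """(module_name, attribute_name) pairs for non-leaf tensors held by the model."""
    found = []
    for mod_name, mod in model.named_modules():
        # Plain attributes live in vars(mod); buffers live in mod._buffers.
        candidates = dict(vars(mod))
        candidates.update(mod._buffers)
        for attr_name, value in candidates.items():
            if isinstance(value, torch.Tensor) and not value.is_leaf:
                found.append((mod_name or "<root>", attr_name))
    return found


# Hypothetical toy modules, only to exercise the helpers.
class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))
        self.derived = self.scale * 2  # non-leaf tensor kept as a plain attribute


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.fine = nn.Linear(2, 2)
        self.inner = Inner()


toy = Toy()
failing = submodules_failing_deepcopy(toy)
blockers = find_non_leaf_tensors(toy)
print(failing)   # every module on the path down to the offending attribute
print(blockers)  # the attribute(s) that are non-leaf tensors
```

The deepest name in the first list and the pairs in the second should point at the same place, which is usually enough to find the line in __init__ that creates the tensor.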

I guess I can also post the code: The decoder is the Generator from RAVE, which looks like this…

class Generator(nn.Module):
    def __init__(self,
                 latent_size,
                 capacity,
                 data_size,
                 ratios,
                 loud_stride,
                 use_noise,
                 noise_ratios,
                 noise_bands,
                 padding_mode,
                 bias=False):
        super().__init__()
        net = [
            wn(
                cc.Conv1d(
                    latent_size,
                    2**len(ratios) * capacity,
                    7,
                    padding=cc.get_padding(7, mode=padding_mode),
                    bias=bias,
                ))
        ]

        for i, r in enumerate(ratios):
            in_dim = 2**(len(ratios) - i) * capacity
            out_dim = 2**(len(ratios) - i - 1) * capacity

            net.append(
                UpsampleLayer(
                    in_dim,
                    out_dim,
                    r,
                    padding_mode,
                    cumulative_delay=net[-1].cumulative_delay,
                ))
            net.append(
                ResidualStack(
                    out_dim,
                    3,
                    padding_mode,
                    cumulative_delay=net[-1].cumulative_delay,
                ))

        self.net = cc.CachedSequential(*net)

        wave_gen = wn(
            cc.Conv1d(
                out_dim,
                data_size,
                7,
                padding=cc.get_padding(7, mode=padding_mode),
                bias=bias,
            ))

        loud_gen = wn(
            cc.Conv1d(
                out_dim,
                1,
                2 * loud_stride + 1,
                stride=loud_stride,
                padding=cc.get_padding(2 * loud_stride + 1,
                                       loud_stride,
                                       mode=padding_mode),
                bias=bias,
            ))

        branches = [wave_gen, loud_gen]

        if use_noise:
            noise_gen = NoiseGenerator(
                out_dim,
                data_size,
                noise_ratios,
                noise_bands,
                padding_mode=padding_mode,
            )
            branches.append(noise_gen)

        self.synth = cc.AlignBranches(
            *branches,
            cumulative_delay=self.net.cumulative_delay,
        )

        self.use_noise = use_noise
        self.loud_stride = loud_stride
        self.cumulative_delay = self.synth.cumulative_delay

    def forward(self, x, add_noise: bool = True):
        x = self.net(x)

        if self.use_noise:
            waveform, loudness, noise = self.synth(x)
        else:
            waveform, loudness = self.synth(x)
            noise = torch.zeros_like(waveform)

        loudness = loudness.repeat_interleave(self.loud_stride)
        loudness = loudness.reshape(x.shape[0], 1, -1)

        waveform = torch.tanh(waveform) * mod_sigmoid(loudness)

        if add_noise:
            waveform = waveform + noise

        return waveform

Thanks for the code!
I’ve installed the dependencies and narrowed it down to the usage of weight_norm here.
Minimal code snippet:

import copy
import torch.nn as nn

m = nn.Linear(1, 1)
print(m)
copy.deepcopy(m)  # works
m = nn.utils.weight_norm(m, name='weight')
print(m)
copy.deepcopy(m)  # fails
# RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

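weight_norm removes the weight parameter, registers weight_g and weight_v in its place, and recomputes m.weight from them as a plain attribute. That makes m.weight a non-leaf tensor, so deepcopy of the module fails.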
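To see this on a small module, here is a short inspection sketch. It assumes the hook-based nn.utils.weight_norm used in the snippet above; the warnings filter is only there because some newer releases flag that function as deprecated.

```python
import warnings
import torch
import torch.nn as nn

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence a possible deprecation warning
    m = nn.utils.weight_norm(nn.Linear(1, 1), name="weight")

# The original "weight" parameter is gone; magnitude and direction replace it.
param_names = sorted(name for name, _ in m.named_parameters())
print(param_names)  # ['bias', 'weight_g', 'weight_v']

# m.weight is now computed from weight_g and weight_v, so it has a grad_fn.
print(isinstance(m.weight, nn.Parameter))  # False
print(m.weight.is_leaf)                    # False
print(m.weight.grad_fn is not None)        # True
```

Since deepcopy walks the module's attributes, that one computed tensor is enough to trigger the RuntimeError for the whole model.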

This sounds like a strict limitation, but I don’t know if there is an easy workaround.
Would it work for you to create a new model instance and copy the state_dict to it?
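A rough sketch of that state_dict route, assuming you can rebuild the module with the same constructor arguments. build_decoder is a hypothetical stand-in for Decoder(args2), not the RAVE Generator, and clone_via_state_dict is just an illustrative helper name.

```python
import warnings
import torch
import torch.nn as nn


def build_decoder():
    # Hypothetical stand-in for Decoder(args2): two weight-normed layers.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence a possible deprecation warning
        return nn.Sequential(
            nn.utils.weight_norm(nn.Linear(4, 8)),
            nn.ReLU(),
            nn.utils.weight_norm(nn.Linear(8, 2)),
        )


def clone_via_state_dict(model, factory):
    """Build a fresh instance and copy parameters and buffers into it."""
    clone = factory()
    clone.load_state_dict(model.state_dict())
    return clone


decoder = build_decoder()
new_dec = clone_via_state_dict(decoder, build_decoder)

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(decoder(x), new_dec(x))
print(same)  # the clone reproduces the original's outputs

# The clone owns its own storage, so updating one does not touch the other.
independent = new_dec[0].weight_g.data_ptr() != decoder[0].weight_g.data_ptr()
print(independent)
```

Note that load_state_dict only transfers parameters and buffers. The train/eval flag, the device, and any other plain Python attributes come from the freshly constructed instance, so set those separately if they matter.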

EDIT: corresponding GitHub issue: 28594