RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 6890]], which is output 0 of SelectBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detect

julyaugust12345 · January 22, 2024, 11:05am

I am trying to reproduce AvatarCLIP in PyTorch, and I got the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 6890]], which is output 0 of SelectBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I would very much appreciate your help.

    def get_pose(self, text_feature: Tensor) -> Tensor:
        latent_code = nn.Parameter(torch.randn(32))
        cls = getattr(torch.optim, self.optim_name)
        optimizer = cls([latent_code], **self.optim_cfg)
        for i in tqdm(range(self.num_iteration)):
            new_latent_code = latent_code.to(self.device).unsqueeze(0)
            new_pose = self.vp.decode(new_latent_code)['pose_body']
            new_pose = new_pose.contiguous().view(-1)
            clip_feature = self.get_pose_feature(new_pose).squeeze(0)
            loss = 1 - F.cosine_similarity(clip_feature, text_feature, dim=-1)
            loss = loss.mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return pose_padding(new_pose.detach())

    def get_topk_poses(self, text: str) -> Tensor:
        text_feature = self.get_text_feature(text)
        poses = [self.get_pose(text_feature) for _ in range(self.topk)]
        poses = self.sort_poses_by_score(text, poses)
        poses = torch.stack(poses, dim=0)
        return poses

    def get_pose(self, text_feature: Tensor) -> Tensor:
        pose = nn.Parameter(torch.randn(63))
        cls = getattr(torch.optim, self.optim_name)
        optimizer = cls([pose], **self.optim_cfg)
        for i in tqdm(range(self.num_iteration)):
            new_pose = pose.to(self.device)
            clip_feature = self.get_pose_feature(new_pose).squeeze(0)
            loss = 1 - F.cosine_similarity(clip_feature, text_feature, dim=-1)
            loss = loss.mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return pose_padding(pose.data).to(self.device)

    def get_topk_poses(self, text: str) -> Tensor:
        text_feature = self.get_text_feature(text)
        poses = [self.get_pose(text_feature) for _ in range(self.topk)]
        poses = self.sort_poses_by_score(text, poses)
        poses = torch.stack(poses, dim=0)
        return poses

KFrank · January 24, 2024, 6:10am

Hi July!

Can you identify a tensor in your forward pass that has shape [1, 6890]?
See if you can find where it is being modified.
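
For context on the "is at version 1; expected version 0" part of the message: autograd keeps a per-tensor version counter that each inplace operation increments. Here is a minimal sketch that reads it through the private `_version` attribute (an internal detail, suitable only for debugging). The tensor `t` is a stand-in with the reported shape, not a tensor from your model:

```python
import torch

# Stand-in tensor with the shape reported in the error message.
t = torch.zeros(1, 6890)
version_before = t._version      # private attribute, debugging only

out_of_place = t + 1.0           # creates a new tensor, t is untouched
version_after_add = t._version

t.add_(1.0)                      # inplace op, bumps t's version counter
version_after_inplace = t._version

print(version_before, version_after_add, version_after_inplace)
```

Printing `_version` on a suspect tensor at a few points in the forward pass can help narrow down which line modifies it.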

Try using torch.autograd.set_detect_anomaly(True). Doing so provides additional
information that can be very helpful in tracking down the cause of your
error.
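
As a small illustration (a toy example, not your model), the sketch below triggers the same kind of error on purpose and runs it under anomaly detection. The function name `run_with_inplace_error` is made up for this example:

```python
import torch

def run_with_inplace_error() -> str:
    """Trigger an inplace-modification error and return its message."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()      # exp() saves its output for use in the backward pass
    y.add_(1.0)      # inplace change to a tensor that backward still needs
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return ""

# With anomaly detection on, autograd also emits a warning containing the
# traceback of the forward operation whose backward failed.
with torch.autograd.set_detect_anomaly(True):
    message = run_with_inplace_error()

print(message)
```

The extra traceback points at the forward-pass line that produced the tensor whose gradient computation failed, which is usually close to the inplace modification.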

There’s a lot of code that you haven’t posted that could be causing your
error.

You’ve posted two nearly identical versions of get_pose() (and we can’t
tell which one is being run when you get your error).

Note that this second version uses pose.data. .data is deprecated (in the
public API) and can lead to errors, so you shouldn’t be using it.
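
To show why .data is risky, here is a toy comparison (not taken from your code; `backward_after_edit` is a made-up helper). An inplace edit made through .data is invisible to autograd, so backward silently returns a wrong gradient, while the same edit made through .detach() is caught:

```python
import torch
from typing import Optional

def backward_after_edit(use_data: bool) -> Optional[torch.Tensor]:
    """Zero out exp(x) through an alias, then backpropagate.

    Returns x.grad, or None if autograd raised an error.
    """
    x = torch.ones(3, requires_grad=True)
    y = x.exp()                           # backward needs the value of y
    alias = y.data if use_data else y.detach()
    alias.zero_()                         # inplace edit of y's storage
    try:
        y.sum().backward()
    except RuntimeError:
        return None
    return x.grad

grad_via_data = backward_after_edit(use_data=True)
grad_via_detach = backward_after_edit(use_data=False)

print(grad_via_data)     # all zeros, although the true gradient is exp(1)
print(grad_via_detach)   # None: autograd detected the modification
```

The tensor returned by .detach() shares its version counter with the original, so autograd can notice the change; the tensor returned by .data does not.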

For suggestions about how to find and fix inplace-modification errors,
please see the following post:

"RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 1]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead. Hint: the backtrace further a autograd

Hi Fahmyadan and Sangyoon! Here are some suggestions about how to track down (and maybe fix) inplace-modification errors. Note that an inplace modification in the forward pass is not necessarily* an error – it depends on whether and how the tensor that was modified is used in the backward pass. Note that inplace operations can be useful for saving memory – if you replace an innocent inplace operation with an out-of-place equivalent, your training will use more memory (and, to a minor e…

Good luck!

K. Frank