Trouble Using Custom C++ Operator with torch.compile

I have a working setup that uses my custom C++ operator with torch.jit.script, and I want to experiment with the newer torch.compile. I’m using pytorch==2.4.0.
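For reference, the working torch.jit.script path looks roughly like this (the single-tensor signature is hypothetical; my real operator takes more arguments):

import torch

torch.ops.load_library("path_to_mycustom_op.so")

@torch.jit.script
def helper(x: torch.Tensor) -> torch.Tensor:
    # Scripted wrapper that calls the custom op; this path works fine.
    return torch.ops.my_namespace.my_operator(x)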

After seeing some errors with torch.compile, I’ve been following this guide to add FakeTensor support for my C++ operator. I’m getting this error, which I don’t see reported clearly anywhere:

RuntimeError: register_fake(...): the operator my_namespace::my_operator already
has an implementation for this device type via a pre-existing registration to
DispatchKey::CompositeImplicitAutograd. CompositeImplicitAutograd operators do
not need an fake impl; instead, the operator will decompose into its
constituents and those can have fake impls defined on them.
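I believe the same error can be reproduced with a self-contained toy op in pure Python (toy::my_add is hypothetical, purely for illustration):

import torch

# Define a toy op and register its kernel under CompositeImplicitAutograd.
lib = torch.library.Library("toy", "DEF")
lib.define("my_add(Tensor x, Tensor y) -> Tensor")
lib.impl("my_add", lambda x, y: x + y, "CompositeImplicitAutograd")

# register_fake then raises the same RuntimeError as above, because the op
# already has a CompositeImplicitAutograd registration.
@torch.library.register_fake("toy::my_add")
def _(x, y):
    return torch.empty_like(x)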

How should I change my fake operator?
Here’s what my file structure looks like:

main.py
folder1
    folder2
        folder3
            file1.py
            file2.py

I will provide simplified versions of the files below. Based on the error, I suspect the issue is in how the operator is registered on the C++ side.

Here’s what file1.py looks like:

import torch
from . import file2
# torch.ops.import_module("file2")  # <-- This import doesn't work for some reason
@torch.compile(backend="eager")
def helper(...) -> torch.Tensor:
    return torch.ops.my_namespace.my_operator(...)

Here’s what file2.py looks like:

import torch
torch.ops.load_library("path_to_mycustom_op.so")

@torch.library.register_fake("my_namespace::my_operator")
def fake_operator(...):
    return torch.zeros(...)
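
Side note: as I understand it, the body of a fake impl only has to produce outputs with the correct shape/dtype/device, derived from the inputs. Assuming (hypothetically) that my_operator returns a tensor shaped like its first input, the body could be as simple as:

def fake_operator(x):
    # Under FakeTensor, only output metadata matters: match the real op's
    # shape/dtype/device. empty_like is the usual idiom (zeros works too).
    return torch.empty_like(x)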

Here’s what my custom C++ operator’s .cpp file looks like:

// CUDA implementation for forward and backward
TORCH_LIBRARY_IMPL(my_namespace, CUDA, m) {
    m.impl("my_operator_forward", &my_operator_forward);   // CUDA forward
    m.impl("my_operator_backward", &my_operator_backward); // CUDA backward
}

// Don't define the backward pass here, since this is for the non-autograd case (e.g., inference)
TORCH_LIBRARY(my_namespace, m) {
    m.def("my_operator_forward", &my_operator_forward_no_autograd);
}
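
// NOTE: I believe this m.def(name, &fn) form is the pre-existing registration
// the error complains about: defining an op together with a kernel registers
// that kernel under DispatchKey::CompositeImplicitAutograd. Defining the
// schema alone (the signature below is hypothetical) and registering kernels
// per dispatch key via m.impl would avoid the catch-all registration:
//     m.def("my_operator_forward(Tensor input) -> Tensor");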

// Autograd implementation using the custom function
TORCH_LIBRARY_IMPL(my_namespace, Autograd, m) {
    m.impl("my_operator_forward", my_operator_autograd);  // Wraps Autograd
}

After trying the other method of registering a fake kernel, torch.compile seems to run successfully. I deleted file2.py entirely and updated file1.py to the following:

import torch
torch.ops.load_library("path_to_mycustom_op.so")

@torch.library.custom_op("fake::helper", mutates_args=())
def helper(...) -> torch.Tensor:
    return torch.ops.my_namespace.my_operator(...)

@helper.register_fake
def _(...) -> torch.Tensor:
    return torch.zeros(...)
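
And here’s roughly how I’m calling it, in case anyone wants to sanity-check (the input shape/device are hypothetical; fullgraph=True is only there to confirm there are no graph breaks):

import torch

@torch.compile(backend="eager", fullgraph=True)
def run(x):
    return helper(x)

x = torch.randn(8, device="cuda")
out = run(x)  # runs without the register_fake error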