optimize_for_mobile: methods from preserved_methods not available in Android/Java via runMethod


I successfully exported a module with a method:

     def inference(self, text_inputs, speaker_embed, style_embed):


ts_prosody = torch.jit.script(model.prosody_generator)
ts_prosody_mobile = optimize_for_mobile(ts_prosody, preserved_methods=["inference", "inference_half"])

I verified that in Python both ts_prosody_mobile.inference() and ts_prosody_mobile.inference_half() exist.
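For reference, here is a minimal self-contained version of that export path (a toy module standing in for the real ProsodyGenerator). One detail worth noting: methods other than forward must be decorated with @torch.jit.export, otherwise torch.jit.script does not compile them in the first place and preserved_methods cannot keep them.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class ProsodyToy(torch.nn.Module):
    def forward(self, x):
        return x * 2

    # Without @torch.jit.export, torch.jit.script would not compile this
    # method at all, so preserved_methods could not retain it.
    @torch.jit.export
    def inference(self, x):
        return x + 1

ts = torch.jit.script(ProsodyToy())
# forward is always preserved; inference must be listed explicitly.
ts_mobile = optimize_for_mobile(ts, preserved_methods=["inference"])

print(hasattr(ts_mobile, "inference"))                  # True
print(int(ts_mobile.inference(torch.ones(1)).item()))   # 2
```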

Now in Java/Android I load the models with Module.load and call:

IValue[] prosodyOutput = modelProsody.runMethod("inference", ... );

IValue[] prosodyOutput = modelProsodyMobile.runMethod("inference", ... );

gives me
java.lang.IllegalArgumentException: Undefined method inference
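One sanity check I can do purely on the Python side (an assumption on my part, not a confirmed fix) is to verify that a method listed in preserved_methods even survives the save/load round trip, before the file ever reaches the device. A toy sketch, with Toy standing in for the real model:

```python
import io
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Round-trip check: does a preserved method survive torch.jit.save/load?
class Toy(torch.nn.Module):
    def forward(self, x):
        return x

    @torch.jit.export
    def inference(self, x):
        return x + 1

opt = optimize_for_mobile(torch.jit.script(Toy()), preserved_methods=["inference"])
buf = io.BytesIO()          # in-memory stand-in for the exported file
torch.jit.save(opt, buf)
buf.seek(0)
reloaded = torch.jit.load(buf)
print(hasattr(reloaded, "inference"))  # True
```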

Is there anything I missed?

I am using in Python:

And Java from the Gradle file:
implementation 'org.pytorch:pytorch_android:1.9.0'


@JacobSzwejbka could you help with this?


I’ve tried to work around the issue as:

from typing import Optional

from torch import Tensor


class ProsodyGeneratorExport(ProsodyGenerator):
    def __init__(self, hparams):
        super().__init__(hparams)

    def forward(self,
                text_inputs: Tensor,
                speaker_embed: Optional[Tensor] = None,
                style_embed: Optional[Tensor] = None):
        return self.inference(text_inputs, speaker_embed, style_embed)

And calling optimize_for_mobile without the preserved_methods parameter:

ts_prosody_mobile = optimize_for_mobile(ts_prosody)

ts_prosody_mobile.forward() can be called.
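As a self-contained sketch of that workaround (toy classes standing in for the real ones, with a dummy inference body):

```python
from typing import Optional

import torch
from torch import Tensor
from torch.utils.mobile_optimizer import optimize_for_mobile

class ProsodyGeneratorToy(torch.nn.Module):
    @torch.jit.export
    def inference(self,
                  text_inputs: Tensor,
                  speaker_embed: Optional[Tensor] = None,
                  style_embed: Optional[Tensor] = None) -> Tensor:
        return text_inputs * 2  # placeholder for the real inference path

class ProsodyGeneratorExportToy(ProsodyGeneratorToy):
    # forward simply dispatches to inference, so optimize_for_mobile
    # can be called without any preserved_methods argument.
    def forward(self,
                text_inputs: Tensor,
                speaker_embed: Optional[Tensor] = None,
                style_embed: Optional[Tensor] = None) -> Tensor:
        return self.inference(text_inputs, speaker_embed, style_embed)

ts = torch.jit.script(ProsodyGeneratorExportToy())
ts_mobile = optimize_for_mobile(ts)
out = ts_mobile(torch.ones(3))
print(out)  # tensor([2., 2., 2.])
```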
Exporting with torch.jit.save and then loading into Java again:

        IValue[] prosodyOutput = modelProsody.forward( ... );

gives me

com.facebook.jni.CppException: Method 'forward' is not defined.
    Exception raised from get_method at ../../../../src/main/cpp/libtorch_include/arm64-v8a/torch/csrc/jit/api/object.h:103 (most recent call first):
    (no backtrace available)

Again, the same Java code works when the model is exported without optimize_for_mobile.

So instead I ended up refactoring my classes from forward/inference to forward_train/forward, and now it works.

The only remaining issue: even without any further quantization or fuse calls, the optimize_for_mobile version takes 5 seconds for a given task while the unoptimized one takes 4 seconds.
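To make that 4 s vs. 5 s comparison more robust, it may help to time only warmed-up runs, since the first few calls to a scripted module can include JIT profiling overhead. A generic timing sketch (the benchmark helper and the Linear model are placeholders, not the actual models):

```python
import time
import torch

def benchmark(module, example_inputs, warmup=5, iters=20):
    # Simple wall-clock benchmark; warm-up iterations let the JIT
    # finish its profiling passes before measurement starts.
    with torch.no_grad():
        for _ in range(warmup):
            module(*example_inputs)
        start = time.perf_counter()
        for _ in range(iters):
            module(*example_inputs)
    return (time.perf_counter() - start) / iters

m = torch.jit.script(torch.nn.Linear(64, 64))
per_iter = benchmark(m, (torch.randn(8, 64),))
print(f"{per_iter * 1e3:.3f} ms/iter")
```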

Without knowing the details of the model it might be a bit harder. Are you profiling only one call to forward or multiple?

Multiple calls, but it's 3 different models that are called in sequence.

Models A and B are rather heavy on GRUs and LSTMs, together with a few 1D convolutions.
B also uses a Gaussian upsampling mechanism with a partly manual implementation.
Both are significantly slower in the optimize_for_mobile version.
The biggest part of it is this layer

The rest is also quite similar to the model in this repo.

Model C is mostly transposed 1D convolutions, and here the performance is pretty much the same (it looks like modules.py at commit 6488045 in descriptinc/melgan-neurips on GitHub).