torch._dynamo.run vs torch.compile

Abhishek_Ghosh · April 27, 2024, 9:38am

I understand that to use PyTorch 2.0’s torch.compile feature, you wrap your module with torch.compile and get the benefits.

I was going through the PyTorch benchmark suite, and in the speedup experiments there I found a call to torch._dynamo.run():

https://github.com/pytorch/pytorch/blob/d4a1b3e09349de3a6a2dd43020e265a88965e4d7/benchmarks/dynamo/common.py#L664C13-L665C1

          frozen_model_iter_fn = torch._dynamo.run(model_iter_fn)

The definition of the torch._dynamo.run() function is as follows:

https://github.com/pytorch/pytorch/blob/d4a1b3e09349de3a6a2dd43020e265a88965e4d7/torch/_dynamo/decorators.py#L27C1-L34C1

          def run(fn=None):
              """Don't do any dynamic compiles, just use prior optimizations"""
              if fn is not None:
                  fn = innermost_fn(fn)
                  assert callable(fn)
                  return RunOnlyContext()(fn)
              return RunOnlyContext()

The docstring says:

Don’t do any dynamic compiles, just use prior optimizations

Just to clarify: is this API current? And is it on par, performance-wise, with torch.compile once a compiled model has run a certain number of warmup iterations?

Please correct me if I am mistaken, but my understanding is that we can torch.compile a module, which performs certain optimizations and stores them, and that this torch._dynamo.run API then somehow reuses those earlier optimizations?

If so, I don’t understand how it is done, because the

https://github.com/pytorch/pytorch/blob/d4a1b3e09349de3a6a2dd43020e265a88965e4d7/benchmarks/dynamo/common.py#L624

                  raise RuntimeError(
                      f"randomize_input can not handle input of type {type(inputs)}"
                  )
          
          
          def maybe_mark_step(args):
              if args.trace_on_xla:
                  xm.mark_step()
          
          
          def speedup_experiment(args, model_iter_fn, model, example_inputs, **kwargs):
              """
              Measure speedups over eager.
          
              Writes to ./speedups.csv
              """
              # if args.dynamic_shapes:
              #     return speedup_experiment_ds(args, model_iter_fn, model, example_inputs)
          
              timings = np.zeros((args.repeat, 2), np.float64)
              # if we randomize the input, we should also check the result is correct

function takes in a regular model handle, and what is wrapped with torch._dynamo.run is not the model but model_iter_fn, which is one of the following functions:

https://github.com/pytorch/pytorch/blob/d4a1b3e09349de3a6a2dd43020e265a88965e4d7/benchmarks/dynamo/torchbench.py#L418C5-L432C20

          def forward_pass(self, mod, inputs, collect_outputs=True):
              with self.autocast(**self.autocast_arg):
                  return mod(*inputs)
          
          def forward_and_backward_pass(self, mod, inputs, collect_outputs=True):
              cloned_inputs = clone_inputs(inputs)
              self.optimizer_zero_grad(mod)
              with self.autocast(**self.autocast_arg):
                  pred = mod(*cloned_inputs)
                  loss = self.compute_loss(pred)
              self.grad_scaler.scale(loss).backward()
              self.optimizer_step()
              if collect_outputs:
                  return collect_results(mod, pred, loss, cloned_inputs)
              return None

which defines the logic of how to run the model.

Can there be a performance gap if we use torch._dynamo.run as in the above example (where the model was previously torch.compiled)? Or does a torch.compiled module, after its initial warmups, take the same path as torch._dynamo.run, so that both are equivalent after a certain number of warmups?
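
Whether two callables perform the same after warmup can be checked empirically. The following is a minimal sketch in plain Python of that measurement pattern, assuming a fixed number of warmup calls followed by alternating timed calls and a median-based ratio. The names `measure_speedup`, `slow_sum`, and `fast_sum` are hypothetical and are not taken from benchmarks/dynamo/common.py; the two stand-in functions do not use PyTorch at all.

```python
import statistics
import time


def measure_speedup(baseline_fn, candidate_fn, args, warmup=3, repeat=10):
    """Return median(baseline time) / median(candidate time).

    Hypothetical helper: both callables are warmed up first so that one-time
    costs (for example compilation) are excluded, then timed in alternation.
    """
    for _ in range(warmup):
        baseline_fn(*args)
        candidate_fn(*args)

    # One row per repetition: [baseline seconds, candidate seconds].
    timings = [[0.0, 0.0] for _ in range(repeat)]
    for row in timings:
        for col, fn in enumerate((baseline_fn, candidate_fn)):
            start = time.perf_counter()
            fn(*args)
            row[col] = time.perf_counter() - start

    baseline_median = statistics.median(row[0] for row in timings)
    candidate_median = statistics.median(row[1] for row in timings)
    return baseline_median / candidate_median


# Stand-ins for an unoptimized and an optimized function (demo assumptions).
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    return n * (n - 1) // 2


speedup = measure_speedup(slow_sum, fast_sum, (200_000,))
print(f"speedup: {speedup:.1f}x")
```

To compare a torch.compile'd callable against its torch._dynamo.run-wrapped counterpart, the same helper could be called with those two callables in place of the stand-ins. When timing GPU work, the timed region would also need a device synchronization such as torch.cuda.synchronize() before each clock read.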

ptrblck · April 27, 2024, 4:33pm

I don’t know the exact difference and why this call is used in the benchmarks (@marksaroufim might know), but wanted to point out that torch._dynamo is an internal module and should thus not be directly used (note the underscore at the beginning of the module).

marksaroufim · April 27, 2024, 7:03pm

Yeah, so as Piotr mentioned, this is an internal function; you want to use functions that are in the torch.compiler namespace instead.

First off, torch.compile() just calls another function called torch._dynamo.optimize(). I would not use run in your code because it disables dynamic recompiles, which I believe would have some silent correctness issues.

I’m still not sure where your interest in run() stems from, though. Are you trying to do serialization?

Abhishek_Ghosh · April 27, 2024, 8:40pm

I was just trying to reuse some of the performance benchmark scripts from the PyTorch benchmark suite (the one used in the TorchInductor Performance Dashboard).

I started from

https://github.com/pytorch/pytorch/blob/641ec2115f300a3e3b39c75f6a32ee3f64afcf30/.ci/pytorch/test.sh#L343-L418

          test_perf_for_dashboard() {
            TEST_REPORTS_DIR=$(pwd)/test/test-reports
            mkdir -p "$TEST_REPORTS_DIR"
          
            local suite="$1"
            shift
          
            local backend=inductor
            local modes=()
            if [[ "$DASHBOARD_TAG" == *training-true* ]]; then
              modes+=(training)
            fi
            if [[ "$DASHBOARD_TAG" == *inference-true* ]]; then
              modes+=(inference)
            fi
            # TODO: All the accuracy tests can be skipped once the CI accuracy checking is stable enough
            local targets=(accuracy performance)
          
            for mode in "${modes[@]}"; do
              if [[ "$mode" == "inference" ]]; then

This file has been truncated.

I thought that reusing most of the code from PyTorch itself would be a good start.
The above script basically does something like:

./benchmarks/dynamo/torchbench.py --performance --training --amp --backend=inductor --output=torchbench_training.csv

During the speedup calculation, torchbench.py uses the torch._dynamo.run() function in the common.py file.

That is why I am curious about it. (I am not doing serialization)

Since you said that, is the same torch._dynamo.run function still being used in the latest PyTorch CI dashboard workflow? Is it safe to use run() as in the common.py file, given the situation in that file? Why or why not?

I guess that with changing inputs we might hit a different branch (say) and have to compile some part dynamically, but if the inputs remain the same (as in the common.py file), then this function is alright in that situation…
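
The guess above can be pictured with a toy model in plain Python. This is only a sketch, not Dynamo's implementation: `ToyCompiler`, its `guard_key`, and the "compiled" lambda are all hypothetical, and the fallback to the plain function on a cache miss is an assumption read from the docstring ("just use prior optimizations"), not something confirmed in this thread.

```python
class ToyCompiler:
    """Toy stand-in for a guarded compile cache (NOT Dynamo's real code)."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}          # guard key -> "compiled" callable
        self.compile_count = 0   # how many times we "compiled"
        self.fallback_count = 0  # how many run-only calls missed the cache

    @staticmethod
    def guard_key(x):
        # Hypothetical guard: specialize on the input's type and length.
        return (type(x).__name__, len(x))

    def _compile(self):
        self.compile_count += 1
        fn = self.fn
        return lambda x: fn(x)   # pretend this is an optimized version

    def optimize(self, x):
        """Compile on a cache miss, then run the cached entry."""
        key = self.guard_key(x)
        if key not in self.cache:
            self.cache[key] = self._compile()
        return self.cache[key](x)

    def run(self, x):
        """Run-only: reuse a cached entry, never compile."""
        key = self.guard_key(x)
        if key in self.cache:
            return self.cache[key](x)
        self.fallback_count += 1
        return self.fn(x)        # assumed fallback: plain, unoptimized call


toy = ToyCompiler(sum)
warm = toy.optimize([1, 2, 3])   # warmup: "compiles" for lists of length 3
same = toy.run([4, 5, 6])        # same guard key: cached entry is reused
other = toy.run([1, 2, 3, 4])    # new guard key: falls back, no compile
print(toy.compile_count, toy.fallback_count)
```

In this toy, run-only mode is as fast as the compiled path only while inputs keep matching an entry created during warmup, which matches the fixed-input situation in common.py; inputs that miss the cache are still handled, just without the optimized entry.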