I'm new to the torch compiler. Recently, I've been trying to figure out how operator lowering works in torch.compile.
An operation like F.interpolate(x, scale_factor=0.5, mode="bilinear")
gets expanded into a vectorized implementation:
arange: "i64[32]" = torch.ops.aten.arange.default(32, device = device(type='cpu'), pin_memory = False)
_to_copy: "f32[32]" = torch.ops.aten._to_copy.default(arange, dtype = torch.float32); arange = None
add: "f32[32]" = torch.ops.aten.add.Tensor(_to_copy, 0.5); _to_copy = None
mul: "f32[32]" = torch.ops.aten.mul.Tensor(add, 2.0); add = None
sub: "f32[32]" = torch.ops.aten.sub.Tensor(mul, 0.5); mul = None
clamp: "f32[32]" = torch.ops.aten.clamp.default(sub, 0.0); sub = None
view: "f32[32, 1]" = torch.ops.aten.view.default(clamp, [32, 1]); clamp = None
_to_copy_1: "i64[32, 1]" = torch.ops.aten._to_copy.default(view, dtype = torch.int64)
add_1: "i64[32, 1]" = torch.ops.aten.add.Tensor(_to_copy_1, 1)
clamp_1: "i64[32, 1]" = torch.ops.aten.clamp.default(add_1, None, 63); add_1 = None
arange_1: "i64[32]" = torch.ops.aten.arange.default(32, device = device(type='cpu'), pin_memory = False)
_to_copy_2: "f32[32]" = torch.ops.aten._to_copy.default(arange_1, dtype = torch.float32); arange_1 = None
add_2: "f32[32]" = torch.ops.aten.add.Tensor(_to_copy_2, 0.5); _to_copy_2 = None
mul_1: "f32[32]" = torch.ops.aten.mul.Tensor(add_2, 2.0); add_2 = None
sub_1: "f32[32]" = torch.ops.aten.sub.Tensor(mul_1, 0.5); mul_1 = None
clamp_2: "f32[32]" = torch.ops.aten.clamp.default(sub_1, 0.0); sub_1 = None
view_1: "f32[32]" = torch.ops.aten.view.default(clamp_2, [32]); clamp_2 = None
_to_copy_3: "i64[32]" = torch.ops.aten._to_copy.default(view_1, dtype = torch.int64)
add_3: "i64[32]" = torch.ops.aten.add.Tensor(_to_copy_3, 1)
clamp_3: "i64[32]" = torch.ops.aten.clamp.default(add_3, None, 63); add_3 = None
_unsafe_index: "f32[1, 256, 32, 32]" = torch.ops.aten._unsafe_index.Tensor(convolution, [None, None, _to_copy_1, _to_copy_3])
_unsafe_index_1: "f32[1, 256, 32, 32]" = torch.ops.aten._unsafe_index.Tensor(convolution, [None, None, _to_copy_1, clamp_3])
_unsafe_index_2: "f32[1, 256, 32, 32]" = torch.ops.aten._unsafe_index.Tensor(convolution, [None, None, clamp_1, _to_copy_3])
_unsafe_index_3: "f32[1, 256, 32, 32]" = torch.ops.aten._unsafe_index.Tensor(convolution, [None, None, clamp_1, clamp_3]); convolution = None
sub_2: "f32[32]" = torch.ops.aten.sub.Tensor(view_1, _to_copy_3); view_1 = None
clamp_4: "f32[32]" = torch.ops.aten.clamp.default(sub_2, 0.0, 1.0); sub_2 = None
sub_3: "f32[1, 256, 32, 32]" = torch.ops.aten.sub.Tensor(_unsafe_index_1, _unsafe_index); _unsafe_index_1 = None
mul_2: "f32[1, 256, 32, 32]" = torch.ops.aten.mul.Tensor(sub_3, clamp_4); sub_3 = None
add_4: "f32[1, 256, 32, 32]" = torch.ops.aten.add.Tensor(_unsafe_index, mul_2); _unsafe_index = mul_2 = None
sub_4: "f32[1, 256, 32, 32]" = torch.ops.aten.sub.Tensor(_unsafe_index_3, _unsafe_index_2); _unsafe_index_3 = None
mul_3: "f32[1, 256, 32, 32]" = torch.ops.aten.mul.Tensor(sub_4, clamp_4); sub_4 = None
add_5: "f32[1, 256, 32, 32]" = torch.ops.aten.add.Tensor(_unsafe_index_2, mul_3); _unsafe_index_2 = mul_3 = None
sub_5: "f32[32, 1]" = torch.ops.aten.sub.Tensor(view, _to_copy_1); view = None
clamp_5: "f32[32, 1]" = torch.ops.aten.clamp.default(sub_5, 0.0, 1.0); sub_5 = None
sub_6: "f32[1, 256, 32, 32]" = torch.ops.aten.sub.Tensor(add_5, add_4); add_5 = None
mul_4: "f32[1, 256, 32, 32]" = torch.ops.aten.mul.Tensor(sub_6, clamp_5); sub_6 = None
add_6: "f32[1, 256, 32, 32]" = torch.ops.aten.add.Tensor(add_4, mul_4); add_4 = mul_4 = None
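For reference, the first part of the graph (arange → +0.5 → ×scale → −0.5 → clamp → truncate) is the standard source-coordinate computation for align_corners=False bilinear resampling. A plain-Python sketch of that per-axis math, assuming in_size=64 and out_size=32 read off the shapes in the trace (this is my reading of the graph, not authoritative):

```python
def bilinear_axis(out_size: int, in_size: int, scale: float):
    """Per-axis low/high indices and fractional weight, mirroring the
    arange -> +0.5 -> *scale -> -0.5 -> clamp(min=0) chain in the trace."""
    coords = []
    for i in range(out_size):
        src = max((i + 0.5) * scale - 0.5, 0.0)   # clamp(sub(mul(add(...))), 0.0)
        lo = int(src)                             # _to_copy to int64 (floor for src >= 0)
        hi = min(lo + 1, in_size - 1)             # add 1, clamp(max=63)
        w = min(max(src - lo, 0.0), 1.0)          # fractional weight, clamped to [0, 1]
        coords.append((lo, hi, w))
    return coords

coords = bilinear_axis(32, 64, 2.0)
print(coords[0])   # (0, 1, 0.5)
print(coords[31])  # (62, 63, 0.5)
```

The four _unsafe_index calls then gather the four neighboring pixels, and the remaining sub/mul/add chain is a linear interpolation, lerp(a, b, w) = a + (b − a) * w, applied first along one spatial axis and then the other.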
I don't understand how this happens: why doesn't it dispatch to an upsample kernel, and how can I disable this behavior?