Hi
Were you able to resolve this?
I think that with CUDA, the timings your code measures might not be accurate:
(https://github.com/braincreators/octconv/blob/oct-resnet152/benchmarks/benchmark.py#L49)
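CUDA kernels are launched asynchronously, so a host-side timer can stop before the GPU has finished its work. You would need to either call torch.cuda.synchronize() before reading the timer (or time with CUDA events), or use the autograd profiler (see "How to measure time in PyTorch").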
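
Something along these lines might work as a starting point. It is only a rough sketch, not taken from your benchmark script: `time_forward`, the small `Conv2d` model, and the input shape are made-up placeholders, and I'm assuming you only want to time the forward pass. It uses CUDA events when the input is on the GPU and falls back to `time.perf_counter()` on the CPU.

```python
import time

import torch


def time_forward(model, x, warmup=3, iters=10):
    """Return the mean forward-pass time in milliseconds (hypothetical helper)."""
    with torch.no_grad():
        # Warm-up runs so one-time setup costs are not included in the timing.
        for _ in range(warmup):
            model(x)

        if x.is_cuda:
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            # Make sure no earlier GPU work is still pending.
            torch.cuda.synchronize()
            start.record()
            for _ in range(iters):
                model(x)
            end.record()
            # Wait until the recorded events have actually completed on the GPU.
            torch.cuda.synchronize()
            return start.elapsed_time(end) / iters  # elapsed_time is in ms

        # CPU fallback: execution is synchronous, so a wall-clock timer is fine.
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        return (time.perf_counter() - t0) * 1000.0 / iters


# Placeholder model and input, just to make the sketch runnable.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device).eval()
x = torch.randn(8, 3, 64, 64, device=device)

mean_ms = time_forward(model, x)
print(f"{device}: {mean_ms:.3f} ms per forward pass")
```

If you would rather keep your existing `time.time()` calls, calling `torch.cuda.synchronize()` right before each timer read should have a similar effect.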