Increasing the batch size from 64 to 512 is not decreasing the runtime in 2D to 3D lifting from RGB images

Mona_Jalal · May 11, 2021, 9:21pm

[ a double-post of How to speed up the rendering of 3D lifting from 2D images? · Issue #38 · facebookresearch/phosa · GitHub ]
Despite increasing the batch size to 512 and moving to a 12 GB GPU, I am not seeing any real speedup. I changed both occurrences of batch_size=XYZ in phosa/pose_optimization.py from XYZ to 512.

facebookresearch/phosa/blob/207a6b1754b80115dbbcb9d8abb9005b11e01f1f/phosa/pose_optimization.py#L103


"""


def __init__(
    self,
    ref_image,
    vertices,
    faces,
    textures,
    rotation_init,
    translation_init,
    batch_size=1,
    kernel_size=7,
    K=None,
    power=0.25,
    lw_chamfer=0.5,
):
    assert ref_image.shape[0] == ref_image.shape[1], "Must be square."
    super(PoseOptimizer, self).__init__()


    self.register_buffer("vertices", vertices.repeat(batch_size, 1, 1))
    self.register_buffer("faces", faces.repeat(batch_size, 1, 1))

and

facebookresearch/phosa/blob/207a6b1754b80115dbbcb9d8abb9005b11e01f1f/phosa/pose_optimization.py#L265


    plt.show()




def find_optimal_pose(
    vertices,
    faces,
    mask,
    bbox,
    square_bbox,
    image_size,
    batch_size=500,
    num_iterations=50,
    num_initializations=2000,
    lr=1e-3,
):
    ts = 1
    textures = torch.ones(faces.shape[0], ts, ts, ts, 3, dtype=torch.float32).cuda()
    x, y, b, _ = square_bbox
    L = max(image_size)
    K_roi = compute_K_roi((x, y), b, L, focal_length=FOCAL_LENGTH)
    # Stuff to keep around

Do you have any recommendations for making this faster? How can I verify in PyTorch that the batch size of 512 is actually being used? What other techniques could give a speedup?

Thank you,
Mona

Here’s what I see when I run the demo code on a single image; it takes about 3-5 min:

(phosa) [jalal@goku phosa]$ python demo.py --filename input/bike_ride2.jpg --class_name bicycle
2021-05-11 17:06:08,654 INFO     Calling with args: Namespace(class_name='bicycle', filename='input/bike_ride2.jpg', lw_collision=None, lw_depth=None, lw_inter=None, lw_inter_part=None, lw_scale=None, lw_scale_person=None, lw_sil=None, mesh_index=0, output_dir='output')
2021-05-11 17:06:11,258 INFO     Loading checkpoint from detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl
2021-05-11 17:06:11,291 INFO     URL https://dl.fbaipublicfiles.com/detectron2/PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl cached in /home/grad3/jalal/.torch/fvcore_cache/detectron2/PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl
2021-05-11 17:06:14,962 INFO     Reading a file from 'Detectron2 Model Zoo'
WARNING: You are using a SMPL model, with only 10 shape coefficients.
class_name:  bicycle
  0%|                                                                                         | 0/200.0 [00:00<?, ?it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.88e+03:  25%|████████████████                                                | 50/200.0 [00:46<02:15,  1.10it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.88e+03:  50%|███████████████████████████████▌                               | 100/200.0 [01:32<01:31,  1.10it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.88e+03:  75%|███████████████████████████████████████████████▎               | 150/200.0 [02:18<00:45,  1.10it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.88e+03: 100%|███████████████████████████████████████████████████████████████| 200/200.0 [03:04<00:00,  1.08it/s]
  0%|                                                                                                    | 0/200.0 [00:00<?, ?it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.97e+03:  25%|██████████████████▊                                                        | 50/200.0 [00:49<02:26,  1.03it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.97e+03:  50%|█████████████████████████████████████                                     | 100/200.0 [01:39<01:37,  1.03it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 5.97e+03:  75%|███████████████████████████████████████████████████████▌                  | 150/200.0 [02:28<00:48,  1.04it/s]PoseOptimizer(
  (pool): MaxPool2d(kernel_size=7, stride=1, padding=3, dilation=1, ceil_mode=False)
  (renderer): Renderer()
)
loss: 4.59e+03: 100%|██████████████████████████████████████████████████████████████████████████| 200/200.0 [03:17<00:00,  1.01it/s]
Loss 77.4604: 100%|██████████████████████████████████████████████████████████████████████████████| 400/400 [03:21<00:00,  1.99it/s]
2021-05-11 17:16:02,046 INFO     Saved rendered image to output/bike_ride2.jpg.
2021-05-11 17:16:02,067 INFO     Saved top-down image to output/bike_ride2_top.jpg.
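
A note on the progress bars above: each of the first two counts to 200 and prints a new PoseOptimizer every 50 steps. That matches the defaults shown in find_optimal_pose (num_initializations=2000, batch_size=500, num_iterations=50) if the initializations are processed in chunks of batch_size. The chunking itself is an assumption here, since the loop is not part of the quoted snippet, and total_steps is a hypothetical helper written only to illustrate the arithmetic:

```python
import math


def total_steps(num_initializations, batch_size, num_iterations):
    """Optimizer steps needed if initializations are split into chunks of batch_size.

    Assumption: the optimization runs num_iterations steps per chunk; this is
    inferred from the log, not confirmed by the quoted source.
    """
    num_batches = math.ceil(num_initializations / batch_size)
    return num_batches * num_iterations


if __name__ == "__main__":
    for batch_size in (64, 500, 512, 2000):
        print(batch_size, total_steps(2000, batch_size, 50))
```

Under that assumption, 500 and 512 both give 4 chunks and therefore 200 steps, so going from the default of 500 to 512 would not be expected to change the runtime much. Only a batch size that reduces the number of chunks (for example 1000 or 2000, memory permitting) would cut the step count.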

eqy · May 11, 2021, 9:31pm

A crude test you can try is to inspect GPU memory usage with a tool such as nvidia-smi or gpustat. If memory usage does not increase with the batch size, something is suspicious. Alternatively, you can keep increasing the batch size until you run out of GPU memory (which should happen eventually).
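
The same check can be done from inside PyTorch. Below is a minimal sketch, not code from phosa: batch_dim_and_peak_mib is a hypothetical helper, and the dummy vertex tensor only stands in for the mesh. It repeats the tensor the way PoseOptimizer does with vertices.repeat(batch_size, 1, 1), then reports the resulting batch dimension and the peak CUDA memory. On a machine without a GPU it falls back to the CPU and reports no memory figure.

```python
import torch


def batch_dim_and_peak_mib(batch_size, n_vertices=1000):
    """Repeat a dummy mesh batch_size times; return (batch dim, peak CUDA MiB or None)."""
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    # Stand-in for the mesh vertices; the real shape depends on the model.
    vertices = torch.rand(1, n_vertices, 3, device=device)
    if use_cuda:
        # Restart peak tracking so the figure reflects this call only.
        torch.cuda.reset_peak_memory_stats()
    batched = vertices.repeat(batch_size, 1, 1)
    peak_mib = torch.cuda.max_memory_allocated() / 2**20 if use_cuda else None
    return batched.shape[0], peak_mib


if __name__ == "__main__":
    for bs in (64, 512):
        print(bs, batch_dim_and_peak_mib(bs))
```

In the actual code, printing model.vertices.shape[0] on the constructed PoseOptimizer (the buffer registered in __init__) would show which batch size really reached the model, and the peak memory figure should grow as the batch size grows.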