Hello,
I would like to ask about parallelizing a simple for loop (or any loop) so that it runs on the GPU. Currently I have the following code, but it runs very slowly on the CPU. I cannot find out how to get this to work on the GPU. Is it even possible? I have searched the web without any luck. Any help would be appreciated.
Thank you in advance.
img_height_vec = [0] * detected_bright_regions_image.shape[0]
img_width_vec = [0] * detected_bright_regions_image.shape[1]
for i in range(0, detected_bright_regions_image.shape[0]):
    for j in range(0, detected_bright_regions_image.shape[1]):
        if detected_bright_regions_image[i][j] > 180:
            img_height_vec[i] = 1
            img_width_vec[j] = 1
Python loops are slow. If you can't find a way to replace these loops with PyTorch methods, you can write them in C++ or use TorchScript. I would suggest first trying to replace the loops with built-in tensor operations. I think the following code should achieve your goal:
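# image: detected_bright_regions_image as a 2D torch tensor of shape [height, width]
h_max, w_max = image.max(1)[0], image.max(0)[0]  # per-row max (dim 1), per-column max (dim 0)
h_vec, w_vec = h_max > 180, w_max > 180
# these two tensors are bool tensors; you can cast them to other types, e.g. h_vec.int()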
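For completeness, here is a minimal end-to-end sketch of the same idea that also moves the data to the GPU when one is available. The helper name bright_row_col_flags and the tiny example image are made up for illustration, and it assumes the image is (or has been converted to) a 2D tensor of shape [height, width]. It uses a threshold mask with any() instead of max(), which should give the same flags as the original loops.

```python
import torch

def bright_row_col_flags(image: torch.Tensor, threshold: int = 180):
    """Flag rows and columns containing at least one pixel above threshold.

    Hypothetical helper; assumes `image` is a 2D tensor [height, width].
    """
    mask = image > threshold       # bool tensor, shape [height, width]
    row_flags = mask.any(dim=1)    # shape [height]: any bright pixel in the row?
    col_flags = mask.any(dim=0)    # shape [width]: any bright pixel in the column?
    return row_flags, col_flags

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny illustrative image: only pixel (row 0, column 1) exceeds 180.
image = torch.tensor([[0, 200, 0],
                      [0,   0, 0]], dtype=torch.uint8, device=device)

h_vec, w_vec = bright_row_col_flags(image)
print(h_vec.int().tolist())  # [1, 0]
print(w_vec.int().tolist())  # [0, 1, 0]
```

If the image starts out as a NumPy array (for example from OpenCV), something like torch.from_numpy(arr).to(device) should get it onto the same device first. For a single small image the transfer to the GPU may cost more than the computation itself, so it is worth timing both.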