Code slows down when using Python multi-threading

We are training an RL agent in a grid world domain. When we run multiple independent threads, each with a different agent, they run perfectly fine. But when we introduce message passing (via the askAdvice and giveAdvice functions) through shared global variables across these threads, the code slows down drastically.
When the same code is run on the CPU only, there is no slowdown.

We have not been able to figure out the reason for this. Any help in this regard is really appreciated.

The issue is described on GitHub: https://github.com/dakshanand/state-granular-trust-metric/issues/1

It's probably because data transfer in CUDA is often a bottleneck.

But when I disable those 2 functions, the code speed returns to normal.
Also, if I disable those 2 and just start a dummy thread with this function as its target:
def dummy():
    while True:
        for i in range(2):
            if i == 2:
                continue
the speed again drops drastically. Including this new dummy thread, I have just 3 threads in total on an i7 processor, which can easily support 6-8 threads (I have tried this in another project). Adding this dummy thread to an earlier project (plain Python) that has 6 threads does not cause any slowdown.
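For anyone who wants to reproduce this in isolation, here is a minimal sketch of how the effect of the dummy thread could be measured without the rest of the project. It is not the project code: `busy_work`, `time_work`, and the `stop_event` argument are hypothetical names added for illustration, and `busy_work` is only a CPU-bound stand-in for an agent's training step. One possible factor (an assumption, not something confirmed for this project) is CPython's global interpreter lock: only one thread executes Python bytecode at a time, so a thread that spins in a busy loop competes with the other threads for the interpreter.

```python
import threading
import time


def busy_work(n):
    # Hypothetical CPU-bound stand-in for one agent's training step.
    total = 0
    for i in range(n):
        total += i * i
    return total


def dummy(stop_event):
    # Same busy loop as the dummy thread in the post, but stoppable
    # through an Event so the script can exit cleanly.
    while not stop_event.is_set():
        for i in range(2):
            if i == 2:
                continue


def time_work(n, with_dummy):
    # Time busy_work(n) in the main thread, optionally while a dummy
    # busy-loop thread is running in the background.
    stop_event = threading.Event()
    worker = None
    if with_dummy:
        worker = threading.Thread(target=dummy, args=(stop_event,), daemon=True)
        worker.start()
    start = time.perf_counter()
    result = busy_work(n)
    elapsed = time.perf_counter() - start
    stop_event.set()
    if worker is not None:
        worker.join()
    return result, elapsed


if __name__ == "__main__":
    n = 2_000_000
    _, alone = time_work(n, with_dummy=False)
    _, contended = time_work(n, with_dummy=True)
    print(f"alone:      {alone:.3f} s")
    print(f"with dummy: {contended:.3f} s")
```

If the second timing is clearly larger than the first on a given machine, the slowdown can be reproduced with plain Python threads alone, which would point away from CUDA data transfer as the only cause. The actual numbers depend on the interpreter version and hardware.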