Is evaluating the network thread-safe?

Hi,
I hadn’t found this topic before because of its slightly misleading title, but it seems I have run into the same issue (only in my case the network is even smaller than the one used by @Willem). Batching does not help me much because each worker calls forward and grad independently, with no synchronization between workers. Does this “overhead” problem have a resolution? Maybe thread-local allocation?
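
To make the thread-local allocation idea concrete, here is a rough sketch in plain Python of what I have in mind. It is not tied to any framework's API: `WEIGHTS`, `get_scratch`, `forward` and `run_workers` are hypothetical names I made up for illustration, and the "network" is just a dot product. The only point is that each worker thread reuses its own scratch buffer instead of allocating a new one on every call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a tiny network: weights are shared and read-only.
WEIGHTS = [0.5, -1.0, 2.0]

# Attributes set on this object are private to the thread that set them.
_local = threading.local()


def get_scratch(n):
    """Return this thread's scratch buffer, allocating it only on first use."""
    buf = getattr(_local, "buf", None)
    if buf is None or len(buf) != n:
        buf = [0.0] * n
        _local.buf = buf
    return buf


def forward(x):
    """Dot product of WEIGHTS and x, written through the per-thread buffer."""
    scratch = get_scratch(len(WEIGHTS))
    for i, (w, xi) in enumerate(zip(WEIGHTS, x)):
        scratch[i] = w * xi
    return sum(scratch)


def run_workers(inputs, n_threads=4):
    """Evaluate forward() on each input from a pool of unsynchronized workers."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(forward, inputs))
```

Since no thread ever touches another thread's buffer, the workers need no locking around the scratch space. Whether something like this can be done for the buffers the library allocates internally is exactly what I am asking.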