Hi,

I’m calculating the accumulated distance between each pair of kernels inside an nn.Conv2d layer. For large layers, however, it runs out of memory on a Titan X with 12 GB of memory. I’d like to know if it is possible to split this calculation across two GPUs.

The code follows:

```
def ac_distance(layer):
    # Accumulate the distance between every pair of kernels in the layer.
    total = 0
    for p in layer.weight:
        for q in layer.weight:
            total += distance(p, q)
    return total
```

Here `layer` is an instance of `nn.Conv2d`, and `distance` returns the sum of the differences between `p` and `q`. I can’t detach the graph, however, because I need it later on. I tried wrapping my model in `nn.DataParallel`, but all the calculations in `ac_distance` are done on only one GPU, even though training uses both.
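
For context, `distance` is essentially just this (a minimal sketch of what I mean by the sum of the differences; the actual function may differ slightly):

```
def distance(p, q):
    # Elementwise difference between two kernels, reduced to a scalar.
    # This stays attached to the autograd graph, which is why I can't detach it.
    return (p - q).sum()
```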