So I have a toy attractor model:
import torch

cuda = torch.device("cuda")
points = torch.rand((2**16, 3), dtype=torch.float32, device=cuda, requires_grad=False)
def attractor_func(v, a=10.0, b=28.0, c=8.0 / 3.0):
    # Lorenz system
    dv = torch.zeros(points.shape, dtype=torch.float32, device=cuda, requires_grad=False)
    dv[:, 0] = a * (v[:, 1] - v[:, 0])
    dv[:, 1] = v[:, 0] * (b - v[:, 2]) - v[:, 1]
    dv[:, 2] = v[:, 0] * v[:, 1] - c * v[:, 2]
    return dv
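(For what it's worth, one variant I've considered builds dv from the column slices with torch.stack instead of filling a zeros buffer; attractor_func_stack is my own name, and I don't know whether it's actually faster:)

def attractor_func_stack(v, a=10.0, b=28.0, c=8.0 / 3.0):
    x, y, z = v.unbind(dim=1)                   # three (N,) views, no copies
    return torch.stack((a * (y - x),            # dx/dt
                        x * (b - z) - y,        # dy/dt
                        x * y - c * z), dim=1)  # one (N, 3) allocation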
attractor_func is called four times per step from a Runge-Kutta integrator that runs as fast as the CPU and GPU can manage.
def rk4(func, h, v):
    k1 = func(v)
    k2 = func(v + (h / 2.0) * k1)
    k3 = func(v + (h / 2.0) * k2)
    k4 = func(v + h * k3)
    return h * (k1 / 6.0 + k2 / 3.0 + k3 / 3.0 + k4 / 6.0)
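(The h * (k1/6 + k2/3 + k3/3 + k4/6) combination is just the usual (h/6) * (k1 + 2*k2 + 2*k3 + k4) regrouped. As a throwaway sanity check, stepping dv/dt = v once should match exp(h) to roughly O(h^5); v0 and h here are just test values:)

v0 = torch.ones((4, 3))
h = 0.01
print(torch.allclose(v0 + rk4(lambda v: v, h, v0),
                     v0 * torch.exp(torch.tensor(h))))  # True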
rk4 is then called like this in between “rendering” passes:
for _ in range(32):
    points += rk4(attractor_func, 0.01, points)
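One thing I've been meaning to try (assuming PyTorch 2.x, so torch.compile exists) is compiling that whole inner loop so the elementwise kernels get fused; a sketch of what I mean, with integrate being my own name and using the stack variant from above so nothing reads the global points:

@torch.compile
def integrate(v, steps: int = 32, h: float = 0.01):
    for _ in range(steps):
        v = v + rk4(attractor_func_stack, h, v)  # out-of-place, compile-friendly
    return v

points = integrate(points)

But I don't know whether kernel-launch overhead is actually my bottleneck here.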
But something tells me this is not the most efficient code when it comes to managing GPU memory: attractor_func allocates a fresh zeros tensor on every call, and rk4 creates several temporaries per step. I can't even convert attractor_func to a TorchScript function with torch.jit.script, if that would be of any use.
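(I suspect the global points reference is part of what blocks scripting; a version that allocates from v instead, like the sketch below with my own name attractor_scripted, is at least the sort of thing torch.jit.script can handle:)

@torch.jit.script
def attractor_scripted(v, a: float = 10.0, b: float = 28.0, c: float = 8.0 / 3.0):
    dv = torch.empty_like(v)  # shape/dtype/device taken from v, no global needed
    dv[:, 0] = a * (v[:, 1] - v[:, 0])
    dv[:, 1] = v[:, 0] * (b - v[:, 2]) - v[:, 1]
    dv[:, 2] = v[:, 0] * v[:, 1] - c * v[:, 2]
    return dv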
Perhaps there is a way to vectorize the dv slices across the second dimension as well?
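For example (my own reformulation, and I have no idea whether a 3x3 matmul beats the fused elementwise ops), the linear part of the Lorenz right-hand side can be a single matmul, leaving only the two bilinear terms as slice updates:

A = torch.tensor([[-10.0, 10.0,  0.0],
                  [ 28.0, -1.0,  0.0],
                  [  0.0,  0.0, -8.0 / 3.0]], device=cuda)

def attractor_matmul(v):
    dv = v @ A.T                   # all linear terms at once
    dv[:, 1] -= v[:, 0] * v[:, 2]  # the -x*z term
    dv[:, 2] += v[:, 0] * v[:, 1]  # the +x*y term
    return dv

Is that the right direction, or is there a better way to handle this?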