GPU implementation of sample entropy and the Fourier transform

Can anyone help me write a GPU implementation of sample entropy and the Fourier transform? On the CPU, each sample entropy call takes about 5 seconds, and I run it in a nested loop over 1280 trials (num_trials) × 14 channels (no_channel). That is 17,920 calls, or roughly 25 hours, just to compute sample entropy. I thought a GPU would make this faster. I tried CuPy and managed to store the data on the GPU, but I could not compute on it because I am relying on a pre-existing module, specifically nolds.sampen, which only works on NumPy arrays on the CPU. Can anyone help or suggest an alternative?
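For reference, here is a minimal sketch of the direction I am considering, not a tested drop-in replacement for nolds. It assumes the usual sample entropy definition (Chebyshev distance, embedding dimension m, tolerance r), and the defaults m=2 and r = 0.2 × standard deviation are my own assumption, so the numbers may differ slightly from what nolds returns. The function names `sample_entropy` and `amplitude_spectrum` are made up for illustration. The code uses CuPy when a working GPU is found and falls back to NumPy otherwise.

```python
import numpy as np

try:
    import cupy as xp   # GPU arrays, if CuPy and a CUDA device are available
    xp.zeros(1)         # force a device allocation to check the GPU is usable
except Exception:
    xp = np             # fall back to NumPy so the sketch still runs on a CPU


def sample_entropy(signal, m=2, r=None):
    """Sample entropy of a 1-D signal (hypothetical helper, not the nolds API)."""
    x = xp.asarray(signal, dtype=xp.float64)
    n = int(x.shape[0])
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen m")
    if r is None:
        r = 0.2 * float(x.std())   # assumed default tolerance

    n_templates = n - m            # same template count for lengths m and m + 1
    # dist[i, j] holds the Chebyshev distance between templates i and j,
    # built one lag at a time to avoid an (N, N, m) intermediate array.
    dist = xp.zeros((n_templates, n_templates), dtype=x.dtype)
    for k in range(m):
        col = x[k:k + n_templates]
        dist = xp.maximum(dist, xp.abs(col[:, None] - col[None, :]))
    # Matches for length m, minus the diagonal (self-matches).
    b = int((dist <= r).sum()) - n_templates

    # Extending every template by one sample gives the length m + 1 distances.
    col = x[m:m + n_templates]
    dist = xp.maximum(dist, xp.abs(col[:, None] - col[None, :]))
    a = int((dist <= r).sum()) - n_templates

    if a == 0 or b == 0:
        return float("inf")        # no matches, entropy is undefined/infinite
    return float(-np.log(a / b))


def amplitude_spectrum(data):
    """Batched real FFT along the last axis, e.g. (trials, channels, samples)."""
    return xp.abs(xp.fft.rfft(xp.asarray(data), axis=-1))
```

The distance matrix needs O(N²) memory per signal, so very long trials would have to be processed in chunks of rows. The FFT part needs no loop at all, since the whole (trials, channels, samples) array can be transformed in one call.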