I am trying to implement the Fisher-Markov selector (Eq. 19 in the paper) using PyTorch.

This is my CPU implementation:

```
import numpy as np

def fisher(X, y, n_class=10, n_features=30810, gamma=-0.5, beta=0):
    theta = []
    n = X.shape[0]
    # Group sample indices by class label
    C = [[] for _ in range(n_class)]
    for j, c in enumerate(y):
        C[int(c)].append(int(j))
    for j in range(n_features):
        # part 1: squared per-class sums of column j
        k = 0
        for g in range(n_class):
            i = np.asarray(C[g])  # indices of the class-g samples
            # summing the outer product equals the squared class sum
            k += (1 / len(i)) * np.sum(np.dot(np.asarray([X[i, j]]).T, [X[i, j]]))
        k *= (1 / n)
        # part 2: sum of squares of column j
        l = np.dot(X[:, j], X[:, j])
        l *= (gamma / n)
        # part 3: squared total sum of column j
        m = np.sum(np.dot(np.asarray([X[:, j]]).T, [X[:, j]]))
        m *= (gamma - 1) / (n * n)
        theta.append((k - l + m, j))
    # Sort by score, descending; the feature indices end up in column 1
    theta = np.asarray(sorted(theta, key=lambda x: x[0], reverse=True), dtype=int)
    return theta[:beta]
```
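
For reference, writing `G` for the number of classes, `C_g` for the index set of class `g`, and `n_g` for its size, the per-feature score this code computes is (this is my reading of Eq. 19, so I may be off somewhere):

```
\theta_j = \frac{1}{n}\sum_{g=1}^{G}\frac{1}{n_g}\Big(\sum_{i \in C_g} x_{ij}\Big)^2
         - \frac{\gamma}{n}\sum_{i=1}^{n} x_{ij}^2
         + \frac{\gamma-1}{n^2}\Big(\sum_{i=1}^{n} x_{ij}\Big)^2
```

and the features are then ranked by `theta_j` in descending order.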

and this is my GPU version:

```
import numpy as np
import torch
from torch.autograd import Variable

def fisher(X, y, n_class=10, n_features=30810, gamma=-0.5, beta=0):
    theta = Variable(torch.zeros(n_features)).cuda()
    # Group sample indices by class label
    C = [[] for _ in range(n_class)]
    for j, c in enumerate(y.data):
        C[int(c)].append(int(j))
    n = X.size(0)
    C = [torch.from_numpy(np.asarray(ii, dtype=np.int64)).cuda() for ii in C]
    # Precompute every column's sum of squares (part 2) in one kernel
    L = torch.sum(X * X, 0)
    for j in range(n_features):
        # part 1: squared per-class sums of column j
        k = 0
        for g in range(n_class):
            i = C[g]
            # gathers the full class-g rows just to keep one column
            u = X[i][:, j].contiguous().view(-1, X[i][:, j].size(0))
            # summing the outer product equals the squared class sum
            k += (1 / i.size(0)) * torch.sum(torch.mm(torch.t(u), u))
        k *= (1 / n)
        # part 2: precomputed sum of squares of column j
        l = L[j] * (gamma / n)
        # part 3: squared total sum of column j
        v = X[:, j].contiguous().view(-1, X[:, j].size(0))
        m = torch.sum(torch.mm(torch.t(v), v)) * ((gamma - 1) / (n * n))
        res = k - l + m
        theta[j] = res.data
    # Return the indices of the top-beta scores
    theta = torch.sort(theta, 0, descending=True)[1]
    return theta[:beta]
```
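
In case it matters, I time both versions roughly like the sketch below (`time_fisher` is just an illustrative helper name; the important part is calling `torch.cuda.synchronize()` before reading the clock, since CUDA kernel launches are asynchronous):

```
import time
import torch

def time_fisher(fn, X, y, n_runs=5):
    fn(X, y)                   # warm-up run so one-time setup is not measured
    torch.cuda.synchronize()   # make sure all queued kernels have finished
    start = time.time()
    for _ in range(n_runs):
        fn(X, y)
    torch.cuda.synchronize()   # wait for the GPU before stopping the clock
    return (time.time() - start) / n_runs
```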

And the result shows the GPU version is much slower than the CPU version. I hope somebody here can give me some advice or point out what improvements I can make to this code so that the GPU version runs faster.

I use pytorch version 0.2.0. `n_features` is around 20,000, `n` is around 1,000, and `n_class` is 10. My implementation may differ slightly from the original equation.
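
My current suspicion is that the per-feature Python loop is the problem: it launches roughly `n_features × n_class` tiny `mm`/`sum` kernels, and the launch overhead swamps the actual work. Below is a fully vectorized sketch of the same score that I am experimenting with (`fisher_vectorized` is just my working name, and I have not yet verified it matches the loop version numerically):

```
import torch
from torch.autograd import Variable

def fisher_vectorized(X, y, n_class=10, gamma=-0.5, beta=0):
    # X: (n, n_features) cuda Variable, y: Variable of class labels
    n = X.size(0)
    colsum = torch.sum(X, 0)        # per-feature sum over all samples
    sqsum = torch.sum(X * X, 0)     # per-feature sum of squares
    k = 0
    for g in range(n_class):
        rows = [i for i, c in enumerate(y.data) if int(c) == g]
        idx = Variable(torch.cuda.LongTensor(rows))
        sg = torch.sum(X.index_select(0, idx), 0)   # class-g sum per feature
        k = k + (sg * sg) / len(rows)
    # theta_j for every feature in one shot
    theta = k / n - (gamma / n) * sqsum + ((gamma - 1) / (n * n)) * colsum * colsum
    order = torch.sort(theta.data, 0, descending=True)[1]  # indices, best first
    return order[:beta]
```

If the math above is right, this replaces a few hundred thousand tiny kernel launches with a handful of large column-wise reductions.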

Thank you.