Hi all! I started using PyTorch today, and it feels more natural to me than TensorFlow. However, I need to write a custom loss function. While it would be nice to be able to write any loss function, mine is a bit specific, so here it is (written in PyTorch):

```
import numpy as np
import torch

X = np.asarray([[0.6946, 0.1328], [0.6563, 0.6873], [0.8184, 0.8047], [0.8177, 0.4517],
                [0.1673, 0.2775], [0.6919, 0.0439], [0.4659, 0.3032], [0.3481, 0.1996]], dtype=np.float32)
X = torch.from_numpy(X)
y = np.asarray((1, 3, 2, 2, 3, 1, 2, 3), dtype=np.float32)
y = torch.from_numpy(y)


def similarity(i, j):
    ''' This function defines the (dis)similarity between vectors i and j
    inputs: i, j - vectors of the same length
    output: dist - Euclidean distance between i and j (0 means identical) '''
    dist = torch.norm(i - j)
    return dist


def similarity_matrix(mat):
    ''' This function creates the similarity matrix of a dataset
    input: mat - dataset in matrix format (one sample per row)
    output: simMatrix - the matrix of pairwise distances between rows '''
    a = mat.size()
    a = a[0]
    simMatrix = torch.zeros((a, a))
    for i in range(a):  # xrange only exists in Python 2
        for j in range(a):
            simMatrix[i][j] = similarity(mat[i], mat[j])
    return simMatrix


def convert_y(y):
    ''' Builds the 0/1 matrix whose ij-th entry is 1 if y[i] == y[j] '''
    n = y.size()
    n = n[0]
    converted_y = torch.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if y[i] == y[j]:
                converted_y[i, j] = 1
    return converted_y


def customized_loss(X, y):
    X_similarity = similarity_matrix(X)
    association = convert_y(y)
    # Sum of distances between samples with the same label (intra-cluster)
    loss_num = torch.sum(torch.mul(X_similarity, association))
    loss_all = torch.sum(X_similarity)
    # Everything else is the inter-cluster part
    loss_denum = loss_all - loss_num
    loss = loss_num / loss_denum
    return loss


loss = customized_loss(X, y)
print(loss)
```

Now, of course, since I am going to use it as the final layer of the neural net, I would need to compute its gradients and then use them in backpropagation.

Explaining the function a bit:

I first transform the input data into a kind of similarity matrix (0 means the two data points are the same; the higher the number in the ij-th entry, the higher the dissimilarity). Then, to find the intra-cluster loss, I multiply this matrix elementwise with a 0/1 matrix whose ij-th entry is 1 if elements i and j are in the same cluster and 0 otherwise, and sum the result. The inter-cluster loss is found similarly, as the sum of all entries minus the intra-cluster loss. Finally, we just divide the two losses (intra-cluster over inter-cluster).
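For what it's worth, the same computation can probably be written without the Python loops. The following is only a sketch of that idea: the name `pairwise_ratio_loss` is made up for illustration, and using `torch.cdist` for the pairwise Euclidean distances and a broadcast comparison for the 0/1 matrix is my assumption about how to express it, not something I have used in a real model.

```python
import torch


def pairwise_ratio_loss(X, y):
    # Hypothetical loop-free version of the loss described above.
    # Pairwise Euclidean distances, shape (n, n); entry (i, j) is ||X[i] - X[j]||.
    dist = torch.cdist(X, X)
    # 0/1 matrix: 1 where samples i and j share a label (diagonal included).
    same = (y.unsqueeze(0) == y.unsqueeze(1)).to(X.dtype)
    intra = (dist * same).sum()   # intra-cluster part
    inter = dist.sum() - intra    # everything else is inter-cluster
    return intra / inter


X = torch.tensor([[0.6946, 0.1328], [0.6563, 0.6873], [0.8184, 0.8047], [0.8177, 0.4517],
                  [0.1673, 0.2775], [0.6919, 0.0439], [0.4659, 0.3032], [0.3481, 0.1996]])
y = torch.tensor([1., 3., 2., 2., 3., 1., 2., 3.])

loss = pairwise_ratio_loss(X, y)
print(loss)
```

It should give the same value as the loop-based code, since both sum the same distances with the same 0/1 weights.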

My questions are:

- Can this be done in PyTorch, without writing Lua code?
- Can the gradients of this be computed in an automatic way (torch autograd)?
- Can such a loss function be given as input in optim.SGD? (optim.X in general case where X is the optimization algorithm)
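
To make the last two questions concrete, this is roughly the usage I have in mind. It is a sketch under assumptions: the `torch.nn.Linear(2, 2)` network, the random input, and the learning rate are placeholders, and the loss is the same ratio as in my code above, written compactly with `torch.cdist`. As far as I understand, `optim.SGD` receives the network parameters rather than the loss itself, and the loss is only used to call `backward()`.

```python
import torch


def customized_loss(X, y):
    # Same intra/inter ratio as above, in compact form (torch.cdist is an assumption).
    dist = torch.cdist(X, X)
    same = (y.unsqueeze(0) == y.unsqueeze(1)).to(X.dtype)
    intra = (dist * same).sum()
    return intra / (dist.sum() - intra)


torch.manual_seed(0)
X = torch.rand(8, 2)                                  # placeholder input data
y = torch.tensor([1., 3., 2., 2., 3., 1., 2., 3.])    # cluster labels

net = torch.nn.Linear(2, 2)                           # hypothetical stand-in for the real net
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

before = net.weight.detach().clone()

optimizer.zero_grad()
loss = customized_loss(net(X), y)   # loss computed on the network output
loss.backward()                     # autograd fills .grad on the parameters
optimizer.step()                    # SGD update using those gradients

print(loss.item())
```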

Thanks for any answer, or possible hint.