Error when implementing RBF kernel bandwidth differentiation in PyTorch

I’m implementing an RBF network based on some of the beginner examples from the PyTorch website. I run into a problem when I try to differentiate with respect to the kernel bandwidth. I would also like to know whether my approach is sound. Here is a code sample that reproduces the issue. Thanks

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

def kernel_product(x,y, mode = "gaussian", s = 1.):
    x_i = x.unsqueeze(1)
    y_j = y.unsqueeze(0)
    xmy = ((x_i-y_j)**2).sum(2)

    if   mode == "gaussian" : K = torch.exp( - xmy/s**2 )
    elif mode == "laplace"  : K = torch.exp( - torch.sqrt(xmy + (s**2)))
    elif mode == "energy"   : K = torch.pow(   xmy + (s**2), -.25 )

    return torch.t(K)

class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        # Save the input so that backward() can read it from ctx.saved_tensors.
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
#    y_pred = relu(
    y_pred = relu(kernel_product(w1, x, s)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()

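    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()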

However, I get the error below. It disappears when I simply rely on the fixed scalar default of the s parameter of kernel_product():

RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
 * (float other)
      didn't match because some of the arguments have invalid types: (str)
 * (Variable other)
      didn't match because some of the arguments have invalid types: (str)

Thank you for your help

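In the following line you are missing the mode parameter to kernel_product, so s is passed where mode is expected and the comparison mode == "gaussian" ends up comparing a Variable with a string, which is what eq() complains about.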

y_pred = relu(kernel_product(w1, x, s)).mm(w2)

This would be better:

y_pred = relu(kernel_product(w1, x, "gaussian", s)).mm(w2)
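
To see how the arguments of the two calls get bound, here is a small sketch using only the standard library. The function below is a stub that copies the signature of kernel_product from the question (its body is left out), and the strings "w1", "x" and "s_variable" are placeholders standing in for the actual Variables, so this only illustrates the argument binding and not the kernel itself.

```python
import inspect

def kernel_product(x, y, mode="gaussian", s=1.):
    """Stub with the same signature as the function in the question."""
    return mode, s

sig = inspect.signature(kernel_product)

# Original call: the third positional argument lands in `mode`, not in `s`.
wrong = sig.bind("w1", "x", "s_variable")
wrong.apply_defaults()
print(wrong.arguments["mode"], wrong.arguments["s"])   # s_variable 1.0

# Passing the bandwidth by keyword does not depend on the argument order.
right = sig.bind("w1", "x", s="s_variable")
right.apply_defaults()
print(right.arguments["mode"], right.arguments["s"])   # gaussian s_variable
```

Calling kernel_product(w1, x, s=s) would therefore be another way to write the fix, if you prefer to keep the default mode.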

I don’t see where the bandwidth gets updated in this case. How do you update it?
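
One possible way to update the bandwidth is to treat s like the other weights and include it in the gradient step. The following is only a minimal sketch under some assumptions that are not in the original post: it uses a recent PyTorch version (plain tensors with requires_grad=True instead of Variable, and torch.no_grad() for the update), runs on the CPU, uses torch.relu in place of MyReLU, and picks small sizes, a starting value of 3.0 for s and a learning rate of 1e-3 purely for illustration.

```python
import torch

torch.manual_seed(0)

def kernel_product(x, y, mode="gaussian", s=1.):
    # Pairwise squared distances between the rows of x and the rows of y.
    xmy = ((x.unsqueeze(1) - y.unsqueeze(0)) ** 2).sum(2)
    if mode == "gaussian":
        K = torch.exp(-xmy / s ** 2)
    elif mode == "laplace":
        K = torch.exp(-torch.sqrt(xmy + s ** 2))
    elif mode == "energy":
        K = torch.pow(xmy + s ** 2, -.25)
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return K.t()

# Small illustrative sizes (assumption; the question uses 64, 1000, 100, 10).
N, D_in, H, D_out = 8, 5, 4, 3
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(H, D_in, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)
s = torch.tensor([3.0], requires_grad=True)  # bandwidth, illustrative start value

learning_rate = 1e-3  # illustrative value
s_start = s.item()
for t in range(20):
    # Pass the mode explicitly so that s really ends up in the s parameter.
    y_pred = torch.relu(kernel_product(w1, x, "gaussian", s)).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    loss.backward()

    # Gradient step on the weights and on the bandwidth, then reset the gradients.
    with torch.no_grad():
        for p in (w1, w2, s):
            p -= learning_rate * p.grad
            p.grad.zero_()

print(s_start, s.item())  # the bandwidth has moved away from its start value
```

Since s only enters the kernels as s**2, its sign does not matter here, but a value close to zero makes the Gaussian kernel collapse, so in practice you may want to start from a reasonable positive value rather than from torch.randn(1).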