RBFN: Getting the same result in each epoch

szZzr · January 9, 2020, 3:16pm

Hi, I am a beginner trying to train an RBF network on the MNIST dataset with PyTorch. The results are the same in every epoch.

Like this:

Epoch: 1  
Accuracy: 0.815 	Loss: 5.701 	Recall: 0.507 	Precision: 0.340


Epoch: 2  
Accuracy: 0.815 	Loss: 5.628 	Recall: 0.507 	Precision: 0.340


Epoch: 3  
Accuracy: 0.815 	Loss: 5.570 	Recall: 0.507 	Precision: 0.340


Epoch: 4  
Accuracy: 0.815 	Loss: 5.523 	Recall: 0.507 	Precision: 0.340


Epoch: 5  
Accuracy: 0.815 	Loss: 5.486 	Recall: 0.507 	Precision: 0.340


Epoch: 6  
Accuracy: 0.815 	Loss: 5.456 	Recall: 0.507 	Precision: 0.340

This happens with several RBF settings. I have changed how the centers and sigmas are initialised, the number of clusters, the batch size, and the learning rate, but the outcome is still the same: the results of the first epoch are repeated in the following epochs. Sometimes only the loss changes, as in the output above.

Here is my code:

import torch
import torch.nn as nn

class RBF(nn.Module):
    def __init__(self, in_layers, centers, sigmas):
        super(RBF, self).__init__()
        self.in_layers = in_layers[0]
        # Trainable RBF centers, one row per cluster
        self.centers = nn.Parameter(centers)
        # Trainable per-center scale factors; note that `sigmas` is not used in this version
        self.dists = nn.Parameter(torch.ones(1,centers.size(0)))
        # self.linear0 = nn.Linear(in_layers[0], in_layers[0], bias = True)
        self.linear1 = nn.Linear(centers.size(0), in_layers[1], bias = True)

    def forward(self, x):
        phi = self.radial_basis(x)
        out = torch.sigmoid(self.linear1(phi.float()))
        return out

    def radial_basis(self,x):
        # Tile centers and inputs to a common shape (batch, num_centers, features)
        c = self.centers.view(self.centers.size(0),-1).repeat(x.size(0), 1, 1)
        x = x.view(x.size(0),-1).unsqueeze(1).repeat(1, self.centers.size(0),1)
        # exp(-scale * Euclidean distance) for every (sample, center) pair
        phi = torch.exp(-self.dists.mul((c-x).pow(2).sum(2, keepdim=False).pow(0.5) ))
        return phi

I have also tried this version of radial_basis, with the same results:

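def radial_basis(self,x):
    # Assumes self.sigmas was set in __init__
    x = x.view(x.size(0),-1)
    size = [self.centers.size(0), x.size(0)]
    sigma = self.sigmas
    # Allocate on the same device as the input instead of relying on a global `device`
    dists = torch.empty(size, device=x.device)
    for i,c in enumerate(self.centers):
        c = c.reshape(-1,c.size(0))
        # Euclidean distance from every sample in the batch to center i
        temp = (x-c).pow(2).sum(-1).pow(0.5)
        dists[i] = temp
    dists = dists.permute(1,0)
    # Gaussian-style kernel; note that it uses the unsquared distance
    phi = torch.exp(-1*(dists/(2*sigma)))
    return phi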
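
One thing worth checking with either version is whether the basis activations are saturated. The following is a minimal sketch, not the poster's code: it computes the same exp(-scale * distance) features with torch.cdist and prints their range. The helper name rbf_features, the random 784-dimensional stand-in data, and the scale of 1 are assumptions made for illustration.

```python
import torch

def rbf_features(x, centers, beta):
    """Return exp(-beta * ||x - c||) for every (sample, center) pair."""
    x = x.view(x.size(0), -1)                    # (batch, features)
    centers = centers.view(centers.size(0), -1)  # (num_centers, features)
    dists = torch.cdist(x, centers)              # (batch, num_centers), Euclidean
    return torch.exp(-beta * dists)

torch.manual_seed(0)
x = torch.randn(32, 1, 28, 28)   # stand-in for a batch of MNIST-sized images
centers = torch.randn(10, 784)   # stand-in for 10 cluster centers
beta = torch.ones(1, 10)         # same initial value as `dists` in the class above

dists = torch.cdist(x.view(32, -1), centers)
phi = rbf_features(x, centers, beta)
print("distance range:", dists.min().item(), dists.max().item())
print("phi range:", phi.min().item(), phi.max().item())
```

If the distances are large compared with 1/beta, every activation is close to zero and the linear layer receives almost no signal. Whether that is what happens in this thread is not established; the sketch only shows how to look.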

And the training method is below:

def training(engine, batch, device, model, criterion, optimizer):
    inputs, labels = batch[0].to(device), batch[1].to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    return outputs, labels

I am not sure whether it is a problem with the architecture or something with the weights. It is as if the backward() function is not working.
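
Whether backward() is doing anything can be checked directly by inspecting the gradients after one backward pass. The sketch below uses a hypothetical helper grad_report and a small stand-in model rather than the RBF class from this post; the same loop over named_parameters() should work on any nn.Module.

```python
import torch
import torch.nn as nn

def grad_report(model):
    """Map each parameter name to the L2 norm of its gradient (None if absent)."""
    report = {}
    for name, param in model.named_parameters():
        report[name] = None if param.grad is None else param.grad.norm().item()
    return report

torch.manual_seed(0)
# Stand-in model and data, assumed for illustration only
model = nn.Sequential(nn.Linear(8, 4), nn.Sigmoid(), nn.Linear(4, 3))
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))

model.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()

report = grad_report(model)
for name, norm in report.items():
    print(f"{name}: grad norm = {norm}")
```

A value of None means the parameter never received a gradient. Norms that are present but very close to zero point to vanishing gradients rather than a broken backward().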

albanD · January 9, 2020, 3:24pm

Hi,

Your loss seems to be coming down, but very very slowly. Have you tried increasing the learning rate to speed it up?

szZzr · January 9, 2020, 5:19pm

Yes, I have tried learning rates from 1 down to 0.001, but nothing changes. I have some runs in which the loss stays exactly the same.

Maybe some parameters are not defined the right way?

The centers are estimated with k-means, but I have also tried random centers and centers initialised from a Gaussian distribution, with the same results.

albanD · January 9, 2020, 5:40pm

But the loss is going down? It is just that the accuracy did not change yet.

szZzr · January 9, 2020, 5:45pm

Yes! The accuracy and the other metrics, such as precision and recall, stay the same, and the loss barely changes. You can check the results below.
These are the settings:

    batch = 10
    classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
    dim = 28*28
    learning_rate = 1
    epochs = 14
    clusters = 10
    executions = 0
    momentum = 0.9

and the results:

Epoch: 1  
Accuracy: 0.815 	Loss: 5.280 	Recall: 0.507 	Precision: 0.340


Epoch: 2  
Accuracy: 0.815 	Loss: 5.276 	Recall: 0.507 	Precision: 0.340


Epoch: 3  
Accuracy: 0.815 	Loss: 5.274 	Recall: 0.507 	Precision: 0.340


Epoch: 4  
Accuracy: 0.815 	Loss: 5.274 	Recall: 0.507 	Precision: 0.340


Epoch: 5  
Accuracy: 0.815 	Loss: 5.273 	Recall: 0.507 	Precision: 0.340


Epoch: 6  
Accuracy: 0.815 	Loss: 5.273 	Recall: 0.507 	Precision: 0.340


Epoch: 7  
Accuracy: 0.815 	Loss: 5.273 	Recall: 0.507 	Precision: 0.340

Do you think that this is normal?

albanD · January 9, 2020, 5:55pm

Given how small the change in loss is, I am not that surprised that the other metrics do not move.
You want to find out why your loss is changing so slowly.
Have you tried a higher lr, like 100 or 1000? You can increase it until your model actually diverges.
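
The suggestion above amounts to a learning-rate sweep. A minimal sketch on a toy regression problem follows; the function name lr_sweep, the toy data, and the step count are illustrative assumptions and not part of the code in this thread.

```python
import math
import torch
import torch.nn as nn

def lr_sweep(learning_rates, steps=20):
    """Train a fresh toy model at each learning rate; return (initial, final) loss."""
    torch.manual_seed(0)
    inputs = torch.randn(64, 5)
    targets = inputs @ torch.randn(5, 1)
    criterion = nn.MSELoss()
    results = {}
    for lr in learning_rates:
        torch.manual_seed(1)  # same initial weights for every run
        model = nn.Linear(5, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        initial = criterion(model(inputs), targets).item()
        for _ in range(steps):
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        final = criterion(model(inputs), targets).item()
        results[lr] = (initial, final)
    return results

results = lr_sweep([1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0])
for lr, (initial, final) in results.items():
    diverged = (not math.isfinite(final)) or final > initial
    status = "diverged" if diverged else "ok"
    print(f"lr={lr:g}: loss {initial:.4f} -> {final:.4f} ({status})")
```

On a healthy model the loss should react clearly to the learning rate, up to outright divergence. If it does not react at all, the learning rate is probably not the limiting factor.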

szZzr · January 9, 2020, 6:00pm

I am convinced that the model is being re-initialised, but I don't know how.
Look at this run with lr = 100:

Device: cuda:0
/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py:53: UserWarning: train_data has been renamed data
  warnings.warn("train_data has been renamed data")
Epoch: 1  
Accuracy: 0.813 	Loss: 5.271 	Recall: 0.000 	Precision: 0.000


Epoch: 2  
Accuracy: 0.813 	Loss: 5.271 	Recall: 0.000 	Precision: 0.000


Epoch: 3  
Accuracy: 0.813 	Loss: 5.271 	Recall: 0.000 	Precision: 0.000


Epoch: 4  
Accuracy: 0.813 	Loss: 5.271 	Recall: 0.000 	Precision: 0.000


Epoch: 5  
Accuracy: 0.813 	Loss: 5.271 	Recall: 0.000 	Precision: 0.000

Exactly the same loss. It seems that something is going badly wrong.

And just now I tried lr=1000: I get the same results as with lr=100 and lr=2, without the slightest difference.
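
The suspicion that the weights are not really being updated can be tested by comparing parameter values before and after an optimizer step. This is a sketch with hypothetical helpers snapshot and param_deltas and a stand-in model; in a real script the snapshot would be taken at the start of an epoch and compared at the end.

```python
import torch
import torch.nn as nn

def snapshot(model):
    """Detached copy of every parameter, keyed by name."""
    return {name: p.detach().clone() for name, p in model.named_parameters()}

def param_deltas(before, model):
    """Largest absolute change of each parameter since the snapshot."""
    return {name: (p.detach() - before[name]).abs().max().item()
            for name, p in model.named_parameters()}

torch.manual_seed(0)
# Stand-in model, optimizer and data, assumed for illustration only
model = nn.Linear(6, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.MSELoss()
inputs, targets = torch.randn(8, 6), torch.randn(8, 2)

before = snapshot(model)
optimizer.zero_grad()
criterion(model(inputs), targets).backward()
optimizer.step()

deltas = param_deltas(before, model)
for name, delta in deltas.items():
    print(f"{name}: max change = {delta:.6f}")
```

If every change is exactly zero while the gradients are non-zero, one possibility is that the optimizer holds the parameters of a different model instance, for example a model that was created again after the optimizer was built.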

albanD · January 9, 2020, 6:40pm

Could you give a small (40/50 lines) code sample that reproduces this that I can run locally?

szZzr · January 9, 2020, 6:53pm

I have just committed all the code to my GitHub so you can check it.

Is that OK for you?