Slow training vs Tensorflow and sudden performance decrease on train data

Hi there,
I am writing a PyTorch implementation of Logic Tensor Networks for Semantic Image Interpretation, which has open-source Tensorflow code.

I managed to get the network together and it can train. I believe that I am correctly copying the hyperparameters for the optimiser and I also checked that the underlying math is correct. Therefore, I am fairly certain that I have correctly set everything up. However, I have noticed two things that I am struggling to explain:

  1. The network trains much slower than the Tensorflow implementation. The Tensorflow implementation reaches an average train performance of about 95% within 1000 steps, whereas for my code it requires ~3000 steps.
  2. The train performance suddenly flips once the performance increases.

I’ll give some more information about each, below.

Slow Network Training
I thought this could be down to hyperparameters for the optimiser. The paper uses RMSProp. I notice that the Tensorflow version has the following hyperparameters:

  • learning_rate: A Tensor or a floating point value. The learning rate.
  • decay: Discounting factor for the history/coming gradient
  • momentum: A scalar tensor.
  • epsilon: Small value to avoid zero denominator.
  • use_locking: If True use locks for update operation.
  • centered: If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False.
  • name: Optional name prefix for the operations created when applying gradients. Defaults to “RMSProp”

The PyTorch version has:

  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float, optional) – learning rate (default: 1e-2)
  • momentum (float, optional) – momentum factor (default: 0)
  • alpha (float, optional) – smoothing constant (default: 0.99)
  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
  • centered (bool, optional) – if True, compute the centered RMSProp, the gradient is normalized by an estimation of its variance
  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

Following the paper, for the PyTorch RMSProp hyperparameters I use:

  • LR = 0.01
  • REGULARISATION = 1e-15
  • ALPHA = 0.9
  • EPSILON = 1e-10

I am assuming that:

  • alpha is the equivalent of the Tensorflow decay parameter
  • weight_decay is the regularisation, which Tensorflow requires to be added externally to the loss
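One part of this mapping that may be worth double-checking is epsilon. As far as I can tell from the TF 1.x op documentation, the RMSProp update divides by sqrt(mean_square + epsilon), whereas PyTorch's RMSprop divides by sqrt(square_avg) + eps. If that reading is right, the same EPSILON = 1e-10 gives a denominator floor of about 1e-5 in Tensorflow but only 1e-10 in PyTorch, so steps can be much larger when gradients are tiny. The sketch below is plain Python rather than torch, with hypothetical function names and an illustrative gradient value, just to show the size of the difference for a single scalar parameter:

```python
import math

LR = 0.01
ALPHA = 0.9
EPSILON = 1e-10

def tf_style_step(grad, mean_square, lr=LR, eps=EPSILON):
    # Assumed TF 1.x form: epsilon is added inside the square root.
    return lr * grad / math.sqrt(mean_square + eps)

def torch_style_step(grad, mean_square, lr=LR, eps=EPSILON):
    # PyTorch form: eps is added after the square root.
    return lr * grad / (math.sqrt(mean_square) + eps)

# Illustrative tiny gradient, e.g. from a nearly saturated sigmoid.
grad = 1e-7
# Running average after one update starting from zero.
mean_square = (1 - ALPHA) * grad ** 2

tf_step = tf_style_step(grad, mean_square)
torch_step = torch_style_step(grad, mean_square)
ratio = torch_step / tf_step
print(f"tf-style step:    {tf_step:.3e}")
print(f"torch-style step: {torch_step:.3e}")
print(f"ratio:            {ratio:.1f}")
```

If this turns out to matter, one thing to try would be a larger eps on the PyTorch side (around sqrt(1e-10) = 1e-5) and seeing whether the sudden flips go away. That is a guess to test, not a confirmed fix.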

In the paper’s code, the optimiser is initialised here (I can’t hyperlink as I am limited to two links per post): logictensornetworks.py#L21. The regularisation is implemented here: logictensornetworks.py#L53. The relevant hyperparameters are partly defined when the optimiser is initialised and then also here: pascalpart.py#L8
(Note: the hyperparameters get defined a few times in different places, which is confusing, but I checked and the reference I am giving is where the final values are set).

Performance flipping
I use the harmonic mean to calculate average performance, which is very sensitive to lower values. But this still doesn’t explain why performance would suddenly flip. I know that setting a bad learning rate can lead to divergence, but I am seeing steady (but slow) performance increase and then sudden change when performance is high. Moreover, I am copying the paper’s learning rate and have tested their code, which is both fast and doesn’t show the erratic performance behaviour. Here is a trace of the output showing how the performance changes:

15:00:38 [root ] [DEBUG ] : ======Iteration: 1799======
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.992330014705658
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.992296040058136
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.9187692403793335
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.9190800189971924
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.9014581441879272
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.9029223918914795
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.9162445664405823
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.9166520237922668
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8137131333351135
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.8152325749397278
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9676965475082397
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9682645201683044
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.8954532146453857
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.895622968673706
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.927297055721283
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.9281954765319824
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9460874795913696
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.9469890594482422
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.8196842074394226
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.820656418800354
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.889803409576416
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.8908553123474121
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.9138585925102234
15:00:38 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.9138901829719543
15:00:38 [root ] [DEBUG ] : ==Score: 0.9060626029968262==
15:00:38 [root ] [DEBUG ] : ===Setting up data subsets===
15:00:39 [root ] [DEBUG ] : ======Iteration: 1800======
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.990585446357727
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9916159510612488
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.8670791387557983
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.7959716320037842
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.8995224833488464
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.8619083166122437
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.9155776500701904
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.7972513437271118
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8212026357650757
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.7968321442604065
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9789061546325684
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.981188952922821
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.8427695631980896
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.7958213090896606
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.7628504037857056
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.013370582833886147
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.941319465637207
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.11598806828260422
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.7932454943656921
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.8461207747459412
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.8798862099647522
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.8369185924530029
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.8789774179458618
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.7778722643852234
15:00:39 [root ] [DEBUG ] : ==Score: 0.22021272778511047==
15:00:39 [root ] [DEBUG ] : ======Iteration: 1801======
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9906085729598999
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9916187524795532
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.8675155639648438
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.7966839671134949
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.8989664316177368
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.8625910878181458
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.9151699542999268
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.797529935836792
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8211848735809326
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.7974907159805298
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9789376258850098
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9812048077583313
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.8427335023880005
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.7968791127204895
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.0010666713351383805
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.9761099815368652
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9172378778457642
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.22136850655078888
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.7935158014297485
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.8462190628051758
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.8791512846946716
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.8383476138114929
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.8790765404701233
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.7791911959648132
15:00:39 [root ] [DEBUG ] : ==Score: 0.02481084130704403==
15:00:39 [root ] [DEBUG ] : ======Iteration: 1802======
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9906294345855713
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.991621196269989
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.867911159992218
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.797331690788269
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.8984541893005371
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.8632128238677979
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.9147946834564209
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.797791600227356
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8211624026298523
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.798086941242218
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9789655208587646
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9812188744544983
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.8426967859268188
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.7978458404541016
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 9.303313163400162e-06
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.9993413686752319
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.8807621598243713
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.37268978357315063
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.7937597632408142
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.846305787563324
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.8784731030464172
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.8396425843238831
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.8791543841362
15:00:39 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.7804265022277832
15:00:39 [root ] [DEBUG ] : ==Score: 0.00022322138829622418==
The score then continues to fall to ~0.
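To illustrate how strongly a single collapsed clause dominates the harmonic mean, here is a minimal sketch in plain Python. The 23 values of 0.9 are made up for illustration. The one small value is the Clause_isOfType_screen score from iteration 1802 in the trace above.

```python
def harmonic_mean(scores):
    # n divided by the sum of reciprocals.
    return len(scores) / sum(1.0 / s for s in scores)

healthy = [0.9] * 24
# Same scores, but one clause has collapsed to the value seen for
# Clause_isOfType_screen at iteration 1802.
one_collapsed = [0.9] * 23 + [9.303313163400162e-06]

h_healthy = harmonic_mean(healthy)
h_collapsed = harmonic_mean(one_collapsed)
print(h_healthy)
print(h_collapsed)
```

The second result is on the order of 1e-4, the same order as the overall score logged at iteration 1802, so the overall collapse is consistent with one or two clauses failing rather than all of them.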

Each score in the trace is the output of a predicate, computed by the code below. Then, following how the paper does it, the harmonic mean of the batch of outputs is taken. The final score is another harmonic mean taken over all predicate scores.

def compute(self, inference, input_):
    """
    Compute predicate grounding for input_
    """
    stacked_inputs = input_
    # Bilinear term: for each batch row x, x^T W[:, :, k] x for every k
    batch_h = torch.bmm(
        torch.einsum('bi,ijk->bkj', stacked_inputs, self.W),
        stacked_inputs.unsqueeze(-1)
    ).squeeze(-1)
    # Linear term; self.B broadcasts over the batch dimension
    mx_plus_b = torch.matmul(stacked_inputs, self.V) + self.B
    non_linear = self.tanh(batch_h + mx_plus_b)
    output = self.sigmoid(torch.matmul(non_linear, self.U))
    return output
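For reference, this is what the code above computes for a single input vector x, written out as a formula. The shapes are my reading of the code rather than something stated in the post: W is assumed to be n × n × k, V is n × k, B has length k, and U is k × 1.

```latex
\mathcal{G}(P)(\mathbf{x})
  = \sigma\!\left(
      \mathbf{u}^{\top}
      \tanh\!\left(
        \mathbf{x}^{\top} W^{[1:k]} \mathbf{x}
        + V^{\top} \mathbf{x}
        + \mathbf{b}
      \right)
    \right),
\qquad
\left(\mathbf{x}^{\top} W^{[1:k]} \mathbf{x}\right)_{m}
  = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i \, W_{ijm} \, x_j .
```

Because the bilinear term grows quadratically with the input and the result passes through both tanh and a sigmoid, a large update to W, V, or U can push the output of one predicate close to 0 or 1 in a single step.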

I’ve been working on this for a while and am quite confused. Any help would be great and if any more info is needed let me know.

Thanks!

Regarding the performance flip, it looks like the prediction for Clause_isOfType_screen is going down constantly.
Could this be the reason the overall performance is decreasing?

I ran it again to see if the same prediction leads the way and this is what happened:

08:38:31 [root ] [DEBUG ] : ======Iteration: 1199======
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9259634017944336
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9279131293296814
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.8313666582107544
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.8347911834716797
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.7694564461708069
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.7764528393745422
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.7411538362503052
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.8146580457687378
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.615966260433197
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.740339994430542
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9376562833786011
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9397470951080322
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.8468390107154846
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.8444467782974243
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.8084747791290283
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.8111011981964111
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9456685185432434
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.9461091756820679
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.7411292195320129
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.7432351112365723
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.7659196853637695
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.7669181227684021
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.9910631775856018
08:38:31 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.9906010031700134
08:38:31 [root ] [DEBUG ] : ==Score: 0.8247549533843994==
08:38:31 [root ] [DEBUG ] : ===Setting up data subsets===
08:38:32 [root ] [DEBUG ] : ======Iteration: 1200======
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9267792105674744
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9048082232475281
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.7979954481124878
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.8168696165084839
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.7883314490318298
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.7752796411514282
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.8315651416778564
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.6867634654045105
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.7269256114959717
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.5743570923805237
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9248771667480469
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9490529298782349
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.7744091749191284
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.8416061997413635
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.7160218358039856
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.8044382929801941
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9570673704147339
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.0
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.7532014846801758
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.0010712684597820044
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.7633653283119202
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.7567595839500427
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.1235593631863594
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.00486149825155735
08:38:32 [root ] [DEBUG ] : ==Score: 0.0==
08:38:32 [root ] [DEBUG ] : ======Iteration: 1201======
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9267792105674744
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9048082232475281
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.7979954481124878
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.8168696165084839
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.7883314490318298
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.7752796411514282
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.8315651416778564
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.6867634654045105
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.7269256114959717
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.5743570923805237
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9248771667480469
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9490529298782349
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.7744091749191284
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.8416061997413635
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.7160218358039856
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.8044382929801941
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.7532014846801758
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.0010712684597820044
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.7633653283119202
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.7567595839500427
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.1235593631863594
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.00486149825155735
08:38:32 [root ] [DEBUG ] : ==Score: nan==
08:38:32 [root ] [DEBUG ] : ======Iteration: 1202======
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: nan
08:38:32 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: nan
08:38:32 [root ] [DEBUG ] : ==Score: nan==

In this run Clause_isOfType_screen was OK, but other clauses dropped in performance and eventually hit zero, which I think means the hmean divides by zero and produces the nan.
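A minimal sketch of the division-by-zero case, in plain Python. The floor argument is a hypothetical guard of my own and is not taken from the paper or the Tensorflow code. Note that plain Python raises an exception on 1.0 / 0.0, whereas float tensors in torch return inf. With inf the harmonic mean itself would evaluate to 0.0, as logged at iteration 1200, and the nan may then come from the backward pass through the reciprocal rather than from the forward division.

```python
def harmonic_mean(scores, floor=0.0):
    # Optionally clamp each score from below before taking reciprocals.
    clamped = [max(s, floor) for s in scores]
    return len(clamped) / sum(1.0 / s for s in clamped)

# Illustrative scores with one clause at exactly zero.
scores = [0.95, 0.0, 0.76]

try:
    harmonic_mean(scores)
    raised = False
except ZeroDivisionError:
    raised = True

# With a small floor the mean stays finite and strictly positive.
safe = harmonic_mean(scores, floor=1e-7)
print(raised, safe)
```

A floor like this only keeps the value finite. It does not explain why a clause reaches zero in the first place.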

I then ran it again and this time it reached the end without turning to nan or converging on zero performance, but it did have erratic jumps where, when training on the same data, the score would fall quite dramatically for different predictions. I’m including two examples below:

08:52:57 [root ] [DEBUG ] : ======Iteration: 2961======
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9987756013870239
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9986613988876343
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.9381701350212097
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.9401543140411377
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.9116418361663818
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.9101962447166443
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.934924840927124
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.9366714358329773
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8942890167236328
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.8937970399856567
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9987015724182129
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9985659122467041
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.9003317952156067
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.9020853042602539
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.9498090744018555
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.8289201855659485
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.968237042427063
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.972324013710022
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.8907787799835205
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.8876224756240845
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.9246724843978882
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.9211363792419434
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.9914373755455017
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.98811274766922
08:52:57 [root ] [DEBUG ] : ==Score: 0.9345283508300781==
08:52:57 [root ] [DEBUG ] : ======Iteration: 2962======
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9987846612930298
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9986718893051147
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.9385195374488831
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.93996262550354
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.9118189215660095
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.9104220271110535
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.9350184202194214
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.936826765537262
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8944593667984009
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.8939589262008667
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9987120032310486
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9985832571983337
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.9006354212760925
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.9023823142051697
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.26709312200546265
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.990290105342865
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9687371850013733
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.9726874232292175
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.8908247351646423
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.8878068923950195
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.9247756004333496
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.9212349057197571
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.9916152358055115
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.9883267283439636
08:52:57 [root ] [DEBUG ] : ==Score: 0.8519101142883301==
Here screen is the cause. In the next one pottedplant is the cause:
08:52:57 [root ] [DEBUG ] : ======Iteration: 2969======
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9988484382629395
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9987477660179138
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.9394762516021729
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.9401536583900452
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.9132775664329529
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.9119341373443604
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.9840930700302124
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.7262876033782959
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8957027792930603
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.8951016664505005
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.9987808465957642
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9986937642097473
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.9026285409927368
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.9042335748672485
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.9000895023345947
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.9194599986076355
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9719547033309937
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.9750593304634094
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.8911618590354919
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.8890736699104309
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.9251247644424438
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.9222541451454163
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.9927259683609009
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.9897489547729492
08:52:57 [root ] [DEBUG ] : ==Score: 0.9285111427307129==
08:52:57 [root ] [DEBUG ] : ======Iteration: 2970======
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_bottle Score: 0.9988582730293274
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_bottle Score: 0.9987596273422241
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_body Score: 0.9395594000816345
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_body Score: 0.9402576684951782
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_cap Score: 0.9130504131317139
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_cap Score: 0.9126277565956116
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pottedplant Score: 0.3291058838367462
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pottedplant Score: 0.9938804507255554
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_plant Score: 0.8959022164344788
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_plant Score: 0.895273745059967
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_pot Score: 0.998790979385376
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_pot Score: 0.9987090229988098
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_tvmonitor Score: 0.9029027819633484
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_tvmonitor Score: 0.9044926762580872
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.9034289121627808
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.9166265726089478
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_chair Score: 0.9724078178405762
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_chair Score: 0.9753999710083008
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_sofa Score: 0.8912268280982971
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_sofa Score: 0.8892565369606018
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_diningtable Score: 0.9251675605773926
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_diningtable Score: 0.9224172830581665
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_partOf Score: 0.9929397106170654
08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_partOf Score: 0.9898926615715027
08:52:57 [root ] [DEBUG ] : ==Score: 0.8728839755058289==

It’s definitely true that as I am using the harmonic mean, only one value needs to fall for the overall performance to change a lot. But I can’t explain why it would jump up and down like this, and why it would sometimes continue to fall to the point where every prediction has zero performance.
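To make it easier to see which clause leads each jump across many iterations, the trace can be parsed automatically. This is a rough sketch that assumes the log format shown in this post. The function names are hypothetical, and the sample lines are copied from iterations 2961 and 2962.

```python
import re

ITER_RE = re.compile(r"======Iteration: (\d+)======")
CLAUSE_RE = re.compile(r"==Clause: (\S+) Score: (\S+)")

def parse_trace(lines):
    """Return {iteration: {clause: score}} from DEBUG trace lines."""
    scores, current = {}, None
    for line in lines:
        it = ITER_RE.search(line)
        if it:
            current = int(it.group(1))
            scores[current] = {}
            continue
        m = CLAUSE_RE.search(line)
        if m and current is not None:
            scores[current][m.group(1)] = float(m.group(2))
    return scores

def largest_drop(scores, before, after):
    """Return the clause whose score fell the most, and by how much."""
    drops = {c: scores[before][c] - scores[after][c]
             for c in scores[before] if c in scores[after]}
    worst = max(drops, key=drops.get)
    return worst, drops[worst]

sample = [
    "08:52:57 [root ] [DEBUG ] : ======Iteration: 2961======",
    "08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.9498090744018555",
    "08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.8289201855659485",
    "08:52:57 [root ] [DEBUG ] : ======Iteration: 2962======",
    "08:52:57 [root ] [DEBUG ] : ==Clause: Clause_isOfType_screen Score: 0.26709312200546265",
    "08:52:57 [root ] [DEBUG ] : ==Clause: Clause_neg_isOfType_screen Score: 0.990290105342865",
]

parsed = parse_trace(sample)
worst, drop = largest_drop(parsed, 2961, 2962)
print(worst, round(drop, 4))
```

In the examples in this post, a fall in a clause often comes with a rise in its negated counterpart (screen at 2962, pottedplant at 2970), so logging the raw predicate outputs alongside the clause scores might show whether the predicate itself is flipping from one step to the next.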

Still stuck on this, would be great to get some new ideas to try so that I can find the problem. Cheers!

Hi @ptrblck, do you know how I can get some help on investigating this? Because the performance seems to converge I feel like I have done the general setup correctly, but I can’t explain why the performance should then break. Thanks!