Are there any changes that I should make to self.model to get the same results?

I’m trying to reproduce, in Python 3, the results of this study: Evaluating Semantic Parsing against a Simple Web-based Question Answering Model by Alon Talmor, Mor Geva, Jonathan Berant.

The original code is in Python 2 and uses torch 0.1.10. I’m having trouble updating the training part of the code, specifically the autograd.Variable part, and I’m getting a different F1 score from theirs, as well as different fx values, even though question_train / x_val contain the same values as in the original. Are there any changes that I should make in self.model?
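For context, here is my understanding of the autograd.Variable change — a minimal sketch (not the paper's code), assuming PyTorch >= 0.4 where Variable was merged into Tensor:

```python
import torch

# Old (PyTorch 0.1.x): inputs had to be wrapped in autograd.Variable, e.g.
#   x = autograd.Variable(torch.FloatTensor(data))
# Modern (PyTorch >= 0.4): a plain tensor works directly; requires_grad is
# only needed on leaf tensors you want gradients for (model parameters
# already have it set).
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
w = torch.ones(2, requires_grad=True)

loss = (x @ w).sum()
loss.backward()   # gradients flow without any Variable wrapper
print(w.grad)     # tensor([4., 6.])
```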

Here is a snippet of the original training code:

  def train(self, x_val, y_val):

        fx = self.model.forward(x_val).resize(MAX_NUM_OF_CANDIDATES)

        output = self.loss.forward(fx, y_val)

        # Backward (the body was cut when I pasted; the standard pattern is)
        self.optimizer.zero_grad()
        output.backward()

        # Update parameters
        self.optimizer.step()

        return {'loss':output,'fx':fx}

  def train_maxent_model(self,ablation_filter=None):
        all_weights = pd.DataFrame()

        if DO_PARAM_TEST:
            param_list = []
            for LR in [2e-2, 5e-2, 12e-2]:
                for decay in [1e-7, 1e-6]:
                    for L2 in [3e-4, 8e-4, 20e-4]:
                        param_list.append((LR, decay, L2))  # loop body lost when pasting; reconstructed from param[0..2] below
        else:
            param_list = range(0, NUM_OF_RANDOM_SPLITS)

        for param in param_list:
            print 'new param  = ' + str(param)
            self.run_start_time = time.time()
            self.max_score = 0

            # loading testing and training datasets
            # loading data:
            if DO_PARAM_TEST:
                pass  # (data-loading branch elided from this snippet)
            print 'running on ' + self.feat_filename  + '!' + self.filter_filename

            self.model = nn.Conv1d(NUM_OF_FEATURES, 1, 1, stride=1)
            self.loss = torch.nn.CrossEntropyLoss(size_average=True)
            if DO_PARAM_TEST:
                self.optimizer = optim.Adagrad(self.model.parameters(), lr=param[0], lr_decay=param[1],
                                               weight_decay=param[2])  # call truncated when pasting; weight_decay assumed
            else:
                self.optimizer = optim.Adagrad(self.model.parameters(), lr=ADA_GRAD_LR, lr_decay=ADA_GRAD_LR_DECAY,
                                               weight_decay=ADA_GRAD_L2)  # trailing args truncated; constant name assumed

            training_selection_vec = []

            # learning rounds (each round is a full training learning pass, and could contain one dev pass)
            for round in range(0,NUM_OF_TRAINING_ROUNDS):
                average_train_loss = 0
                number_trained = 0

                # learning from training
                for train_sample,targets,meta,i in zip(data['train']['features'], data['train']['target'],
                                                       data['train']['meta'], range(0,TRAINING_SIZE)):  # the container name ('data' here) was lost when pasting
                    # pruning learning examples:
                    #if round>40 and i not in training_selection_vec:
                    #    continue

                    number_trained += 1

                    question_train = train_sample.view(MAX_NUM_OF_CANDIDATES, NUM_OF_FEATURES, 1)

                    # Note: we can have more than one true target in the spans
                    span_indexes = [int(target['span_index']) for target in targets]
                    target_tensor = torch.LongTensor(span_indexes)
                    target = autograd.Variable(target_tensor)

                    out_dict = self.train(question_train, target)

What I’ve tried so far: I ran the code through the 2to3 tool, and for autograd.Variable I removed the wrapper and just used torch.tensor() instead. I also checked that the values of question_train in the updated code match the values in the original. The F1 score in their study is 51.4, but I’m getting 51.132. Thank you in advance.
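For reference, this is roughly how I ported the train step — a sketch under my own assumptions, not the authors' code: the sizes are small placeholders (not the paper's settings), .resize() is replaced with .view(), the size_average= argument is dropped (removed in newer PyTorch), and I seed the RNG so repeated runs of my port at least match each other:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder sizes for the sketch, not the paper's constants.
MAX_NUM_OF_CANDIDATES = 4
NUM_OF_FEATURES = 3

torch.manual_seed(0)  # fix the RNG so repeated runs are comparable
model = nn.Conv1d(NUM_OF_FEATURES, 1, 1, stride=1)
loss_fn = nn.CrossEntropyLoss()  # size_average= no longer exists
optimizer = optim.Adagrad(model.parameters(), lr=5e-2)

def train_step(x_val, y_val):
    # .resize(MAX_NUM_OF_CANDIDATES) -> .view(1, MAX_NUM_OF_CANDIDATES):
    # CrossEntropyLoss expects (N, C) logits and (N,) class-index targets.
    fx = model(x_val).view(1, MAX_NUM_OF_CANDIDATES)
    output = loss_fn(fx, y_val)
    optimizer.zero_grad()
    output.backward()
    optimizer.step()
    return {'loss': output.item(), 'fx': fx.detach()}

# One candidate set: (candidates, features, length-1 "sequence") as in the original view().
x = torch.randn(MAX_NUM_OF_CANDIDATES, NUM_OF_FEATURES, 1)
y = torch.tensor([2])  # plain LongTensor index, no Variable wrapper
out = train_step(x, y)
```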