Input for decoder

If we consider the architecture of the Transformer, we see two inputs: one (src) for the encoder and another (tgt) for the decoder. In my network, a sequence of numbers is sent to the encoder input (like a sentence made of words), but I have nothing to feed to the decoder. At the output of the decoder I should have three neurons, and as the target I tell the network which neuron gives the correct answer (as in a softmax classification task). I can’t figure out what to feed into the decoder. Can I pass torch.zeros(1, batch, d_model) there? Help me solve my problem.
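
Just to make the shapes concrete, here is a minimal sketch of what I mean (the sizes are the ones from my setup below, and the zeros tgt is exactly the workaround I am asking about):

import torch
import torch.nn as nn

# Minimal shape sketch: nn.Transformer expects src of shape (S, N, E) and tgt of shape (T, N, E)
d_model, nhead, batch = 128, 8, 30
model = nn.Transformer(d_model=d_model, nhead=nhead)

src = torch.rand(160, batch, d_model)   # encoder input: 160 time steps
tgt = torch.zeros(1, batch, d_model)    # decoder input: a single all-zero step (is this valid?)
out = model(src, tgt)
print(out.shape)                        # torch.Size([1, 30, 128])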

Maybe someone knows the answer …

If anyone has worked with the nn.Transformer module, please help me figure this out. I am ready for a paid consultation. My model is not learning; something in my code is not correct.

I’m not sure I understand the question completely.
Is it not possible to feed the output of the encoder into the decoder?

No. If you look at the documentation for nn.Transformer, you will see that it takes two required parameters, src and tgt: src goes to the encoder, tgt to the decoder. Their shapes are (S, N, E) and (T, N, E) respectively. In my example, src is available. I don’t have a tgt, but since this parameter is required, I pass tg = torch.zeros(1, batch, 128).to(device) to the decoder as tgt. At the output I have a softmax over 3 classes. My network cannot learn even the simplest example, so I have an error somewhere.

import numpy as np
import torch
import torch.nn as nn

wn1 = 160
batch = 30
epochs = 300
learning_rate = 0.00005

class Trans(nn.Module):
    def __init__(self, d_model=128, nhead=8, num_encoder_layers=4, num_decoder_layers=4, dim_feedforward=512):
        super().__init__()
        self.fc1 = nn.Linear(9, d_model)   # project the 9 input features to d_model
        self.tr = nn.Transformer(d_model, nhead, num_encoder_layers, num_decoder_layers, dim_feedforward)
        self.fc = nn.Linear(d_model, 3)    # 3 output classes
        
    def forward(self, src, tgt):
        out = self.fc1(src)                  # (S, N, 9) -> (S, N, d_model)
        out = self.tr(out, tgt)              # (T, N, d_model); here T = 1
        out = out.reshape(out.size(1), -1)   # (N, d_model), since T = 1
        out_nograd = self.fc(out)            # (N, 3) class scores
        
        return out_nograd
    
device = torch.device("cuda:0")
net = Trans()
net.to(device)

for p in net.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)
tg = torch.zeros(1, batch, 128).to(device)  # tgt: (T, N, E), a single all-zero decoder step

#net.load_state_dict(torch.load('save_cuda_2year_trans_v2_2.pth'))
criterion = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.AdamW(net.parameters(), lr=learning_rate)
st_s = np.interp(np.loadtxt('src.txt', delimiter=';'), [0, 10], [-1, 1])  # rescale inputs from [0, 10] to [-1, 1]
st_t = np.loadtxt('target.txt')                                           # class labels

len_st = len(st_t)
b = (len_st - wn1) // batch      # number of full batches of sliding windows
len_batch = b * batch + wn1
for epoch in range(epochs):
    for wn_start in range(0, len_batch - wn1, batch):
        wn_tick = wn_start + wn1
        wn_all = []
        los_l = []
        for b_iter in range(batch):
            wn_all.append(st_s[wn_start+b_iter:wn_tick+b_iter, :])   # sliding window of wn1 steps
            los_l.append(st_t[wn_tick+b_iter])                       # label of the step after the window
        los_l = np.array(los_l, dtype=np.int64)
        los_l = torch.from_numpy(los_l)
        los_l = los_l.type(torch.long).to(device)
        wn_all = np.array(wn_all, dtype=np.float32)
        wn_all = torch.from_numpy(wn_all)
        wn_all = torch.transpose(wn_all, dim0=1, dim1=0).to(device)  # (N, S, 9) -> (S, N, 9)
        outputs = net(wn_all, tg)
        loss1 = criterion(outputs, los_l)
        optimizer.zero_grad()
        loss1.backward()
        optimizer.step()
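
To double-check the shapes going into CrossEntropyLoss, I also run a quick sanity check with random data (just a sketch; the dummy_* names are placeholders):

# Quick shape sanity check: outputs should be (batch, 3), labels (batch,) with long dtype
dummy_src = torch.rand(wn1, batch, 9).to(device)      # (S, N, 9) before fc1
dummy_lbl = torch.randint(0, 3, (batch,)).to(device)  # random integer class labels
dummy_out = net(dummy_src, tg)
print(dummy_out.shape, criterion(dummy_out, dummy_lbl).item())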

Maybe I am missing '<sos>' and '<eos>' tokens?
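
Just to show what I have in mind: a minimal sketch where a learned start token replaces the zeros tensor as the single decoder step (the class name TransSOS and the details are my own assumption, not something I have verified):

import torch
import torch.nn as nn

class TransSOS(nn.Module):
    # Same model as above, but with a learned <sos> embedding as the only decoder step (an assumption)
    def __init__(self, d_model=128, nhead=8, num_encoder_layers=4, num_decoder_layers=4, dim_feedforward=512):
        super().__init__()
        self.fc1 = nn.Linear(9, d_model)
        self.tr = nn.Transformer(d_model, nhead, num_encoder_layers, num_decoder_layers, dim_feedforward)
        self.sos = nn.Parameter(torch.zeros(1, 1, d_model))   # learned start-of-sequence step
        self.fc = nn.Linear(d_model, 3)

    def forward(self, src):
        out = self.fc1(src)                          # (S, N, d_model)
        tgt = self.sos.expand(-1, src.size(1), -1)   # (1, N, d_model): one <sos> step per batch item
        out = self.tr(out, tgt)                      # (1, N, d_model)
        return self.fc(out.squeeze(0))               # (N, 3)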

Any ideas? :slight_smile: