Using PyTorch to play Rock, Paper, Scissors

I summarize here the problem I posted on Stack Exchange.

I am new to PyTorch and I’m trying to solve the following problem with it. I’m playing a game of Rock, Paper, Scissors.

There are two players, J1 and J2. The gains of J1 are:

J1 Rock - J2 Rock : 0$

J1 Rock - J2 Paper : -1$

J1 Rock - J2 Scissors : 2$

J1 Paper - J2 Scissors : -1$

and the opposite outcomes are mirrored (for example, J1 Scissors - J2 Rock : -2$). In brief, a win with Rock pays 2$ instead of 1$.
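Spelling out “and the opposite”, the gains of J1 form a 3x3 matrix. A minimal sketch, assuming the order Rock, Paper, Scissors and that each mirrored outcome simply flips the sign of the listed gain (the names `MOVES`, `LISTED` and `payoff_matrix` are hypothetical, chosen for illustration):

```python
MOVES = ["Rock", "Paper", "Scissors"]

# Gains of J1 listed above, as (J1 move, J2 move) -> gain in $
LISTED = {
    ("Rock", "Rock"): 0,
    ("Rock", "Paper"): -1,
    ("Rock", "Scissors"): 2,
    ("Paper", "Scissors"): -1,
}

def payoff_matrix():
    """Return J1's gains M[i][j] when J1 plays MOVES[i] and J2 plays MOVES[j]."""
    M = [[0] * 3 for _ in MOVES]
    for (a, b), gain in LISTED.items():
        i, j = MOVES.index(a), MOVES.index(b)
        M[i][j] = gain
        M[j][i] = -gain  # "and the opposite": the mirrored outcome flips the sign
    return M

for move, row in zip(MOVES, payoff_matrix()):
    print(f"{move:<9}", row)
```

Under these assumptions the matrix is [[0, -1, 2], [1, 0, -1], [-2, 1, 0]]. It is antisymmetric (M = -Mᵀ), so the game is zero-sum and symmetric between the two players.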

I am trying to make two machines play this game against each other to find the optimal distribution of choices between Rock, Paper and Scissors. Here is my PyTorch code:

import torch
import torch.nn as nn
import torch.optim as optim

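class Modele(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 5)
        self.fc2 = nn.Linear(5, 3)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = self.fc2(x)
        # One probability per move (Rock, Paper, Scissors) for each input row
        return torch.softmax(x, dim=1)

model1 = Modele()
model2 = Modele()

# Gains of J1: rows = J1's move, columns = J2's move (Rock, Paper, Scissors)
M = torch.tensor([[0., -1., 2.],
                  [1., 0., -1.],
                  [-2., 1., 0.]])

def discrete_loss(x, y):
    # Expected gain of J1 for each row i: x_i^T M y_i, averaged over the batch
    loss = torch.einsum('ix,iy,xy->i', x, y, M)
    return loss.mean()

optimizer1 = optim.SGD(model1.parameters(), lr=0.001)
optimizer2 = optim.SGD(model2.parameters(), lr=0.001)

n = 5
for i in range(10):
    # Noise inputs; only the model parameters need gradients
    z1 = torch.rand(n, 1)
    z2 = torch.rand(n, 1)
    x = model1(z1)
    y = model2(z2)

    # J1 maximizes its expected gain and J2 minimizes it (zero-sum game).
    # Detaching the opponent's output makes each loss update only its own model.
    loss1 = -discrete_loss(x, y.detach())
    loss2 = discrete_loss(x.detach(), y)

    optimizer1.zero_grad()
    optimizer2.zero_grad()
    loss1.backward()
    loss2.backward()
    optimizer1.step()
    optimizer2.step()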

Compared to the Stack Exchange version, I changed the loss to make it a little smoother. Nevertheless, my code does not work at all, and I have the impression that I don’t know the good practices for a basic PyTorch model. Could you test the code and point out my errors?
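For reference, the einsum in `discrete_loss`, `torch.einsum('ix,iy,xy->i', x, y, M)`, computes for each row i the expected gain of J1, xᵢᵀ M yᵢ, when J1 plays the distribution xᵢ and J2 plays yᵢ. A plain-Python sketch of the same quantity, assuming the Rock, Paper, Scissors order and the payoff matrix implied by the rules in the question (the names `expected_gain` and `batch_loss` are hypothetical):

```python
from fractions import Fraction

# J1's gains, rows = J1's move, columns = J2's move (Rock, Paper, Scissors)
M = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]

def expected_gain(x, y, M=M):
    """Expected gain of J1: sum over a, b of x[a] * y[b] * M[a][b]."""
    return sum(x[a] * y[b] * M[a][b] for a in range(3) for b in range(3))

def batch_loss(xs, ys):
    """Mean of the per-row expected gains, like discrete_loss(x, y)."""
    gains = [expected_gain(x, y) for x, y in zip(xs, ys)]
    return sum(gains) / len(gains)

rock, scissors = [1, 0, 0], [0, 0, 1]
uniform = [Fraction(1, 3)] * 3
print(expected_gain(rock, scissors))  # pure Rock against pure Scissors
print(batch_loss([rock, uniform], [scissors, uniform]))
```

Since `loss1 = -discrete_loss(x, y)`, minimizing it pushes J1 to maximize this expected gain, while minimizing `loss2` pushes J2 to reduce it.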

Rock, Paper, Scissors has no optimal distribution between the choices given all choices have an equal probability of winning and losing.
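To check this claim against the modified payoffs in the question, one can compute the average gain of each pure move against an opponent who plays each move with probability 1/3. A quick sketch, assuming the payoff matrix implied by the rules above (the name `gains_against` is hypothetical):

```python
from fractions import Fraction

MOVES = ["Rock", "Paper", "Scissors"]
# J1's gains, rows = J1's move, columns = J2's move
M = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]

def gains_against(opponent):
    """Average gain of each pure move of J1 against a mixed strategy of J2."""
    return [sum(p * g for p, g in zip(opponent, row)) for row in M]

uniform = [Fraction(1, 3)] * 3
for move, gain in zip(MOVES, gains_against(uniform)):
    print(move, gain)
```

With these payoffs the three moves are no longer equivalent: against the uniform mix, Rock earns 1/3$ per game on average, Paper 0 and Scissors -1/3$, so the uniform mix of the classic game can be exploited here.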

You might get a situation where one model edges out another, but likely that won’t last long as training continues.
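To illustrate this kind of back-and-forth, here is a sketch that swaps the neural networks for a much simpler learner: both players run the multiplicative-weights (Hedge) update on their mixed strategies, each boosting the moves that did well against the other’s current mix. This is not the code from the question; the payoff matrix is the one implied by the rules above, and the step size `eta` and the number of rounds are arbitrary assumptions. The current strategies need not settle down, but their running averages approach an equilibrium of the game.

```python
import math

# J1's gains, rows = J1's move, columns = J2's move (Rock, Paper, Scissors)
M = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]

def normalize(w):
    total = sum(w)
    return [v / total for v in w]

def hedge_self_play(rounds=100_000, eta=0.01):
    """Both players run Hedge; returns their time-averaged mixed strategies."""
    p = [1 / 3] * 3  # J1 (maximizes the gain)
    q = [1 / 3] * 3  # J2 (minimizes J1's gain)
    avg_p, avg_q = [0.0] * 3, [0.0] * 3
    for _ in range(rounds):
        avg_p = [a + v / rounds for a, v in zip(avg_p, p)]
        avg_q = [a + v / rounds for a, v in zip(avg_q, q)]
        # Expected gain of each pure move against the opponent's current mix
        gain_p = [sum(M[i][j] * q[j] for j in range(3)) for i in range(3)]
        gain_q = [-sum(p[i] * M[i][j] for i in range(3)) for j in range(3)]
        # Multiplicative update: moves with higher gain get more weight
        p = normalize([v * math.exp(eta * g) for v, g in zip(p, gain_p)])
        q = normalize([v * math.exp(eta * g) for v, g in zip(q, gain_q)])
    return avg_p, avg_q

avg_p, avg_q = hedge_self_play()
print("J1 average:", [round(v, 3) for v in avg_p])
print("J2 average:", [round(v, 3) for v in avg_q])
```

The standard Hedge regret bound guarantees that the gap between what the averaged strategies guarantee to each player shrinks roughly with `eta` and with `1 / (eta * rounds)`; a gradient-trained network pair has no such guarantee in general.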

However, you can train a model that learns a human’s behavior and biases, and then make predictions, adjusting on the fly to win more often than lose. You can do that by establishing a history of past game choices and feeding those in as model inputs.
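A minimal sketch of such a history input, assuming the last k rounds (own move and opponent’s move) are one-hot encoded and concatenated into one flat feature vector, zero-padded at the start of a match; the function names `one_hot` and `encode_history` and the value of k are hypothetical:

```python
MOVES = ["Rock", "Paper", "Scissors"]

def one_hot(move):
    return [1.0 if m == move else 0.0 for m in MOVES]

def encode_history(history, k=3):
    """Flatten the last k rounds (my_move, opponent_move) into 6 * k features.

    Rounds missing at the start of a match are encoded as zeros.
    """
    recent = history[max(0, len(history) - k):]
    padding = [[0.0] * 6] * (k - len(recent))
    rows = padding + [one_hot(mine) + one_hot(theirs) for mine, theirs in recent]
    return [value for row in rows for value in row]

history = [("Rock", "Paper"), ("Scissors", "Paper")]
features = encode_history(history, k=3)
print(len(features), features)
```

The resulting list could then be turned into a tensor with `torch.tensor(features)` and fed to a model whose first layer is `nn.Linear(6 * k, ...)`.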

I have an example here:

It starts out fairly poorly and needs around 100 games to become more effective. Alternatively, you could try tweaking the learning rate or model size so that it learns more quickly.

What you’re doing in your present code is giving random values as input to each model, so I fail to see what the models are actually learning. At a bare minimum, the inputs should be the models’ previous outputs.

Thanks for the answer. I’m not feeding in something random for no reason: I was thinking of doing something like a GAN. I generate an initial noise (uniform distribution) and the network transforms this noise into a new distribution that should mimic the best strategy.
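For a single mixed strategy, turning uniform noise into the target distribution can also be done without a network, by inverse-CDF sampling: a uniform u in [0, 1) is mapped to the first move whose cumulative probability exceeds u. A sketch, assuming the strategy is given as three probabilities for Rock, Paper, Scissors (the name `sample_move` and the example probabilities are hypothetical):

```python
import random

MOVES = ["Rock", "Paper", "Scissors"]

def sample_move(u, probs):
    """Map uniform noise u in [0, 1) to a move distributed according to probs."""
    cumulative = 0.0
    for move, p in zip(MOVES, probs):
        cumulative += p
        if u < cumulative:
            return move
    return MOVES[-1]  # guards against rounding when probs sum to slightly less than 1

probs = [0.25, 0.5, 0.25]  # arbitrary example strategy
counts = {m: 0 for m in MOVES}
for _ in range(10_000):
    counts[sample_move(random.random(), probs)] += 1
print(counts)
```

Seen this way, a learnable vector of three probabilities (for instance a softmax over three parameters) may be enough to represent a mixed strategy, with the randomness added only when a move is sampled.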

What I actually want to answer is the following math problem: “what is the best strategy for the two players to adopt, given the gains and losses of this modified Rock, Paper, Scissors game?”
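For a two-player zero-sum game, this question can also be answered without training. If J1’s optimal mix uses all three moves, it must make J2 indifferent: every column of pᵀM gives the same value v, and the probabilities sum to 1. A sketch that solves these four linear equations exactly with fractions, assuming the payoff matrix implied by the rules in the question and a fully mixed equilibrium (the names `solve_linear` and `solve_indifference` are hypothetical):

```python
from fractions import Fraction

# J1's gains, rows = J1's move, columns = J2's move (Rock, Paper, Scissors)
M = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]

def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    rows = [[Fraction(v) for v in A[i]] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * c for a, c in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]

def solve_indifference(M):
    """Find p with sum(p) = 1 and (p^T M)_j = v for every column j."""
    n = len(M)
    # Unknowns: p_0 .. p_{n-1}, v.  Equation j: sum_i p_i * M[i][j] - v = 0
    A = [[M[i][j] for i in range(n)] + [-1] for j in range(n)]
    A.append([1] * n + [0])  # probabilities sum to 1
    b = [0] * n + [1]
    *p, v = solve_linear(A, b)
    return p, v

p, v = solve_indifference(M)
print("strategy:", [str(x) for x in p], "value:", v)
```

Under these assumptions the solve gives Rock 1/4, Paper 1/2, Scissors 1/4 with value 0. Every entry of pᵀM is then 0, so this mix guarantees J1 at least 0 whatever J2 plays, and since M = -Mᵀ the same mix is optimal for J2.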