Insights from computer games to create a construction/destruction-based neural network to make Siri learn like a human

In order to make Siri learn like a human learns, wouldn't the best way be to look at how humans learn to play computer games?

When a human plays a computer game, nothing is learnt in the beginning and the gameplay is random. Over time, if one gets positive feedback / construction / success, the technique is learnt, while if one gets negative feedback / destruction / failure, the technique is changed. The word technique here means a combination of keys, like move forward -> jump -> dive ...

Another thing is that it takes time to change a technique. For example, if one learns a particular combination of keybinds for a move and then one day decides to change the keybinds, one is not able to play with the new keybinds immediately; it takes time to update a technique, and for some duration one keeps using the same old keybinds.

Another thing is that one does not consider real-world facts in a game. For example, if asked whether you want to eat this candy, then in the game world the developer could program the rules so that your health decreases if you eat the candy, so one avoids eating candy in the game, as it has a destructive outcome. In the real world, eating a candy gives good taste, which is a constructive outcome, so one does it. That is, our actions are based on this notion of success / failure.

One more insight is that on positive feedback a combination of keys gets strongly learnt: one gets used to a particular combination of keys if it has given positive feedback multiple times. That is, if a particular set of moves was X->Y->Z, this combination would be strongly learnt. On negative feedback, the combination of keys gets changed: if one used to have a particular set of moves X->Y->Z and it led to failure, one would update it to something like X->Z->Y, or break the combination down to something like X->Z.
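To make the idea a bit more concrete, here is a minimal, purely illustrative sketch of that construction/destruction behaviour. All names (`Technique`, `reinforce`, `mutate`) are made up for this example and are not an existing library API.

```python
import random

class Technique:
    def __init__(self, keys):
        self.keys = list(keys)   # e.g. ["forward", "jump", "dive"]
        self.strength = 0.0      # how strongly the combination is learnt

    def reinforce(self, feedback):
        """Positive feedback (construction) strengthens the combination,
        negative feedback (destruction) weakens it and eventually changes it."""
        self.strength += feedback
        if self.strength < 0:
            self.mutate()
            self.strength = 0.0

    def mutate(self):
        # Either swap two keys (X->Y->Z becomes X->Z->Y) ...
        if len(self.keys) > 1 and random.random() < 0.5:
            i, j = random.sample(range(len(self.keys)), 2)
            self.keys[i], self.keys[j] = self.keys[j], self.keys[i]
        # ... or break the combination by dropping one key (X->Y->Z becomes X->Z).
        elif len(self.keys) > 1:
            del self.keys[random.randrange(len(self.keys))]


t = Technique(["X", "Y", "Z"])
t.reinforce(+1.0)   # success: X->Y->Z gets more strongly learnt
t.reinforce(-2.0)   # failure: the combination gets changed
print(t.keys, t.strength)
```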

Is there a term for this type of neural network?

I’m not an expert in Reinforcement Learning, but your description seems to point towards this field of machine learning, which uses rewards as a training signal and thus can be used to learn how to play video games.
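In case it helps, here is a tiny tabular Q-learning sketch (a standard textbook formulation, not any particular library's API), just to show how rewards act as the training signal that shapes which action looks best in a state:

```python
import random

q = {}  # (state, action) -> estimated value
actions = ["left", "right", "jump"]
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def choose_action(state):
    if random.random() < epsilon:
        return random.choice(actions)                               # explore
    return max(actions, key=lambda a: q.get((state, a), 0.0))       # exploit

def update(state, action, reward, next_state):
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# One hypothetical transition: jumping in state "ledge" gave reward +1.
update("ledge", "jump", reward=1.0, next_state="platform")
print(q[("ledge", "jump")])   # the value of that action increased
print(choose_action("ledge")) # usually "jump" now, unless exploration kicks in
```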

One interesting thing in my opinion is the creation of rewards.
Let’s stick to video games, since you’ve mentioned them. While some “moves” might give a negative reward after their execution (you might lose health or points), they might be beneficial in the long run and you might even win the game with them (a small return sketch follows below the video example).
Also, it’s always funny to watch when some agents focus on the “wrong” rewards while playing games, such as here:

The learned movement of grabbing these “turbo” boxes apparently gave more points than winning the game and was thus considered to be better. :wink:
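Regarding the “beneficial in the long run” point above: one common way to express it is the discounted return, where a move with a small negative immediate reward can still come out ahead if it leads to a big reward later. A tiny sketch with made-up numbers:

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of future rewards, each discounted by gamma per step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Losing a few points now (-1) but winning later (+100) ...
risky = discounted_return([-1.0, -1.0, 100.0])
# ... beats playing safe with small rewards and no win.
safe = discounted_return([1.0, 1.0, 1.0])
print(risky > safe)  # True
```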

This is also an interesting article from OpenAI:

Furthermore, another insight is that our actions are always based on this notion of success / failure. In the example that you shared, where the agent goes for the “wrong” rewards, it is still going for success.

Consider a person who has played a game 100 times and won every one of those 100 games; they are so used to winning that they are bored.
For the first 100 games, the notion of success always meant winning the game,
that is,

first 100 games -> success -> win the game

In the 101st game, the person is bored, so they decide to create a funny clip / a meme and share it with others. Now the notion of success changes from winning the game to creating a funny clip, that is,

101st game -> success -> create a funny clip

Another scenario could be that in the 102nd game the person says they now want to give their friend a win (assuming it is a multiplayer game and their friend is one of the opponents), so the notion of success changes from winning the game to giving the friend a win, that is,

102nd game -> success -> give friend a win

but in every game, the person is still going for success. For example, in the 101st game they would consider the scenario of winning but not being able to create a funny clip as a failure, that is,

101st game -> failure -> no funny clip, but won

or in the 102nd game, they would consider it a failure if their friend lost, that is,

102nd game -> failure -> friend lost

In the 103rd game, the person might set the notion of success to being the first person to get eliminated, and if they survive longer than anyone else, that would be a failure, that is,

103rd game -> success -> first person to get eliminated
103rd game -> failure -> survived longer than others

but in all of the games, the person aims for success; only what they consider to be success keeps changing.
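In RL terms, this could be described as keeping the same game but swapping the reward function. A hedged sketch, with hand-written reward functions standing in for each “notion of success”:

```python
# Purely illustrative: the same game outcome scored by different,
# made-up reward functions ("notions of success").

def reward_win(outcome):
    return 1.0 if outcome["won"] else -1.0

def reward_funny_clip(outcome):
    return 1.0 if outcome["made_funny_clip"] else -1.0

def reward_friend_wins(outcome):
    return 1.0 if outcome["friend_won"] else -1.0

outcome = {"won": True, "made_funny_clip": False, "friend_won": False}

print(reward_win(outcome))          #  1.0 -> success in games 1..100
print(reward_funny_clip(outcome))   # -1.0 -> failure in game 101 (won, but no funny clip)
print(reward_friend_wins(outcome))  # -1.0 -> failure in game 102 (friend lost)
```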

Furthermore, another insight is that one word on its own does not have a meaning; in order for a word to have a meaning, there must be some more words (or an image).
But in the current state, the embedding-based approach in text experiments keeps updating the so-called representation of each word as if it had nothing to do with any other word, as if each word were isolated on its own. I think this is a big problem, because it does not consider that words are connected to each other.
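For what it's worth, a plain embedding lookup in PyTorch is indeed context-free: the same word id always maps to the same vector, no matter which words surround it (the ids below are made up). Contextual models compute the representation of a word from the whole sentence instead, which addresses exactly this point.

```python
import torch
import torch.nn as nn

# A standard (static) embedding table: each word id maps to one vector,
# independent of the surrounding words.
emb = nn.Embedding(num_embeddings=1000, embedding_dim=8)

# Hypothetical ids; id 42 stands for the word "bank" in both sentences.
sentence_a = torch.tensor([7, 42, 3])  # "... river bank ..."
sentence_b = torch.tensor([9, 42, 5])  # "... bank account ..."

# The vector for "bank" is identical in both sentences:
v_a = emb(sentence_a)[1]
v_b = emb(sentence_b)[1]
print(torch.equal(v_a, v_b))  # True -> the lookup ignores the context
```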

I’m always reading your interesting thoughts, but have to be a bit picky regarding the medical imaging claims.
Even today CT and MRT scans are displayed as black and white images on special monitors, which can “properly” display all necessary levels of gray. That being said, you can find colored images, e.g. functional MRT (fMRT) images, where the color might denote the flow direction or speed of the blood in the vessel. While color codes can be helpful (as seen in the fMRT example), often a grayscale image (on a medical monitor) is more beneficial and is not necessarily limited by the compute power your machine has. :wink:
The most important aspect is what exactly you want and need to see in the image.
Medical software usually comes with some presets using special “windows” to display certain tissue classes. E.g. if you are only interested in the lungs, you would select the “lungs window” so that the proper voxel intensities will be scaled to the full monitor range.
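As a rough illustration of such a window (the center/width values below are a common lung-window example, not taken from any specific software): the voxel intensities are clipped to the window and rescaled to the display range.

```python
import numpy as np

def apply_window(volume_hu, center=-600.0, width=1500.0):
    """Clip CT intensities (in Hounsfield units) to a window and
    rescale them to [0, 1] for display; center/width are a typical
    lung-window example, not a fixed standard."""
    low, high = center - width / 2.0, center + width / 2.0
    windowed = np.clip(volume_hu, low, high)
    return (windowed - low) / (high - low)

# Toy data: air (-1000 HU), lung tissue (-700 HU), soft tissue (40 HU)
ct = np.array([-1000.0, -700.0, 40.0])
print(apply_window(ct))  # lung values are spread over the display range
```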