The output of my neural network is the production of the last layer’s neurons. This makes the output much larger as the number of neurons increases. To avoid overflow I need to standardize the neuron outputs. How can I do this?

I’m having trouble understanding your question. Why does the number of neurons change? Is it like a CNN where the output size may scale with input size? What do you mean by overflowing and standardizing?

My neural network has only one output. Instead of using non-linear activations in the hidden layers, the output of the network is the production of the neurons in the last layer. When the weights are initialized from a normal distribution (mean 0, std 1) and I increase the number of neurons in the last layer, the production (the output) can become very large. Such large numbers can exceed the range of double precision; this is what I mean by overflowing.

So I want to find a way to standardize the neuron outputs of the last layer so that they do not cause overflow.

I’m still quite confused… So you are not using non-linearity at all? Wouldn’t that be just a linear model?

How large is the hidden layer? Even with standard normal initialization (so that, for a linear model, the initial output is normally distributed and centered at 0), the output shouldn’t easily go beyond the double range. The double range is very large (up to about 1.8e308).

Also, you may want to initialize using a heuristic that depends on the input (and output) size, such as `randn(n) / sqrt(n)`, where n is the input size.
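As a rough illustration of why dividing by `sqrt(n)` helps, here is a minimal sketch in plain Python. The helper names (`normal_weights`, `linear`, `empirical_std`), the sizes, and the all-ones input are assumptions made for this example and are not from the thread; the sketch only compares the spread of one linear layer's outputs with and without the scaling.

```python
import math
import random


def normal_weights(n_in, n_out, rng, scaled):
    """Return an n_out x n_in matrix of N(0, 1) draws, optionally divided by sqrt(n_in)."""
    scale = 1.0 / math.sqrt(n_in) if scaled else 1.0
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Apply a bias-free linear layer: one dot product per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def empirical_std(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


rng = random.Random(0)   # fixed seed so the run is repeatable
n_in, n_out = 64, 1000   # illustrative sizes, not taken from the thread
x = [1.0] * n_in         # all-ones input: every weight contributes to the sum

plain_std = empirical_std(linear(normal_weights(n_in, n_out, rng, scaled=False), x))
scaled_std = empirical_std(linear(normal_weights(n_in, n_out, rng, scaled=True), x))

print(f"N(0, 1) weights:           output std ~ {plain_std:.2f}")   # expected near sqrt(64) = 8
print(f"N(0, 1) / sqrt(n) weights: output std ~ {scaled_std:.2f}")  # expected near 1
```

With unscaled weights the output spread grows like the square root of the number of active inputs; the scaled version keeps it near 1 regardless of `n`.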

If Chinese is easier for you to explain this in, feel free to message me in Chinese.

This is my experimental network. The first layer and the second layer are fully connected.

{first layer (linear activated)}

{second layer (linear activated)}

{the production of the neurons of the second layer}

I think the network is non-linear because of the production. As the number of neurons in the last layer increases, the non-linear capacity of the network increases.

The first layer has 100 neurons and the second layer has 64 neurons. The input is an image with 8x8 pixels, each pixel is either 0 or 1. After standard normal initialization, the output is huge.

I also tried uniform initialization. The output is still huge, but its variation is smaller than with normal initialization.

The reason I am using this network structure is that I want to represent the wave function of a physical many-body system with a neural network. I tried an MLP and a CNN, but they did not work well, so I designed this kind of network.
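To get a feel for the scale involved, the sketch below builds a network of the described shape (64 binary inputs, 100 linear neurons, 64 linear neurons) and measures the size of the final value. It assumes that "production" means the product of the 64 second-layer outputs, which the thread does not confirm. It also assumes no biases and a checkerboard input, and all function names are made up for this example. Rather than multiplying the factors directly, it sums `log10` of their absolute values, so the magnitude can be inspected without overflowing.

```python
import math
import random


def init(n_in, n_out, rng, scaled):
    """N(0, 1) weights, optionally divided by sqrt(n_in)."""
    scale = 1.0 / math.sqrt(n_in) if scaled else 1.0
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Bias-free linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def log10_abs_product_output(x, rng, scaled):
    """Return log10 of |product of the second-layer outputs| for a fresh random network."""
    w1 = init(64, 100, rng, scaled)    # first layer: 64 inputs -> 100 neurons
    w2 = init(100, 64, rng, scaled)    # second layer: 100 -> 64 neurons
    y = linear(w2, linear(w1, x))
    # Sum of logs equals the log of the product, but cannot overflow.
    return sum(math.log10(abs(v)) for v in y)


rng = random.Random(0)
# 8x8 checkerboard image flattened to 64 pixels, each 0 or 1 (an assumed test input)
x = [float((r + c) % 2) for r in range(8) for c in range(8)]

log10_plain = log10_abs_product_output(x, rng, scaled=False)
log10_scaled = log10_abs_product_output(x, rng, scaled=True)

print(f"N(0, 1) init:           |output| ~ 10^{log10_plain:.1f}")
print(f"N(0, 1) / sqrt(n) init: |output| ~ 10^{log10_scaled:.1f}")
```

Because the output is a product of 64 factors, its magnitude depends exponentially on the typical size of each factor. Scaling the initialization shrinks the exponent a great deal, but if the typical factor ends up below 1 the product may drift toward zero instead, so working with the log-magnitude and sign may be more robust than rescaling alone.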

I still don’t fully understand. In particular, I’m not sure what you mean by production.

Stacking linear layers only gives you a linear model, so doing that is basically a waste of computation.

With such a small network and binary input, even with N(0, 1) initialization, it shouldn’t output very large values. That said, since you are using a linear model, the weights might work together to amplify the input.
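The point about linear layers can be checked numerically. The sketch below is a plain-Python illustration with small, arbitrary sizes and hypothetical helper names. It shows that applying two bias-free linear layers in sequence gives the same result as applying the single matrix `W2 · W1`.

```python
import random


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Matrix-matrix product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


rng = random.Random(0)
w1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]  # layer 1: 3 -> 4
w2 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]  # layer 2: 4 -> 2
x = [1.0, 0.0, 1.0]

two_layers = matvec(w2, matvec(w1, x))   # apply the layers one after the other
one_layer = matvec(matmul(w2, w1), x)    # apply the single collapsed matrix

max_diff = max(abs(a - b) for a, b in zip(two_layers, one_layer))
print(max_diff)  # differs only by floating-point rounding
```

Any non-linearity applied after the second layer (for example, a product over its outputs) is not covered by this collapse; only the two stacked linear maps reduce to one.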

I have explained my network in Chinese by sending a message to you. Please check it out. Thank you.

Hi,

I’m late to the party. But I think batch normalization is exactly what you need. You can add a BN-layer before any activation: BatchNorm2d — PyTorch 1.8.1 documentation

The common practice is to normalize values (zero-center and divide by the empirical standard deviation) before feeding them to a NN. Batch normalization simply extends this idea to intermediate neurons as well.
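The normalization step itself is small. Below is a minimal sketch in plain Python of standardizing one neuron's values across a batch. The function name `standardize`, the `eps` default, and the sample numbers are assumptions for illustration. A real batch normalization layer also learns a scale and a shift and tracks running statistics, all of which this sketch leaves out.

```python
import math


def standardize(values, eps=1e-5):
    """Zero-center the values and divide by their empirical standard deviation.

    eps is a small constant added to the variance to avoid division by zero.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical outputs of a single neuron over a batch of five inputs
batch = [120.0, -340.0, 56.0, 980.0, -15.0]
normalized = standardize(batch)

norm_mean = sum(normalized) / len(normalized)
norm_std = math.sqrt(sum((v - norm_mean) ** 2 for v in normalized) / len(normalized))
print(normalized)
print(norm_mean, norm_std)  # mean close to 0, std close to 1
```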