The basic regression model is represented as shown above.
Adding layers, or adding units to the layers in between (the hidden layers), lets the model learn more complex relationships.
Each arrow carries its own weight w.
These weights are adjusted through learning.
However, if that were all, the model would be no different from ordinary linear regression, so a nonlinear activation function such as ReLU or tanh is applied to the output of each unit.
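
The effect of the activation function can be sketched for a single unit. This is a minimal illustration using only the Python standard library; the input values, the weights, and the helper names (`unit_output`, `identity`) are made up for the example and bias terms are left out.

```python
import math

def identity(z):
    """No activation: the unit stays a plain weighted sum (linear regression)."""
    return z

def relu(z):
    """Rectified linear unit: keeps positive values and replaces the rest with 0."""
    return max(0.0, z)

def unit_output(inputs, weights, activation):
    """Weighted sum over the incoming arrows, followed by the activation."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

x = [1.0, -2.0]   # hypothetical inputs
w = [0.5, 1.0]    # hypothetical weights, one per arrow

linear_out = unit_output(x, w, identity)   # weighted sum only
relu_out = unit_output(x, w, relu)         # negative sum is clipped to 0
tanh_out = unit_output(x, w, math.tanh)    # sum is squashed into (-1, 1)
```

Without the activation, stacking such units only ever produces another weighted sum of the inputs, which is why the nonlinearity is needed.
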
It is also possible to regularize the weights, shrinking them toward zero with an L2 penalty as in Ridge regression.
By default, almost no regularization is applied.
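
As a rough sketch of what Ridge-style regularization means here, the loss can be written as the prediction error plus a penalty on the squared weights. The function names, the sample numbers, and the strength parameter `alpha` are assumptions chosen for illustration, not taken from any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def l2_penalty(layers):
    """Sum of squared weights over all layers (each layer is a list of rows)."""
    return sum(w * w for layer in layers for row in layer for w in row)

def regularized_loss(y_true, y_pred, layers, alpha):
    """Prediction error plus alpha times the L2 penalty on the weights."""
    return mean_squared_error(y_true, y_pred) + alpha * l2_penalty(layers)

# Hypothetical weights for a 2-2-1 network and some made-up predictions.
layers = [[[1.0, -2.0], [0.5, 0.0]], [[3.0, 1.0]]]
y_true = [1.0, 2.0]
y_pred = [1.0, 4.0]

loss_without = regularized_loss(y_true, y_pred, layers, alpha=0.0)
loss_with = regularized_loss(y_true, y_pred, layers, alpha=0.1)
```

A larger `alpha` makes large weights more expensive, so training is pushed toward weights closer to zero; a value near zero leaves the model almost unregularized.
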
Initially, the weights are determined randomly.
Analyzing what the model has learned is difficult; one approach is to view the weights as a heatmap.
Adam and LBFGS are among the optimization algorithms available for learning the parameters, and both are suitable for beginners.
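
To give a feel for what such an optimizer does, here is a minimal sketch of the commonly published Adam update rule applied to a single parameter. The toy loss `(w - 3)^2`, the step count, and the hyperparameter values are assumptions for illustration; a real library implementation handles many parameters at once and adds further details.

```python
import math

def adam_minimize(grad, w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-parameter function, given its gradient, with Adam updates."""
    m = 0.0  # running average of the gradient
    v = 0.0  # running average of the squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Hypothetical loss (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_opt = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
```

The same idea carries over to a network: the gradient of the loss with respect to every weight is computed, and each weight is nudged in the direction that lowers the loss.
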
Once the model is trained, making a prediction is simple: you only need to perform this calculation.
(x is the input, W is the weight matrix of each layer, y is the output, and σ is the sigmoid function.)
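
The prediction step can be sketched as repeatedly applying "weighted sum, then σ" layer by layer. This is only a sketch under stated assumptions: the layer sizes, the random initial weights, the omission of bias terms, and the use of the sigmoid on every layer (including the last) are choices made for the example and may differ from the formula the note refers to.

```python
import math
import random

def sigmoid(z):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_weights(n_in, n_out, rng):
    """Random initial weights: one row per output unit, one column per input."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def layer(W, x):
    """Apply one layer: sigmoid of the weighted sum for every output unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]

def predict(weights, x):
    """Feed x through every layer in order and return the final output y."""
    for W in weights:
        x = layer(W, x)
    return x

rng = random.Random(0)  # fixed seed so the random weights are reproducible
weights = [init_weights(3, 4, rng), init_weights(4, 1, rng)]  # 3 -> 4 -> 1 units
y = predict(weights, [0.2, -0.5, 1.0])
```

With untrained random weights the output is meaningless; the point is that, once training has fixed `weights`, prediction is nothing more than this loop.
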
- Among the many perceptrons (units) in a single layer, one may come to have a very strong influence.
- To avoid this, dropout randomly disables some of the units during training.
- This helps prevent overfitting.
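
The dropout idea in the list above can be sketched as follows. This uses the common "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs to change at prediction time; the function name and the parameters `rate` and `training` are assumptions for this example.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off when predicting
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 1000  # hypothetical outputs of one hidden layer

dropped = dropout(hidden, rate=0.5, rng=rng)                     # training time
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)   # prediction time
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is what limits overfitting.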
