A neural network is a statistical model in which multiple simple models, called neurons, are connected according to a specific architecture. The earliest version of this technique was the single-layer perceptron, for which the operator pre-specifies the parameters of the model. Rumelhart, Hinton, and Williams (1986) introduced (or at least popularized) the idea of training these parameters by propagating errors at the output layer back to earlier layers. Neural networks can thus be thought of as stacked layers of simple learners, each originally built around a step-like activation function.
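The sketch below is a minimal, illustrative rendering of these two ideas: simple neurons connected in layers, with parameters adjusted by propagating the output error backward. The architecture (one hidden layer of four units), the sigmoid activation, and the XOR task are assumptions chosen for brevity, not details taken from the text.

```python
# A tiny network of simple "neurons" trained by propagating the output error
# back to the earlier layer, in the spirit of Rumelhart, Hinton, and Williams
# (1986). Illustrative sketch only: sizes, activation, and task are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: XOR, a task a single-layer perceptron cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 neurons and one output neuron.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr = 1.0
for step in range(5000):
    # Forward pass: each neuron computes a weighted sum and a nonlinearity.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the error at the output layer is propagated to the
    # hidden layer (squared-error loss, sigmoid derivative = s * (1 - s)).
    err_out = (out - y) * out * (1 - out)
    err_hid = (err_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates of all parameters.
    W2 -= lr * h.T @ err_out;  b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_hid;  b1 -= lr * err_hid.sum(axis=0)

print(out.round(3))  # predictions typically approach [0, 1, 1, 0]
```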
As flexible, non-linear models, neural networks are highly versatile: with the right architecture and enough data, it seems that they can (though often should not) be used for essentially any classification, regression, or transduction task (especially translation) that we currently know how to do.
On the other hand, neural networks are computationally intensive; require a large volume of data to train; can very easily overfit the training data; and are notoriously opaque, even to the people who develop them.
Large language models (LLMs), based on the transformer architecture of Vaswani et al. (2017), are by far the most visible class of neural network. However, they are by no means the best suited for many tasks—at least not yet—and a large number of other model types remain in widespread use, particularly convolutional neural networks (CNNs) and diffusion models. Recurrent neural networks (RNNs) and generative adversarial networks (GANs) remain in use, but are gradually being supplanted by transformers and diffusion models, respectively.