Cybenko’s universal approximation theorem (in its later, generalized form) states, per Wikipedia, that
feedforward networks with non-polynomial activation functions are dense in the space of continuous functions between two Euclidean spaces, with respect to the compact convergence topology.
My understanding of this is that for any continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$, a feedforward network with a single hidden layer and a non-polynomial activation can approximate $f$ to arbitrary precision on any compact subset of $\mathbb{R}^n$.
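In symbols (a hedged formalization of the statement above; the symbols $K$, $\sigma$, $A$, $b$, $C$, $k$, and $\varepsilon$ are notation I am introducing, not taken from the source): for any compact set $K \subset \mathbb{R}^n$, any continuous $f : K \to \mathbb{R}^m$, any continuous non-polynomial activation $\sigma$, and any $\varepsilon > 0$, there exist a hidden width $k$, weights $A \in \mathbb{R}^{k \times n}$, biases $b \in \mathbb{R}^k$, and output weights $C \in \mathbb{R}^{m \times k}$ such that

$$\sup_{x \in K} \lVert f(x) - C\,\sigma(Ax + b) \rVert < \varepsilon,$$

where $\sigma$ is applied element-wise.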
At a high level, the proof is along these lines:
- Any step in $\mathbb{R}^n$ can be exactly described by scaling and offsetting the vector Heaviside step function $H(\mathbf{w} \cdot \mathbf{x} + b)$; and therefore
- Any rectangular region in $\mathbb{R}^n$ can be exactly described by such functions.
- The mean value theorem for integrals states that any continuous function can be approximated to arbitrary precision by summing rectangular regions.
- Therefore, any continuous function can be approximated to arbitrary precision by some number of Heaviside step functions.
- We recall that $H(\mathbf{w} \cdot \mathbf{x} + b)$ is the definition of a perceptron unit.
- Therefore, any continuous function can be approximated to arbitrary precision by some number of perceptron units.
- A multi-layer perceptron with a single hidden layer consists of one layer of such perceptron units, with an input layer corresponding to $\mathbf{x}$ and an output layer corresponding to a linear combination of the perceptron units’ outputs.
- Therefore, any continuous function can be approximated to arbitrary precision by a multi-layer perceptron with a single hidden layer (a small numerical sketch of this construction follows below).
(This proof can be extended to any non-polynomial activation function.)
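To make the construction concrete, here is a minimal numerical sketch (my own illustration under assumptions, not part of the original proof): a single hidden layer of Heaviside units with hand-chosen weights approximating $\sin(x)$ on $[0, 2\pi]$ as a staircase of steps. The target function, the 200-unit grid, and the NumPy implementation are all arbitrary choices for illustration.

```python
import numpy as np

def heaviside_layer(x, w, b):
    """Hidden layer of Heaviside perceptron units H(w_i * x + b_i),
    evaluated at every point of the 1-D array x."""
    return np.heaviside(np.outer(x, w) + b, 1.0)  # shape: (len(x), num_units)

f = np.sin                                # target continuous function (illustrative)
lo, hi, num_units = 0.0, 2 * np.pi, 200

# One step unit per grid cell: unit i switches on at x = t[i].
t = np.linspace(lo, hi, num_units + 1)
w = np.ones(num_units)                    # hidden weights
b = -t[:-1]                               # hidden biases, giving H(x - t[i])
# Output weights: each unit contributes the *increment* of f over its cell,
# so the running sum of active units traces a staircase approximation of f.
c = np.diff(f(t))
c[0] += f(t[0])                           # include the starting value f(lo)

x = np.linspace(lo, hi, 2000)
approx = heaviside_layer(x, w, b) @ c     # output of the one-hidden-layer network

print("max |f - approx| =", np.max(np.abs(f(x) - approx)))
# With 200 units the error is on the order of the grid spacing (about 0.03 here);
# more units shrink it further, as the theorem promises.
```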
In practice, the number of hidden neurons required may be impractically large. For some families of functions, additional hidden layers can reduce the required number of neurons exponentially, because deeper layers reuse the features computed by earlier ones (hierarchical feature extraction).
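The saving from depth is easiest to see on a specific family of functions. Below is a hedged sketch (my own example, using ReLU rather than Heaviside activations, and following the well-known sawtooth construction from depth-separation results): composing a two-unit "tent map" block $k$ times produces a piecewise-linear function with $2^k$ pieces from only about $2k$ neurons, while a single-hidden-layer ReLU network needs roughly one hidden unit per piece to match it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(y):
    """Tent map on [0, 1] built from two ReLU units: 2*relu(y) - 4*relu(y - 0.5)."""
    return 2.0 * relu(y) - 4.0 * relu(y - 0.5)

def deep_sawtooth(x, depth):
    """Compose the 2-unit tent block `depth` times (about 2*depth neurons in total)."""
    y = x
    for _ in range(depth):
        y = tent(y)
    return y

def count_linear_pieces(y, x):
    """Count linear pieces by detecting slope changes between grid intervals."""
    slopes = np.diff(y) / np.diff(x)
    return int(np.sum(~np.isclose(slopes[1:], slopes[:-1]))) + 1

for depth in range(1, 8):
    # Grid chosen so every breakpoint (a multiple of 2**-depth) lands on a grid point.
    x = np.linspace(0.0, 1.0, 2**depth * 64 + 1)
    pieces = count_linear_pieces(deep_sawtooth(x, depth), x)
    # A one-hidden-layer ReLU net with m units produces at most m + 1 linear pieces
    # in one dimension, so matching this shape needs at least `pieces - 1` units,
    # while the deep version above uses only about 2 * depth units.
    print(f"depth={depth}: ~{2 * depth} neurons deep, {pieces} linear pieces")
```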