While reading (Kunapuli) Ensemble Methods for Machine Learning, I thought the book was poorly edited when it said things like: “A saddle point of the Branin function lies between two minimizers and, like the minimizers, it has a zero gradient at its location. This causes all descent methods to converge to saddle points.”
Turns out, this is exactly what was meant, and “minimizers” is more precise language than “minima,” which is the word I initially thought belonged there.
A minimizer is a point in the input space at which a function attains its lowest value in some neighborhood. Its corresponding minimum is the value of the function at that point.
More formally, a minimizer is a set of parameters $w^*$ such that:

- the gradient $\nabla f(w^*) = 0$, and
- the Hessian matrix $\nabla^2 f(w^*)$ of $f$ at $w^*$ is positive definite.

The value $f(w^*)$ is the corresponding minimum.
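To make the distinction concrete, here is a minimal sketch using scipy.optimize on the Branin function (the constants are the standard ones from the optimization literature; the starting points are arbitrary choices placed near two of its basins). The optimizer returns the minimizer in `res.x`, a point in the input space, and the minimum in `res.fun`, a single real value:

```python
import numpy as np
from scipy.optimize import minimize

def branin(w):
    """Branin function with the standard constants."""
    x, y = w
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5 / np.pi
    r, s, t = 6.0, 10.0, 1 / (8 * np.pi)
    return a * (y - b * x**2 + c * x - r) ** 2 + s * (1 - t) * np.cos(x) + s

# Start near two different basins; each run finds a different minimizer.
for w0 in [np.array([-2.0, 10.0]), np.array([2.0, 2.0])]:
    res = minimize(branin, w0)
    print(f"minimizer w* = {res.x}, minimum f(w*) = {res.fun:.6f}")
```

With these constants the two runs find different minimizers, roughly $(-\pi, 12.275)$ and $(\pi, 2.275)$, yet report the same minimum, about $0.3979$. Two minimizers, one shared minimum value: conflating the two words erases exactly this distinction.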
Hence, it is absolutely correct to say that the saddle point lies between two minimizers. Saying that it lies between two minima is not quite right, since minima are not locations in the function’s parameter space. In fact, the statement “the saddle point is between two minima” is not even meaningful, because for a function $f: \mathbb{R}^d \to \mathbb{R}$ the minima are values in the codomain $\mathbb{R}$, while the saddle point is a point in the domain $\mathbb{R}^d$; “between” would have to compare objects that live in two different spaces.
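And to see the behavior quoted at the top, here is a small sketch using a double well as a stand-in for the Branin function: $f(x, y) = (x^2 - 1)^2 + y^2$ has minimizers at $(\pm 1, 0)$ and a saddle point at the origin, where the gradient is also zero. Plain gradient descent started exactly on the symmetry axis $x = 0$ never leaves it and converges to the saddle point rather than to either minimizer:

```python
import numpy as np

def grad(w):
    """Gradient of f(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = w
    return np.array([4 * x * (x**2 - 1), 2 * y])

w = np.array([0.0, 1.0])   # start exactly on the symmetry axis x = 0
for _ in range(200):
    w = w - 0.1 * grad(w)  # plain gradient descent, fixed step size

print(w)  # ~ [0. 0.]: the saddle point, not a minimizer
```

Any starting point off the axis breaks the symmetry and the iterates slide into one of the two wells instead; the sketch only demonstrates that the saddle, like the minimizers, is a location in the input space where the gradient vanishes.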