Early stopping is the practice of terminating model training before completing the maximum planned number of epochs. It is primarily a method for regularization, but it can also be very beneficial for reducing training cost.
The basic strategy for early stopping is to check some metric at the end of each epoch and terminate training if it has not improved over the last several epochs. This benefits generalization because training stops just as (or just before) the model begins to overfit the training data; for this reason, the loss over a held-out validation set is the most common choice of metric. Note that the criterion looks for an absence of improvement, which includes outright deterioration.
The criterion is usually based on a fixed absolute change in the metric (e.g., “loss decreases by less than 0.5 over 10 epochs”) rather than a relative change (e.g., “loss decreases by less than 1% over 10 epochs”). After all, different models improve in very different ways and at very different rates, especially when an adaptive optimizer and/or learning rate scheduling is employed. By the time one has studied the model’s training behavior well enough to choose a sensible percentage threshold, one might as well supply an absolute one.
The required number of consecutive epochs without improvement is called the patience of the early stopping procedure; in the examples above, patience = 10. Once the patience is exhausted, training stops. The practitioner may then either restore the best weights seen so far or keep the final weights. (If checkpoints have been saved after each epoch, the best weights can always be retrieved after the fact.)
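The procedure described above can be sketched as a small framework-agnostic helper. This is a minimal illustration, not any library's API; the class name `EarlyStopper` and its methods are hypothetical.

```python
import copy


class EarlyStopper:
    """Hypothetical early-stopping helper.

    Signals a stop once the monitored loss has failed to improve by at
    least `min_delta` for `patience` consecutive epochs.
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")          # best validation loss seen so far
        self.stale_epochs = 0             # consecutive epochs without improvement
        self.best_state = None            # snapshot of the best weights, if provided

    def step(self, val_loss, model_state=None):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale_epochs = 0
            if model_state is not None:
                # Keep a copy so the best weights can be restored later.
                self.best_state = copy.deepcopy(model_state)
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

In a training loop, one would call `stopper.step(val_loss, model_state)` at the end of each epoch and break out of the loop when it returns `True`, optionally loading `stopper.best_state` back into the model afterward.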
Keras provides an early stopping callback (`tf.keras.callbacks.EarlyStopping`), whereas core PyTorch requires the practitioner to implement it themselves; PyTorch Ignite, however, does provide an early stopping handler.
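In Keras, the callback is passed to `model.fit`. The toy model and random data below are purely illustrative; the callback arguments (`monitor`, `min_delta`, `patience`, `restore_best_weights`) are the real Keras parameter names.

```python
import numpy as np
import tensorflow as tf

# Toy regression setup on random data (illustration only).
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop once val_loss has failed to improve by at least min_delta for
# `patience` consecutive epochs, then roll back to the best weights seen.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0.0,
    patience=10,
    restore_best_weights=True,
)

history = model.fit(
    x, y,
    validation_split=0.2,
    epochs=200,               # upper bound; early stopping may end sooner
    callbacks=[early_stop],
    verbose=0,
)
```

Because `restore_best_weights=True`, the model ends training holding the weights from its best validation epoch rather than its last one.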