I realized that I could no longer articulate why this was a thing. The intuitive image is the chaining of simple machines such as gears: the speed of the third gear depends on the speed of the first gear as translated by the second gear.

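The actual proof is a bit gnarly. The “brute force” approach is to take the limit of the difference quotient of $f \circ g$ as $x \to a$, after multiplying and dividing by $g(x) - g(a)$. That step divides by zero wherever $g(x) = g(a)$, which can happen for $x$ arbitrarily close to $a$, so then we have to do a piecewise u-substitution (with $u = g(x)$) that treats those points separately, and then derive the chain rule for each piece.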
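Here is a sketch of how that piecewise step is usually written. The names $f$, $g$, $a$, and the helper $Q$ are my own labels, and I am assuming $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$:

```latex
Q(y) =
\begin{cases}
  \dfrac{f(y) - f(g(a))}{y - g(a)} & \text{if } y \neq g(a), \\[1ex]
  f'(g(a))                         & \text{if } y = g(a).
\end{cases}

% For every x \neq a (both sides are 0 whenever g(x) = g(a)):
\frac{f(g(x)) - f(g(a))}{x - a} = Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}

% Q is continuous at g(a) and g is continuous at a, so
\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} = f'(g(a)) \, g'(a)
```

The second case of $Q$ is what patches the hole: it fills in the value the quotient is heading toward, so the product never divides by zero.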

According to Wikipedia, there are two more proofs. The first precisely defines the error in the linear approximation of the composed function. The second depends on an alternative definition of differentiability that I don’t have time to sit and think about.

For now, be satisfied to know that:

  1. You could prove this by taking the limit of $\frac{f(g(x)) - f(g(a))}{x - a}$ as $x$ approaches $a$, rearranging terms, and then doing a u-substitution.
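If you would rather see it than prove it, a quick numerical check works too. This is only a sanity check, not a proof, and the example functions (`sin` for the outer function, squaring for the inner one) and the point `a = 1.5` are arbitrary choices of mine:

```python
import math


def difference_quotient(f, g, a, h):
    """Difference quotient of the composition f(g(x)) at x = a with step h."""
    return (f(g(a + h)) - f(g(a))) / h


# Illustrative choices, not from the text: f = sin, g(x) = x^2.
f, f_prime = math.sin, math.cos


def g(x):
    return x * x


def g_prime(x):
    return 2 * x


a = 1.5
# What the chain rule predicts: f'(g(a)) * g'(a)
chain_rule_value = f_prime(g(a)) * g_prime(a)

# Shrink h and watch the quotient approach the chain rule value.
for h in (1e-1, 1e-3, 1e-5):
    dq = difference_quotient(f, g, a, h)
    print(f"h={h:g}  quotient={dq:.6f}  chain rule={chain_rule_value:.6f}")
```

As `h` shrinks, the printed quotient should settle toward the chain rule value.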

I’m pretty sure Mrs. Newberry made me do this for homework at some point.