A deep mixture-of-experts model is a deep network in which each block begins with a “gating” layer that learns to route each input to a small subset of the downstream neurons (the “experts”). Because each gated layer multiplies the number of possible selections, the number of distinct “routes” through the network grows exponentially with depth. The result is a network that can develop highly specialized experts without incurring the computational cost of activating all of them on every input.
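To make the routing idea concrete, here is a minimal sketch of one such block in PyTorch. The class name, layer sizes, number of experts, and the top-2 routing are illustrative assumptions, not a specific published architecture; the point is that the gating layer scores the experts and only the selected few are actually evaluated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    """Illustrative mixture-of-experts block: a gating layer routes each
    input to its top-k experts, and only those experts are computed."""

    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # the "gating" layer
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: (batch, dim)
        scores = self.gate(x)                     # (batch, num_experts)
        weights, indices = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = indices == e                   # which inputs chose expert e
            if mask.any():
                rows = mask.any(dim=-1)           # rows routed to this expert
                w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])  # evaluate expert only on its rows
        return out
```

With two of eight experts active per input, each block runs roughly a quarter of its expert parameters per example, while stacking several such blocks yields the exponential growth in possible routes described above.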