In statistical learning, a structural prior is an assumption encoded in the architecture of the model itself, rather than in its learned parameters.
Such priors may force the model toward solutions with higher training loss than a less constrained model would reach, but which generalize better because their predictions reflect the underlying data-generating mechanism. For example, latent Dirichlet allocation constrains the model to treat text as generated by first selecting a topic and then selecting a word from that topic. (An unconstrained model might instead simply memorize the training examples.)
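To make that generative assumption concrete, here is a minimal sketch in NumPy of the topic-then-word sampling process LDA builds into its structure. The sizes and Dirichlet hyperparameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 topics, 8-word vocabulary (arbitrary choices).
n_topics, vocab_size = 3, 8
vocab = [f"word{i}" for i in range(vocab_size)]

# Per-document topic mixture and per-topic word distributions,
# each drawn from a Dirichlet prior, as LDA assumes.
doc_topic_mix = rng.dirichlet(alpha=[0.5] * n_topics)
topic_word_dists = rng.dirichlet(alpha=[0.1] * vocab_size, size=n_topics)

def generate_word():
    # The structural prior: first pick a topic, then pick a word from it.
    topic = rng.choice(n_topics, p=doc_topic_mix)
    word = rng.choice(vocab_size, p=topic_word_dists[topic])
    return vocab[word]

document = [generate_word() for _ in range(10)]
print(document)
```

Any model fit under this structure can only explain a document as a mixture of topics; it has no way to memorize individual training examples verbatim.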
On the other hand, a structural prior may simply produce a parameter space that is easier to search, even though a less constrained model could in principle represent the same solution. For example, convolutions encode the structural prior that nearby pixels are related and that the same local patterns can appear anywhere in the image. Cybenko’s universal approximation theorem tells us that a sufficiently large fully connected network could represent an arbitrarily good predictor for the same task, but in practice the search space would be far too large to explore tractably.
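A rough way to see how much this prior shrinks the search space is to count parameters. The sketch below, a PyTorch comparison with an arbitrarily chosen 64×64 RGB input and 16 output channels, contrasts a convolutional layer with a fully connected layer producing the same output shape:

```python
import torch.nn as nn

# Hypothetical input: a 64x64 RGB image mapped to 16 feature channels.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
dense = nn.Linear(in_features=3 * 64 * 64, out_features=16 * 64 * 64)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# The convolution reuses the same 3x3 filters at every spatial position,
# so it needs only a few hundred parameters; the fully connected layer
# producing the same output shape needs roughly 805 million.
print(f"conv:  {n_params(conv):,} parameters")
print(f"dense: {n_params(dense):,} parameters")
```

The fully connected layer could, in principle, learn the same weight-sharing pattern on its own, but it would have to discover it somewhere in an enormously larger parameter space.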