- The batch size
- The number of attention heads
- The length of the sequence
- The dimension of the inputs and outputs to each block in the model
- The context vector length for each attention head
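To make these five quantities concrete, the following is a minimal sketch of how they might determine tensor shapes in a standard multi-head self-attention block. The names `batch_size`, `num_heads`, `seq_len`, `d_model`, and `d_head` are hypothetical labels chosen for illustration, and the layout (batch first, then heads) is an assumption rather than something the text specifies. The sketch also does not assume that `d_head` equals `d_model / num_heads`, although that is a common choice.

```python
from typing import Dict, Tuple


def attention_shapes(
    batch_size: int,
    num_heads: int,
    seq_len: int,
    d_model: int,
    d_head: int,
) -> Dict[str, Tuple[int, ...]]:
    """Return illustrative tensor shapes for one multi-head self-attention block.

    All names and the axis ordering are assumptions made for this sketch.
    """
    return {
        # Input to the block: one d_model-sized vector per token.
        "block_input": (batch_size, seq_len, d_model),
        # Queries, keys, and values after projection and splitting into heads.
        "queries_keys_values": (batch_size, num_heads, seq_len, d_head),
        # One score per (query position, key position) pair, per head.
        "attention_scores": (batch_size, num_heads, seq_len, seq_len),
        # Context vectors: weighted sums of the values, per head.
        "per_head_context": (batch_size, num_heads, seq_len, d_head),
        # Heads concatenated along the feature axis before the output projection.
        "concatenated_heads": (batch_size, seq_len, num_heads * d_head),
        # Output projection maps back to d_model so blocks can be stacked.
        "block_output": (batch_size, seq_len, d_model),
    }


shapes = attention_shapes(batch_size=2, num_heads=4, seq_len=10, d_model=32, d_head=8)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Note that the sequence length is the only quantity that appears twice in a single shape (the attention scores), which is why the cost of attention grows quadratically with it under this layout.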