• The batch size
  • The number of attention heads
  • The length of the sequence
  • The dimension of the inputs and outputs to each block in the model
  • The context vector length for each attention head (these dimensions are made concrete in the sketch below)
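
To make these dimensions concrete, here is a minimal sketch of how they appear as tensor shapes in a multi-head attention computation. It assumes PyTorch, and the specific values (batch size 8, 12 heads, sequence length 1024, model dimension 768) are hypothetical examples chosen only for illustration:

```python
import torch

# Hypothetical example values for the dimensions listed above.
batch_size = 8                    # the batch size
num_heads = 12                    # the number of attention heads
seq_len = 1024                    # the length of the sequence
d_model = 768                     # input/output dimension of each block
head_dim = d_model // num_heads   # context vector length per head (64 here)

# A batch of token embeddings entering an attention block.
x = torch.randn(batch_size, seq_len, d_model)

# Project to queries, keys, and values, then split into heads.
qkv_proj = torch.nn.Linear(d_model, 3 * d_model)
q, k, v = qkv_proj(x).chunk(3, dim=-1)
q = q.view(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)
k = k.view(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)
v = v.view(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)

# Attention weights have shape (batch_size, num_heads, seq_len, seq_len).
attn = (q @ k.transpose(-2, -1)) / head_dim**0.5
attn = attn.softmax(dim=-1)

# Per-head context vectors: (batch_size, num_heads, seq_len, head_dim).
context = attn @ v

# Merging the heads restores the shape (batch_size, seq_len, d_model).
out = context.transpose(1, 2).reshape(batch_size, seq_len, d_model)
print(out.shape)  # torch.Size([8, 1024, 768])
```

Note how the per-head context vector length is just the block dimension divided by the number of heads, so splitting into heads and merging them back leaves the input and output shapes of the block unchanged.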