When working with a pretrained neural network model, the workflow is different from (and much simpler than) training a model from scratch. To that end, here is a simpler sequence of steps based on Karpathy’s recipe. Note that this assumes a commercial setting where we do not have the luxury of, as Karpathy says, “squeezing out the juice”: once the model is working well enough to be better than nothing, we have to validate our market assumptions with an experiment. Note, though, that “better than nothing” is often a rather high bar!

TODO DATA AUGMENTATION / CLASS IMBALANCES

1. Thoroughly explore dataset

  • Data cleanliness issues
  • Class imbalances
  • Collinearity / obvious dependencies
  • Low-hanging feature engineering fruit…
    • …and make note of harder things that might help
  • How would a person go about making predictions?
    • If practical, actually make predictions by hand to establish an (approximate) upper performance bound
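A minimal pandas sketch of the first two checks (the DataFrame, column names, and label values here are synthetic stand-ins for the real dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset: an imbalanced label and a
# near-collinear feature pair.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feat_a": rng.normal(size=1000),
    "label": rng.choice(["cat", "dog", "bird"], size=1000, p=[0.7, 0.2, 0.1]),
})
df["feat_b"] = 2 * df["feat_a"] + rng.normal(scale=0.01, size=1000)

# Class imbalance: relative frequency of each label.
class_freq = df["label"].value_counts(normalize=True)
print(class_freq)

# Collinearity: absolute pairwise correlations between numeric features;
# anything close to 1.0 off the diagonal deserves a closer look.
corr = df[["feat_a", "feat_b"]].corr().abs()
print(corr)
```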

2. Verify correctness of inputs

  • Vision models: visualize input examples
  • Sequences: eyeball sequence elements
  • Embeddings: visualize using t-SNE or PCA
  • etc.
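For the embeddings case, a numpy-only PCA projection is enough to get coordinates for a quick scatter plot (the embeddings below are random stand-ins; in practice you would color the points by label):

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings to 2-D via PCA for a quick sanity scatter plot."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD rows of vt are principal directions, ordered by explained variance;
    # keep the top two components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))  # stand-in for real embeddings
coords = pca_2d(emb)
print(coords.shape)  # (200, 2)
```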

3. Establish baselines

  • Fix random seed (for repeatability)
  • Mode (for classification), median/mean (for regression): must beat this
  • Confirm initial loss conforms to expectations
  • If applicable, establish another baseline after most obvious forms of data augmentation
  • Optional — may help to initialize the biases in the task head to match high-level expectations
    • Classification: set the final-layer bias to the log of the class priors, so the initial predicted distribution matches the label distribution
    • Regression: set the final-layer bias to the mean (or median) of the targets
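The mode baseline and the matching bias initialization for a classification head can both be sketched in numpy (the labels here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.choice(3, size=1000, p=[0.7, 0.2, 0.1])  # imbalanced toy labels

# Trivial baseline: always predict the most common class.
counts = np.bincount(labels)
mode_acc = counts.max() / counts.sum()
print(f"majority-class accuracy: {mode_acc:.3f}")

# Bias init for a softmax head: log class priors, so the untrained
# network's output distribution matches the label distribution and the
# initial cross-entropy is approximately the entropy of the labels.
priors = counts / counts.sum()
bias = np.log(priors)
expected_initial_loss = -(priors * np.log(priors)).sum()
print(f"expected initial loss: {expected_initial_loss:.3f}")
```

This also gives a concrete number to check the “initial loss conforms to expectations” bullet against.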

4. Overfit a single batch

  • Fit until network has very low training loss (possibly approaching zero)
  • Visualize prediction dynamics over time
    • Tweak learning rate if predictions are jumping around
  • Look at any examples it learns very slowly
  • Try a deliberately dumb loss function to make sure you understand how the gradients behave
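The core of this check, sketched with a small PyTorch head on one fixed batch (shapes, hyperparameters, and step count are illustrative):

```python
import torch

torch.manual_seed(0)
# One fixed batch; a healthy model + optimizer should drive its loss to ~0.
x = torch.randn(32, 16)
y = torch.randint(0, 4, (32,))

head = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4)
)
opt = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(head(x), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")  # should be near zero
```

If the loss plateaus well above zero here, there is a bug somewhere (data pipeline, loss, or optimizer) that no amount of later tuning will fix.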

5. Overfit the frozen model

At this stage, it makes sense to keep the original pretrained model frozen. That limits the search space of our optimization problem, though the options (the task head’s architecture, the input features, the optimizer and its learning rate) are still considerable.

At each iteration, we can examine the cases the model is failing at to figure out what new features to build. The goal is to push the training loss as low as we can possibly get it, without worrying one little bit about validation loss. We’ll get to that.
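The freezing pattern itself is simple in PyTorch (the two-layer “backbone” here is a stand-in for a real pretrained model, e.g. from torchvision):

```python
import torch

torch.manual_seed(0)
# Stand-in for a pretrained backbone; in practice a real pretrained model.
backbone = torch.nn.Sequential(torch.nn.Linear(32, 128), torch.nn.ReLU())
head = torch.nn.Linear(128, 4)

# Freeze the backbone: no gradients, and eval() so that any batch-norm /
# dropout layers inside it behave deterministically.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# Only the head's parameters reach the optimizer.
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(8, 32)
with torch.no_grad():
    feats = backbone(x)  # features can even be precomputed and cached
logits = head(feats)
print(logits.shape)  # (8, 4)
```

Since the backbone never changes at this stage, precomputing its features once over the whole dataset can make iteration dramatically faster.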

6. Regularize the frozen model

Now we try to help this model to generalize by adding some regularization. Again, our options are fairly limited, but we do have a couple of tools at our disposal:

  • Batch and layer normalization between the frozen model output and the task head (TODO figure out when we’d prefer each)
  • Dropout at the input to the task head
  • Weight decay (equivalent to L2 regularization under plain SGD, though not under adaptive optimizers like Adam) on the task head’s weights

At this point, we can also introduce a learning rate schedule and early stopping. By the end of this step, we should feel that the frozen model has reached the point of diminishing returns for further tweaking.
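Put together, these tools might look like the following PyTorch sketch (the feature width, dropout rate, schedule, and the validation-loss sequence are all illustrative):

```python
import torch

head = torch.nn.Sequential(
    torch.nn.LayerNorm(128),   # normalize the frozen model's features
    torch.nn.Dropout(p=0.5),   # dropout at the input to the task head
    torch.nn.Linear(128, 4),
)
# AdamW applies decoupled weight decay to the head's parameters.
opt = torch.optim.AdamW(head.parameters(), lr=1e-3, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

# Minimal early stopping: stop once validation loss hasn't improved for
# `patience` consecutive evaluations (losses below are illustrative).
best, patience, bad = float("inf"), 3, 0
for val_loss in [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]:
    if val_loss < best:
        best, bad = val_loss, 0
    else:
        bad += 1
        if bad >= patience:
            break
print(f"stopped with best validation loss {best:.2f}")
```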

7. Fine-tune the foundation model

Unfreeze the foundation model and fine-tune end to end with a much lower learning rate than before. Other than adjusting learning rate schedules and early stopping, there isn’t much left to tune at this stage.
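In PyTorch the “much lower learning rate” is conveniently expressed with per-parameter-group learning rates, so the pretrained weights get nudged while the head keeps its original rate (modules and values here are illustrative):

```python
import torch

backbone = torch.nn.Sequential(torch.nn.Linear(32, 128), torch.nn.ReLU())
head = torch.nn.Linear(128, 4)

# Unfreeze everything...
for p in backbone.parameters():
    p.requires_grad = True

# ...but give the pretrained weights a much smaller learning rate than
# the head, so fine-tuning nudges them rather than destroying them.
opt = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])
print([g["lr"] for g in opt.param_groups])  # [1e-05, 0.001]
```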