Official docs

Library for working with transformers directly in PyTorch, TensorFlow, or JAX. HF supplies higher-level libraries for some purposes, such as sentence-transformers, but these are generally leaky abstractions that quickly become limiting.

Its key features are as follows:

Pipelines

Pipelines take a model name and a dataset, and return inferences. A pipeline can be given a general task category instead of a model name, turning it into a form of AutoML; on the other hand, it can also be given a custom model, making it essentially a batch-serving abstraction over your own models.
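
A minimal sketch of both modes (the SST-2 checkpoint is just one example of a model you could pin):

from transformers import pipeline

# Task only: the library picks a default model for the task.
classifier = pipeline("sentiment-analysis")

# Or pin a specific model checkpoint.
classifier = pipeline(
	"sentiment-analysis",
	model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier(["I love this library.", "This is terrible."]))
# [{'label': 'POSITIVE', 'score': ...}, {'label': 'NEGATIVE', 'score': ...}]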

AutoModel and AutoTokenizer

AutoModel and AutoTokenizer will load the correct model / tokenizer for a known model name. You can then call the tokenizer to get tokens and the model as a function to get inferences. Note that the model call returns raw logits; you must transform them yourself (e.g. with a softmax).
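
A minimal sketch of that flow, using softmax as the transform (the checkpoint name is just an example):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("I love this library.", return_tensors="pt")
with torch.no_grad():
	outputs = model(**inputs)  # outputs.logits are raw scores
probs = torch.softmax(outputs.logits, dim=-1)  # turn logits into probabilities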

You can also save a model so it can be loaded as a pretrained model in the same way, or customize a pretrained model (e.g. by specifying the number of attention heads you want), though I assume this requires that a version of the model with the attributes you want actually exists.
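
Roughly like this (the local path is hypothetical; note that from_config builds a freshly initialized model with the chosen attributes rather than loading trained weights):

from transformers import AutoConfig, AutoModel

# Save a model so it can be reloaded like any pretrained checkpoint.
model = AutoModel.from_pretrained("google-bert/bert-base-cased")
model.save_pretrained("./my-model")
reloaded = AutoModel.from_pretrained("./my-model")

# Customize the architecture via the config.
config = AutoConfig.from_pretrained("google-bert/bert-base-cased", num_attention_heads=8)
custom = AutoModel.from_config(config)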

There are also more explicit classes for specific tasks, like AutoModelForSequenceClassification. This is cool because you can just specify the basic contours of your task; it will put the appropriate task head onto your selected model, and from there you can simply train it, e.g.:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
	"google-bert/bert-base-cased", num_labels=5
)

Trainer

A high-level training class for managing the training loop of either a pretrained model or your own torch.nn.Module. Alongside it are convenient classes like DataCollatorWithPadding, which does just what it sounds like and gives you nicely padded batches for your trainer.

Note that the Trainer assumes your model’s forward method returns an output containing the loss (alongside the logits) when labels are provided. Hence you don’t actually specify a loss function when you do this.
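
A minimal sketch covering both points, assuming the datasets package is available and using sst2 purely as an example dataset (the output directory is hypothetical):

from datasets import load_dataset
from transformers import (
	AutoModelForSequenceClassification,
	AutoTokenizer,
	DataCollatorWithPadding,
	Trainer,
	TrainingArguments,
)

dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized = dataset.map(lambda batch: tokenizer(batch["sentence"], truncation=True), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
	"google-bert/bert-base-cased", num_labels=2
)

trainer = Trainer(
	model=model,
	args=TrainingArguments(output_dir="out"),
	train_dataset=tokenized["train"],
	eval_dataset=tokenized["validation"],
	data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch dynamically
)
trainer.train()  # no loss function passed; the model's own output provides the loss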

If you don’t like this much abstraction, the library also provides some nice tools for reducing boilerplate in a traditional training loop. There’s also the accelerate package to help with this.
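
A minimal sketch of the accelerate pattern (model, optimizer, and dataloader are assumed to be built as in any ordinary PyTorch training loop):

from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
	optimizer.zero_grad()
	outputs = model(**batch)
	accelerator.backward(outputs.loss)  # replaces loss.backward(); handles device placement / distributed setup
	optimizer.step()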