PyTorch — Modules

Metadata

  • Author: readwise.io
  • Full Title: PyTorch — Modules
  • Category: articles
  • Summary: PyTorch uses modules to represent neural networks with learnable parameters for optimization. Modules can be interconnected to create complex neural networks, and hooks can be added for custom computations during training. PyTorch’s autograd system handles the backward pass for gradient computation, simplifying the training process.
  • URL: https://readwise.io/reader/document_raw_content/188909545

Highlights

  • Modules make it simple to specify learnable parameters for PyTorch’s Optimizers to update. (View Highlight)
  • Modules are straightforward to save and restore, transfer between CPU / GPU / TPU devices, prune, quantize, and more. (View Highlight)
  • the module itself is callable, and that calling it invokes its forward() function (View Highlight)
  • The “backward pass” computes gradients of module outputs with respect to its inputs, which can be used for “training” parameters through gradient descent methods. (View Highlight)
  • PyTorch’s autograd system automatically takes care of this backward pass computation, so it is not required to manually implement a backward() function for each module. (View Highlight)
  • Sequential automatically feeds the output of the first MyLinear module as input into the ReLU, and the output of that as input into the second MyLinear module. (View Highlight)
  • it is recommended to define a custom module for anything beyond the simplest use cases, as this gives full flexibility on how submodules are used for a module’s computation. (View Highlight)
  • Immediate children of a module can be iterated through via a call to children() or named_children() (View Highlight)
    • Note: It’s not clear from the output whether the modules are discovered when they are defined inside __init__, or by parsing the forward method. I suspect the former, in which case the order in which they are declared determines the order of the output. (See the sketch after this list.)
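A minimal sketch tying these highlights together, modeled on the article’s MyLinear example (the exact definition there may differ): a hand-rolled linear module whose nn.Parameter attributes are picked up automatically, a Sequential that chains it with a ReLU, and a named_children() loop. Regarding the note above: submodules are registered when they are assigned as attributes in __init__ (via nn.Module.__setattr__), not by parsing forward(), so declaration order determines the iteration order.

```python
import torch
from torch import nn

class MyLinear(nn.Module):
    """A hand-rolled linear layer; nn.Parameter attributes assigned in
    __init__ are registered automatically as learnable parameters."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        # Calling the module, e.g. m(x), invokes this forward() via __call__.
        return x @ self.weight + self.bias

net = nn.Sequential(
    MyLinear(4, 3),   # its output feeds the ReLU ...
    nn.ReLU(),        # ... whose output feeds the second MyLinear
    MyLinear(3, 1),
)

out = net(torch.randn(2, 4))   # forward pass
out.sum().backward()           # autograd handles the backward pass

# parameters() exposes everything an optimizer needs to update.
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2)

# Immediate children, in the order they were registered (declared):
for name, child in net.named_children():
    print(name, child)
```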

New highlights added June 28, 2024 at 1:08 PM

  • ModuleList and ModuleDict modules are useful here; they register submodules from a list or dict: (View Highlight)
  • calls to parameters() and named_parameters() will recursively include child parameters, (View Highlight)
  • @torch.no_grad() (View Highlight)
    • Note: no_grad can be used as either a context manager or a decorator; both forms are shown in the sketch after this list.
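A short sketch of these three highlights, with made-up layer sizes: ModuleList and ModuleDict register every submodule they hold (a plain Python list or dict would not), named_parameters() walks the whole module tree, and torch.no_grad works both as a context manager and as a decorator.

```python
import torch
from torch import nn

class DynamicNet(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        # ModuleList registers each submodule it holds;
        # a plain Python list would hide them from PyTorch.
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(num_layers)])
        # ModuleDict does the same for dict-style lookup.
        self.activations = nn.ModuleDict({'relu': nn.ReLU(), 'tanh': nn.Tanh()})

    def forward(self, x, act='relu'):
        for layer in self.layers:
            x = self.activations[act](layer(x))
        return x

net = DynamicNet(3)

# named_parameters() recursively includes child parameters,
# e.g. layers.0.weight, layers.0.bias, layers.1.weight, ...
for name, param in net.named_parameters():
    print(name, param.shape)

# no_grad as a context manager: no autograd graph is built inside the block.
with torch.no_grad():
    y = net(torch.randn(2, 8))

# no_grad as a decorator: the whole function runs without grad tracking.
@torch.no_grad()
def evaluate(model, x):
    return model(x)

print(evaluate(net, torch.randn(2, 8)).requires_grad)  # False
```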

New highlights added June 28, 2024 at 4:08 PM

  • net.zero_grad() (View Highlight)
    • Note: They have to use the zero_grad method on the module because the “optimizer” is just the absolute value function.
  • In general, modules should be in training mode during training and only switched to evaluation mode for inference or evaluation. (View Highlight)
  • A module’s state_dict contains state that affects its computation. This includes, but is not limited to, the module’s parameters. (View Highlight)
  • self.register_buffer('mean', torch.zeros(num_features)) (View Highlight)
    • Note: This causes the value of mean to be saved in serialized instances of the object. It will also move along when using the .to method. If persistent=False is passed, the state will still move devices, but it will not be serialized. (See the first sketch after this list.)
  • hooks (View Highlight)
    • Note: Hooks are callbacks that can be registered to run during the forward or backward pass. There are separate hooks for before and after each pass. (See the second sketch below.)
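Two sketches for the highlights above. First, training vs. evaluation mode, zero_grad(), and buffers registered with register_buffer(); the RunningMean module and its momentum value are illustrative, not the article’s exact example.

```python
import torch
from torch import nn

class RunningMean(nn.Module):
    """Tracks a running mean of its inputs in non-learnable buffers."""
    def __init__(self, num_features, momentum=0.9):
        super().__init__()
        self.momentum = momentum
        # Buffers appear in state_dict() and move with .to(device),
        # but they are not returned by parameters().
        self.register_buffer('mean', torch.zeros(num_features))
        # With persistent=False the buffer still moves with .to(),
        # but is left out of state_dict() / serialization.
        self.register_buffer('scratch', torch.zeros(num_features), persistent=False)

    def forward(self, x):
        self.mean = self.momentum * self.mean + (1.0 - self.momentum) * x
        return self.mean

net = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
net.train()                    # training mode (affects dropout, batchnorm, ...)
net(torch.randn(8, 4)).sum().backward()
net.zero_grad()                # clear the gradients on every parameter in the module
net.eval()                     # switch to evaluation mode for inference

rm = RunningMean(4)
print(rm.state_dict().keys())  # contains 'mean' but not 'scratch'
```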
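Second, the hook callbacks from the last note. The three registration methods below (register_forward_pre_hook, register_forward_hook, register_full_backward_hook) are standard nn.Module API; newer releases also provide a backward pre-hook (register_full_backward_pre_hook) for the "before backward" case.

```python
import torch
from torch import nn

net = nn.Linear(3, 2)

# Runs before forward(); receives the module and its positional inputs.
def forward_pre_hook(module, inputs):
    print('before forward:', [i.shape for i in inputs])

# Runs after forward(); receives the module, its inputs, and its output.
def forward_hook(module, inputs, output):
    print('after forward:', output.shape)

# Runs after the backward pass for this module;
# receives the module, grad_input, and grad_output tuples.
def backward_hook(module, grad_input, grad_output):
    print('after backward:', [g.shape for g in grad_output])

handles = [
    net.register_forward_pre_hook(forward_pre_hook),
    net.register_forward_hook(forward_hook),
    net.register_full_backward_hook(backward_hook),
]

net(torch.randn(4, 3)).sum().backward()

for h in handles:  # hooks can be removed later via their handles
    h.remove()
```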