In 02 Defining a custom PyTorch module from the tutorial, there is this line in the forward method:
x = x.view(-1, self.num_flat_features(x))
view is an instance method of the torch.Tensor class that returns a new tensor with the same data but a different shape; the returned tensor shares its underlying storage with the original. -1 is a magic value indicating that this dimension should be inferred from the total number of elements and the other dimensions.
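A minimal sketch of both behaviors (the -1 inference and the shared storage); the tensor shapes here are arbitrary examples, not from the tutorial:

```python
import torch

# A 4x6 tensor has 24 elements in total.
x = torch.arange(24).reshape(4, 6)

# Ask for 2 rows; view() infers the -1 dimension as 24 / 2 = 12.
y = x.view(2, -1)
print(y.shape)  # torch.Size([2, 12])

# view() does not copy: mutating y also changes x,
# because both tensors share the same underlying storage.
y[0, 0] = 99
print(x[0, 0].item())  # 99
```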
But why do we have this extra dimension? Shouldn't the network just pass a flat vector to the fully connected layer? The idea is that the PyTorch nn.Module class assumes we are passing in multiple independent inputs simultaneously, in batches, so that they can make full use of the processing device (CPU/GPU/TPU/NPU). By convention, the batch dimension comes first.
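To make the batch convention concrete, here is a hypothetical minimal module (not the tutorial's network; the layer sizes are made up). Flattening with view(-1, …) collapses everything except the batch dimension, so the same module handles any batch size:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Illustrative module: expects input of shape (batch, 3, 4, 4)."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 * 4 * 4, 10)

    def forward(self, x):
        # (batch, 3, 4, 4) -> (batch, 48): the -1 lets view()
        # infer the batch size, whatever it happens to be.
        x = x.view(-1, 3 * 4 * 4)
        return self.fc(x)

net = TinyNet()
batch = torch.randn(5, 3, 4, 4)  # a batch of 5 independent inputs
out = net(batch)
print(out.shape)  # torch.Size([5, 10]) -- one output row per input
```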