In our example, the test loop is as follows:

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)      # total number of test examples
    num_batches = len(dataloader)       # number of mini-batches in the test set
    model.eval()                        # switch the model to evaluation mode
    test_loss, correct = 0, 0
    with torch.no_grad():               # don't build a computational graph during evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches            # average loss per batch
    correct /= size                     # fraction of examples classified correctly
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

As you would expect, we iterate through all batches from the test set DataLoader, make our predictions, then accumulate the loss and the count of correct predictions.

The main things to notice in the code are as follows:

model.eval() “eval” mode has nothing to do with autograd, despite appearances. Rather, it sets a stateful flag on the model instance that it is in “evaluation” mode. This propagates to the submodules (layers) and changes the behavior of layers that act differently during training: dropout stops zeroing activations and lets data pass through unchanged, while batch normalization switches from per-batch statistics to its accumulated running statistics.
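
To make the flag concrete, here is a minimal sketch; the toy model below is purely illustrative:

import torch.nn as nn

# Hypothetical toy model, just to show how the training/eval flag propagates.
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

net.train()
print(net.training, net[1].training)   # True True  -- dropout randomly zeroes activations

net.eval()
print(net.training, net[1].training)   # False False -- dropout lets data pass through unchanged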

with torch.no_grad(): By default, every operation involving a tensor that requires gradients is recorded in the computational graph so that it can factor into the gradient during the backward pass. During evaluation we never call backward(), so recording the graph for test-set inferences only costs memory and compute. It also guards against mistakes: if you did backpropagate through the test loss, test data would leak into the model.

As such, during evaluation it is standard practice to wrap the loop in the torch.no_grad() context manager. This basically puts all computations into “incognito mode” for the purpose of autograd: nothing is recorded and no graph is built. (It also speeds things up a bit and saves memory, since less work must be done.)
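
Here is a minimal sketch of what the context manager actually changes; the tensor names are illustrative:

import torch

w = torch.randn(3, requires_grad=True)   # a "parameter" that autograd would normally track
x = torch.randn(3)

y = (w * x).sum()
print(y.requires_grad)        # True  -- the operation was recorded in the graph

with torch.no_grad():
    y = (w * x).sum()
print(y.requires_grad)        # False -- no graph is built inside no_grad()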

X, y = X.to(device), y.to(device) Tensors, models, and other PyTorch objects have a .to(device) method, which transfers them to the specified device. For tensors this is not in-place; it returns a new tensor on the target device, hence the assignments. (For models, model.to(device) moves the parameters in place and returns the module, but reassigning is a harmless and common idiom.) As long as everything involved in an operation lives on the same device, PyTorch will run it there. By the same token, if one or more necessary components has not been moved over, the operation will fail with an error.
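
A minimal sketch of the idiom, assuming you pick the device once near the top of your script:

import torch
import torch.nn as nn

# Use a GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(10, 2).to(device)   # nn.Module.to() moves parameters in place and returns the module
X = torch.randn(5, 10)
X = X.to(device)                      # Tensor.to() returns a new tensor, so we reassign

pred = model(X)                       # works because model and X are on the same device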

pred = model(X) Despite the name, this is not one prediction per example. It returns a rank-2 PyTorch tensor of shape (batch_size, num_classes): one row of raw class scores (logits) per example in the mini-batch, which is why we take argmax(1) below to recover the predicted class.
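
You can check the shape yourself; the sketch below assumes a hypothetical 10-class classifier over flattened inputs:

import torch
import torch.nn as nn

model = nn.Linear(784, 10)            # stand-in classifier, purely illustrative
X = torch.randn(64, 784)              # a mini-batch of 64 examples

pred = model(X)
print(pred.shape)                     # torch.Size([64, 10]) -- one row of logits per example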

test_loss += loss_fn(pred, y).item() Since pred is a matrix of per-class scores and y is a vector of ground-truth labels, loss_fn (with its default mean reduction) collapses the whole batch into a zero-rank PyTorch tensor. To extract the underlying scalar, we call item(), which only works on tensors containing exactly one element.
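
A quick sketch of why item() is needed, using CrossEntropyLoss (which averages over the batch by default) and made-up data:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()        # default reduction="mean" collapses the batch to one number

pred = torch.randn(64, 10)             # hypothetical logits for a batch of 64 examples
y = torch.randint(0, 10, (64,))        # hypothetical ground-truth labels

loss = loss_fn(pred, y)
print(loss.shape)                      # torch.Size([]) -- a zero-rank tensor
print(loss.item())                     # a plain Python float, safe to accumulate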

correct += (pred.argmax(1) == y).type(torch.float).sum().item() Get used to this kind of ugly one-lining in PyTorch; people do it everywhere. Unpacking it: pred.argmax(1) == y returns a boolean vector with one value per example in the mini-batch, .type(torch.float) turns these into floats (1.0 or 0.0), sum() rolls them up into a zero-rank tensor, and item() returns the underlying plain old Python scalar value.
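
Here is the same one-liner unpacked step by step, with made-up logits for a batch of four examples over three classes:

import torch

pred = torch.tensor([[2.0, 0.1, 0.3],
                     [0.2, 1.5, 0.1],
                     [0.1, 0.2, 3.0],
                     [1.0, 0.9, 0.8]])
y = torch.tensor([0, 2, 2, 1])

hits = pred.argmax(1) == y             # tensor([True, False, True, False])
as_floats = hits.type(torch.float)     # tensor([1., 0., 1., 0.])
total = as_floats.sum()                # tensor(2.) -- a zero-rank tensor
print(total.item())                    # 2.0 -- plain Python float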

print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n") I find it interesting that PyTorch doesn’t provide us with some encapsulated way to handle this. I think the attitude is that they’re providing you with a numerical package and the rest is up to you.