Recall that the definition of cross-entropy loss is

$$ L = -\sum_{i} y_i \log \hat{y}_i $$
In the case of a binary predictor, there are only two possible states (i.e., $y \in \{0, 1\}$), and the assigned probabilities must sum to one. Typically, this will be represented as a single output such that $y$ and $\hat{y}$ become scalars. Therefore we can expand the sum as

$$ L = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right] $$
Notice that, in the case of binary classification, one of the summands is always zero (since $y$ must be either 1 or 0).
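As a numerical check, here is a minimal sketch of the scalar binary cross-entropy; the function name `binary_cross_entropy` is chosen for illustration, with $y$ and $\hat{y}$ as in the formula above:

```python
import math

def binary_cross_entropy(y, y_hat):
    """Scalar binary cross-entropy: y is the true label (0 or 1),
    y_hat is the predicted probability of the positive class."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# When y = 1, only the first summand contributes; when y = 0, only the second.
print(binary_cross_entropy(1, 0.9))  # -log(0.9) ≈ 0.105
print(binary_cross_entropy(0, 0.9))  # -log(0.1) ≈ 2.303
```

Note how a confident wrong prediction (the second call) is penalized far more heavily than a confident correct one.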

We can write this in terms of vectors, with the logarithm applied elementwise:

$$ L = -\left[\, \mathbf{y} \cdot \log \hat{\mathbf{y}} + (1 - \mathbf{y}) \cdot \log (1 - \hat{\mathbf{y}}) \,\right] $$
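The dot-product form translates directly to NumPy; this sketch (names are illustrative) sums the elementwise losses over a small example vector:

```python
import numpy as np

def bce_vector(y, y_hat):
    # Dot products of the label vector (and its complement) with the
    # elementwise logs, matching the vector form of the loss.
    return -(y @ np.log(y_hat) + (1 - y) @ np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])       # true labels
y_hat = np.array([0.8, 0.2, 0.6])   # predicted probabilities
print(bce_vector(y, y_hat))
```

Each component contributes exactly one nonzero summand, since every entry of `y` is 0 or 1.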
Binary cross-entropy loss for images

In the case of a vectorized image, the channel intensities are continuous values between 0 and 1. We have $n$ such continuous intensities, and the loss for each of them can be described using the formula above. Hence, for the comparison of a target image $\mathbf{x}$ with a reconstruction $\hat{\mathbf{x}}$ (such as for use in autoencoders), we have

$$ L = -\sum_{i=1}^{n} \left[\, x_i \log \hat{x}_i + (1 - x_i) \log (1 - \hat{x}_i) \,\right] $$
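A minimal sketch of this image loss, assuming both images are flattened to intensity vectors in $[0, 1]$; the clipping by `eps` is a common numerical safeguard against $\log 0$, not part of the formula itself:

```python
import numpy as np

def image_bce(x, x_hat, eps=1e-7):
    # Binary cross-entropy between a target image x and a reconstruction
    # x_hat, both flattened to vectors of intensities in [0, 1].
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0) at saturated pixels
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

rng = np.random.default_rng(0)
x = rng.random(28 * 28)      # hypothetical target intensities
x_hat = rng.random(28 * 28)  # hypothetical reconstruction
print(image_bce(x, x_hat))
```

Unlike the classification case, the target intensities $x_i$ are continuous, so the loss does not reach zero even for a perfect reconstruction; it is minimized when $\hat{x}_i = x_i$.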