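Recall that the definition of cross-entropy loss is

$$H(p, q) = -\sum_{x} p(x) \log q(x),$$

where $p$ is the true distribution over the states $x$ and $q$ is the predicted distribution.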
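In the case of a binary predictor, there are only two possible states (i.e., the class is either 0 or 1), so the loss has only two terms. Writing $y$ for the true probability of class 1 and $\hat{y}$ for the predicted probability of class 1, the loss becomes

$$L(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right].$$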
Notice that, in the case of binary classification, one of the summands is always zero (since the true label $y$ is either 0 or 1, so either $y$ or $1 - y$ vanishes).
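We can write this in terms of vectors: for $N$ examples with label vector $\mathbf{y}$ and prediction vector $\hat{\mathbf{y}}$, and with the logarithm applied elementwise, the total loss is

$$L = -\left[ \mathbf{y}^\top \log \hat{\mathbf{y}} + (\mathbf{1} - \mathbf{y})^\top \log(\mathbf{1} - \hat{\mathbf{y}}) \right].$$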
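A minimal NumPy sketch of a vectorized binary cross-entropy, summing over examples with dot products. The function name `binary_cross_entropy`, the `eps` clipping constant, and the sample labels and predictions are illustrative assumptions and do not come from the original text.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Summed binary cross-entropy between a label vector and a prediction vector."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    # The value of eps is an assumption chosen for illustration.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    # The dot products sum the per-example terms y*log(y_hat) and (1-y)*log(1-y_hat).
    return -(y_true @ np.log(y_pred) + (1.0 - y_true) @ np.log(1.0 - y_pred))


labels = np.array([1.0, 0.0, 1.0])   # hypothetical binary labels
preds = np.array([0.9, 0.2, 0.6])    # hypothetical predicted probabilities
loss = binary_cross_entropy(labels, preds)
```

Because every label here is exactly 0 or 1, only one of the two dot products contributes for each example, which matches the observation that one summand always vanishes.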
Binary cross-entropy loss for images
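In the case of a vectorized image, the channel intensities are continuous values between 0 and 1. We have

$$L = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right],$$

where $y_i \in [0, 1]$ is the target intensity and $\hat{y}_i$ is the predicted intensity of channel $i$. Since $y_i$ is no longer restricted to 0 or 1, neither summand vanishes in general.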
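To illustrate the continuous-target case, here is a small NumPy sketch that applies the same binary cross-entropy formula per channel of a flattened image. The function name `image_bce`, the `eps` constant, and the random 4x4 test image are hypothetical choices made for this example only.

```python
import numpy as np


def image_bce(target, recon, eps=1e-12):
    """Binary cross-entropy summed over all channels of a flattened image."""
    t = np.asarray(target, dtype=float).ravel()
    # Clip the reconstruction so log(0) never occurs; eps is an assumed constant.
    r = np.clip(np.asarray(recon, dtype=float).ravel(), eps, 1.0 - eps)
    # Both terms generally contribute because the targets lie strictly between 0 and 1.
    return -np.sum(t * np.log(r) + (1.0 - t) * np.log(1.0 - r))


rng = np.random.default_rng(0)
image = rng.random((4, 4))  # hypothetical grayscale image with values in [0, 1)

perfect = image_bce(image, image)                    # reconstruction equals target
worse = image_bce(image, np.full_like(image, 0.5))   # constant mid-gray reconstruction
```

With continuous targets the loss is smallest when the reconstruction equals the target, but that minimum is generally positive rather than zero, so `perfect` should be read as a lower bound for this image and not as "no error".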