Notes from Chaudhury, et al. (2024), chapter 6

New highlights added June 23, 2024 at 11:05 AM

  • If the inputs all have a common property, the points are not distributed randomly over the input space. Rather, they occupy a region in the input space with a definite shape.
  • We then identify a probability distribution whose sample point cloud matches the shape of the (potentially transformed) training data point cloud. We can generate faux input by sampling from this distribution.
    • Note: see the sampling sketch after this list.
  • The concept of entropy attempts to quantify the uncertainty associated with a chancy event.
  • prefix coding: no color's code is a prefix of any other color's code.
  • Entropy measures the overall uncertainty associated with a probability distribution.
  • Entropy is a measure that is high if everything is more or less equally probable and low if a few items have a much higher probability than the others.
  • (6.6) $H(p) = -\int p(\vec{x}) \, \log p(\vec{x}) \, d\vec{x}$
    • Note: eq 6.6 gives the entropy of a multidimensional distribution.
  • Geometrically speaking, entropy is a function of how lopsided the PDF is (see figure 6.2).
  • (6.7) $H = \frac{1}{2} \log\left( 2 \pi e \sigma^{2} \right)$
    • Note: The thing to notice is that the entropy of a Gaussian depends only on the log of the variance.
  • (6.8) $H = \frac{1}{2} \log\left( (2 \pi e)^{d} \, \det(\Sigma) \right)$
    • Note: Likewise, for the multivariate Gaussian, it depends only on the determinant of the covariance matrix.
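
To make the entropy highlights above concrete, here is a minimal numeric sketch (mine, not the book's), assuming natural logarithms: the entropy of a discrete distribution is high when the outcomes are roughly equiprobable and low when one outcome dominates, and the closed-form Gaussian entropies of eqs 6.7 and 6.8 depend on the variance and the covariance only through log(variance) and det(covariance). The helper names entropy, gaussian_entropy, and multivariate_gaussian_entropy are illustrative.

    import numpy as np

    def entropy(p):
        """Entropy of a discrete distribution p, in nats; 0 * log(0) is treated as 0."""
        p = np.asarray(p, dtype=float)
        nz = p > 0
        return -np.sum(p[nz] * np.log(p[nz]))

    # High when everything is more or less equally probable, low when one item dominates.
    print(entropy([0.25, 0.25, 0.25, 0.25]))   # ~1.386 (the maximum for 4 outcomes)
    print(entropy([0.97, 0.01, 0.01, 0.01]))   # ~0.168

    def gaussian_entropy(var):
        # Eq 6.7: 1-D Gaussian entropy depends on the variance only through its log.
        return 0.5 * np.log(2 * np.pi * np.e * var)

    def multivariate_gaussian_entropy(cov):
        # Eq 6.8: multivariate Gaussian entropy depends on the covariance only through its determinant.
        d = cov.shape[0]
        return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

    print(gaussian_entropy(2.0))
    print(multivariate_gaussian_entropy(np.array([[1.0, 0.9], [0.9, 1.0]])))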
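
The earlier highlight about generating faux inputs can be sketched the same way: fit a simple distribution to the training point cloud and sample from it. This is only a minimal illustration under the assumption that a single multivariate Gaussian matches the (potentially transformed) data; the function fit_gaussian_and_sample and the toy data are mine, not the book's.

    import numpy as np

    def fit_gaussian_and_sample(X, n_samples, rng=None):
        """Fit a multivariate Gaussian to the rows of X and draw faux input points from it."""
        rng = np.random.default_rng() if rng is None else rng
        mu = X.mean(axis=0)                 # center of the training point cloud
        sigma = np.cov(X, rowvar=False)     # its shape (spread and correlations)
        return rng.multivariate_normal(mu, sigma, size=n_samples)

    # Toy point cloud: 2-D inputs whose common property is a strong correlation.
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)

    faux = fit_gaussian_and_sample(X, n_samples=100, rng=rng)
    print(faux.shape)   # (100, 2): new points occupying the same region of input space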

New highlights added June 24, 2024 at 11:06 AM

  • large contributions can happen only in case 1b, where $p_{gt}(i)$ is high and $p_{pred}(i)$ is low; that is, $p_{gt}$ and $p_{pred}$ are very dissimilar.
  • if there is dissimilarity, the cross-entropy is high.
  • We should look at cross-entropy as a dissimilarity with an offset.
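
A minimal numeric sketch of these cross-entropy highlights (assuming natural logarithms and toy distributions of my choosing): cross-entropy between p_gt and p_pred gets large exactly when p_pred puts low probability where p_gt puts high probability, and it decomposes as the entropy of p_gt (the offset) plus the KL divergence (the dissimilarity).

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        return -np.sum(p * np.log(p))

    def cross_entropy(p_gt, p_pred):
        """H(p_gt, p_pred) = -sum_i p_gt(i) * log(p_pred(i)), in nats."""
        return -np.sum(np.asarray(p_gt, float) * np.log(np.asarray(p_pred, float)))

    def kl_divergence(p_gt, p_pred):
        return cross_entropy(p_gt, p_pred) - entropy(p_gt)

    p_gt       = np.array([0.7, 0.2, 0.1])
    similar    = np.array([0.6, 0.3, 0.1])
    dissimilar = np.array([0.05, 0.05, 0.9])   # low p_pred(i) exactly where p_gt(i) is high

    print(cross_entropy(p_gt, similar))        # ~0.83, close to entropy(p_gt) ~0.80
    print(cross_entropy(p_gt, dissimilar))     # ~2.71, much larger: very dissimilar
    # Dissimilarity with an offset: cross-entropy = entropy(p_gt) + KL(p_gt || p_pred)
    print(entropy(p_gt) + kl_divergence(p_gt, dissimilar))   # same as the previous line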