Noam Chomsky - "The machine, the ghost, and the limits of understanding"
Lower entropy (a less disordered system) is preferable to higher entropy, because predictable outcomes are preferred over randomness. This is why low perplexity is considered good and high perplexity bad: perplexity is the exponentiation of the entropy, so the two always move together, and you can think of perplexity as entropy expressed on a different scale.
Why do we use perplexity instead of entropy? If we read perplexity as a branching factor (the weighted average number of choices a random variable has), that number is easier to interpret than the entropy itself.
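As a concrete illustration, here is a minimal Python sketch (standard library only, names are mine) that computes the entropy of a discrete distribution and its perplexity as base raised to the entropy. A uniform distribution over k outcomes has perplexity exactly k, which is the "branching factor" reading, and a more predictable distribution has both lower entropy and lower perplexity.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H(p) = -sum p_i * log(p_i), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def perplexity(probs, base=2.0):
    """Perplexity is the exponentiation of the entropy: base ** H(p)."""
    return base ** entropy(probs, base)

# A fair six-sided die: entropy = log2(6) ~= 2.585 bits, perplexity = 6.
# The perplexity equals the number of equally likely choices (the branching factor).
fair_die = [1/6] * 6
print(entropy(fair_die), perplexity(fair_die))      # ~2.585, ~6.0

# A loaded die is more predictable: lower entropy, hence lower perplexity (~1.83, ~3.6).
loaded_die = [0.5, 0.3, 0.1, 0.05, 0.03, 0.02]
print(entropy(loaded_die), perplexity(loaded_die))
```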
In information theory, perplexity is a measure of how well a probability distribution or probability model predicts a sample, and it can be used to compare probability models. A low perplexity indicates that the probability distribution is good at predicting the sample.
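For a model evaluated on a sample, perplexity is typically computed from the average negative log-probability the model assigns to each observation. Below is a minimal sketch under that reading, using a toy unigram model stored as a Python dict; the tokens and probabilities are made up purely for illustration. A perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k options per token, so lower is better.

```python
import math

def model_perplexity(token_probs, sample, base=2.0):
    """Perplexity of a model on a sample:
    base ** (average negative log-probability per observed token)."""
    neg_log_probs = [-math.log(token_probs[tok], base) for tok in sample]
    return base ** (sum(neg_log_probs) / len(neg_log_probs))

# Hypothetical unigram model and sample (illustrative values only).
model = {"the": 0.4, "cat": 0.2, "sat": 0.2, "mat": 0.2}
sample = ["the", "cat", "sat"]
print(model_perplexity(model, sample))  # ~4: the model is about as uncertain as a 4-way choice
```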