calc::entropy_bits() does not calculate entropy correctly. Fortunately, it works for the current use case, but if somebody else uses it on a list with repeated elements, the result will be wrong.
Example:
>>> entropy_bits(list('abcabcabcabc')) # repeated elements, problem
6.339850002884623 # should be 1.5849625007211559
>>> entropy_bits(list('abcdefghijkl')) # no element repetition, ok
3.584962500721156 # correct
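The problem is that the function does not take into account how many times each element is repeated in the list. Judging by the output above (6.3398... is exactly 4 times the expected value, and each letter appears 4 times), every occurrence of an element adds its own -p * log2(p) term, so a value that appears count times is counted count times. The fix is quite easy: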
    for prob, count in zip(probs, counts):
        entropy -= prob * log2(prob) / count
    print(entropy)
Note that len(probs) == len(counts) and that both lists are in the same order, so probs[i] and counts[i] refer to the same element.
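For anyone who wants to check the fix in isolation, here is a minimal, self-contained sketch. The body of calc::entropy_bits() is not shown in this report, so the way probs and counts are built below (one entry per list element) is an assumption reconstructed from the outputs above, and both function names are hypothetical. The second function is an equivalent formulation that iterates over distinct values only, which avoids the division altogether.

```python
from collections import Counter
from math import log2


def entropy_bits_per_element(data):
    """Hypothetical reconstruction: one (prob, count) pair per list element."""
    freq = Counter(data)
    total = len(data)
    counts = [freq[x] for x in data]      # assumed: one count per element
    probs = [c / total for c in counts]   # assumed: one probability per element
    entropy = 0.0
    for prob, count in zip(probs, counts):
        # Each distinct value shows up `count` times in this loop, so its
        # term is divided by `count` to make it contribute exactly once.
        entropy -= prob * log2(prob) / count
    return entropy


def entropy_bits_unique(data):
    """Equivalent form: one term per distinct value, no division needed."""
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in Counter(data).values())
```

Both functions should return approximately log2(3) for list('abcabcabcabc') and approximately log2(12) for list('abcdefghijkl'), matching the expected values in the example above.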