Calculating Entropy (in Python)

Tags

I’ve been playing with calculating the entropy of a toy system used to illustrate the connection between “disorder” and entropy. (See Entropy 101 and Entropy 102.) I needed to calculate the minimum number of moves required to sort a disordered collection, but that turns out to be an NP problem (no doubt related to Traveling Salesman).

The illustration bridges Boltzmann and Shannon entropy, which got me playing around with the latter, and that led to some practical results. This article describes the code I used mainly as an excuse to try embedding my own colorized source code blocks.

I’m not going to get into entropy here (I’ve posted about it plenty elsewhere). What matters is that this code involves Shannon entropy, not Boltzmann entropy. They’re closely related but involve quite different domains and enough differences to make calling them the same thing ever so slightly questionable.

Regardless, rather than Boltzmann’s famous thermodynamic formula:

$\displaystyle{S}_\mathit{Boltzmann}={K}_{B}\;\log\;\Omega$

The code here uses Shannon’s version:

$\displaystyle{S}_\mathit{Shannon}=\sum_{i}\;\mathrm{PF}({x}_{i})\;\log_{2}\,\mathrm{PF}({x}_{i})$

Which gives an answer in binary bits because of the log2 function. Using the log10 function gives a result in terms of decimal digits. The symbol density determines the log base (or vice versa).

Implementing the equation is very simple:

 from math import log

 

 def shannon_entropy (NP, PF, logbase=2):

     “””Calculate the Shannon Entropy given PF(x).”””

     # Generate probabilities…

     probs = [PF(x) for x in range(NP)]

     tprob = sum(probs) # expected to be 1.000

     # Generate summation terms…

     ents = [p*log(p,logbase) for p in probs]

     ent = –sum(ents)

     # Return all the generated data…

     return (ent, ents, tprob, probs)

 

 def demo_shannon_entropy (n_patts=27):

     probfunc = lambda x:1/float(n_patts)

     ent,_,tp,_ = shannon_entropy(n_patts, probfunc)

     print(ent)

     print(tp)

     print()



It can be done in a single line lambda expression if one doesn’t mind calling the probability function twice:

 from math import log2

 

 “””Calculate the Shannon Entropy given PF(x).”””

 shannon_entropy = \

     lambda N,P: –sum([P(x)*log2(P(x)) for x in range(N)])



But I do, so first I generate a list of all the probabilities for each system micro-state (line #6). As a safety check, in the next line, I also generate a sum of those probabilities (something the one-liner can’t do). The expected sum is 1.000 (give or take a floating-point error).

Then I generate a list of summation terms (line #9), sum them to derive the entropy (line #10), and return the entropy, the total probability, and both lists. Most applications will only care about the entropy value, but the other data is available if wanted.

When run with the 27-pattern default, the demo prints:

4.754887502163471
0.9999999999999993

Telling us the entropy is ~4.755 bits and, as expected, the probabilities do sum to 1.000, give or take a floating-point error. The entropy value is the same as:

$\displaystyle\log_{2}({27})$

That’s because the probability function (line #15) assigns equal probability (1/NP) to all possible patterns. When all patterns are the same, the entropy is maximum and the same as the number of bits required to express all possible patterns. (If we had 32 patterns, the max entropy would be exactly 5.000 because five bits can express exactly 32 patterns.)

Note that if we change logbase to 10, then the demo prints:

1.4313637641589871
0.9999999999999993

Telling us the same 27-pattern system, when all patterns are equal, has an entropy of ~1.43 decimal digits. Setting logbase to other values results in different entropy values. The entropy just says how many symbols are required to express NP many values.

In a way, that’s it, everything else is an application of that calculation. But there are some interesting things we can do to explore its consequences.

Imagine we have a “tea” variable that can specify one of 27 types of tea. (Imagine a tiny dial with a pointer.) As just shown, when all 27 teas are equally likely, the dial could point to any of them, we say the variable has maximum entropy (of ~4.755 if we mean bits). This effectively means the variable is random — it has maximum surprise. Things get more interesting when we vary the pattern probabilities.

A very simple extension assigns a high probability to one pattern (tea) and a low probability to all others. We’re saying the tea dial is very likely to point to some given tea, and very unlikely to point to any other. In this simple case we’ll assign the same low probability to all other possible teas. Here’s some code:

 from random import randint

 

 def demo_shannon_entropy (NB=5, p_of_x=None):

     “””Shannon entropy for a specific P(X).”””

     NP = pow(2,NB)

     PF = (1.0/NP) if p_of_x is None else p_of_x

     PX = randint(1, NP–1)

     probs = lambda x:PF if x==PX else ((1.0–PF)/(NP–1))

     ent,_,tprob,_ = shannon_entropy(NP, probs)

     print()

     print(‘NB=%d, NP=%d’ % (NB,NP))

     print(‘P(PX)=PF=%.12f’ % PF)

     print(‘tot-prob=%.12f’ % tprob)

     print(‘entropy=%.12f’ % ent)

     print()

     return ent



The sum of all probabilities must be 1.000, so the probability function must be normalized. That’s why the example above used 1/NP for each pattern’s probability. The sum is NP×(1/NP), which is obviously one. Here, normalization is ever so slightly trickier.

Firstly, the demo function takes NB (number of bits), rather than NP (number of patterns). For these demos we’ll use whole numbers of bits because fractional values, such as for 27 teas, don’t really change anything. The convenience is that in using whole bits, our maximum entropy is always the number of bits.

We do need NP, so line #5 calculates it. The next line sets PF to the input argument p_of_x (probability of X) if it was provided, otherwise uses the default 1/NP (which will result in maximum entropy). We set PX to some random pattern, and line #8 defines the probability function: return PF if the pattern matches PX otherwise return a low probability calculated as 1.0-PF divided among all other patterns.

When run with defaults, the demo prints:

NB=5, NP=32
P(PX)=PF=0.031250000000
tot-prob=1.000000000000
entropy=5.000000000000

As expected, the entropy is 5.00 and the probabilities sum to 1.00. The probability of the expected number is only 3.125% — or odds of exactly 1/32 for each pattern. Maximum entropy, maximum surprise.

But if we set p_of_x=0.99, it prints:

NB=5, NP=32
P(PX)=PF=0.990000000000
tot-prob=1.000000000000
entropy=0.130335099000

A much-reduced entropy value. There is a 99% probability the dial will point to the expected number, so there is very little surprise. We can take things even further by setting p_of_x=0.999999 — a near certainty of the expected answer:

NB=5, NP=32
P(PX)=PF=0.999999000000
tot-prob=1.000000000000
entropy=0.000026328459

Almost no surprise at all.

Looking at the way making one answer more likely changes the entropy naturally led me to charting the curve as entropy is reduced this way from maximum to some tiny value. (It approaches zero, but since log(0) is an invalid expression, it can never get there.)

Here’s some code:

 from random import randint

 

 def demo_shannon_curve (NB=5):

     “””Shannon Entropy curve for P(X).”””

     extra = [0.999999, 0.9999999]

     NP = pow(2,NB)

     X = randint(1, NP–1)

     print(‘NB=%d, NP=%d’ % (NB,NP))

     print(‘X=%d’ % X)

     print()

     # P(X) goes from 1/NP thru (NP-1)/NP…

     ps = [n/NP for n in range(1,NP)]

     # Loop over probabilities for P(X)…

     for ix,p in enumerate(ps+extra):

         PF = lambda x:p if x==X else ((1.0–p)/(NP–1))

         ent,_,tp,_ = shannon_entropy(NP, PF)

         print(‘%4d: %.9f = %.12f (%.6f)’ % (1+ix, p, ent, tp))

     print()



It starts off like the demo above, but line #12 calculates a list of probabilities from 1/NP to (NP-1)/NP — which approaches, but doesn’t reach, 1.00. Two extra probabilities getting even closer to 1.00 get tacked on to get the curve closer to zero.

When run it prints a list:

   1: 0.031250000 = 5.000000000000 (1.000000)
   2: 0.062500000 = 4.981849107605 (1.000000)
…
…
  32: 0.999999000 = 0.000026328459 (1.000000)
  33: 0.999999900 = 0.000002965039 (1.000000)

I used a function like this to generate data I could make a chart with:

As you see, the entropy curve approaches zero. Mathematically it can’t get there, but conceptually, if the probability is 1.00, there’s no surprise in the answer (effectively, it’s already known) and the entropy is zero.

This post was mainly an excuse to see if inserting my HTML Python code would work (look like). I’ll leave you with a complete class you can use for your own explorations:

 from math import log,log2, factorial

 

 def bit_cmp (x, y):

     n = x ^ y

     return sum([int(b) for b in bin(n)[2:]])

 

 def n_choose_k (nb, k):

     “””nb!/k!(nb-k)!”””

     if nb < k: return 0

     return int(factorial(nb)/(factorial(k)*factorial(nb–k)))

 

 class entropy (object):

     “””Shannon Entropy Calculator class.”””

     def __init__ (self, nbr_of_bits):

         self.nb = nbr_of_bits

         self.np = pow(2,self.nb)

     def pf (self, x):

         “””Probability Function. (Override in subclass!)”””

         return 1.0

     def __call__ (self):

         “””Generate and print results.”””

         self.ns = [n_choose_k(self.nb,k) for k in range(self.nb+1)]

         # Normalized Probability Factors…

         self.fs = [self.pf(x) for x in range(self.nb+1)]

         self.ts = [n*f for n,f in zip(self.ns,self.fs)]

         self.tp = sum(self.ts)

         self.ps = [f/self.tp for f in self.fs]

         # Calculate Shannon entropy…

         self.xs = [self.ps[bit_cmp(0,p)] for p in range(self.np)]

         self.es = [x*log2(x) for x in self.xs]

         self.ent = –sum(self.es)

         self.tpx = sum(self.xs)

         # Print results…

         print(str(self))

         print()

         for ix,rcd in enumerate(zip(self.ns,self.fs,self.ts,self.ps)):

             print(‘%2d:%6d %12.4e %12.4e %14.6e’ % ((ix,)+rcd))

         print()

         print(‘PF(EXP)=%.8f (tp=%.6f)’ % (self.ps[0], self.tp))

         print(‘entropy=%.8f (tp=%.6f)’ % (self.ent, self.tpx))

         print()

         return self.ent

     def __str__ (self):

         return ‘#bits:%d, #patterns:%d’ % (self.nb, self.np)

 

 class entropy1 (entropy):

     def __init__ (self, nbits, divisor):

         super().__init__(nbits)

         self.prob = 1.0/float(divisor if divisor else self.np)

     def pf (self, x):

         return self.prob

 

 class entropy2 (entropy):

     def __init__ (self, nbits, probability):

         super().__init__(nbits)

         self.prob = 1.0/float(probability)

     def pf (self, x):

         return (1.0 if x==0 else self.prob)

 

 class entropy3 (entropy):

     def __init__ (self, nbits, factor):

         super().__init__(nbits)

         self.fact = float(factor)

     def pf (self, x):

         return (1.0/pow(self.fact,x))

 

 def demo_entropy_class (NB=8):

         e0 = entropy(NB)

         e1 = entropy1(NB, 5.0)

         e2 = entropy2(NB, 1e6)

         e3 = entropy3(NB, 250)

         e0()

         e1()

         e2()

         e3()

         return (e0,e1,e2,e3)



I won’t explain it here unless someone expresses an interest.

Link: Zip file containing all code fragments used in this post.

13 thoughts on “Calculating Entropy (in Python)”

Wyrd Smythe said:

December 21, 2021 at 3:02 pm

I hate — really, really hate — the way the WordPress Reader strips formatting from my posts! I highly advise never using the WP Reader and always viewing blog posts on the website.

This post, for instance, Calculating Entropy (in Python)

Wyrd Smythe said:

December 21, 2021 at 3:08 pm

I should probably document those entropy sub-classes…

Wyrd Smythe said:

April 3, 2023 at 9:31 am

I never expected this to become my most popular post!

- Wyrd Smythe said:
  
  October 17, 2023 at 11:56 am
  
  By a considerable margin!
  
  - Wyrd Smythe said:
    
    June 16, 2024 at 8:48 am
    
    As of today, 2,395 page hits. The next most-viewed post only has 412! 😮
    
Wyrd Smythe said:

May 8, 2024 at 8:41 pm

But no questions, so I guess it’s completely clear to everyone?

H. Ballesta said:

June 24, 2024 at 10:10 pm

Thank you for the post, very interesting. I would like to ask you something because I can’t find much specific information. Have you had convergence problems with the maximum entropy? I am working on a system that evolves over time in a simulation and theoretically the entropy should maximize and become an asymptote, but there are many fluctuations. My question is whether floating point or any other element of Python 3 could be responsible for these fluctuations, since theoretically this shouldn’t happen.

- Wyrd Smythe said:
  
  June 25, 2024 at 10:12 am
  
  Python floats are 64-bit floating point with only 15-16 decimal digits of precision, so, depending on what your math is doing, there can definitely be issues with series. Maximum entropy at any given time is just the log of the number of possible combinations of the system, so the issue may be in how you’re calculating that number.
  
  Also, depending on your system, Shannon entropy might not be what you need. I would think that most systems focused on evolution usually deal with Boltzmann entropy.
  
H. Ballesta said:

June 25, 2024 at 7:13 pm

Hello, thank you very much, according to what you told me I was verifying and you are right the problem is in the calculation not in the entropy, I will improve the process and I will insist with Shannon. Greetings.

H. Ballesta said:

June 25, 2024 at 8:51 pm

Not sure if the comment was submitted, again good response helped me a lot.

- Wyrd Smythe said:
  
  June 26, 2024 at 11:10 am
  
  Glad it worked out. (WordPress didn’t know who you were, so it put the message in a moderation queue. Unfortunately, it doesn’t provide any feedback, so the message seems to just vanish.)
  
Pingback: Friday Notes (Aug 30, 2024) | Logos con carne
Pingback: Friday Notes (Sep 27, 2024) | Logos con carne