So Random!

To ring in the new year, I thought I’d play around with an old friend from my earliest programming days, a random text generator. Back then (over 30 years ago), and to some extent ever since, a good way to practice programming has been to work on small, relatively easy, but still fun programs.

Simple games are a common choice, but not the only one. (I’ve probably written a version of Mastermind in every programming language I know.) Another fun choice is various image or text generators (or processors). Random text generators, in particular, offer a range of complexity depending on your taste and time.

Let’s start with a very simple example:

 from random import randint

 # Random character from 'A' to 'Z'…
 randchar = lambda: chr(ord('A')+randint(0,25))

 def random_text_1 (size=12):
     '''Simple Random Text Generator.'''

     # Generate a random list of chars…
     s = [randchar() for _ in range(min(24,size))]

     # Return list as a string…
     return ''.join(s)



The above code fragment just generates a short random text string. It’s pretty simple: all uppercase, just letters, and no structure (like spaces or periods or whatnot).

From this starting point, the only limitation is your imagination and interest. The ultimate goal is a random text generator that creates text that looks as close to real text as possible but is strictly random.

The first steps involve using spaces to break the random characters into words and sentences. This requires switching to lowercase and capitalizing the first letter of the first word. It also requires a period at the end of the sentence.
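As a rough sketch of those first steps (not the code used for the post), the function below groups random lowercase letters into words, capitalizes the first word, and ends with a period. The name random_sentence and its length limits are assumptions chosen for illustration.

```python
from random import choice, randint
from string import ascii_lowercase

def random_sentence(min_words=2, max_words=8, min_len=2, max_len=9):
    '''Return one random "sentence": lowercase words, capitalized, ending in a period.'''
    words = []
    for _ in range(randint(min_words, max_words)):
        n = randint(min_len, max_len)
        words.append(''.join(choice(ascii_lowercase) for _ in range(n)))
    text = ' '.join(words)
    # Capitalize only the very first letter, then add the period.
    return text[0].upper() + text[1:] + '.'

print(random_sentence())
```

Every letter and every length here is still drawn from a flat distribution, which is what the later tuning steps change.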

Later steps involve breaking text into paragraphs, adding parenthetical sentences, and making some sentences end with a question or exclamation mark.

One can also tune the randomness at various points to reflect real language. The basic random() function provides a flat distribution. An obvious improvement is selecting letters based on their frequency in English. Sentence and word lengths can be tuned, too.

Without further ado, let’s jump into the code for creating my New Year’s Day blog post.

As is common for me, after some prototyping, I ended up creating a class, named RandomText. The constructor looks like this:

 def __init__ (self, **kwargs):
     '''New RandomText instance.'''
     self.size = kwargs.get('paragraphs', 5)
     self.para_min = kwargs.get('para_min', 1)
     self.para_max = kwargs.get('para_max', 20)
     self.sent_min = kwargs.get('sent_min', 2)
     self.sent_max = kwargs.get('sent_max', 17)
     self.word_min = kwargs.get('word_min', 2)
     self.word_max = kwargs.get('word_max', 9)
     self.nmbr_min = kwargs.get('nmbr_min', 1)
     self.nmbr_max = kwargs.get('nmbr_max', 6)
     self.single_f = kwargs.get('single_f', 0.01)
     self.number_f = kwargs.get('number_f', 0.001)
     self.parens_f = kwargs.get('parens_f', 0.001)
     self.q_mark_f = kwargs.get('q_mark_f', 0.1)
     self.exclam_f = kwargs.get('exclam_f', 0.03)
     self.use_freq = kwargs.get('use_freq', True)
     self.use_html = kwargs.get('use_html', False)
     self.use_sect = kwargs.get('use_sect', False)
     self.alphas = AlphaSet() if self.use_freq else Alphas

     # Generate text…
     ps = [self.paragraph(ix) for ix in range(self.size)]
     self.text = EOL.join(ps)



A RandomText instance has a lot of keyword parameters, all with defaults. They control things such as the minimum and maximum sizes of words, sentences, and paragraphs.
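To show the keyword-with-defaults pattern in isolation, here is a toy stand-in that keeps only the option handling. The Settings class, the DEFAULTS table, and the check for unknown option names are all assumptions added for illustration; the real constructor handles more options and does not reject unknown names.

```python
# A few of the post's defaults, for illustration only.
DEFAULTS = {'paragraphs': 5, 'word_min': 2, 'word_max': 9}

class Settings:
    '''Toy stand-in for RandomText: keeps only the keyword handling.'''
    def __init__(self, **kwargs):
        # Reject misspelled option names instead of silently ignoring them.
        unknown = set(kwargs) - set(DEFAULTS)
        if unknown:
            raise TypeError('unknown option(s): %s' % ', '.join(sorted(unknown)))
        # Use the caller's value if given, else the default.
        for name, default in DEFAULTS.items():
            setattr(self, name, kwargs.get(name, default))

s = Settings(word_max=12)
print(s.paragraphs, s.word_min, s.word_max)
```

A caller overrides only what it cares about, and everything else falls back to the table.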

Down at the bottom, a list comprehension calls paragraph() to create the requested number of paragraphs of random text. The text is assigned to the text member and is available as the str() of the instance.

 def __str__ (self):
     return self.text

 

One note as we go through this: I put in about a day of work on this with the goal of a blog post just after midnight. So there was a deadline, is the point, and parts of the code are not as fully developed as intended.

Also, before we continue, here are various important constants:

 NUL = ''
 EOL = '\n'
 TAB = '\t'
 SPC = ' '
 DOT = '.'
 UCX = ord('A')
 LCX = ord('a')

 Alphas = [chr(LCX+n) for n in range(26)]
 Numbers = ['0','1','2','3','4','5','6','7','8','9']
 Singles = ['J', 'v', 'O', 'Y']



Here’s the paragraph generator:

 def paragraph (self, seq=0):
     '''Return a random paragraph.'''
     # Determine number of sentences…
     # (triangular() takes an absolute mode, so offset it from the minimum.)
     df = self.para_max - self.para_min
     ns = int(triangular(self.para_min, self.para_max, self.para_min + df/3))

     # Generate paragraph sentences…
     ss = [self.sentence(ix) for ix in range(ns)]
     # Join sentences into a paragraph…
     para = SPC.join(ss)

     # If using HTML…
     if self.use_html:
         #TODO: Fix Section use; include bold and center.
         n = Clip(0, int(gauss(5,2)), 8)
         s = '\xa7\n' if (self.use_sect and (n < seq)) else ''
         return '<span style="color: #000000;">%s</span>%s%s' % (para, EOL, s)

     # Else just text; return paragraph (and an EOL)…
     return '%s%s' % (para, EOL)



This method sets a pattern the others follow. Essentially, it generates a random number using a triangular distribution whose peak sits about one third of the way into the range. The idea is to bias the random sizes toward the smaller side.
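The bias is easy to check numerically. The sketch below is a stand-alone illustration, not part of the class: random_size and its bias parameter are made-up names, and placing the peak at lo + (hi - lo) * bias is an assumption about the intent. Note that random.triangular() expects the mode as an absolute value between the low and high limits.

```python
from random import seed, triangular

def random_size(lo, hi, bias=1/3):
    '''Return an int from lo up to (but not including) hi, favoring small values.'''
    mode = lo + (hi - lo) * bias   # peak, measured from the low end of the range
    return int(triangular(lo, hi, mode))

seed(42)  # fixed seed so the run is repeatable
sizes = [random_size(1, 20) for _ in range(10000)]
average = sum(sizes) / len(sizes)
print(min(sizes), max(sizes), round(average, 1))
```

With limits of 1 and 20, the average lands below the midpoint of 10.5, which is the "smaller side" bias at work.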

The list comprehension calls sentence() to create the (random) number of sentences.

For purposes of posting on my blog, each paragraph is enclosed in span tags to set the text color to black.

Note that I intended to also have it include the section breaks, but my first pass at it didn’t work, and I left it turned off due to lack of time.

Here’s the sentence generator:

 def sentence (self, seq=0):
     '''Return a random sentence.'''
     # Determine sentence length…
     # (triangular() takes an absolute mode, so offset it from the minimum.)
     df = self.sent_max - self.sent_min
     nw = int(triangular(self.sent_min, self.sent_max, self.sent_min + df/3))

     # Generate sentence of words…
     ws = [self.word(ix) for ix in range(nw)]

     # Occasionally, insert a comma…
     if (4 <= len(ws)) and (random() < 0.2):
         ix = Clip(0, int(gauss(len(ws), 2)), len(ws)-2)
         ws[ix] = ws[ix]+','

     # Occasionally, use a question or exclamation mark…
     if random() < self.q_mark_f:
         e = '?'
     elif random() < self.exclam_f:
         e = '!'
     else:
         e = DOT

     # Join words into a sentence…
     s = SPC.join(ws)

     # Occasionally, parenthesize a sentence…
     if seq and (random() < self.parens_f):
         return '(%s%s)' % (s, e)

     # Return sentence…
     return '%s%s' % (s, e)



It’s pretty much the same sort of thing as the paragraph generator, but has some added complexity to insert occasional commas, and to sometimes use a question or exclamation mark (rather than a period).

It also occasionally wraps a sentence in parentheses.

The list comprehension here calls word() to create the sentence.
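One detail of the end-mark choice in sentence() is worth spelling out: the exclamation test only runs when the question test fails, and it uses a second, separate random draw. So the chance of an exclamation mark is slightly less than exclam_f. The stand-alone sketch below works out the numbers for the default settings; end_mark is a hypothetical helper that mirrors the two-draw logic.

```python
from random import random, seed

Q_MARK_F = 0.1   # default q_mark_f from the constructor
EXCLAM_F = 0.03  # default exclam_f from the constructor

def end_mark(q_mark_f=Q_MARK_F, exclam_f=EXCLAM_F):
    '''Pick an end mark the way sentence() does: two separate draws.'''
    if random() < q_mark_f:
        return '?'
    elif random() < exclam_f:
        return '!'
    return '.'

# Probabilities implied by the two-draw logic.
p_question = Q_MARK_F
p_exclaim = (1 - Q_MARK_F) * EXCLAM_F
p_period = 1 - p_question - p_exclaim
print(p_question, p_exclaim, p_period)

# Compare with observed shares over many draws.
seed(1)
marks = [end_mark() for _ in range(100000)]
print({m: marks.count(m) / len(marks) for m in '?!.'})
```

With the defaults, that works out to roughly 10% question marks, 2.7% exclamation marks, and 87.3% periods.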

Here’s the word generator:

 def word (self, seq=0):
     '''Return a random word. (Capitalize if first in sequence.)'''

     # Occasionally, return a single-character word…
     if random() < self.single_f:
         return choice(Singles)

     # Occasionally, return a number…
     if random() < self.number_f:
         return self.number()

     # Determine word length…
     # (triangular() takes an absolute mode, so offset it from the minimum.)
     df = self.word_max - self.word_min
     nc = int(triangular(self.word_min, self.word_max, self.word_min + df*0.42))

     # Generate word from characters…
     cs = [self.alpha(seq+ix) for ix in range(nc)]

     # Join characters and return word…
     return NUL.join(cs)



The word generator sometimes returns a “single,” a word with just one character. (The default settings set a minimum word size of two.) The idea is to simulate one-letter words such as “I” in English. Note that the generator has four singles, whereas English has only “a” and “I.”

The generator can also sometimes return a random number.

Otherwise, the list comprehension calls alpha() to create the word.

Here are the last couple of generators:

 def alpha (self, seq=0):
     '''Return a random character. (Capitalize if first in sequence.)'''
     # Choose a random character from the set…
     a = choice(self.alphas)

     # Return it (capitalized if seq=0)…
     return a if seq else a.upper()


 def number (self, seq=0):
     '''Return a random number.'''
     # Determine number's length…
     nd = randint(self.nmbr_min, self.nmbr_max)

     # Generate number from random digits…
     ds = [choice(Numbers) for _ in range(nd)]

     # Join digits and return number…
     return NUL.join(ds)



Note that the alpha() generator just returns a random letter, while the number() generator returns a multi-digit number.

A key point involves the alpha() generator and its choice() of self.alphas.

That instance member is set in the constructor to be either a simple list of the alphabet, or a generated list about 30000 characters long that contains multiple instances of each letter in amounts that reflect English usage.

This is generated from a frequency table where each letter frequency is multiplied by 30000 to generate a list of that letter.

 lambda ix,f: NUL.join([chr(LCX+ix)]*int(f*30000))

 

These are joined together into the alpha list, so a random choice() from the list reflects English letter use.
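Here is a small stand-alone sketch of that idea. The three-letter table is made up purely for illustration (it is not the Shakespeare table), and build_alpha_list is a hypothetical name. The last lines show an alternative, random.choices(), which accepts weights directly and avoids building the big list.

```python
from random import choice, choices, seed

SCALE = 30000  # list size used in the post

# Made-up frequencies for three letters only (NOT the Shakespeare table).
toy_freqs = {'e': 0.5, 't': 0.25, 'a': 0.25}

def build_alpha_list(freqs, scale=SCALE):
    '''Repeat each letter in proportion to its frequency.'''
    alphas = []
    for letter, f in freqs.items():
        alphas.extend(letter * int(f * scale))
    return alphas

alphas = build_alpha_list(toy_freqs)

# A flat choice() over the list now favors the common letters.
seed(7)
sample = [choice(alphas) for _ in range(10000)]
print({c: sample.count(c) for c in toy_freqs})

# Alternative: let random.choices() apply the weights itself.
letters = list(toy_freqs)
sample2 = choices(letters, weights=list(toy_freqs.values()), k=10000)
print({c: sample2.count(c) for c in toy_freqs})
```

The repeated-letter list trades memory for simplicity: once it is built, a plain choice() is all the generator needs.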

As an aside, I generated the frequency table by scanning the works of Shakespeare! The resulting frequency table looks like this:

Together, the frequencies (probabilities) sum to one.

Python has a very nice random library that goes far beyond the random() function usually found in a math library. In particular, Python offers different probability distributions — gaussian and triangular, for instance.

I made a chart so I could see for myself:

Note how the random() function returns a flat distribution. The two triangular distributions peak at 0.5 and 0.2, respectively. The two gaussian distributions are both centered on 0.5, with standard deviations of 0.10 and 0.05.

The code for generating the data points looks like this:

 from random import random, triangular, gauss

 samples = 10**7
 bins = 3000

 ys0 = [0]*bins
 ys1 = [0]*bins
 ys2 = [0]*bins
 ys3 = [0]*bins
 ys4 = [0]*bins

 for _ in range(samples):
     n0 = int(random() * bins)
     n1 = int(triangular(0.0, 1.0, 0.5) * bins)
     n2 = int(triangular(0.0, 1.0, 0.2) * bins)
     n3 = int(gauss(0.5, 0.100) * bins)
     n4 = int(gauss(0.5, 0.050) * bins)

     # Clip(lo, x, hi) is a clamping helper (not shown here)…
     ys0[n0] += 1
     ys1[n1] += 1
     ys2[n2] += 1
     ys3[Clip(0,n3,bins-1)] += 1
     ys4[Clip(0,n4,bins-1)] += 1



This generates histograms of the frequency distributions. The gaussian data points can fall outside the range (of 0.0–1.0), so clipping is required to ensure a legal index for the histogram bins.
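The Clip helper itself isn't shown, so here is one way it might look, written as a lowercase clip to make clear it is a guess rather than the original. The argument order (low, value, high) is inferred from the calls above. The small histogram deliberately uses a wide standard deviation so that some samples really do fall outside the range.

```python
from random import gauss, seed

def clip(lo, x, hi):
    '''Clamp x into the range lo..hi.'''
    return max(lo, min(x, hi))

seed(3)
bins = 20
counts = [0] * bins
for _ in range(10000):
    # Wide spread (0.25), so some samples fall below 0.0 or above 1.0.
    n = int(gauss(0.5, 0.25) * bins)
    counts[clip(0, n, bins - 1)] += 1
print(counts)
```

Out-of-range samples pile up in the first and last bins rather than raising an IndexError (or, for negative indexes, quietly landing in the wrong bin).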

All in all, I’m pretty happy with how the blog post turned out.

The only manual changes I made were to insert some section breaks and to italicize some bits to make it look more real. Applying italics in the generator is a future improvement.

(Although it’s unlikely I’ll return to this. Not sure why I would.)

Anyway, Happy New Year!!

5 thoughts on “So Random!”

mwlange said:

January 1, 2019 at 9:22 pm

Is there a way to hide actual words within all the random text?

- Wyrd Smythe said:
  
  January 1, 2019 at 10:31 pm
  
  Easily! That’s what cryptography does, in a sense, although cryptographic functions create the random “text.”
  
  In this case, there was just random text, but one could then impose a text stream in any of a variety of ways. Every second letter of every third word, or whatever. In general the process is called steganography, and its applications are fascinating!
  