Sep 21, 2017

Generating text according to some statistical model can be a mildly amusing experiment, and it also gives some insight into the model itself. The basic goal in compression is to remove redundancy, so the output of a good model/coder is typically rather random. We want to reverse the process, sending random data to a specific model to generate text. As Bell, Cleary, and Witten [8] remark, the output of this reverse process is a rough indication of how well the model captured information about the original source. Single-letter frequencies from a 133,000-character sample of English appear in Table 4.1. Text generated according to these probabilities is not likely to be mistaken for lucid prose, as the model knows very little about the structure of English. Higher-order models, with the necessary statistics gathered from a sample of “typical” text, may do better.
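To make the single-letter (order-0) case concrete, here is a minimal Python sketch. It does not use the figures from Table 4.1; instead it assumes the frequencies are estimated from whatever non-empty sample string is supplied. The function names `letter_frequencies` and `generate_order0` are hypothetical, chosen only for this illustration.

```python
import random
from collections import Counter


def letter_frequencies(sample):
    """Estimate single-character probabilities from a non-empty sample string."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def generate_order0(freqs, length, rng=None):
    """Draw `length` characters independently, weighted by their frequencies."""
    rng = rng or random.Random()
    chars = list(freqs)
    weights = [freqs[c] for c in chars]
    # Each character is chosen without regard to its neighbours (order 0).
    return "".join(rng.choices(chars, weights=weights, k=length))


if __name__ == "__main__":
    # Illustrative sample only; a real experiment would use a much larger text.
    sample = "the basic goal in compression is to remove redundancy"
    print(generate_order0(letter_frequencies(sample), 60))
```

Because every character is drawn independently, the output reproduces letter proportions but none of the structure of the language, which is the limitation described above.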

As an example, models were built from Kate Chopin’s The Awakening.1 An order-1 model produced the following text when given random input:

teres runs l ie t there Mentored t fff bandit’s alder ke Shintiyan ”Alesunghed thaf y, He,” ongthagn buid co. fouterokiste singr. fod,

Moving to higher-order models will capture more of the structure of the source. An order-3 model, given the same random data, produced:

assione mult-walking, hous the bodes, to site scoverselestillier from the for might. The eart bruthould Celeter, ange bourse, of him. They was made height opened the of her tunear bathe mid notion habited. Mrs. She fun andled sumed a vel even stremoiself the was the looke hang!

Choose your favorite software tool and write a program that builds an order-k model (k > 0) from a given source, and then uses that model to generate characters from random input.2
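One possible starting point for such a program is sketched below in Python. It is only an illustration under stated assumptions, not the method used to produce the samples above: the source is treated as circular so that every k-character context has at least one successor, the starting context is picked uniformly from the distinct contexts, and the names `build_model` and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_model(text, k):
    """Map each k-character context to a Counter of the characters that follow it."""
    if k < 1 or len(text) <= k:
        raise ValueError("need k >= 1 and a source longer than k characters")
    # Assumption: wrap the source around so no context is left without a successor.
    wrapped = text + text[:k]
    model = defaultdict(Counter)
    for i in range(len(text)):
        context = wrapped[i:i + k]
        model[context][wrapped[i + k]] += 1
    return model


def generate(model, length, rng=None):
    """Generate `length` characters by repeatedly sampling the next character."""
    rng = rng or random.Random()
    # Assumption: start from a context chosen uniformly among the distinct ones.
    context = rng.choice(sorted(model))
    out = list(context)
    while len(out) < length:
        counts = model[context]
        chars = list(counts)
        # Sample the next character in proportion to how often it followed this context.
        nxt = rng.choices(chars, weights=[counts[c] for c in chars])[0]
        out.append(nxt)
        context = context[1:] + nxt  # slide the context window forward by one
    return "".join(out[:length])


if __name__ == "__main__":
    # Illustrative source only; a full novel gives far more interesting output.
    source = "she was becoming herself and daily casting aside that fictitious self "
    print(generate(build_model(source, 3), 80))
```

Raising k makes the output resemble the source more closely, but it also makes many contexts rare, so a small source tends to be reproduced nearly verbatim.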

