A childhood friend who grew up to be an artist has created many pieces based on letters and words – not just their meanings, but their visual form and their relationships to each other. Here is one of them:

The word “yes” changes into the word “no” in five steps, where each step involves either changing, adding, or deleting exactly one letter, with the result of each step being a valid English word. Upon seeing this I – being a computer programmer – immediately wondered if I could come up with an algorithm to find such sequences in a more general way. That is, given a starting word and an ending word, could I design an algorithm to transform one word into the other by changing, adding, or deleting exactly one letter at a time, with the result being a valid English word at each step?

After some thought, I came up with an approach and soon had a working implementation. (Fair warning: the link is just the source code for a program that does the job; you have to download and compile it to use it. I’m working on an online version.) Here is an example of its output:

- table
- able
- ale
- all
- hall
- hail
- hair
- chair

Though I had a working program, I wanted to see how other programmers would approach the problem, so I posted it as a challenge on Stack Overflow, a site where programmers exchange questions and answers. As it turns out, this is quite a common programming challenge, and variants have appeared as an interview question. The rules about adding and deleting a character are most often left out, so that “cat hat hot” would be a valid sequence but “cat chat that than tan” would not; in this form the sequences are called word ladders. Within minutes of posting, I received answers describing essentially the same approach I had used: “Represent the word relationships as a graph, and use a shortest path algorithm. *Duh*.”

**Word Ladders as Applied Graph Theory**

For those readers not steeped in discrete math and computer science, some definitions are in order. A *graph* is a collection of *vertices* – in this case, the words – and *edges* – in this case, the transformations relating the words to each other. Such a graph formed from an English dictionary can be visualized as quite a large web spreading through space; here is a small part of it –

Once such a graph is defined, the problem of finding the desired sequence of words becomes the problem of finding the shortest possible path between two vertices in the graph. This is not an *easy* problem, but it is a very *standard* one, so you can make use of its well-known solutions. As a programmer, you need not re-implement the solution or even understand it particularly well; you can just re-use an existing implementation. I used the one provided by the excellent igraph library. You simply feed it the graph, the start vertex, and the end vertex, and it returns the path. So the only real work is to build the word graph from all the words in the language.
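To make the idea concrete, here is a minimal sketch in Python of a shortest-path search on a tiny word graph. It is not the igraph routine mentioned above; it is a plain breadth-first search, which is one standard way to find a shortest path when every edge counts as one step. The function name `shortest_path` and the three-edge toy graph are made up for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search on an unweighted graph.

    `graph` maps each word to the set of words one step away.
    Returns the list of words from start to goal, or None if unreachable.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # word -> word we reached it from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour in previous:
                continue  # already reached by a path that is no longer
            previous[neighbour] = word
            if neighbour == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy graph (an assumption for this example): three undirected edges.
graph = {}
for a, b in [("cat", "hat"), ("hat", "hot"), ("hot", "shot")]:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(shortest_path(graph, "cat", "shot"))  # ['cat', 'hat', 'hot', 'shot']
```

With a real dictionary the search itself stays the same; only the construction of `graph` changes.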

**Building the Word Graph**

To build a graph out of all words in an English dictionary, you must provide a list of all pairs of words (cat <-> hat; hat <-> hot; hot <-> shot) that should be connected. The graph library doesn’t care what order you provide them in, as long as you provide each pair once. Clearly the list will be large for the English language – there are over 200,000 words in the spelling dictionary on my computer, and there will be many more connections between these. So how can you generate a list of all possible pairs? How can you avoid finding the same pair repeatedly? How do you know when you’re done? How can you do this efficiently?

A naive algorithm would compare every word in the dictionary to every other word. But this is slow – the number of comparisons grows as the square of the size of the dictionary. We can do much better by sorting the words into *equivalence classes*. Consider words that differ only in their first letter. We put these into an equivalence class by replacing the first character with a special place-holder. For example, “walk” and “talk” both become “*alk”; they are in the same equivalence class and therefore should be connected. Apply the same procedure to the second character, the third character, and so on, to build up a table like this:

| Word | Equivalence class |
| --- | --- |
| walk | *alk |
| walk | w*lk |
| walk | wa*k |
| walk | wal* |
| tall | *all |
| tall | t*ll |
| tall | ta*l |
| tall | tal* |
| ear | *ar |
| ear | e*r |
| ear | ea* |
| talk | *alk |
| talk | t*lk |
| talk | ta*k |
| talk | tal* |

OK, so how does this speed things up? It becomes clear when we *sort the table by equivalence class*:

| Word | Equivalence class |
| --- | --- |
| walk | *alk |
| talk | *alk |
| tall | *all |
| ear | *ar |
| ear | e*r |
| ear | ea* |
| talk | t*lk |
| tall | t*ll |
| talk | ta*k |
| tall | ta*l |
| talk | tal* |
| tall | tal* |
| walk | w*lk |
| walk | wa*k |
| walk | wal* |

The sort operation has grouped together all words that should be connected: “walk <-> talk” and “talk <-> tall” in this example. The slow step in this procedure is the sorting, which can be achieved by any of a number of standard algorithms in time proportional to n * log n, where n is the number of rows in the table – a big improvement over n * n for large dictionaries.
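The sort-and-group procedure above can be sketched in a few lines of Python. This is an illustration rather than the code from my program: the function name `substitution_edges` is invented here, and it assumes that no word contains the place-holder character `*`.

```python
from itertools import combinations, groupby


def substitution_edges(words):
    """Return the set of word pairs that differ in exactly one letter."""
    # One row per (pattern, word): replace each letter in turn by "*".
    rows = []
    for word in set(words):
        for i in range(len(word)):
            rows.append((word[:i] + "*" + word[i + 1:], word))

    # Sorting brings rows with the same pattern next to each other.
    rows.sort()

    edges = set()
    for _, group in groupby(rows, key=lambda row: row[0]):
        members = [word for _, word in group]
        # Every pair of words sharing a pattern is connected.
        for a, b in combinations(members, 2):
            edges.add((a, b))
    return edges


print(sorted(substitution_edges(["walk", "tall", "ear", "talk"])))
# [('talk', 'tall'), ('talk', 'walk')]
```

Two distinct words of the same length that differ in one position share exactly one pattern, so each pair turns up once; collecting the pairs in a set is just a safeguard.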

So what about the rule that connects words by adding or deleting one character? This variant of the problem is harder to find online, but I did find an implementation in a comment on this blog similar to the one I used. The key is to create a table, as before, but for each word add all possible rows formed by deleting one letter:

| Word | One letter deleted |
| --- | --- |
| bring | ring |
| bring | bing |
| bring | brng |
| bring | brig |
| bring | brin |
| rings | ings |
| rings | rngs |
| rings | rigs |
| rings | rins |
| rings | ring |
| ear | ar |
| ear | er |
| ear | ea |

Then sort by the second column, as shown here, and filter against the sorted dictionary to retain only those rows where the second column is an actual word:

| Word | One letter deleted |
| --- | --- |
| ear | ar |
| bring | bing |
| bring | brig |
| bring | brin |
| bring | brng |
| ear | ea |
| ear | er |
| rings | ings |
| rings | rigs |
| rings | ring |
| bring | ring |
| rings | rins |
| rings | rngs |

In this example, three connections were found: “rings <-> ring”, “rings <-> rigs”, and “bring <-> ring”. As before, the sorting is the slow step, with O(n log n) time complexity. So how do you combine the connections due to letter substitution with the connections due to deletion? This is called a *graph union* and is supported by most graph libraries. An interesting effect happens if you include only the connections due to deletion and leave out those due to substitution: now the word *must* change length by one letter at each step!

- hands
- hand
- and
- land
- lad
- lead
- led
- fled
- fed
- feed
- fee
- feet
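Returning to how the deletion connections are built, here is a rough Python sketch. Note one deliberate substitution: instead of sorting the table and filtering it against the sorted dictionary, it looks each shortened string up in a hash set, which reaches the same result with less code. The name `deletion_edges` and the tiny word list are assumptions for the example, and the "graph union" is shown simply as a union of two sets of pairs rather than a call into a graph library.

```python
def deletion_edges(words):
    """Return (shorter, longer) pairs where deleting one letter
    from `longer` yields `shorter`, and both are dictionary words."""
    dictionary = set(words)
    edges = set()
    for word in dictionary:
        for i in range(len(word)):
            shorter = word[:i] + word[i + 1:]
            if shorter in dictionary:  # keep only rows that are real words
                edges.add((shorter, word))
    return edges


# Hypothetical mini-dictionary matching the table above.
words = ["bring", "rings", "ear", "ring", "rigs"]
deletion = deletion_edges(words)
print(sorted(deletion))
# [('rigs', 'rings'), ('ring', 'bring'), ('ring', 'rings')]

# Substitution pairs from the earlier walk/talk/tall example, written by hand.
substitution = {("talk", "walk"), ("talk", "tall")}

# The union of the two edge sets gives the full word graph.
all_edges = substitution | deletion
print(len(all_edges))  # 5
```

Leaving `substitution` out of the union gives the deletion-only graph, in which every step changes the word's length by one.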

**Conclusion: the nerdy stuff matters**

Every practicing programmer knows that it is possible – perhaps normal is the right word – to write practical, useful code without drawing on the nerdier parts of computer science. You can write a program that works just fine without ever analyzing it for time complexity or proving its correctness. The simplest, most intuitive data structures are often best. But what the word ladder example illustrates, I hope, is that often the best way to solve an everyday, practical problem is to *translate it into a special case of a well-studied abstract problem*, and then apply a standard solution to that problem. If solving a word ladder does not seem very practical, consider that applied graph theory shows up everywhere: in revision management, in the provenance of lab samples, in deciding the best layout for a communications network. So keep studying the nerdy stuff – one day, you will find yourself facing a down-to-earth problem that is harder than it looks, and realize that you already have the answer.