The Hidden Complexity of Casual Word Games

Leon Atkinson-Derman
5 min readFeb 19, 2021

--

Over the years I have enjoyed solving crossword puzzles, but I did not fully appreciate the computational complexity of generating them.

We at Winkpass recently created a causal word game where the player forms words that fit into a grid from a set of characters, like the following:

A screenshot of the word puzzle game level 39 with the character set A R E G
Word Landscape Puzzles casual word game

Since we wanted to introduce this game to our existing gaming community and since that community speaks many languages, we decided to build a system that could generate levels in different languages (English, Spanish, Portuguese, etc.) For each level we have a list of characters and the corresponding words that those characters can form. To flesh this all out, I needed to be able to generate a list of all valid words in a given language of length 2 to the number of characters in the level. I started thinking about this from the brute-force perspective where the primary problem is:

-find all permutations of each N-letter set from 2 to N and check each against a given language dictionary to see if it’s a word

For the example charset = {f, o, t}, we create 3P2 = 3! / (3–2)!, and 3P3 = 3! / (3–3)!, or more generally: nPk = n! / (n-k)!

So we have a total of 3P2 + 3P3 = 12 permutations to check against a language dictionary.

For the 26-character English language, we have the following number of permutations to check for each word-length:

For 3 chars: 3276 * 12 = 39,312
For 4 chars: 23751 * 60–1,425,060
For 5 chars: 142,506 * 320 = 45,601,920
For 6 chars: 736,281 * 1980 = 1.457836e9

Okay, that’s clearly highly inefficient. Since we already have a dictionary for a given language, it’s much easier to sort the dictionary into sets of all N-length words, pick a word to start with and use its characters as our “character set” for the level. Given that character set, we’ve constrained our search and need only find all other words that can be formed by those characters. The following python function, possible_words(), returns a list of all possible words given a language dictionary and character set.

def charCount(word): 
dict = {}
for i in word:
dict[i] = dict.get(i, 0) + 1
return dict

def possible_words(lwords, charSet):
mywords = []
for word in lwords:
flag = 1
chars = charCount(word)
for key in chars:
if key not in charSet:
flag = 0
else:
if charSet.count(key) != chars[key]:
flag = 0
if flag == 1:
mywords.append(word)
return mywords

python words_from_chars.py -s from

python words_from_chars.py -s from

using dictionary: large.txt

generating words…

— — — — — — — — — -

checking: from

[‘or’, ‘mor’, ‘of’, ‘rom’, ‘for’, ‘fro’, ‘form’, ‘om’, ‘from’, ‘mo’]

Hmmm. Looks like some words we want and some garbage…. (seriously, “mo”?). This illustrates the fact that most computer-readable dictionaries are full of garbage and highlights the adage that: garbage-in == garbage-out

But, hang on a mo! I turned to the python nltk (Natural Language Toolkit), went through the arduous task of filtering each language’s dictionary by stemming words, stripping html / punctuation / numbers and finally using parts of speech tagging and named entity recognition…

Removing html tags from the text like “<head><body>” using regex::

cleaned_text = re.sub(‘<[^<]+?>’,’’, text)

Removing numbers:

def strip_numbers(s):

return ‘’.join(c for c in s if not c.isdigit())

Removing punctuation:
from string import punctuation
def strip_punctuation(s):
return ‘’.join(c for c in s if c not in punctuation)

POS tagging:

import spacy

sp = spacy.load(‘en_core_web_sm’)

word_list = possible_words(lang_words, seed_word)

for w in word_list:

sp_w = sp(w)

for sp_word in sp_w:

print(f’{sp_word.text:{20}} {spacy.explain(sp_word.tag_)}’)

I will leave the details of filtering the various dictionaries, but you get the idea! Oh, and don’t forget filtering out bad words, too (from better_profanity import profanity).

The next problem to solve is, given a list of possible words for a given character set, how can they be arranged in a crossword grid puzzle layout. There are lots of possible ways to do this, but I ended up using a Monte Carlo (random sampling) based method, similar to one discussed on StackOverflow:

  1. Create a grid that is length(charset)+3
  2. Shuffle the word list, then sort it from longest to shortest
  3. Place first longest word in one of four corners either vertically or horizontally
  4. Move to next word, loop over each letter in word and each cell in the grid looking for letter to letter matches
  5. When a match is found, add that position to a suggested coordinate list for that word
  6. Loop over the suggested coordinate list and score the word placement based on how many other words it crosses.
  7. Return to step 4 until the list is exhausted
  8. We now have a crossword puzzle, but it may not be a quality one so keep it and return to step 2. If the next generated crossword has more words placed, then it replaces the kept board.

Keep doing this for N seconds and we will generally end up with a decent puzzle. Bigger grids will run exponentially slower, but since we’re pre-calculating the boards it is not an issue.

After generating thousands of boards it quickly became clear that we should strip out all 2-letter words, as they tend to dominate in terms of placement and the levels can feel repetitive. It was also important to ensure that a minimum number of words from the wordlist were used in a given puzzle and that we needed to gradually favor puzzles with longer words and higher word-placement density in order to gradually increase the difficulty and keep the player challenged but not overwhelmed!

We repeated the above to generate over 8000 levels in English, Spanish and Portuguese. It was important to properly handle unicode characters so that accents and other special characters in non-english languages were processed properly. Finally, we programmatically packaged up the levels into json and built a parser in C# so that our game could read in the level data and render each level.

The level generator was just one aspect of the game. We created the UI flow, level unlocking, scoring, world leaderboards, monetization (advertising and IAP) all in a cross-platform code base (supporting iOS and Android). From there, we created various particle systems and wove those into hundreds of natural backgrounds to create a soothing and hypnotic experience for our players.

If you would like to check it out you can find it on the app store at: https://apps.apple.com/us/app/word-landscape-puzzles/id1506497145

and other apps from Winkpass here: https://apps.apple.com/us/developer/winkpass-creations-inc/id288636661

Enjoy!

--

--