What is a trie, anyway?

The only missing feature from your latest product which will make the world a better place is autocompletion. You are therefore tasked with implementing the following interface:

void addWord(String word);

List<String> complete(String prefix);

This seems trivial enough. You only need a set of words and to pattern match all of them to see which ones match the prefix:

def addWord(dictionary, word):
  dictionary.add(word)

def complete(dictionary, prefix):
  return [word for word in dictionary if word.startswith(prefix)]

A piece of cake, right?

This solution is however quite inefficient. Look at this possible dictionary content:

{"lexic", "lexicon", "lexical", "lexicalization"}

As you can see, the prefix “lexic” is stored multiple time in the dictionary. This means that as we add more words, it will grow way too quickly. How way? Way way.

Furthermore, we need to go through each and every words in the diction

ary to see if they start with the prefix. Once again, this is not efficient.

This is where tries become useful.

To put it simply, a trie is a prefix tree, which means, it can be searched by prefixes. It is a recursive data structure where every edge between two vertices are tagged with a character. A word can then be defined as a succession of edges ending at a terminal vertex. Here is what a trie looks like:

Trie

As you can see, the trie takes advantage of the fact that a lot of words will have a common prefix to save space. Let’s implement one!

The Node class is pretty simple. It consists only of a boolean indicating whether a word ends with this node, and a map of the following nodes where the key is the character leading to the child.

  class Node:
    def __init__(self):
        self.end = False
        self.children = {} # {char -> node}

Then, the insertion is simply traversing the tree and appending new nodes if necessary. The last node added must then be tagged as a terminal node since the word ends at this node. This will let us retrieve the words more easily in the complete function.

def insert(root, string):
    node = root
    for i in range(len(string)):
        if string[i] in node.children:
            node = node.children[string[i]]
        else:
            # Append new nodes if necessary
            node.children[string[i]] = Node()
            node = node.children[string[i]]
    node.end = True

The complete function requires a little bit more work.

We first need to find the node representing the last character of the prefix. A simple tree traversing will do the work:

def find(root, word):
    node = root
    for c in word:
        print(node.children)
        if c not in node.children:
            return None
        else:
            node = node.children[c]
    return node

Once we have the last node, we can return all the following nodes if they are the end of a word. To do so, we can keep track of the prefix and return the right nodes:

def get_all_words(node, prefix):
    words = []
    if node.end:
        words.append(prefix)
    for char in node.children:
        words += get_all_words(node.children[char], prefix + char)
    return words

The combination of these two functions gives us the complete function:

def complete(root, prefix):
    node = find(root, prefix)
    return get_all_words(node, prefix)

We now have a much more efficient way to store our dictionary, which will make your project a success, and at the same time, will make the world a better place.

Related Posts

What is the Maker-Maker fee structure? 15 Mar 2020

Postgresql index usage 28 Feb 2019

Mock within a mock 19 Sep 2015