Zero Dependency Search
This site has a pattern.
Avoid dependency -> pay computation cost during build phase -> run phase is snappy and free of bloat.
The Search feature I added is no different. Most blogs will use something like Algolia, a paid service, to give users the ability to search their articles. I decided I wanted to implement search myself.
The Implementation
To implement search ourselves, we will use a few different methods:
- Inverted Indexes for the core search logic
- Levhenstein distance algorithm for fuzzy matching
- Stemming algorithm to match root words
Our logic will exist in Go and be compiled to Web Assembly. The performance gains when computing Levenshtein distance will be noticeable in a WASM vs JS implementation.
The high level steps of the implementation look like this:
- During the build phase, we generate an inverted index of our blog posts
- Modify the build phase to compile our search algorithm to WASM
- HTML/JS fetches the compiled WASM and the search index and uses them for Search
Inverted Index
What is an inverted index? In simple terms, it's a data structure that allows us to associate an individual word with a document. In this case, we use it to map all of the words that appear in each blog post, so the blog posts are the documents, and the individual words are the content of the document. This allows us to take a user input search term (an individual word) and map it to a document which contains that word.
Levenshtein Distance
What is the Levenshtein distance? It's an algorithm used to calculate the "distance" between words. Essentially, it is a way to match typos or misspelled words to the actual words in our document. This makes the search a little bit smarter than looking for exact word matches. Users can put in typos or even random spaces and still be given a match for a word.
Stemming Algorithm
The stemming algorithm we use is extremely naive. It simply converts common suffixes and prefixes of words to their "root" word. This allows us to match the word "run" to "running" and vice versa.
The Final Result
- During the build phase, my Go code scans all article content and builds an inverted index.
- Still during the build phase, my Go code compiles my Search logic from Go to a WASM file.
- When you fetch the page, your browser fetches the WASM file + the index and executes it.
The result is zero-dependency, near-instant search, without a network call and without costing me any money.