Search Engine Optimization in the Build
How does Google find websites to show users when they search a term?
The answer is that Google has web crawlers that scour the web. Below is how I gave Google's web crawlers a tour guide.
robots.txt and sitemap.xml
First, we need to understand the two main components that web crawlers use when accessing our site. One is the robots.txt data, and the other is the sitemap.xml file.
robots.txt
robots.txt is sort of like the "bouncer" for the site. It tells web crawlers which paths are allowed for them to crawl and which ones aren't. Here is what my robots.txt looks like:
const robotsTxt = `User-agent: *
Allow: /
Disallow: /api/
Sitemap: https://thorn.sh/sitemap.xml`
Note that I'm serving it directly from Go instead of an actual robots.txt file.
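To make that concrete, here is a minimal, self-contained sketch of what serving it from memory might look like. The mux setup, route registration, and port are illustrative, not necessarily how this site's server is actually wired:

package main

import "net/http"

// robotsTxt is the same constant shown above.
const robotsTxt = `User-agent: *
Allow: /
Disallow: /api/
Sitemap: https://thorn.sh/sitemap.xml`

func main() {
	mux := http.NewServeMux()
	// Serve robots.txt straight from memory; no file on disk required.
	mux.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		w.Write([]byte(robotsTxt))
	})
	http.ListenAndServe(":8080", mux)
}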
The robots.txt data tells the web crawler three important pieces of information:
- Which URLs it is allowed to crawl.
- Which URLs it should avoid.
- The path of our sitemap.xml.
Now is a perfect time to discuss the sitemap.
sitemap.xml
sitemap.xml exists to tell the web crawler about relevant content on the site. In my example, my sitemap.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://thorn.sh/about/</loc>
    <lastmod>2025-12-07T19:32:14-06:00</lastmod>
  </url>
  <url>
    <loc>https://thorn.sh/adding-metrics-sse/</loc>
    <lastmod>2025-12-07T19:29:21-06:00</lastmod>
  </url>
  <url>
    <loc>https://thorn.sh/architecture/</loc>
    <lastmod>2025-12-07T19:32:14-06:00</lastmod>
  </url>
  ...
Together, robots.txt and sitemap.xml make it easier for Google to discover and index the site's content, which improves how it shows up in search results.
How sitemap.xml is Implemented
We build the sitemap.xml file during the build phase. Here's what the process looks like:
- Our build process generates all of our public files in /public.
- Then the build process walks the entire /public directory, looking for any files with .html as a suffix.
- Using that information, it forms a valid URL for each page (see the example mappings below).
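Concretely, the path-to-URL transformation we want looks like this (the specific paths here are illustrative):

/public/index.html        -> https://thorn.sh
/public/about/index.html  -> https://thorn.sh/about/
/public/blog/post.html    -> https://thorn.sh/blog/post/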
Here is what the code looks like for generating the site map:
package builder

import (
	"encoding/xml"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"time"
)

const (
	baseURL   = "https://thorn.sh"
	publicDir = "./public"
)

type URLSet struct {
	XMLName xml.Name `xml:"http://www.sitemaps.org/schemas/sitemap/0.9 urlset"`
	URLs    []URL    `xml:"url"`
}

type URL struct {
	Loc     string `xml:"loc"`
	LastMod string `xml:"lastmod"`
}

func GenerateSiteMap() {
	log.Println("Generating site map...")

	urls, err := generateURLs()
	if err != nil {
		panic(err)
	}

	f, err := os.Create(filepath.Join(publicDir, "sitemap.xml"))
	if err != nil {
		panic(err)
	}
	defer f.Close()

	f.Write([]byte(xml.Header))
	enc := xml.NewEncoder(f)
	enc.Indent("", " ")
	if err := enc.Encode(URLSet{URLs: urls}); err != nil {
		panic(err)
	}

	log.Printf("Sitemap generated with %d URLs\n", len(urls))
}

func generateURLs() ([]URL, error) {
	var urls []URL

	err := filepath.Walk(publicDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}

		// Skip directories and non-html files
		if info.IsDir() || filepath.Ext(path) != ".html" {
			return nil
		}

		// Convert File Path to URL
		// Rel path: "public/blog/post.html" -> "blog/post.html"
		relPath, _ := filepath.Rel(publicDir, path)

		// Fix Windows slashes to Web slashes
		slug := filepath.ToSlash(relPath)

		// Handle "Clean URLs" (remove index.html and .html extensions)
		if filepath.Base(slug) == "index.html" {
			slug = filepath.Dir(slug) // "blog/index.html" -> "blog"
			if slug == "." {
				slug = "" // root
			}
		} else {
			slug = strings.TrimSuffix(slug, ".html") // "about.html" -> "about"
		}

		finalURL := ""
		if slug == "" {
			finalURL = strings.TrimSuffix(fmt.Sprintf("%s/%s", baseURL, slug), "/")
		} else {
			finalURL = fmt.Sprintf("%s/%s/", baseURL, slug)
		}

		urls = append(urls, URL{
			Loc: finalURL,
			// Note: This relies on self-hosted runners + incremental builds.
			// On ephemeral runners (like standard GitHub Actions), file ModTime resets
			// on every clone, causing every page to look "new" and confusing Google.
			LastMod: info.ModTime().Format(time.RFC3339),
		})

		return nil
	})

	return urls, err
}
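For context, the build entrypoint just needs to call this after all the HTML has been rendered into /public. A rough sketch of that ordering, where the module path and surrounding steps are assumptions rather than the actual build code:

package main

import "thorn.sh/site/builder" // hypothetical module path

func main() {
	// 1. Render all pages into ./public (not shown here).
	// 2. Generate the sitemap from whatever landed in ./public.
	builder.GenerateSiteMap()
}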
Final Optimization
Previously, I talked about the custom compression middleware I implemented for this site. One thing I missed in that implementation is this:
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Vary", "Accept-Encoding")
It acts as a signal to downstream caches (like CDNs) to store different versions of the response based on the client's Accept-Encoding. It ensures a browser that can't handle compression doesn't accidentally get served a cached .br file meant for a modern browser.
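To show where that header sits in context, here is a minimal, self-contained sketch of compression middleware that sets Vary before doing anything else. It uses on-the-fly gzip for brevity, whereas this site's real middleware serves precompressed .br assets, so treat the names and structure as illustrative:

package middleware

import (
	"compress/gzip"
	"io"
	"net/http"
	"strings"
)

// gzipWriter routes the response body through a gzip.Writer.
type gzipWriter struct {
	http.ResponseWriter
	gz io.Writer
}

func (g gzipWriter) Write(b []byte) (int, error) { return g.gz.Write(b) }

func Compress(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Tell downstream caches to key this response on Accept-Encoding,
		// so compressed and uncompressed copies are stored separately.
		w.Header().Add("Vary", "Accept-Encoding")

		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r) // client can't decompress; send plain bytes
			return
		}

		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		next.ServeHTTP(gzipWriter{ResponseWriter: w, gz: gz}, r)
	})
}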
Conclusion
Now, during the build phase, we generate a sitemap.xml file, and our robots.txt lives directly in our server code instead of on the file system. Web crawlers have better information about our website, so our ranking in Google's search results should (theoretically) improve.