Using Hugo and GoAccess to show the most read posts of a static website

In this Post

In this post, we'll explore a straightforward and privacy-respecting method to bring dynamic content to a Hugo website by integrating a 'most popular' functionality.


Intro

Hugo, a powerful static site generator, has become a go-to choice for developers and bloggers seeking efficient and flexible website creation, especially for sites with a very large number of pages. Whether hosted on a cheap VPS or services like Netlify, Hugo simplifies the process of building and maintaining static sites. However, static sites often lack dynamic features found in traditional content management systems (CMS), such as displaying «most popular» or «most read» pages.

How do you add «most popular» functionality to a static website? It can be done with Hugo and GoAccess, two very fast web tools. This way you rely only on your server’s data, data that you own, and you respect the privacy of your website’s visitors.

In this blog post, we’ll explore a straightforward method to bring dynamic content to your Hugo site by integrating a «most popular» functionality. There are online instructions for doing this with the Google Analytics API, Python scripting, Hugo itself, and GitHub Actions. They did not work for me, so instead we’ll focus on a simplified solution using GoAccess, Hugo, and a few easy Unix one-liners any sysadmin knows. You need access to your website’s log file to follow these instructions, because we rely solely on the server’s data instead of the Google Analytics API.

Stage One, the Bash

Here is the bash script that runs the whole thing: it parses the log file and starts Hugo. Make sure you have goaccess and hugo installed on your server and that your Hugo site is already set up there. Copy it, adjust it, and save it, for example, as generatehugosite.

#!/bin/bash

# Set the log file path
LOG_FILE="/path/to/site.access.log"

# Set the report file path
REPORT_FILE="/path/to/site.report.csv"

# Set the YAML file path
YAML_FILE="/path/to/hugosite/data/popular.yaml"

# Run the goaccess command and create the report
goaccess --log-format=COMBINED --output="$REPORT_FILE" "$LOG_FILE" &&
  # Filter lines containing '/THIS/' from the report
  grep '/THIS/' "$REPORT_FILE" |
  # Use AWK to print the URLs of rows with more than 1 hit
  awk -F ',"' '$3 > 1 {print $11}' |
  # Format the URLs into YAML list items and remove unnecessary characters
  sed -e 's/\/THIS\//  - "/' -e 's/[/]//g' | head -7 > "$YAML_FILE" &&
  # Generate the Hugo site using the updated YAML file
  hugo -s /path/to/hugosite/

Let’s go through each part of the script:

  • goaccess command:

    • This command analyzes the specified log file using the combined log format and generates a report at the specified path. GoAccess picks the output format from the file extension, so the .csv ending gives a CSV report.
  • grep '/THIS/' "$REPORT_FILE":

    • Filters the lines in the report file to include only those related to the ‘/THIS/’ path. ‘THIS’ is the part of your actual URLs that points to the content section; e.g. for https://yourdomain.com/blog/first-post/ , replace /THIS/ with /blog/ . This filters out pages that are not actual content posts. Adjust this part based on your specific URL structure.
  • awk -F ',"' '$3 > 1 {print $11}':

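    • Uses AWK to extract the URL (field 11) from the rows of the filtered report where the third column (hits) is greater than 1. You can adjust this, but we want to make sure that the resulting file is always populated with something, so the value is low. Note that splitting on ," leaves a trailing quote on each field, so AWK compares $3 as a string rather than a number (a row with exactly 1 hit also passes); if you raise the threshold, write $3+0 to force a numeric comparison.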
  • sed -e 's/\/THIS\//  - "/' -e 's/[/]//g':

    • Formats the URLs into YAML list items by replacing ‘/THIS/’ with ‘  - "’ and removing any remaining ‘/’ characters. The closing double quote comes from the CSV field itself, which the AWK separator leaves in place. As before, ‘THIS’ is the part of your actual URLs that points to the content section, e.g. in https://yourdomain.com/blog/first-post/ , ‘/THIS/’ is /blog/ .
  • head -7 > "$YAML_FILE":

    • Takes the first 7 lines (adjust the number as needed) of the formatted URLs and saves them to the YAML file. So, we will have a list of the top 7 most read posts. Believe me, it is easier to set this value here than to limit the loop in the Hugo template below.
  • hugo -s /path/to/hugosite/:

    • Regenerates the Hugo site using the updated YAML file, incorporating the most popular URLs. The -s flag points to the location of the root dir of your Hugo site. Make sure to adjust the paths and customize the script based on your specific requirements and file structures.
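
To see what the grep, awk and sed steps above do to the data, here is a small sketch you can run without GoAccess. The three sample rows are made up: they are only shaped so that the URL lands in field 11 when splitting on ," as the script assumes, and the real columns of your CSV report may differ between GoAccess versions. The function names sample_report and extract_slugs are hypothetical helpers, and /THIS/ is replaced by /blog/.

```shell
#!/bin/bash

# Hypothetical report rows (NOT real GoAccess output), shaped so that the
# URL is field 11 when the line is split on ,"
sample_report() {
  cat <<'EOF'
"1",,"requests","120","10.5","80","9.8","20480","0","0","0","/blog/first-post/"
"2",,"requests","45","4.1","30","3.7","10240","0","0","0","/about/"
"3",,"requests","33","3.0","21","2.6","8192","0","0","0","/blog/another-post/"
EOF
}

# Same filter chain as in the script, reading the report from stdin
extract_slugs() {
  grep '/blog/' |
    awk -F ',"' '$3 > 1 {print $11}' |
    sed -e 's/\/blog\//  - "/' -e 's/[/]//g' |
    head -7
}

sample_report | extract_slugs
# Prints:
#   - "first-post"
#   - "another-post"
```

The /about/ row is dropped by grep, and the two remaining URLs come out as YAML list items. If your own report gives different results, print a few raw lines of site.report.csv and count the fields before changing $3 and $11.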

This script works for me, but it is more of an example; you may need to adjust it to your log format. Copy the script, adjust the paths as needed, and save it. Create the files with touch /path/to/hugosite/data/popular.yaml and touch /path/to/site.report.csv . Make sure /path/to/site.access.log is not empty (check that the nginx config for your site writes to a separate log file). Do not run the script just yet, as the most important thing is yet to come: the wonderful Hugo template that processes the data file with the most popular posts, namely popular.yaml.
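
The preparation steps above can be wrapped in a small check, if you prefer. This is only a sketch: the function name preflight is my own, and the three paths are the same placeholders used in the script.

```shell
#!/bin/bash

# Hypothetical helper: checks the log file and creates the report and data files.
# Arguments: log file, report file, YAML data file
preflight() {
  local log_file="$1" report_file="$2" yaml_file="$3"

  # Warn (but do not stop) if the tools are not in PATH
  command -v goaccess >/dev/null || echo "warning: goaccess not found in PATH" >&2
  command -v hugo >/dev/null || echo "warning: hugo not found in PATH" >&2

  # -s is true only if the file exists and is not empty
  if [ ! -s "$log_file" ]; then
    echo "error: $log_file is missing or empty" >&2
    return 1
  fi

  # Create the directories and the two files the script writes to
  mkdir -p "$(dirname "$report_file")" "$(dirname "$yaml_file")"
  touch "$report_file" "$yaml_file"
}

# Usage (placeholders, adjust to your paths):
# preflight /path/to/site.access.log /path/to/site.report.csv /path/to/hugosite/data/popular.yaml
```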

Stage Two, the Hugo

Presumably you already have a Hugo website. Add this template to your partials or just copy it into your index.html layout template. The idea is to read the slugs from the /data/popular.yaml data file and look for them among the slugs of your pages. Slugs must be set in the front matter of your posts; without them this template will not work.

<div class="card">
    <div class="card-content">
        {{ range $index, $slug := $.Site.Data.popular }}
            {{ range $.Site.RegularPages }}
                {{ if eq .Params.slug $slug }}
                    <h1 class="title post-title"><a href="{{ .Permalink }}">{{ .Title }}</a></h1>
                {{ end }}
            {{ end }}
        {{ end }}
    </div>
</div>

Now, let’s add comments to explain the different sections of the template:

<div class="card">
    <div class="card-content">
        <!-- Iterate over each slug in the 'popular' YAML data -->
        {{ range $index, $slug := $.Site.Data.popular }}
            <!-- Iterate over all regular pages on the site -->
            {{ range $.Site.RegularPages }}
                <!-- Check if the current page has a matching slug -->
                {{ if eq .Params.slug $slug }}
                    <!-- Display post title with a link to the permalink -->
                    <h1 class="title post-title"><a href="{{ .Permalink }}">{{ .Title }}</a></h1>
                    <!-- Display whatever you want from post frontmatter, such as description, category, tags, date -->
                {{ end }}
            {{ end }}
        {{ end }}
    </div>
</div>

This template iterates through each slug in the ‘popular’ YAML data, matches it with the corresponding regular page in Hugo, and then displays the post title with a link to the permalink. Adjust the template based on your site’s structure and styling preferences. Works with Hugo Static Site Generator v0.54.0.
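
Since the template only matches pages that have a slug in their front matter, it can help to list the posts that lack one. A minimal sketch, assuming your posts are .md files in a single directory and that the slug key starts a line (slug: in YAML or slug = in TOML); the function name posts_missing_slug is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: prints the .md files in a directory that have no slug line.
# grep -L lists files in which the pattern does NOT match.
posts_missing_slug() {
  local content_dir="$1"
  grep -LE '^slug[[:space:]]*[:=]' "$content_dir"/*.md
}

# Usage (placeholder path):
# posts_missing_slug /path/to/hugosite/content/blog
```

Any file it prints will never show up in the «most popular» list, however many hits it gets.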

Make sure that the /data/popular.yaml file has the specified format, nothing else and no headings:

- "first-post"
- "another-post"
- "yet-another-post"

The popular.yaml file should contain a list of popular post slugs, each surrounded by double quotes and in the format - "slug". This ensures that the file contains only slugs and follows the specified structure.
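
If you want to be sure the generated file really has this shape before Hugo reads it, a check along these lines may help. It is a sketch, not part of the original setup: check_popular_yaml is a hypothetical name, and the pattern accepts optional leading spaces because the sed step in the script writes two of them.

```shell
#!/bin/bash

# Hypothetical helper: succeeds only if the file is non-empty and every
# line looks like:  - "slug"
check_popular_yaml() {
  local file="$1"

  # Fail on a missing or empty file
  [ -s "$file" ] || return 1

  # grep -v selects lines that do NOT match; any such line means a bad file
  if grep -qvE '^[[:space:]]*- "[^"/]+"$' "$file"; then
    return 1
  fi
  return 0
}

# Usage (placeholder path):
# check_popular_yaml /path/to/hugosite/data/popular.yaml || echo "popular.yaml looks wrong"
```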

Stage Three, the Cron

To schedule your bash script to run periodically, add generatehugosite to cron by following these steps:

  • Open your crontab file for editing. You can do this by running the following command in your terminal:
crontab -e

If you’re prompted to select an editor, choose your preferred one.

  • Add a new line to schedule your script. For example, to run the script every day at midnight, you can use the following line:
0 0 * * * /bin/bash /path/to/your/generatehugosite

Replace /path/to/your/generatehugosite with the actual path to your bash script.

In the example above, 0 0 * * * means «every day at midnight.»

  • Save and exit the editor.

This cron schedule will execute your script at the specified intervals, ensuring that your Hugo site is updated with the most read posts regularly. Adjust the schedule according to your preferences and update frequency requirements.

Make sure the file permissions of site.access.log allow the user running the script to read it.
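
If you would rather not open an editor, the same cron line can be added from the command line. The sketch below is an assumption of mine, not part of the original setup: add_cron_line is a hypothetical helper that reads an existing crontab on stdin and prints it with the given line appended exactly once, so running it twice does not create duplicates.

```shell
#!/bin/bash

# Hypothetical helper: stdin is the current crontab, $1 is the line to add.
# Existing copies of the line are dropped first, then it is appended once.
add_cron_line() {
  local line="$1"
  # -v: keep non-matching lines, -x: whole-line match, -F: fixed string
  grep -vxF -- "$line" || true
  printf '%s\n' "$line"
}

# Usage (this DOES modify your crontab; the path is a placeholder):
# crontab -l 2>/dev/null | add_cron_line '0 0 * * * /bin/bash /path/to/your/generatehugosite' | crontab -
```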

Questions or comments? Please, ask me on LinkedIn

Alexander Sotov

Text: Alexandre Sotov