NearlyFreeSpeech.NET tips

I've been paying NearlyFreeSpeech.NET to host my website since 2005. I like their philosophy, honesty, and the way that their service has just been quietly humming along over the years without constantly forcing me to rewrite all my code.

I've written down some tricks that I've figured out for generating stats and automating the deployment of a static website, in the hope that they might help other people.

Using GoAccess to generate statistics

NearlyFreeSpeech provides (members-only) instructions for using AWStats to process a site's access_log files to generate statistics. AWStats was initially released in 2000, and it shows. In addition to its turn-of-the-millennium UI, I had trouble figuring out how to convince it to show me stats preceding the current calendar year.

After noticing that NearlyFreeSpeech's 2022Q2 realm already includes the comparatively-modern GoAccess application (first released in 2010!), I decided to take a shot at setting it up.

Log File Options in site page
Speaking of turn-of-the-millennium UIs…

Before doing anything else, you'll want to make sure that your site is configured to rotate its log files every week so that they're written as smaller, easier-to-process chunks. In the Log File Options section of your site page, set Automatic Rotation to Weekly and Archival Compression to bzip2. If you get a lot more traffic than me, you might need to rotate your logs daily to avoid hitting CPU limits while processing the files.

GoAccess text interface showing visitor and request counts
I don't get much traffic, but this was also just after a log rotation.

In its simplest form, GoAccess provides an ncurses-based UI that you can explore.

After SSH-ing into your site, run goaccess /home/logs/access_log --log-format=COMBINED.

By default, you should see several vertical panels of information derived from the most recent log file. Tab moves the focus between panels, and the arrow keys scroll up and down. You can press the q key to exit when you get bored.
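If you just want a quick one-off report rather than the interactive UI, you can also ask GoAccess to write an HTML page directly; it infers the output format from the file extension. (The output path here is just an example.)

```shell
# Process the current log file and write a standalone HTML report.
goaccess /home/logs/access_log --log-format=COMBINED -o /home/public/report.html
```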

I like text-based interfaces, but my goal was to generate a web page that would show the last year's worth of statistics. To do that, I wrote a script called goaccess.sh and stuck it in my /home/protected directory:

#!/bin/sh -e

logdir=/home/logs
out=/home/public/goaccess/index.html
dbdir=/home/tmp/goaccess
days=365

# Extra arguments to pass to goaccess.
extra=
extra="${extra} --ignore-crawlers"
extra="${extra} --ignore-panel=KEYPHRASES"
extra="${extra} --ignore-panel=STATUS_CODES"

# GoAccess doesn't treat .webp files as static by default, and if we pass
# --static-file it no longer adds any of the default extensions:
# https://github.com/allinurl/goaccess/blob/b5611a3318b7cb800c9e0d9f7c360af6ae45142b/src/settings.c#L212
for ext in asc css csv gif gz ico jpg jpeg js json kmz mp3 pl png rb ttf txt \
    webp webm xml zip; do
  extra="${extra} --static-file=.${ext}"
done

# Args common across all goaccess invocations.
common="--db-path=${dbdir} --restore --persist --keep-last=${days}"
common="${common} --log-format=COMBINED -o ${out}"
common="${common} --geoip-database=/usr/local/share/GeoIP/GeoLite2-City.mmdb"
common="${common} ${extra}"

mkdir -p "$(dirname "$out")"

if [ "$1" = --init ]; then
  rm -rf "$dbdir"
  mkdir -p "$dbdir"
  start=$(date "-v-${days}d" '+%Y%m%d')
  for p in "${logdir}"/access_log.*.bz2; do
    if [ "$p" \< "${logdir}/access_log.${start}.bz2" ]; then continue; fi
    echo "Processing ${p}..."
    zcat "$p" | goaccess - $common
  done
  echo "Processing latest files..."
  goaccess "${logdir}/access_log" "${logdir}/access_log.old" $common
elif [ "$1" = --update ]; then
  if ! [ -e "$dbdir" ]; then
    echo "Run with --init first to create ${dbdir}" >&2
    exit 1
  fi
  # GoAccess prints a "Cleaning up resources..." message:
  # https://github.com/allinurl/goaccess/issues/2283
  # https://unix.stackexchange.com/a/330662
  goaccess "${logdir}/access_log" "${logdir}/access_log.old" \
    $common --no-progress 2>&1 | \
    { grep -v 'Cleaning up resources' || true; }
else
  echo "Usage: $0 [ --init | --update ]" >&2
  exit 2
fi

After saving the script, you'll need to mark it as executable:

$ chmod +x /home/protected/goaccess.sh

The basic idea is to manually run the script once with the --init flag to process the old rotated /home/logs/access_log.YYYYMMDD.bz2 files. I also updated permissions so that the relevant files and directories would be writable by the web user; this is optional, but needed here because I chose to run the scheduled task as that user, as described below.

$ /home/protected/goaccess.sh --init
$ chown -R :web /home/tmp/goaccess
$ chown :web /home/public/goaccess /home/public/goaccess/index.html

Then, run the script periodically (at least once per week) with the --update flag to pick up new entries in /home/logs/access_log and /home/logs/access_log.old. To automate this, go to your site's Scheduled Tasks page and add a new task. I used the following settings in the Add Scheduled Task page:

Name                  Value
Tag                   goaccess
URL or Shell Command  /home/protected/goaccess.sh --update
User                  web
Where                 Run in web environment
Hour                  5
Day of Week           Every
Date                  *
Scheduled task times appear to be interpreted in GMT, in case you were wondering. There's also some scattered information available about the differences between the "ssh" and "web" environments.

Each time GoAccess runs, it writes a nice-looking web page displaying stats to /home/public/goaccess/index.html and saves its progress under /home/tmp/goaccess.

GoAccess web interface showing browser and access time statistics
GoAccess's default theme is quite purple.

If you want your stats page to be password-protected, you can add a /home/public/goaccess/.htaccess file that references a /home/protected/.htpasswd file as described in the FAQ.
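A sketch of what that might look like; the "stats" username and the AuthName realm below are arbitrary examples, and htpasswd (which ships with Apache) will prompt for a password:

```shell
$ htpasswd -c /home/protected/.htpasswd stats
```

```
# /home/public/goaccess/.htaccess
AuthType Basic
AuthName "Stats"
AuthUserFile /home/protected/.htpasswd
Require valid-user
```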

Using Google Cloud Build to build and deploy a static site

I wrote a program named intransigence that I use to generate this website from Markdown files. I used to manually run it on my development machine and then use rsync to copy the website to NearlyFreeSpeech over SSH, but I decided that it'd be nicer if I could build and deploy the site using Google Cloud Build so I wouldn't need to worry about accidentally using a stale checkout of the code.
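Concretely, that manual deployment amounted to something like the following; the username and hostname are anonymized to match the placeholders used elsewhere in this post, and `out` is the directory containing the built site:

```shell
$ rsync -avz --delete out/. user_example@ssh.phx.nearlyfreespeech.net:.
```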

I created a staging.yaml Cloud Build config file that builds the site and then copies it to Google Cloud Storage:

# staging.yaml
steps:
  - name: gcr.io/cloud-builders/gcloud
    entrypoint: sh
    args:
      - '-e'
      - '-c'
      - |
        # [Omitting commands to install dependencies and build the site to the 'out' subdirectory...]
        gsutil -m -h 'Cache-Control:no-store' rsync -d -P -r out gs://example-website
        gsutil -m -h 'Cache-Control:no-store' rsync -d -P -r -x '.*\.htaccess$|.*\.gz$' out gs://staging.example.org
        gsutil -h 'Cache-Control:no-store' cp build/staging-robots.txt gs://staging.example.org/robots.txt        

The site is actually copied to two Cloud Storage buckets:

  • gs://example-website contains all the files and is used when deploying the site to NearlyFreeSpeech.
  • gs://staging.example.org omits the .htaccess files that configure the NearlyFreeSpeech Apache server and the .gz files that NearlyFreeSpeech uses to serve gzip-compressed pages. This bucket is served as a static website so I can manually check it before deploying the site. I also copy a robots.txt file that prevents the staging site from being indexed by search engines.

Copying the files to two buckets feels gross, but I didn't want to serve the .htaccess files publicly, and I also didn't want to deal with the fragile mess that is the Cloud Storage ACL system.
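The staging-robots.txt file that keeps the staging bucket out of search results only needs to disallow everything:

```
# staging-robots.txt
User-agent: *
Disallow: /
```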

The other piece is a deploy.yaml Cloud Build config file that uses rsync to copy the website from gs://example-website to NearlyFreeSpeech. I used Secret Manager to generate and store an SSH key as described in Google's Accessing GitHub from a build via SSH keys document.
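Setting that up is a one-time process along these lines; the secret name (ssh-key), key path (id_rsa, hence an RSA key), and known_hosts file match the ones referenced in deploy.yaml, but your names may differ:

```shell
# Generate a passphrase-less deploy key and store the private half
# in Secret Manager.
$ ssh-keygen -t rsa -b 4096 -N '' -f deploy_key
$ gcloud secrets create ssh-key --data-file=deploy_key

# Record the server's host key so the build can verify it.
$ ssh-keyscan ssh.phx.nearlyfreespeech.net >build/deploy-known_hosts
```

The corresponding deploy_key.pub public key also needs to end up in the site's authorized SSH keys on the NearlyFreeSpeech side.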

# deploy.yaml
steps:
  - name: gcr.io/cloud-builders/gcloud
    secretEnv: ['SSH_KEY']
    entrypoint: sh
    args:
      - '-e'
      - '-c'
      - |
        echo "$$SSH_KEY" >>/root/.ssh/id_rsa
        chmod 400 /root/.ssh/id_rsa
        cp build/deploy-known_hosts /root/.ssh/known_hosts        
    volumes:
      - name: ssh
        path: /root/.ssh

  - name: gcr.io/cloud-builders/gcloud
    entrypoint: sh
    args:
      - '-e'
      - '-c'
      - |
        apt-get update
        apt-get install -y rsync ssh
        mkdir out
        gsutil -m rsync -P -r gs://example-website out
        rsync -avz --delete \
          --exclude /.well-known/acme-challenge \
          --exclude /goaccess \
          out/. user_example@ssh.phx.nearlyfreespeech.net:.        
    volumes:
      - name: ssh
        path: /root/.ssh

availableSecrets:
  secretManager:
    - versionName: projects/example-project/secrets/ssh-key/versions/latest
      env: SSH_KEY

Finally, I set up Cloud Build triggers for running both staging.yaml and deploy.yaml. I got tired of using the slow-as-heck Google Cloud Console to run them, so I wrote a short run_trigger.sh shell script so I can run triggers from the command line and tail their output:

#!/bin/sh -e

PROJECT=example-project

if [ $# -ne 1 ] || [ "$1" = '-h' ] || [ "$1" = '--help' ]; then
  echo "Usage: $0 TRIGGER_NAME"
  exit 2
fi

trigger=$1

# As of 20220718, 'triggers' is only available in alpha and beta.
out=$(gcloud --project="$PROJECT" beta builds triggers run "$trigger")
id=$(echo "$out" | sed -nre 's/^\s*id:\s*(.*)/\1/p' | head -n 1)

if [ -z "$id" ]; then
  echo "Didn't find build ID in output:"
  echo
  echo "$out"
  exit 1
fi

# As of 20220718, the non-beta version of 'log --stream' prints an annoying
# "gcloud builds log --stream only displays logs from Cloud Storage" warning.
exec gcloud --project="$PROJECT" beta builds log --stream "$id"

So, with all of this in place, I can:

  • Run ./run_trigger.sh staging to build and stage the site.
  • Visit staging.example.org in a web browser to check the staged version.
  • Run ./run_trigger.sh deploy to copy the staged version to NearlyFreeSpeech.