A pamphlet for hackable website systems

## The importance of blogging

Blogs (or personal websites) are an essential piece of the Internet as a means of sharing knowledge openly. In particular, blogs really shine at sharing:

A web feed (for instance RSS or Atom) is important to free the visitors from manually checking for updates: let the updates go to them.

## Nothing but Org

The World Wide Web was devised to use HTML, which is rather painful to write directly. I don’t want to go through that: it’s too heavy a burden. Many web writers, including me until recently, use the Markdown format.

Nonetheless, for a long time I’ve been wanting to write blog posts in the Org format. I believe that Org is a much superior markup format, for reasons that are already well laid out by Karl Voit. I can’t help but highlight a few more perks of Org:

• It has excellent math support (see my article on homogeneous coordinates for an example). For an HTML output, several backends are supported including MathJax. It’s smart enough not to include MathJax when there is no math. To top it all, there is no extra or weird syntax: it’s simply raw TeX / LaTeX.
• It supports file hierarchies and updates inter-file links dynamically. It also detects broken links on export.
• It has excellent support for multiple export formats, including LaTeX and PDFs.

## Version control system

A significant downside of printed books and documents is that they can’t be updated, unless you acquire a new edition, should there be any. The Internet has the advantage that content can be updated worldwide in an instant.

Updating is important: originals are hardly ever typo-free; speculations might turn out to be wrong; phrasing could prove ambiguous. And most importantly, readers’ feedback can significantly help improve the argumentation and needs to be taken into account.

The general trend around blogging seems to go in the other direction: articles are often published and left as-is, never to be edited.

As such, many blog articles are struck by the inescapable flail of time and technological advancement: they run out of fashion and lose much of their original value.

But there is a motivation behind this immobility: the ability to edit removes the guarantee that readers can access the article in its original form. Content could be lost in the process. External references become meaningless if the content has been removed or changed from the source they refer to.

Thankfully there is a solution to this problem: version control systems. They keep all versions available to the world and make editing fully transparent.

I keep the source of my website at https://gitlab.com/ambrevar/ambrevar.gitlab.io, in a public Git repository.

I cannot stress enough the importance of keeping your projects under version control in a publicly readable repository:

• It allows not only you but also all visitors to keep track of all changes. This gives a guarantee of transparency to your readers.
• It makes it trivial for anyone to clone the repository locally: the website can be read offline in the Org format!

## Publishing requirements

Worg has a list of blogging systems that work with the Org format. Most of them did not cut it for me, however, because I think a website needs to meet some important requirements:

• Full control over the URL of the published posts. This is a golden rule of the web: should I change the publishing system, I want to be able to stick to the same URLs or else all external references would be broken. This is a big no-no and in my opinion it makes most blogging systems unacceptable.
• Top-notch Org support. I believe generators like Jekyll and Nikola only have partial Org support.
• Simple publishing pipeline. I want the generation process to be as simple as possible. This is important for maintenance. Should I someday switch host, I want to be sure that I can set up the same pipeline.
• Full control over the publishing system. I want maximum control over the generation process. I don’t want to be restricted by a non-Turing-complete configuration file or a dumb programming language.
• Ease of use. The process as a whole must be as immediate and friction-less as possible, or else I take the risk of feeling too lazy to publish new posts and update the content.
• Hackability. Last but not least, and this probably supersedes all other requirements: The system must be hackable. Lisp-based systems are prime contenders in that area.

## Org-publish

This narrows down the possibilities to just one, if I’m not mistaken: Emacs with Org-publish.

• The configuration happens in Lisp, which gives me maximum control.
• Org support is obviously optimal.
• The pipeline is as simple as it gets:

```sh
emacs --quick --script publish.el --funcall=org-publish-all
```


Org-publish comes with lots of options, including sitemap generation (here is my post list, sorted in reverse chronological order). It supports code highlighting through the htmlize package.
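
To make the options above more concrete, here is a minimal sketch of what a `publish.el` could look like. It is not my actual script: the project name and the `source/` and `public/` folders are assumptions made up for the example, and the relative paths only resolve correctly when Emacs is started from the root of the repository.

```elisp
;; publish.el: a minimal sketch, not the real publishing script.
(require 'ox-publish)

(setq org-publish-project-alist
      '(("site-org"                          ; hypothetical project name
         :base-directory "source/"           ; assumed location of the Org files
         :base-extension "org"
         :recursive t
         :publishing-directory "public/"     ; assumed output folder
         :publishing-function org-html-publish-to-html
         ;; Generate a list of posts, newest first.
         :auto-sitemap t
         :sitemap-title "Articles"
         :sitemap-sort-files anti-chronologically)))
```

With such a file in place, the command line shown earlier loads it and then calls `org-publish-all`, which publishes every project listed in `org-publish-project-alist`.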

### Webfeeds

One thing it lacked for me, however, was the generation of web feeds (RSS or Atom). I looked at the existing possibilities in Emacs Lisp but I could not find anything satisfactory. There is ox-rss in Org-contrib, but it only works over a single Org file, which does not suit my need for one file per blog post. So I went ahead and implemented my own generator.
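
To give an idea of how little is needed at the core of such a generator, here is a rough sketch that formats a single Atom entry. It is an illustration only, not the generator mentioned above: the function name `my/atom-entry` and its arguments are hypothetical, and a complete feed would also need the enclosing `<feed>` element with its own title, id and update date.

```elisp
(require 'xml)  ; provides `xml-escape-string'

(defun my/atom-entry (title url time)
  "Return an Atom <entry> element as a string.
TITLE and URL are strings; TIME is an Emacs time value.
Hypothetical helper, for illustration only."
  ;; Atom requires RFC 3339 timestamps; a non-nil third argument
  ;; to `format-time-string' selects UTC.
  (let ((updated (format-time-string "%Y-%m-%dT%H:%M:%SZ" time t)))
    (format (concat "<entry>\n"
                    "  <title>%s</title>\n"
                    "  <link href=\"%s\"/>\n"
                    "  <id>%s</id>\n"
                    "  <updated>%s</updated>\n"
                    "</entry>\n")
            (xml-escape-string title)
            (xml-escape-string url)
            ;; The permanent URL doubles as the entry identifier.
            (xml-escape-string url)
            updated)))
```

Using the post URL as the entry `<id>` is only sound when URLs never change, which is one more reason to care about permanent URLs.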

### History of changes (dates and logs)

Org-publish comes with a timestamp system that proves handy to avoid building unchanged files twice. It’s not so useful though to retrieve the date of last modification because a file may be rebuilt for external reasons (e.g. change in the publishing script).
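
As a small sketch of how that timestamp system is driven (the helper name `my/republish-everything` is made up for the example):

```elisp
(require 'ox-publish)

;; Default behaviour: skip files whose cached timestamp says they
;; have not changed since the last publication.
(setq org-publish-use-timestamps-flag t)

(defun my/republish-everything ()
  "Rebuild every project regardless of the cached timestamps.
Hypothetical helper, handy after editing the publishing script."
  (interactive)
  ;; A non-nil first argument forces publishing of all files.
  (org-publish-all t))
```

A forced rebuild like this one refreshes every timestamp at once, which illustrates why they say little about when an article was last edited.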

Since I use a version control system (here Git), the most natural approach is to let it keep track of each article’s creation date and last modification date.

Org-publish does not provide direct support for Git, but thanks to Lisp this feature is only a simple hack away:

```elisp
(defun ambrevar/git-creation-date (file)
  "Return the first commit date of FILE.
Format is %Y-%m-%d."
  (with-temp-buffer
    (call-process "git" nil t nil "log" "--reverse" "--date=short" "--pretty=format:%cd" file)
    (goto-char (point-min))
    (buffer-substring-no-properties (line-beginning-position) (line-end-position))))

(defun ambrevar/git-last-update-date (file)
  "Return the last commit date of FILE.
Format is %Y-%m-%d."
  (with-output-to-string
    (with-current-buffer standard-output
      (call-process "git" nil t nil "log" "-1" "--date=short" "--pretty=format:%cd" file))))
```


Then only org-html-format-spec is left to hack so that the %d and %C specifiers (used by org-html-postamble-format) rely on Git instead.
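
One possible way to do this without redefining the function is an `:around` advice, sketched below. This is an illustration under assumptions rather than a copy of my script: the name `my/org-html-format-spec-with-git` is made up, the two Git helpers defined above must be loaded, the Org files must be tracked in Git, and Emacs must run from inside the repository so that `git log` finds it.

```elisp
(require 'ox-html)

(defun my/org-html-format-spec-with-git (orig-fun info)
  "Call ORIG-FUN on INFO, then let Git dates take precedence for %d and %C.
The stock values are kept when Git returns nothing for the file."
  (let* ((spec (funcall orig-fun info))
         (file (plist-get info :input-file))
         (created (and file (ambrevar/git-creation-date file)))
         (updated (and file (ambrevar/git-last-update-date file))))
    (append
     ;; The first entry found for a given character is the one used,
     ;; so entries placed in front shadow the original ones.
     (and created (not (string= created "")) `((?d . ,created)))
     (and updated (not (string= updated "")) `((?C . ,updated)))
     spec)))

(advice-add 'org-html-format-spec :around
            #'my/org-html-format-spec-with-git)
```

The advice leaves every other specifier untouched, so the rest of the preamble and postamble behaves as before.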

See my publishing script for the full implementation.

## Personal domain and HTTPS

I previously stressed the importance of keeping URLs permanent. This means that we should not rely on the domain offered by a hosting platform such as GitLab Pages, since changing host implies changing domain, thus invalidating all former post URLs. Acquiring a domain is a necessary step.

This might turn off those looking for the cheapest option, but in fact getting a domain name comes at close to zero cost if you do not limit yourself to a subset of popular options. For a personal blog, the domain name and the top-level domain should not matter much and can easily be adjusted to bring the costs to a minimum.

There are many registrars to choose from. One of the biggest, GoDaddy, has a debatable reputation. I’ve opted for https://www.gandi.net/.

With a custom domain, we also need a certificate for HTTPS. This used to come at a price but is now free and straightforward with Let’s Encrypt. Here is a tutorial for GitLab Pages. (Note that the command-line tool is called certbot now.)

## Permanent URLs and folder organization pitfalls

Chris Wellons has some interesting insights about the architecture of a blog.

URLs are forever, and as such a key requirement of every website is to ensure that all its URLs remain permanent. Thus the folder organization of the blog has to be thought out beforehand.

• Keep the URLs human-readable and easy to remember. Make them short and meaningful.
• Avoid dates in URLs. This is a very frequent mistake with blogs. There is usually no good reason to encode the date in the URL of a post: it only makes it harder to remember and more prone to change when moving platform.
• Avoid hierarchies. Hierarchies usually don’t help with the above points; put everything under the same folder instead. Even if some pages belong to different “categories” (for instance “articles” and “projects”), this is only a matter of presentation on the sitemap (or the welcome page). It should not influence the URLs. When the category is left out, there is one thing less to remember, namely whether the page foo was an article or a project.
• Place index.html files in dedicated folders. If the page extension does not matter (e.g. between .html and .htm), you can easily save the visitors from any further guessing by storing your foo article in foo/index.html. Thus browsing https://domain.tld/foo/ will automatically retrieve the right document. It’s easier and shorter than https://domain.tld/foo.htm.
• Don’t rename files. Think twice before naming a file: while you can later tweak some virtual mapping between the URL and a renamed file, it’s better to stick to the initial names to keep the file-URL association as straightforward as possible.
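
The rules above boil down to a trivial mapping from a post name to its file and its URL. Here it is as a small sketch, in which the function name `my/post-paths`, the `public/` folder and the `https://domain.tld/` domain are all placeholders:

```elisp
(defun my/post-paths (slug)
  "Return a cons cell (FILE . URL) for the post named SLUG.
Hypothetical helper: the output folder and the domain are placeholders."
  (let ((root "public/")
        (domain "https://domain.tld/"))
    ;; No date, no category: the post name alone determines both
    ;; the location on disk and the public address.
    (cons (concat root slug "/index.html")
          (concat domain slug "/"))))

(my/post-paths "foo")
;; => ("public/foo/index.html" . "https://domain.tld/foo/")
```

Since the name of the post is the only input, moving to another generator or host only requires reproducing this one mapping.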

## Future development

There are more features I’d like to add to my homepage. Ideally, I’d rather stick to free software, limit external resources (i.e. avoid depending on external JavaScript), and keep code and data under my control.

Implementation suggestions and other ideas are more than welcome!

• Comment system. Disqus is non-free, Juvia looks good but it’s unmaintained, hashover seems to be one of the nicest options. I’m also particularly interested in the Webmention W3C recommendation. For now I simply rely on the GitLab issue page.
• Statistics and analytics. Matomo looks interesting. I’d need to set it up on a server.
• Tags and ubiquitous fuzzy-search. I’d like to add a universal search bar with tag support so that the complete website can be fuzzy-searched and filtered by tags.

## Other publishing systems

• Frog is a blog generator written in Racket. While it may be one of the best of its kind, it sadly does not support the Org format as of this writing. Some blogs generated with Frog:
• Haunt is a blog generator written in Guile. It seems to be very complete and extensible, but sadly it does not support the Org format as of this writing. Some blogs generated with Haunt:
• Coleslaw is a blog generator in Common Lisp. It seems to be very complete and extensible, but sadly it does not support the Org format as of this writing. It has a command-line tool, Coleslaw-cli. Some blogs generated with Coleslaw: