Ghost on GitHub Pages With HTTrack

Danny Wahl  •  Ghost

I've written a little bit about this before, but this site is currently created using a local copy of Ghost, converted to a static HTML site, and deployed to GitHub Pages. This post gives an overview of how I do that.

First I'd like to say that I am aware of, and initially used, the tool Buster (get it, Ghost Buster?). It's a fine little utility that basically functions as a fancy wget wrapper, but it has some issues that were preventing this site from being fully functional. In the author's defense, some of them may actually have been caused by Homebrew's wget. So I decided to investigate an alternative toolset.

This brings us to HTTrack. HTTrack is an alternative website downloader (like wget) and is available for just about every platform. It has a lot of options, and if you need something fancy I recommend that you read the guide. That said, here's a step-by-step to deploy your Ghost blog to GitHub Pages.

Initialize Your Git Repository

I'll go over how to deploy to a user (or organization) page on GitHub Pages. If you need to do it for a project page, just replace the user steps with the steps necessary for your repo; generally that just means using the gh-pages branch instead of master.

Create the directory that you want to init as a git repository and open a terminal prompt in that directory. HTTrack will create some meta files (a cache and a log) and clone your site into a folder whose name is ASCII safe. So for example, if your Ghost blog is running on http://localhost:2368, which is the default, HTTrack will create those meta files in your output directory along with ./localhost_2368/, which contains your static site. So you'll need to initialize your git repository there, inside ./localhost_2368/. My output directory is ~/Sites/htdocs/

mkdir -p ~/Sites/htdocs/localhost_2368/  
cd ~/Sites/htdocs/localhost_2368/  
git init  
git checkout -b master  
git remote add origin[USER NAME]/[USER NAME]  

A few notes here:

  1. Replace [USER NAME] with your user name, or organization name.
  2. Replace -b master with -b gh-pages if this is a project page.
  3. You don't have to use the SSH repository path; you can use HTTPS, but you'll be prompted to authenticate every time you update your site. If you don't have SSH configured, I strongly suggest you set it up.
  4. If you already have things in your repository you might want to git fetch and git merge them, but you probably don't. Even if you blow them away locally, they'll still be in your commit history.
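To make notes 1 and 3 a bit more concrete, here's a rough sketch of the two remote forms. It assumes the usual GitHub Pages convention that a user or organization page lives in a repository named [USER NAME].github.io. The helper name pages_remote_url is made up for this example, and octocat is just a placeholder user name.

```shell
#!/bin/sh
# pages_remote_url is a hypothetical helper, not part of git or GitHub.
# Assumption: a user/organization page lives in <name>.github.io.
pages_remote_url() {
    protocol=$1   # "ssh" or "https"
    name=$2       # your user name or organization name
    case $protocol in
        ssh)   printf 'git@github.com:%s/%s.github.io.git\n' "$name" "$name" ;;
        https) printf 'https://github.com/%s/%s.github.io.git\n' "$name" "$name" ;;
        *)     echo "unknown protocol: $protocol" >&2; return 1 ;;
    esac
}

# Show both forms for a placeholder user name.
pages_remote_url ssh octocat
pages_remote_url https octocat
```

With that in place, something like git remote add origin "$(pages_remote_url ssh octocat)" would set the remote.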

Initializing your Git repository is obviously a one-time task.

Generate Your Initial Static Site

One of the downsides of HTTrack compared to wget is that it seems to be a bit slower. One of its benefits over wget is its cache. That means the first crawl of your site is going to be considerably slower than using wget (or buster), but updates to your site should be relatively quick. Luckily we can begin the process with a single command. The directory you run this command from doesn't matter:

httrack http://localhost:2368/ -O /PATH/TO/OUTPUT/FOLDER/ -c128 -I0 -#p "+sitemap*"  

Let's break this command down a little bit. The first parameter is the URL of the site you want to crawl, i.e. your Ghost blog. The second parameter, -O, is the path to the output folder. You'll obviously need to replace /PATH/TO/OUTPUT/FOLDER/ with your actual path (NOT the localhost_2368 folder but its parent). Mine is /Users/dannywahl/Sites/htdocs/. -c128 means that we scrape with up to 128 simultaneous connections. -I0 tells httrack not to make its own index file for the output, and -#p turns on detailed (verbose) progress output.

The "+sitemap*" filter tells httrack to explicitly grab any links to a sitemap. Ghost automatically generates a sitemap, but it uses @blog.url to populate the links, so httrack will not scrape them by default and we need to add this filter. This might be dangerous if your blog content contains a lot of links to other sites' sitemaps. You might consider changing it to your Ghost blog url, e.g. "+*" to only grab your site's sitemap.
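Here's a small sketch that assembles the crawl command from two variables and prints it rather than running it, so you can eyeball it first. The scoped filter it builds (host and port followed by /sitemap*) is my assumption about what a blog-only sitemap filter could look like; I haven't verified how HTTrack matches ports in filters. The function names sitemap_filter and mirror_command are made up for this example.

```shell
#!/bin/sh
# Placeholder values; substitute your own.
BLOG_URL="http://localhost:2368/"
OUTPUT_DIR="/PATH/TO/OUTPUT/FOLDER/"

# Assumption: scoping the filter to the blog's host and port keeps HTTrack
# from following sitemap links that point at other sites.
sitemap_filter() {
    hostport=${1#*://}        # drop the scheme
    hostport=${hostport%%/*}  # drop the path
    printf '+%s/sitemap*\n' "$hostport"
}

# Print the command instead of running it, so it can be checked first.
mirror_command() {
    printf 'httrack %s -O %s -c128 -I0 -#p "%s"\n' "$1" "$2" "$(sitemap_filter "$1")"
}

mirror_command "$BLOG_URL" "$OUTPUT_DIR"
```

Printing instead of executing is deliberate here: the first crawl is slow, so it's worth checking the output path and filter before committing to it.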

If it all goes right you'll see the progress of your site scrape in the terminal output like this:

Mirror launched on Wed, 04 Feb 2015 12:06:11 by HTTrack Website Copier/3.48-19 [XR&CO'2014]  
mirroring http://localhost:2368/ +/sitemap* with the wizard help..  
Thanks for using HTTrack!  

Now you should have a fully viewable static site in /localhost_2368/ that is ready to deploy to GitHub Pages.
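The folder name comes from the blog URL. As a rough sketch, this reproduces only the mapping seen in this post (host and port, with the colon swapped for an underscore). HTTrack's real naming rules cover more cases than this, and mirror_dir_name is a made-up helper name.

```shell
#!/bin/sh
# mirror_dir_name is a hypothetical helper that mimics the one mapping shown
# in this post: http://localhost:2368 becomes localhost_2368.
mirror_dir_name() {
    hostport=${1#*://}        # drop the scheme, if any
    hostport=${hostport%%/*}  # drop everything after the host and port
    printf '%s\n' "$hostport" | tr ':' '_'
}

mirror_dir_name "http://localhost:2368/"
```

This is handy in scripts when you'd rather compute the folder from the URL than hard-code it.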

Deploy to GitHub Pages

In the terminal, navigate back to the /localhost_2368/ directory, then commit and push your changes to GitHub:

git add .  
git commit -m "Blog updated"  
git push -u origin master  

Now you should be able to visit your GitHub page and see the static version of your site.

Update Your Static Site

The next time you update your Ghost blog and need to update your static site, navigate to the folder containing /localhost_2368/ and run this command:

httrack -iC2  

This command updates your site and reuses your cache, so you don't need to do a full re-scrape. It deletes items that were deleted remotely, updates changed assets, and adds new files. Then push to GitHub again.
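Since the update only works from the folder that holds the cache, a small guard can save you from running it in the wrong place. This is a sketch that assumes HTTrack keeps its cache in a folder named hts-cache inside the output directory; has_httrack_cache and update_mirror are made-up helper names.

```shell
#!/bin/sh
# Assumption: HTTrack keeps its cache in a folder named hts-cache inside the
# output directory. Both function names below are hypothetical.
has_httrack_cache() {
    [ -d "$1/hts-cache" ]
}

# Run the update only when a cache from an earlier crawl is present.
update_mirror() {
    if has_httrack_cache "$1"; then
        (cd "$1" && httrack -iC2)
    else
        echo "no HTTrack cache in $1; run the initial crawl first" >&2
        return 1
    fi
}
```

Calling update_mirror /Users/dannywahl/Sites/htdocs/ would then either update the mirror or tell you that the initial crawl is still missing.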

Wrap it in a Shell Script

After the initial static site and the git repository are created, updating and deploying is a simple and repetitive task, so you can stick it all in a simple shell script like this:

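#!/bin/sh
# working path (the folder that contains localhost_2368/)
cd /Users/dannywahl/Sites/htdocs/ || exit 1

# update the mirror, reusing the HTTrack cache
httrack -iC2

# replace favicon (cp keeps the source copy around for the next run)
cp favicon.ico localhost_2368/favicon.ico

# deploy from the static site folder, where the git repository lives
cd /Users/dannywahl/Sites/htdocs/localhost_2368/ || exit 1

current_time=$(date -u +"%Y-%m-%d %T")  
git add .  
git commit -m "Blog update at $current_time"  
git push -u origin master  
echo "Deployed to github"  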

I have that saved as a utility called bustit, so now I can simply type bustit in the terminal to deploy my updated blog.

It's not nearly as robust as buster, but it very easily could be. It's the same idea, a wrapper around a site downloader and a git interface; I just didn't make it modular or extensible. Maybe I will in the future.
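As one small step toward robustness, here's a sketch of a deploy step that only commits when something actually changed, since git commit exits with an error when there is nothing staged. The function names commit_message and deploy_if_changed are made up for this example; the branch and remote are the ones used earlier in this post.

```shell
#!/bin/sh
# Both function names are hypothetical; run deploy_if_changed from inside
# the git repository that holds the static site.
commit_message() {
    # Same UTC timestamp format as the script above.
    printf 'Blog update at %s\n' "$(date -u +"%Y-%m-%d %T")"
}

deploy_if_changed() {
    git add .
    # git diff --cached --quiet exits 0 when nothing is staged.
    if git diff --cached --quiet; then
        echo "Nothing to deploy"
    else
        git commit -m "$(commit_message)" && git push -u origin master
    fi
}

commit_message
```

The check matters mostly if the script runs unattended, where a failed commit would otherwise look like a failed deploy.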


If you're not happy with the look of your sitemaps, it's because sitemap.xsl is missing. You'll need to visit http://localhost:2368/sitemap.xsl and save it to the static site root (next to sitemap.xml). I can't figure out how to get httrack (or buster) to download it, but you only need to do this the first time. The file is optional: it only changes how the sitemap looks in a browser.
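If you'd rather not save the file by hand, a sketch like this could do it from the terminal. It assumes curl is installed and that Ghost is running locally; sitemap_xsl_url and save_sitemap_xsl are made-up helper names.

```shell
#!/bin/sh
# Both function names are hypothetical. Assumes curl is installed.
sitemap_xsl_url() {
    printf '%s/sitemap.xsl\n' "${1%/}"   # tolerate a trailing slash
}

# Download the stylesheet into the static site root, next to sitemap.xml.
# $1 is the blog URL, $2 is the static site folder.
save_sitemap_xsl() {
    curl -fsS -o "$2/sitemap.xsl" "$(sitemap_xsl_url "$1")"
}

sitemap_xsl_url "http://localhost:2368/"
```

For the setup in this post the call would be save_sitemap_xsl http://localhost:2368/ /Users/dannywahl/Sites/htdocs/localhost_2368.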

If you're used to buster, note that it automatically adds a README file to your site. Here you'll need to create that manually if you want it. The same goes for CNAME and robots.txt.
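Creating those files is quick to script. This is only a sketch: the file contents, the README.md file name, and the example domain are placeholders I picked, and write_extras is a made-up helper name.

```shell
#!/bin/sh
# write_extras is a hypothetical helper; all file contents are placeholders.
write_extras() {
    site_dir=$1
    domain=$2   # custom domain for CNAME; pass "" to skip that file
    printf '%s\n' "Static copy of a Ghost blog, generated with HTTrack." > "$site_dir/README.md"
    # An empty Disallow line allows crawlers to fetch everything.
    printf 'User-agent: *\nDisallow:\n' > "$site_dir/robots.txt"
    if [ -n "$domain" ]; then
        printf '%s\n' "$domain" > "$site_dir/CNAME"
    fi
}
```

For example, write_extras /Users/dannywahl/Sites/htdocs/localhost_2368 blog.example.com writes all three files into the static site root.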

Well, that's all there is to it. Create an initial static copy, update that copy when you update Ghost, and push to GitHub. Maybe in the future I'll rewrite this as a robust application, because I think it would be cool to integrate it with AppleScript Folder Actions so that it triggers whenever the Ghost sqlite database is updated. For now, though, this works just fine.
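Until then, a plain shell check gets part of the way there without Folder Actions. This sketch is a different technique from the one I describe above: it compares the database file's modification time against a stamp file left by the previous deploy. The name db_changed_since and both file paths are hypothetical.

```shell
#!/bin/sh
# db_changed_since is a hypothetical helper. It succeeds when the database
# file is newer than a stamp file left behind by the previous deploy.
db_changed_since() {
    db=$1
    stamp=$2
    [ ! -e "$stamp" ] && return 0   # no stamp yet: treat as changed
    # find prints the file name only if it is newer than the stamp.
    [ -n "$(find "$db" -newer "$stamp" 2>/dev/null)" ]
}
```

Run from cron or a loop, a line like db_changed_since /path/to/ghost.db ~/.bustit-stamp && bustit && touch ~/.bustit-stamp would deploy only after the database has changed.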