LaceySnr.com - Salesforce Development Posts by Matt Lacey

Blogger To Ghost

Posted: 2016-01-19

A new year, a new place to host my blog. I'd been planning to move away from Blogger to Ghost ever since I first heard about Ghost. I waited for it quite anxiously, as the idea of using Markdown to format posts sounded like bliss in comparison to the confused mess of shitty HTML that Blogger would often generate. When Ghost launched, however, I realised there wasn't a quick way to get my content moved, so I put the project on the back burner, where it stayed far longer than I'd like to admit. Well, I've finally done it, and here's how.

Exporting Blogger Data

There are a few ways of doing this, including moving your data to WordPress first (apparently that has better tooling), but I decided to use blogger2ghost. Before you run off to investigate, keep reading: it wasn't all that smooth for me.

First of all you need to download your Blogger data. Check your Blogger settings to ensure your feed is set to Full, and then use a tool such as wget or curl to pull down the data.

I had quite a bit of trouble finding the right URL format to extract all of my posts. There are plenty of examples out there, but everything I tried seemed to truncate the feed. In the end I found that keeping the max-results parameter lower than is often suggested, and making it the first parameter, got me what I needed:

<<base url>>/feeds/posts/default?max-results=400&alt=json
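If you'd rather script the download than use wget or curl, something like the following should do the same job. It's only a sketch using Python's standard library: the function names and the output path are made up for illustration, and 400 is simply the value from the URL above, so adjust it if your feed still comes back truncated.

```python
import json
import urllib.request


def build_feed_url(base_url, max_results=400):
    """Build the Blogger JSON feed URL with max-results as the first parameter."""
    base = base_url.rstrip("/")
    return f"{base}/feeds/posts/default?max-results={max_results}&alt=json"


def download_feed(base_url, out_path, max_results=400):
    """Fetch the feed, save it to out_path and return the number of posts in it."""
    with urllib.request.urlopen(build_feed_url(base_url, max_results)) as response:
        data = json.load(response)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    # Blogger's JSON feed lists the posts under feed -> entry.
    return len(data.get("feed", {}).get("entry", []))
```

Calling download_feed() with your blog's base URL and comparing the returned count with the number of posts Blogger says you have is a quick way to spot a truncated feed.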

Once I had the JSON data I ran it through blogger2ghost. This is where things went a bit wonky (I suspect a bad regex in the tool, or maybe my Blogger markup was just too crap to comprehend); a lot of my paragraphs got truncated at the first anchor tag. Check your data at this stage. By the time I noticed, I'd gotten so far along with the rest of this that I decided to fix them all manually, which was not fun to say the least.
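One way to check your data at this stage is to compare the length of each post's original text with the converted version and flag anything that shrank a lot. The sketch below assumes you've already loaded both sides into dictionaries keyed by post title (how you get them there depends on your export files), and the 0.8 threshold is an arbitrary starting point rather than anything tuned.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip the tags from html and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def find_suspect_posts(originals, converted, threshold=0.8):
    """Return titles whose converted text is much shorter than the original.

    originals: {title: Blogger HTML}, converted: {title: converted Markdown}.
    """
    suspects = []
    for title, html in originals.items():
        original_len = len(html_to_text(html))
        converted_len = len(converted.get(title, ""))
        if original_len and converted_len / original_len < threshold:
            suspects.append(title)
    return suspects
```

Markdown links usually make the converted text a little longer than the plain text, so a post that comes out noticeably shorter is worth opening and reading.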

Hosting

Ghost needs to be hosted somewhere. The guys who provide it also provide hosting options, but I went with Digital Ocean for a VPS instead, and because I've always had a soft spot for FreeBSD (I used it as my desktop OS around 12 years ago) I chose that as the operating system.

There are plenty of guides out there on getting set up. The one thing I did a little differently was to do with my post slugs, i.e. the part of a URL that identifies a post on my blog. Blogger used a format with the date in it. Ghost doesn't by default, and if you enable dated URLs it also adds the day. There are ways around this, but I like the clean option of no dates, so I found a way to use that.

Slug Format

The tricky part is I didn't want to break any old links to my blog, so again I settled for a mix of automation and manual labour.

Dates

Using Nginx I rewrite parts of the URLs using a regex that eliminates the date part of incoming URLs and also removes the .html at the end.

rewrite "^/\d{4}/\d{2}/(.*)\.html/*$" /$1 ;
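If you want to try the pattern against a few paths before touching your Nginx config, it behaves the same way in Python's re module. This is just a test harness: the function name and the example paths are invented, and it only mimics the rewrite rather than involving Nginx at all.

```python
import re

# Same pattern as the Nginx rule: /YYYY/MM/<slug>.html, optional trailing slashes.
OLD_URL = re.compile(r"^/\d{4}/\d{2}/(.*)\.html/*$")


def strip_blogger_date(path):
    """Return the path the rewrite rule would produce, or path unchanged if no match."""
    return OLD_URL.sub(r"/\1", path)


print(strip_blogger_date("/2015/06/some-post-title.html"))  # /some-post-title
print(strip_blogger_date("/some-post-title/"))              # /some-post-title/
```

Paths that don't look like old Blogger URLs pass through untouched, so the rule shouldn't interfere with the new-style links.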

That was the easy part. Now if you follow an old link to http://www.laceysnr.com/posts/the-benefits-of-sending-your-developers.html, you'll end up at http://www.laceysnr.com/posts/the-benefits-of-sending-your-developers/.

Other Formatting

The harder part was that some of the slugs still differed in format. Blogger skips certain words (such as conjunctions) and hyphenates differently to Ghost (the former will use visualforce but Ghost somehow had visual-force (ack!)). This was a bigger issue but given that I only had to check 130 or so values I fixed things up manually.

[Image: The spreadsheet of fun and joy]

I dumped my posts table from ghost.db into a CSV file, and got a list of URLs from Blogger using the RSS feed. I then put both in a spreadsheet side by side with a quick VLOOKUP() and some conditional formatting so that I could filter out all the ones that were already identical. That left me with around 80 URLs which I simply copied from the Blogger column to the Ghost column, saved the results as a CSV and used that to update the slugs for my posts in Ghost.
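The same comparison and update could be scripted instead of going through a spreadsheet. The sketch below is not what I actually ran: it assumes posts can be matched on title, that the posts table in ghost.db has title and slug columns, and that the old URLs end in .html. The function names are made up. Back up ghost.db and stop Ghost before trying anything like this.

```python
import sqlite3
from urllib.parse import urlparse


def slug_from_blogger_url(url):
    """Take the last path segment of a Blogger URL and drop the .html suffix."""
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment.endswith(".html"):
        return last_segment[: -len(".html")]
    return last_segment


def update_slugs(conn, blogger_urls_by_title):
    """Set each matching Ghost post's slug to the Blogger one.

    Returns the titles whose slug was actually changed.
    """
    changed = []
    for title, url in blogger_urls_by_title.items():
        new_slug = slug_from_blogger_url(url)
        row = conn.execute(
            "SELECT slug FROM posts WHERE title = ?", (title,)
        ).fetchone()
        # Skip posts that can't be found or whose slug is already identical.
        if row is not None and row[0] != new_slug:
            conn.execute(
                "UPDATE posts SET slug = ? WHERE title = ?", (new_slug, title)
            )
            changed.append(title)
    conn.commit()
    return changed
```

You'd open the database with sqlite3.connect("ghost.db") and pass in a dictionary of title to old URL built from the Blogger feed. Titles that don't match exactly are silently skipped, so it's still worth eyeballing whatever is left over.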

I could have simply added an Nginx rewrite rule for each of the old URLs, but that would have been just as much manual work, with the extra overhead of checking those rules on every request.

What's Missing?

Well, probably some chunks of paragraphs for a start. It was mind-numbing work (though also kind of interesting, as some things were farther in the past than I thought), so I wouldn't be surprised if I missed some of the conversion errors.

Pages. I don't have any yet, I might add some.

Widgets. Meh, it looks nicer clean, I think. I'm still not 100% settled on the theme but I wanted to keep the colours for consistency.

Comments. These weren't used all that much on my old blog, but I think more people would be likely to comment if I used Disqus rather than Blogger's built-in mechanism, so maybe that'll come. For now, if you have any questions feel free to ping me @LaceySnr.