the laughing cloud

Wiki Migration Hell: From Socialtext to Confluence

Consulting tends to be interesting.  Sometimes you get fun new things to do.  Sometimes you get the stuff nobody else wants to do.  Recently, I suspect it was the latter.

One wouldn’t think converting a wiki would be that tough.  Really.  Especially from one wiki to another.  C’mon, how difficult can this be?

So my purpose with this blog post is to help the next poor schmuck who has to do this avoid suffering as much as I have.  Here’s the summary:

  • Socialtext backups are almost useless.  They will, upon request, provide you with a backup of your wiki, however this backup DOES NOT CONTAIN ANY USER INFORMATION AT ALL.  Like none.  The RSS feed has user information, the HTML pages have user information, but the backups?  Nooooooooo.  Crippling backups is a douchebaggy thing to do.
  • Confluence has the Universal (Useless?) Wiki Converter.  The UWC is a monolithic Java program which requires Gnome and X-windows to run, at least in any sort of interactive mode.  It never successfully converted a file in Socialtext wiki format, and I never succeeded in getting it to upload an attachment.  It was, however, excellent at producing Java stack traces and sitting and spinning forever.

So here’s what to do:

  1. For Socialtext, request your backups.  They’re pretty useless except that they provide the attachments you’ll need, and grabbing them from the backups is generally easier than grabbing them manually out of Socialtext.
  2. Next, accept that you’re going to have to do this stuff manually.  I tried everything to avoid having to write code, but it just didn’t work out.  Info on the REST API is here: https://www.socialtext.net/st-rest-docs/
  3. Get a list of the workspaces you’re going to migrate (I created them manually in Confluence).  Then for each article in the space, you’re going to want to grab three things (which I put in 3 separate files):
    • The article itself, in wiki format, i.e. workspace.downloadPage(space, page)
    • The Meta information for the file – this is where you’ll get the owner info from – something like: meta = workspace.getMetaData(page)
    • A list of the attachments associated with the article.
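The fetch loop for step 3 can be sketched in Ruby.  This is an illustrative sketch, not my exact code: the endpoint paths and Accept headers are my reading of the Socialtext REST docs linked above, and `page_url` / `fetch_page` are just names made up for this post — check the details against your own instance.

```ruby
require "net/http"
require "uri"

# Base URL for a page, per the Socialtext REST docs.
def page_url(host, workspace, page_id)
  "https://#{host}/data/workspaces/#{workspace}/pages/#{page_id}"
end

# Grab the three things for one article: the wiki source, the metadata
# (where the owner info lives), and the attachment list.
def fetch_page(host, workspace, page_id, user, pass, outdir = ".")
  base = page_url(host, workspace, page_id)
  { "#{page_id}.wiki" => [base, "text/x.socialtext-wiki"],
    "#{page_id}.meta" => [base, "application/json"],
    "#{page_id}.atts" => ["#{base}/attachments", "application/json"],
  }.each do |fname, (url, accept)|
    uri = URI(url)
    req = Net::HTTP::Get.new(uri)
    req["Accept"] = accept
    req.basic_auth(user, pass)
    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(req) }
    File.write(File.join(outdir, fname), res.body)
  end
end

# Only hits the network when configured, e.g.:
#   ST_HOST=www.socialtext.net ST_USER=me ST_PASS=secret ruby fetch.rb
if ENV["ST_HOST"]
  fetch_page(ENV["ST_HOST"], "angel", "some_page_id", ENV["ST_USER"], ENV["ST_PASS"])
end
```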

Now, for the Confluence imports:

  1. Turn on XMLRPC from the admin menu.  You will get to know it well.  Here are the docs: https://developer.atlassian.com/display/CONFDEV/Confluence+XML-RPC+and+SOAP+A…
  2. I couldn’t find a decent converter from Socialtext markup to Confluence’s wiki markup (a Textile-like syntax).  However, the differences between the two formats aren’t terrible:
    • ^, ^^, ^^^, ^^^^ get translated into h1., h2., h3., and h4.
    • {file: blah} gets translated into |blah|
    • {image: blah} gets translated into !blah!
    • {user: blah} becomes [mailto:blah]
    • .code becomes {code}
    • {link: blah} becomes [blah]
    • {section: blah} becomes {anchor: blah}
  3. Once you’ve got some wiki text converted, try to upload it to Confluence using XMLRPC; it’s straightforward, i.e. newpage = confluence.storePage(token, newpagedata)
  4. Next, the attachments – you should have all the info you need in the .meta file from Socialtext, plus the attachments themselves from your Socialtext backups.  They’re straightforward except for having to convert the attachment data to base64: confluence.addAttachment(token, pageid, newattdata, XMLRPC::Base64.new(attdata))
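To make the steps above concrete, here’s a Ruby sketch.  The regexes are naive, line-oriented approximations of the conversion rules listed in step 2 (Socialtext’s real grammar has more corner cases, and the .code handling ignores block endings), and in the upload half the space key and file names are placeholders — the confluence1 namespace is the XML-RPC API namespace Confluence shipped at the time.

```ruby
# Translate Socialtext markup into Confluence wiki markup, per the rules
# above.  Deliberately naive and line-oriented; expect to tweak it.
def socialtext_to_confluence(src)
  src.each_line.map do |line|
    line = line.sub(/\A\^{4}\s*/, "h4. ")
               .sub(/\A\^{3}\s*/, "h3. ")
               .sub(/\A\^{2}\s*/, "h2. ")
               .sub(/\A\^\s*/,    "h1. ")
    line = "{code}\n" if line =~ /\A\.code\b/
    line.gsub(/\{file:\s*([^}]+)\}/,    '|\1|')
        .gsub(/\{image:\s*([^}]+)\}/,   '!\1!')
        .gsub(/\{user:\s*([^}]+)\}/,    '[mailto:\1]')
        .gsub(/\{link:\s*([^}]+)\}/,    '[\1]')
        .gsub(/\{section:\s*([^}]+)\}/, '{anchor:\1}')
  end.join
end

# Upload side -- only runs when pointed at a live server, e.g.:
#   CONFLUENCE_URL=https://wiki.example.com CONFLUENCE_USER=admin \
#   CONFLUENCE_PASS=secret ruby convert.rb
if ENV["CONFLUENCE_URL"]
  require "xmlrpc/client"
  server     = XMLRPC::Client.new2(ENV["CONFLUENCE_URL"] + "/rpc/xmlrpc")
  confluence = server.proxy("confluence1")   # XML-RPC API namespace of that era
  token      = confluence.login(ENV["CONFLUENCE_USER"], ENV["CONFLUENCE_PASS"])

  body    = socialtext_to_confluence(File.read("some_page.wiki"))
  newpage = confluence.storePage(token,
              "space" => "MYSPACE", "title" => "Some Page", "content" => body)

  # Attachment data has to go up base64-encoded.
  attdata = File.binread("diagram.png")
  confluence.addAttachment(token, newpage["id"],
    { "fileName" => "diagram.png", "contentType" => "image/png" },
    XMLRPC::Base64.new(attdata))
end
```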

Gotchas:

  • Confluence thinks any open-squiggly bracket { is a macro and will barf accordingly.  Either wrap the offending area inside of {code} bad-squigglies here {code}, or turn them into something else, like [
  • Obscure bug of the day – the Ruby library for Confluence barfs if the document being uploaded has any control characters in it, e.g. ^G’s.  I don’t know how they got in there, either.
  • I’ve seen the upload of large attachments hang mysteriously.  Beware.
  • I’ve run out of disk space – because of attachments.  Keep an eye on the confluence_data directory – it gets big.  Fast.
  • The whole thing is *really* CPU intensive.  I’m using a decent Amazon server and have seen the load average over 10.  I’ve also run out of memory and had the import crash.
  • The Socialtext REST interface may have issues.  When I pulled a list of all the wiki pages by iterating through https://www.socialtext.net/data/workspaces/angel/pages?filter=pagetype:wiki;count=100 it didn’t return all the pages.  So I issued the same command without the filter – https://www.socialtext.net/data/workspaces/angel/pages?count=100 – and some of the missing wiki files appeared.  To make sure, I compared the two lists (after sorting) using the UNIX comm command… and found to my dismay that the command to get “all files” missed some of the wiki files returned by the first command.  That sucks.  So I’ll use the union of the lists from both commands – which leaves me unsure of whether I’ve missed anything.  Ugh.
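A couple of these gotchas (stray braces, control characters) and the page-list mismatch are easy to handle in code.  A sketch, assuming the only braces your converted text should legitimately contain are the {code} and {anchor:…} macros:

```ruby
# Strip control characters that make the Ruby XML-RPC library barf
# (keeping tab, newline, and carriage return), then neutralize any
# open-brace that isn't one of the macros the converter actually emits.
def sanitize_for_confluence(text)
  clean = text.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, "")
  clean.gsub(/\{(?!(?:code|anchor)[:}])/, "[")
end

# The filtered and unfiltered page queries each missed some pages,
# so take the union of both lists.
def merge_page_lists(filtered, unfiltered)
  (filtered | unfiltered).sort
end
```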

Now think about moving your users.  If there’s a one-to-one mapping, it’s straightforward: just create the new users in Confluence and make sure the articles are owned by the correct person.  If, however, you’re using something like Active Directory, then the mapping becomes a little more exciting.

In this case, I suggest spinning up a test instance of Confluence and creating a pile of dummy users with the correct A/D names and mappings.  You can then export spaces from a test version of Confluence into a production version with live A/D.  The mappings will carry over, which is nice.

If you’re running into issues, feel free to drop me a line, I’m happy to help.
