{"id":24,"date":"2012-11-12T08:37:00","date_gmt":"2012-11-12T08:37:00","guid":{"rendered":"http:\/\/blog.maclawran.ca\/wiki-migration-hell-from-socialtext-to-conflu-84722"},"modified":"2014-05-10T06:18:59","modified_gmt":"2014-05-10T06:18:59","slug":"wiki-migration-hell-from-socialtext-to-conflu-84722","status":"publish","type":"post","link":"https:\/\/blog.maclawran.ca\/?p=24","title":{"rendered":"Wiki Migration Hell: From Socialtext to Confluence"},"content":{"rendered":"<p>Consulting tends to be interesting.\u00a0 Sometimes, you get interesting, fun new things to do.\u00a0 Sometimes you get to do the stuff nobody else wants to do.\u00a0 Recently, I suspect it was the latter.<\/p>\n<p>One wouldn&#8217;t think converting a wiki would be that tough.\u00a0 Really.\u00a0 Especially from one wiki to another.\u00a0 C&#8217;mon, how difficult can this be?<\/p>\n<p>So my purpose with this blog post is to help prevent the next poor shmuck who has to do this from suffering as much as I have.\u00a0 Here&#8217;s the summary:<\/p>\n<ul>\n<li>Socialtext backups are almost useless.\u00a0 Socialtext will, upon request, provide you with a backup of your wiki; however, this backup <strong>DOES NOT CONTAIN ANY USER INFORMATION AT ALL<\/strong>.\u00a0 Like none.\u00a0 The RSS feed has user information, the HTML pages have user information, but the backups?\u00a0 Nooooooooo.\u00a0 Crippling backups is a douchebaggy thing to do.<\/li>\n<\/ul>\n<ul>\n<li>Confluence has the Universal (Useless?) 
Wiki Converter.\u00a0 The UWC is a monolithic Java program which requires GNOME and X Windows to run, at least in any sort of interactive mode.\u00a0 It never successfully converted a file in Socialtext wiki format, and I never succeeded in getting it to upload an attachment.\u00a0 It was, however, excellent at producing Java stack traces and sitting and spinning forever.<\/li>\n<\/ul>\n<div class=\"p_embed p_image_embed\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.maclawran.ca\/wp-content\/uploads\/2013\/05\/c53b3-screen_shot_2012-11-12_at_4-21-40_am-scaled500.png\" alt=\"Screen_shot_2012-11-12_at_4\" width=\"189\" height=\"46\" \/><\/div>\n<p>So here&#8217;s what to do:<\/p>\n<ol>\n<li>For Socialtext, request your backups.\u00a0 They&#8217;re pretty useless except that they provide the attachments you&#8217;ll need, and grabbing them from the backups is generally easier than grabbing them manually out of Socialtext.<\/li>\n<li>Next, accept that you&#8217;re going to have to do this stuff manually.\u00a0 I tried everything to avoid having to write code, but it just didn&#8217;t work out.\u00a0 Info on the REST API is here: <a href=\"https:\/\/www.socialtext.net\/st-rest-docs\/\">https:\/\/www.socialtext.net\/st-rest-docs\/<\/a><\/li>\n<li>Get a list of the workspaces you&#8217;re going to migrate and create them in Confluence (I created mine manually).\u00a0 Then for each article in the space, you&#8217;re going to want to grab three things (which I put in three separate files):\n<ul>\n<li>The article itself, in wiki format, e.g. 
workspace.downloadPage(space, page)<\/li>\n<li>The meta information for the article &#8211; this is where you&#8217;ll get the owner info from &#8211; something like: meta = workspace.getMetaData(page)<\/li>\n<li>A list of the attachments associated with the article.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<div class=\"p_embed p_image_embed\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.maclawran.ca\/wp-content\/uploads\/2013\/05\/17a3f-screen_shot_2012-11-12_at_4-22-19_am-scaled500.png\" alt=\"Screen_shot_2012-11-12_at_4\" width=\"263\" height=\"67\" \/><\/div>\n<p>Now, for the Confluence imports:<\/p>\n<ol>\n<li>Turn on XML-RPC from the admin menu.\u00a0 You will get to know it well.\u00a0 Here are the docs: <a href=\"https:\/\/developer.atlassian.com\/display\/CONFDEV\/Confluence+XML-RPC+and+SOAP+APIs#ConfluenceXML-RPCandSOAPAPIs-Attachments\">https:\/\/developer.atlassian.com\/display\/CONFDEV\/Confluence+XML-RPC+and+SOAP+A&#8230;<\/a><\/li>\n<li>I couldn&#8217;t find a decent converter from Socialtext markup to Textile (roughly the flavor of markup used by Confluence).\u00a0 However, the differences between the two formats aren&#8217;t terrible:\n<ul>\n<li>^, ^^, ^^^, ^^^^ get translated into h1., h2., h3., and h4.<\/li>\n<li>{file: blah} gets translated into |blah|<\/li>\n<li>{image: blah} gets translated into !blah!<\/li>\n<li>{user: blah} becomes [mailto:blah]<\/li>\n<li>.code becomes {code}<\/li>\n<li>{link: blah} becomes [blah]<\/li>\n<li>{section: blah} becomes {anchor: blah}<\/li>\n<\/ul>\n<\/li>\n<li>Once you&#8217;ve got some wiki text converted, try uploading it to Confluence using XML-RPC; it&#8217;s straightforward, e.g. 
newpage = confluence.storePage(token, newpagedata)<\/li>\n<li>Next, the attachments &#8211; you should have all the info you need in the .meta file from Socialtext, plus the attachments themselves from your Socialtext backups &#8211; they&#8217;re straightforward except for having to convert the attachment data to base64: confluence.addAttachment(token, pageid, newattdata, XMLRPC::Base64.new(attdata))<\/li>\n<\/ol>\n<p>Gotchas:<\/p>\n<ul>\n<li>Confluence thinks any open-squiggly bracket { is a macro and will barf accordingly.\u00a0 Either wrap the offending area inside of {code} bad-squigglies here {code}, or turn them into something else, like [<\/li>\n<li>Obscure bug of the day &#8211; the Ruby library for Confluence barfs if the document being uploaded has any control characters in it, e.g. ^G&#8217;s.\u00a0 I don&#8217;t know how they got in there, either.<\/li>\n<li>I&#8217;ve seen the upload of large attachments hang mysteriously.\u00a0 Beware.<\/li>\n<li>I&#8217;ve run out of disk space &#8211; because of attachments.\u00a0 Keep an eye on the confluence_data directory &#8211; it gets big.\u00a0 Fast.<\/li>\n<li>The whole thing is *really* CPU intensive.\u00a0 I&#8217;m using a decent Amazon server and have seen the load average over 10.\u00a0 I&#8217;ve also run out of memory and had the import crash.<\/li>\n<li>The Socialtext REST interface may have issues.\u00a0 When I pulled a list of all the wiki pages out using the following command, iterated<br \/>\n<a href=\"https:\/\/www.socialtext.net\/data\/workspaces\/angel\/pages?filter=pagetype\">https:\/\/www.socialtext.net\/data\/workspaces\/angel\/pages?filter=pagetype<\/a>:wiki;count=100, it didn&#8217;t return all pages.\u00a0 So I issued the same command without the filter &#8211; <a href=\"https:\/\/www.socialtext.net\/data\/workspaces\/angel\/pages?count=100\">https:\/\/www.socialtext.net\/data\/workspaces\/angel\/pages?count=100<\/a> &#8211; and some of the missing wiki files appeared.\u00a0 To make 
sure, I compared the two lists (after sorting) using the UNIX comm command&#8230; and found to my dismay that the command to get &#8220;all files&#8221; managed to miss some of the wiki files returned by the first command.\u00a0 That sucks.\u00a0 So I&#8217;ll use the combined list of files from both commands &#8211; which still leaves me unsure of whether I&#8217;ve missed anything.\u00a0 Ugh.<\/li>\n<\/ul>\n<p>Now think about moving your users.\u00a0 If there&#8217;s a one-to-one mapping, it&#8217;s straightforward: just create the new users in Confluence and make sure the articles are owned by the correct person.\u00a0 If, however, you&#8217;re using something like Active Directory, then the mapping becomes a little more exciting.<\/p>\n<p>In this case, I suggest spinning up a test instance of Confluence and creating a pile of dummy users with the correct A\/D names and mappings.\u00a0 You can then export spaces from the test instance of Confluence into a production instance with live A\/D.\u00a0 The mappings will carry over, which is nice.<\/p>\n<p>If you&#8217;re running into issues, feel free to drop me a line; I&#8217;m happy to help.<\/p>\n","protected":false},"excerpt":{"rendered":"<p class=\"excerpt\">Consulting tends to be interesting. Sometimes, you get interesting fun new things to do. Sometimes you get to do the stuff nobody else wants to do. Recently, I suspect it was the latter. One wouldn&#8217;t think converting a wiki would be that tough. 
Re&#8230;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"https:\/\/blog.maclawran.ca\/?p=24\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":177,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-24","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/posts\/24","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24"}],"version-history":[{"count":1,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/posts\/24\/revisions"}],"predecessor-version":[{"id":178,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/posts\/24\/revisions\/178"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=\/wp\/v2\/media\/177"}],"wp:attachment":[{"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.maclawran.ca\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}