the laughing cloud

cloud storage as a Linux filesystem

[Updated 22 Feb 2012: updated and clarified the process]

I need space. Lots of space. Like an Amazon S3 amount of space, spread out over a pile of little Linux machines running Ubuntu.

My cloud is made up of machines from many cheap VPS providers, some costing as little as $7 a month. For that you get some CPU, a little disk and some bandwidth. The idea is to tolerate total node failures and to be able to move machines quickly and pretty painlessly.

Persistent storage that lives outside the nodes would let me drop and replace nodes really quickly; essentially, I’d only be using the nodes for CPU.

Amazon is expensive for disk. So expensive I can’t stand it. Amazon will charge me $150 a month to store 1 TB, or $1800 a year. I can buy a 1 TB disk for $99… for life. Then I get to pay for bandwidth too, something like another $0.15 per GB. Then, to add insult to injury, they charge for almost every transaction (GET, PUT, etc.); only DELETEs are free. It adds up fast.

Along comes Connectria… $15 a month gets me 100 GB of S3-compatible storage (https://www.mh.connectria.com/rp/order/cloud_storage_index), which includes the first 100 GB of transfer. The good news is that it’s cheaper than Amazon: transfer beyond the first 100 GB is $0.09/GB, and transactions are free. You can set up unlimited users, each user can have up to 100 buckets, and there are no speed limits. In my testing they were much, much faster than Amazon, so much so that the speed itself caused its own set of strange problems.
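To make the price difference concrete, here is a rough Python sketch that uses only the prices quoted above. It assumes 1 TB = 1000 GB (so Amazon storage works out to $0.15/GB per month), leaves out Amazon’s per-request charges because no rate is given here, and only models the $15 Connectria tier, which covers up to 100 GB of storage. The function and constant names are hypothetical.

```python
# Rough monthly cost comparison using only the prices quoted in this post.
# Assumptions: 1 TB = 1000 GB; Amazon request fees are NOT included;
# Connectria is modelled only as the $15 tier (max 100 GB stored).

AMAZON_STORAGE_PER_GB = 150 / 1000     # $150 per TB-month
AMAZON_TRANSFER_PER_GB = 0.15          # "something like $0.15 per GB"
CONNECTRIA_BASE = 15.00                # $15/month: 100 GB stored + 100 GB transfer
CONNECTRIA_INCLUDED_GB = 100
CONNECTRIA_EXTRA_TRANSFER_PER_GB = 0.09


def amazon_monthly(storage_gb: float, transfer_gb: float) -> float:
    """Storage plus transfer, rounded to cents; request charges ignored."""
    cost = storage_gb * AMAZON_STORAGE_PER_GB + transfer_gb * AMAZON_TRANSFER_PER_GB
    return round(cost, 2)


def connectria_monthly(storage_gb: float, transfer_gb: float) -> float:
    """$15 base plus $0.09/GB for transfer beyond the included 100 GB."""
    if storage_gb > CONNECTRIA_INCLUDED_GB:
        raise ValueError("the $15 plan quoted here only covers 100 GB of storage")
    extra_gb = max(0.0, transfer_gb - CONNECTRIA_INCLUDED_GB)
    return round(CONNECTRIA_BASE + extra_gb * CONNECTRIA_EXTRA_TRANSFER_PER_GB, 2)


if __name__ == "__main__":
    for stored, moved in [(100, 50), (100, 100), (100, 300)]:
        print(f"{stored} GB stored, {moved} GB moved: "
              f"Amazon ${amazon_monthly(stored, moved):.2f}, "
              f"Connectria ${connectria_monthly(stored, moved):.2f}")
```

With 100 GB stored and 300 GB moved in a month, this gives $60.00 for Amazon versus $33.00 for Connectria, before counting any Amazon request fees.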

If you’re going to mount a disk like this, there may or may not be a lot of data stored, but you are likely to have a lot of transactions, since the filesystem’s block reads and writes turn into requests to the storage service. Using Amazon would be death here.

So now I want to use this S3 storage on my linux box.

1. Sign up for Connectria – https://www.mh.connectria.com/rp/order/cloud_storage_customer

2. Get s3cmd: http://s3tools.org/s3cmd

3. Run s3cmd --configure and enter the access key and secret key from Connectria.

Then edit ~/.s3cfg and change the following two lines so they point at Connectria instead of Amazon:

host_base = rs2.connectria.com
host_bucket = %(bucket)s.rs2.connectria.com

4. Make a bucket:  s3cmd mb s3://testbucket

5. Run s3cmd ls; you should see testbucket.
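For step 3, the two-line edit to ~/.s3cfg can also be done programmatically. This is a hypothetical Python helper (not part of s3cmd) that rewrites only the host_base and host_bucket lines and leaves every other setting as s3cmd --configure wrote it.

```python
# Hypothetical helper: point an s3cmd config at Connectria instead of Amazon.
# Only the host_base and host_bucket lines are rewritten; all others are kept.

CONNECTRIA_HOSTS = {
    "host_base": "rs2.connectria.com",
    "host_bucket": "%(bucket)s.rs2.connectria.com",
}


def point_s3cfg_at_connectria(config_text: str) -> str:
    """Return config_text with the two host settings replaced."""
    lines = []
    for line in config_text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        # Exact key match, so commented-out lines such as "# host_base = ..."
        # are left alone.
        if sep and key in CONNECTRIA_HOSTS:
            line = f"{key} = {CONNECTRIA_HOSTS[key]}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

To use it, read ~/.s3cfg, pass the text through point_s3cfg_at_connectria, and write the result back, keeping a copy of the original first.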

Cool.  You’re in.  Now we get funky.  You want to mount this bucket as a filesystem under linux. 

For this we’re going to need a really nice bit of software called s3backer (http://code.google.com/p/s3backer/). It uses FUSE to present a bucket as a single large file that acts as a virtual block device, and we then put an ordinary filesystem on top of that file. There are other programs that do this, including s3ql, but in live testing they created a horrible load on the system. In addition, during my “hammer the disk and reboot the system” tests, the results were ugly, ugly, ugly.

Archie, the author, does a nice job of explaining how to install and test here: http://code.google.com/p/s3backer/wiki/RunningTheDemo

I’m going to assume you’ve gotten it running from that. 

The next step is actually choosing a filesystem type to create. Again, in this case, the newest isn’t necessarily the best. I tested a number of filesystems: ext2, ext3, ext4, XFS and ReiserFS. Because we were going to have a large number of small files (around 1 KB each), ReiserFS was the best architectural fit, and it passed all my tests as well.

Note: ReiserFS is technically great, but its author is in jail for killing his wife, which kinda put a kink in the development effort: http://en.wikipedia.org/wiki/ReiserFS
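One way to see why lots of ~1 KB files favour ReiserFS here (my reading, not something measured above): every 4 KB filesystem block ends up as an S3 object, and a filesystem that gives each small file a whole block of its own writes far more blocks than one that packs small files’ data together, which is what ReiserFS’s tail packing does. This Python sketch compares the two extremes. It is idealised, ignores all metadata, and the 100,000-file count is purely illustrative.

```python
import math

BLOCK_SIZE = 4096  # 4 KB blocks, matching the filesystem and s3backer block size


def blocks_one_per_file(n_files: int, file_size: int) -> int:
    """Every file gets whole blocks of its own (no tail packing)."""
    return n_files * math.ceil(file_size / BLOCK_SIZE)


def blocks_tail_packed(n_files: int, file_size: int) -> int:
    """Idealised tail packing: file data is packed back to back.

    Metadata (inodes, stat data, tree nodes) is ignored entirely.
    """
    return math.ceil(n_files * file_size / BLOCK_SIZE)


if __name__ == "__main__":
    n, size = 100_000, 1024  # illustrative: 100,000 files of 1 KB each
    print("one block per file:", blocks_one_per_file(n, size), "blocks")
    print("tail packed:       ", blocks_tail_packed(n, size), "blocks")
```

For that example the packed layout needs a quarter of the blocks (25,000 versus 100,000), and so roughly a quarter of the S3 objects and PUTs.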

Make the directories we’re going to need: somewhere to mount the final filesystem, a cache directory, and a mount point for the s3backer FUSE filesystem, whose backing file we’ll loop-mount later.

mkdir /mnt/s3
mkdir /bbs3
mkdir /bbs3/cache
mkdir /bbs3/testbucket
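s3backer reads its Connectria credentials from the file given with --accessFile (/root/.s3backer_passwd in this post). As I understand the s3backer documentation, that file holds accessID:accessKey pairs, one per line; check the man page for your version. A minimal, hypothetical Python helper to create it with owner-only permissions:

```python
import os
from pathlib import Path


def write_s3backer_passwd(path: Path, access_id: str, secret_key: str) -> None:
    """Write a single 'accessID:accessKey' line, readable only by the owner."""
    # Create the file with mode 0600 from the start so the secret is never
    # world-readable, even briefly (the umask can only remove bits).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{access_id}:{secret_key}\n")
    # If the file already existed with looser permissions, tighten them.
    os.chmod(path, 0o600)
```

Run as root with the two keys from your Connectria account, e.g. write_s3backer_passwd(Path("/root/.s3backer_passwd"), access_id, secret_key).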

Run s3backer, assuming your Connectria credentials live in /root/.s3backer_passwd:

        s3backer --accessFile=/root/.s3backer_passwd --filename=testbucket \
            --minWriteDelay=1800 --blockCacheFile=/bbs3/cache/testbucket \
            --blockCacheThreads=8 --baseURL=https://rs2.connectria.com/ \
            --blockSize=4k --size=1g --listBlocks testbucket /bbs3/testbucket

A couple of the parameters here are worth explaining. --size=1g makes a 1 GB filesystem, which you can extend later. --minWriteDelay=1800 makes s3backer wait 1800 milliseconds (1.8 seconds) before rewriting a block it has just written; this gives Connectria a bit of time to actually commit the block, and it was the only weirdness I encountered. A block size of 4k is the natural choice, since the filesystem’s block size is 4k and those are the blocks we’ll be writing.
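A quick sanity check of what those flags add up to. This Python sketch assumes s3backer stores one S3 object per block and that the 1g and 4k suffixes are binary units (GiB, KiB); --minWriteDelay is taken to be in milliseconds, as the 1.8 second figure above implies.

```python
# Back-of-the-envelope numbers for the s3backer command above.
# Assumptions: one S3 object per block, binary size suffixes,
# and --minWriteDelay expressed in milliseconds.

SIZE_BYTES = 1 * 1024**3      # --size=1g
BLOCK_SIZE = 4 * 1024         # --blockSize=4k
MIN_WRITE_DELAY_MS = 1800     # --minWriteDelay=1800

num_blocks = SIZE_BYTES // BLOCK_SIZE       # number of blocks, i.e. max S3 objects
delay_seconds = MIN_WRITE_DELAY_MS / 1000   # wait before rewriting a fresh block

print(f"{num_blocks} blocks of {BLOCK_SIZE} bytes, i.e. up to {num_blocks} S3 objects")
print(f"a block just written waits {delay_seconds} s before it can be rewritten")
```

So a single 1 GB filesystem can grow to about 262,000 objects in the bucket, which is why free transactions matter so much for this setup.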

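Next we create the ReiserFS filesystem and mount it. The only unusual option here is noatime, which stops the kernel from updating a file’s access time every time it is read, reducing the number of blocks written to the filesystem (and therefore to S3).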

On Ubuntu, the programs needed to use ReiserFS can be installed with:
apt-get install reiserfsprogs

mkreiserfs -ff -b 4096 -s 513 /bbs3/testbucket/testbucket
mount -o loop,noatime  /bbs3/testbucket/testbucket /mnt/s3
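The -s 513 flag is worth a second look. According to the mkreiserfs man page, -s sets the journal size in blocks, 513 is the minimum, and a journal on the same device defaults to 8193 blocks. This Python sketch (assuming the 1 GiB s3backer device from above) shows how much of the device each choice would take.

```python
# Rough space budget for the mkreiserfs command above.
# Assumes a 1 GiB s3backer device and the man-page figures for the
# journal size (minimum 513 blocks, default 8193 blocks on-device).

BLOCK = 4096                      # mkreiserfs -b 4096
DEVICE_BLOCKS = 1024**3 // BLOCK  # 1 GiB device -> 262144 blocks
JOURNAL_MIN = 513                 # -s 513
JOURNAL_DEFAULT = 8193            # default when -s is not given


def journal_share(journal_blocks: int) -> float:
    """Fraction of the device consumed by a journal of this many blocks."""
    return journal_blocks / DEVICE_BLOCKS


if __name__ == "__main__":
    for name, blocks in [("-s 513", JOURNAL_MIN), ("default", JOURNAL_DEFAULT)]:
        print(f"{name}: {blocks * BLOCK / 1024**2:.1f} MiB, "
              f"{journal_share(blocks):.2%} of the device")
```

On a 1 GB filesystem the minimum journal costs about 2 MiB (0.2%), against about 32 MiB (3.1%) for the default.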

If all goes well, you now have a 1 GB filesystem sitting out there in the cloud. At this point I have about 30 of these filesystems running simultaneously on 8 different machines, and they’re working great.
