GlusterFS Workstation Cluster
At our office, a number of workstations are used for development work. I have a small concern about the safety of the files on these machines because they are all stored on local hard disks. The traditional way to back them up would be to set up a file server and copy the data onto it.
That model of data backup is still a valid one. However, since there are already a number of workstations in the office, it did not make much sense to set up another file server for this purpose, especially as each workstation has at least a 500GB hard disk in it. Furthermore, this would just move the point of failure to the file server instead.
There had to be a better way to do this, and there is: distribute and replicate the files across the workstations so that multiple copies of the data exist and no single point of failure can kill someone’s work. After some quick research, it seems that GlusterFS is a preferred choice in production for turning a bunch of networked computers into a scalable storage network.
It was very easy to set up: it took me less than an hour to get a basic replication cluster working. Now every workstation in the office is part of a replicating storage network, running as both a Gluster server and a client.
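For readers who want a feel for what that setup involves, here is a rough sketch in Python that only builds the usual gluster command lines for a replicated volume and prints them, without running anything. The host names (ws1, ws2, ws3), the volume name (homes) and the brick path are made up for illustration and are not our actual configuration. It also assumes one brick per workstation, with the replica count equal to the number of workstations, so that every machine holds a full copy.

```python
from typing import List


def replica_volume_commands(hosts: List[str], volume: str, brick_path: str) -> List[List[str]]:
    """Build the gluster CLI calls for a fully replicated volume (nothing is executed)."""
    if len(hosts) < 2:
        raise ValueError("replication needs at least two hosts")
    commands = []
    # Meant to be run on the first host: add every other workstation to the pool.
    for host in hosts[1:]:
        commands.append(["gluster", "peer", "probe", host])
    # One brick per workstation; replica count == number of bricks,
    # so each workstation keeps a complete copy of the data.
    bricks = [f"{host}:{brick_path}" for host in hosts]
    commands.append(
        ["gluster", "volume", "create", volume, "replica", str(len(hosts)), *bricks]
    )
    commands.append(["gluster", "volume", "start", volume])
    return commands


def mount_command(host: str, volume: str, mount_point: str) -> List[str]:
    """Build the native-client mount call for one workstation."""
    return ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount_point]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in replica_volume_commands(["ws1", "ws2", "ws3"], "homes", "/data/brick1/homes"):
        print(" ".join(cmd))
    print(" ".join(mount_command("localhost", "homes", "/home")))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a real cluster they would need root privileges and a running Gluster daemon on every machine.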
In addition to the file redundancy that this provides, it also makes hot-desking easy. A developer can use any machine on the network because his/her home directory is already replicated on all the workstations. Again, the typical way of doing this is via an NFS-mounted home directory, but the NFS server would just introduce a single point of failure for the office.
GlusterFS and others of its ilk are very useful in this respect. Sure, there are caveats.
For example, at least two workstations need to be running for replication to work correctly. This is easily solved by letting everyone know and keeping at least two machines running at all times. Alternatively, we could keep a separate file server running 24/7 in the server room as a replication node, or leave all the workstations running all the time. Our guide to setting up GlusterFS is on our wiki.
I do not have any numbers, but it seems to run fine on a 100 Mbps network. However, as the workloads increase, I foresee upgrading the network in the near future.
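In place of measurements, a back-of-the-envelope ceiling gives a sense of why the network is the likely bottleneck. The sketch below is pure arithmetic, not a benchmark. It assumes that the writing workstation itself sends every remote copy over its own link, and it ignores protocol overhead and disk speed, so real figures will be lower. The function names and the example replica counts are mine, chosen for illustration.

```python
def link_mb_per_s(link_mbps: float) -> float:
    """Convert a link speed from megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbps / 8.0


def write_ceiling_mb_per_s(link_mbps: float, remote_copies: int) -> float:
    """Upper bound on write speed if one client pushes every remote copy itself."""
    if remote_copies < 1:
        raise ValueError("need at least one remote copy")
    # Assumption: all remote copies share the writer's single network link.
    return link_mb_per_s(link_mbps) / remote_copies


if __name__ == "__main__":
    # Hypothetical example: 4 remote copies of every write.
    for speed in (100, 1000):
        ceiling = write_ceiling_mb_per_s(speed, remote_copies=4)
        print(f"{speed} Mbps link, 4 remote copies: at most {ceiling} MB/s")
```

Under these assumptions a 100 Mbps link tops out at 12.5 MB/s in total, and that budget is divided among the remote copies, which is why a gigabit upgrade looks attractive as the number of workstations grows.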
PS: I am amazed at how well this actually works!