Write-behind, Read-ahead and Gluster

So, over on the gluster-users list, I just enjoyed giving an explanation of write-behind vs read-ahead, as it applies to a filesystem serving up VM images.

Having write-behind enabled is like juggling your data with a partner. Having write-behind disabled is like you and your partner handing data to each other, rather than tossing it. Having read-ahead disabled is like asking your partner for a page of data, and having him give you that page of data. Having read-ahead enabled is like asking your partner for a page of data, and having him give you a fifty-page report, because he thinks you may need the extra information. Except you already made allowances yourself in asking for that full page of data; the only data you *knew* you needed was a single table in that page.
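To put the fifty-page report in numbers, here is a toy model in Python. It is only a sketch: the class names, the 512-byte "table", the 4 KiB "page" and the fifty-page window are all made up for illustration, and it does not reproduce how Gluster or any guest kernel actually implements read-ahead. It only counts how many bytes get pulled from the backing store when read-ahead layers are stacked.

```python
class Backend:
    """Stands in for the remote store; counts the bytes actually fetched."""

    def __init__(self, size):
        self.data = bytes(size)
        self.bytes_fetched = 0

    def read(self, offset, length):
        chunk = self.data[offset:offset + length]
        self.bytes_fetched += len(chunk)
        return chunk


class ReadAheadLayer:
    """Hypothetical read-ahead layer: on a miss, asks the layer below for
    `window` extra bytes and keeps them in a buffer for later requests."""

    def __init__(self, lower, window):
        self.lower = lower
        self.window = window
        self.buf_offset = 0
        self.buf = b""

    def read(self, offset, length):
        end = offset + length
        buf_end = self.buf_offset + len(self.buf)
        if not (self.buf_offset <= offset and end <= buf_end):
            # Miss: fetch the request plus the read-ahead window.
            self.buf = self.lower.read(offset, length + self.window)
            self.buf_offset = offset
        start = offset - self.buf_offset
        return self.buf[start:start + length]


def bytes_fetched_for_one_read(windows, request=512, size=1 << 20):
    """Stack read-ahead layers (listed bottom to top) over a fresh backend,
    issue a single read, and report what the backend had to deliver."""
    backend = Backend(size)
    layer = backend
    for window in windows:
        layer = ReadAheadLayer(layer, window)
    layer.read(0, request)
    return backend.bytes_fetched


PAGE = 4096
GUEST_WINDOW = PAGE - 512      # guest rounds the 512-byte "table" up to a page
STORAGE_WINDOW = 49 * PAGE     # storage layer turns the page into fifty pages

print(bytes_fetched_for_one_read([]))                              # 512
print(bytes_fetched_for_one_read([GUEST_WINDOW]))                  # 4096
print(bytes_fetched_for_one_read([STORAGE_WINDOW, GUEST_WINDOW]))  # 204800
```

The guest's own allowance already turns the table into a page; the second layer multiplies that again without knowing whether any of it will be used.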

As another example of why you wouldn’t normally need read-ahead enabled in Gluster, I could easily write a small book’s worth of theory into an email detailing the concept further, but I’ve already given sufficient information to illustrate the relevant concepts; anything further would be unnecessary detail I’m only guessing you might need. 😉

The read-ahead setting is about performance, not about data integrity. Virtual machines will be running an operating system. That operating system will be running block-device drivers and filesystem drivers. Both of those types of drivers have their own tunable concepts of read-ahead, so any further read-ahead at the Gluster layer is unnecessary.
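If you want to see what the guest is already doing, a Linux guest exposes its block-layer read-ahead per device in sysfs, as `queue/read_ahead_kb`. The sketch below just reads those values; the function name and the `sysfs_root` parameter are my own, and it assumes a Linux guest with sysfs mounted in the usual place.

```python
from pathlib import Path


def guest_read_ahead_kb(sysfs_root="/sys"):
    """Return {device_name: read_ahead_kb} for each block device in sysfs.

    `sysfs_root` is a hypothetical parameter, included so the function can
    be pointed at a test directory instead of the real /sys.
    """
    result = {}
    block_dir = Path(sysfs_root) / "block"
    if not block_dir.is_dir():
        return result
    for dev in sorted(block_dir.iterdir()):
        setting = dev / "queue" / "read_ahead_kb"
        try:
            result[dev.name] = int(setting.read_text().strip())
        except (OSError, ValueError):
            # Device without a readable, numeric setting: skip it.
            continue
    return result
```

Calling `guest_read_ahead_kb()` inside the VM returns something like a device-to-KiB mapping for its virtual disks. On the Gluster side, the matching knob is, as far as I know, the `performance.read-ahead` volume option, which can be switched off with `gluster volume set`; check the documentation for your release before relying on that.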

(Obviously, the remote filesystem in question was Gluster, but the same would apply with any filesystem. Discussion of risk management, flash-backed write caches and various bits of infrastructure redundancy can wait for other posts.)