Wednesday, June 11, 2008

Async IO related hackery

I've been staring at the Async IO code in preparation to migrate stuff out of the aufs directory and into a seperate library.

It requires the fd tracking code (understandable) and the comm code to monitor a notification pipe. This monitor pipe was used by the worker threads to wake up the main process if its waiting inside a select()/poll()/etc call, so it can immediately work on some disk IO.

Squid checks the aio completion queues each time through the comm loop. For aio, there isn't a per-storedir queue, there's just a global queue for all storedirs and other users, so aioCheckCallbacks() is called for each storedir.

There are two problems - firstly, select()/poll() take a while to run on a busy cache, so aioCheckCallbacks() isn't called that often. But the event notification based mechanisms end up running very often, returning a handful of filedescriptors each pass through the comm loop - and so the storedir checks are called. Secondly, its called once per storedir, so if you have 10 storedirs (like I have for testing!) aioCheckCallbacks() is called 10 times per IO loop.

This is a bit silly!

Instead, I've modified the async IO code to only call aioCheckCallbacks() when that pipe is written to. This ends up being the "normal" hack that UNIX thread programmers do to wake up a thread stuck waiting for both network and thread events. This cuts back substantially on the number of aioCheckCallbacks() calls without impacting performance (as far as I can see.)

Next! By default, the aufs store code only does async IO for open() and read() - write() and close() doesn't run asynchronously. Apparently this is due to testing under Linux - unless you're stressing the buffer cache too hard, write() to a disk FD didn't block, so there wasn't a reason to run write() and close() async. Apparently Solaris close() will block as metadata writes are done synchronously, and its possible FreeBSD + softupdates may do something similar. This is all "apparently", I haven't sat down and instrumented any of this!

FreeBSD and Solaris users have reported that diskd performs better than aufs - something I don't understand, as diskd only handles one outstanding disk IO at a time with similar issues with write() and close() to aufs (namely, if the calls block, the whole diskd process stops handling disk IO) but the difference here is the main process won't hang whilst these syscalls complete. Perhaps this is a reason for this behaviour. Its difficult for me to test; aufs has always performed fantastically for me.

There's so much to tidy up and reorganise, I still can't sit down and begin implementing any of the new features I want to!

No comments: