Monday, March 9, 2009

minimising traffic to the backends..

Squid/Cacheboy/Lusca has a nifty feature where it'll "piggyback" a client connection on an existing backend connection if the backend response is cachable AND said response is valid for the client (ie, its the right variant, doesn't require revalidation at that point, etc.)

I've been using this for the videolan and mozilla downloads. Basically, one client will suck down the whole object, and any other clients which want the same object (say, the 3.0.7 US english win32 update!) will share the same connection.

There's a few problems which have crept up.

Firstly - the "collapsed forwarding" support is not working in this instance. I think the logic is broken with large objects (it was only written for small objects, delaying forwarding the request until the forwarded response was known cachable) where it denies cachability of the response (well, it forces it to be RELEASEd after it finishes transferring) because of all of the concurrent range requests going on.

Secondly - Squid/Cacheboy/Lusca doesn't handle range request caching. It'll -serve- range responses for objects it has the data for, but it won't cache partial responses nor will it reassemble them into one chunk. I've been thinking about how to possibly fix that, but for now I'm hacking around the problems with some scripts.

Finally - the forwarding logic uses the speed of the -slowest- client to determine how quickly to download the file. This needs to be changed to use the speed of the -fastest- client to determine how quickly to download said file.

I need to get these fixed before the next mozilla release cycle if I'm to have a chance of increasing the traffic levels to a gigabit and beyond.

More to come..

2 comments:

ymg said...

Adrian, Is caching of range requests happening on a different branch or is it on LUSCA-HEAD ?

Adrian said...

There's no range request caching going on. I have a script which looks through the uncached partial responses and fetches full objects for those. Range replies can then be given to the full, cached objects.