Tuesday, July 22, 2008

Threading Squid - initial observations

My next task after some IPv6 related reshuffling is to bring in the bare essentials needed to make Squid^WCacheboy SMP-happy.

There are a few potential ideas:


  • Leave Squid single-threaded. Stop it from doing its own disk/memory caching; push that out to a shared external process and abuse sysvshm IPC/anonymous mmap/etc to share large amounts of data efficiently;

  • Thread Squid entirely. Allow multiple concurrent copies of squid running in threads - whichever "model" of thread helpers you choose - and parallelise everything;

  • Provide basic thread services but leave Squid monolithic - push certain things into threads for now, figure out what benefits from being run in parallel;

  • A mix of all of the above.



Some of the problems that are faced!

cbdata



The cbdata type makes it a pain in the ass. Specifically, anything which wants to be shared between threads needs to be able to be 'locked' into memory until the thread hands it back either completed, or cancelled.

cbdata doesn't give you any guarantees that the pointer is pointing to something even remotely valid - even if you cbdataLock()'ed the item, the owner (or not! Thats how horrible the code can get) can cbdataFree() the underlying pointer and suddenly you're pointing at gunk. It might smell mostly right, it might even have somewhat valid data, but its still freed gunk, and thats not good enough.

Shared Statistics



Squid keeps a lot of statistics and histograms. Something needs to be done to allow these to be kept in multiple threads without lots of fine-grain locks and/or stalling.

I may just get rid of a lot of the complicated statistics and require them to be post-process derived externally.

Memory Pools



The memory pools framework will be a nightmare to thread efficiently. Well, memory allocators in general are. I -could- just fine-grain lock it, but it gets a -lot- of requests and so I'd have to first fix the pool abusers before I consider this. (I'm going to do it anyway, but not so I can then fine-grain thread mempools.) I could figure out the best way to thread it - or run multiple pools per pool, one per thread - but damnit, this is 2008, there are better malloc implementations out there by people who understand concurrency issues better than I. Its a waste of time to try and thread it until I understand the workload and implications better.

So I'll -probably- be turfing mempools as it stands and replacing it with just enough to keep statistics before going direct to malloc(). See the statistics section above. I won't do this until I've modified the heaviest mempool abusers to -not- put such large demands on the allocator system, so it'll be a win/win situation everywhere.

more to come..

No comments: