Tuesday, November 10, 2009
More issues with Lighttpd
More investigation is required. Well, more statistics are needed so I can make better judgements. Actually, what I really need is more functional backends, so I can take one out of production when something like this occurs, properly debug what is going on and try to fix it.
Cacheboy Update / October/November 2009
Just a few updates this time around!
- Cacheboy was pushing around 800-1200mbit during the Firefox 3.5.4 release cycle. I started to hit issues with the backend server not keeping up with revalidation requests, so I'll have to improve the edge caching logic a little more.
- Lusca seems quite happy serving up 300-400mbit from a single node though, which is a big plus.
- I've found some quite horrible memory leaks in Quagga on only one of the edge nodes. I'll have to find some time to login and debug this a little more.
- The second backend server is now officially toast. I need another 1RU server with 2 SATA slots to magically appear in downtown Manhattan, NY.
Thursday, October 8, 2009
Cacheboy downtime - hardware failures
Wednesday, September 30, 2009
Lusca updates - September 2009
- All of the Cacheboy CDN nodes are running Lusca-HEAD now and are nice and stable.
- I've deployed Lusca at a few customer sites and again, it is nice and stable.
- The rebuild logic changes are, for the most part, nice and stable. There seems to be some weirdness with 32 vs 64 bit compilation options which I need to suss out, but everything "just works" if you compile Lusca with large file/large cache file support, regardless of the platform you're using. I may make that the default option.
- I've got a couple of small coding projects to introduce a couple of small new features to Lusca - more on those when they're done!
- Finally, I'm going to be migrating some more of the internal code over to use the sqinet_t type in preparation for IPv4/IPv6 agnostic support.
Monday, September 21, 2009
My current wishlist
- More US nodes. I'll take anything from 50mbit to 5gbit at this point. I need more US nodes to be able to handle enough aggregate traffic to make optimising the CDN content selection methods worthwhile.
- Some donations to cover my upcoming APNIC membership for ASN and IPv4/IPv6 space. This will run to about AUD $3500 this year and then around AUD $2500 a year after that.
- Some 1ru/2ru server hardware in the San Francisco area
- Another site or two willing to run a relatively low bandwidth "master" mirror site. I have one site in New York but I'd prefer to run a couple of others spread around Europe and the United States.
New project - sugar labs!
Monday, August 31, 2009
Cacheboy presentation at AUSNOG
Monday, August 17, 2009
Cacheboy status update
Unfortunately what I'm now lacking is US hosts to send traffic from. I still have more European and Asian connectivity than North American - and North America is absolutely where I need connectivity the most. Right now I'm only able to push 350-450 megabits of content from North America - and this puts a big, big limit on how much content I can serve overall.
Please contact me as soon as possible if you're interested in hosting a node in North America. I ideally need enough nodes to push between a gigabit and ten gigabits of traffic.
I will be able to start pushing noticeable amounts of content out of regional areas once I've sorted out North America. This includes places like Australia, Africa, South America and Eastern Europe. I'd love to be pushing more open source bits out of those locations to keep the transit use low, but I just can't do so at the moment.
Canada node online and pushing bits!
Thanks John!
Cacheboy is on WAIX!
ASN | MBytes | Requests | % of overall | AS Name |
---|---|---|---|---|
AS7545 | 17946.77 | 7437 | 29.85 | TPG-INTERNET-AP TPG Internet Pty Ltd |
AS4802 | 12973.47 | 4476 | 21.58 | ASN-IINET iiNet Limited |
AS4739 | 8497.92 | 2947 | 14.13 | CIX-ADELAIDE-AS Internode Systems Pty Ltd |
AS9543 | 2524.57 | 1241 | 4.20 | WESTNET-AS-AP Westnet Internet Services |
AS4854 | 2097.32 | 941 | 3.49 | NETSPACE-AS-AP Netspace Online Systems |
AS17746 | 1881.17 | 1050 | 3.13 | ORCONINTERNET-NZ-AP Orcon Internet |
AS9822 | 1425.44 | 456 | 2.37 | AMNET-AU-AP Amnet IT Services Pty Ltd |
AS17435 | 1161.01 | 411 | 1.93 | WXC-AS-NZ WorldxChange Communications LTD |
AS9443 | 1140.62 | 701 | 1.90 | INTERNETPRIMUS-AS-AP Primus Telecommunications |
AS7657 | 891.93 | 1187 | 1.48 | VODAFONE-NZ-NGN-AS Vodafone NZ Ltd. |
AS7718 | 740.74 | 272 | 1.23 | TRANSACT-SDN-AS TransACT IP Service Provider |
AS7543 | 732.11 | 423 | 1.22 | PI-AU Pacific Internet (Australia) Pty Ltd |
AS24313 | 527.38 | 252 | 0.88 | NSW-DET-AS NSW Department of Education and Training |
AS9790 | 436.80 | 389 | 0.73 | CALLPLUS-NZ-AP CallPlus Services Limited |
AS17412 | 365.13 | 228 | 0.61 | WOOSHWIRELESSNZ Woosh Wireless |
AS17486 | 349.27 | 116 | 0.58 | SWIFTEL1-AP People Telecom Pty. Ltd. |
AS17808 | 311.65 | 248 | 0.52 | VODAFONE-NZ-AP AS number for Vodafone NZ IP Networks |
AS24093 | 303.40 | 114 | 0.50 | BIGAIR-AP BIGAIR. Multihoming ASN |
AS9889 | 288.85 | 197 | 0.48 | MAXNET-NZ-AP Auckland |
AS17705 | 282.49 | 84 | 0.47 | INSPIRENET-AS-AP InSPire Net Ltd |
Query content served: 54878.07 mbytes; 23170 requests.
Total content served: 60123.25 mbytes; 28037 requests.
BGP aware DNS
Sunday, August 9, 2009
Updates - or why I've not been doing very much
Wednesday, July 8, 2009
VLC 1.0 released
Graphs to follow!
Monday, June 29, 2009
Current Downtime/issues
Saturday, June 27, 2009
New mirror node - italy
Wednesday, June 17, 2009
And the GeoIP summary..
And the geoip summary:
From Sun Jun 7 00:00:00 2009 to Sun Jun 14 00:00:00 2009
Country | MBytes | Requests |
---|---|---|
us | 5163783.09 | 6533162 | |
de | 1514664.22 | 2307222 | |
ca | 1152095.00 | 917777 | |
fr | 948433.27 | 1451105 | |
uk | 945640.71 | 1136455 | |
it | 818161.03 | 770164 | |
br | 542497.79 | 1426306 | |
se | 482932.15 | 229559 | |
es | 445444.34 | 647321 | |
pl | 397755.30 | 1021083 | |
nl | 373185.13 | 306023 | |
ru | 368124.64 | 749924 | |
tr | 293627.27 | 484965 | |
mx | 276775.12 | 463252 | |
be | 249088.62 | 213460 | |
ch | 201782.33 | 209530 | |
ro | 190059.45 | 274216 | |
fi | 172399.75 | 204630 | |
ar | 170421.77 | 374071 | |
no | 169351.46 | 155258 |
Tuesday, June 16, 2009
A quick snapshot of Cacheboy destinations..
The following is a snapshot of the per destination AS traffic information I'm keeping.
If you're peering with any of these ASes and are willing to sponsor a cacheboy node or two then please let me know. How well I can scale things at this point is rapidly becoming limited by where I can push traffic from, rather than anything intrinsic to the software.
From Sun Jun 7 00:00:00 2009 to Sun Jun 14 00:00:00 2009
ASN | MBytes | Requests | % of overall | AS Name |
---|---|---|---|---|
AS3320 | 602465.01 | 1021975 | 3.26 | DTAG Deutsche Telekom AG | ||
AS7132 | 583164.05 | 778259 | 3.16 | SBIS-AS - AT&T Internet Services | ||
AS19262 | 459322.30 | 603127 | 2.49 | VZGNI-TRANSIT - Verizon Internet Services Inc. | ||
AS3215 | 330962.95 | 553299 | 1.79 | AS3215 France Telecom - Orange | ||
AS3269 | 317534.06 | 333114 | 1.72 | ASN-IBSNAZ TELECOM ITALIA | ||
AS9121 | 259768.32 | 434932 | 1.41 | TTNET TTnet Autonomous System | ||
AS22773 | 244573.65 | 283427 | 1.32 | ASN-CXA-ALL-CCI-22773-RDC - Cox Communications Inc. | ||
AS12322 | 224708.25 | 343686 | 1.22 | PROXAD AS for Proxad/Free ISP | ||
AS3352 | 206093.84 | 305183 | 1.12 | TELEFONICADATA-ESPANA Internet Access Network of TDE | ||
AS812 | 204120.74 | 166633 | 1.10 | ROGERS-CABLE - Rogers Cable Communications Inc. | ||
AS8151 | 198918.22 | 328632 | 1.08 | Uninet S.A. de C.V. | ||
AS6327 | 197906.53 | 152861 | 1.07 | SHAW - Shaw Communications Inc. | ||
AS3209 | 191429.18 | 303787 | 1.04 | ARCOR-AS Arcor IP-Network | ||
AS20115 | 182407.09 | 225151 | 0.99 | CHARTER-NET-HKY-NC - Charter Communications | ||
AS2119 | 181719.20 | 117656 | 0.98 | TELENOR-NEXTEL T.net | ||
AS577 | 181167.02 | 152383 | 0.98 | BACOM - Bell Canada | ||
AS12874 | 172973.42 | 108429 | 0.94 | FASTWEB Fastweb Autonomous System | ||
AS6389 | 165445.73 | 236133 | 0.90 | BELLSOUTH-NET-BLK - BellSouth.net Inc. | ||
AS6128 | 165183.07 | 210300 | 0.89 | CABLE-NET-1 - Cablevision Systems Corp. | ||
AS2856 | 164332.96 | 219267 | 0.89 | BT-UK-AS BTnet UK Regional network |
Query content served: 5234195.61 mbytes; 6878234 requests (ie, what was displayed in the table.)
Total content served: 18473721.25 mbytes; 26272660 requests (ie, the total amount of content served over the time period.)
Saturday, June 13, 2009
Seeking a few more US / Canada hosts
I'm now actively looking for some more Cacheboy CDN nodes in the United States and Canada. I've got around 3gbit of available bandwidth in Europe, 1gbit of available bandwidth in Japan but only 300mbit of available bandwidth in North America.
I'd really, really appreciate a couple of well-connected North American nodes so I can properly test the platform and software that I'm building. The majority of traffic is still North American in destination; I'm having to serve a fraction of it from Sweden and the United Kingdom at the moment. Erk.
Please drop me a line if you're interested. The node requirements are at http://www.cacheboy.net/node_requirements.html . Thank you!
Friday, June 12, 2009
Another day, another firefox release done..
The changes I've made to the Lusca load shedding code (ie, being able to disable it :) work well for this workload. Migrating the backend to lighttpd (and fixing up the ETag generation to be properly consistent between 32 bit and 64 bit platforms) fixed the initial issues I was seeing.
The network pushed out around 850mbit at peak. Not a lot (heck, I can do that on one CPU of a mid-range server without a problem!) but it was a good enough test to show that things are working.
I need to teach Lusca a couple of new tricks, namely:
- It needs to be taught to download at the fastest client speed, not the slowest; and
- Some better range request caching needs to be added.
The former isn't too difficult - that's a weekend five-line patch. The latter is more difficult. I don't really want to shoehorn range request caching into the current storage layer. It would look a lot like how Vary and ETag are currently handled (ie, with "magical" store entries acting as indexes to the real backend objects.) I'd rather put in a dirtier hack that is easy to undo now and use the opportunity to properly tidy up the whole storage layer later. But the "tidying up" rant is not for this blog entry; it's for the Lusca development blog.
The hack will most likely be a little logic to start downloading the full object when the first range request for an uncached object comes in, so that subsequent range requests for that object are "glued" to the current request. It means that subsequent requests will "stall" until enough of the object has been transferred to start satisfying their range. The alternative is to pass each range request through to a backend until the full object has been transferred. That would improve initial performance, but at some point the backend would be overloaded with too many range requests for highly popular objects, and that starts affecting how quickly full objects are transferred.
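To make that a bit more concrete, here's a rough sketch of the "glue" idea. The structure and function names below are made up purely for illustration - this isn't Lusca's actual store/forwarding code:

#include <stdio.h>
#include <stddef.h>

#define MAX_WAITERS 16

struct range_waiter {
    size_t start, end;          /* requested byte range */
    int satisfied;              /* has the start of this range arrived yet? */
};

struct inflight_fetch {
    size_t bytes_received;      /* prefix of the full object fetched so far */
    struct range_waiter waiters[MAX_WAITERS];
    int nwaiters;
};

/* A new range request for an uncached object attaches to the single
 * in-flight full-object fetch rather than triggering another backend request. */
void
attach_range_request(struct inflight_fetch *f, size_t start, size_t end)
{
    if (f->nwaiters < MAX_WAITERS)
        f->waiters[f->nwaiters++] = (struct range_waiter){ start, end, 0 };
}

/* Called as data arrives from the backend; waiters whose range start is now
 * covered can begin being served, later ranges keep stalling. */
void
fetch_progress(struct inflight_fetch *f, size_t new_total)
{
    f->bytes_received = new_total;
    for (int i = 0; i < f->nwaiters; i++) {
        if (!f->waiters[i].satisfied && f->waiters[i].start < f->bytes_received) {
            f->waiters[i].satisfied = 1;
            printf("range %zu-%zu can start being served\n",
                   f->waiters[i].start, f->waiters[i].end);
        }
    }
}

int
main(void)
{
    struct inflight_fetch f = { 0 };
    attach_range_request(&f, 0, 65535);             /* first range request starts the full fetch */
    attach_range_request(&f, 1048576, 1114111);     /* a later range, deeper into the object */
    fetch_progress(&f, 131072);     /* first range can be served; the second still stalls */
    fetch_progress(&f, 2097152);    /* now the second range can be served too */
    return 0;
}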
As a side note, I should probably do up some math on a whiteboard here and see if I can model some of the potential behaviour(s). It would certainly be a good excuse to brush up on higher math clue. Hm..!
Thursday, June 11, 2009
Migrating to Lighttpd on the backend, and why aren't my files being cached..
In theory, once all of the caching stuff is fixed, the backends will spend most of their time revalidating objects.
But for some weird reason I'm seeing TCP_REFRESH_MISS on my Lusca edge nodes and generally poor performance during this release. I look at the logs and find this:
[Host: mozilla.cdn.cacheboy.net\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
If-Modified-Since: Wed, 03 Jun 2009 15:09:39 GMT\r\n
If-None-Match: "1721454571"\r\n
Cache-Control: max-stale=0\r\n
Connection: Keep-Alive\r\n
Pragma: no-cache\r\n
X-BlueCoat-Via: 24C3C50D45B23509\r\n]
[HTTP/1.0 200 OK\r\n
Content-Type: application/octet-stream\r\n
Accept-Ranges: bytes\r\n
ETag: "1687308715"\r\n
Last-Modified: Wed, 03 Jun 2009 15:09:39 GMT\r\n
Content-Length: 2178196\r\n
Date: Fri, 12 Jun 2009 04:25:40 GMT\r\n
Server: lighttpd/1.4.19\r\n
X-Cache: MISS from mirror1.jp.cacheboy.net\r\n
Via: 1.0 mirror1.jp.cacheboy.net:80 (Lusca/LUSCA_HEAD)\r\n
Connection: keep-alive\r\n\r]
Notice the different ETags? Hm! I wonder what's going on. On a hunch I checked the ETags from both backends. master1 for that object gives "1721454571"; master2 gives "1687308715". They both have the same size and the same timestamp. I wonder what is different?
Time to go digging into the depths of the lighttpd code.
EDIT: the ETag generation is configurable. By default it uses the mtime, inode and file size. Disabling inode, and then inode and mtime, didn't help. I then found that earlier lighttpd versions have different ETag generation behaviour on 32 bit and 64 bit platforms. I'll build a local lighttpd package and see if I can replicate the behaviour on my 32/64 bit systems. Grr.
Meanwhile, Cacheboy isn't really serving any of the mozilla updates. :(
EDIT: so it turns out the bug is in the ETag generation code. They create an unsigned 32-bit integer hash value from the ETag contents, then shovel it into a signed long for the ETag header. Unfortunately for FreeBSD-i386, "long" is a signed 32 bit type, and thus things go awry from time to time. Grrrrrr.
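Here's a tiny standalone illustration of the kind of truncation involved (not lighttpd's actual code): the same unsigned 32 bit hash value formats differently once it has been pushed through a signed "long" on a 32 bit platform.

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint32_t hash = 3456789012u;        /* an example hash value above 2^31 */
    long etag = (long)hash;             /* "long" is 32 bits on FreeBSD-i386, 64 bits on amd64 */
    printf("ETag: \"%ld\"\n", etag);    /* i386: ETag: "-838178284"  amd64: ETag: "3456789012" */
    return 0;
}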
EDIT: fixed in a newly-built local lighttpd package; both backend servers are now doing the right thing. I'm going back to serving content.
Tuesday, June 2, 2009
New mirrors - mirror2.uk and mirror3.uk
mirror2.uk is thanks to UK Broadband, who have graciously given me access to a few hundred megabits of traffic and space on an ESX server.
mirror3.uk (due to be turned up today!) is thanks to a private donor named Alex who has given me a server in his colocation space and up to a gigabit of traffic.
Shiny! Thanks to you both.
Wednesday, April 22, 2009
mirror1.jp.cacheboy.net - mozilla!
Thanks guys!
Mozilla 3.0.9 release..
880mbit/sec and counting..
Monday, April 6, 2009
Lusca and Cacheboy improvements in the pipeline..
Tuesday, March 31, 2009
Lusca snapshot released
Monday, March 30, 2009
Mirroring a new project - Cyberduck!
Mozilla 3.0.8 release!
Saturday, March 28, 2009
shortcomings in the async io code
Googletalk: "Getting C++ threads to work right"
- Writing "correct" thread code using the pthreads and CPU instructions (fencing, for example) requires the code to know whats going on under the hood;
- Gluing concurrency to the "side" of a language which was specified without concurrency has shown to be a bit of a problem - eg, concurrent access to different variables in a structure and how various compilers have implemented this (eg, changing a byte in a struct becoming a 32 bit load, 8 bit modify, 32 bit store);
- Most programmers should really use higher level constructs, like what C++0x and what the Java specification groups have been doing.
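Here's roughly what that struct example looks like in code - this is purely an illustration of the talk's point, nothing to do with Lusca:

#include <pthread.h>

struct flags {
    char a;     /* written by thread 1 */
    char b;     /* written by thread 2 */
    char c;
    char d;
};

static struct flags shared;

/* Pre-C11/C++11, a compiler was allowed to implement each byte store below as
 * "load 32 bits, modify 8 bits, store 32 bits", which silently rewrites the
 * neighbouring field and can lose the other thread's concurrent update. */
static void *thread1(void *arg) { (void)arg; shared.a = 1; return NULL; }
static void *thread2(void *arg) { (void)arg; shared.b = 1; return NULL; }

int
main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* The C++0x / C11 memory models forbid the compiler from inventing such
     * wider stores, which is exactly the sort of thing the talk covers. */
    return 0;
}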
Friday, March 27, 2009
Another open cdn project - mirrorbrain
Monday, March 23, 2009
Example CDN stats!
Sunday, March 22, 2009
wiki.cacheboy.net
Thursday, March 19, 2009
More "Content Delivery" done open
Filesystem Specifications, or EXT4 "Losing Data"
Monday, March 16, 2009
Breaking 200mbit..
GeoIP backend, or "reinventing the wheel"
- Take a GeoIP map to break up IPs into "country" regions (thanks nerd.dk!) ;
- Take the list of "up" CDN nodes;
- For each country in my redirection table, find the CDN node that is up with the highest weight;
- Generate a "geo-map" file consisting of the highest-weight "up" CDN node for each country, as found in the previous step (a sketch of this is below);
- Feed that to the PowerDNS geoip module (thanks Mark @ Wikipedia!)
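A sketch of what the middle steps boil down to - the node names, weights and output format here are made up for illustration and don't reflect the real scripts:

#include <stdio.h>

struct node { const char *name; int weight; int up; };
struct country_map { const char *country; struct node nodes[4]; int nnodes; };

int
main(void)
{
    struct country_map map[] = {
        { "au", { { "mirror1.au", 100, 1 }, { "mirror1.jp", 50, 1 } }, 2 },
        { "us", { { "mirror1.us", 100, 0 }, { "mirror1.ca", 80, 1 } }, 2 },
    };
    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        const struct node *best = NULL;
        for (int j = 0; j < map[i].nnodes; j++) {
            if (!map[i].nodes[j].up)
                continue;                   /* skip nodes that failed the "up" check */
            if (best == NULL || map[i].nodes[j].weight > best->weight)
                best = &map[i].nodes[j];
        }
        if (best != NULL)
            printf("%s %s\n", map[i].country, best->name);  /* one geo-map line per country */
    }
    return 0;
}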
Friday, March 13, 2009
Downtime!
- The first mirror threw a disk;
- For some reason, gmirror became unhappy, rather than running on the second mirror (I'm guessing the controller went unhappy; there wasn't anything logged to indicate the other disk in the mirror set was failing);
- The second mirror started taking load;
- For some weird reason, the second mirror hung hard without any logging to explain why.
Monday, March 9, 2009
minimising traffic to the backends..
I've been using this for the videolan and mozilla downloads. Basically, one client will suck down the whole object, and any other clients which want the same object (say, the 3.0.7 US english win32 update!) will share the same connection.
There's a few problems which have crept up.
Firstly - the "collapsed forwarding" support is not working in this instance. I think the logic is broken for large objects: it was only written for small objects, delaying forwarding of the request until the forwarded response was known to be cacheable. With all of the concurrent range requests going on, it ends up denying cacheability of the response (well, it forces the object to be RELEASEd after it finishes transferring).
Secondly - Squid/Cacheboy/Lusca doesn't handle range request caching. It'll -serve- range responses for objects it has the data for, but it won't cache partial responses nor will it reassemble them into one chunk. I've been thinking about how to possibly fix that, but for now I'm hacking around the problems with some scripts.
Finally - the forwarding logic uses the speed of the -slowest- client to determine how quickly to download the file. This needs to be changed to use the speed of the -fastest- client to determine how quickly to download said file.
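To illustrate the difference (this is conceptual only, not Lusca's actual forwarding code): the question is how far the server-side fetch is allowed to run ahead of the slowest versus the fastest attached client.

#include <stddef.h>

#define READ_AHEAD_GAP (256 * 1024)     /* how far past a client the fetch may run */

/* Current behaviour: the fetch only stays READ_AHEAD_GAP ahead of the slowest
 * client, so one dial-up client throttles the download for everybody. */
size_t
fetch_limit_slowest(const size_t *client_offset, int nclients)
{
    size_t min = client_offset[0];
    for (int i = 1; i < nclients; i++)
        if (client_offset[i] < min)
            min = client_offset[i];
    return min + READ_AHEAD_GAP;
}

/* Desired behaviour: key the limit off the fastest client; slower clients are
 * then served out of the already-fetched part of the object. */
size_t
fetch_limit_fastest(const size_t *client_offset, int nclients)
{
    size_t max = client_offset[0];
    for (int i = 1; i < nclients; i++)
        if (client_offset[i] > max)
            max = client_offset[i];
    return max + READ_AHEAD_GAP;
}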
I need to get these fixed before the next mozilla release cycle if I'm to have a chance of increasing the traffic levels to a gigabit and beyond.
More to come..
Cacheboy Outage
This will be (somewhat) mitigated when I bring up another set of mirror master servers elsewhere.
Sunday, March 8, 2009
Cacheboy is pushing bits..
The server assignment is currently being done through GeoIP mapping via DNS. I've brought up BGP sessions to each of the sites to eventually use in the request forwarding process.
All in all, things are going reasonably successfully so far. There's been a few hiccups which I'll blog about over the next few days but the bits are flowing, and no one is complaining. :)
Friday, February 27, 2009
Cacheboy CDN is online!
* The "Cacheboy proxy" development has become Lusca; thats spun off into a little separate project of its own.
* The "Cacheboy" project is now focusing on providing an open source platform for content delivery. I've organised some donated hardware (some donated by me), some donated bandwidth (again, some donated by me) and a couple of test projects to serve content for.
More details to come!
(As a side note, I've got too many blogs; I think it's time to rationalise them down to one or two and use labels to correctly identify which is which.)
Monday, February 23, 2009
Lusca and BGP, take 2.
1235459412.856 17063 118.92.109.x TCP_REFRESH_HIT/206 33405 GET http://videolan.cdn.cacheboy.net/vlc/0.9.8a/win32/vlc-0.9.8a-win32.exe - NONE/- application/x-msdownload AS7657
1235459417.194 1113 202.150.98.x TCP_HIT/200 45637 GET http://videolan.cdn.cacheboy.net/vlc/0.9.8a/win32/vlc-0.9.8a-win32.exe - NONE/- application/x-msdownload AS17746
Notice how the Squid logs have AS numbers in them? :)
Lusca and BGP
It *cough* mostly works. I need to figure out why there's occasional radix tree corruption (which probably means running it under valgrind to find when the radix code goes off the map..) and un-dirty some of the BGP code (ie, implement a real FSM; proper separation of the protocol handling, FSM, network and RIB code) and add in the AS path/community/attribute stuff before I commit it to LUSCA_HEAD.
It is kind of cool though having a live BGP feed in your application. :) All 280,000 odd routes of it. :)
Sunday, February 1, 2009
Lusca development, and changes to string handling
In terms of development, I've shifted the code to http://code.google.com/p/lusca-cache/ and I'm continuing my work in /branches/LUSCA_HEAD.
I've been working on src/http.c (the server-side HTTP code) in preparation for introducing reference counted buffer/string handling. I removed one copy (of the socket read buffer into another memory buffer, to assemble a buffer containing the HTTP reply, in preparation for parsing) and have just migrated that bit of the codebase over to use my reference counted buffer (buf_t; found in libmem/buf.[ch].) It's entirely possible that I've horribly broken the server-side code so I'm reluctant to do much else until I've finished restructuring and testing the server-side HTTP code.
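For the curious, the general shape of a refcounted buffer looks something like the following. The names and fields here are illustrative only and won't exactly match what's in libmem/buf.[ch]:

#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t len;
    int nref;       /* how many holders currently reference this buffer */
} buf_t;

buf_t *
buf_create(const char *src, size_t len)
{
    buf_t *b = calloc(1, sizeof(*b));
    b->data = malloc(len);
    memcpy(b->data, src, len);
    b->len = len;
    b->nref = 1;
    return b;
}

buf_t *
buf_ref(buf_t *b)
{
    b->nref++;      /* eg the reply parser keeping a handle on the socket read buffer */
    return b;
}

void
buf_deref(buf_t *b)
{
    if (--b->nref == 0) {   /* last holder gone; now it's safe to free */
        free(b->data);
        free(b);
    }
}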
I've also been tidying up a few more places where the current String API is used "incorrectly", at least incorrectly for reference counted strings/buffers. I have ~61 code chunks to rewrite, mostly in the logging code. I've done it twice already in other branches, so this won't be terribly difficult. It's just boring. :)
Oh, and I've also just removed the "caching" bits of the MemPools code. MemPools in LUSCA_HEAD is now just a small wrapper around malloc/calloc/free, mainly to preserve the "block allocator" style API and keep some statistics. At the end of the day, Squid uses memory very very poorly and the caching code in MemPools is purely to avoid said poor memory use. I'm going to just fix the memory use (mostly revolving around String buffers, HTTP headers and the TLV code, amazing that!) so the number of calls through the allocator is much, much reduced. I'm guessing once I've finished, the number of calls through the system allocator will be about 2 or 3% of what they are now. That should drop the CPU use quite a bit.
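Conceptually the MemPools wrapper is now about this thin (illustrative only - the real LUSCA_HEAD code keeps rather more statistics than this):

#include <stdlib.h>

typedef struct {
    const char *label;      /* pool name, for the statistics output */
    size_t obj_size;        /* size of each object handed out */
    long inuse;             /* how many objects are currently outstanding */
} MemPool;

void *
memPoolAlloc(MemPool *pool)
{
    pool->inuse++;
    return calloc(1, pool->obj_size);   /* no caching: straight through to the allocator */
}

void
memPoolFree(MemPool *pool, void *obj)
{
    pool->inuse--;
    free(obj);                          /* no free list; memory goes straight back */
}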
Ah, now to find testers..
Tuesday, January 20, 2009
Where the CPU is going
root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -la -t 1 ./squid
CPU: PIII, speed 634.485 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples cum. samples % cum. % image name symbol name
2100394 2100394 6.9315 6.9315 libc-2.3.6.so memcpy
674036 2774430 2.2244 9.1558 libc-2.3.6.so vfprintf
657729 3432159 2.1706 11.3264 squid memPoolAlloc
463901 3896060 1.5309 12.8573 libc-2.3.6.so _int_malloc
453978 4350038 1.4982 14.3555 libc-2.3.6.so strncasecmp
442439 4792477 1.4601 15.8156 libc-2.3.6.so re_search_internal
438752 5231229 1.4479 17.2635 squid comm_select
423196 5654425 1.3966 18.6601 squid memPoolFree
418949 6073374 1.3826 20.0426 squid stackPop
412394 6485768 1.3609 21.4036 squid httpHeaderIdByName
402709 6888477 1.3290 22.7325 libc-2.3.6.so strtok
364201 7252678 1.2019 23.9344 squid httpHeaderClean
359257 7611935 1.1856 25.1200 squid statHistBin
343628 7955563 1.1340 26.2540 squid SQUID_MD5Transform
330128 8285691 1.0894 27.3434 libc-2.3.6.so memset
323962 8609653 1.0691 28.4125 libc-2.3.6.so memchr
root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -la ./squid | wc -l
595
root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -l ./squid | cut -f1 -d' ' | awk '{ s+= $1; } END { print s }'
30302294
root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -lc -t 1 -i memcpy ./squid
CPU: PIII, speed 634.485 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples % image name symbol name
-------------------------------------------------------------------------------
28133 1.3394 squid storeSwapOut
31515 1.5004 squid stringInit
32619 1.5530 squid httpBuildRequestPrefix
54237 2.5822 squid strListAddStr
54322 2.5863 squid storeSwapMetaBuild
80047 3.8110 squid clientKeepaliveNextRequest
171738 8.1765 squid httpHeaderEntryParseCreate
211091 10.0501 squid httpHeaderEntryPackInto
318793 15.1778 squid stringDup
1022812 48.6962 squid storeAppend
2100394 100.000 libc-2.3.6.so memcpy
2100394 100.000 libc-2.3.6.so memcpy [self]
------------------------------------------------------------------------------
root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -lc -t 1 -i clientReadRequest ./squid
CPU: PIII, speed 634.485 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples % symbol name
-------------------------------------------------------------------------------
87536 4.7189 clientKeepaliveNextRequest
1758418 94.7925 comm_select
88441 100.000 clientReadRequest
2121926 86.3731 clientTryParseRequest
88441 3.6000 clientReadRequest [self]
52951 2.1554 commSetSelect
-------------------------------------------------------------------------------
root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -lc -t 1 -i httpReadReply ./squid
CPU: PIII, speed 634.485 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples % symbol name
-------------------------------------------------------------------------------
3962448 99.7463 comm_select
163081 100.000 httpReadReply
2781096 53.2193 httpAppendBody
1857597 35.5471 httpProcessReplyHeader
163081 3.1207 httpReadReply [self]
57084 1.0924 memBufGrow
------------------------------------------------------------------------------
Monday, January 19, 2009
Eliminating copies, or "god this code is horrible"
Sunday, January 18, 2009
Tidying up the http reply handling code..
Friday, January 16, 2009
Refcounted string buffers!
- Remove all of the current assumptions in code which uses String that the actual backing buffer (accessible via strBuf()) is NUL-terminated;
- Rewrite sections of the code which go between String and C string buffers (with copying, etc) to use String where applicable. Unfortunately a whole lot of the original client_side.c code which handles parsing the request involves a fair bit of crap - so..
- .. writing replacement request and reply HTTP parsers is probably the next thing to do;
- Shuffling around the client-side code and the http code to use a buf_t as an incoming socket buffer, instead of how they currently do things (in an ugly way..)
- Propagate down the incoming socket buffer to the request/reply parsing code, so said code can simply create references to the original socket buffer, bypassing any and all requirement for copying the request/reply data separately (see the sketch below).
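The last two points boil down to something like this - again, illustrative names only, not the actual String/buf_t API: a parsed token becomes a (buffer, offset, length) reference into the original socket read buffer, so no separate copy of the data is made.

#include <stddef.h>

typedef struct {
    char *data;         /* the raw socket read buffer */
    size_t len;
    int nref;           /* refcount keeps the buffer alive while strings point into it */
} buf_t;

typedef struct {
    buf_t *buf;         /* which buffer the token lives in */
    size_t offset;      /* where the token starts within that buffer */
    size_t len;         /* how long the token is */
} strref_t;

/* "Parsing" a token then just records where it sits in the parent buffer. */
strref_t
strref_make(buf_t *buf, size_t offset, size_t len)
{
    strref_t s = { buf, offset, len };
    buf->nref++;        /* the string holds a reference; the buffer is freed when the last one goes */
    return s;
}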
Friday, January 9, 2009
More profiling!
The following info is for 10,000 concurrent, keep-alived connections, all fetching an internal icon object from Squid. This is using my apachebench-adrian package which can handle such traffic loads.
The below accounts for roughly 60% of total CPU time (ie, 60% of the CPU is spent in userspace) on one core.
With oprofile, it hits around 12,300 transactions a second.
I have much, much hatred for how Squid uses *printf() everywhere. Sigh.
CPU: AMD64 processors, speed 2613.4 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples cum. samples % cum. % image name symbol name
5383709 5383709 4.5316 4.5316 libc-2.6.1.so vfprintf
4025991 9409700 3.3888 7.9203 libc-2.6.1.so memcpy
3673722 13083422 3.0922 11.0126 libc-2.6.1.so _int_malloc
3428362 16511784 2.8857 13.8983 libc-2.6.1.so memset
3306571 19818355 2.7832 16.6815 libc-2.6.1.so malloc_consolidate
2847887 22666242 2.3971 19.0787 squid memPoolFree
2634120 25300362 2.2172 21.2958 libm-2.6.1.so floor
2609922 27910284 2.1968 23.4927 squid memPoolAlloc
2408836 30319120 2.0276 25.5202 libc-2.6.1.so re_search_internal
2296612 32615732 1.9331 27.4534 libc-2.6.1.so strlen
2265816 34881548 1.9072 29.3605 libc-2.6.1.so _int_free
1826493 36708041 1.5374 30.8979 libc-2.6.1.so _IO_default_xsputn
1641986 38350027 1.3821 32.2800 libc-2.6.1.so free
1601997 39952024 1.3484 33.6285 squid httpHeaderGetEntry
1575919 41527943 1.3265 34.9549 libc-2.6.1.so memchr
1466114 42994057 1.2341 36.1890 libc-2.6.1.so re_string_reconstruct
1275377 44269434 1.0735 37.2625 squid clientTryParseRequest
1214714 45484148 1.0225 38.2850 squid httpMsgFindHeadersEnd
1185932 46670080 0.9982 39.2832 squid statHistBin
1170361 47840441 0.9851 40.2683 squid urlCanonicalClean
1169694 49010135 0.9846 41.2529 libc-2.6.1.so strtok
1145933 50156068 0.9646 42.2174 squid comm_select
1128595 51284663 0.9500 43.1674 libc-2.6.1.so __GI_____strtoll_l_internal
1116573 52401236 0.9398 44.1072 squid httpHeaderIdByName
956209 53357445 0.8049 44.9121 squid SQUID_MD5Transform
915844 54273289 0.7709 45.6830 squid memBufAppend
907609 55180898 0.7640 46.4469 squid stringLimitInit
898666 56079564 0.7564 47.2034 libc-2.6.1.so strspn
883282 56962846 0.7435 47.9468 squid urlParse
852875 57815721 0.7179 48.6647 libc-2.6.1.so calloc
819613 58635334 0.6899 49.3546 squid clientWriteComplete
800196 59435530 0.6735 50.0281 squid httpMsgParseRequestLine