<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-351311855402780691</id><updated>2011-12-25T09:15:13.190-08:00</updated><category term='lusca'/><category term='geoip'/><category term='lighttpd'/><category term='wiki'/><category term='as250'/><category term='proxy'/><category term='usenet'/><category term='olpc'/><category term='downtime'/><category term='news'/><category term='BGP'/><category term='videolan'/><category term='cacheboy'/><category term='cacheboy ipv6'/><category term='nntp'/><category term='cdn'/><category term='anycast'/><category term='sugarlabs'/><category term='squid'/><category term='wishlist'/><category term='oprofile'/><category term='dns'/><category term='nnrp'/><category term='quagga'/><category term='TPROXY'/><category term='performance'/><category term='freebsd'/><category term='mozilla'/><category term='cyberduck'/><category term='ipv6'/><title type='text'>Cacheboy Development</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>99</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1774833065840293511</id><published>2011-07-15T19:04:00.001-07:00</published><updated>2011-07-15T19:13:19.806-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Where's cacheboy been hiding?</title><content type='html'>I've had a few people ask where Cacheboy has been hiding since late 2009.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In short - I had a lack of reliable traffic nodes and my bachelor's degree to finish off.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, the long version.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The university reason is easy - I've been working on and off on a Bachelor of Arts for a few years, and decided late 2009 that I should just buckle down and get it all done. So I spent 2010 and the first part of 2011 studying (and working!) full-time. It was pretty intense. I had to put a few things on hold, and working on cacheboy was one of them.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The traffic node problem is more difficult. I had plenty of interest in running edge nodes in places like Australia, Italy and South Africa - where local connectivity is great, but international transit is not. But in order to run any useful amount of traffic from those nodes, I'll have to serve a lot of content to the network as a whole. This means "US" and "Western Europe".&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I found myself in a catch-22 situation. 
I'd like to serve content across local IXes, but in order to do so, I first need to serve a lot more content to the US/Europe.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In order to do this, I'll need a few reliable nodes pushing at least a gigabit each. The last time I ran the numbers, more than 70% of traffic was destined for the US, UK and Germany.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So to run a proper CDN for open source content, I'm going to need some donated nodes in the US that can source at least a gigabit. If I can't get that, the amount of traffic served to other destinations which could benefit from local traffic is .. well, it's really quite small.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So far, no one's stepped up to help with that.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1774833065840293511?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1774833065840293511/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1774833065840293511' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1774833065840293511'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1774833065840293511'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2011/07/wheres-cacheboy-been-hiding.html' title='Where&apos;s cacheboy been hiding?'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7891640549350492644</id><published>2009-11-10T19:14:00.000-08:00</published><updated>2009-11-10T19:16:40.554-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='freebsd'/><category scheme='http://www.blogger.com/atom/ns#' term='lighttpd'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>More issues with Lighttpd</title><content type='html'>So occasionally Lighttpd on FreeBSD-7.x+ZFS gets all upset. I -think- there's something weird going on where I hit mbuf exhaustion somehow when ZFS starts taking a long time to complete IO requests; then all socket IO fails in Lighttpd until it is restarted.&lt;br /&gt;&lt;br /&gt;More investigation is required. Well, more statistics are needed so I can make better judgements. Well, actually, more functional backends are needed so I can take one out of production when something like this occurs, properly debug what is going on and try to fix it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7891640549350492644?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7891640549350492644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7891640549350492644' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7891640549350492644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7891640549350492644'/><link rel='alternate' type='text/html' 
href='http://cacheboy.blogspot.com/2009/11/more-issues-with-lighttpd.html' title='More issues with Lighttpd'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3111385398637703157</id><published>2009-11-10T18:53:00.000-08:00</published><updated>2009-11-10T19:14:34.445-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lusca'/><category scheme='http://www.blogger.com/atom/ns#' term='quagga'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='downtime'/><title type='text'>Cacheboy Update / October/November 2009</title><content type='html'>Howdy,&lt;br /&gt;&lt;br /&gt;Just a few updates this time around!&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Cacheboy was pushing around 800-1200mbit during the Firefox 3.5.4 release cycle. I started to hit issues with the backend server not keeping up with revalidating requests and so I'll have to improve the edge caching logic a little more.&lt;/li&gt;&lt;li&gt;Lusca seems quite happy serving up 300-400mbit from a single node though; which is a big plus.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I've found some quite horrible memory leaks in Quagga on only one of the edge nodes. I'll have to find some time to log in and debug this a little more.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The second backend server is now officially toast. 
I need another 1ru server with 2 SATA slots to magically appear in downtown Manhattan, NY.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3111385398637703157?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3111385398637703157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3111385398637703157' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3111385398637703157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3111385398637703157'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/11/cacheboy-update-octobernovember-2009.html' title='Cacheboy Update / October/November 2009'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-56177144572239473</id><published>2009-10-08T18:38:00.001-07:00</published><updated>2009-10-08T18:40:15.043-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='downtime'/><title type='text'>Cacheboy downtime - hardware failures</title><content type='html'>Howdy,&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've had both backend servers fail today. 
One is throwing undervolt errors on one PSU line and is having disk issues (most likely related to the undervoltage); the other has simply crashed.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm waiting for remote hands to prod the other box into life.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is why I'd like some more donated equipment and hosting - I can make things much more fault tolerant. Hint hint.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-56177144572239473?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/56177144572239473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=56177144572239473' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/56177144572239473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/56177144572239473'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/10/cacheboy-downtime-hardware-failures.html' title='Cacheboy downtime - hardware failures'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-506547186733728397</id><published>2009-09-30T23:31:00.000-07:00</published><updated>2009-09-30T23:34:35.265-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='lusca'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='proxy'/><title type='text'>Lusca updates - September 2009</title><content type='html'>Just a few Lusca related updates!&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;All of the Cacheboy CDN nodes are running Lusca-HEAD now and are nice and stable.&lt;/li&gt;&lt;li&gt;I've deployed Lusca at a few customer sites and again, it is nice and stable.&lt;/li&gt;&lt;li&gt;The rebuild logic changes are, for the most part, nice and stable. There seems to be some weirdness with 32 vs 64 bit compilation options which I need to suss out but everything "just works" if you compile Lusca with large file/large cache file support regardless of the platform you're using. I may make that the default option.&lt;/li&gt;&lt;li&gt;I've got a couple of small coding projects to introduce a couple of small new features to Lusca - more on those when they're done!&lt;/li&gt;&lt;li&gt;Finally, I'm going to be migrating some more of the internal code over to use the sqinet_t type in preparation for IPv4/IPv6 agnostic support.&lt;/li&gt;&lt;/ul&gt;Stay Tuned!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-506547186733728397?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/506547186733728397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=506547186733728397' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/506547186733728397'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/351311855402780691/posts/default/506547186733728397'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/09/lusca-updates-september-2009.html' title='Lusca updates - September 2009'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1872161530909478705</id><published>2009-09-21T22:05:00.000-07:00</published><updated>2009-09-21T22:08:27.089-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wishlist'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>My current wishlist</title><content type='html'>I'm going to put this on the website at some point, but I'm currently chasing a few things for Cacheboy:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;More US nodes. I'll take anything from 50mbit to 5gbit at this point. I need more US nodes to be able to handle enough aggregate traffic to make optimising the CDN content selection methods worthwhile.&lt;/li&gt;&lt;li&gt;Some donations to cover my upcoming APNIC membership for ASN and IPv4/IPv6 space. This will run to about AUD $3500 this year and then around AUD $2500 a year after that.&lt;/li&gt;&lt;li&gt;Some 1ru/2ru server hardware in the San Francisco area&lt;/li&gt;&lt;li&gt;Another site or two willing to run a relatively low bandwidth "master" mirror site. 
I have one site in New York but I'd prefer to run a couple of others spread around Europe and the United States.&lt;/li&gt;&lt;/ul&gt;I'm sure more will come to mind as I build things out a little more.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1872161530909478705?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1872161530909478705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1872161530909478705' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1872161530909478705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1872161530909478705'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/09/my-current-wishlist.html' title='My current wishlist'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6722087855939702993</id><published>2009-09-21T22:03:00.001-07:00</published><updated>2009-09-21T22:04:52.104-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sugarlabs'/><category scheme='http://www.blogger.com/atom/ns#' term='olpc'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>New project - sugar labs!</title><content type='html'>I've just put the finishing touches on the 
basic Sugar Labs software repository. I'll hopefully be serving part or all of their software downloads shortly.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Sugar is the software behind the OLPC environment. It works on normal Intel-based PCs as far as I can tell. More information can be found at http://www.sugarlabs.org/&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6722087855939702993?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6722087855939702993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6722087855939702993' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6722087855939702993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6722087855939702993'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/09/new-project-sugar-labs.html' title='New project - sugar labs!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8049232539584940548</id><published>2009-08-31T22:50:00.001-07:00</published><updated>2009-08-31T22:50:56.319-07:00</updated><title type='text'>Cacheboy presentation at AUSNOG</title><content type='html'>I've just presented on Cacheboy at AUSNOG in Sydney. 
The feedback so far has been reasonably positive.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's more information available at &lt;a href="http://www.creative.net.au/talks/"&gt;http://www.creative.net.au/talks/&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8049232539584940548?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8049232539584940548/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8049232539584940548' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8049232539584940548'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8049232539584940548'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/08/cacheboy-presentation-at-ausnog.html' title='Cacheboy presentation at AUSNOG'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8586760059472099746</id><published>2009-08-17T19:32:00.001-07:00</published><updated>2009-08-17T19:36:02.634-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Cacheboy status update</title><content type='html'>So by and large, the pushing of bits is working quite well. 
I have a bunch of things to tidy up and a DNS backend to rewrite in C or C++ but that won't stop the bits from being pushed.&lt;br /&gt;&lt;br /&gt;Unfortunately what I'm now lacking is US hosts to send traffic from. I still have more European and Asian connectivity than North American - and North America is absolutely where I need connectivity the most. Right now I'm only able to push 350-450 megabits of content from North America - and this puts a big, big limit on how much content I can serve overall.&lt;br /&gt;&lt;br /&gt;Please contact me as soon as possible if you're interested in hosting a node in North America. I ideally need enough nodes to push between a gigabit and ten gigabits of traffic.&lt;br /&gt;&lt;br /&gt;I will be able to start pushing noticeable amounts of content out of regional areas once I've sorted out North America. This includes places like Australia, Africa, South America and Eastern Europe. I'd love to be pushing more open source bits out of those locations to keep the transit use low but I just can't do so at the moment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8586760059472099746?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8586760059472099746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8586760059472099746' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8586760059472099746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8586760059472099746'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/08/cacheboy-status-update.html' title='Cacheboy status 
update'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5138634317106863599</id><published>2009-08-17T19:30:00.000-07:00</published><updated>2009-08-17T19:31:58.780-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Canada node online and pushing bits!</title><content type='html'>The Canada/TORIX node is online thanks to John Nistor at &lt;a href="http://www.prioritycolo.com/"&gt;prioritycolo&lt;/a&gt; in Toronto, Canada.&lt;br /&gt;&lt;br /&gt;Thanks John!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5138634317106863599?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5138634317106863599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5138634317106863599' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5138634317106863599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5138634317106863599'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/08/canada-node-online-and-pushing-bits.html' title='Canada node online and pushing 
bits!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7497973865340229040</id><published>2009-08-17T18:55:00.001-07:00</published><updated>2009-08-17T19:01:30.198-07:00</updated><title type='text'>Cacheboy is on WAIX!</title><content type='html'>Yesterday's traffic from mirror1.au into WAIX:&lt;br /&gt;&lt;table cellpadding="1" cellspacing="1"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;ASN&lt;/th&gt;&lt;th&gt;MBytes&lt;/th&gt;&lt;th&gt;Requests&lt;/th&gt;&lt;th&gt;% of overall&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS7545&lt;/td&gt;&lt;td&gt;17946.77&lt;/td&gt;&lt;td&gt;7437&lt;/td&gt;&lt;td&gt;29.85&lt;/td&gt;&lt;td&gt;TPG-INTERNET-AP TPG Internet Pty Ltd&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS4802&lt;/td&gt;&lt;td&gt;12973.47&lt;/td&gt;&lt;td&gt;4476&lt;/td&gt;&lt;td&gt;21.58&lt;/td&gt;&lt;td&gt;ASN-IINET iiNet Limited&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS4739&lt;/td&gt;&lt;td&gt;8497.92&lt;/td&gt;&lt;td&gt;2947&lt;/td&gt;&lt;td&gt;14.13&lt;/td&gt;&lt;td&gt;CIX-ADELAIDE-AS Internode Systems Pty Ltd&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS9543&lt;/td&gt;&lt;td&gt;2524.57&lt;/td&gt;&lt;td&gt;1241&lt;/td&gt;&lt;td&gt;4.20&lt;/td&gt;&lt;td&gt;WESTNET-AS-AP Westnet Internet Services&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS4854&lt;/td&gt;&lt;td&gt;2097.32&lt;/td&gt;&lt;td&gt;941&lt;/td&gt;&lt;td&gt;3.49&lt;/td&gt;&lt;td&gt;NETSPACE-AS-AP Netspace Online Systems&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS17746&lt;/td&gt;&lt;td&gt;1881.17&lt;/td&gt;&lt;td&gt;1050&lt;/td&gt;&lt;td&gt;3.13&lt;/td&gt;&lt;td&gt;ORCONINTERNET-NZ-AP Orcon 
Internet&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS9822&lt;/td&gt;&lt;td&gt;1425.44&lt;/td&gt;&lt;td&gt;456&lt;/td&gt;&lt;td&gt;2.37&lt;/td&gt;&lt;td&gt;AMNET-AU-AP Amnet IT Services Pty Ltd&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS17435&lt;/td&gt;&lt;td&gt;1161.01&lt;/td&gt;&lt;td&gt;411&lt;/td&gt;&lt;td&gt;1.93&lt;/td&gt;&lt;td&gt;WXC-AS-NZ WorldxChange Communications LTD&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS9443&lt;/td&gt;&lt;td&gt;1140.62&lt;/td&gt;&lt;td&gt;701&lt;/td&gt;&lt;td&gt;1.90&lt;/td&gt;&lt;td&gt;INTERNETPRIMUS-AS-AP Primus Telecommunications&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS7657&lt;/td&gt;&lt;td&gt;891.93&lt;/td&gt;&lt;td&gt;1187&lt;/td&gt;&lt;td&gt;1.48&lt;/td&gt;&lt;td&gt;VODAFONE-NZ-NGN-AS Vodafone NZ Ltd.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS7718&lt;/td&gt;&lt;td&gt;740.74&lt;/td&gt;&lt;td&gt;272&lt;/td&gt;&lt;td&gt;1.23&lt;/td&gt;&lt;td&gt;TRANSACT-SDN-AS TransACT IP Service Provider&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS7543&lt;/td&gt;&lt;td&gt;732.11&lt;/td&gt;&lt;td&gt;423&lt;/td&gt;&lt;td&gt;1.22&lt;/td&gt;&lt;td&gt;PI-AU Pacific Internet (Australia) Pty Ltd&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS24313&lt;/td&gt;&lt;td&gt;527.38&lt;/td&gt;&lt;td&gt;252&lt;/td&gt;&lt;td&gt;0.88&lt;/td&gt;&lt;td&gt;NSW-DET-AS NSW Department of Education and Training&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS9790&lt;/td&gt;&lt;td&gt;436.80&lt;/td&gt;&lt;td&gt;389&lt;/td&gt;&lt;td&gt;0.73&lt;/td&gt;&lt;td&gt;CALLPLUS-NZ-AP CallPlus Services Limited&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS17412&lt;/td&gt;&lt;td&gt;365.13&lt;/td&gt;&lt;td&gt;228&lt;/td&gt;&lt;td&gt;0.61&lt;/td&gt;&lt;td&gt;WOOSHWIRELESSNZ Woosh Wireless&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS17486&lt;/td&gt;&lt;td&gt;349.27&lt;/td&gt;&lt;td&gt;116&lt;/td&gt;&lt;td&gt;0.58&lt;/td&gt;&lt;td&gt;SWIFTEL1-AP People Telecom Pty. 
Ltd.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS17808&lt;/td&gt;&lt;td&gt;311.65&lt;/td&gt;&lt;td&gt;248&lt;/td&gt;&lt;td&gt;0.52&lt;/td&gt;&lt;td&gt;VODAFONE-NZ-AP AS number for Vodafone NZ IP Networks&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS24093&lt;/td&gt;&lt;td&gt;303.40&lt;/td&gt;&lt;td&gt;114&lt;/td&gt;&lt;td&gt;0.50&lt;/td&gt;&lt;td&gt;BIGAIR-AP BIGAIR. Multihoming ASN&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS9889&lt;/td&gt;&lt;td&gt;288.85&lt;/td&gt;&lt;td&gt;197&lt;/td&gt;&lt;td&gt;0.48&lt;/td&gt;&lt;td&gt;MAXNET-NZ-AP Auckland&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AS17705&lt;/td&gt;&lt;td&gt;282.49&lt;/td&gt;&lt;td&gt;84&lt;/td&gt;&lt;td&gt;0.47&lt;/td&gt;&lt;td&gt;INSPIRENET-AS-AP InSPire Net Ltd&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Query content served: 54878.07 mbytes; 23170 requests.&lt;br /&gt;Total content served: 60123.25 mbytes; 28037 requests.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7497973865340229040?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7497973865340229040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7497973865340229040' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7497973865340229040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7497973865340229040'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/08/cacheboy-is-on-waix.html' title='Cacheboy is on WAIX!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8828300247441198366</id><published>2009-08-17T18:33:00.000-07:00</published><updated>2009-08-17T18:50:38.350-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dns'/><category scheme='http://www.blogger.com/atom/ns#' term='BGP'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>BGP aware DNS</title><content type='html'>I've just written up the first "test" hack of BGP aware DNS.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The basic logic is simple but evil. I'm mapping each BGP next-hop to a set of weighted servers. A server is then randomly chosen from this pool.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm not doing this for -all- prefixes and POPs - it is only being used for two specific POPs where there is a lot of peering and almost no transit. There are a few issues regarding split horizon BGP/DNS and request routing which I'd like to fully sort out before I enable it for everything. I don't want a quirk to temporarily redirect -all- requests to -one- server cluster!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In any case, the test is working well. I'm serving ~10mbit to WAIX (Western Australia) and ~30mbit to TORIX (Toronto, Canada.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All of the DNS based redirection caveats apply - most notably that not all client requests to the caches will arrive over peering. 
I'll have to craft some method(s) of tracking this.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8828300247441198366?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8828300247441198366/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8828300247441198366' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8828300247441198366'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8828300247441198366'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/08/bgp-aware-dns.html' title='BGP aware DNS'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1369066547573024650</id><published>2009-08-09T11:28:00.001-07:00</published><updated>2009-08-09T11:31:59.863-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Updates - or why I've not been doing very much</title><content type='html'>G'day! Cacheboy has been running on autopilot for the last couple of months whilst I've been focusing on paid work and growing my little company. 
So far (mostly) so good there.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The main issue with scaling traffic has been the range request handling in Squid/Lusca, so I've been working on fixing things up "just enough" to make it work in the Firefox update environment. I think I've finally figured it out - and figured out the bugs in the range request handling in Squid too! - so I'll push out some updates to the network next week and throw it some more traffic.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I really am hoping to ramp traffic up past the gigabit mark once this is done. We'll just have to see!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1369066547573024650?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1369066547573024650/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1369066547573024650' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1369066547573024650'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1369066547573024650'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/08/updates-or-why-ive-not-been-doing-very.html' title='Updates - or why I&apos;ve not been doing very much'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4673450438531420963</id><published>2009-07-08T09:53:00.000-07:00</published><updated>2009-07-08T09:54:50.425-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='videolan'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>VLC 1.0 released</title><content type='html'>VLC-1.0 has been released. The CDN is pushing out between 550 and 700mbit of VLC downloads. I'm sure it can do more but as I'm busy working elsewhere, I'm going to be overly conservative and leave the mirror weighting where it is.&lt;br /&gt;&lt;br /&gt;Graphs to follow!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4673450438531420963?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4673450438531420963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4673450438531420963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4673450438531420963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4673450438531420963'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/07/vlc-10-released.html' title='VLC 1.0 released'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2251989979314198155</id><published>2009-06-29T01:36:00.000-07:00</published><updated>2009-06-29T02:20:18.261-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='downtime'/><title type='text'>Current Downtime/issues</title><content type='html'>&lt;div&gt;There's a current issue with content not being served correctly. It stemmed from a ZFS-related panic on one of the backend servers (note to self - update to the very latest FreeBSD-7-stable code; these are all fixed!), which then came back up with lighttpd running but no ZFS mounts. Lighttpd then started returning 404s.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm now watching the backend(s) throw random connection failures and the Lusca caches then cache an error rather than the object.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've fixed the backend giving trouble so it won't start up in that failed mode again and I've set the negative caching in the Lusca cache nodes to 30 seconds instead of the default 5 minutes. Hopefully the traffic levels now pick up to where they're supposed to be.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;EDIT: The problem is again related to the Firefox range requests and Squid/Lusca's inability to cache range request fragments.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The backend failure(s) removed the objects from the cache. The problem now is that the objects aren't re-entering the cache because the requests for them are all range requests.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm going to wind down the Firefox content serving for now until I get some time to hack up Lusca "enough" to cache the range request objects. 
I may just do something dodgy with the URL rewriter to force a full object request to occur in the background. Hm, actually..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2251989979314198155?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2251989979314198155/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2251989979314198155' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2251989979314198155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2251989979314198155'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/current-downtimeissues.html' title='Current Downtime/issues'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4913909234103828892</id><published>2009-06-27T10:03:00.000-07:00</published><updated>2009-06-27T10:04:47.777-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>New mirror node - italy</title><content type='html'>I've just turned on a new mirror node in Italy thanks to New Media Labs. 
They've provided some transit services and (I believe) 100mbit access to the local internet exchange.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks guys!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4913909234103828892?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4913909234103828892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4913909234103828892' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4913909234103828892'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4913909234103828892'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/new-mirror-node-italy.html' title='New mirror node - italy'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4138728753261971210</id><published>2009-06-17T07:26:00.000-07:00</published><updated>2009-06-17T07:39:29.657-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='geoip'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>And the GeoIP summary..</title><content type='html'>&lt;p&gt;And the geoip summary:&lt;/p&gt;&lt;br /&gt;&lt;p&gt;From Sun Jun  7 00:00:00 
2009 to Sun Jun 14 00:00:00 2009&lt;/p&gt;&lt;br /&gt;&lt;table class="stats"&gt;&lt;br /&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Server&lt;/th&gt;&lt;th&gt;Country&lt;/th&gt;&lt;th&gt;MBytes&lt;/th&gt;&lt;th&gt;Requests&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;us&lt;/td&gt;&lt;td&gt;5163783.09&lt;/td&gt;&lt;td&gt;6533162&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;de&lt;/td&gt;&lt;td&gt;1514664.22&lt;/td&gt;&lt;td&gt;2307222&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;ca&lt;/td&gt;&lt;td&gt;1152095.00&lt;/td&gt;&lt;td&gt;917777&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;fr&lt;/td&gt;&lt;td&gt;948433.27&lt;/td&gt;&lt;td&gt;1451105&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;uk&lt;/td&gt;&lt;td&gt;945640.71&lt;/td&gt;&lt;td&gt;1136455&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;it&lt;/td&gt;&lt;td&gt;818161.03&lt;/td&gt;&lt;td&gt;770164&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;br&lt;/td&gt;&lt;td&gt;542497.79&lt;/td&gt;&lt;td&gt;1426306&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;se&lt;/td&gt;&lt;td&gt;482932.15&lt;/td&gt;&lt;td&gt;229559&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;es&lt;/td&gt;&lt;td&gt;445444.34&lt;/td&gt;&lt;td&gt;647321&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;pl&lt;/td&gt;&lt;td&gt;397755.30&lt;/td&gt;&lt;td&gt;1021083&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;nl&lt;/td&gt;&lt;td&gt;373185.13&lt;/td&gt;&lt;td&gt;306023&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;ru&lt;/td&gt;&lt;td&gt;368124.64&lt;/td&gt;&lt;td&gt;749924&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;tr&lt;/td&gt;&lt;td&gt;293627.27&lt;/td&gt;&lt;td&gt;484965&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;mx&lt;/td&gt;&lt;td&gt;276775.12&lt;/td&gt;&lt;td&gt;463252&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&l
t;td&gt;be&lt;/td&gt;&lt;td&gt;249088.62&lt;/td&gt;&lt;td&gt;213460&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;ch&lt;/td&gt;&lt;td&gt;201782.33&lt;/td&gt;&lt;td&gt;209530&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;ro&lt;/td&gt;&lt;td&gt;190059.45&lt;/td&gt;&lt;td&gt;274216&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;fi&lt;/td&gt;&lt;td&gt;172399.75&lt;/td&gt;&lt;td&gt;204630&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;ar&lt;/td&gt;&lt;td&gt;170421.77&lt;/td&gt;&lt;td&gt;374071&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;no&lt;/td&gt;&lt;td&gt;169351.46&lt;/td&gt;&lt;td&gt;155258&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4138728753261971210?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4138728753261971210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4138728753261971210' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4138728753261971210'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4138728753261971210'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/and-geoip-summary.html' title='And the GeoIP summary..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5120080341431794455</id><published>2009-06-16T18:17:00.000-07:00</published><updated>2009-06-17T07:41:00.184-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BGP'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>A quick snapshot of Cacheboy destinations..</title><content type='html'>&lt;p&gt;The following is a snapshot of the per-destination-AS traffic information I'm keeping.&lt;/p&gt;&lt;br /&gt;&lt;p&gt;If you're peering with any of these ASes and are willing to sponsor a cacheboy node or two then please let me know. How well I can scale things at this point is rapidly becoming limited by where I can push traffic from, rather than anything intrinsic to the software.&lt;/p&gt;&lt;br /&gt;&lt;p&gt;From Sun Jun  7 00:00:00 2009 to Sun Jun 14 00:00:00 2009&lt;/p&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Time&lt;/th&gt;&lt;th&gt;Site&lt;/th&gt;&lt;th&gt;ASN&lt;/th&gt;&lt;th&gt;MBytes&lt;/th&gt;&lt;th&gt;Requests&lt;/th&gt;&lt;th&gt;% of overall&lt;/th&gt;&lt;th&gt;AS name&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS3320&lt;/td&gt;&lt;td&gt;602465.01&lt;/td&gt;&lt;td&gt;1021975&lt;/td&gt;&lt;td&gt;3.26&lt;/td&gt;&lt;td&gt;DTAG Deutsche Telekom AG&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS7132&lt;/td&gt;&lt;td&gt;583164.05&lt;/td&gt;&lt;td&gt;778259&lt;/td&gt;&lt;td&gt;3.16&lt;/td&gt;&lt;td&gt;SBIS-AS - AT&amp;T Internet Services&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS19262&lt;/td&gt;&lt;td&gt;459322.30&lt;/td&gt;&lt;td&gt;603127&lt;/td&gt;&lt;td&gt;2.49&lt;/td&gt;&lt;td&gt;VZGNI-TRANSIT - Verizon Internet Services 
Inc.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS3215&lt;/td&gt;&lt;td&gt;330962.95&lt;/td&gt;&lt;td&gt;553299&lt;/td&gt;&lt;td&gt;1.79&lt;/td&gt;&lt;td&gt;AS3215 France Telecom - Orange&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS3269&lt;/td&gt;&lt;td&gt;317534.06&lt;/td&gt;&lt;td&gt;333114&lt;/td&gt;&lt;td&gt;1.72&lt;/td&gt;&lt;td&gt;ASN-IBSNAZ TELECOM ITALIA&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS9121&lt;/td&gt;&lt;td&gt;259768.32&lt;/td&gt;&lt;td&gt;434932&lt;/td&gt;&lt;td&gt;1.41&lt;/td&gt;&lt;td&gt;TTNET TTnet Autonomous System&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS22773&lt;/td&gt;&lt;td&gt;244573.65&lt;/td&gt;&lt;td&gt;283427&lt;/td&gt;&lt;td&gt;1.32&lt;/td&gt;&lt;td&gt;ASN-CXA-ALL-CCI-22773-RDC - Cox Communications Inc.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS12322&lt;/td&gt;&lt;td&gt;224708.25&lt;/td&gt;&lt;td&gt;343686&lt;/td&gt;&lt;td&gt;1.22&lt;/td&gt;&lt;td&gt;PROXAD AS for Proxad/Free ISP&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS3352&lt;/td&gt;&lt;td&gt;206093.84&lt;/td&gt;&lt;td&gt;305183&lt;/td&gt;&lt;td&gt;1.12&lt;/td&gt;&lt;td&gt;TELEFONICADATA-ESPANA Internet Access Network of TDE&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS812&lt;/td&gt;&lt;td&gt;204120.74&lt;/td&gt;&lt;td&gt;166633&lt;/td&gt;&lt;td&gt;1.10&lt;/td&gt;&lt;td&gt;ROGERS-CABLE - Rogers Cable Communications Inc.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS8151&lt;/td&gt;&lt;td&gt;198918.22&lt;/td&gt;&lt;td&gt;328632&lt;/td&gt;&lt;td&gt;1.08&lt;/td&gt;&lt;td&gt;Uninet S.A. 
de C.V.&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS6327&lt;/td&gt;&lt;td&gt;197906.53&lt;/td&gt;&lt;td&gt;152861&lt;/td&gt;&lt;td&gt;1.07&lt;/td&gt;&lt;td&gt;SHAW - Shaw Communications Inc.&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS3209&lt;/td&gt;&lt;td&gt;191429.18&lt;/td&gt;&lt;td&gt;303787&lt;/td&gt;&lt;td&gt;1.04&lt;/td&gt;&lt;td&gt;ARCOR-AS Arcor IP-Network&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS20115&lt;/td&gt;&lt;td&gt;182407.09&lt;/td&gt;&lt;td&gt;225151&lt;/td&gt;&lt;td&gt;0.99&lt;/td&gt;&lt;td&gt;CHARTER-NET-HKY-NC - Charter Communications&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS2119&lt;/td&gt;&lt;td&gt;181719.20&lt;/td&gt;&lt;td&gt;117656&lt;/td&gt;&lt;td&gt;0.98&lt;/td&gt;&lt;td&gt;TELENOR-NEXTEL T.net&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS577&lt;/td&gt;&lt;td&gt;181167.02&lt;/td&gt;&lt;td&gt;152383&lt;/td&gt;&lt;td&gt;0.98&lt;/td&gt;&lt;td&gt;BACOM - Bell Canada&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS12874&lt;/td&gt;&lt;td&gt;172973.42&lt;/td&gt;&lt;td&gt;108429&lt;/td&gt;&lt;td&gt;0.94&lt;/td&gt;&lt;td&gt;FASTWEB Fastweb Autonomous System&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS6389&lt;/td&gt;&lt;td&gt;165445.73&lt;/td&gt;&lt;td&gt;236133&lt;/td&gt;&lt;td&gt;0.90&lt;/td&gt;&lt;td&gt;BELLSOUTH-NET-BLK - BellSouth.net Inc.&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS6128&lt;/td&gt;&lt;td&gt;165183.07&lt;/td&gt;&lt;td&gt;210300&lt;/td&gt;&lt;td&gt;0.89&lt;/td&gt;&lt;td&gt;CABLE-NET-1 - Cablevision Systems Corp.&lt;/td&gt;&lt;/tr&gt;&lt;br 
/&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;AS2856&lt;/td&gt;&lt;td&gt;164332.96&lt;/td&gt;&lt;td&gt;219267&lt;/td&gt;&lt;td&gt;0.89&lt;/td&gt;&lt;td&gt;BT-UK-AS BTnet UK Regional network&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;p&gt;Query content served: 5234195.61 mbytes; 6878234 requests (ie, what was displayed in the table.)&lt;/p&gt;&lt;br /&gt;&lt;p&gt;Total content served: 18473721.25 mbytes; 26272660 requests (ie, the total amount of content served over the time period.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5120080341431794455?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5120080341431794455/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5120080341431794455' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5120080341431794455'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5120080341431794455'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/quick-snapshot-of-cacheboy-destinations.html' title='A quick snapshot of Cacheboy destinations..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2932985122453010274</id><published>2009-06-13T03:27:00.001-07:00</published><updated>2009-06-13T03:29:40.732-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Seeking a few more US / Canada hosts</title><content type='html'>G'day everyone!&lt;br /&gt;&lt;br /&gt;I'm now actively looking for some more Cacheboy CDN nodes in the United States and Canada. I've got around 3gbit of available bandwidth in Europe, 1gbit of available bandwidth in Japan but only 300mbit of available bandwidth in North America.&lt;br /&gt;&lt;br /&gt;I'd really, really appreciate a couple of well-connected North American nodes so I can properly test the platform and software that I'm building. The majority of traffic is still North American in destination; I'm having to serve a fraction of it from Sweden and the United Kingdom at the moment. Erk.&lt;br /&gt;&lt;br /&gt;Please drop me a line if you're interested. The node requirements are at http://www.cacheboy.net/node_requirements.html . 
Thankyou!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2932985122453010274?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2932985122453010274/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2932985122453010274' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2932985122453010274'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2932985122453010274'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/seeking-few-more-us-canada-hosts.html' title='Seeking a few more US / Canada hosts'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6124742366377822073</id><published>2009-06-12T21:41:00.000-07:00</published><updated>2009-06-12T21:50:37.477-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lusca'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Another day, another firefox release done..</title><content type='html'>The June Firefox 3.0.11 release rush is all but over and Cacheboy worked without much of a problem.&lt;br /&gt;&lt;br /&gt;The changes I've made to the Lusca load shedding code (ie, being able to disable it 
:) work well for this workload. Migrating the backend to lighttpd (and fixing up the ETag generation to be properly consistent between 32 bit and 64 bit platforms) fixed the initial issues I was seeing.&lt;br /&gt;&lt;br /&gt;The network pushed out around 850mbit at peak. Not a lot (heck, I can do that on one CPU of a mid-range server without a problem!) but it was a good enough test to show that things are working.&lt;br /&gt;&lt;br /&gt;I need to teach Lusca a couple of new tricks, namely:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;It needs to be taught to download at the fastest client speed, not the slowest; and&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Some better range request caching needs to be added.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;The former isn't too difficult - that is a weekend 5 line patch. The latter is more difficult. I don't really want to shoehorn range request caching into the current storage layer. It would look a lot like how Vary and ETag are currently handled (ie, with "magical" store entries acting as indexes to the real backend objects.) I'd rather put in a dirtier hack that is easy to undo now and use the opportunity to tidy up the whole storage layer a whole lot. But the "tidying up" rant is not for this blog entry, it's for the Lusca development blog.&lt;br /&gt;&lt;br /&gt;The hack will most likely be a little logic to start downloading full objects that aren't in the cache when their first range request comes in - so subsequent range requests for those objects will be "glued" to the current request. It means that subsequent requests will "stall" until enough of the object is transferred to start satisfying their range request. 
The alternative is to pass each range request through to a backend until the full object is transferred. This would improve initial performance, but at some point the backend could be overloaded with too many range requests for highly popular objects, and that starts affecting how fast full objects are transferred.&lt;br /&gt;&lt;br /&gt;As a side note, I should probably do up some math on a whiteboard here and see if I can model some of the potential behaviour(s). It would certainly be a good excuse to brush up on higher math clue. Hm..!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6124742366377822073?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6124742366377822073/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6124742366377822073' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6124742366377822073'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6124742366377822073'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/another-day-another-firefox-release.html' title='Another day, another firefox release done..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3858101519756252565</id><published>2009-06-11T22:31:00.001-07:00</published><updated>2009-06-12T02:29:11.287-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lighttpd'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Migrating to Lighttpd on the backend, and why aren't my files being cached..</title><content type='html'>I migrated away from apache-1.3 to Lighttpd-1.4.19 to handle the load better. Apache-1.3 handles lots of concurrent disk IO on large files fine but it bites for lots of concurrent network connections.&lt;br /&gt;&lt;br /&gt;In theory, once all of the caching stuff is fixed, the backends will spend most of their time revalidating objects.&lt;br /&gt;&lt;br /&gt;But for some weird reason I'm seeing TCP_REFRESH_MISS on my Lusca edge nodes and generally poor performance during this release. 
I look at the logs and find this:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[Host: mozilla.cdn.cacheboy.net\r\n&lt;br /&gt;User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10\r\n&lt;br /&gt;Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n&lt;br /&gt;Accept-Language: en-us,en;q=0.5\r\n&lt;br /&gt;Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n&lt;br /&gt;If-Modified-Since: Wed, 03 Jun 2009 15:09:39 GMT\r\n&lt;br /&gt;If-None-Match: "1721454571"\r\n&lt;br /&gt;Cache-Control: max-stale=0\r\n&lt;br /&gt;Connection: Keep-Alive\r\n&lt;br /&gt;Pragma: no-cache\r\n&lt;br /&gt;X-BlueCoat-Via: 24C3C50D45B23509\r\n]&lt;br /&gt;&lt;br /&gt;[HTTP/1.0 200 OK\r\n&lt;br /&gt;Content-Type: application/octet-stream\r\n&lt;br /&gt;Accept-Ranges: bytes\r\n&lt;br /&gt;ETag: "1687308715"\r\n&lt;br /&gt;Last-Modified: Wed, 03 Jun 2009 15:09:39 GMT\r\n&lt;br /&gt;Content-Length: 2178196\r\n&lt;br /&gt;Date: Fri, 12 Jun 2009 04:25:40 GMT\r\n&lt;br /&gt;Server: lighttpd/1.4.19\r\n&lt;br /&gt;X-Cache: MISS from mirror1.jp.cacheboy.net\r\n&lt;br /&gt;Via: 1.0 mirror1.jp.cacheboy.net:80 (Lusca/LUSCA_HEAD)\r\n&lt;br /&gt;Connection: keep-alive\r\n\r]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Notice the different ETags? Hm! I wonder what's going on. On a hunch I checked the ETags from both backends. master1 for that object gives "1721454571"; master2 gives "1687308715". They both have the same size and same timestamp. I wonder what is different?&lt;br /&gt;&lt;br /&gt;Time to go digging into the depths of the lighttpd code.&lt;br /&gt;&lt;br /&gt;EDIT: the ETag generation is configurable. By default it uses the mtime, inode and filesize. Disabling inode and inode/mtime didn't help. I then found that earlier lighttpd versions have different ETag generation behaviour based on 32 or 64 bit platforms. 
I'll build a local lighttpd package and see if I can replicate the behaviour on my 32/64 bit systems. Grr.&lt;br /&gt;&lt;br /&gt;Meanwhile, Cacheboy isn't really serving any of the mozilla updates. :(&lt;br /&gt;&lt;br /&gt;EDIT: so it turns out the bug is in the ETag generation code. They create an unsigned 32-bit integer hash value from the etag contents, then shovel it into a signed long for the ETag header. Unfortunately for FreeBSD-i386, "long" is a signed 32 bit type, and thus things go awry from time to time. Grrrrrr.&lt;br /&gt;&lt;br /&gt;EDIT: fixed in a newly-built local lighttpd package; both backend servers are now doing the right thing. I'm going back to serving content.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3858101519756252565?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3858101519756252565/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3858101519756252565' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3858101519756252565'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3858101519756252565'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/migrating-to-lighttpd-on-backend-and.html' title='Migrating to Lighttpd on the backend, and why aren&apos;t my files being cached..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6783266454788260291</id><published>2009-06-02T22:49:00.000-07:00</published><updated>2009-06-02T22:52:03.142-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>New mirrors - mirror2.uk and mirror3.uk</title><content type='html'>I've just had two new sponsors show up with a pair of UK mirrors.&lt;br /&gt;&lt;br /&gt;mirror2.uk is thanks to UK Broadband, who have graciously given me access to a few hundred megabits of traffic and space on an ESX server.&lt;br /&gt;&lt;br /&gt;mirror3.uk (due to be turned up today!) is thanks to a private donor named Alex who has given me a server in his colocation space and up to a gigabit of traffic.&lt;br /&gt;&lt;br /&gt;Shiny! Thanks to you both.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6783266454788260291?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6783266454788260291/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6783266454788260291' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6783266454788260291'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6783266454788260291'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/06/new-mirrors-mirror2uk-and-mirror3uk.html' title='New mirrors - mirror2.uk and 
mirror3.uk'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7292297869780940689</id><published>2009-04-22T06:14:00.001-07:00</published><updated>2009-04-22T06:14:29.089-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>More traffic!</title><content type='html'>970mbit and rising...!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7292297869780940689?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7292297869780940689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7292297869780940689' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7292297869780940689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7292297869780940689'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/04/more-traffic.html' title='More traffic!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1187673286880718841</id><published>2009-04-22T05:36:00.000-07:00</published><updated>2009-04-22T05:37:27.773-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>mirror1.jp.cacheboy.net - mozilla!</title><content type='html'>The nice folk at mozilla.org have provided me with a .jp CDN node. I'm now serving a good stack of bits from it into Australia, Malaysia, India, Japan, China, Korea and the Philippines.&lt;br /&gt;&lt;br /&gt;Thanks guys!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1187673286880718841?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1187673286880718841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1187673286880718841' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1187673286880718841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1187673286880718841'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/04/mirror1jpcacheboynet-mozilla.html' title='mirror1.jp.cacheboy.net - mozilla!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7792081875493314104</id><published>2009-04-22T05:34:00.000-07:00</published><updated>2009-04-22T05:35:20.483-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Mozilla 3.0.9 release..</title><content type='html'>The mozilla release (3.0.9) is currently going on. The traffic levels are ramping up now to the release peak.&lt;br /&gt;&lt;br /&gt;880mbit/sec and counting..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7792081875493314104?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7792081875493314104/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7792081875493314104' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7792081875493314104'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7792081875493314104'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/04/mozilla-309-release.html' title='Mozilla 3.0.9 release..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3911505043858762681</id><published>2009-04-06T01:10:00.000-07:00</published><updated>2009-04-06T01:22:31.652-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='lusca'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Lusca and Cacheboy improvements in the pipeline..</title><content type='html'>After profiling Lusca-HEAD rather extensively on the CDN nodes, I've discovered that the largest CPU "use" on the Core 2 Duo class boxes is memcpy(). On the ia64-2 node memcpy() shows up much lower down in the list. I'm sure this has to do with the differing FSB and general memory bus bandwidth available on the two architectures.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm planning out the changes to the store client needed to support fully copy-free async read and write. This should reduce the CPU overhead on Core 2 Duo class machines to the point where Lusca should break GigE throughput on this workload without too much CPU use. (I'm sure it could break GigE throughput right now on this workload though.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'll code this all up during the week and build a simulated testing rig at home "pretending" to be a whole lot of clients downloading partial bits of mozilla/firefox updates, complete with random packet loss, latency and abort probability.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I also plan on finally releasing the bulk of the Cacheboy CDN software (hackish as it is!) during the week, right after I finally remove the last few bits of hard-coded configuration locations. 
:) I still haven't finished merging in the bits of code which do the health check, calculate the current probabilities to assign each host and then write out the geoip map files. I'll try to sort that out over the next few days and get a public subversion repository with the software online.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;By the way, I plan on releasing the Cacheboy CDN software under the Affero GPL (AGPL) licence.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3911505043858762681?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3911505043858762681/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3911505043858762681' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3911505043858762681'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3911505043858762681'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/04/lusca-and-cacheboy-improvements-in.html' title='Lusca and Cacheboy improvements in the pipeline..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1366996965452212747</id><published>2009-03-31T22:11:00.000-07:00</published><updated>2009-03-31T22:14:57.763-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='lusca'/><title type='text'>Lusca snapshot released</title><content type='html'>I've just put up a snapshot of the version of lusca-head which is running on the cacheboy cdn. Head to http://code.google.com/p/lusca-cache/downloads/list .&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1366996965452212747?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1366996965452212747/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1366996965452212747' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1366996965452212747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1366996965452212747'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/lusca-snapshot-released.html' title='Lusca snapshot released'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6118780347958128088</id><published>2009-03-30T14:33:00.000-07:00</published><updated>2009-03-30T14:37:35.250-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cyberduck'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Mirroring a new project - 
Cyberduck!</title><content type='html'>I've just started providing mirror download services for &lt;a href="http://cyberduck.ch/"&gt;Cyberduck&lt;/a&gt; - a file manager for a wide variety of platforms including the traditional (SFTP, FTP) and the new (Amazon/S3, WebDAV.) Cacheboy is listed as the primary download site on the main page.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Woo!&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6118780347958128088?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6118780347958128088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6118780347958128088' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6118780347958128088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6118780347958128088'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/mirroring-new-project-cyberduck.html' title='Mirroring a new project - Cyberduck!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4539383693851970611</id><published>2009-03-30T14:26:00.000-07:00</published><updated>2009-03-30T14:33:28.333-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Mozilla 3.0.8 release!</title><content type='html'>The CDN handled the load with oodles to spare. The aggregate client traffic peak was about 650mbit across 5 major boxes. The boxes themselves peaked at about 160mbit each, depending upon the time of day (ie, whether Europe or the US was active.) None of the nodes were anywhere near maximum CPU utilisation.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;About 2 and a half TB of mozilla updates a day are being shuffled out.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'd like to try pushing a couple of the nodes up to 600mbit -each- but I don't have enough CDN nodes to guarantee the bits will keep flowing if said node fails. I'll just have to be patient and wait for a few more sponsors to step up and provide some hardware and bandwidth to the project.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So far so good - the bits are flowing, I'm able to use this to benchmark Lusca development and fix performance bottlenecks before they become serious (in this environment, at least) and things are growing at about the right rate for me to not need to panic. :)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My next major goal will be to finish off the BGP library and lookup daemon; flesh out some BGP related redirection map logic; and start investigating reporting for "services" on the box. 
Hm, I may have to write some nagios plugins after all..&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4539383693851970611?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4539383693851970611/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4539383693851970611' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4539383693851970611'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4539383693851970611'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/mozilla-308-release.html' title='Mozilla 3.0.8 release!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7773904100367477047</id><published>2009-03-28T16:05:00.000-07:00</published><updated>2009-03-28T16:14:07.521-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lusca'/><title type='text'>shortcomings in the async io code</title><content type='html'>Profiling the busy(ish) Lusca nodes during the Mozilla 3.0.8 release cycle has shown significant CPU wastage in memset() (ie, 0'ing memory) - via the aioRead and aioCheckCallbacks code paths.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problem stems from the disk IO interface inherited from Squid. 
With Squid, there's no explicit cancel-and-wait-for-cancel in either the network or disk IO code, so the async disk IO read code would actually allocate its own read buffer, read into that, and then provide said read buffer to the completion callback to copy said read data out of. If the request is cancelled but the worker thread is currently read()'ing data, it'll read into its own buffer and not a potentially free()'d buffer from the owner. It's a bit inefficient but in the grand scheme of Squid CPU use, it's not that big a waste on modern hardware.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the short term, I'm going to re-jig the async IO code to not zero buffers that are involved in the aioRead() path. In the longer term, I'm not sure. I prefer cancels which may fail - ie, if an operation is in progress, let it complete, if not then return immediately. I'd like this for the network code too, so I can use async network IO threads for less copy network IO (eg FreeBSD and aio_read() / aio_write()); but there are significant amounts of existing code which assume things can be cancelled immediately and assume temporary copies of data are made everywhere. Sigh.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyway - grr'ing aside, fixing the pointless zero'ing of buffers should drop the CPU use for large file operations reasonably noticeably - by at least 10% to 15%. 
I'm sure that'll be a benefit to someone.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7773904100367477047?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7773904100367477047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7773904100367477047' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7773904100367477047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7773904100367477047'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/shortcomings-in-async-io-code.html' title='shortcomings in the async io code'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3847681734279244578</id><published>2009-03-28T15:14:00.000-07:00</published><updated>2009-03-28T15:19:27.441-07:00</updated><title type='text'>Googletalk: "Getting C++ threads to work right"</title><content type='html'>I've been watching a few Google dev talks on YouTube. 
I thought I'd write up a summary of this one:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 51); font-family: 'trebuchet ms'; font-size: 13px; "&gt;http://www.youtube.com/watch?v=mrvAqvtWYb4&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 51); font-family: 'trebuchet ms'; font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 51); font-family: 'trebuchet ms'; font-size: 13px;"&gt;In summary:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Writing "correct" thread code using pthreads and CPU instructions (fencing, for example) requires the code to know what's going on under the hood;&lt;/li&gt;&lt;li&gt;Gluing concurrency to the "side" of a language which was specified without concurrency has proven to be a bit of a problem - eg, concurrent access to different variables in a structure and how various compilers have implemented this (eg, changing a byte in a struct becoming a 32 bit load, 8 bit modify, 32 bit store);&lt;/li&gt;&lt;li&gt;Most programmers should really use higher level constructs, like what the C++0x and Java specification groups have been doing.&lt;/li&gt;&lt;/ul&gt;If you write threaded code or you're curious about it, you should watch this talk. 
It provides a very good overview of the problems and should open your mind up a little to what may go wrong..&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3847681734279244578?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3847681734279244578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3847681734279244578' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3847681734279244578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3847681734279244578'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/googletalk-getting-c-threads-to-work.html' title='Googletalk: &quot;Getting C++ threads to work right&quot;'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2685433406622676617</id><published>2009-03-27T14:21:00.000-07:00</published><updated>2009-03-27T14:31:54.073-07:00</updated><title type='text'>Another open cdn project - mirrorbrain</title><content type='html'>I've been made aware of Mirrorbrain (http://mirrorbrain.org), another project working towards an open CDN framework. 
Mirrorbrain uses Apache as the web server and some Apache module smarts to redirect users between mirrors.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I like it - I'm going to read through their released source and papers to see what clue can be cribbed from them - but they still base the CDN on an untrusted, third-party mirror network out of their control. I still think the path forward to an "open CDN" involves complete control right out to the mirror nodes and, in some places, the network which the mirror nodes live on.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are a couple of shortcomings - most notably, their ASN implementation currently uses snapshots of the BGP network topology table rather than a live BGP feed distributed out to each mirror and DNS node. They also store central indexes of files and attempt to maintain maps of which mirror nodes have which updated versions of files, rather than building on top of perfectly good HTTP/1.1 caching semantics. I wonder why..&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2685433406622676617?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2685433406622676617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2685433406622676617' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2685433406622676617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2685433406622676617'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/another-open-cdn-project-mirrorbrain.html' title='Another open cdn project - 
mirrorbrain'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6191126594874265590</id><published>2009-03-23T14:57:00.000-07:00</published><updated>2009-03-23T15:09:50.819-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Example CDN stats!</title><content type='html'>&lt;div&gt;Here's a snapshot of the global aggregate traffic level:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_37wobiQ3zUs/ScgGXbxob6I/AAAAAAAAACI/C0d1OxNljtg/s1600-h/20090323-aggregate-stats.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 152px;" src="http://4.bp.blogspot.com/_37wobiQ3zUs/ScgGXbxob6I/AAAAAAAAACI/C0d1OxNljtg/s400/20090323-aggregate-stats.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5316506359773556642" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;.. 
and top 10 AS stats from last Sunday (UTC) :&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_37wobiQ3zUs/ScgGNVASclI/AAAAAAAAACA/NmJ4lIufYwY/s1600-h/20090323-asn-stats.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 168px;" src="http://2.bp.blogspot.com/_37wobiQ3zUs/ScgGNVASclI/AAAAAAAAACA/NmJ4lIufYwY/s400/20090323-asn-stats.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5316506186157290066" /&gt;&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6191126594874265590?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6191126594874265590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6191126594874265590' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6191126594874265590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6191126594874265590'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/example-cdn-stats.html' title='Example CDN stats!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_37wobiQ3zUs/ScgGXbxob6I/AAAAAAAAACI/C0d1OxNljtg/s72-c/20090323-aggregate-stats.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1060366745445930373</id><published>2009-03-22T17:13:00.000-07:00</published><updated>2009-03-22T17:14:33.184-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wiki'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>wiki.cacheboy.net</title><content type='html'>I've set up http://wiki.cacheboy.net/, a simple MediaWiki install which will serve as a place for me to braindump stuff.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1060366745445930373?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1060366745445930373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1060366745445930373' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1060366745445930373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1060366745445930373'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/wikicacheboynet.html' title='wiki.cacheboy.net'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2289708532506043404</id><published>2009-03-19T14:30:00.000-07:00</published><updated>2009-03-19T14:35:59.359-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BGP'/><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='as250'/><category scheme='http://www.blogger.com/atom/ns#' term='anycast'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>More "Content Delivery" done open</title><content type='html'>Another network-savvy guy in Europe is doing something content-delivery related: http://www.as250.net/ .&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;AS250 is building a BGP anycast based platform for various 'open' content delivery and other applications. I plan on doing something similar (or maybe just partner with him, I'm not sure!) 
but anycast is only part of my over-all solution space.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;He's put up some slides from a presentation he did earlier in the year:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;http://www.trex.fi/2009/as250-anycast-bgp.pdf&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2289708532506043404?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2289708532506043404/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2289708532506043404' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2289708532506043404'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2289708532506043404'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/more-content-delivery-done-open.html' title='More &quot;Content Delivery&quot; done open'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2705595471056116170</id><published>2009-03-19T12:24:00.001-07:00</published><updated>2009-03-19T12:33:40.648-07:00</updated><title type='text'>Filesystem Specifications, or EXT4 "Losing Data"</title><content type='html'>This is a bit 
off-topic for this blog, but the particular issue at hand bugs the heck out of me.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;EXT4 "meets" the POSIX specifications for filesystems. The specification does not make any requirements for data to be written out in any order - and for very good reason. If the application developer -requires- data to be written out in order, they should serialise their operations through use of fsync(). If they do -not- require it, then the operating system should be free to optimise away the physical IO operations.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As a clueful(!) application developer, -I- appreciate being given the opportunity to provide this kind of feedback to the operating system. I don't want one or the other. I'd like to be able to use both where and when I choose.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Application developers - stop being stupid. Fix your applications. Read and understand the specification and what it provides -everyone- rather than just you.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2705595471056116170?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2705595471056116170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2705595471056116170' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2705595471056116170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2705595471056116170'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/filesystem-specifications-or-ext4.html' title='Filesystem 
Specifications, or EXT4 &quot;Losing Data&quot;'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2325965160650349769</id><published>2009-03-16T16:29:00.001-07:00</published><updated>2009-03-16T16:31:01.879-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Breaking 200mbit..</title><content type='html'>The CDN broke 200mbit at peak today - roughly half mozilla and half videolan.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;200mbit is still tiny in the grand scheme of things, but it proves that things are working fine.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The next goal is to handle 500mbit average traffic during the day, and keep a very close eye on the overheads in doing so (specifically - making sure that things don't blow up when the number of concurrent clients grows.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2325965160650349769?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2325965160650349769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2325965160650349769' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2325965160650349769'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2325965160650349769'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/breaking-200mbit.html' title='Breaking 200mbit..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2659798730578798420</id><published>2009-03-16T15:27:00.001-07:00</published><updated>2009-03-16T15:40:44.708-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>GeoIP backend, or "reinventing the wheel"</title><content type='html'>The first incarnation of the Cacheboy CDN uses 100% GeoIP to redirect users. This is roughly how it goes:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Take a GeoIP map to break up IPs into "country" regions (thanks nerd.dk!);&lt;/li&gt;&lt;li&gt;Take the list of "up" CDN nodes;&lt;/li&gt;&lt;li&gt;For each country in my redirection table, find the CDN node that is up with the highest weight;&lt;/li&gt;&lt;li&gt;Generate a "geo-map" file consisting of the highest-weight "up" CDN node for each country from step 3;&lt;/li&gt;&lt;li&gt;Feed that to the PowerDNS geoip module (thanks Mark @ Wikipedia!)&lt;/li&gt;&lt;/ol&gt;This really is a good place to start - it's simple, it's tested and it provides me with some basic abilities for distributing traffic across multiple sites to both speed up transfer times to end-users and better use the bandwidth available. 
The trouble is that it knows very little about the current state of the "internet" at any point in time. But, as I said, as a first (coarse!) step to get the CDN delivering bits, it worked out.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My next step is to build a much more easily "hackable" backend which I can start adding functionality to. I've reimplemented the geoip backend in Perl and glued it to the "pipe-backend" module in PowerDNS. This simply passes DNS requests to an external process which spits back DNS replies. The trouble is that multiple backend processes will be invoked whether you want them or not. This means that I can't simply load large databases into the backend process, as it'll take time to load, waste RAM, and generally make things scale (less) well.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So I broke out the first memory hungry bit - the "geoip" lookup - and stuffed it into a small C daemon. All the daemon does is take a client IP and answer with the geoip information for that IP. It will periodically check and reload the GeoIP database file in the background if it's changed - maintaining whatever request rate I'm throwing at it rather than pausing for a few seconds whilst things are loaded in.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The "geoip daemon" (let's call it "geoipd") can then be used by the PowerDNS pipe-backend process I'm writing. All this process has to do at the moment is load in the geo maps (which are small) and reload them as required. It sends all geoip requests to the geoipd and uses the reply. If there is a problem talking to the geoipd, the backend process will simply use a weighted round robin of well-connected servers as a last resort.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The aim is to build a flexible backend framework for processing redirection requests which can be used by a variety of applications. 
For example, when it's time for the CDN proxy nodes to also do 302 redirections to "closer" nodes, I can simply reuse a large part of the modular libraries written. When I integrate BGP information into the DNS infrastructure, I can reuse all of those libraries in the CDN proxy redirection logic, or the webserver URL rewriting logic, or anywhere else where it's needed.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The next step? Figuring out how to load balance traffic destined to the same AS / GeoIP region across multiple CDN end nodes. This should let me scale the CDN up to a gigabit of aggregate traffic given the kind of sponsored boxes I'm currently receiving. More to come..&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2659798730578798420?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2659798730578798420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2659798730578798420' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2659798730578798420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2659798730578798420'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/geoip-backend-or-reinventing-wheel.html' title='GeoIP backend, or &quot;reinventing the wheel&quot;'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7357625494353894470</id><published>2009-03-13T22:09:00.001-07:00</published><updated>2009-03-13T22:27:04.874-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Downtime!</title><content type='html'>The CDN had a bit of downtime tonight. It went like this:&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;The first mirror threw a disk;&lt;/li&gt;&lt;li&gt;For some reason, gmirror became unhappy, rather than running on the second disk of the mirror set (I'm guessing the controller went unhappy; there wasn't anything logged to indicate the other disk in the mirror set was failing);&lt;/li&gt;&lt;li&gt;The second mirror started taking load;&lt;/li&gt;&lt;li&gt;For some weird reason, the second mirror hung hard without any logging to explain why.&lt;/li&gt;&lt;/ul&gt;I've replaced the disks in mirror1 and it's slowly rebuilding the content. It probably won't be finished resync'ing the (new) mirror set until tomorrow. Hopefully mirror-2 will stay just as stable as it currently is.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The CDN ended up still serving content whilst the masters were down - the nodes just couldn't download uncached content. So it wasn't a -total- loss.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This just highlights that I really do require another mirror master or two located elsewhere. 
:)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7357625494353894470?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7357625494353894470/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7357625494353894470' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7357625494353894470'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7357625494353894470'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/downtime.html' title='Downtime!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1653105508406478621</id><published>2009-03-09T22:11:00.000-07:00</published><updated>2009-03-09T22:16:45.934-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='mozilla'/><title type='text'>minimising traffic to the backends..</title><content type='html'>Squid/Cacheboy/Lusca has a nifty feature where it'll "piggyback" a client connection on an existing backend connection if the backend response is cachable AND said response is valid for the client (i.e., it's the right variant, doesn't require revalidation at 
that point, etc.)&lt;br /&gt;&lt;br /&gt;I've been using this for the videolan and mozilla downloads. Basically, one client will suck down the whole object, and any other clients which want the same object (say, the 3.0.7 US English win32 update!) will share the same connection.&lt;br /&gt;&lt;br /&gt;A few problems have cropped up.&lt;br /&gt;&lt;br /&gt;Firstly - the "collapsed forwarding" support is not working in this instance. I think the logic is broken for large objects. It was only written for small objects (delaying forwarding of the request until the forwarded response was known to be cachable), and with large objects it denies cachability of the response (well, it forces it to be RELEASEd after it finishes transferring) because of all of the concurrent range requests going on.&lt;br /&gt;&lt;br /&gt;Secondly - Squid/Cacheboy/Lusca doesn't handle range request caching. It'll -serve- range responses for objects it has the data for, but it won't cache partial responses nor will it reassemble them into one chunk. I've been thinking about how to possibly fix that, but for now I'm hacking around the problems with some scripts.&lt;br /&gt;&lt;br /&gt;Finally - the forwarding logic uses the speed of the -slowest- client to determine how quickly to download the file. 
This needs to be changed to use the speed of the -fastest- client to determine how quickly to download said file.&lt;br /&gt;&lt;br /&gt;I need to get these fixed before the next mozilla release cycle if I'm to have a chance of increasing the traffic levels to a gigabit and beyond.&lt;br /&gt;&lt;br /&gt;More to come..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1653105508406478621?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1653105508406478621/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1653105508406478621' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1653105508406478621'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1653105508406478621'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/minimising-traffic-to-backends.html' title='minimising traffic to the backends..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-940607600511397784</id><published>2009-03-09T22:08:00.001-07:00</published><updated>2009-03-09T22:11:07.673-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Cacheboy Outage</title><content 
type='html'>There was a brief outage earlier tonight due to some troubles with the transit provider of one of my sponsors. They're sponsoring the (only) pair of mirror master servers at the moment.&lt;br /&gt;&lt;br /&gt;This will be (somewhat) mitigated when I bring up another set of mirror master servers elsewhere.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-940607600511397784?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/940607600511397784/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=940607600511397784' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/940607600511397784'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/940607600511397784'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/cacheboy-outage.html' title='Cacheboy Outage'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5369960841663346607</id><published>2009-03-08T23:19:00.001-07:00</published><updated>2009-03-08T23:23:39.747-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Cacheboy is pushing bits..</title><content type='html'>The Cacheboy CDN is 
currently pushing about 1.2 TB a day (~ 100mbit average) of mozilla and videolan downloads out of 5 separate CDN locations. A couple of the servers are running LUSCA_HEAD and they seem to handle the traffic just fine.&lt;br /&gt;&lt;br /&gt;The server assignment is currently being done through GeoIP mapping via DNS. I've brought up BGP sessions to each of the sites to eventually use in the request forwarding process.&lt;br /&gt;&lt;br /&gt;All in all, things are going reasonably successfully so far. There have been a few hiccups which I'll blog about over the next few days but the bits are flowing, and no one is complaining. :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5369960841663346607?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5369960841663346607/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5369960841663346607' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5369960841663346607'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5369960841663346607'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/03/cacheboy-is-pushing-bits.html' title='Cacheboy is pushing bits..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-432808042002568857</id><published>2009-02-27T23:04:00.000-08:00</published><updated>2009-02-27T23:06:33.475-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cdn'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Cacheboy CDN is online!</title><content type='html'>There have been a few changes!&lt;br /&gt;&lt;br /&gt;* The "Cacheboy proxy" development has become Lusca; that's spun off into a little separate project of its own.&lt;br /&gt;* The "Cacheboy" project is now focusing on providing an open source platform for content delivery. I've organised some donated hardware (some donated by me), some donated bandwidth (again, some donated by me) and a couple of test projects to serve content for.&lt;br /&gt;&lt;br /&gt;More details to come!&lt;br /&gt;&lt;br /&gt;(As a side note, I've got too many blogs; I think it's time to rationalise them down to one or two and use labels to correctly identify which is which.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-432808042002568857?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/432808042002568857/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=432808042002568857' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/432808042002568857'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/432808042002568857'/><link rel='alternate' 
type='text/html' href='http://cacheboy.blogspot.com/2009/02/cacheboy-cdn-is-online.html' title='Cacheboy CDN is online!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2434897275772904171</id><published>2009-02-23T23:10:00.001-08:00</published><updated>2009-02-23T23:11:27.620-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BGP'/><category scheme='http://www.blogger.com/atom/ns#' term='lusca'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Lusca and BGP, take 2.</title><content type='html'>I've ironed out the crash kinks (the rest of the "kinks" are in the BGP FSM implementation); thus I'm left with:&lt;br /&gt;&lt;br /&gt;1235459412.856  17063 118.92.109.x TCP_REFRESH_HIT/206 33405 GET http://videolan.cdn.cacheboy.net/vlc/0.9.8a/win32/vlc-0.9.8a-win32.exe - NONE/- application/x-msdownload AS7657&lt;br /&gt;1235459417.194   1113 202.150.98.x TCP_HIT/200 45637 GET http://videolan.cdn.cacheboy.net/vlc/0.9.8a/win32/vlc-0.9.8a-win32.exe - NONE/- application/x-msdownload AS17746&lt;br /&gt;&lt;br /&gt;Notice how the Squid logs have AS numbers in them? 
:)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2434897275772904171?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2434897275772904171/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2434897275772904171' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2434897275772904171'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2434897275772904171'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/02/lusca-and-bgp-take-2.html' title='Lusca and BGP, take 2.'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2442489314557137463</id><published>2009-02-23T08:46:00.001-08:00</published><updated>2009-02-23T08:49:19.474-08:00</updated><title type='text'>Lusca and BGP</title><content type='html'>I've been fleshing out some very, very basic BGP support in a lusca-head branch. I'm only using the BGP information right now for  logging but I'll eventually use it as part of the request and reply processing.&lt;br /&gt;&lt;br /&gt;It *cough* mostly works. I need to figure out why there's occasional radix tree corruption (which probably means running it under valgrind to find when the radix code goes off the map..) 
and un-dirty some of the BGP code (ie, implement a real FSM; proper separation of the protocol handling, FSM, network and RIB code) and add in the AS path/community/attribute stuff before I commit it to LUSCA_HEAD.&lt;br /&gt;&lt;br /&gt;It is kind of cool though having a live BGP feed in your application. :) All 280,000 odd routes of it. :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2442489314557137463?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2442489314557137463/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2442489314557137463' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2442489314557137463'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2442489314557137463'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/02/lusca-and-bgp.html' title='Lusca and BGP'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-184886102248225945</id><published>2009-02-01T21:36:00.001-08:00</published><updated>2009-02-01T21:43:23.533-08:00</updated><title type='text'>Lusca development, and changes to string handling</title><content type='html'>I've just renamed Cacheboy to "Lusca". 
I've had a few potential users comment that "Cacheboy" isn't uhm, "management compatible", so the project has been renamed to try and bring some of these users on board. I'm also hoping to make Lusca less Adrian-focused and involve more of the community. We'll see how that goes.&lt;br /&gt;&lt;br /&gt;In terms of development, I've shifted the code to http://code.google.com/p/lusca-cache/ and I'm continuing my work in /branches/LUSCA_HEAD.&lt;br /&gt;&lt;br /&gt;I've been working on src/http.c (the server-side HTTP code) in preparation for introducing reference counted buffer/string handling. I removed one copy (of the socket read buffer into another memory buffer, to assemble a buffer containing the HTTP reply, in preparation for parsing) and have just migrated that bit of the codebase over to use my reference counted buffer (buf_t; found in libmem/buf.[ch].) It's entirely possible that I've horribly broken the server-side code so I'm reluctant to do much else until I've finished restructuring and testing the server-side HTTP code.&lt;br /&gt;&lt;br /&gt;I've also been tidying up a few more places where the current String API is used "incorrectly", at least incorrectly for reference counted strings/buffers. I have ~ 61 code chunks to rewrite, mostly in the logging code. I've done it twice already in other branches, so this won't be terribly difficult. It's just boring. :)&lt;br /&gt;&lt;br /&gt;Oh, and I've also just removed the "caching" bits of the MemPools code. MemPools in LUSCA_HEAD is now just a small wrapper around malloc/calloc/free, mainly to preserve the "block allocator" style API and keep some statistics. At the end of the day, Squid uses memory very, very poorly and the caching code in MemPools exists purely to paper over said poor memory use. I'm going to just fix the memory use (mostly revolving around String buffers, HTTP headers and the TLV code, amazing that!) so the number of calls through the allocator is much, much reduced. 
I'm guessing once I've finished, the number of calls through the system allocator will be about 2 or 3% of what they are now. That should drop the CPU use quite a bit.&lt;br /&gt;&lt;br /&gt;Ah, now to find testers..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-184886102248225945?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/184886102248225945/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=184886102248225945' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/184886102248225945'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/184886102248225945'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/02/lusca-development-and-changes-to-string.html' title='Lusca development, and changes to string handling'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4897964105129503207</id><published>2009-01-20T15:31:00.000-08:00</published><updated>2009-01-20T15:56:48.398-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' 
term='oprofile'/><title type='text'>Where the CPU is going</title><content type='html'>Oprofile is fun.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, lets find out all of the time spent in cacheboy-head, per-symbol, with accumulative time, but only showing symbols taking 1% or more of CPU:&lt;/div&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -la -t 1 ./squid&lt;br /&gt;CPU: PIII, speed 634.485 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000&lt;br /&gt;samples  cum. samples  %        cum. %     image name               symbol name&lt;br /&gt;2100394  2100394        6.9315   6.9315    libc-2.3.6.so            memcpy&lt;br /&gt;674036   2774430        2.2244   9.1558    libc-2.3.6.so            vfprintf&lt;br /&gt;657729   3432159        2.1706  11.3264    squid                    memPoolAlloc&lt;br /&gt;463901   3896060        1.5309  12.8573    libc-2.3.6.so            _int_malloc&lt;br /&gt;453978   4350038        1.4982  14.3555    libc-2.3.6.so            strncasecmp&lt;br /&gt;442439   4792477        1.4601  15.8156    libc-2.3.6.so            re_search_internal&lt;br /&gt;438752   5231229        1.4479  17.2635    squid                    comm_select&lt;br /&gt;423196   5654425        1.3966  18.6601    squid                    memPoolFree&lt;br /&gt;418949   6073374        1.3826  20.0426    squid                    stackPop&lt;br /&gt;412394   6485768        1.3609  21.4036    squid                    httpHeaderIdByName&lt;br /&gt;402709   6888477        1.3290  22.7325    libc-2.3.6.so            strtok&lt;br /&gt;364201   7252678        1.2019  23.9344    squid                    httpHeaderClean&lt;br /&gt;359257   7611935        1.1856  25.1200    squid                    statHistBin&lt;br /&gt;343628   7955563        1.1340  26.2540    squid                    SQUID_MD5Transform&lt;br 
/&gt;330128   8285691        1.0894  27.3434    libc-2.3.6.so            memset&lt;br /&gt;323962   8609653        1.0691  28.4125    libc-2.3.6.so            memchr&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div&gt;Ok, thats sort of useful. Whats unfortunate is that there's uhm, a lot more symbols than that:&lt;/div&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -la ./squid | wc -l&lt;br /&gt;595&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div&gt;Ok, so thats a bit annoying. 16 symbols take ~ 28% of the CPU time, but the other 569 odd take the ~ 72% remaining CPU. This sort of makes traditional optimisation techniques a bit pointless now. I've optimised almost all of the "stupid" bits - double/triple copying of data, over-allocating and freeing pointlessly, multiple parsing attempts, etc.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;How many samples in total?&lt;/div&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -l ./squid | cut -f1 -d' ' | awk '{ s+= $1; } END { print s }'&lt;br /&gt;30302294&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div&gt;Lets look now at what memcpy() is doing, just to get an idea of what needs to be changed&lt;/div&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -lc -t 1 -i memcpy ./squid&lt;br /&gt;CPU: PIII, speed 634.485 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000&lt;br /&gt;samples  %        image name               symbol name&lt;br /&gt;-------------------------------------------------------------------------------&lt;br /&gt; 28133     1.3394  squid                    storeSwapOut&lt;br /&gt; 31515     1.5004  squid                    stringInit&lt;br /&gt; 32619     1.5530  squid                    httpBuildRequestPrefix&lt;br /&gt; 54237     2.5822  squid 
                   strListAddStr&lt;br /&gt; 54322     2.5863  squid                    storeSwapMetaBuild&lt;br /&gt; 80047     3.8110  squid                    clientKeepaliveNextRequest&lt;br /&gt; 171738    8.1765  squid                    httpHeaderEntryParseCreate&lt;br /&gt; 211091   10.0501  squid                    httpHeaderEntryPackInto&lt;br /&gt; 318793   15.1778  squid                    stringDup&lt;br /&gt; 1022812  48.6962  squid                    storeAppend&lt;br /&gt;2100394  100.000  libc-2.3.6.so            memcpy&lt;br /&gt; 2100394  100.000  libc-2.3.6.so            memcpy [self]&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div&gt;So hm, half the memcpy() CPU time comes from storeAppend, followed by stringDup and httpHeaderEntryPackInto. Ok, those are what I'm going to be working on eliminating next anyway, so it's not a big deal. This means I'll eliminate ~ 73% of the memcpy() CPU time, which is 73% of 7%, so around 5% of CPU time. Not too shabby. There'll be some overheads introduced by how it's done (reference counted buffer management) but one of the side-effects of that should be a drop in the number of calls to the memory allocator functions, so they should drop off a bit.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;But this stuff is still just micro-optimisation. What I need is an idea of what code -paths- are taking up precious CPU time and thus what I should consider first to reimplement. Let's use the "-t" on non-top-level symbols. 
To start with, lets look at the two top-level "read" functions, which generally lead to some kind of other processing.&lt;/div&gt;&lt;br /&gt;&lt;pre&gt;root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -lc -t 1 -i clientReadRequest ./squid&lt;br /&gt;CPU: PIII, speed 634.485 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000&lt;br /&gt;samples  %        symbol name&lt;br /&gt;-------------------------------------------------------------------------------&lt;br /&gt;  87536     4.7189  clientKeepaliveNextRequest&lt;br /&gt;  1758418  94.7925  comm_select&lt;br /&gt;88441    100.000  clientReadRequest&lt;br /&gt;  2121926  86.3731  clientTryParseRequest&lt;br /&gt;  88441     3.6000  clientReadRequest [self]&lt;br /&gt;  52951     2.1554  commSetSelect&lt;br /&gt;-------------------------------------------------------------------------------&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;root@jennifer:/home/adrian/work/cacheboy/branches/CACHEBOY_HEAD/src# opreport -lc -t 1 -i httpReadReply ./squid&lt;br /&gt;CPU: PIII, speed 634.485 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 90000&lt;br /&gt;samples  %        symbol name&lt;br /&gt;-------------------------------------------------------------------------------&lt;br /&gt;  3962448  99.7463  comm_select&lt;br /&gt;163081   100.000  httpReadReply&lt;br /&gt;  2781096  53.2193  httpAppendBody&lt;br /&gt;  1857597  35.5471  httpProcessReplyHeader&lt;br /&gt;  163081    3.1207  httpReadReply [self]&lt;br /&gt;  57084     1.0924  memBufGrow&lt;br /&gt;------------------------------------------------------------------------------&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div&gt;Here we're not interested in who is -calling- these functions (since its just the comm routine :) but which functions this 
routine is calling. The next trick, of course, is to try and figure out which of these paths are taking a noticeable amount of CPU time. Obviously httpAppendBody() and httpProcessReplyHeader() are; they're doing both a lot of copying and a lot of parsing.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;I'll look into things a little more in-depth in a few days; I need to get back to paid work. :)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4897964105129503207?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4897964105129503207/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4897964105129503207' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4897964105129503207'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4897964105129503207'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/where-cpu-is-going.html' title='Where the CPU is going'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7321879568832517066</id><published>2009-01-19T20:09:00.002-08:00</published><updated>2009-01-19T20:24:09.271-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' 
term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='oprofile'/><title type='text'>Eliminating copies, or "god this code is horrible"</title><content type='html'>I've been (slowlyish!) unwinding some of the evil horridness that exists in the src/http.c code which handles reading data from upstream servers/caches, parsing it, and throwing it into the store.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's two annoying memory copies as I've said before - one was a copy of the incoming data into a MemBuf, used -just- to assemble the full response headers for parsing, and the other (well, other two) are for appending the data coming in from the network into the memory store, on its way to the client-side code to be sent back to the client.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, as I've said before, the src/http.c code isn't all that long and complicated (by far most of the logic actually happens in the forward and client-side routines; the http.c routines do very little besides pump data back into the memory store) but unfortunately enough various layers of logic are mashed together to make things uhm, "very difficult" to work on separately.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyway, back on track. I've mostly pulled apart the code which handles reading the reply and parsing the response headers, and I've eliminated the first copy. 
The data is now read directly into a MemBuf, which serves as both the incoming buffer (which gets appended to) for the reply status line + headers, _AND_ the incoming buffer for HTTP body data (which never gets appended to - it is written out to the memory store and then reset back to empty.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So the good news now is that the number one source of L2 loads, L2 stores and CPU cycles spent unhalted (as measured on my P3 667 MHz Celeron test box, nice and slow, to expose all those stupid inefficiencies modern CPUs try to cover up :) is the memcpy() on the path src/http.c -&gt; { header parsing (12%), http body appending (84%) } -&gt; storeAppend().&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This means one main thing - if I can eliminate the copying into the store, and instead read directly into variable-sized pages (which is unfortunately the bloody tricky part), which are then handed in their entirety to the memory store, that last memcpy() will be eliminated, along with hopefully a good 10+% of CPU time on this P3.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;After that, it's fixing the various uses of *printf() functions in the critical path, which absolutely should be avoided. I've got some basic patches to begin replacing some of the really STUPID uses of those. 
I'll begin committing the really obviously easy ones to Cacheboy HEAD once I've verified they don't break anything (in particular, SNMP indexes of all things..)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once the two above are done, which together account for a good 15 - 20% of the current CPU use in Cacheboy (at least in my small-object, memory-cache-only test load on the above hardware), I'll absolutely stop adding any and all new changes, features, optimisations, etc, and go -straight- to "make everything stable" mode again.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's still so much that needs doing (proper refcounted buffers and strings, comm library functions which properly implement readv() and writev() so I can do things like write out the entire request/reply using vector operations and avoid the other bits of copying which go on, lessening the load on the memory allocator by actually efficiently packing structures, rewriting the http request/reply handling in preparation for replacement HTTP client/server modules, oh and IPv6/threading!) 
but that will come later.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7321879568832517066?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7321879568832517066/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7321879568832517066' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7321879568832517066'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7321879568832517066'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/eliminating-copies-or-god-this-code-is_19.html' title='Eliminating copies, or &quot;god this code is horrible&quot;'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7205886175677459709</id><published>2009-01-19T20:09:00.001-08:00</published><updated>2009-01-19T20:09:45.145-08:00</updated><title type='text'>Eliminating copies, or "god this code is horrible"</title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7205886175677459709?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://cacheboy.blogspot.com/feeds/7205886175677459709/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7205886175677459709' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7205886175677459709'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7205886175677459709'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/eliminating-copies-or-god-this-code-is.html' title='Eliminating copies, or &quot;god this code is horrible&quot;'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3994473508443428708</id><published>2009-01-18T12:28:00.001-08:00</published><updated>2009-01-18T12:34:21.568-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Tidying up the http reply handling code..</title><content type='html'>One of the unfortunate parts of the Squid codebase is that the HTTP request and reply handling code is messed up with the client and server code, and contains both stuff specific to a Cache (eg, looking for headers to control cache behaviour) as well as connection stuff (eg Transfer Encoding stuff, Keepalive, etc.)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My long-term goal is to finally separate all of this mess out so there's "generic" 
routines to be an HTTP client and server, create requests/replies and parse responses. But for now, tidying up some of the messy code to improve performance (and thus give people motivation to migrate their busy sites to Cacheboy) is on my short-term TODO list.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I spent some time ~ 18 months ago tidying up all of the client-side code so the request line and request header parsing didn't require half a dozen copies of various things just to complete. That was quite successful. The code structure is still horrible, but it works, and that for now is absolutely the most important part.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now I'm doing something similar to the server-side code. The HTTP server code (src/http.c) combines reply buffer appending, parsing, 100-continue response handling (well, "handling") and the various header checks for caching and connection in one enormous puddle of code. I'm trying to tease these apart so each part is done separately and the reply data isn't double-copied - once into the reply buffer, then once via storeAppend() into the memory store.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The CPU time spent doing this copying isn't all that high on current systems but it is definitely noticeable (~30% of all CPU time spent in memcpy()) for slower systems talking to LAN-connected servers. 
So I'm going to do it - primarily to fix performance on slower hardware, but it also forces me to tidy up the existing code somewhat.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The next step is avoiding the copy into the memory store entirely, removing another 65% or so of memcpy() CPU time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3994473508443428708?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3994473508443428708/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3994473508443428708' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3994473508443428708'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3994473508443428708'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/tidying-up-http-reply-handling-code.html' title='Tidying up the http reply handling code..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4640741584294544117</id><published>2009-01-16T10:41:00.000-08:00</published><updated>2009-01-16T10:55:05.064-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' 
term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Refcounted string buffers!</title><content type='html'>Those of you who have been watching may have noticed a few String tidyups going into CACHEBOY_HEAD recently (one of which caused a bug in the first cacheboy-1.6 stable release that made it very non-stable!)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is all in preparation for more sensible string and buffer handling. Unfortunately the Cacheboy codebase inherited a lot of dirty string handling and it needed some house cleaning before I could look towards the future.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Well, the future is here now (well, in /svn/branches/CACHEBOY_HEAD_strref ...) - I brought in my refcounted buffer routines from my previous attempts at all of this and converted String.[ch] over to use it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For now, the refcounted string implementation doubles the malloc overhead for new strings (since it has to create a small buf_t and a string buffer) but stringDup() becomes essentially free. Since in a lot of cases, the stringDup() occurs when copying string headers and basically leaving them alone, this saves on a bunch of memory copying.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Decent performance benefits will only come with a whole lot of work:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Remove all of the current assumptions in code which uses String that the actual backing buffer (accessible via strBuf()) is NUL-terminated;&lt;/li&gt;&lt;li&gt;Rewrite sections of the code which go between String and C string buffers (with copying, etc) to use String where applicable. Unfortunately a whole lot of the original client_side.c code which handles parsing the request involves a fair bit of crap - so..&lt;/li&gt;&lt;li&gt;.. 
writing replacement request and reply HTTP parsers is probably the next thing to do;&lt;/li&gt;&lt;li&gt;Shuffling around the client-side code and the http code to use a buf_t as an incoming socket buffer, instead of how they currently do things (in an ugly way..)&lt;/li&gt;&lt;li&gt;Propagate down the incoming socket buffer to the request/reply parsing code, so said code can simply create references to the original socket buffer, bypassing any need to copy the request/reply data separately.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;I'm reasonably excited about the future benefits this code holds, but for now I'm going to remain reasonably conservative and leave the current String improvements where they are. I don't mind if these and the next round of changes to the MemBuf code reduce performance but improve the code; I know that the medium-term goal is going to provide some pretty decent benefits and I want to keep things stable and usable in production whilst I get there.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Next on my list, though: looking at removing the places where *printf() is used in critical sections..&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4640741584294544117?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4640741584294544117/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4640741584294544117' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4640741584294544117'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4640741584294544117'/><link rel='alternate' 
type='text/html' href='http://cacheboy.blogspot.com/2009/01/refcounted-string-buffers.html' title='Refcounted string buffers!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-857576985502623480</id><published>2009-01-09T07:08:00.000-08:00</published><updated>2009-01-09T07:11:55.340-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><category scheme='http://www.blogger.com/atom/ns#' term='oprofile'/><title type='text'>More profiling!</title><content type='html'>&lt;p&gt;The following info is for a test with 10,000 concurrent keep-alive connections, each just fetching an internal icon object from Squid. This is using my apachebench-adrian package which can handle such traffic loads.&lt;br /&gt;&lt;br /&gt;The below accounts for roughly 60% of total CPU time (ie, 60% of the CPU is spent in userspace) on one core.&lt;br /&gt;With oprofile running, it hits around 12,300 transactions a second.&lt;br /&gt;&lt;br /&gt;I have much, much hatred for how Squid uses *printf() everywhere. Sigh.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;div&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;CPU: AMD64 processors, speed 2613.4 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000&lt;br /&gt;samples  cum. samples  %        cum. 
%     image name               symbol name&lt;br /&gt;5383709  5383709        4.5316   4.5316    libc-2.6.1.so            vfprintf&lt;br /&gt;4025991  9409700        3.3888   7.9203    libc-2.6.1.so            memcpy&lt;br /&gt;3673722  13083422       3.0922  11.0126    libc-2.6.1.so            _int_malloc&lt;br /&gt;3428362  16511784       2.8857  13.8983    libc-2.6.1.so            memset&lt;br /&gt;3306571  19818355       2.7832  16.6815    libc-2.6.1.so            malloc_consolidate&lt;br /&gt;2847887  22666242       2.3971  19.0787    squid                    memPoolFree&lt;br /&gt;2634120  25300362       2.2172  21.2958    libm-2.6.1.so            floor&lt;br /&gt;2609922  27910284       2.1968  23.4927    squid                    memPoolAlloc&lt;br /&gt;2408836  30319120       2.0276  25.5202    libc-2.6.1.so            re_search_internal&lt;br /&gt;2296612  32615732       1.9331  27.4534    libc-2.6.1.so            strlen&lt;br /&gt;2265816  34881548       1.9072  29.3605    libc-2.6.1.so            _int_free&lt;br /&gt;1826493  36708041       1.5374  30.8979    libc-2.6.1.so            _IO_default_xsputn&lt;br /&gt;1641986  38350027       1.3821  32.2800    libc-2.6.1.so            free&lt;br /&gt;1601997  39952024       1.3484  33.6285    squid                    httpHeaderGetEntry&lt;br /&gt;1575919  41527943       1.3265  34.9549    libc-2.6.1.so            memchr&lt;br /&gt;1466114  42994057       1.2341  36.1890    libc-2.6.1.so            re_string_reconstruct&lt;br /&gt;1275377  44269434       1.0735  37.2625    squid                    clientTryParseRequest&lt;br /&gt;1214714  45484148       1.0225  38.2850    squid                    httpMsgFindHeadersEnd&lt;br /&gt;1185932  46670080       0.9982  39.2832    squid                    statHistBin&lt;br /&gt;1170361  47840441       0.9851  40.2683    squid                    urlCanonicalClean&lt;br /&gt;1169694  49010135       0.9846  41.2529    libc-2.6.1.so            strtok&lt;br /&gt;1145933  
50156068       0.9646  42.2174    squid                    comm_select&lt;br /&gt;1128595  51284663       0.9500  43.1674    libc-2.6.1.so            __GI_____strtoll_l_internal&lt;br /&gt;1116573  52401236       0.9398  44.1072    squid                    httpHeaderIdByName&lt;br /&gt;956209   53357445       0.8049  44.9121    squid                    SQUID_MD5Transform&lt;br /&gt;915844   54273289       0.7709  45.6830    squid                    memBufAppend&lt;br /&gt;907609   55180898       0.7640  46.4469    squid                    stringLimitInit&lt;br /&gt;898666   56079564       0.7564  47.2034    libc-2.6.1.so            strspn&lt;br /&gt;883282   56962846       0.7435  47.9468    squid                    urlParse&lt;br /&gt;852875   57815721       0.7179  48.6647    libc-2.6.1.so            calloc&lt;br /&gt;819613   58635334       0.6899  49.3546    squid                    clientWriteComplete&lt;br /&gt;800196   59435530       0.6735  50.0281    squid                    httpMsgParseRequestLine&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-857576985502623480?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/857576985502623480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=857576985502623480' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/857576985502623480'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/857576985502623480'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/more-profiling.html' title='More 
profiling!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2873957127576524364</id><published>2009-01-08T15:53:00.000-08:00</published><updated>2009-01-08T15:56:26.981-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TPROXY'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>FreeBSD TPROXY works!</title><content type='html'>The FreeBSD TPROXY support (with a patched FreeBSD kernel for now) works just fine in testing.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm going to commit the changes to FreeBSD in the next couple of days. 
I'll then bring in the TPROXY4 support from Squid-3, and hopefully get functioning TPROXY2, TPROXY4 and FreeBSD TPROXY support into the upcoming Cacheboy-1.6 release.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2873957127576524364?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2873957127576524364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2873957127576524364' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2873957127576524364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2873957127576524364'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/freebsd-tproxy-works.html' title='FreeBSD TPROXY works!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2277920059214784516</id><published>2009-01-07T14:43:00.000-08:00</published><updated>2009-01-07T14:45:39.144-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>TPROXY support</title><content type='html'>G'day,&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've done a bit of shuffling in the communication code to include a 
more modular approach to IP source address spoofing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's (currently) untested support for some FreeBSD source IP address spoofing that I'm bringing over courtesy of Julian Elischer; and there's a Linux TPROXY2 module.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'll look at porting over the TPROXY4 support from Squid-3 in a few days.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I think this release is about as close to "stable" as Cacheboy-1.6 is going to get, so look forward to a "stable" release as soon as the FreeBSD port has been set up.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I already have a list of things to do for Cacheboy-1.7 which should prove to be interesting. Stay tuned..&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2277920059214784516?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2277920059214784516/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2277920059214784516' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2277920059214784516'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2277920059214784516'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/tproxy-support.html' title='TPROXY support'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7786050132140719206</id><published>2009-01-04T18:54:00.000-08:00</published><updated>2009-01-04T19:03:58.473-08:00</updated><title type='text'>next steps..</title><content type='html'>I've been slowly fixing whatever bugs creep up in my local testing. The few people publicly testing Cacheboy-1.6 have reported that all the bugs have been fixed. I'd appreciate some further testing but I'll get what I can for now. :)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'll be doing a few things to the CACHEBOY_HEAD branch after the 1.6 release which will hopefully lead towards a mostly thread-safe core. I'm also busy documenting various things in the core libraries which I haven't yet gotten around to. I also really should sort out some changes to the apple HeaderDoc software to support generating slightly better looking documents from C source.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've almost finished the first round of code reorganisation. The stuff that I would have liked to have done this round includes shuffling out the threaded async operations from the AUFS code and building a proper disk library, sort of like what Squid-2.2 "had" and what Squid-3.0 almost did; but instead use them to implement general disk IO versus specialised modules just for storage. 
I'd like to take advantage of threaded/non-blocking disk IO in a variety of situations, including logfile writing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7786050132140719206?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7786050132140719206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7786050132140719206' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7786050132140719206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7786050132140719206'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2009/01/next-steps.html' title='next steps..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-244862020820475004</id><published>2008-12-28T12:04:00.000-08:00</published><updated>2008-12-28T12:08:12.337-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Cacheboy-HEAD updates</title><content type='html'>I've finished cleaning up the bits of the IPv6 work from CACHEBOY_HEAD - it should be just a slightly better 
structured Squid-2.HEAD / Cacheboy-1.5.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Right now I'm pulling out as much of the HTTP related code from src/ into libhttp/ before the 1.6 release. I'm hoping to glue together bits and pieces of the HTTP code into a very lightweight (for Squid) HTTP server implementation which can be used to test out various things like thread safe-ness. Of course, properly testing thread-safeness in production relies on a lot of the other code being thread-safe, like the comm code, the event registration code, the memory allocation code, the debugging and logging code ... aiee, etc. Oh well, I said I wanted to..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm also going through and adding some headerDoc comments to various library files. headerDoc (from apple) is actually rather nice. It lacks -one- function - the ability to merge multiple files together (say, libsqinet/sqinet.[ch]) into one "module" for documentation. I may look at doing that in some of my spare time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-244862020820475004?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/244862020820475004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=244862020820475004' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/244862020820475004'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/244862020820475004'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/12/cacheboy-head-updates.html' title='Cacheboy-HEAD 
updates'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5116799852628674206</id><published>2008-12-27T15:28:00.000-08:00</published><updated>2008-12-27T15:36:03.159-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Reverting IPv6 for now; moving forward with structural changes</title><content type='html'>I've been working on the IPv6 support in Cacheboy for a couple months now and I've come to the conclusion that I'm not getting anywhere near as far along the development path as I wanted to be.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So I've taken a rather drastic step - I've branched off CACHEBOY_HEAD from the last point along the main codebase where the non-intrusive IPv6 changes had occurred and I'm going to pursue Cacheboy-1.6 development from that.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The primary short-term goal with Cacheboy was to restructure the codebase in such a way as to make further development much, much simpler. I sort of lost track with the IPv6 development stuff and I rushed it in when the codebase obviously wasn't ready.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, the IPv6 changes will stay in the CACHEBOY_PRE branch for now; development will continue in CACHEBOY_HEAD. I'll continue the restructuring work and stability work towards a Cacheboy-1.6 release come January 1. 
I'll then look at merging over the IPv6 infrastructure work into CACHEBOY_HEAD far before I merge in the client and server related code - specifically, completing the DNS updates, the ipcache/fqdncache updates, port over the IPv6 SNMP changes from Squid-3, and look at modularising the ACL code in preparation for IPv6'ifying that. The goal is less to IPv6-ify Cacheboy; it's more to tidy up the code to the point where IPv6 becomes trivial.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5116799852628674206?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5116799852628674206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5116799852628674206' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5116799852628674206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5116799852628674206'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/12/reverting-ipv6-for-now-moving-forward.html' title='Reverting IPv6 for now; moving forward with structural changes'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2451409792519408548</id><published>2008-11-29T12:19:00.000-08:00</published><updated>2008-11-29T12:49:29.383-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nnrp'/><category scheme='http://www.blogger.com/atom/ns#' term='nntp'/><category scheme='http://www.blogger.com/atom/ns#' term='usenet'/><title type='text'>Working on NNRP proxies, or writing threaded C code..</title><content type='html'>So it turns out that I'm working on some closed-source NNRP proxy code. It sits between clients / readers and backend spool servers and directs/balances requests to the correct backend servers as required.&lt;br /&gt;&lt;br /&gt;News is a kind of interesting setup. There are servers with just lots of articles, indexed via message ID (or a hash thereof.) There are servers with the overview databases, which keep track of article ids, message ids, group names, and all that junk. The client reader interface has a series of commands which may invoke a combination of access to both the overview databases and the article spools.&lt;br /&gt;&lt;br /&gt;I'm working on a bit of code which began life as a reader -&gt; spool load balancer; I'm turning it into a general reader and client facing bit of software which speaks enough NNRP to route connections to the relevant backend servers. The architecture is pretty simplistic - one thread per backend connection, one thread per client connection, "message queues" sit between all of these and use pthread primitives to serialise access. For the number of requests and concurrency, it scales quite well. 
It won't scale to 100,000 connections by any means but considering the article sizes (megabytes at a time) a 10GE pipe will be filled far, far before that sort of connection and request rate limit is reached.&lt;br /&gt;&lt;br /&gt;So, today's post is going to cover a few things I've learnt whilst writing this bit of code. It's in C, so by its very nature it's going to be horrible. The question is whether I can make it less horrible to work with.&lt;br /&gt;&lt;br /&gt;Each client thread sits in a loop reading requests, parsing them, figuring out what needs to happen, then queuing messages to the relevant spool or reader server queue to be handled by one of the connection threads. It's relatively straightforward. The trick is to figure out how to keep connections around long enough so the thing you've sent the request to is still there when you reply.&lt;br /&gt;&lt;br /&gt;There's a couple of options which are used in the codebase.&lt;br /&gt;&lt;br /&gt;The first is what the previous authors did - they would parse an article request (ARTICLE, HEAD, BODY), create a request, push it onto the queue, and wait 1 second for a reply. If the reply didn't occur in that time they would push another request to another server. The idea is to minimise latency on the article fetches - instead of waiting around for a potentially overloaded server, they just queue requests to the other servers which may have the article and then stop queuing requests when one issues a reply. The rest of the replies then have to be dequeued and tossed away.&lt;br /&gt;&lt;br /&gt;The second is what I did for the reader/overview side - I would parse a client request, (GROUP, STAT, XOVER, etc), create a request to the backend, push it onto the queue, and wait for the reply. 
The backend code took care of trying the set of commands required to handle that client request (eg a STAT &lt;articlenum&gt; would require a GROUP, then a STAT; but a STAT &lt;msgid&gt; would only require a STAT &lt;msgid&gt; on the backend), with explicit timeouts. If the request didn't complete within the timeout, the backend reader thread would send a "timeout" reply to the client thread, and then attempt to complete the transaction before dequeuing the next.&lt;br /&gt;&lt;br /&gt;There are some implications from the above!&lt;br /&gt;&lt;br /&gt;The first method is easier to code and easier to understand conceptually - the client handles timeouts and throws away unwanted responses. The backend server code is easy - dequeue, attempt the request until completion or error, return. The problem is that there is no guaranteed time in which the client will be notified of the completion of the request.&lt;br /&gt;&lt;br /&gt;The second method is trickier. The backend thread handles timeouts and sends them to the client thread. The backend then needs to track the NNTP transaction state so it can resume it and run the request to completion, tossing away whatever data was being returned. The benefit is that the client -will- get a message from the backend in the specified time period.&lt;br /&gt;&lt;br /&gt;These approaches aren't mutually exclusive either. The first works better for article fetches where there isn't any code to try and monitor server performance and issue requests to servers that are responding quickly. I'm going to add that code in soon anyway. The second approach works great for the reader commands because they're either nice and quick, or they're extremely long-lived. Article replies are generally maxing out at a few megabytes. Overview commands can serve back tens or hundreds of megabytes of database information and this can take time.&lt;br /&gt;&lt;br /&gt;One of the important implications is when the client thread can be freed. 
In the first method, the client thread MUST stay around until all the pending article requests have been replied to in some fashion. In the second method, the client thread waits for a response to its message immediately after queuing it, so it doesn't have to reap queue events on connection shutdown.&lt;br /&gt;&lt;br /&gt;The current crash bugs I've seen seem to be related to message queuing. I'm seeing both junk being dequeued from the client reader queue (when there should be NO messages pending in that queue once a command has been fully processed!) and I'm seeing article responses being sent to queues for clients which have been destroyed for one reason or another. I'm going to spend some time over the next few hours putting in assert()ions to track these conditions down and naff them on the head before the stack gets scrambled and I end up with a 4 gigabyte core which gives me absolutely no useful traceback. :P&lt;br /&gt;&lt;br /&gt;Oh look, the application cored again, this time in the GNU malloc code! 
Time to figure out what is going on again..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2451409792519408548?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2451409792519408548/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2451409792519408548' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2451409792519408548'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2451409792519408548'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/11/working-on-nnrp-proxies-or-writing.html' title='Working on NNRP proxies, or writing threaded C code..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-71167261102967858</id><published>2008-11-22T11:31:00.001-08:00</published><updated>2008-11-22T12:04:25.126-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Updates!</title><content type='html'>A few updates!&lt;br /&gt;&lt;br /&gt;I've fixed a few bugs in CACHEBOY_PRE which will be back-ported to CACHEBOY_1.5. This is in line with my current goal of stability before features. 
CACHEBOY_PRE and CACHEBOY_1.5 have passed all the polygraph runs I've been throwing at them and there aren't any outstanding stability issues in the Issue tracker.&lt;br /&gt;&lt;br /&gt;I'll roll CACHEBOY_1.6.PRE3 and CACHEBOY_1.5.2 releases in the next day or two and get those out there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-71167261102967858?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/71167261102967858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=71167261102967858' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/71167261102967858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/71167261102967858'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/11/updates.html' title='Updates!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3258232688895687846</id><published>2008-10-16T21:10:00.000-07:00</published><updated>2008-10-16T21:15:56.440-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ipv6'/><category scheme='http://www.blogger.com/atom/ns#' term='squid'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Serving IPv6 from Cacheboy-1.6.PRE2</title><content 
type='html'>I've done the very minimum amount of work required to get Cacheboy-1.6.PRE2 to the point where it'll handle IPv6 client requests. I've put it in front of http://www.cacheboy.net/ which now has v4 and v6 records.&lt;br /&gt;&lt;br /&gt;There's still plenty of work to do to bring it up to par with the Squid-3 IPv6 support but that will have to wait a while. Specifically, (if anyone feels up to handling it), the dns, ipcache and fqdncache code all needs to be massaged to support IPv4 and IPv6 handling. It shouldn't be that much work.&lt;br /&gt;&lt;br /&gt;Cacheboy-1.6 is definitely now in the "freeze and fix bugs as they creep up" stage. I'll continue the memory allocator and HTTP parser code reimplementation in their respective branches and get them ready for merge once I'm happy 1.6 is stable. The rest of the IPv6 support will also have to wait.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3258232688895687846?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3258232688895687846/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3258232688895687846' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3258232688895687846'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3258232688895687846'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/10/serving-ipv6-from-cacheboy-16pre2.html' title='Serving IPv6 from Cacheboy-1.6.PRE2'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2309540264748054629</id><published>2008-10-03T05:00:00.001-07:00</published><updated>2008-10-03T05:03:32.504-07:00</updated><title type='text'>Cacheboy IPv6 update</title><content type='html'>I've made some progress in the IPv6 reorganisation in cacheboy. I've converted the ACL, authentication and ident code over to support v4/v6. I'm now going to convert over the client_db, request_t structure and then the related stuff like logging, x-forwarded-for, etc. I'll then revisit what else is required before I enable v6 sockets on the http client-side. It -should- be pretty minimal - persistent connections/connection pinning (for just assembling the hash key) and some SNMP code to just gloss over IPv6 connections for the time being.&lt;br /&gt;&lt;br /&gt;Hm, I was hoping to have this all done by the end of September but I've been a bit busy with paid work. I'll hopefully have this done just after NYCBSDCON. I hope. 
:)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2309540264748054629?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2309540264748054629/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2309540264748054629' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2309540264748054629'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2309540264748054629'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/10/cacheboy-ipv6-update.html' title='Cacheboy IPv6 update'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2214895444603839693</id><published>2008-09-21T08:41:00.000-07:00</published><updated>2008-09-21T09:25:16.492-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy ipv6'/><title type='text'>IPv6 ACL code, sort of!</title><content type='html'>I'm just doing a spot of testing with my new IPv6 ACL code.&lt;br /&gt;&lt;br /&gt;Take a look at this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(adrian) agnus:~/work/cacheboy/playpen/ipv6_acl/tools% ./squidclient mgr:config@PASSWORD | grep acl&lt;br /&gt;acl all src 0.0.0.0/0.0.0.0&lt;br /&gt;acl all6 src6 ::/::&lt;br /&gt;acl lclnet6 src6 fe80::/fff0::&lt;br /&gt;acl test1 
src6 2a01:348:147:5::/ffff:ffff:ffff:ffff::&lt;br /&gt;acl test1 src6 fe80::/fff0::&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;That there is an IPv6 "src6" ACL (well, three) with somewhat unfriendly netmask display code. I'll tidy that up later. Importantly, the IPv6 code seems to be coming along fine. I'm going to generate some large random IPv4 and IPv6 ACLs tomorrow to make sure they load in and display out from the splay tree fine, then I'll look at writing some test cases for all of this.&lt;br /&gt;&lt;br /&gt;The last bit of work needed before -very basic- client-side IPv6 support can be enabled is to convert the ACL checklist struct "src_addr" and "my_addr" over to sqaddr_t IPv6 types. This will probably require a whole lot of horrible code changes but luckily I can convert most of them to just be "assign that an IPv4 address thx" and everything should just work as before. Although I need to remind myself to make sure aclMatchIp() checks the _type_ of the ACL it's looking up against - doing an IPv4 lookup against an IPv6 splay tree won't really work out.&lt;br /&gt;&lt;br /&gt;(Amos / Squid-3 have a single IPv6 "type" for this, and the IPv4 addresses are merged into the IPv6 address space. The ACL type for IP src/dst/myip is then -always- an IPv6 type lookup. I decided to keep separate IPv4/IPv6 ACL types for now to make testing and development easier. It will double up on the ACL sizes a little - holy crap, I'm doing something less efficient than Squid-3?!? - but that's a small price to pay at the moment for an easier-to-migrate codebase. 
Basically, if you compile this up and listen on an IPv6 address, but don't configure an IPv6 ACL, you won't get surprised when IPv6 requests are let through when they shouldn't..)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2214895444603839693?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2214895444603839693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2214895444603839693' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2214895444603839693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2214895444603839693'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/09/ipv6-acl-code-sort-of.html' title='IPv6 ACL code, sort of!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-32716749037579545</id><published>2008-09-05T23:29:00.000-07:00</published><updated>2008-09-05T23:31:37.411-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='ipv6'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Cacheboy-1.5: IPv6 DNS servers</title><content type='html'>I'm just debugging the last couple of issues with the IPv6-aware UDP/TCP DNS 
code. The internal DNS resolver still only understands IPv4 code (and, more importantly, the ipcache/fqdncache layer too!) but the code itself will communicate with IPv4/IPv6 DNS servers.&lt;br /&gt;&lt;br /&gt;I think I'll stop the development here and concentrate on getting the Cacheboy-1.5 release out the door. I'll then work on IPv6 record resolution in a separate branch in preparation for Cacheboy-1.6. I may even break out the ipcache/fqdncache code into external libraries so I can reuse/debug/test that code during development.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-32716749037579545?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/32716749037579545/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=32716749037579545' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/32716749037579545'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/32716749037579545'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/09/cacheboy-15-ipv6-dns-servers.html' title='Cacheboy-1.5: IPv6 DNS servers'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8476248486082456240</id><published>2008-09-02T02:34:00.000-07:00</published><updated>2008-09-02T11:00:11.108-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='cacheboy'/><title type='text'>Upcoming Cacheboy-1.5.PRE3 development release</title><content type='html'>(Yes, I've been slack in posting about this stuff.)&lt;br /&gt;&lt;br /&gt;I'm just about to roll the next Cacheboy-1.5 development pre-release. Cacheboy-1.5 is probably the last "almost but not quite squid-2.HEAD" release. Besides the IPv6 core, Cacheboy-1.5 resembles the Squid code but with a more sensible layout of modules and libraries.&lt;br /&gt;&lt;br /&gt;Its main difference is the inclusion of core comm layer changes to support IPv6 in preparation of IPv6 client and server support. This particular pre-release includes some changes to the internal DNS code to decouple it from a few routines in src/ relating to TCP socket connection. 
It's possible I've busted stuff - just run cacheboy with "debug_options ALL,1 78,2" for a while to see if you're falling back to TCP DNS properly.&lt;br /&gt;&lt;br /&gt;I'm about to put Cacheboy-1.5.PRE3 into production for a couple of clients to get some real-world feedback.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8476248486082456240?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8476248486082456240/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8476248486082456240' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8476248486082456240'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8476248486082456240'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/09/upcoming-cacheboy-15pre3-development.html' title='Upcoming Cacheboy-1.5.PRE3 development release'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6817449729634046075</id><published>2008-08-24T21:08:00.000-07:00</published><updated>2008-08-24T21:11:17.211-07:00</updated><title type='text'>Standalone HTTP header parser!</title><content type='html'>I've finally broken out enough of the HTTP header parsing code from src/ into libhttp/ to run the HTTP header parser standalone.&lt;br 
/&gt;&lt;br /&gt;This allows me to write some test cases to make sure I don't break things whilst changing how the HTTP header parser and HTTP header entry code uses (ie, abuses!) the memory allocator. It's also one step closer to being able to reuse bits of the Squid internals in a "simpler" HTTP proxy core.&lt;br /&gt;&lt;br /&gt;I'll commit this code reorganisation to Cacheboy trunk after I've released and tested a few developer previews.&lt;br /&gt;&lt;br /&gt;So, without further delay:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;test1b: test parsing sample headers&lt;br /&gt;| init-ing hdr: 0x7fffffffe6f0 owner: 2&lt;br /&gt;| parsing hdr: (0x7fffffffe6f0)&lt;br /&gt;Host: www.creative.net.au&lt;br /&gt;Content-type: text/html&lt;br /&gt;Foo: bar&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;| creating entry 0x60ed40: near 'Host: www.creative.net.au'&lt;br /&gt;| created entry 0x60ed40: 'Host: www.creative.net.au'&lt;br /&gt;| 0x7fffffffe6f0 adding entry: 27 at 0&lt;br /&gt;| creating entry 0x60eda0: near 'Content-type: text/html'&lt;br /&gt;| created entry 0x60eda0: 'Content-Type: text/html'&lt;br /&gt;| 0x7fffffffe6f0 adding entry: 18 at 1&lt;br /&gt;| creating entry 0x60ee00: near 'Foo: bar'&lt;br /&gt;| created entry 0x60ee00: 'Foo: bar'&lt;br /&gt;| 0x7fffffffe6f0 adding entry: 68 at 2&lt;br /&gt;  retval from parse: 1&lt;br /&gt;  Parsed Header: Host: www.creative.net.au&lt;br /&gt;  Parsed Header: Content-Type: text/html&lt;br /&gt;  Parsed Header: Foo: bar&lt;br /&gt;| cleaning hdr: 0x7fffffffe6f0 owner: 2&lt;br /&gt;| destroying entry 0x60ed40: 'Host: www.creative.net.au'&lt;br /&gt;| destroying entry 0x60eda0: 'Content-Type: text/html'&lt;br /&gt;| destroying entry 0x60ee00: 'Foo: bar'&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6817449729634046075?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6817449729634046075/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6817449729634046075' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6817449729634046075'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6817449729634046075'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/08/standalone-http-header-parser.html' title='Standalone HTTP header parser!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2725689150650917044</id><published>2008-08-21T22:24:00.000-07:00</published><updated>2008-08-21T22:25:33.692-07:00</updated><title type='text'>IPv6 core merged into cacheboy trunk</title><content type='html'>I've just completed merging the IPv6 core into the cacheboy trunk. This doesn't mean it handles IPv6 client/server requests yet - there's a lot more to do before that can happen!&lt;br /&gt;&lt;br /&gt;I'll next merge in the IPv6 DNS changes from husni's Squid-2.6 IPv6 patch and do up a basic test suite for all of that. 
Once done, I'll roll the first Cacheboy-1.5 pre-release.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2725689150650917044?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2725689150650917044/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2725689150650917044' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2725689150650917044'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2725689150650917044'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/08/ipv6-core-merged-into-cacheboy-trunk.html' title='IPv6 core merged into cacheboy trunk'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8165356045333007893</id><published>2008-08-20T02:38:00.000-07:00</published><updated>2008-08-20T02:45:12.282-07:00</updated><title type='text'>Merging sockaddr_rework into Cacheboy trunk</title><content type='html'>I'm slowly cherrypicking bits and pieces of the Cacheboy sockaddr_rework into trunk. I've merged in the no_addr/any_addr tidyup which makes those comparisons and sets much clearer. I'll next bring over the sqinet_ routines as just files, ignoring the change history. 
I'll then bring over the changesets implementing the sqinet_ changes to the comm code and main codebase, retaining the basic change history.&lt;br /&gt;&lt;br /&gt;I now need some live testing under decent amounts of real traffic so I can make sure I haven't missed some silly corner case in the base comm code.&lt;br /&gt;&lt;br /&gt;All of this work exposed some of the ugliness that happens in the IPC code with file descriptor creation that bypasses the comm layer. Basically, the IPC helper code creates file descriptors itself and uses fd_open() to tell Squid about them, but then unconditionally uses comm_close() to close them. This is .. stupid.&lt;br /&gt;&lt;br /&gt;I may drop in some debugging code to ensure that only sockets created by the comm layer are closed by comm_close(). I wonder how many bad uses of file descriptors will be caught out by that..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8165356045333007893?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8165356045333007893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8165356045333007893' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8165356045333007893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8165356045333007893'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/08/merging-sockaddrrework-into-cacheboy.html' title='Merging sockaddr_rework into Cacheboy trunk'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5987767125864825358</id><published>2008-08-12T10:35:00.000-07:00</published><updated>2008-08-12T10:40:28.991-07:00</updated><title type='text'>Cacheboy IPv6 (phase 1): More Updates!</title><content type='html'>The IPv6 code hackery is going along well. I'm just making a few loose ends .. well, slightly less loose.&lt;br /&gt;&lt;br /&gt;I'll run polygraph PolyMix-4 over this codebase in the next few days to make sure I haven't busted anything and then I'll start preparing to merge it back into CACHEBOY_PRE.&lt;br /&gt;&lt;br /&gt;I'm not quite sure how to conditional-compile IPv6; I'm not bothering to do it at the moment (ie, it's always included.) That's a later problem.&lt;br /&gt;&lt;br /&gt;The IPv6 TCP proxy is still happily chugging along. FreeBSD's IPv6 stack seems still partially Giant locked but I'm still pushing ~ 350mbit through this Core 2 Duo test server.&lt;br /&gt;&lt;br /&gt;This has been too easy. 
What the hell have I missed!??&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5987767125864825358?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5987767125864825358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5987767125864825358' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5987767125864825358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5987767125864825358'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/08/cacheboy-ipv6-phase-1-more-updates.html' title='Cacheboy IPv6 (phase 1): More Updates!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6950333405310735058</id><published>2008-08-10T09:19:00.000-07:00</published><updated>2008-08-10T09:23:22.342-07:00</updated><title type='text'>IPv6 tcp proxy success</title><content type='html'>I feel like an undergraduate computer science student after all of this. I've managed to coax the cacheboy core to support v4/v6 and am using it in the tcpproxy test application.&lt;br /&gt;&lt;br /&gt;I've got a modified apachebench speaking IPv6 to tcpproxy, listening on a :8080 IPv6 socket. 
It then forwards all requests to a thttpd instance running on IPv4.&lt;br /&gt;&lt;br /&gt;Tomorrow's job - making sure the squid proxy codebase is still happy with these latest changes, and then preparing for some further testing and the implementation of some unit tests for the comm and inet libraries. Then it's back to commercial projects for a few weeks.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Server Software:        thttpd/2.25b                                       &lt;br /&gt;Server Hostname:        [2a01:348:XXX:3207]&lt;br /&gt;Server Port:            8080&lt;br /&gt;&lt;br /&gt;Document Path:          /test8k&lt;br /&gt;Document Length:        8192 bytes&lt;br /&gt;&lt;br /&gt;Concurrency Level:      1000&lt;br /&gt;Time taken for tests:   21.690 seconds&lt;br /&gt;Complete requests:      100000&lt;br /&gt;Failed requests:        0&lt;br /&gt;Broken pipe errors:     0&lt;br /&gt;Total transferred:      844171764 bytes&lt;br /&gt;HTML transferred:       819841632 bytes&lt;br /&gt;Requests per second:    4610.42 [#/sec] (mean)&lt;br /&gt;Time per request:       216.90 [ms] (mean)&lt;br /&gt;Time per request:       0.22 [ms] (mean, across all concurrent requests)&lt;br /&gt;Transfer rate:          38919.86 [Kbytes/sec] received&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6950333405310735058?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6950333405310735058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6950333405310735058' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6950333405310735058'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/351311855402780691/posts/default/6950333405310735058'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/08/ipv6-tcp-proxy-success.html' title='IPv6 tcp proxy success'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1076618270159061541</id><published>2008-08-09T19:54:00.000-07:00</published><updated>2008-08-09T20:03:11.985-07:00</updated><title type='text'>Cacheboy IPv6 (phase 1): Updates!</title><content type='html'>I've been working on the IPv6 core support in a cacheboy branch (Changes: http://code.google.com/p/cacheboy/source/list?path=playpen/sockaddr_change) and it seems to be coming along swimmingly.&lt;br /&gt;&lt;br /&gt;The current goal is just to get basic IPv6 support into the base libraries and keep the rest of the codebase IPv4-only.&lt;br /&gt;&lt;br /&gt;I've converted commBind(), comm_connect_addr() and comm_accept() to my new IPv4/IPv6 address type and nothing seems amiss at the present time. comm_open() and comm_openex() will take a little more time as there are plenty of places which create a new outgoing socket.&lt;br /&gt;&lt;br /&gt;My next move is to modify my tcp proxy to listen on both IPv4 and IPv6 incoming ports and proxy to an IPv4 destination. 
I can then fire off some HTTP clients at it and see what happens.&lt;br /&gt;&lt;br /&gt;(I may have to modify apachebench-adrian to support IPv6 though; I'm not sure what other stupidly-high-traffic open source http benchmarking clients exist at the present time.)&lt;br /&gt;&lt;br /&gt;I hope to get all of this sorted out in the next week or so and head over to the Sydney Squid developers meet with my "alternate" IPv6 core for Squid-2 and better understand the IPv4/IPv6 requirements before discussing them with Amos.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1076618270159061541?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1076618270159061541/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1076618270159061541' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1076618270159061541'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1076618270159061541'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/08/cacheboy-ipv6-phase-1-updates.html' title='Cacheboy IPv6 (phase 1): Updates!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-634991662135513650</id><published>2008-07-29T05:36:00.000-07:00</published><updated>2008-07-29T05:42:01.711-07:00</updated><title type='text'>Benchmarking is available!</title><content type='html'>I've begun benchmarking Cacheboy-1.4. The details are available at &lt;a href="http://www.cacheboy.net/benchmarks.html"&gt;http://www.cacheboy.net/benchmarks.html&lt;/a&gt;. They aren't spectacular - I'm mainly doing them to keep track of development and make sure I'm not introducing regressions anywhere.&lt;br /&gt;&lt;br /&gt;I'm not all that happy with 50% CPU (on one CPU too!) at 500 req/sec. Alas, that's what I have to work with - I can't push these disks or the polygraph hosts any harder at the present time. Maybe if I spent two weeks fixing polygraph so it used kqueue() instead of poll() ..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-634991662135513650?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/634991662135513650/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=634991662135513650' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/634991662135513650'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/634991662135513650'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/benchmarking-is-available.html' title='Benchmarking is 
available!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4719231769752995744</id><published>2008-07-26T00:32:00.000-07:00</published><updated>2008-07-26T00:35:33.507-07:00</updated><title type='text'>Commercial work updates!</title><content type='html'>I've just completed the development and local testing of the client-side delay pools. That'll go into Squid-2.HEAD in the next few days. I'll try untangling the client-side delay pools from the class 5 delay pool work (which shouldn't be -that- difficult, just slightly tedious) and commit them as two seperate chunks.&lt;br /&gt;&lt;br /&gt;I'll post more details on my company blog - http://xenionhosting.blogspot.com/ - as I think the details of my current and future commercial Squid stuff should be detailed over there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4719231769752995744?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4719231769752995744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4719231769752995744' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4719231769752995744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4719231769752995744'/><link rel='alternate' type='text/html' 
href='http://cacheboy.blogspot.com/2008/07/commercial-work-updates.html' title='Commercial work updates!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4001228729278446896</id><published>2008-07-23T05:20:00.000-07:00</published><updated>2008-07-23T05:24:06.900-07:00</updated><title type='text'>Surviving polymix-4..</title><content type='html'>I'm putting Cacheboy-1.4 through a basic polymix-4 polygraph workload. So far so good - it's just unfortunate that polygraph still uses poll() / select(). Most of the process CPU time is spent in those two system calls and not doing any useful work.&lt;br /&gt;&lt;br /&gt;So far, so good at ~ 500 req/sec (with &lt;10% CPU usage..) I'm going to resolve a few strange issues I'm seeing and then begin publishing some actual performance numbers over the next few weeks. I'll also start publishing some microbench numbers comparing Squid-2.6, Squid-2.7, Squid-3.0, Squid-3.1 and Cacheboy. Cacheboy will come out on top, of that I'm quite sure. 
:)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4001228729278446896?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4001228729278446896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4001228729278446896' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4001228729278446896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4001228729278446896'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/surviving-polymix-4.html' title='Surviving polymix-4..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1776844486038352654</id><published>2008-07-22T02:55:00.000-07:00</published><updated>2008-07-22T03:04:43.805-07:00</updated><title type='text'>Threading Squid - initial observations</title><content type='html'>My next task after some IPv6 related reshuffling is to bring in the bare essentials needed to make Squid^WCacheboy SMP-happy.&lt;br /&gt;&lt;br /&gt;There are a few potential ideas:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Leave Squid single-threaded. 
Stop it from doing its own disk/memory caching; push that out to a shared external process and abuse sysvshm IPC/anonymous mmap/etc to share large amounts of data efficiently;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Thread Squid entirely. Allow multiple concurrent copies of squid running in threads - whichever "model" of thread helpers you choose - and parallelise everything;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Provide basic thread services but leave Squid monolithic - push certain things into threads for now, figure out what benefits from being run in parallel;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;A mix of all of the above.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Some of the problems that are faced!&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;cbdata&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The cbdata type makes it a pain in the ass. Specifically, anything which wants to be shared between threads needs to be able to be 'locked' into memory until the thread hands it back either completed, or cancelled.&lt;br /&gt;&lt;br /&gt;cbdata doesn't give you any guarantees that the pointer is pointing to something even remotely valid - even if you cbdataLock()'ed the item, the owner (or not! That's how horrible the code can get) can cbdataFree() the underlying pointer and suddenly you're pointing at gunk. It might smell mostly right, it might even have somewhat valid data, but it's still freed gunk, and that's not good enough.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Shared Statistics&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Squid keeps a lot of statistics and histograms. Something needs to be done to allow these to be kept in multiple threads without lots of fine-grain locks and/or stalling.&lt;br /&gt;&lt;br /&gt;I may just get rid of a lot of the complicated statistics and require them to be post-process derived externally.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Memory Pools&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;The memory pools framework will be a nightmare to thread efficiently. 
Well, memory allocators in general are. I -could- just fine-grain lock it, but it gets a -lot- of requests and so I'd have to first fix the pool abusers before I consider this. (I'm going to do it anyway, but not so I can then fine-grain thread mempools.) I could figure out the best way to thread it - or run multiple pools per pool, one per thread - but damnit, this is 2008, there are better malloc implementations out there by people who understand concurrency issues better than I. It's a waste of time to try and thread it until I understand the workload and implications better.&lt;br /&gt;&lt;br /&gt;So I'll -probably- be turfing mempools as it stands and replacing it with just enough to keep statistics before going direct to malloc(). See the statistics section above. I won't do this until I've modified the heaviest mempool abusers to -not- put such large demands on the allocator system, so it'll be a win/win situation everywhere.&lt;br /&gt;&lt;br /&gt;more to come..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1776844486038352654?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1776844486038352654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1776844486038352654' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1776844486038352654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1776844486038352654'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/threading-squid-initial-observations.html' title='Threading Squid - initial 
observations'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4970440684561692577</id><published>2008-07-15T09:59:00.000-07:00</published><updated>2008-07-15T10:03:40.102-07:00</updated><title type='text'>Commercial projects and such..</title><content type='html'>I've got a few commercial projects to finish up on Squid over the next few weeks which will be taking my time away from Cacheboy development.&lt;br /&gt;&lt;br /&gt;Specifically:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;  &lt;li&gt;I'm adding client-side -write- delay pools, so you can rate limit the replies sent back to clients whether they are a cache hit or miss (specifically for reverse proxies, but I'm sure forward proxies will have a use for them);&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;Buffering POST requests a bit before connecting to the back-end origin server, which matters when your back-end server pays a high price for holding a connection open with no data going over it;&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;Finally - some log reporting tools (hopefully written in Lua! :) for basic WebUI logfile reporting in a fast, sensible manner&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;I've got a few other possibilities which might creep up over the next couple of months but nothing yet concrete.&lt;br /&gt;&lt;br /&gt;Client-side IPv6, HTTP/1.1 and a threaded core will have to wait until I've completed the paid work I'm afraid! 
OSS coders have to eat too!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4970440684561692577?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4970440684561692577/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4970440684561692577' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4970440684561692577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4970440684561692577'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/commercial-projects-and-such.html' title='Commercial projects and such..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8453584150374157691</id><published>2008-07-13T06:50:00.000-07:00</published><updated>2008-07-13T07:00:38.554-07:00</updated><title type='text'>Watching things evolve..</title><content type='html'>I'm finding it interesting to watch myself "evolve" the Cacheboy roadmap over time. 
Take the previous two cacheboy-users posts: first I thought Cacheboy-1.4 would get the IPv6 enabled core, but after doing the latest set of changes I've decided the best thing to do is to get Cacheboy-1.4 out with the current code layout, sort out whatever bugs crept in, then build the IPv6 enabled core in Cacheboy-1.5 and IPv6 client-side support in Cacheboy-1.6.&lt;br /&gt;&lt;br /&gt;I have a general idea where I'd like to take things and I have a specific set of goals in mind along the way, but everything is still evolving with time. It's an interesting experience - there are dozens of areas in the codebase which I'd like to spend time working on but I have to keep the medium and long-term project goals in mind.&lt;br /&gt;&lt;br /&gt;Which isn't to say I won't get distracted from time to time and break out a test branch to play with something, like one of the branches playing around with memory allocation overheads. I just treat that, like the last 10 or so years of experimenting with the codebase, as a way to get more of an idea what work needs to be done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8453584150374157691?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8453584150374157691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8453584150374157691' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8453584150374157691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8453584150374157691'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/watching-things-evolve.html' title='Watching 
things evolve..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5626144073916874257</id><published>2008-07-12T13:22:00.000-07:00</published><updated>2008-07-12T13:33:03.263-07:00</updated><title type='text'>Cacheboy: shuffling around the DNS code</title><content type='html'>I'm shuffling around the DNS code in preparation for some work toward an IPv6 core. Strictly speaking, I could have just left the DNS code in src/ and IPv6'ed the raw network/socket layer but I've decided "basic" functional IPv6 support will require DNS support and so be it. It'll let me write test cases to make sure that the new code handles IPv4 and IPv6 DNS "right". I still don't know what "right" entails and I'm sure that journey will be very enlightening!&lt;br /&gt;&lt;br /&gt;It's been more tedious than complicated. There's a bunch of config file parsing which needs to stay in src/ and I've split out the "libsqdns" DNS initialisation from the "squid" DNS initialisation. It compiles and runs here, resolving DNS requests happily, so I guess I'm mostly on track. I had to shuffle around some config variables so it's entirely possible I've screwed that up somewhere.&lt;br /&gt;&lt;br /&gt;This highlights the requirement for a much more sensible configuration management framework. It doesn't even have to be that complicated - just not the "one great big Config struct" that Squid currently has. I've got some plans in the back of my head to generic-ify that much later on down the track but it'll have to wait a while. It'll probably come in when the ACL code is split out into squid-specific and generic ACL types. 
(A lot of the ACL types aren't really specific to HTTP and in reality can be reused in a variety of network applications.)&lt;br /&gt;&lt;br /&gt;So tomorrow I'll find some time to get the external DNS code working again which I hope will be slightly easier than the internal DNS code. Then I can let this codebase simmer for a bit, push Cacheboy-1.4 out the door and wait for it to stabilise before my next round of changes towards IPv6.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5626144073916874257?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5626144073916874257/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5626144073916874257' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5626144073916874257'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5626144073916874257'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/cacheboy-shuffling-around-dns-code.html' title='Cacheboy: shuffling around the DNS code'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4559654191051582610</id><published>2008-07-05T01:25:00.000-07:00</published><updated>2008-07-05T02:00:23.892-07:00</updated><title type='text'>The Squid Callback Data 
Type</title><content type='html'>One of the issues programmers frequently face is knowing whether some piece of data you have is actually valid. Modern languages provide a variety of methods for creating an "invariant condition" about the validity of your data - reference counting, for example, allows you to ensure that data is not free'd before all references to it have been removed. This invariant is not always what you first think. The invariant condition for reference counting, for example, is that the data is either referenced by something or by nothing at all. Generally the programmer will treat the transition from "referenced by something" to "referenced by nothing" as the important transition and do something like removing the object from whatever lists it's on, notifying other objects that it's going away, cleaning up allocated memory, etc.&lt;br /&gt;&lt;br /&gt;Take traditional "callback" type programming. The programmer decides that some function is to be called after an event has occurred (for example, "the ACL lookup has completed", or "the network write has completed") and this function needs some sort of "state" to know what it's operating on. You could view this state as a sort of object. The trouble in C is that the language itself doesn't give you any tools to know whether the supplied pointer is valid or not. Now, think about this - firstly, what does "valid" mean? The pointer is pointing to some region of memory that hasn't been freed? What about the state of the object? What if the object state changed between the callback being scheduled and the callback being executed? Is this "valid"?&lt;br /&gt;&lt;br /&gt;Squid implements "callback data". Initially, this "callback data" (called &lt;span style="font-weight:bold;"&gt;cbdata&lt;/span&gt; in the code) was a registry for callback data pointers. 
Pointers were reference counted when passed in as part of a scheduled callback; they would be decremented before the callback was about to run, and the callback would only be executed if the callback pointer was "valid". The "owner" (for whatever meanings of "own" you'd like to try and define) could "free" the data pointer - in which case the callback data registry would mark that pointer as invalid; subsequent checks for the validity of said pointer would return invalid, and any callbacks that were going to occur could be ignored. Eventually, the reference count would hit 0 and at 0 the memory at the pointer would be freed.&lt;br /&gt;&lt;br /&gt;Expressed as code:&lt;br /&gt;&lt;br /&gt;ptr = cbdataAlloc(type);&lt;br /&gt;...&lt;br /&gt;doSomething(someFunc, ptr);&lt;br /&gt;&lt;br /&gt;which would:&lt;br /&gt;  state-&gt;cb = someFunc;&lt;br /&gt;  state-&gt;cbdata = ptr;&lt;br /&gt;  cbdataLock(ptr);&lt;br /&gt;&lt;br /&gt;.. then, when the chain of events which doSomething() started would finish, this would occur:&lt;br /&gt;&lt;br /&gt;if (state-&gt;cb &amp;&amp; cbdataValid(ptr)) {&lt;br /&gt;    state-&gt;cb(state-&gt;cbdata);&lt;br /&gt;}&lt;br /&gt;cbdataUnlock(ptr);&lt;br /&gt;&lt;br /&gt;This way, the callback would only occur IFF there was a callback and the callback data was still valid.&lt;br /&gt;&lt;br /&gt;cbdataAlloc() returned a pointer with refcount = 0 and valid = true ; cbdataLock() incremented refcount; cbdataUnlock() decremented refcount and would free the pointer if (valid == false &amp;&amp; refcount == 0); cbdataValid() returned (valid == true); cbdataFree() would set valid = false and free the object if (valid == false &amp;&amp; refcount == 0)&lt;br /&gt;&lt;br /&gt;This worked out to be quite helpful in preventing callbacks from being run if the data was freed. 
It however introduces a few assumptions which make certain things difficult to debug and implement.&lt;br /&gt;&lt;br /&gt;Firstly, you don't have any guarantee that the callback will be called when you schedule the call. So in the above code, if something calls cbdataFree(ptr) between the callback registration and the completion of the action initiated by doSomething(), the action will complete but the callback won't be made. The programmer needs to make sure that the code can handle the callback never being made. Traditionally, you would either cancel the operation explicitly rather than letting it continue to completion (and handle the situation where it couldn't be cancelled), or let the operation complete before transitioning to some "dying" state.&lt;br /&gt;&lt;br /&gt;Secondly, generally the "object destructor" here is called not by the cbdata reference count hitting 0, but by some explicit destruction call elsewhere in the code. For example, you would have this in the code:&lt;br /&gt;&lt;br /&gt;fooComplete(foo *ptr)&lt;br /&gt;{&lt;br /&gt;   free(ptr-&gt;data);&lt;br /&gt;   cbdataFree(ptr);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;There still may be references to the callback data but no callbacks will occur on it because cbdataFree() marks that ptr as invalid. So cbdata isn't quite behaving the way traditional reference counted "types" behave.&lt;br /&gt;&lt;br /&gt;Here's where this gets ugly: it can behave that way - you can register a function to be called just before the ptr is finally freed. _SOME_ areas of code do this. _SOME_ areas of code do not. You can't assume that the behaviour for a given cbdata pointer type will be one or the other.&lt;br /&gt;&lt;br /&gt;Thirdly, if the action initiated by doSomething() requires some part of ptr to be valid then it will need to wrap every access to the data inside ptr with an if (cbdataValid(ptr)) check. 
This doesn't always happen :) and has been the cause of all sorts of silly bugs because although the memory pointed to by ptr is still valid, the object may have gone through its "destruction" phase and what's left in memory (which again, hasn't been freed) is actually the last traces of object state. This may be valid, this may be invalid. Who knows. I can't guarantee that cbdata pointer dereferences are always done conditional on said pointer being valid. That would be a fun thing to hack in as a valgrind module!&lt;br /&gt;&lt;br /&gt;This all started rearing its ugly head in Squid-3 as a few things were converted from cbdata type pointers to more traditional reference counted types. The programmers assumed the behaviour was equivalent when it wasn't and all kinds of strange bugs arose, some of which took over 12 months to find and fix.&lt;br /&gt;&lt;br /&gt;What would I like to see? That's a good question and will probably form the basis of further improvements in Cacheboy..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4559654191051582610?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4559654191051582610/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4559654191051582610' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4559654191051582610'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4559654191051582610'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/squid-callback-data-type.html' title='The Squid Callback Data 
Type'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2522131953377771385</id><published>2008-07-02T08:32:00.001-07:00</published><updated>2008-07-02T18:23:31.245-07:00</updated><title type='text'>libevent httperf!</title><content type='html'>I decided to poke httperf a little as a testing suite - and it uses select()! What the hell?&lt;br /&gt;&lt;br /&gt;Four hours later, I think I have a libevent enabled httperf.&lt;br /&gt;&lt;br /&gt;http://code.google.com/p/httperf-adrian/&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2522131953377771385?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2522131953377771385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2522131953377771385' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2522131953377771385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2522131953377771385'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/07/libevent-httperf.html' title='libevent httperf!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3397333689552172303</id><published>2008-06-30T22:34:00.000-07:00</published><updated>2008-06-30T22:38:03.790-07:00</updated><title type='text'>Initial Profiling!</title><content type='html'>&lt;div&gt;&lt;br /&gt;Here's the output trace: I'm running it on the Sun X2100 running a flavour of ubuntu; this is doing ~ 300mbit FDX at about 9000 req/sec (tiny transactions!) w/ 1000 concurrent connections; I'm specifically trying to trace the management overhead versus the data copying overhead. This has maxed out both thttpd on the server-side and the tcp proxy itself.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;Gah, look at all of those mallocs and stdio calls doing "stuff"..&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;root@rachelle:/home/adrian/work/cacheboy/branches/CACHEBOY_PRE/app/tcptest# opreport -l ./tcptest | less&lt;br /&gt;CPU: AMD64 processors, speed 2613.43 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000&lt;br /&gt;samples  %        image name               symbol name&lt;br /&gt;96851    11.3738  libc-2.6.1.so            vfprintf&lt;br /&gt;62317     7.3182  libc-2.6.1.so            _int_malloc&lt;br /&gt;37556     4.4104  tcptest                  comm_select&lt;br /&gt;35405     4.1578  tcptest                  commSetEvents&lt;br /&gt;32901     3.8638  libc-2.6.1.so            _int_free&lt;br /&gt;30245     3.5518  tcptest                  commSetSelect&lt;br /&gt;29890     3.5102  tcptest                  commUpdateEvents&lt;br /&gt;28812     3.3836  libc-2.6.1.so            _IO_default_xsputn&lt;br /&gt;20360     2.3910  tcptest                  sslSetSelect&lt;br /&gt;17279     2.0292  libc-2.6.1.so       
     malloc_consolidate&lt;br /&gt;16610     1.9506  libc-2.6.1.so            epoll_ctl&lt;br /&gt;16307     1.9150  tcptest                  sslReadServer&lt;br /&gt;16154     1.8971  libc-2.6.1.so            fcntl&lt;br /&gt;14601     1.7147  tcptest                  xstrncpy&lt;br /&gt;12003     1.4096  libc-2.6.1.so            memset&lt;br /&gt;11617     1.3643  tcptest                  memPoolAlloc&lt;br /&gt;10931     1.2837  libc-2.6.1.so            calloc&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3397333689552172303?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3397333689552172303/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3397333689552172303' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3397333689552172303'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3397333689552172303'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/heres-output-trace-im-running-it-on-sun.html' title='Initial Profiling!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-531651035863110679</id><published>2008-06-30T21:57:00.000-07:00</published><updated>2008-06-30T22:01:01.699-07:00</updated><title type='text'>First 
milestone - code reuse!</title><content type='html'>I've spent the last couple of evenings committing code to break out the last bits of the core event loop. I then added in a chopped up copy of src/ssl.c (the SSL CONNECT tunneling stuff) and voila! I now have a TCP proxy.&lt;br /&gt;&lt;br /&gt;A (comparatively) slow TCP proxy (3000 small obj/sec instead of where it should be: ~10,000 small obj/sec). A slow, single-threaded TCP proxy, but a TCP proxy nonetheless.&lt;br /&gt;&lt;br /&gt;I can now instrument just the core libraries to find out where they perform and scale poorly, separate from the rest of the Squid codebase. I count this as a pretty big milestone.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-531651035863110679?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/531651035863110679/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=531651035863110679' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/531651035863110679'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/531651035863110679'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/first-milestone-code-reuse.html' title='First milestone - code reuse!'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7350720092755547701</id><published>2008-06-26T23:20:00.000-07:00</published><updated>2008-06-26T23:21:19.877-07:00</updated><title type='text'>Why memory allocation is a pain in the ass..</title><content type='html'>The memory allocator routines in particular are annoying - a lot of work has gone into malloc implementations over the last few years to make them perform -very- well in threaded applications as long as you know what you are doing. This means doing things like allocating/freeing memory in the same thread and limiting memory exchange between threads (mostly a deal with very small allocations).&lt;br /&gt;&lt;br /&gt;Unfortunately, the mempools implementation saves a noticeable amount of CPU because it hides all of the repetitive small memory allocations which Squid does for a variety of things. It's hard to profile too - I see that the CPU spends a lot of time in the allocator, but figuring out which functions are causing the CPU usage is difficult. Sure, I can find out the biggest malloc users by call - but they're not the biggest CPU users according to the oprofile callgraphs. 
I think I'll end up having to spend a month or so rewriting a few areas of code that account for the bulk of the malloc'ing to see what effect it has on CPU before I decide what to do here.&lt;br /&gt;&lt;br /&gt;I just don't see the point in trying to thread the mempools codebase for anything other than per-pool statistics when others have been doing a much better job of understanding memory allocation contention on massively parallel machines.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7350720092755547701?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7350720092755547701/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7350720092755547701' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7350720092755547701'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7350720092755547701'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/why-memory-allocation-is-pain-in-ass.html' title='Why memory allocation is a pain in the ass..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4945334473314171064</id><published>2008-06-26T23:06:00.000-07:00</published><updated>2008-06-26T23:20:28.070-07:00</updated><title type='text'>Cacheboy-1.3 (.1) 
released; short-term future</title><content type='html'>I've just merged the latest Squid-2.HEAD changes into Cacheboy and released 1.3.1.&lt;br /&gt;&lt;br /&gt;1.3 and 1.3.1 fix the Vary related issues which affect hit rates.&lt;br /&gt;&lt;br /&gt;1.3.1 fixes the SNMP counter bugs.&lt;br /&gt;&lt;br /&gt;This ends the first set of mostly non-intrusive changes which have been made to the codebase. The next area of work will be pulling out the rest of the event/communications/signal code from src/ and into libiapp/ so I can begin treating "Squid" as a client of "libiapp" - ie, the libiapp code handles event, fd, communication and event scheduling (disk stuff is still in src/ for now) making callbacks into the Squid application. I can then begin writing a few test applications to give the core and support libraries a good thrashing.&lt;br /&gt;&lt;br /&gt;I'll start planning out threading and ipv6 support in the libraries themselves with the minimum amount of Squid changes required to continue functioning (but still staying in IPv4/non-threaded land.) The plan is to take something like a minimalistic TCP proxy that's been fully debugged and use it as the basis for testing out potential IPv6 and threading related changes, separate from the rest of the application.&lt;br /&gt;&lt;br /&gt;My tentative aim is to run the current "Squid" application in just one thread but have the support libraries support threading (either by explicitly supporting concurrency or being labelled as "not locking" and thus callers must guarantee nothing quirky will happen.) The three areas that strike me as being problematic right now are the shared fd/comm state (fd_table[]), the statistics being kept all over the place and the memory allocator routines. 
(I'll write up the malloc stuff in a different post.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4945334473314171064?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4945334473314171064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4945334473314171064' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4945334473314171064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4945334473314171064'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/cacheboy-13-1-released-short-term.html' title='Cacheboy-1.3 (.1) released; short-term future'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8353452417667155665</id><published>2008-06-24T22:31:00.000-07:00</published><updated>2008-06-24T22:35:11.072-07:00</updated><title type='text'>Current CPU usage</title><content type='html'>Where's the CPU going?&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's an oprofile output from a very naive custom polygraph workload, ~ 1000 requests a second, ~14kbyte objects. 
MemPools are disabled; Zero buffers are off so the majority of the allocations aren't zero'ed.&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;Note that somehow, memPoolAlloc takes 4% of CPU even with memory pools switched off. The allocations still go via the pool code but deallocations aren't "cached". What the hell is taking the 4% of CPU time?&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;pre&gt;&lt;br /&gt;CPU: AMD64 processors, speed 2613.43 MHz (estimated)&lt;br /&gt;Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000&lt;br /&gt;samples  %        image name               symbol name&lt;br /&gt;176014    6.7039  libc-2.6.1.so            _int_malloc&lt;br /&gt;160779    6.1236  libc-2.6.1.so            memcpy&lt;br /&gt;128371    4.8893  libc-2.6.1.so            malloc_consolidate&lt;br /&gt;123734    4.7127  squid                    memPoolAlloc&lt;br /&gt;101514    3.8664  libc-2.6.1.so            free&lt;br /&gt;76772     2.9240  libc-2.6.1.so            _int_free&lt;br /&gt;55696     2.1213  libc-2.6.1.so            malloc&lt;br /&gt;55681     2.1207  libc-2.6.1.so            vfprintf&lt;br /&gt;50245     1.9137  libc-2.6.1.so            calloc&lt;br /&gt;48095     1.8318  squid                    httpHeaderIdByName&lt;br /&gt;41172     1.5681  libm-2.6.1.so            floor&lt;br /&gt;37573     1.4310  libc-2.6.1.so            re_search_internal&lt;br /&gt;37434     1.4258  libc-2.6.1.so            memchr&lt;br /&gt;36536     1.3916  squid                    xfree&lt;br /&gt;30646     1.1672  libc-2.6.1.so            memset&lt;br /&gt;30576     1.1646  squid                    memPoolFree&lt;br /&gt;30108     1.1467  squid                    headersEnd&lt;br /&gt;28626     1.0903  squid                    httpHeaderGetEntry&lt;br /&gt;26668     1.0157  squid                    storeKeyHashCmp&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8353452417667155665?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8353452417667155665/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8353452417667155665' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8353452417667155665'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8353452417667155665'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/current-cpu-usage.html' title='Current CPU usage'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-8343612429363313140</id><published>2008-06-19T22:19:00.001-07:00</published><updated>2008-06-19T22:33:42.917-07:00</updated><title type='text'>Updates - comm code, etc</title><content type='html'>I've finally managed to divorce the comm code from the base system. It's proving to be a pain in the butt for a few reasons:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;The DNS code is involved in the socket connection path - most users just pass a hostname into the comm connect call and it gets diverted via the ipcache/dns code. 
Tsk!&lt;/li&gt;&lt;li&gt;There's quite a bit of statistics gathering which goes on - the code is very monolithic and the statistics code keeps 5 and 60 minute histograms as well as raw counters&lt;/li&gt;&lt;li&gt;The event loop needs to be sorted out quite a bit better - right now the event loop is still stuck in src/main.c and this needs to change&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The statistics gathering and reporting for the network/disk syscalls and events will have to change - I don't feel like trying to make the histogram code more generic and modular. I don't think that Squid should be maintaining the histograms - that's a job for a reporting suite. Squid should just export raw counters for a reporting suite to record and present as appropriate. I'll add in a new cachemgr option to report the "application core" statistics in a machine-parsable manner and leave it at that for now. (As a side-note, I also don't think that Squid should have SNMP code integrated. It should have an easier, cleaner way of grabbing statistics and modifying the configuration, and an external SNMP daemon to do SNMP stuff.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I then need to extract the main event loop from src/main.c and turn it into something that can be reused. The main loop handles the following:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;comm events&lt;/li&gt;&lt;li&gt;store dir events&lt;/li&gt;&lt;li&gt;timed/immediate registered events&lt;/li&gt;&lt;li&gt;signals - which basically just set global variables!&lt;/li&gt;&lt;li&gt;checking signal global variables - for rotate, shutdown, etc&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;I think I'll implement a libevent-style setup of sorts - I'll add some methods to libiapp to register callbacks which are called when certain signals are raised (sort of like libevent) but the storedir and signal global variable handlers will just be functions called in the src/main.c loop. 
I'd like to implement a Squid-3-like method of registering event dispatchers but I will leave all of that alone until this is all stable and I've done some planning around concurrency and SMP.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's also possible that the reasons for registering dispatchers will go away with a slightly more sensible event abstraction (eg, if I convert the signal handlers to proper events (exactly like libevent!) which get pushed onto the head of the event queue and called at the beginning of the next loop iteration - this however assumes the global variables that are set in the current signal handlers are &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;only&lt;/span&gt; checked in the main loop and not elsewhere..!)&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-8343612429363313140?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/8343612429363313140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=8343612429363313140' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8343612429363313140'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/8343612429363313140'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/updates-comm-code-etc.html' title='Updates - comm code, etc'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6908538577598672788</id><published>2008-06-11T18:32:00.000-07:00</published><updated>2008-06-11T18:52:00.280-07:00</updated><title type='text'>Async IO related hackery</title><content type='html'>I've been staring at the Async IO code in preparation to migrate stuff out of the aufs directory and into a separate library.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It requires the fd tracking code (understandable) and the comm code to monitor a notification pipe. This monitor pipe is used by the worker threads to wake up the main process if it's waiting inside a select()/poll()/etc call, so it can immediately work on some disk IO.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Squid checks the aio completion queues each time through the comm loop. For aio, there isn't a per-storedir queue, there's just a global queue for all storedirs and other users, so aioCheckCallbacks() is called for each storedir.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are two problems - firstly, select()/poll() take a while to run on a busy cache, so aioCheckCallbacks() isn't called that often. But the event notification based mechanisms end up running very often, returning a handful of file descriptors each pass through the comm loop - and so the storedir checks are called on every one of those passes. Secondly, it's called once per storedir, so if you have 10 storedirs (like I have for testing!) aioCheckCallbacks() is called 10 times per IO loop.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is a bit silly!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Instead, I've modified the async IO code to only call aioCheckCallbacks() when that pipe is written to. 
This ends up being the "normal" hack that UNIX thread programmers use to wake up a thread stuck waiting for both network and thread events. This cuts back substantially on the number of aioCheckCallbacks() calls without impacting performance (as far as I can see.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Next! By default, the aufs store code only does async IO for open() and read() - write() and close() don't run asynchronously. Apparently this is due to testing under Linux - unless you're stressing the buffer cache too hard, write() to a disk FD didn't block, so there wasn't a reason to run write() and close() async. Apparently Solaris close() will block as metadata writes are done synchronously, and it's possible FreeBSD + softupdates may do something similar. This is all "apparently"; I haven't sat down and instrumented any of this!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;FreeBSD and Solaris users have reported that diskd performs better than aufs - something I don't understand, as diskd only handles one outstanding disk IO at a time and has similar write() and close() issues to aufs (namely, if the calls block, the whole diskd process stops handling disk IO) but the difference here is the main process won't hang whilst these syscalls complete. Perhaps that is the reason for this behaviour. 
It's difficult for me to test; aufs has always performed fantastically for me.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's so much to tidy up and reorganise that I still can't sit down and begin implementing any of the new features I want to!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6908538577598672788?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6908538577598672788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6908538577598672788' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6908538577598672788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6908538577598672788'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/async-io-related-hackery.html' title='Async IO related hackery'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6265037985281806178</id><published>2008-06-05T23:26:00.000-07:00</published><updated>2008-06-05T23:43:48.926-07:00</updated><title type='text'>More reorganisation..</title><content type='html'>I've moved cbdata, mempools, fd and legacy disk (file_*) routines out of src/. I also shuffled the comm related -definitions- out but not the code. 
I hit a bit of a snag - the comm code path used for connecting a socket to a remote site actually uses the DNS code. Fixing this will involve divorcing the DNS lookup stuff so sockets can be connected to a remote IP directly - and the DNS lookup path will just be in another module.&lt;br /&gt;&lt;br /&gt;This however is more intrusive than "code reorganisation" so it's going to have to wait a while. Unfortunately, this means that my grand plans for 1.1 will have to be put on hold until I've thought this out a little more and implemented it in a separate branch.&lt;br /&gt;&lt;br /&gt;Thus, things will change a little. 1.1 will be released shortly, with the current set of changes included. I'll then concentrate on planning out the next set of changes required to properly divorce the core event/disk code from src/.&lt;br /&gt;&lt;br /&gt;Why do this? Well, the biggest reason is to be able to build "other" bits of code which reuse the Squid core. I can write unit tests for a lot of stuff, sure, but it also means I can write simple network and disk applications which reuse the Squid core and find out exactly how hard I can push them. 
I can also break off a branch and hack up the code to see what impact changes have without worrying that said changes expose strange stuff in the rest of the Squid codebase.&lt;br /&gt;&lt;br /&gt;The four main things that I'd like to finally sort out are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;IPv6 socket support - support v4/v6 in the base core, and make sure that it works properly&lt;/li&gt;&lt;li&gt;Sort out the messy disk related code and reintegrate async IO as a top-level disk producer (as it was in Squid-2.2 and almost is in Squid-3) so it can again be used for things like logfile writing!&lt;/li&gt;&lt;li&gt;Begin looking at scatter/gather disk and network IO - gather disk IO should work out great for writing logfile buffers and objects to disk, for example&lt;/li&gt;&lt;li&gt;Design a parallelism model which allows multiple threads to cooperate on tasks - worker threads implementing callback type stuff for some work; entirely separate network event threads (look at memcached as an example.) 
"Squid" as it stands will simply run as one thread, but some CPU intensive stuff can be pushed into worker threads for some cheap parallelism gains (as on the roadmap, ACLs and rewriting/content manipulation are two easy targets.)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;So there's a lot of work to do, and not a lot of time to do it in.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6265037985281806178?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6265037985281806178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6265037985281806178' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6265037985281806178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6265037985281806178'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/06/more-reorganisation.html' title='More reorganisation..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2760786949919274556</id><published>2008-05-24T17:19:00.001-07:00</published><updated>2008-05-24T17:28:31.269-07:00</updated><title type='text'>cacheboy 1.0 released</title><content type='html'>Cacheboy 1.0 has been tagged, tarballed and port'ed. 
It's been in production at my beta tester's site for a week or so now and hasn't missed a beat.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's a lot more work to do on Cacheboy to shape it into what I believe Squid should've been; a stable 1.0 release is the first step along this path.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'll let this settle for a couple of weeks (ie, Adrian needs to sit his mid-year exams in two weeks!) before I begin some larger-scale code refactoring and shuffling around.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Cacheboy-1.1 changes will include MemBuf, cbdata, most of the HTTP request/reply/header manipulation code and potentially a little of the file descriptor, disk, event and network communication code. This stuff forms the "core" of Squid/Cacheboy. I'll then look at some basic infrastructure changes to support IPv6 clients.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Like the Cacheboy-1.0 changes, these will not be terribly difficult or intrusive (the IPv6 client-only changes will be the most intrusive by far!) 
but a lot of refactoring, rewriting and shuffling about of the core needs to take place before I can begin work on the necessary stuff - HTTP/1.1, SMP, modularity, performance.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2760786949919274556?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2760786949919274556/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2760786949919274556' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2760786949919274556'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2760786949919274556'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/05/cacheboy-10-released.html' title='cacheboy 1.0 released'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7210699510410866380</id><published>2008-05-17T09:37:00.000-07:00</published><updated>2008-05-17T09:38:17.067-07:00</updated><title type='text'>FreeBSD port update</title><content type='html'>I've updated the port to CACHEBOY_0.PRE6; it also now defaults to the replacement shiny english errors (NewEnglish) rather than the default ones (English).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/351311855402780691-7210699510410866380?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7210699510410866380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7210699510410866380' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7210699510410866380'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7210699510410866380'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/05/freebsd-port-update.html' title='FreeBSD port update'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-5718213582849095047</id><published>2008-05-16T23:14:00.000-07:00</published><updated>2008-05-16T23:20:06.304-07:00</updated><title type='text'>Revalidating objects in Polygraph</title><content type='html'>I have a locally hacked up polygraph config based on datacomm-1. Datacomm-1 is a very simple workload which doesn't pretend to be the real world at all; it thus makes it really easy for me to implement custom bits of polygraph to test specific things.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One thing I needed to test was object revalidation. 
I needed objects to be revalidated in a relatively short period of time so I could trigger a storage revalidation bug in Squid-2.HEAD.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's the changes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;include/content.pg; added:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ObjLifeCycle olcRevalid = {&lt;/div&gt;&lt;div&gt;  length = const(2min);   &lt;/div&gt;&lt;div&gt;  variance = 50%;&lt;/div&gt;&lt;div&gt;  with_lmt = 100%;&lt;/div&gt;&lt;div&gt;  expires = [&lt;/div&gt;&lt;div&gt;    lmt + const(2min) : 5%,&lt;/div&gt;&lt;div&gt;    now + const(5min) : 15%&lt;/div&gt;&lt;div&gt;  ];&lt;/div&gt;&lt;div&gt;};&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;content cntRevalid = {&lt;/div&gt;&lt;div&gt;  kind = "revalid";&lt;/div&gt;&lt;div&gt;  obj_life_cycle = olcRevalid;&lt;/div&gt;&lt;div&gt;  size = logn(32KB, 32KB);&lt;/div&gt;&lt;div&gt;  cachable = 80%;&lt;/div&gt;&lt;div&gt;  checksum = 1%;&lt;/div&gt;&lt;div&gt;};&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I then edited my locally modified datacomm-1.pg to set the contents to cntRevalid.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, I get stale objects popping up during the test - and I need to figure out why -that- is happening - but note that my life cycles are very quick (couple minutes). 
squid _should_ be good down to object lifetime of 1 second so I'm a bit surprised.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In any case, it tripped the bug, which is all that matters..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-5718213582849095047?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/5718213582849095047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=5718213582849095047' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5718213582849095047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/5718213582849095047'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/05/revalidating-objects-in-polygraph.html' title='Revalidating objects in Polygraph'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4669786855293778520</id><published>2008-05-16T23:00:00.001-07:00</published><updated>2008-05-16T23:02:18.422-07:00</updated><title type='text'>Cacheboy PRE6 is out</title><content type='html'>I've just rolled PRE6. 
This includes the Squid-2.HEAD fix for the signed vs unsigned comparison bug I introduced earlier (which led to a crash.)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This code -should- be stable enough for public consumption.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4669786855293778520?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4669786855293778520/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4669786855293778520' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4669786855293778520'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4669786855293778520'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/05/cacheboy-pre6-is-out.html' title='Cacheboy PRE6 is out'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-3589269544129470423</id><published>2008-05-14T13:38:00.000-07:00</published><updated>2008-05-14T13:41:46.658-07:00</updated><title type='text'>Error page update, phase 2</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_37wobiQ3zUs/SCtOQVmI2WI/AAAAAAAAAAw/n-qXHVz-Mqw/s1600-h/Picture+3.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; 
float: right; cursor: pointer;" src="http://bp2.blogger.com/_37wobiQ3zUs/SCtOQVmI2WI/AAAAAAAAAAw/n-qXHVz-Mqw/s400/Picture+3.png" alt="" id="BLOGGER_PHOTO_ID_5200336237311351138" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I've gone and modified all the English error pages in my little playpen project.&lt;br /&gt;&lt;br /&gt;Here's an example of a live DNS failure. Same (confusing) text with Squid; slightly nicer layout.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-3589269544129470423?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/3589269544129470423/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=3589269544129470423' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3589269544129470423'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/3589269544129470423'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/05/error-page-update-phase-2.html' title='Error page update, phase 2'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_37wobiQ3zUs/SCtOQVmI2WI/AAAAAAAAAAw/n-qXHVz-Mqw/s72-c/Picture+3.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1541540851353115835</id><published>2008-05-01T08:57:00.000-07:00</published><updated>2008-05-01T09:04:48.225-07:00</updated><title type='text'>Errors shouldn't be ugly</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_37wobiQ3zUs/SBnp1B1rbZI/AAAAAAAAAAg/BXgvzv5dZ6g/s1600-h/new_error.PNG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp2.blogger.com/_37wobiQ3zUs/SBnp1B1rbZI/AAAAAAAAAAg/BXgvzv5dZ6g/s320/new_error.PNG" alt="" id="BLOGGER_PHOTO_ID_5195440742384496018" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Part of my "things i hate about Squid" list includes the god awful error pages which haven't really changed since .. well, since I got involved with the project in 2000.&lt;br /&gt;&lt;br /&gt;Here's my take on the "simple" error page. The text is exactly the same as the old error page (note that I haven't included the "Generated by.." 
footer text here, as that's included by Squid/Cacheboy) but I've reformatted the error page to use CSS for layout and then crafted a very simple example stylesheet.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1541540851353115835?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1541540851353115835/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1541540851353115835' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1541540851353115835'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1541540851353115835'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/05/errors-shouldnt-be-ugly.html' title='Errors shouldn&apos;t be ugly'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_37wobiQ3zUs/SBnp1B1rbZI/AAAAAAAAAAg/BXgvzv5dZ6g/s72-c/new_error.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-130514988872710433</id><published>2008-04-27T09:18:00.001-07:00</published><updated>2008-04-27T09:35:53.366-07:00</updated><title type='text'>Where is my CPU time going? 
(or how to divine useful information from oprofile)</title><content type='html'>OProfile is cool - it lets you dig into where your CPU time is being spent. But aggregating statistics can be aggravating. (Yes yes, it was bad, I know..)&lt;br /&gt;&lt;br /&gt;Take this example from cacheboy:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt;CPU: Core 2, speed 2194.48 MHz (estimated)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;samples  %        image name               symbol name&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;216049    6.5469  libc-2.7.so              memcpy&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;115581    3.5024  libc-2.7.so              _int_malloc&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;103345    3.1316  libc-2.7.so              vfprintf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;85197     2.5817  squid                    memPoolAlloc&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;64652     1.9591  libc-2.7.so              memchr&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;60720     1.8400  libc-2.7.so              strlen&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;Now, this tells you which functions the CPU time is being spent in (which is great) but it's not the entire picture. The trouble is this: there are 527 functions in the top-level list, and 25 of them account for 1 or more percent of total runtime. 
Those top 25 account for ~45% of the total CPU time - so the other 55% is being spent in the remaining 500-odd functions.&lt;br /&gt;&lt;br /&gt;You may now ask yourself what the problem with that is - just optimise those top 25 functions and you'll be fine. Unfortunately, those top 25 functions aren't being called in one place - they're being called all over the shop.&lt;br /&gt;&lt;br /&gt;Here's an example. Notice the strlen time:&lt;br /&gt;&lt;br /&gt; &lt;span style="font-size:85%;"&gt;&lt;span style="font-family: courier new;"&gt; 496      13.7816  squid                    httpRequestFree&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  773      21.4782  squid                    httpHeaderPutStrf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;9518      0.3432  libc-2.7.so              vsnprintf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  85433    55.6846  libc-2.7.so              vfprintf&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  18212    11.8704  libc-2.7.so              strchrnul&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  16037    10.4528  libc-2.7.so              _IO_default_xsputn&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  13351     8.7021  libc-2.7.so              _itoa_word&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  10872     7.0863  libc-2.7.so              strlen&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  9518      6.2038  libc-2.7.so              vsnprintf [self]&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;Note that the CPU times above "vsnprintf" are from the functions which call it, and CPU times below "vsnprintf" are the calls which it makes. 
It's not immediately obvious from the top-level trace that I have to optimise "vsnprintf" calls, as most of the *printf() calls end up going to "vsnprintf" (which shows up at only 0.3% of CPU time) rather than "vfprintf" and friends.&lt;br /&gt;&lt;br /&gt;It's obvious here that finding those places which call the *printf() functions in performance-critical code - and then exorcising them - will probably help quite a bit.&lt;br /&gt;&lt;br /&gt;What about the rest of the 500-odd functions? What I'd like to do is build aggregates of CPU time spent in different functions, including their called functions, and figure out which execution stacks are chewing the most CPU. That's something to do after Cacheboy-1 is stable, and then only after my June exams.&lt;br /&gt;&lt;br /&gt;The important thing here is that I have the data to figure out where Squid does things poorly and, given enough time, I'm going to start fixing them in the next Cacheboy release.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-130514988872710433?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/130514988872710433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=130514988872710433' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/130514988872710433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/130514988872710433'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/04/where-is-my-cpu-time-going-or-how-to.html' title='Where is my CPU time going? 
(or how to divine useful information from oprofile)'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-7387370436005412266</id><published>2008-04-23T10:48:00.000-07:00</published><updated>2008-04-23T10:50:35.521-07:00</updated><title type='text'>Solaris Event Ports for Network IO</title><content type='html'>What do I do at midnight to try and relax?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://code.google.com/p/cacheboy/source/detail?r=12611"&gt;Figure out how to make Solaris Event Ports work for Network IO.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It took me a while to realise that "num" needs to be initialised with the minimum number of events you'd like to wait for before port_getn() returns. I haven't any idea whether this will restrict the returned event count to 1 or whether it will grow to MAX - this will need further testing. 
It &lt;span style="font-weight: bold;"&gt;is&lt;/span&gt; enough to handle single requests though, so it's a start!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-7387370436005412266?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/7387370436005412266/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=7387370436005412266' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7387370436005412266'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/7387370436005412266'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/04/solaris-event-ports-for-network-io.html' title='Solaris Event Ports for Network IO'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-4778351968125703573</id><published>2008-04-20T02:58:00.000-07:00</published><updated>2008-04-21T05:35:04.898-07:00</updated><title type='text'>Knowing what your allocator is doing..</title><content type='html'>I committed a change a few years ago which collapsed the mem_node struct + buffer into one structure. 
This relieved quite a high volume of allocator requests, but it made the structure slightly larger than 4k.&lt;br /&gt;&lt;br /&gt;Modern malloc implementations (and it's possible earlier ones circa 2001 did too; remember I was only 21 then!) have a separation between "small" and "large" (and "huge"!) objects. Small objects (say, under a page size) will generally go in a pool of just those object sizes. Large objects (from say page size to something larger, like a megabyte) will be allocated a multiple of pages.&lt;br /&gt;&lt;br /&gt;This unfortunately means that my 4096 + 12 byte structure may suddenly take 8192 bytes of RAM! Oops.&lt;br /&gt;&lt;br /&gt;I decided to test this out. This is what happens when you do that with FreeBSD's allocator. Henrik has tried this under GNUMalloc and has found that the 4108 byte allocation doesn't take two pages.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;[adrian@sarah ~]$ ./test1 test1 131072&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;allocating 12, then 4096 byte structures 131072 times..&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;RSS: 537708&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[adrian@sarah ~]$ ./test1 test2 131072&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;allocating 4108 byte structure 131072 times..&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;RSS: 1063840&lt;/span&gt;&lt;/blockquote&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-4778351968125703573?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/4778351968125703573/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=4778351968125703573' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4778351968125703573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/4778351968125703573'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/04/i-committed-change-few-years-ago-which.html' title='Knowing what your allocator is doing..'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-2388988728945877924</id><published>2008-04-19T04:43:00.000-07:00</published><updated>2008-04-19T06:07:29.436-07:00</updated><title type='text'>"Dial before you Dig"</title><content type='html'>This is the sort of stuff lurking in the Squid codebase which really needs to be cleaned out.&lt;br /&gt;&lt;br /&gt;The short:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://code.google.com/p/cacheboy/source/detail?r=12592"&gt;http://code.google.com/p/cacheboy/source/detail?r=12592&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The long:&lt;br /&gt;&lt;br /&gt;I'm going through the legacy memory allocator uses and pushing the allocator initialisation out to the modules themselves rather than having some of them globally initialised. 
This will let me push the buffer allocator code (ie, the "rest" of the current legacy memory allocator uses) outside of the Squid/cacheboy src/ directory and allow it to be reused by other modules. I can then begin pushing some more code out of the src/ directory and into libraries to make dependencies saner, code reuse easier and unit testing much easier.&lt;br /&gt;&lt;br /&gt;One of these types is MEM_LINK_LIST.&lt;br /&gt;&lt;br /&gt;A FIFO-type queue was implemented using a singly linked list. An SLIST has an O(1) dequeue behaviour but an O(n) enqueue behaviour - the whole list has to be traversed to find the end before it can append to the end. This requires touching potentially dirty pages which may also stall the bus a little. (I haven't measured that in my testing btw; my benchmarking focused on the memory-hit/miss pathway and left ACLs/disk access out - thus that's all currently conjecture!)&lt;br /&gt;&lt;br /&gt;The FIFO implementation allocated a temporary list object (MEM_LINK_LIST) to hold the next and data pointers. This was mempooled and thus "cached", rather than hitting the system malloc each time.&lt;br /&gt;&lt;br /&gt;The only user is the threaded aufs storage code - to store the pending disk read and write operations for a given open storage file.&lt;br /&gt;&lt;br /&gt;Now, the "n" in O(n) shouldn't be that great as not very many operations are queued on an open file - generally, there's one read pending on a store file and potentially many writes pending on a store file (if the object is large and coming in faster than 4kbytes every few milliseconds.) In any case, I dislike unbounded cases like this, so I created a new function in the doubly linked list type which pops the head item off the dlink list and returns it (and returns NULL if the list is empty) and re-worked the aufs code to use it. 
The above link is the second half of the work - read the previous few commits for the dlink changes.&lt;br /&gt;&lt;br /&gt;Note: I really want to move all of this code over to the BSD queue/list types. ARGH! But I digress.&lt;br /&gt;&lt;br /&gt;Initial testing shows that I haven't screwed anything up too badly. (~400 req/sec to a pair of 10,000 RPM 18gig SCSI disks, ~50mbit client traffic, 80% idle CPU.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-2388988728945877924?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/2388988728945877924/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=2388988728945877924' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2388988728945877924'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/2388988728945877924'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/04/dial-before-you-dig.html' title='&quot;Dial before you Dig&quot;'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-1565152223637940662</id><published>2008-04-17T18:44:00.000-07:00</published><updated>2008-04-17T21:55:11.577-07:00</updated><title type='text'>Development thus far</title><content type='html'>The initial release has been done. 
The cacheboy-0pre1 release is just a vanilla Squid-2.HEAD tree with the first part of the code reorganisation included - they should be 1:1 bug compliant.&lt;br /&gt;&lt;br /&gt;There's a developer who has found a bug in Squid-2.HEAD relating to larger-than-requested data replies in the data pipeline. That shows up during Vary processing. It shouldn't show up in Squid-2.HEAD / Cacheboy as I committed a workaround.&lt;br /&gt;&lt;br /&gt;The Squid-2.HEAD / Cacheboy stuff should give a ~5% CPU reduction over Squid-2.7 (dataflow changes), and a ~10% CPU reduction over Squid-2.6 (HTTP parsing changes).&lt;br /&gt;&lt;br /&gt;Next: sorting out the rest of the code shuffling - the generic parts of the mem and cbdata routines, and then a look at the comm and disk code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-1565152223637940662?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/1565152223637940662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=1565152223637940662' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1565152223637940662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/1565152223637940662'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/04/development-thus-far.html' title='Development thus far'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' 
src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-351311855402780691.post-6424672583771974544</id><published>2008-04-13T17:18:00.001-07:00</published><updated>2008-04-13T17:23:26.182-07:00</updated><title type='text'>Experiences with Google Code</title><content type='html'>I've imported a Squid-2.HEAD CVS repository (with complete history!) into Google Code. This -mostly- worked, although:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;There were some Subversion gatewaying issues inside Google somewhere which made SVN transactions occasionally fail - they've rolled back these changes and things work again!&lt;/li&gt;&lt;li&gt;I'm not getting any commit messages for some reason!&lt;/li&gt;&lt;li&gt;SVNSYNC takes -far too long- : building the SVN repo from CVS took about 5 minutes. Syncing my local SVN repo to the Google Code repo? 2 days.&lt;/li&gt;&lt;li&gt;The size of my repository hangs browsers that try to run the Google Code "source browse" feature. Heh!&lt;/li&gt;&lt;li&gt;$Id$ tag version numbers have been munged into Subversion revision numbers. Argh! I wish it were obvious this was going to happen! (And because of the above times, I really can't be bothered re-building the repository just yet.)&lt;/li&gt;&lt;/ul&gt;All in all though I've been happy with the service - Google employees pipe up on the code hosting group and are generally helpful. The "wiki contents in SVN" trick is cute. The source browser was nice when it worked! 
And I like the simple UI for things like revision browsing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/351311855402780691-6424672583771974544?l=cacheboy.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cacheboy.blogspot.com/feeds/6424672583771974544/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=351311855402780691&amp;postID=6424672583771974544' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6424672583771974544'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/351311855402780691/posts/default/6424672583771974544'/><link rel='alternate' type='text/html' href='http://cacheboy.blogspot.com/2008/04/experiences-with-google-code.html' title='Experiences with Google Code'/><author><name>Adrian</name><uri>http://www.blogger.com/profile/17496219706861321916</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='33' height='25' src='http://bp2.blogger.com/_37wobiQ3zUs/SAKidE1C46I/AAAAAAAAAAM/F065c-5eNR8/S220/2004-12-17-adrian.jpg'/></author><thr:total>0</thr:total></entry></feed>
