Ecosystem Madness!

At long last: another run of the Ecosystem is complete. And they said it couldn’t be done!
This last run was a major pain in the neck, as I think y’all figured out from my whining last weekend. Over 100 new blogs have been added to the lists, bringing the total to 432. In addition, I tweaked the process in a couple of significant ways to improve its accuracy:
1) I spent a good deal of time reviewing suspicious-looking results by hand. In a number of cases, this meant slightly modifying the URL I was pulling a blog’s data from, because the previous URL, while valid for a browser, was confusing my data pulls. So some blogs that just weren’t getting their links scanned properly before now are.
2) The process now supports Radio weblogs properly. For technical reasons, they would have confused the process before, but I’ve added some extra steps to the data cleansing phase to ensure they get handled correctly now. And to honor the occasion, I added Dave Winer himself to the list, even though he didn’t ask. (Hope you don’t mind, Dave!)
3) I discovered last weekend that the existing process was utterly failing to handle blogrolls for weblogs that used Blogrolling.com. The reason was obvious once I took a look at it: when you use Blogrolling, you don’t actually have the links themselves in the ‘source’ of your page; you’re just invoking a script to pull them from the blogrolling server. This is the problem that threw me into a fit of despair last weekend. However, with a clue from Jason DeFillippo himself, I figured out a way to add yet more steps to the process to pull Blogrolling links for those blogs that use it. There is a catch, however: these links are added to other blogs’ inbound totals, but they don’t show up in the blog’s own outbound total. So if you use Blogrolling, you’ll see that your # of outbound links on the Hall of Link Sluttage is lower than you’d expect. Why am I counting them for the Inbound list but not the Outbound? Because it was an even greater pain in the neck to figure out how to count them for outbound, but at least this way, the Ecosystem list (which everyone cares most about) is as accurate as can be. Sorry; that’s life.
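For the scripting-minded, here is a rough sketch of the counting rule described in item 3. It is not the actual process (which is still mostly manual); the function name, the dictionary inputs, and the sample blog names are all made up for illustration. It assumes you already have, for each blog, the set of blogs linked in its page source and, separately, the set pulled from its Blogrolling list.

```python
from collections import Counter


def tally_links(page_links, blogroll_links):
    """Count inbound and outbound links per blog.

    page_links:     blog -> set of blogs linked directly in its page source
    blogroll_links: blog -> set of blogs listed in its script-served blogroll
    Both inputs are hypothetical stand-ins for whatever the data pull produces.
    """
    inbound = Counter()
    outbound = Counter()

    # Links found in the page source count toward both totals.
    for source, targets in page_links.items():
        targets = set(targets) - {source}  # ignore self-links
        outbound[source] += len(targets)
        for target in targets:
            inbound[target] += 1

    # Blogroll links count toward the targets' inbound totals only.
    for source, targets in blogroll_links.items():
        already_counted = set(page_links.get(source, ()))
        extra = set(targets) - {source} - already_counted
        for target in extra:
            inbound[target] += 1
        # Deliberately no change to outbound[source]: this is the catch.

    return inbound, outbound


if __name__ == "__main__":
    # Made-up sample data: "c" links to nobody in its page source,
    # but lists "a" and "b" in its blogroll.
    pages = {"a": {"b"}, "b": {"a", "c"}, "c": set()}
    rolls = {"c": {"a", "b"}}
    inbound, outbound = tally_links(pages, rolls)
    for blog in sorted(pages):
        print(blog, "inbound:", inbound[blog], "outbound:", outbound[blog])
```

In the sample data, blog "c" gives credit to "a" and "b" on the inbound side while its own outbound count stays at zero, which is exactly the lopsidedness Blogrolling users will notice.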
I also added a number of new subcategories to the Ecosystem list, since it’s getting so huge. Several of them came at the suggestion of Dr. Weevil. So there, Doc — and you thought I’d never get around to using your suggestions!
At this point, it is likely that this will be the last run for a few weeks. The process is now essentially falling apart, I’m afraid, due to the large volume of blogs I’m trying to process. The end results are more accurate, I think, than they’ve ever been, but getting there requires a massive amount of manual work. So be warned: until I get enough time for the major re-engineering effort I keep planning (rewriting the whole deal with scripts), this may be the last run for a while. (If there are any hardcore UNIX scripting folks out there who want a challenging project to take on, let me know, particularly if you’re a PHP/MySQL type, although I’ll take what I can get at this point.)
Anyway, ’nuff said for now. Enjoy the lists, don’t take it all too seriously, and remember: no wagering!
And by the way: I have found a way to quash the defiant microbes who dared to stage an uprising a few weeks back. Their ringleader, Lynn, is now a Flappy Bird, and her minion Floyd has already ascended to the level of a Crawly Amphibian.
Looks like you’ve been co-opted by The Man, kids. There goes your street cred.
[Mr. Burns Voice]
Exxxcellent.
[/ Mr. Burns Voice]
Update: Lynn’s minion Floyd is neither her minion, apparently, nor Floyd. He’s actually Fred, and he claims to have been de-minionized on account of his failure to pay union dues. Hmph.