Did some significant rearchitecting on the Ecosystem scripts today, both to improve performance and to squash some nasty bugs that were preventing the scans from actually hitting all blogs every day.
The major change is that the scan now runs as multiple parallel processes: ecoscan.php executes a loop which grabs one weblog at a time and spawns an instance of scanblog.php to actually grab the data. It also monitors itself to ensure it never has too many processes running (currently capped at ten), so it doesn’t crush the server.
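For anyone curious what that loop looks like in outline, here is a rough sketch of the same idea. It is written in Python rather than PHP, and it is not the actual ecoscan.php code: the names `run_scans`, `make_command`, and `demo_command` are made up for illustration, and the demo child just sleeps instead of fetching a blog.

```python
import subprocess
import sys
import time


def run_scans(blog_urls, make_command, max_procs=10, poll_interval=0.05):
    """Spawn one child process per blog, never more than max_procs at once.

    Returns (peak, exit_codes): the highest number of children seen running
    together, and a dict mapping each blog URL to its child's exit code.
    Assumes the URLs are distinct, since they are used as dict keys.
    """
    running = {}      # blog URL -> Popen handle for a child still running
    exit_codes = {}   # blog URL -> exit code of a finished child
    peak = 0

    def reap():
        # Collect any children that have exited, freeing their slots.
        for url, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[url] = code
                del running[url]

    for url in blog_urls:
        # Wait for a free slot before spawning the next scanner.
        while len(running) >= max_procs:
            time.sleep(poll_interval)
            reap()
        running[url] = subprocess.Popen(make_command(url))
        peak = max(peak, len(running))

    # Drain the children that are still running after the last spawn.
    while running:
        time.sleep(poll_interval)
        reap()

    return peak, exit_codes


def demo_command(url):
    # Hypothetical stand-in for something like "php scanblog.php <url>":
    # a child Python process that sleeps briefly instead of scanning.
    return [sys.executable, "-c", "import time; time.sleep(0.1)"]
```

Calling `run_scans(urls, demo_command, max_procs=10)` walks the list and keeps at most ten children alive at any moment. Note that a child which never exits simply holds on to its slot while the others keep cycling, and this sketch has no timeout, so the final drain would wait on it forever.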
This improves performance by parallelizing the main work, of course, but it also limits the impact of an issue where the scan routine occasionally hangs while trying to open some blogs (it still hangs, but now you only lose one process out of many).
Anyway, data quality should be improved today, and hopefully will continue to trend upwards. A release of the relevant files is available at SourceForge.