Almost fixed? – rob.neppell.org

OK, I’m pretty sure I’ve identified and fixed the problem.
How’s this for obscure: The way the Ecosystem works, at a high level, is that a job is submitted for each weblog, which goes out and downloads the text of the blog’s front page and then parses it to extract links. In debugging that routine, I traced the root problem to the fread function in PHP: instead of downloading the whole page, it appeared to be getting only the first few hundred bytes. Hence: only a handful of links would be found.
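To make the link-extraction step concrete, here is a minimal sketch of pulling links out of a page's HTML. It is not the Ecosystem's actual parser: the function name extract_links and the regular expression are assumptions made for illustration, and a regex like this will miss unquoted or otherwise unusual href attributes.

```php
<?php
// Hypothetical helper (not the Ecosystem's real code): return the unique
// href targets of <a> tags found in a chunk of HTML, in order of appearance.
function extract_links($html)
{
    // Match <a ... href="..."> or <a ... href='...'>, case-insensitively.
    $pattern = '/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i';
    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the captured URLs; drop duplicates and reindex.
    return array_values(array_unique($matches[1]));
}

// Made-up sample page for the demo.
$html = '<p><a href="http://example.com/a">A</a> and '
      . "<a class=\"x\" href='http://example.com/b'>B</a> "
      . '<a href="http://example.com/a">A again</a></p>';

print_r(extract_links($html));
```

The point for this bug: if $html holds only the first few hundred bytes of a page, a routine like this quietly returns just the links that happen to sit in that fragment, with no error to give the problem away.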
Now, I am fairly certain that when Hosting Matters moved servers, the version of PHP being used was updated to 4.3.2. (I can’t prove it, as I no longer have access to the old server, but it’s definitely 4.3.2 on the new one.)
Interestingly, if you read the PHP manual, there’s a note that the behavior of fread was changed in version 4.3.2. I can’t quite match what I’m seeing with what the note describes, but it seems awfully suspicious.
Especially since I replaced the fopen/fread combination with the new function file_get_contents, and shazam: all appeared back to normal.
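For anyone hitting something similar, here is a rough sketch of the two approaches side by side. It is not the actual scan code: read_all is a hypothetical helper, and a local temp file stands in for the remote page so the example runs without a network. The idea, as far as I understand the fread documentation, is that a single fread on a network stream may return less than you asked for, so the safe pattern is to loop until feof or let file_get_contents do the whole read.

```php
<?php
// Hypothetical helper: keep calling fread until the stream reports EOF,
// rather than trusting one call to return the whole page.
function read_all($handle, $chunkSize = 8192)
{
    $data = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop rather than loop forever
        }
        $data .= $chunk;
    }
    return $data;
}

// Stand-in for a remote page: a temp file holding 20000 bytes of demo data.
$path = tempnam(sys_get_temp_dir(), 'eco');
$out = fopen($path, 'wb');
fwrite($out, str_repeat('x', 20000));
fclose($out);

// Approach 1: fopen plus an fread loop.
$handle = fopen($path, 'rb');
$viaLoop = read_all($handle);
fclose($handle);

// Approach 2: one call that reads everything.
$viaOneCall = file_get_contents($path);

var_dump($viaLoop === $viaOneCall);
unlink($path);
```

On a local file both approaches return the same bytes; the difference only shows up on streams where data arrives in pieces, which is where a lone fread can come up short.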
Anyway, I’m running a one-off catch-up run of the scan routines now, and if all goes well, things should be back to normal in a few hours. Will update again later…