The Ecosystem and Showcase rankings are not updated today, because a periodic problem I have with my MySQL database is back.
Basically, the Ecosystem has one huge table with almost a million rows in it which stores all the links that it harvests. Every week or so recently, it simply stops working. When I try to access it through phpMyAdmin, I get:
Can’t open file: ‘links.MYD’. (errno: 145)
Error
SQL-query :
SHOW KEYS FROM `links`
MySQL said:
Can’t open file: ‘links.MYD’. (errno: 145)
Back
Thus far, I’ve found no way to recover from this other than to drop the table, re-add it, and do a full rescan of all webolgs for links. (A full scan happens every day anyway, so thankfully this is never really a case of data loss).
Any suggestions from MySQL gurus out there?
Anyway: unfortunately I’m not going to be able to resolve this today, so stats will be off until tomorrow…
Update 11/15: Fixed for now. Thanks to all for the suggestions.
To describe the table in question a little better, here’s the creation script:
CREATE TABLE `links` (
`source_url` varchar(200) NOT NULL default ”,
`source_host` varchar(200) NOT NULL default ”,
`dest_url` varchar(200) NOT NULL default ”,
`dest_host` varchar(200) NOT NULL default ”,
`new` varchar(10) default NULL,
`blacklist` varchar(10) default NULL,
KEY `source_host` (`source_host`,`dest_host`),
KEY `source_url` (`source_url`,`dest_url`),
KEY `source_url_2` (`source_url`),
KEY `dest_url` (`dest_url`),
KEY `source_host_2` (`source_host`),
KEY `dest_host` (`dest_host`)
) TYPE=MyISAM;
I am fairly sure I am not using all those indicies at the moment and some are redundant/wasteful.
But basically, it’s a very simple table, and I’m not entirely sure there’s a way to rearchitect it to be more efficient. It’s pretty self-explanatory: it’s just a big collection of all links found during the Ecosystem’s scans, including data on the source and destination URLs. The ‘host’ is a somewhat clumsy concept which I use to try to identify a particular blog (so all links going to a particular blog may have different dest_urls, but they should all have the same dest_host).
Anyway. I’d welcome suggestions on how to a) avoid the routine corruption problem, or b) optimize this layout.
Thanks again all…
-NZB