The problem with Digg
I have no problem with news aggregators. I’m a TotalFark subscriber. I’ve had an account on Slashdot for four years, and I hit the site regularly before that. Sometimes (if very bored), I’ll see what’s near the top of the list on Del.icio.us, Reddit, Kuro5hin, or some other random site. My real issue with Digg is that it’s a flat-out waste of bandwidth, and a place where the Internet’s idiots congregate (see also: any social video site, MySpace, Facebook, places where people can talk to each other).
First off is the typical flood of “u shud make pot legal!!1!11!eleventy!” posts and ‘stories.’ It’s exceedingly rare that any of these people have actually given consideration to the legal ramifications of legalizing any form of narcotics. Presumably, they wouldn’t be able to call up $dealer, since he’s probably got a record that would prevent him from getting a license to distribute mind-altering substances (assuming they enforce the same restrictions as liquor licenses), nor would it be treated like smoking tobacco. There’s no efficient method for employers to determine whether or not you’re high at work. Some newer police breathalyzers can do it, I hear, but not many. In any case, being allowed to drive while high (or while smoking) would be unlikely at best. These are, of course, the same kind of people who take seriously High Times stories about nonexistent sheriffs in Texas blocking off interstates to search everybody who passes through for drugs. That’s not a violation of your constitutional rights at all, is it?
As touched on yesterday, there’s also a profoundly large number of auto-fellating blogs about blogging. Here’s one example. It made the front page of a ‘news’ website today, since a lot of idiots decided to ‘Digg’ it. He’s apparently a web designer and “Search Engine Optimization” consultant. I’d say that if you need help with your PageRank, maybe your site isn’t relevant to what users are looking for, and people shouldn’t bother. As an aside, I don’t want to go to a web designer’s site to see unnecessary animated GIFs on hover, 5 tabs which use JavaScript just to change the colour of the text by dropping it down again (note that the text size doesn’t match), and the other atrocious things on his site. DHTML is great for navigation. Not for bling. Just add some <blink> tags so we know to leave your website immediately. I submitted him to Websites That Suck.
Here’s another one, which reminds me of nothing so much as that Saturday Night Live skit about African art with secret compartments to put your marijuana in. That is not, of course, the intended purpose of all those safes. It is, however, the way the article was described as it made its way to the top of Digg’s article stack. What is wrong with this picture? Nothing, according to them. It seems that ‘Slashdot is dead,’ too (probably in the same way that BSD is ‘dead,’ ‘UNIX is finished,’ ‘$thing is the next iPod/Java killer!,’ and ‘Linux will surpass Windows on the desktop’). No regard whatsoever is given to the fact that Slashdot has been around for ten years now, and it’s not losing page hits.
Above all? Digg users are technically clueless. Back when the site started, it was aimed at replacing Slashdot. The difference is, Slashdot has a working moderation system. You can reasonably expect that in any given thread on there (whether it’s about organic chemistry, pharmacology, rocket science [literally], compiler optimization, atmospheric physics, philosophy, etc.), you’ll find at least one person who has a post-graduate degree in the subject (verifiable by the website under their profile, generally). Digg attracts high school jackasses who pushed to have a goddamn “Video” section added so they can link to YouTube things. This made it to the front page. A lot of “Top 10 dumbass things” make it to the front page. This one was particularly galling. It seems that the ‘author’ has never heard of “research” or “competency.” His suggestions:
- To flat-out tell the 10th-15th most popular website on the internet that they should switch from Apache (which is very fast, used by 50% of sites or more, has thousands of modules, gets security bugs fixed quickly, has developers from Sun and IBM working on it, etc.) to lighttpd because he thinks they should (one can tell by its massive <2% share how great it is).
- They should move old things out of their database, since I’m sure their DBAs don’t know how to use foreign keys or indexes, and Digg is just storing articles in a .txt file that they put on the main page with open() and print().
- Add more servers! Always a good idea. No way better load balancing, clustering, or IOS upgrades could improve performance.
- Get rid of your CSS includes and JavaScript. Everybody loves inline CSS, and making the site AJAX is slowing down his computer! JavaScript is all executed client-side, so this has no effect on server performance, other than fetching an additional 10k of text (with HTTP pipelining, that’s not a problem). Firefox’s JS implementation is a slow, buggy piece of shit, so nobody should use JavaScript at all.
- Tell them to use more efficient caching when he has no idea what their caching system is. It could very well be Akamai, PHP’s cache, a real in-memory cache, etc.
- Improve navigation by reworking the entire website around him. Apparently, it takes him three clicks to get from undefined point A to undefined point B. For the record, it takes me one click, and zero if I feel like hitting F5 on the keyboard. No idea what he’s doing, but it’s not an example of a typical user.
- “Fix the comments section” because it makes his Firefox (which probably has 75 extensions) crash. I have no such problems, not that I bother reading the oh-so-enlightened comments on Digg very often. That’ll make the site faster for sure, because dumping the text of all the comments at once, instead of fetching them piecemeal via AJAX when you actually want to read them, takes way less bandwidth.
- Create better spam filters. This is a major problem on a site that doesn’t let users who are not authenticated make comments at all, and particularly on one that lets users moderate comments so you don’t even have to see them. Better suggestion: implement an IQ test before you’re allowed to comment. Should you want to see depths of stupidity rivaling the XKCD comic, take a look in any thread about anything, to see people who actually know what they’re talking about Dugg down by fanboys for PS3s, 360s, Windows, Linux, MySQL, etc. It seems that iptables could be a fix. Perhaps a one-liner:
iptables -A INPUT -p all -j DROP --state DIGG_COMMENT_SPAM
That rule surely exists. Easier still would be:
iptables -A INPUT -p all -j DROP
Solves the Digg problem and his spam problem all at once!
- Remove unnecessary features which are hogging the CPU. Likely culprits could be mod_setiathome, counterstrikeserver.php, and the cron job calling:
#include <unistd.h>
#include <stdlib.h>

/* Joke fork bomb: leaks 2 MB and forks on every pass. Do not run it. */
int main(void)
{
    while (1) {
        malloc(2097152);
        fork();
    }
}
Other than that, it could also be the job which frantically scans hundreds of megs of server logs to create iptables rules, then propagates all those rules to the other servers and reloads iptables to prevent spam. Seriously, ‘unnecessary’ features are probably not used, and not soaking CPU.
- Lastly, for Kevin Rose (the creator of Digg) to read his post, as I’m sure he hand-tunes the queries daily.
Suggestions from a real sysadmin?
- Get a hardware compression card. gzip is the best thing you can do for page load times.
- Stop trying to hand-tune your SQL. Yeah, don’t use nested selects. Views are good. Try to avoid outer joins. Just let the database engine do it for you beyond that. It’s what it’s good at.
- Use a real database. Sure, MySQL fanboys may be pissed. Oracle, PostgreSQL, or DB2 will run circles around MySQL on performance, and they scale. Hell, Oracle has its own clustering kernel, which is able to use raw disks. Easier is not always better.
- On that note, use an operating system suited to the task. That means Solaris, AIX (or WebSphere on Linux), etc. Linux is all well and good, and it can do the job very well. Clustering works well. Bigger hardware will stomp it any day of the week, though. The SMP performance on BSDs frankly sucks.
- Learn from your competitors. They’re estimating Digg has 100 servers? Slashdot gets by on far fewer, and some of those are a few years old.
- More servers cannot substitute for more spindles. A good NAS/SAN will improve response times far more than another server which just sits in I/O wait all the time.
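To put a rough number on the gzip suggestion above, here is a minimal Python sketch using only the standard library. The page contents, the `compression_ratio` helper, and the 500-story count are all made up for illustration; this is not Digg’s actual markup, and real pages will compress less dramatically than such a repetitive sample.

```python
import gzip


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(compressed) / len(payload)


# Hypothetical front page: 500 near-identical story rows, as an assumption
# about what repetitive aggregator markup might look like.
rows = "".join(
    f'<div class="story"><a href="/story/{i}">Story number {i}</a> '
    f"<span>{i * 3} diggs</span></div>\n"
    for i in range(500)
)
page = ("<html><body>\n" + rows + "</body></html>\n").encode("utf-8")

ratio = compression_ratio(page)
print(f"original: {len(page)} bytes, compressed/original: {ratio:.3f}")
```

Doing this in software costs CPU on every response, which is the argument for pushing it onto a dedicated card on a busy site.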
Lastly, you simply cannot compare the quality of discussion. Slashdot thread from today versus Digg thread from today. I hope Digg dies an ignominious death, and soon.