Posts tagged: Plans

New Plan

I spend way too much time doing, well, nothing of consequence. Between Slashdot, Reddit, and Fark, I probably spend 30% of my waking time checking news (to say nothing of all the feeds in Google Reader). I’ve come to the conclusion that this is really a waste. Ironically, a story on Slashdot pushed me in that direction (namely, that Stanford is starting to put courses online similar to MIT’s OpenCourseWare, except that they’re trying to only post complete courses). What the hell am I doing with my time?

To be sure, keeping up on news has its high points. It’s nice to be informed. Really, it is. It occupies my free time when I’m at work, and I think that’s part of the problem. There are better things I could be doing with my time. I don’t intend to stop reading the news at any point, but Google Reader will keep me up on that. I do intend to stop using my laptop for 99% of things at home, and actually migrate down to my desk (in front of my workstation and my server). Why? Ahh… so I can stop getting distracted by the television.

As irritating as it is to me (and the power bills are irksome), the television in my bedroom is on 95% of the month. Heather watches a shitload of television, and it’s too easy to lose my concentration with $whatever playing in the background constantly (even if it is Animal Cops or something). I watch very little TV. Good Eats. Mythbusters. The Daily Show. Battlestar Galactica (which will be over soon enough, at any rate). The Wire (which is already done). That 90 minutes between Good Eats and The Daily Show gets sucked into random channel flipping too often. Even C-SPAN is distracting at times.

At this point, I have a fairly good grasp of Hebrew (grammar, if not vocabulary). The intention, I suppose, is to start doing something else with my spare time. Going to school (which I am) is one thing, but my course load is light of necessity (financial necessity, more than anything, since I don’t want an assload of student loans while I still have credit card debt, and $work only covers $5,000 a year), and it’s not challenging. At this point, even though experience, aptitude, and intelligence are helpful in getting jobs, the piece of paper still counts for a lot. In particular, not having one excludes me from getting a state/federal job until such time as I have 6 years of experience or whatever. Also, it’s surprisingly difficult to move into a development role when I have years of systems administration experience, with scripting/development as a secondary duty.

I miss learning, I suppose. I do, however, have nearly as much free time in the day as I spent in high school (at the time). This free time is usually sucked up with errands, chores, and news reading when I can fit it in. That’s got to stop. What I’m planning to do now is to divide my time into blocks. I should be able to fit at least two hours a day into two distinct subjects, and possibly more on top of that. Certainly, I didn’t spend that much time actually learning in high school (largely because high school was a waste of time where the majority of my classes had nothing to teach me).

The plan now, I think, is to stop by Half-Price Books and grab some textbooks. I don’t have to take calculus in college, since I managed to test out of it. I kinda think that was by the skin of my teeth, though, since my grasp of it is hazy at this point (other than basic differentiation and integration). I’ll probably start with Calc I, then Calc II, Multi-Variable, Diff. Eq., Linear Algebra, and a couple levels of statistics thrown in (maybe after Diff. Eq.). I figure that at two hours a day, it’ll take me a month to a month and a half to get through a textbook, including the exercises, of course.

I’ll be doing Arabic contemporaneously. I mean, I have a fair amount of Dan’s coursework from DLI that’s been sitting around for years. It’s time to get to it. I don’t honestly have any idea how long I expect that to take me. I mean, having the entire MSA Basic course plus the MSA Intermediate course, and some dialect CDs besides? It could be quite an undertaking. Also at two hours a day, I guess. Given that the course at DLI was what, 64 weeks, 8-10 hours a day, it might take me a long fucking time. Then again, I don’t know how far that course went (past Intermediate?), nor do I have any idea how closely the pacing at DLI meshed with how quickly Dan (and probably I) acquire information. The attrition rate says something, but that something says little about how challenging I’ll find it. It will, of course, be paired up with al-Jazeera once I have a decent grasp of it. Hell, I still watch Deutsche Welle, plus reading some Solaris blogs in German (c0t0d0s0, for instance, is German about 50% of the time), and after nearly eight years, I still don’t have any problems with it.

I haven’t stopped studying things, but, I suppose, it loses its luster after a while. Sure, I’ve learned three programming languages in the last year. I’m finally back on learning Rails (even if the job for the Rails consulting company doesn’t mandate that I know it beforehand, it’s not all that tough to learn now that the API has stabilized, and I’d like to be prepared for the interview), and I’ll probably play with Django or Pylons after that just to see how it compares (though I think I’d probably enjoy Ruby/Rails development more than Python/whatever, and deployment is a hell of a lot easier with JRuby and Glassfish/Tomcat/JBoss, by just dumping a .war on there). Working on expanding my .NET knowledge at work, too (ASP.NET, specifically), and I’ll no doubt continue that. However, it seems like such a limited focus (programming, that is), and I feel like if I re-learned calculus on my own, I probably wouldn’t forget it.

That, then, is the plan. No idea what to do after that, which will probably be a year and a half. After mathematics, chemistry/physics? After Arabic, Russian/Chinese (maybe not another language, anyway, maybe just linguistics, but Sanskrit might fill in for that nicely).


/sigh

I’m fed up with Dreamhost’s performance. Well, to be fair, I’m not entirely sure it’s Dreamhost’s fault. Part of it could be the god-awful slow Javascript parsing of Firefox. It doesn’t help that Firefox takes 450MB of memory for 56 tabs (yes, it’s a tad ridiculous, I realize that), when Opera takes 170MB for the same. I haven’t really touched Opera in a while, since I’m too attached to Greasemonkey, Firefox’s Javascript console, and the DOM inspector. Opera seems to have reasonable alternatives for those now (other than Greasemonkey). My one gripe at this point is Opera’s tab handling, which was a plus before. I’m finding myself preferring Firefox’s “endlessly scroll through your tabs” option (or the dropdown), just because I can see what they’re titled, and easily check whether or not I have new GMail.

I suppose now that GMail supports IMAP, I should just set up Opera to poll that, and the windows widget is pretty good, when it comes to it. At least it doesn’t slow to a crawl when Slashdot loads an animated ad (I refuse to use AdBlock for sites I actually like. Slashdot’s whitelisted, and I find myself occasionally clicking their ads). The RSS feeder wipes the floor with Firefox, it doesn’t peg the CPU when I open it up (along with however many tabs I left last time I closed it), it remembers page and window positioning between instances. I kind of wonder why I ever switched.

It doesn’t help that the clueless dipshit who wrote one of our monitoring applications has no idea how threading is supposed to work. A program with a 20MB footprint should not soak 50% of a 3GHz Xeon every 4-5 seconds while it polls. I haven’t looked at the source, so I have no idea what’s happening there, but it can’t be right. One of my Perl scripts (which totals HTTP hits) chews through 4GB of logs every day in about 6 seconds, at 30% CPU. I find it hard to believe that a non-forked Perl script is somehow more optimized than the C# threading library. It also doesn’t help that Outlook takes an extraordinary amount of memory to do anything, nor that Windows aggressively swaps programs you haven’t used in a little while. That’s nice, except that I have, at any given time, 9-15 programs open. Putty’s fine to swap. Outlook, WINWORD (which Outlook still calls for composing messages, even plaintext), IE, Citrix, and the like are not. A 10 second delay when I click on Outlook again? Nuts to that. If I could convince our Exchange admin to turn on IMAP/POP, I’d just move to Linux. MAPI sucks, and I’ve never gotten Evolution’s Outlook Web Access plugin to work properly.

VMware is a possible solution, but it’s ridiculous to virtualize Windows just so I can run Outlook. Similarly, I’d like to get our AD admin to enable LDAP spanning so I can get our *nix systems on the domain and stop replicating the forest to an internal LDAP server just to keep accounts synced.

As it turns out, it’s not just a problem at work. Dreamhost’s response times are pitiful from home, too. Nine seconds to respond to a HTTP request? Pass. I’m seriously considering migrating to Joyent’s OpenSolaris hosting, even though it may cost more. However, they only let you run one Mongrel (the server Rails works best with) instance. That’s fine, and Rails should respond in virtually no time. However, I need to more closely research Apache reverse proxying. I could move to Typo, Mephisto, Radiant, or some other system for blogging, but one Mongrel instance isn’t going to cut it if I’m running a few Rails apps, and Mongrel doesn’t handle PHP. Maybe FastCGI performance is better at Joyent. I don’t know. Just that I can’t handle this pitiful performance anymore.

I’ll likely see what sustained performance is like on the Intellistation (which will be a web server) via a redirected subdomain monitoring a SNMP daemon (realtime CPU/network graphing). I know my home connection holds up really well via FreeNX, but it remains to be seen whether or not Comcast decides to block port 80 if they see a lot of traffic.

As a total aside, I feel like it should be “an HTTP request” and “an SNMP” daemon, though all proper rules of English say it should be “a HTTP request” or “a SNMP” daemon. IETF (SNMP) and w3 (HTTP) both have websites which agree with the usage of “an” (via a Google search for “an SNMP” vis-a-vis “a SNMP” and likewise for HTTP), but I’ve yet to find definitive rules for usage with regard to acronyms. Instinct tells me it should only be used when it’s referring to a singular adjective phrase versus a predicate or plural, but I can’t establish why. Any thoughts, grammar Nazi?

Also, I highly recommend A Fine Frenzy’s CD.

I’ve mastered the art of squandering my time

Hell, I can’t even find time to write a blog on a semi-regular basis. It’s a little sad. In the time since my last post, I’ve decided to change my home network architecture. Energy prices aren’t getting any lower, and speed just isn’t keeping up any more. The C3600, Octane2, and DS20E have got to go (or at least be powered down). I’m ganking a recently decommissioned Intellistation 6224-33U from a colleague. IBM refuses to upgrade the firmware for dual-core support (ditto for Sun on their W2100z, which is built on the same platform), probably because it would have killed the incentive to upgrade to a 6217 (same platform, does have dual-core support), though the 6217 has PCIe. So, it’s a dual Opteron 254 with 4GB of RAM. Currently one U320 hard drive and a Quadro FX 1100. I don’t really give a damn about the Quadro, but hey, it’s there.

I aim to put in a PERC 4/DC (dual channel PCI-X U320 controller) and 3 more U320 drives (36GB) I have lying around. It probably won’t even cap a channel, but there’s not really enough internal expansion to add more unless I rip out the CDROM drive and put in a 3-bay hotswap cage. That’s a distinct possibility, but if so, I’d be doing it with a PCI-X SATA (or SAS, whichever is cheaper, since SAS supports SATA drives) controller. There’s only two SATA ports on the motherboard, and even with eBayed SCSI prices, it’s just not worth it. Sure, I could get 3 147GB 15k drives or 3 300GB 10k drives, but I don’t see the point. They’re still going to cost as much as 500GB SATA drives (or more). The SCSI will be used for a decommissioned external array. External SATA arrays are ludicrously expensive, given that I’d rather use eSATA if possible instead of a deprecated standard (internal connectors for external devices). SATA arrays with U320/Fibre Channel interfaces are ridiculously expensive, and they pretty much all use (crap) proprietary hardware RAID. Sun makes a few JBODs that’d work, but again, ludicrously expensive. A Powervault 220S (I already have one, but more never hurt) with dual U320 controllers is $99 on eBay with sleds (no drives), so I’ll attach one of those if I need more spindles or more space.

It is, of course, entirely possible that the price of SATA arrays with FC/SCSI connections will drop by the time I need to add more storage, but I’m not counting on that for now. In the meantime, I’ll be putting 2 400GB SATA drives in. I’m just not sure on what kind of filesystem layout I want to have. I’ve got an Intel Pro/1000 MT (quad GigE) PCI-X card that’ll be going in there. The Fibre Channel array is a no-go until I decide to actually spend some money and pick up PCI-X HBAs on eBay, since it’s a PCI64/66 card. I’m not sure how many PCI-X buses the Intellistation has, but PCI-X is a parallel bus, so adding a 66MHz card would drop the throughput of whatever bus it’s on to 533MB/s. That’s no good. If it ends up being on the same bus as the SCSI controller and GigE card, I’d be capped.

It’s not that PCI-X (64/100) is much better (800MB/s), but I’d rather avoid bringing it all down. It -should- be a split bus. prtconf -pv lists five PCI-X bridges, but until I actually try swapping cards in, it’ll be hard to tell if it’s actually split electrically, or if they’re just hanging the onboard USB/GigE/SCSI/SATA off different bridges and all the PCI-X ports are on the same bus. If it’s actually split, then no worries.
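For what it’s worth, the bus numbers above are just width times clock. Here is a minimal Python sketch of that arithmetic, assuming one transfer per clock and that a shared parallel bus falls back to the speed of its slowest card. The function names are made up for illustration, and these are theoretical peaks, not anything measured on the Intellistation.

```python
def bus_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus in MB/s, assuming one transfer per clock."""
    return width_bits / 8 * clock_mhz


def shared_bus_bandwidth_mb_s(width_bits: int, card_clocks_mhz: list) -> float:
    """A shared parallel bus is assumed to run at the clock of its slowest card."""
    return bus_bandwidth_mb_s(width_bits, min(card_clocks_mhz))


if __name__ == "__main__":
    pci_66 = 200 / 3  # nominal "66MHz" PCI clock (66.67MHz)
    print(round(bus_bandwidth_mb_s(64, pci_66)))   # 64-bit/66MHz card
    print(round(bus_bandwidth_mb_s(64, 100)))      # PCI-X 64/100
    # Hypothetical mix: a 133MHz card, a 100MHz card, and the old 66MHz FC HBA
    print(round(shared_bus_bandwidth_mb_s(64, [133, 100, pci_66])))
```

The last call is the worry in a nutshell: one slow card and everything on that bus shares roughly 533MB/s.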

The idea is this:

  • Intellistation A Pro 6224-33U. Dual Opteron 254 with 4GB of RAM running Solaris Express Developer Edition (SXDE).
  • 4×36GB U320 drives, HW RAID5 on the PERC4, stuck in a ZFS pool
  • 2×400GB SATA150 drives, pooled with the SCSI drives
  • 10×73GB FC drives over 2×1Gb HBAs (if the bus is split), pooled with the rest. If it’s not split, grab a dual 2Gb PCI-X/133 FC HBA off of eBay when I have $200 to blow and attach it that way.

Additional possibilities:

  • Powervault 220Sx2 with whatever drives I can scrape up. This would mean moving the current U320 drives to the onboard SCSI controller (again, dual channel U320, just that it’s only RAID0/1/0+1, not 5, and there’s no offload engine or battery backed cache). 80 pin (SCA-2) SCSI drives are much cheaper than 68 pin, since there’s a ton of servers getting decommissioned. As SCSI (well, parallel SCSI) disappears and the trend to SAS and SATA drives continues in the datacenter, this should only get better for me.
  • Some other kind of FC array.
  • FC or SCSI array with SATA drives. MTTL is much lower, but /shrug. It’s cheap!
  • Bump the Intellistation to 8GB of RAM, assuming DDR1 ECC prices get better (unlikely). If it comes down to it, I’d rather have more spindles than more RAM anyway.

It can, at the very least, take over the role of OpenVPN server, DNSMasq server (I like having DNS on my home network), PostgreSQL server, Oracle server, SSH gateway, and LDAP server if I feel like being a pain in the ass and making everyone authenticate to the server (plus RADIUS) to get on my network. I don’t have encryption set up on my wireless network, and I’m not about to change that, but I could (should) set up a trunked VLAN subnet on the wireless which can only get out to the internet (and not route to the wired network) until you authenticate, at which point you get into the main subnet. I mean, what if some random person comes to my house (or parks outside) and needs the internet, like, now! Sure, there’s a coffee shop a block away, but what if they need it at 3AM when the coffee shop is closed?

Now, concerns…

ZFS keeps an ‘intent log.’ Similar to most journaled filesystems, it’s got a record of what it does and doesn’t do. Unlike most journaled filesystems (jfs, reiserfs, ext3 -j/ext4, NTFS, HFS+, VxFS, XFS), it doesn’t check the filesystem and replay the journal if the system crashes. That’s not an issue in many cases. ZFS relies on filesystem metadata and self-heals. Due to this, ZFS requires that the writes be committed (fsync()) every time. Without replaying the journal, writes could be lost on power loss, and there’s not a way I know of to automagically fsck the filesystem when it comes up (actually, to my knowledge, fsck.zfs doesn’t exist). That being the case, you’d end up having corruption. ZFS can fix that. If it’s in the kernel or essential processes, though? You just hosed the server.

The entire point of a cache-backed drive is that it waits for sequential writes so it’s not constantly flipping around the platters, which really helps performance. A cache backed controller immediately returns success to the OS, though the write is not committed yet. When the system comes back on after a power loss or crash, it flushes the cache to disks, and you’re good to go. With flaky SATA drives, JBODs on a plain Jane controller (no cache, which a lot of Fibre Channel HBAs are), forcing a sync is good. With the cache backed controller, it’s bad. Solaris has a syscontrol setting you can change to prevent this from happening (while still leaving the ZFS Intent Log up and running, though turning that off is another way around it, which is not at all recommended). That works great if everything in your ZFS pool is cache backed (real hardware RAID arrays, drives run by a cache-backed controller, etc). In a mixed environment (as mine will be)? I take either the risk of poor SCSI performance or data corruption. I could forgo the hardware RAID, but then why use the PERC at all? The only advantage I can see is that I’d still have a writeback cache, which would be flushed far too often. There’s a way to set this per controller in sd.conf, but that’s for Fibre Channel LUNs, not SCSI. Turning off ZFS’s cache flushing would negatively affect performance on the SATA disks. Best solution for now? Make each SCSI disk its own logical drive on the PERC, then zpool those with the SATA disks.

The max throughput of GigE is 125MB/s. Given protocol overhead, 80-90MB/s is more realistic. The cost of a GigE switch which supports 802.3ad (link aggregation) is $50. That being the case, I’m going to put another wireless router in my house in repeater mode, upstairs (where my computer is, and where this’ll probably be), put a 802.3ad switch on it, connect the GigE on the Intellistation to the router, and the quad GigE to the switch. That’ll solve the problem with certain Broadcom wireless cards not coming up until I log into Gnome, since I’ll just wire them, plus I’ll have the advantage of being able to issue WOL packets (Wake On LAN). Link aggregation effectively makes multiple NICs appear to be one, along with the bandwidth. Intel’s got a proprietary way to do it via ‘teaming,’ but that’s only supported on their cards. Yeah, I have one, but Intel’s implementation is nonexistent on Solaris. Fortunately, Solaris doesn’t need it. I can aggregate whatever I want, regardless of the vendor (take that, Linux bonding, FreeBSD IPMP, and Windows’ lack of any comparable feature!).

I’ll have four aggregated GigE connections on the switch with a different subnet, so lookups for filesharing succeed with an IP in the hostfile (rather than routing through the wireless for no reason). This gives me an optimal throughput of 360MB/s or so on the network, and that can always be increased via another Pro/1000 MT (PCI-X versions are cheap!). I’d have to pick up a PCIe multiple port GigE card (another Intel, probably) for my desktop if I want more than 90MB/s, but that’s not necessary just yet. It’s faster than my hard drive is, anyway. Ideally, once throughput gets high enough (more spindles), and I have more throwaway money, I’ll pick one up. PCIe has a direct link to the CPU/RAM anyway, so it doesn’t need to touch my hard drive if I’m just streaming it over the network into RAM.
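The 360MB/s figure is just four links at the realistic per-link rate. A quick Python sketch of that arithmetic, using the 125MB/s line rate and the 80-90MB/s realistic range quoted above (the function name is made up for illustration):

```python
GIGE_LINE_RATE_MB_S = 1000 / 8  # 1 Gbit/s expressed in MB/s


def aggregate_mb_s(links: int, per_link_mb_s: float) -> float:
    """Best-case throughput of an aggregate: link count times per-link rate."""
    return links * per_link_mb_s


if __name__ == "__main__":
    for per_link in (80, 90, GIGE_LINE_RATE_MB_S):
        print(f"4 links at {per_link:g}MB/s each: {aggregate_mb_s(4, per_link):g}MB/s")
```

Treat the result as a ceiling across several clients. As far as I know, 802.3ad implementations usually pin a given flow to a single link, so one transfer to one machine may still top out at single-link speed.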

How to share it, though? NFS and CIFS (Samba) are both rather unintelligent, and they issue an assload of commands for everything. Not a big deal on copying a few large files, but ever tried to move a ton of small files (say, music) over the network? Suck. NFSv4 fixes this. I don’t know of a Windows NFSv4 client. SMB/CIFSv2 fixes this. That’s only supported on Vista and Server 2008. What do I do here, then? I could install 2008 in a virtual machine just to share files. Seems like a damn waste, and I’ll never touch 99% of what it does. The machine’s going to be headless, I don’t want to use RDP. I don’t want Active Directory on my network. SMB/CIFSv2 is the only thing Server 2008 offers me. Solaris does everything else I want to do better. iSCSI has none of this overhead, but I’d need to create a volume, export it to a system, then format it on the client. I can’t get direct access to that from multiple clients, and it doesn’t grow nicely. Yeah, I could create another iSCSI device, export it, mount it, and use the support for Volumes Windows has to span them, but that sucks, and I still can’t access it from multiple systems. So, I could create a VM, install Server 2008, have it share the volume, and add more virtual disks as necessary (again, spanning via Windows) to share.

Again, this is not an ideal solution. My storage isn’t unified, and it’s a big hassle for me to go add more. Creating a 22TB ZFS pool at work took 15 seconds. Any idea how long that takes on Windows? Plus I have to go through filesystem checks if it crashes. I don’t really know what I’m going to do about that. Latency on closing a 1K file via NFS is about 4 seconds. It’s similar for CIFS. Assuming the process reading/writing is multithreaded, it shouldn’t bottleneck. I have no idea how many of the applications I use are actually multithreaded, though, and I don’t really feel like digging around Process Explorer to find out.
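To put a rough shape on why small files hurt so much more than big ones, here is a toy Python model: total time is a fixed per-file cost plus the raw bytes over the wire. It assumes files are copied strictly one at a time. The function name and every number in the example are hypothetical, picked only to show the shape of the problem.

```python
def transfer_time_s(n_files: int, total_mb: float,
                    per_file_latency_s: float, link_mb_s: float) -> float:
    """Serial copy: every file pays a fixed protocol cost, then the data streams."""
    return n_files * per_file_latency_s + total_mb / link_mb_s


if __name__ == "__main__":
    # Hypothetical: the same 4000MB over a 90MB/s link, as one file or as 10,000
    one_big = transfer_time_s(1, 4000, 0.05, 90)
    many_small = transfer_time_s(10_000, 4000, 0.05, 90)
    print(f"one big file:     {one_big:.0f}s")
    print(f"10,000 small files: {many_small:.0f}s")
```

With enough files, the per-file term swamps the bandwidth term, which is why a faster link alone doesn’t fix it and a less chatty protocol does.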

Best case scenario?

Get a Thumper. Given that I don’t have $25,000 to blow? Get a PCI-X SATA card and an external case. The idea here is to save money on electricity, and attaching 3 arrays with redundant power supplies isn’t going to help that. A single case with a 300W power supply, with 8 SATA drives and cables funneled out of the Intellistation might work, since every company out there seems to be full of jackasses. It can’t honestly be that hard to support the SATA2 spec (no, it’s not 3Gb/s throughput max) and give me a cheap port multiplier. Sequential throughput on a SATA drive is about 70MB/s. Random access is closer to 40. With 2 SATA drives plus 4 SCSI drives, I ought to be able to saturate a single GigE link pretty easily for now. With more drives, that’ll go up. Ten SATA drives plus the 4 SCSI should put me over the cap for the quad GigE. It’s not like I can’t add another card and aggregate those, too, but how fast do I really need it? PCI-X might disappear also (PCIe is gradually replacing it in servers), but the price of quad GigE PCI-X cards can only get better.
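The spindle math above, as a back-of-the-envelope Python sketch. It uses the 70MB/s sequential and 40MB/s random SATA figures from the text and the roughly 90MB/s per GigE link from earlier. It assumes throughput scales linearly with drive count and ignores RAID overhead and the SCSI drives, since I haven’t given a number for those. The names are made up for illustration.

```python
import math

SATA_SEQUENTIAL_MB_S = 70   # per-drive sequential figure from the text
SATA_RANDOM_MB_S = 40       # per-drive random-access figure from the text
GIGE_REALISTIC_MB_S = 90    # per-link figure after protocol overhead


def drives_to_saturate(target_mb_s: float, per_drive_mb_s: float) -> int:
    """Smallest drive count whose combined throughput reaches the target."""
    return math.ceil(target_mb_s / per_drive_mb_s)


if __name__ == "__main__":
    quad_gige = 4 * GIGE_REALISTIC_MB_S
    print(drives_to_saturate(GIGE_REALISTIC_MB_S, SATA_SEQUENTIAL_MB_S))  # one link, sequential
    print(drives_to_saturate(quad_gige, SATA_SEQUENTIAL_MB_S))            # quad GigE, sequential
    print(drives_to_saturate(quad_gige, SATA_RANDOM_MB_S))                # quad GigE, random
```

Even on the pessimistic random-access number, ten SATA drives clear the quad GigE ceiling under these assumptions.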

Just think of my zoning times at 360MB/s! That’s not going to happen now (or for a while), but I should at least be getting twice the speed of my hard drive.

The real problem for scalability is that I only get two cores. Hopefully, by the time it doesn’t scale to the demands of database/fileserver load/whatever, it’ll be a long time from now. 1TB drives should be less than $100 in a year. Who knows what I could get by the time this is obsoleted? I’d still like to dangle SCSI/Fibre Channel arrays off it, but I don’t think that’s going to go over well.

Looking at moving, condos, marriage, etc. Given the cost of weddings, it’s unlikely that I’ll have extra money any time soon. Given the average size of a condo, I don’t think I’d get a good reaction from whirring and clicking arrays, no matter how appealing the blinkenlights may be, plus 3U equipment is loud (ok, not as loud as 2U or 1U, but 60dB isn’t quiet). Regardless, perhaps I should appeal against condos with $250+ association fees on the basis that they’re costing me at least 1.5TB (7200RPM SATA) or 1TB (10k 300GB SCSI) a month, or more RAM, or something…