Posts tagged: Python

XMonad and xmobar on OpenSolaris with functional monitoring (mutt to boot)

I’ve been having carpal tunnel flareups lately, so I went looking around for things I could do to use the mouse less and the keyboard more (as counter-intuitive as that may sound, I find that holding a mouse for hours irritates it far more than any amount of typing, YMMV). Vimperator is an obvious first step, but, well, I love vi, so I already had that running.

Tiling window managers came to mind. I’ve already used wmii and dwm once upon a time, but they’re hardly state of the art (as state of the art as tiling window managers get, anyway), and hacking together a reasonable workspace status bar in dzen/Perl didn’t appeal to me. Awesome3 (the window manager) does appeal to me, but getting it running on Solaris looked like a little more work than I wanted to invest, and I’m sick of working with moving targets (Awesome3 looks like they’re almost as “break your config” happy as Rails).

Mind you, I still love Openbox, but given that 99% of my time is spent in terminals (irssi, mutt, ssh, vim, mcabber, slash’em), I see no reason why I should even bother with having window decorations and manually arranging them at all.

Had I known what I was getting into, I probably would have just used Awesome. I mean, it needed two libraries I didn’t have, and some dzen hacking. Not… this. Not that I’m unhappy with XMonad, but…

Firstly, there’s no build of GHC (Glasgow Haskell Compiler) in the OpenSolaris repositories. There’s a pre-compiled version of GHC 6.10, but only for SPARC. There’s a pre-compiled version of GHC 6.8 for x86/amd64, but that ain’t helping me (a scary amount of stuff from Hackage, Haskell’s version of CPAN/rubygems/whathaveyou, doesn’t want to run in GHC 6.8, and the recommended fix for some bugs is “upgrade to 6.10”).

No GCC 4 in the repos either, and GCC 4.1.2 is the recommended version for building GHC. So, onto the magic. Don’t even try with SunStudio. GNU-isms in the code stopped me dead.

I was bitching to Dan about how ridiculous this process was a few weeks ago. Maybe it’ll help somebody.

Firstly, either install readline from the OpenSolaris Pending repositories or compile it yourself.

Next, we need to bootstrap gcc. For that, we’ll need gmp from GNU and mpfr. Grab the precompiled version of GHC 6.8.2 while you’re at it. You should also get the newest versions of ghc and ghc-$version and ghc-$version-src-extralibs to get running later.

Complaint #1: GNU automake is braindead. It, I assume, just checks `uname` and not `isainfo`, so it can’t tell when we’re running 64 bit. Either use Solaris libtools or do the following for gmp and MPFR:

./configure ABI=32 && gmake && pfexec gmake install
gmake distclean
./configure --prefix=/usr/local/lib64 && gmake && pfexec gmake install

If you don’t specify another prefix, it’ll stomp all over the 32 bit libraries it just installed on your 64 bit box.

Complaint #2: gcc is even more braindead. It’ll build a 64 bit binary but link it against 32 bit libraries, then eat itself during stage 2 bootstrap. You’d think the FSF would be smarter, but no. It just finds the wrong ELFCLASS down the line. To correct:

export LDFLAGS="-L/opt/local/lib -L/opt/local/lib/64 \
-R/usr/local/lib:/usr/local/lib/64"
export LD_OPTIONS="-L/opt/local/lib -L/opt/local/lib/64 \
-R/usr/local/lib:/usr/local/lib/64"
./configure && gmake -j4 && pfexec gmake install
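
Since the failure mode here is a wrong ELFCLASS, it can help to check what a library actually is before pointing gcc at it. What follows is only a sketch, not part of the original build: the class byte sits at offset 4 of every ELF header (1 for 32 bit, 2 for 64 bit), and the function names are mine.

```python
# Sketch only: identify the ELF class of a library from its first five bytes.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte within e_ident
CLASS_NAMES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(header):
    """Return the ELF class name for a file header, or None if it is not ELF."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return None
    return CLASS_NAMES.get(bytearray(header)[EI_CLASS], "unknown")


def elf_class_of_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(EI_CLASS + 1))
```

Calling `elf_class_of_file()` on whatever libgmp ended up under your 32 bit and 64 bit prefixes should give two different answers. If it doesn’t, the second install stomped the first.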

Complaint #3: The precompiled ghc-6.8.2 we got? It sucks. The rts library is broken (check it with ldd). It would be nice to avoid this, but, well… Bootstrapping ghc from C sources and no Haskell compiler involves another goddamn system which DOES have Haskell installed, AND knowledge of what registers your CPU uses. Whoever thought up that notion of bootstrapping? Well… The “goal” is to have Haskell self-bootstrap (it currently does not, since, ironically, Haskell is too “pure” to actually be written in a “pure” language, since we need dirty things like actually doing something useful with “impure” data, like user input or stuff sucked in from a file).

cd ../ghc-6.8.2
./configure && pfexec gmake install
ghc-pkg describe rts > rts.pkg
vim rts.pkg
#add -R/usr/local/lib to the end of the ldoptions field, or ghc bombs bootstrapping the new version in stage 2
ghc-pkg update rts.pkg

Complaint #4: GHC is even stupider than GCC, if possible. Not only do we have to prepend /usr/local/bin to $PATH so GHC can find our shiny new gcc-4.1.2, we have to pass ridiculous amounts of config flags (including one which tells it where GCC is. If the $PATH OR --with-gcc is wrong, it won’t bootstrap. Don’t ask, because I don’t know why).

export PATH=/usr/local/bin:$PATH
./configure --with-gcc=/usr/local/bin/gcc --with-gmp-libraries=/usr/local/lib --with-gmp-include=/usr/local/include --with-readline-libraries=/usr/local/lib --with-readline-include=/usr/local/include
gmake -j4 && pfexec gmake install

Yay! Working GHC. This means we can install cabal, which requires nothing special other than grabbing the tarball and installing it as normal. Sadly, if you want to reclaim the 350MB or so the GHC 6.8 install is taking up, you have to go remove it yourself (apparently the GHC team does not believe in `make uninstall`).

Next, `cabal install xmonad xmonad-contrib`. I said we were going to install xmobar, and we are, but it’s a little trickier. You see, even though xmobar mostly reads a pipe to give us a title and workspace listing, the plugins are not optional. They also depend on hinotify, a binding to inotify, which is only present on Linux. Good job, xmobar developer! Fortunately, this is easily corrected, and xmobar (mostly) works. Caveats explained later.

You can’t just `cabal install` xmobar. No-go since hinotify will not install, and there’s not a clear explanation as to why from the output. As noted, xmobar doesn’t really depend on it; it’s just that the developer can’t be bothered to use Haskell’s typing system to throw messages at you when you try to use features that are not implemented. So… edit ~/.cabal/packages/hackage.haskell.org/xmobar/$version/xmobar-$version/xmobar.cabal

Take out the lines referring to hinotify. Then `cabal build && cabal install` from the directory xmobar.cabal was in. Ooh and aah, but don’t try to use, well… anything. BatteryReader, CpuReader, MemReader, Net, Swap, all broken. Thankfully, we have DTrace and Python to replace them with, since xmobar’s PipeReader still works.

Memory usage?

#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option bufsize=16k
 
dtrace:::BEGIN
{
}
 
profile:::tick-1sec
{
	/* RAM stats */
	this->ram_total = `physinstalled;
	this->unusable  = `physinstalled - `physmem;
	this->locked    = `pages_locked;
	this->ram_used  = `availrmem - `freemem;
	this->freemem   = `freemem;
	this->kernel    = `physmem - `pages_locked - `availrmem;
 
	this->ram_total	*= `_pagesize;  this->ram_total	/= 1048576;
	this->unusable	*= `_pagesize;  this->unusable	/= 1048576;
	this->kernel	*= `_pagesize;  this->kernel	/= 1048576;
	this->locked	*= `_pagesize;  this->locked	/= 1048576;
	this->ram_used	*= `_pagesize;  this->ram_used	/= 1048576;
	this->freemem	*= `_pagesize;  this->freemem	/= 1048576;
	printf("RAM: %2d%%\n", ((this->ram_total - this->freemem) * 100 / this->ram_total));
}
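
The arithmetic in that probe is easier to sanity-check outside of D. Here’s a rough Python equivalent: the kernel hands back page counts, so multiply by the page size and divide by 1048576 to get megabytes, then work out the percentage. The numbers in the usage note are made up for illustration, not read from a real box.

```python
# Sketch of the page-count arithmetic from the DTrace script above.
MIB = 1048576


def pages_to_mib(pages, pagesize):
    """Convert a page count to whole megabytes (D uses integer division too)."""
    return pages * pagesize // MIB


def percent_used(total_mib, free_mib):
    """Same formula as the printf in the probe: (total - free) * 100 / total."""
    return (total_mib - free_mib) * 100 // total_mib


def ram_line(physinstalled, freemem, pagesize):
    """Build the string the probe prints, given raw page counts."""
    total = pages_to_mib(physinstalled, pagesize)
    free = pages_to_mib(freemem, pagesize)
    return "RAM: %2d%%" % percent_used(total, free)
```

With 4096-byte pages, 1048576 installed pages and 262144 free pages, `ram_line(1048576, 262144, 4096)` comes out as `RAM: 75%`.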

Network speeds?

#!/usr/sbin/dtrace -s
#pragma D option quiet
dtrace:::BEGIN
{
	TCP_out = 0; TCP_in = 0;
}
 
 
mib:::tcpOutDataBytes		{ TCP_out += arg0;   }
mib:::tcpInDataInorderBytes	{ TCP_in += arg0;    }
 
profile:::tick-1sec
{
	OUT_print = TCP_out/1024; IN_print = TCP_in/1024;
	printf("Out:%3d|In:%3d\n", OUT_print, IN_print);
	TCP_out = 0;
	TCP_in = 0;
 
}

.xmobarrc

Config { font = "-*-terminus-*-*-*-*-12-*-*-*-*-*-*-u"
       , bgColor = "#000000"
       , fgColor = "#AFAF87"
       , position = Top 
       , lowerOnStart = True
       , commands = [ Run Date "%a %b %_d %Y %H:%M:%S" "date" 10 
                    , Run Weather "KSTP" ["-t","<tempF>F","-L","64","-H","77","--normal","green","--high","red","--low","lightblue"] 36000
		    , Run PipeReader "/export/home/ryan/dtrace/net" "wireless"
		    , Run PipeReader "/export/home/ryan/dtrace/netspeed" "speed"
		    , Run PipeReader "/export/home/ryan/dtrace/psr" "cpui"
		    , Run PipeReader "/export/home/ryan/dtrace/ram" "mem"
                    , Run StdinReader
                    ]
       , sepChar = "%"
       , alignSep = "}{"
       , template = " %StdinReader% } { %cpui% | %mem% | %wireless% %speed% | %date% | %KSTP%"
       }

A script to feed those pipes. If you don’t have python2.6, pexpect on python2.4 (the Solaris/OpenSolaris default) works. Just install pexpect with easy_install, an .egg, or whatever your poison may be.

#!/usr/bin/python2.6
import math
import os
import platform
import re
import stat
import sys
import time
 
import pexpect
 
#Get the directory we're running from to create the fifos rather than the $pwd of whatever called us
path = os.path.dirname( os.path.realpath(__file__)) + "/"
 
wificonfig = ""
 
#Y'know, I haven't actually written the iwconfig thing.  It's here for posterity and possible later use.
osystem = platform.system()
if osystem == 'SunOS':
  wificonfig = 'wificonfig'
elif osystem == 'Linux':
  wificonfig = 'iwconfig'
 
 
#Kill off any old instances which may be running.  Poor man's pkill, but guaranteed to work pretty much anywhere.
pexpect.run('bash -c "ps -ef |grep mpstat |grep -v python| awk \'{print $2}\' | xargs kill -9"')
pexpect.run('bash -c "ps -ef | grep speed.d | awk \'{print $2}\' | xargs kill -9"')
pexpect.run('bash -c "ps -ef | grep meminfo.d | awk \'{print $2}\' | xargs kill -9"')
 
def checkfifo(path):
  #If it ain't there, make it
  if not os.path.exists(path):
    os.mkfifo(path)
    handle = open(path, "r+")
    return handle
  #If it is, just return it
  elif stat.S_ISFIFO(os.stat(path).st_mode):
    handle = open(path, "r+")
    return handle
  else:
    if os.path.isfile(path):
      #Not a FIFO, and it needs to be
      os.unlink(path)
    os.mkfifo(path)
    handle = open(path, "r+")
    return handle
 
#Set up our fifos
psrfifo = checkfifo(path + "psr")
netfifo = checkfifo(path + "net")
nspdfifo = checkfifo(path + "netspeed")
ramfifo = checkfifo(path + "ram")
 
#Fire off the processes we'll be reading from.  Using pexpect seems like overkill, but mpstat is apparently smart enough to tell when it's being read from a pipe, and it'll buffer no matter what you do.  pexpect/expect fake being interactive, so it happily runs without buffering.
mpstat = pexpect.spawn('bash -c "mpstat 1 | grep -v CPU"')
ramstats = pexpect.spawn(path + "meminfo.d")
nspeed = pexpect.spawn(path + "speed.d")
 
#Regular expressions to use later.  Since it's a long-running script, they may as well be compiled
mpre = re.compile(r'^\s+?(?P<cpu>\d+).*?(?P<idle>\d+)$')
solwifire = re.compile(r'.*?linkstatus: (?P<status>\w+).*essid: (?P<essid>\w+).*strength: \w+\((?P<strength>\d+)\).*', re.DOTALL)
 
def prstat():
  #Yay for awk/sed abuse, but it's concise and I'm already forking.  Basically getting a list of CPUs to check later, so this script should perform its duty no matter if you have 1 CPU or 128 (T2 users)
  psrinfo = pexpect.run('bash -c "psrinfo -v |grep MHz | awk \'{print $6,$7}\' | sed -e \'s/,//\'"').rstrip().split('\r\n')
  output = ""
  for cpu in psrinfo:
    line = prmatch(cpu)
    output = output + line
  psrfifo.write(output + "\n")
  psrfifo.flush()
 
def prmatch(cpu):
  line = mpstat.readline().rstrip()
  m = mpre.match(line)
  #Match it against our earlier regex and subtract the idle value from 100 to get the actual used percentage, which isn't wholly accurate (IOWAIT and whatnot), but it's good enough for me
  usage = 100 - int(m.group('idle'))
  #Padding the string seems stupid, and it is, but xmobar arbitrarily decides spots that it's not going to refresh even if text shows up there, leaving it (black in my case) when text slides.  Padding fixes that.  Also, if your CPU goes to 100%, you probably shouldn't have a script which reads dtrace probes running.  Just sayin'.
  return "Cpu%s: %2d%% (%s) " % (m.group('cpu'), usage, cpu)
 
def memory():
  #I haven't found any swap information from dtrace probes as easy to manipulate as this is
  swap = pexpect.run('bash -c "/usr/sbin/swap -l |tail -n 1 | awk \'{print $4, $5}\'"').rstrip().split(' ')
  #Ugly?  You bet.  swap -l prints total blocks ($4) then free blocks ($5).  Subtract free from total, multiply by 100 (as a float, so Python 2 doesn't truncate the division to 0), divide by total, floor that, then cast THAT to an int
  usedswap = int(math.floor((int(swap[0]) - int(swap[1])) * 100.0 / int(swap[0])))
  ram = ramstats.readline().rstrip()
  output =  "%s Swap: %2d%%" % (ram, usedswap)
  ramfifo.write(output + "\n")
  ramfifo.flush()
 
def network():
  #Filter out interface which aren't up, which are vnics, which only point to localhost to get the running interface.  I'm assuming you only have one at a time, but if you have more modify this to suit.
  iface = pexpect.run('bash -c "ifconfig -a | grep UP |grep RUNNING| grep -v IPv6 |grep -v lo | grep -v -E \':[0-9]: \' | awk \'{print $1}\' | sed -e \'s/://\'"').rstrip()
  if osystem == 'SunOS':
    command = "wificonfig -i " + iface + " showstatus"
    status = pexpect.run(command).rstrip()
    output = ""
    if solwifire.match(status):
      #Beauty of regexes.  If it doesn't match, it's not wireless (or not connected).  It if is, give us values.
      m = solwifire.match(status)
      strength = math.floor((int(m.group('strength')) / 15.) * 100)
      output =  "%s: %s(%s) %3d%%" % (iface, m.group('status'), m.group('essid'), strength)
    else:
      #Probably not wireless
      output = iface + ":"
    netfifo.write(output + "\n") 
    netfifo.flush()
 
def netspeed():
  #Is this method really necessary?  Couldn't the dtrace probe just write to the fifo itself?  Probably, but if you (or I) want to colorize it at some point, it may as well get sucked in here.
  speed = nspeed.readline().rstrip()
  nspdfifo.write(speed + "\n")
  nspdfifo.flush()
 
while 1:
 
  prstat()
  memory()
  network()
  netspeed()
  time.sleep(1)
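
If you want to poke at the mpstat regex without spawning anything, it works fine on its own. This is a sketch using the same pattern as the script. The sample line in the usage note is invented to look like mpstat output, so don’t read anything into the numbers.

```python
import re

# Same pattern as mpre in the script: first number is the CPU id,
# last number on the line is the idle percentage.
MPSTAT_LINE = re.compile(r'^\s+?(?P<cpu>\d+).*?(?P<idle>\d+)$')


def cpu_usage(line):
    """Return (cpu_id, percent_used) for an mpstat data line, or None otherwise."""
    m = MPSTAT_LINE.match(line.rstrip())
    if m is None:
        return None
    return int(m.group('cpu')), 100 - int(m.group('idle'))
```

A data line such as `"  0   12   0   0   345  123  200    3    0    1    0   500    2   1   0  97"` gives `(0, 3)`. The header line beginning with `CPU` gives `None`, because the pattern insists on leading whitespace.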

It’s not the prettiest python. I should probably move those repetitive fifo flushes/etc to a method, but I didn’t honestly expect that I’d need to replace this much XMobar functionality. Notably, XMobar can colorize things with <fc=color> tags in case somebody wanted to pretty up the usages (really, to make it look more like XMobar’s [colorized] defaults for CPU/net usage). I don’t care, personally. I didn’t implement a battery monitor either, but hey, you can if you want to.
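
For anybody who does care, here’s roughly what the two cleanups could look like. It’s a sketch, not something I’m running: xmobar’s colour markup is `<fc=colour>text</fc>`, the thresholds and colour names are arbitrary picks, and `write_line` is just the write-then-flush pair pulled out of the functions above.

```python
def colorize(label, percent, low=50, high=85):
    """Wrap a percentage in xmobar <fc> tags; thresholds are arbitrary defaults."""
    if percent >= high:
        color = "red"
    elif percent >= low:
        color = "yellow"
    else:
        color = "green"
    return "%s: <fc=%s>%3d%%</fc>" % (label, color, percent)


def write_line(handle, text):
    """Write one line to an open fifo (or any file-like object) and flush it."""
    handle.write(text + "\n")
    handle.flush()
```

`colorize("Cpu0", 97)` returns `Cpu0: <fc=red> 97%</fc>`, with the same padding trick as before so the text doesn’t slide around.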

Complaint #5 (did I lose track?): SUNWmutt doesn’t have support for header caching, which is a real bitch when I have 12,000 emails. It also doesn’t support SMTP over SSL, making it pretty well worthless for Gmail. I have other accounts I use mutt for, but Gmail’s an important one.

This isn’t that tough, really. You need some kind of a database for the mutt config script, and gdbm is trivially easy to get running (normal ./configure && make && make install). On the other hand, we run into two hiccups. The configure file depends on ncurses, which is just a link to plain ol’ curses on lots of Solaris boxes. Secondly, (and I don’t really begrudge the Mutt guys for this, since there’s actually a commit to fix this, unlike GCC and GHC, whose response is “too fucking bad” [GHC actually posts the recommendation for fixing rts.pkg and the configure flags on their own site rather than FIXING THE BUILD]), configure.ac does some things wrong with libidn.

Find $with_idn, and replace the block which follows it with this (--with-idn doesn’t seem to build properly):

if test "$with_idn" != "no" ; then
  if test "$with_idn" != "yes" ; then
    AC_CHECK_HEADERS([idn/idn-int.h],
      [AC_CHECK_HEADERS([idn/idna.h], [],
        [CPPFLAGS="$CPPFLAGS -I/usr/include/idn"])])
  fi
fi

If you don’t have or want ncurses (or it’s a symlink on your system), fix configure. `sed -i -e 's/-lncurses/-lcurses/' configure`.

./configure --with-regex --with-gnutls --enable-hcache --enable-smtp --enable-imap --enable-pop --enable-mailtool --with-sasl --with-idn=/usr/include/idn

Congratulations!

Next up, re-implementing htop for Solaris with dtrace probes, python, and ncurses.

Palm Desktop, I stab at thee!

Firstly, I’m starting P90X tomorrow. Should be interesting. Secondly, I miss you guys :/ I’m living with somebody who asked me what the Dead Sea Scrolls are this morning, since it was on the news that they’re coming to the Science Museum.

By the way, ever planning on touching your blogs again (Sewpbox and Rattributes not included)?

So I’m migrating Heather’s Palm Desktop crap to Google Calendar (I have no idea why no tool exists to do this). Google Calendar doesn’t really like the CSV I massaged out of it (only importing about half the records), and I’m starting to see why. Half the records are fucking duplicates in every way but one. I wrote a Python script to do it for me anyway.

The long and short of it amounts to this:
If you want the easy way, export the Palm data to a .mda, import it into Yahoo Calendar, then into Google Calendar from there. Otherwise, export it to a CSV, and hit it with this script:

#!/usr/bin/ruby
#
require 'csv'
 
input = "export.csv"
output = "gcal.csv"
 
csvfile = File.open(input) {|f| f.read}
 
puts "Parsing..."
 
csv = CSV::parse(csvfile)
 
fields = csv.shift
 
puts "Writing..."
File.open(output, "w") do |f|
   f.print "Subject, Start Date, Start Time, End Date, End Time\n"
   csv.each do |line|
     startdate, starttime = Time.at(line[6].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     enddate, endtime = Time.at(line[7].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     f.print "\"#{line[11]}\",#{startdate},#{starttime},#{enddate},#{endtime}\n"
   end
end
 
puts "Done."
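
Since I did promise Python, here’s a rough equivalent of the Ruby above. Treat it as a sketch: the column positions (6, 7 and 11) are the same ones the Ruby uses, but it formats times in UTC so the output doesn’t depend on the machine’s timezone, whereas Ruby’s `Time.at` uses local time. Swap `time.gmtime` for `time.localtime` if you want the Ruby behaviour.

```python
import csv
import time

HEADER = ["Subject", "Start Date", "Start Time", "End Date", "End Time"]


def to_gcal_row(row):
    """Turn one exported Palm row into a Google Calendar CSV row."""
    start = time.gmtime(int(row[6]))
    end = time.gmtime(int(row[7]))
    return [
        row[11],
        time.strftime("%m/%d/%Y", start),
        time.strftime("%I:%M:%S %p", start),
        time.strftime("%m/%d/%Y", end),
        time.strftime("%I:%M:%S %p", end),
    ]


def convert(infile, outfile):
    """Read the Palm CSV from infile and write the Google CSV to outfile."""
    reader = csv.reader(infile)
    next(reader)  # drop the header row, like csv.shift in the Ruby
    writer = csv.writer(outfile)
    writer.writerow(HEADER)
    for row in reader:
        writer.writerow(to_gcal_row(row))
```

Call it with two open file objects, e.g. `convert(open("export.csv"), open("gcal.csv", "w"))`.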

If you don’t feel like exporting, and are running on Windows:

#!/usr/bin/ruby
#
#
require 'win32ole'
require 'dbi'
 
class Access
   attr_accessor :mdb, :conn, :data, :fields
 
   def initialize(mdb=nil)
       @mdb = mdb
       @conn = nil
       @data = nil
       @fields = nil
   end
 
   def open
       connstring = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=#{@mdb}"
       @conn = WIN32OLE.new('ADODB.Connection')
       @conn.Open(connstring)
   end
 
   def query(sql)
       set = WIN32OLE.new('ADODB.Recordset')
       set.Open(sql, @conn)
       @fields = []
       set.Fields.each do |field|
           @fields << field.Name
       end
       @data = set.GetRows.transpose
       set.Close
   end
 
   def close
       @conn.Close
   end
end
 
output = "gcal.csv"
 
rows = Array.new
 
db = Access.new('c:\path\to\mdb')
db.open
 
db.query("SELECT * FROM Main;")
names = db.fields
rows = db.data
 
#Alternatively
DBI.connect("DBI:ODBC:driver=Microsoft Access Driver (*.mdb);"+"dbq=c:/path/to/mdb") do |dbh|
   dbh.select_all('select * from Main') {|row| rows << row}
end
 
puts "Writing..."
File.open(output, "w") do |f|
   f.print "Subject, Start Date, Start Time, End Date, End Time\n"
   rows.each do |line|
     startdate, starttime = Time.at(line[6].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     enddate, endtime = Time.at(line[7].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     f.print "\"#{line[11]}\",#{startdate},#{starttime},#{enddate},#{endtime}\n"
   end
end
 
puts "Done."

If you want the details…

Essentially, Palm’s Datebook dumps everything into an Access database. No keys or relations (granted, only 3 tables, but still), and no idea what most of the columns do. Tools for working with Jet on Linux are minimal, and I didn’t feel like going through win32ole just to get to Jet, plus this sort of thing is nicer to do in downtime at work. So, I exported it via ODBC to a Postgres database on my Solaris box. Not pretty.

access=# \d main
                 Table "public.main"
     Column     |          Type          | Modifiers 
----------------+------------------------+-----------
 record_id      | bigint                 | not null
 status         | integer                | 
 placement      | bigint                 | 
 private        | smallint               | 
 category       | character varying(20)  | 
 start_time     | bigint                 | 
 end_time       | bigint                 | 
 untimed        | smallint               | 
 time_zone      | character varying(40)  | 
 location       | character varying(255) | 
 summary        | text                   | 
 alarm_advance  | character varying(10)  | 
 alarm_unit     | character varying(10)  | 
 repeated_event | character varying(255) | 
 alarm          | smallint               | 
 note           | character varying(100) | 
access=#

Ok, so record_id seems to be some sort of key, and Heather doesn’t bother with notes or alarms, so this doesn’t seem like it’d be so bad. To figure why Google is only taking some of the records, though:

access=$ SELECT count(*) FROM main;
 count 
-------
  5094
(1 row)
access=$ SELECT count(DISTINCT record_id) FROM main;
 count 
-------
  5074
(1 row)
access=$ SELECT count(DISTINCT start_time) FROM main;
 count 
-------
  2488
(1 row)
access=$ SELECT count(DISTINCT end_time) FROM main;
 count 
-------
  2490
(1 row)
access=$ SELECT count(DISTINCT summary) FROM main;
 count 
-------
  2264
(1 row)
access=$ SELECT record_id, start_time, end_time, summary 
FROM main 
WHERE record_id IN 
    (SELECT record_id 
     FROM main 
     GROUP BY record_id 
     HAVING count(*)>1);
 record_id | start_time |  end_time  |                                 summary                                 
-----------+------------+------------+-------------------------------------------------------------------------
         0 | 1231437600 | 1231441200 | tammy 
         0 | 1231869600 | 1231873200 | nb chanber lunch
         0 | 1229642100 | 1229645700 | tammy and joe photos st claire broiler
         0 | 1231959600 | 1231963200 | dr hunt
         0 | 1230505200 | 1230508800 | tilsen photos
         0 | 1230568200 | 1230571800 | meet gary at studio
         0 | 1230571800 | 1230584400 | bri and kids
         0 | 1230744600 | 1230748200 | tilsen, and sandy order y membership mail
         0 | 1230681600 | 1230681600 | Dan, missy and the kids.
         0 | 1231610400 | 1231614000 |  james j hill houseOngoing Daily 11/15/08 - 2/22/09  m-sat 10-4 sun 1-4
         0 | 1230663600 | 1230667200 | tammys house glasses shopping
         0 | 1229727600 | 1229731200 | ryan help at studio
         0 | 1231889400 | 1231893000 | 
         0 | 1231889400 | 1231903800 | EMS 
         0 | 1237161600 | 1237161600 | spring break
         0 | 1229983200 | 1229986800 | msp with the girls
         0 | 1241049600 | 1241049600 | DISH
         0 | 1232233200 | 1232244000 | jordan senior photos excel and studio 
         0 | 1230055200 | 1230058800 | paige studio
         0 | 1230314400 | 1230318000 | amanda tg
         0 | 1229968800 | 1229972400 | sara and nolan 
(21 rows)
 
access=$ SELECT record_id, start_time, end_time, summary 
FROM main 
ORDER BY start_time 
ASC LIMIT 10;
 record_id | start_time | end_time | summary 
-----------+------------+----------+---------
   7128069 |   31449600 | 31449600 | c
   7128068 |   31449600 | 31449600 | a
   7123605 |   31449600 | 31449600 | a
   7128070 |   31449600 | 31449600 | 3
   7124866 |   31449600 | 31449600 | c
   7124107 |   31449600 | 31449600 | 3
   7124145 |   31449600 | 31449600 | o
   7124141 |   31449600 | 31449600 | ;
   7128072 |   31449600 | 31449600 | ;
   7128071 |   31449600 | 31449600 | o
(10 rows)
access=$ SELECT record_id, start_time, end_time, summary FROM main ORDER BY start_time DESC LIMIT 10;
 record_id | start_time |  end_time  |           summary           
-----------+------------+------------+-----------------------------
   7127485 | 1256774400 | 1256774400 | lawerance wedding
   7125815 | 1256774400 | 1256774400 | lawerance wedding
   7128114 | 1244167200 | 1244170800 | NB senior all night party
   7125941 | 1242489600 | 1242493200 | nyquist edding
   7125827 | 1242489600 | 1242493200 | nyquist edding
         0 | 1241049600 | 1241049600 | DISH
   7128073 | 1238079600 | 1238083200 | books in the woods
   7125623 | 1238079600 | 1238083200 | books in the woods
   7125697 | 1238025600 | 1238025600 | gunflint books in the woods
   7126175 | 1238025600 | 1238025600 | gunflint books in the woods
(10 rows)
 
access=$

Oh, yeah! What I’ve gathered:

  • There are duplicate record_ids (which I’d hoped would have been unique).
  • There are events set to start and end at duplicate times
  • Palm, at some point, duplicated a lot of the other records, except for the record_id.
  • Times are stored in epoch seconds (oddly, Unix epoch seconds, not Windows)
  • Some of the times correlate to 1970? WTF
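
To see what those numbers actually are, decode a few of the values from the query output as Unix epoch seconds. A quick sketch, using UTC so the result doesn’t depend on where you run it:

```python
import time


def describe(epoch_seconds):
    """Format Unix epoch seconds as a UTC timestamp string."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch_seconds))


# Values copied from the query output above.
for value in (31449600, 31536000, 1231437600):
    print("%10d -> %s" % (value, describe(value)))
```

The two small values land on midnight at the very end of 1970 and the very start of 1971, which at least explains the “1970” entries. The third is an ordinary appointment in January 2009.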

A working solution:

access=$ SELECT DISTINCT a.start_time, a.end_time, a.summary 
INTO holdkey 
FROM main a
WHERE EXISTS 
    ( SELECT 'x' FROM main b WHERE a.start_time = b.start_time
      AND a.end_time = b.end_time
      AND a.summary = b.summary) 
ORDER BY a.start_time DESC;
SELECT
access=$ SELECT count(*) FROM holdkey;
 count 
-------
  2597
(1 row)
access=$ DELETE FROM main 
USING holdkey 
WHERE main.start_time = holdkey.start_time 
    AND main.end_time = holdkey.end_time 
    AND main.summary = holdkey.summary;
DELETE 5085
 
access=$ SELECT record_id, start_time, end_time, summary FROM main;
 record_id | start_time |  end_time  | summary 
-----------+------------+------------+---------
   5280360 |   31536000 |   31536000 | 
   5280298 |   31536000 |   31536000 | 
   5280429 |   31536000 |   31536000 | 
   7125497 | 1193437800 | 1193437800 | 
   7128378 |   31536000 |   31536000 | 
   7128376 |   31536000 |   31536000 | 
   7128374 |   31536000 |   31536000 | 
   7127620 | 1193437800 | 1193437800 | 
         0 | 1231889400 | 1231893000 | 
(9 rows)
access=$ DROP TABLE main;
DROP TABLE
access=$ SELECT * INTO main FROM holdkey;
SELECT

That works. Of course there’s the quick and dirty way which doesn’t involve munging about with temp tables:

access=$ DELETE FROM main t1
USING main 
WHERE EXISTS 
    (SELECT * FROM main t2 
         WHERE t1.start_time = t2.start_time 
         AND t1.end_time = t2.end_time 
         AND t1.summary = t2.summary 
         AND t1.record_id < t2.record_id);
DELETE 2488
 
access=$ SELECT count(*) FROM test;
 count 
-------
  2606
(1 row)

It gives a slightly different result, but operates under the assumption that Palm’s record_id means something (it may not, for all I know). On the upside, it preserves all the columns in case they’re useful for something (doubtful). I could order by start_time and select into another table, add an index, and do the same thing, but it’s easier the quick and dirty way. There’s probably a trivial way to do this with joins, but I couldn’t think of one, and it leaves 9 records with a record_id of 0.
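
If the rows are already sitting in Python, the quick and dirty DELETE can be mimicked without a database. This is a sketch of the same rule, not what I ran: a row goes away only if another row has the same start_time, end_time and summary and a higher record_id. Rows with a NULL (None) in any of those columns never compare equal in SQL, so they’re kept here too.

```python
def dedupe(rows):
    """rows: list of (record_id, start_time, end_time, summary) tuples."""
    newest = {}
    for record_id, start, end, summary in rows:
        key = (start, end, summary)
        if None not in key:
            newest[key] = max(newest.get(key, record_id), record_id)
    kept = []
    for row in rows:
        key = tuple(row[1:])
        # Keep NULL-bearing rows, and the highest record_id for each key.
        if None in key or row[0] == newest[key]:
            kept.append(row)
    return kept
```

Feeding it the two “lawerance wedding” rows from earlier keeps only record 7127485.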

Here’s the code which it turns out I didn’t need, but it might be useful to somebody:

#Rips data from Palm Desktop.  Uploads it to Google Calendar
#Written with Python 2.5 (though imports should work anyway)
#
#Currently, the Access MDB Palm Datebook uses has been exported to a 
#PostgreSQL server via ODBC, so I'll be connecting to that
#
#There's code in here for getting through Access also, but I haven't tested it.
#Use at your own risk (kinda like Access).
#
#This is mostly due to the Postgres ODBC driver, and the fact that I didn't
#want to bother with quoting all the queries for Postgres to allow spaces
 
try:
    from xml.etree import ElementTree #Python 2.5, probably 2.6/3.0 also
except ImportError:
    from elementtree import ElementTree #Python <2.5
import gdata.calendar.service
import gdata.service
import atom.service
import gdata.calendar
import atom
import getopt
import sys
import string
import time
import psycopg2 #Talk to Postgres
 
class Struct:
    def __init__(self, *args, **kwargs):
        for k,v in kwargs.items():
            setattr(self, k, v)
 
class GCalMigrate:
    def __init__(self, username, password):
        #Google credentials, used by login() at the end of the chain
        self.username = username
        self.password = password
        self.conn = None
        self.cur = None
        self.calendar = None
        self.records = []
 
    def connect(self):
        try:
            self.conn = psycopg2.connect("dbname='whatever' user='yournamehere' host='server'")
        except psycopg2.Error:
            print "Can't connect to the database!\n"
            sys.exit(1)
        self.cur = self.conn.cursor()
        return self.query()
 
    def accessconnect(self, mdbpath):
        import odbc
        self.conn = odbc.odbc("driver=Microsoft Access Driver (*.mdb);DBQ=%s" % mdbpath)
        self.cur = self.conn.cursor()
        return self.queryaccess()
 
    def queryaccess(self):
        self.cur.execute("SELECT Main.[Start Time], Main.[End Time], Main.[Summary] FROM Main")
        rows = self.cur.fetchall()
        self.conn.close()
        return self.parserows(rows)
 
    def query(self):
        rows = []
        try:
            self.cur.execute("SELECT start_time, end_time, summary FROM main")
            rows = self.cur.fetchall()
        except psycopg2.Error:
            print "Couldn't query the database.\n"
        self.conn.close()
        return self.parserows(rows)
 
    def parserows(self, rows):
        #Epoch seconds to the timestamp format the Calendar API wants
        for row in rows:
            starttime = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(row[0]))
            endtime = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(row[1]))
            title = row[2]
            record = Struct(start_time=starttime, end_time=endtime, title=title)
            self.records.append(record)
        return self.login()
 
    def login(self):
        self.calendar = gdata.calendar.service.CalendarService()
        self.calendar.email = self.username
        self.calendar.password = self.password
        self.calendar.source = "Palm_Desktop_Migrator"
        self.calendar.ProgrammaticLogin()
        return self.batchsubmit()
 
 
    def batchsubmit(self):
        feed = gdata.calendar.CalendarEventFeed()
 
        for record in self.records:
            insertme = gdata.calendar.CalendarEventEntry()
            insertme.title = atom.Title(text=record.title)
            insertme.content = atom.Content(text="")
            insertme.when.append(gdata.calendar.When(start_time=record.start_time, end_time=record.end_time))
            insertme.batch_id = gdata.BatchId(text='Palm_Migration')
 
            feed.AddInsert(entry=insertme)
        response = self.calendar.ExecuteBatch(feed, gdata.calendar.service.DEFAULT_BATCH_URL)
        return response
 
if __name__ == "__main__":
    #Pass the Google username and password as the two arguments
    runner = GCalMigrate(sys.argv[1], sys.argv[2])
    #Each step returns the next one's result, so this is the batch response
    responses = runner.connect()
    for entry in responses.entry:
        print "Batch ID: %s" % entry.batch_id.text
        print "Status: %s" % entry.batch_status.code
        print "Reason: %s" % entry.batch_status.reason

Real World Regexes

Dan mentioned that he wasn’t that knowledgeable about regular expressions (a topic I am intimately familiar with), so I figured I’d put up some examples from code I’ve actually written, along with the text they’re actually supposed to match.

Here are the general rules for regexes. To begin with, “operator” refers to any of these (so \s+, [A-Z], (Word), etc). Greedy means it’ll continue matching as far as possible, and if the operator/character you want to match occurs more than once in the string, it’ll eat the first one and only stop matching at the last one.

. Match any character
\w Match “word” character (alphanumeric plus “_”)
\W Match non-word character
\s Match whitespace character
\S Match non-whitespace character
\d Match digit character
\D Match non-digit character
\t Match tab
\n Match newline
\r Match return
\f Match formfeed
\a Match alarm (bell, beep, etc)
\e Match escape
^ Beginning of the line
$ End of the line
+ matches the preceding operator one or more times (greedy)
* matches the preceding operator zero or more times (greedy)
? matches the preceding operator once if it exists, but it doesn’t have to be there. Mostly used to stop greedy operators (*? or +?, for instance) at the match you want.
() is used for grouping (either to use later as a backreference or to exclude)
(?<name>) (or (?P<name>) in Python and maybe others) is used for a named backreference. There’ll be some examples of that.
| is used as a logical or
{n} is used to match the preceding operator exactly n times
{n,m} matches n to m times
{n,} matches n or more times ({1,} is the same as +)
[A-Za-z] is used to match whatever is in the middle, but it only counts as one character (so [A-Za-z] would match any of those characters ONCE. Useful if you want [a-f] or [0-5]+ or something).
[^] is used to exclude things. [^word] matches any ONE character that isn’t “w”, “o”, “r”, or “d”. It works on single characters, not on whole words, and parentheses inside the brackets are just literal parentheses, so [^(word)] does not exclude the word “word”.
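
A few of those rules are easier to see than to read, so here's a quick sketch using Python's re module (not strictly PCRE, but these operators behave the same way). The sample strings are made up for illustration.

```python
import re

text = 'say "one" and "two"'  # made-up sample string

# Greedy: .* runs to the LAST quote it can reach
greedy = re.search(r'"(.*)"', text).group(1)

# Non-greedy: .*? stops at the FIRST quote that lets the match succeed
lazy = re.search(r'"(.*?)"', text).group(1)

# {n,} means "n or more"; {n,m} means "between n and m"
two_or_more = re.findall(r'\d{2,}', '7 42 2008')
two_to_three = re.findall(r'\b\d{2,3}\b', '7 42 2008 365')

# A negated class matches ONE character that is not in the set
not_vowels = re.findall(r'[^aeiou ]', 'real world')

print(greedy)        # one" and "two
print(lazy)          # one
print(two_or_more)   # ['42', '2008']
print(two_to_three)  # ['42', '365']
print(not_vowels)    # ['r', 'l', 'w', 'r', 'l', 'd']
```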

Sound confusing? It is, which is why I’ll put up real examples. FYI, these are PCRE (Perl Compatible Regular Expressions) rather than the POSIX basic regular expressions that sed uses by default, but Dan’ll almost certainly never use the sed flavor (which doesn’t have a plain ? operator, among other things).

Using a backreference later depends on the language. .NET uses ${n} where n is the reference number (note that they start from 1, as the entire string you matched is ${0}), Perl (and a lot of others) use $n, Ruby uses \1 (as does Python, but Python {like .NET} needs a prefix in front of the string to make it a raw string {.NET is @, Python is r}, otherwise it’s \\1). The language reference is your best bet here.
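
Here's roughly what that looks like on the Python side, as a small sketch. The date string is a made-up sample, and \g<name> is how Python's re.sub refers to named groups in the replacement.

```python
import re

stamp = "2008-10-07"  # made-up sample

# Numbered groups: \1, \2, \3 in a raw string
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', stamp)

# Without the r prefix, every backslash has to be doubled
swapped_again = re.sub('(\\d{4})-(\\d{2})-(\\d{2})', '\\3/\\2/\\1', stamp)

# Named groups are referenced as \g<name> in the replacement
named = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
               r'\g<m>/\g<d>/\g<y>', stamp)

print(swapped)        # 07/10/2008
print(swapped_again)  # 07/10/2008
print(named)          # 10/07/2008
```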

First example.

(Oct6 0423z) Dec4100: C, was acknowledged by, ek
string regexPattern = @".*?\)\s
                      (?<system>\S+?)
                      :\s
                      (?<tape>\w)
                      .*,\s
                      (?<initials>.*)";
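// IgnorePatternWhitespace is needed because the pattern is split across lines
Regex re = new Regex(regexPattern, RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);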

It eats everything up until the right parenthesis (escaped so the regex parser doesn’t try to interpret it) followed by a space, then it gets all non-whitespace characters until the colon as the system name. Ignores the colon and a space, then grabs a single word character ([A-Za-z0-9_]) as the tape name. Ignores any run of characters (the “.*”) up to the last comma followed by a space, then yanks the rest of the line as the initials.

C is the tape name.

ek are the initials.

This means Dec4100 is available as ${system} (if doing Regex.Replace) or m.Groups["system"].Value if you matched the regex with Match m = re.Match(logfilestring);
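
For anyone without .NET handy, here's a sketch of the same pattern in Python. re.VERBOSE lets the pattern be spread over several lines with comments, and (?P<name>...) is Python's spelling of a named group. The variable names are just for this sketch.

```python
import re

line = "(Oct6 0423z) Dec4100: C, was acknowledged by, ek"

pattern = re.compile(r"""
    .*?\)\s            # everything up to the first right paren + space
    (?P<system>\S+?)   # system name, up to the colon
    :\s
    (?P<tape>\w)       # one word character for the tape
    .*,\s              # greedy: skip ahead to the LAST comma + space
    (?P<initials>.*)   # whatever is left
    """, re.VERBOSE)

m = pattern.match(line)
print(m.group('system'))    # Dec4100
print(m.group('tape'))      # C
print(m.group('initials'))  # ek
```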

Another example:

	<form action="http://www.climate.weatheroffice.ec.gc.ca/climateData/Interform.cfm" method="post" name="stnRequest1">
		<input type="Hidden" name="hlyRange" value="N/A">
		<input type="Hidden" name="dlyRange" value="1998-4-1|2007-11-30">
		<input type="Hidden" name="mlyRange" value="1998-4-1|2007-11-1">
		<input type="Hidden" name="StationID" value="10700">
		<input type="Hidden" name="prov" value="CA">
		<input type="Hidden" name="urlExtension" value="_e.html">
	<tr id="dataTableOddRow">
		<td id="dataTableRowHeader">(AE) BOW SUMMIT</td>
		<td id="dataTableRowHeader"><abbr title="ALBERTA">ALTA</abbr></td>
		<td>
			<select name="timeframe" size="1" class="formElement75w" onChange="elementChange(document.stnRequest1,1)">
	<option value="2">Daily</option><option value="3">Monthly</option><option value="4">Almanac</option>
			</select>
		</td>
	<td>
	<select name="day" size="1" class="formElement" disabled><option value="1" >1</option><option value="2" >2</option><option value="3" >3</option><option value="4" >4</option><option value="5" >5</option><option value="6" >6</option><option value="7" >7</option><option value="8" >8</option><option value="9" >9</option><option value="10" >10</option><option value="11" >11</option><option value="12" >12</option><option value="13" >13</option><option value="14" >14</option><option value="15" >15</option><option value="16" >16</option><option value="17" >17</option><option value="18" >18</option><option value="19" >19</option><option value="20" >20</option><option value="21" >21</option><option value="22" >22</option><option value="23" >23</option><option value="24" >24</option><option value="25" >25</option><option value="26" >26</option><option value="27" >27</option><option value="28" >28</option><option value="29" >29</option><option value="30" Selected>30</option><option value="31" >31</option>
		</select>
	</td>
	<td>
	<select name="month" size="1" class="formElement" onChange="elementChange(document.stnRequest1,1)" ><option value="1" >Jan</option><option value="2" >Feb</option><option value="3" >Mar</option><option value="4" >Apr</option><option value="5" >May</option><option value="6" >Jun</option><option value="7" >Jul</option><option value="8" >Aug</option><option value="9" >Sep</option><option value="10" >Oct</option><option value="11" Selected>Nov</option><option value="12" >Dec</option>
		</select>
	</td>
	<td>
	<select name="year" size="1" class="formElement" onChange="elementChange(document.stnRequest1,1)"><option value="1998" >1998</option><option value="1999" >1999</option><option value="2000" >2000</option><option value="2001" >2001</option><option value="2002" >2002</option><option value="2003" >2003</option><option value="2004" >2004</option><option value="2005" >2005</option><option value="2006" >2006</option><option value="2007" Selected>2007</option>
	</select>
	</td>
	<td>
	<input type="submit" name="stnSubmit" value="Go" class="formElement">
</td>
</form>

And the parser:

if ($chunk =~ /.*StationID.*?"(\d+)".*?prov.*?"(\w+).*?TableRowHeader">(.*?)<.*abbr title.*?>(\w+).*?/s) {
     my $stationid = $1;
     my $province = $2;
     my $name = $3;
     my $abbrprov = $4;
}

This regex spans multiple lines (hence the //s, which lets . match newlines; similarly, //g is global, //i is case insensitive, //gi is both g and i, etc), and it’s a good example of non-greedy matching. It snags everything up until StationID, then skips to the next quotation mark followed by numbers, and captures those numbers. It comes out as “10700”.

Does the same thing following “prov”, up until the next word characters after a quotation mark, and captures those. With .* rather than .*?, it would have grabbed “data”, the part of “dataTableRowHeader” (inside the same quotation marks) that precedes TableRowHeader. Comes out as “CA”.

Grabs everything from TableRowHeader"> until the next <. Comes out as “(AE) BOW SUMMIT”.

Drops everything up until the next > after “abbr title”, then captures all word characters. “ALTA”

These are all assigned to variables via backreferences. $1, $2, $3, $4 are the groups in order. It’s worth noting that (at least in .NET), named groups are assigned numbers AFTER unnamed groups. So for (?<a>a)(b)(?<c>c)(d), ${1}${2}${3}${4} would be bdac (${0} is always the entire match).
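
Here's the same parser as a Python sketch, run against a trimmed copy of the form (only the lines the regex actually cares about). re.DOTALL is Python's equivalent of Perl's //s. The variable names mirror the Perl ones but are otherwise just for illustration.

```python
import re

# Trimmed copy of the form above
chunk = '''
<input type="Hidden" name="StationID" value="10700">
<input type="Hidden" name="prov" value="CA">
<tr id="dataTableOddRow">
<td id="dataTableRowHeader">(AE) BOW SUMMIT</td>
<td id="dataTableRowHeader"><abbr title="ALBERTA">ALTA</abbr></td>
'''

station_re = re.compile(
    r'StationID.*?"(\d+)".*?prov.*?"(\w+).*?'
    r'TableRowHeader">(.*?)<.*abbr title.*?>(\w+)',
    re.DOTALL)  # let . match newlines, like //s

m = station_re.search(chunk)
stationid, province, name, abbrprov = m.groups()

print(stationid)  # 10700
print(province)   # CA
print(name)       # (AE) BOW SUMMIT
print(abbrprov)   # ALTA
```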

Another example:

04:26:23 [2] Error creating WLAAAP06.FS8 = 1 : Unrecognized KGFXENG Error Code

And the parser:

match = re.match(r'^(?P<time>.*?)\s+\[(?P<engine>\d+)\]\s+(?P<error>.*?(KGFXENG|LeadTools).*)', line)

Grabs everything from the beginning of the line until the first space as “time”. Comes out as “04:26:23”.

Then skips whitespace and a bracket (escaped with \[) and grabs one or more numbers (\d+) as "engine". Comes out as "2", of course. Skips a space, then captures anything which contains "KGFXENG" or "LeadTools" as "error". Basically, the rest of the line.

This line, for instance, wouldn't match, and nothing in the regex would be captured:

00:15:18 [1] Error producing WPATAZ00.FSD = F088 : Error while saving the graphic
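
To see both cases side by side, here's a small sketch that runs the pattern against the two log lines above. Note that Python's re.match takes the pattern first and the string second. The names log_re, hit and miss are made up for the sketch.

```python
import re

log_re = re.compile(
    r'^(?P<time>.*?)\s+\[(?P<engine>\d+)\]\s+'
    r'(?P<error>.*?(KGFXENG|LeadTools).*)')

hit = log_re.match(
    "04:26:23 [2] Error creating WLAAAP06.FS8 = 1 : "
    "Unrecognized KGFXENG Error Code")
miss = log_re.match(
    "00:15:18 [1] Error producing WPATAZ00.FSD = F088 : "
    "Error while saving the graphic")

print(hit.group('time'))    # 04:26:23
print(hit.group('engine'))  # 2
print(hit.group('error'))
print(miss)                 # None, since neither keyword is in the line
```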

These are used later with this:

message = "ERROR: %s %s: %s" % (re.sub(r'.*?([A-Za-z]+Engine[A-Za-z]*?)(Errors)?.*', r'\1', 
                         logfilename), 
                         engine, 
                         match.group('error'))

"logfilename" is something like "2008_Oct_07__ProductEngineErrors.log". This grabs everything up until A through Z (uppercase or lowercase) one or more times followed by Engine, optionally followed by something else (*, though ? would have worked if I said r'Engine([A-Za-z]+)?'). It stops on Errors, if it exists (the question mark afterwards), and replaces the entire name with the first backreference ("ProductEngine" in this case).

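The substitution on its own, as a quick sketch. The first filename is the one described above; the second is a hypothetical name without the Errors suffix, just to show the optional group not matching.

```python
import re

engine_re = r'.*?([A-Za-z]+Engine[A-Za-z]*?)(Errors)?.*'

# Filename from the description above
with_errors = re.sub(engine_re, r'\1',
                     "2008_Oct_07__ProductEngineErrors.log")

# Hypothetical filename with no "Errors" suffix
without_errors = re.sub(engine_re, r'\1',
                        "2008_Oct_07__GraphicsEngine.log")

print(with_errors)     # ProductEngine
print(without_errors)  # GraphicsEngine
```

One thing this makes visible: because [A-Za-z]*? is lazy and everything after it is optional, it never has to grab anything, so the first group always ends right after “Engine”.
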
Last example is a nested bitch of increasingly complicated rules:

#Match plain ol' timezones
if ($brpos =~ /^\[(\w+)\](.*)/)
{
	$DateZone = $1;
	$newname = $2;
}
#Match timezones with a day modification, and grab that along with the +/-
elsif ($brpos =~ /^\[(\w+)(\S\d+)\](.*)/)
{
	$DateZone = $1;
	$TempDay2 = ONE_DAY * $2;
	$newname = $3;
}
#Check for a delete flag
elsif ($brpos =~ /^(\d)\[.*/)
{
	$DeleteFilesStatus = $1;
	#If the status is one, we want to capture everything after the timezone as the DeleteName
	if ($DeleteFilesStatus == 1)
	{
		if ($brpos =~ /^(\d)\[(\w+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$DeleteFilesNames = $3;
			$newname = $3;
		}
		elsif ($brpos =~ /^(\d)\[(\w+)(\S\d+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$TempDay2 = ONE_DAY * $3;
			$DeleteFilesNames = $4;
			$newname = $4;
		}
	}
	#Otherwise, the DeleteName is in more brackets
	elsif ($DeleteFilesStatus == 2)
	{
                #Grab it all, but without a time modification
		if ($brpos =~ /^(\d)\[(\w+)\]\[(.*\.\w+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$DeleteFilesNames = $3;
			$newname = $4;
		}
                #Grab it with a time modification
		elsif ($brpos =~ /^(\d)\[(\w+)(\S\d+)\]\[(.*\.\w+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$TempDay2 = ONE_DAY * $3;
			$DeleteFilesNames = $4;
			$newname = $5;
		}
	}
}

Examples of what I'm catching (hopefully in order). The stuff in brackets later is filled in for date/time stamps:

[EDT]DOV-F-[MM][dd][yy][hh].csv
[CST-1][MM][dd].act
1[PDT]Actual[yy][MM][dd][hh][mm].csv
1[EST-3]KLGA[yy][MM][dd].mtx
2[EDT][WBD*.txt]WBD[yy][MM][dd]05.txt
2[MST+2][WSM*.txt]WBD[yyyy][MM].txt

Sadly, I'm out of work for the night, but these matches aren't that complicated. Lots of escaping brackets, and use of the \S character to match "-" or "+", then grabbing the rest of them. I may write more tomorrow...
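
For comparison, here's a rough sketch of how the same rules could be collapsed into two regexes in Python. This is not the production code: parse_spec, prefix_re and delete_re are names made up for the sketch, the day offset is narrowed to an explicit + or - rather than \S, and it returns a plain dict (with the offset in days, not multiplied by ONE_DAY) instead of setting the Perl variables.

```python
import re

# Optional delete flag, then [ZONE] or [ZONE+n] / [ZONE-n], then the rest
prefix_re = re.compile(
    r'^(?P<flag>\d)?\[(?P<zone>\w+)(?P<days>[+-]\d+)?\](?P<rest>.*)$')
# With flag 2, the delete pattern sits in its own brackets before the name
delete_re = re.compile(r'^\[(?P<delete>.*\.\w+)\](?P<name>.*)$')

def parse_spec(spec):
    """Split a spec into flag, zone, day offset, delete name and name."""
    m = prefix_re.match(spec)
    if m is None:
        return None
    flag = int(m.group('flag') or 0)
    days = int(m.group('days') or 0)
    name = m.group('rest')
    delete = None
    if flag == 1:
        # Flag 1: everything after the timezone is also the delete name
        delete = name
    elif flag == 2:
        # Flag 2: the delete name is in a second set of brackets
        d = delete_re.match(name)
        if d is None:
            return None
        delete, name = d.group('delete'), d.group('name')
    return {'flag': flag, 'zone': m.group('zone'), 'days': days,
            'delete': delete, 'name': name}

for spec in ("[EDT]DOV-F-[MM][dd][yy][hh].csv",
             "[CST-1][MM][dd].act",
             "1[PDT]Actual[yy][MM][dd][hh][mm].csv",
             "1[EST-3]KLGA[yy][MM][dd].mtx",
             "2[EDT][WBD*.txt]WBD[yy][MM][dd]05.txt",
             "2[MST+2][WSM*.txt]WBD[yyyy][MM].txt"):
    print(parse_spec(spec))
```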

Ruby vs. Perl vs. Python

Sigh. I didn’t check my fileserver before I left home. It booted into a non-xVM kernel, so the code in my CentOS virtual machine isn’t accessible. Not that it’s a big deal, it just means that I’ll only be posting the code I’ve got here at work for a routine to check file modification times via SFTP. The raison d’être seems to be making sure customers aren’t trying to grab files whilst they’re being updated, but I’m not sure on that. They wanted the code, I wrote the code, now it’s doing… something.

At this point, I’m attempting to write every script in Perl, Python, and Ruby simultaneously. It shouldn’t be a surprise that I’m most productive in Perl, but I’d still like to keep up skills in Python (and build them in Ruby). For one reason or another (the fact that HP-UX systems don’t come with Python, for one; Ruby performing about 10 times as slow as Perl/Python on the 1.8 interpreter, and the 1.9 interpreter not being production ready for another [and yes, JRuby is really fast {faster than Ruby 1.9}, but the JVM startup time is a killer for a script running out of cron, and not every system in my datacenter has a JVM installed]), the Perl version is almost always the one that goes onto a production box somewhere.

It’s also worth noting that writing Ruby/Python in a procedural manner, like it’s Perl with methods tacked onto the objects, doesn’t make any sense to me. Sure, it’s kinda fun to write it that way, and blocks in Ruby are really handy (even if they’re slightly more confusing to a non-Rubyist than pointers were the first time I used them), but it feels against the spirit of it somehow. I don’t know.

That being said, this code isn’t nearly what would go into production anyway. The Python really isn’t far off. Perl’s kinda slow when it comes to objects, though, and I’d never use Class::Struct in something which weren’t totally network or I/O bound (probably iterate through arrays or hashes instead). The Ruby? Well, ahh, it’d probably look a lot like this. I’m not exactly a Rubyist at this point, though, and I’m sure it could be cleaned up a lot more without playing Perl Golf with it (hopefully by integrating Rubyisms which don’t make it God-awful unreadable, like the code at the top of this post).

As a total aside, I loathe Coding Horror just a little bit more than I hate Joel on Software. We’re talking about two guys who probably spend as much time blogging as they do working, treat their readers like idiots, preach bad practices (Joel’s company wrote its own fucking programming language rather than a DSL, and he advocates against using Exceptions, among other things. Jeff Atwood [Coding Horror] has no idea what the phrase “use the right tool for the right job” means, and would rather advocate using .NET for everything).

FWIW, I realize now why I changed to this theme. WIDE textarea. Not all the Wordpress themes I’ve played with work nicely once I start fucking with the margins for it in CSS. I don’t give a damn about widget ready or what have you. All that stuff is easy enough to add by hand. I want a theme that’s not going to waste 1/3rd of the page on blankness.

WTB blogs from Dan.

So, here’s one script (the Craigslist parser will get posted tomorrow, I guess).

In Perl:

#!/usr/bin/perl
 
###############
# SFTPCheck 
#
# 1.0
#
# Description:
# Checks update times for files on Omaha's server to troubleshoot scp problems
 
#Import what we need
use Class::Struct;
use Net::SFTP;
use Net::SFTP::Util qw{fx2txt};
use Net::SFTP::Attributes;
use strict;
use utf8;
 
#Set up  a struct we can dump data into
 
struct fileinfo => { filename => '$', 
 			  modtime => '$',
			  oldtime => '$',
			  waschanged => '$'};
 
#Set up logging
open(OUTFILE, ">> /data/testscript/ryan/sftpcheck.out");
print OUTFILE "-----------------------------------------------------------------------------------------\n";
my $now = localtime(time());
print OUTFILE "Starting at $now\n";
 
#Set up the connection parameters
my ($user, $pass, $host, $dir, $stamp, $diff) = @ARGV;
 
my @listers;
 
 
#Open it
my $sftp = undef;
$sftp = OpenSFTP($sftp, $user, $pass, $host);
 
#Set the directory path, since Net::SFTP->ls doesn't return a fully-qualified one
my $prefix = $dir . "/";
 
#Run through a loop 30 times, sleeping for one second inbetween
for(my $count = 0; $count < 30; $count++) {
 
	#Do an ls, which adds any new matches to @listers
	lookforfiles();
 
	foreach my $file (@listers) {
		#Set the fully qualified name so we can stat it
		my $remote = $prefix . $file->filename;
		#Grab the modification time
		my $stat = $sftp->do_lstat($remote);
		#If it changed, set waschanged; either way, put the new value in
		if ($file->oldtime != $stat->mtime) {
			$file->waschanged(1);
		}
		$file->modtime($stat->mtime);
	}
	sleep(1);
}
 
foreach my $file (@listers) {
	if ($file->waschanged) {
		print OUTFILE $file->filename . " was changed during the check!\n";
	}
 
	#Compare it to now
	my $mtime = $file->modtime;
	my $difference = time() - $mtime;
	#If it's less than five minutes, pass it off in seconds
	if ($difference <= 300) {
		print OUTFILE $file->filename . " was updated $difference seconds ago\n";
	}
	elsif ($difference <=3600) {
		#Otherwise, minutes with two decimal places should be precise enough
		my $minutes = $difference / 60;
		printf(OUTFILE $file->filename . " was updated %.2f minutes ago\n", $minutes);
	}
	else {
		#If it's really that old, just print hours
		my $hours = $difference / 3600;
		printf(OUTFILE $file->filename . " was updated %.2f hours ago\n", $hours);
	}
}
 
sub OpenSFTP
{
	my ($sftp, $username, $pass, $host) = @_;
	#utf8::encode works in place and returns nothing, so don't assign its result
	utf8::encode($username);
	utf8::encode($pass);
	my %args = ( 
		user => $username,
		password => $pass,
		debug => '1',
	);
	#Don't echo the password to the terminal
	print "Trying to connect to $host as $username\n";
	$sftp = Net::SFTP->new($host, %args);
	if ($sftp) {
		return($sftp);
	}
}
 
sub lookforfiles
{
	#Figure out what the timestamp should be
	my @lookfor = &sftpstamp($stamp, $diff);
	#Check for yesterday's date too, why not (24 more hours of offset)
	push(@lookfor, &sftpstamp($stamp, $diff + 24));
 
	#Do the LS, which doesn't support globbing, and pass it off to a regexp to find the files we need
	#Without a callback, ls returns a list of hashrefs with a filename key
	my @list = $sftp->ls($dir);
	foreach my $day (@lookfor) {
		foreach my $file (@list) {
			my $name = $file->{filename};
			next unless $name =~ /$day/;
			#If the filename is already in the array, skip it
			next if grep { $_->filename eq $name } @listers;
			my $foundname = fileinfo->new();
			$foundname->filename($name);
			my $stat = $sftp->do_lstat($prefix . $name);
			$foundname->oldtime($stat->mtime);
			push(@listers, $foundname);
		}
	}
	return(@listers);
}
 
sub sftpstamp
{
    (my $stamp, my $diff) = @_;
	#Set the tzinfo so we know what the stamp should be
	my $timediff = 3600 * $diff; 
 
	#Check if we're in DST
	my $dststatus = 0;
	my $dstfile = "/data/.DST_Status";
	open(DST, $dstfile);
	$dststatus = (<DST>);
	chomp($dststatus);
	close(DST);
 
	#If so, modify the time accordingly
	if ($dststatus) {
		$timediff -= 3600;
	}
 
	#Get the time and format it
	my @StampNow = localtime(time() - $timediff);
	my $day = $StampNow[3];
	$day = "0$day" if $day < 10;
	my $month = $StampNow[4] + 1;
	$month = "0$month" if $month < 10;
 
	#Pass the appropriate value back
	$stamp = "$month$day";
	return $stamp;
}

In Python:

import base64
import getpass
import os
import platform
import re
import socket
import sys
import time
import traceback
 
import paramiko
 
class foundfile:
    def __init__(self, filename=None, modtime=None, oldtime=None, waschanged=None):
        self.filename = filename
        self.modtime = modtime
        self.oldtime = oldtime
        self.waschanged = waschanged
 
class sftpcheck:
 
    def __init__(self, host=None, username=None, password=None, dirname=None, stamp=None, diff=None):
 
 
 
        #Set up the initial variables for login
        self.host = host
        self.username = username 
        self.password = password
        self.dirname = dirname
        self.stamp = stamp
        self.diff = diff
        self.sftp = None
        self.found = []
 
        #Set up logging
        if platform.uname()[0] == "Windows":
            self.LogPath = "\\\\filer2\\data\\testscript\\ryan\\"
            self.DSTPath = "\\\\filer2\\data\\.DST_Status"
            self.DebugPath = "\\\\filer2\\data\\testscript\\ryan\\"
        else:
            self.LogPath = "/data/testscript/ryan/"
            self.DSTPath = "/data/.DST_Status"
            self.DebugPath = "/data/testscript/ryan/"
        self.DebugFilename = "sftpdebug.log"
        self.LogFilename = "sftpcheck.log"
        self.LogFileObj = None
 
    def login(self, host, username, password):
        #Connect to the server
        t = paramiko.Transport((host, 22))
        t.connect(username=username, password=password)
        self.sftp = paramiko.SFTPClient.from_transport(t)
 
        return (self.sftp != None)
 
    def listfiles(self):
        #Pick up how it's supposed to be formatted
        lookfor = self.formattime(self.diff)
 
        #Get a list of files in the directory
        ls = self.sftp.listdir(self.dirname)
 
        #Check if any match
        for fileinfo in ls:
            if re.search(lookfor, fileinfo):
                #Only add it if it's not already in the list
                there = False
                for record in self.found:
                    if record.filename == fileinfo:
                        there = True
                if not there:
                    newone = foundfile(filename = fileinfo)
                    #lstat returns an SFTPAttributes object, not a tuple
                    mtime = self.sftp.lstat(self.dirname + fileinfo).st_mtime
                    newone.oldtime = mtime
                    self.found.append(newone)
 
        return self.found
 
    def statfiles(self):
        #Run through the list of files we found and stat them
        files = self.listfiles()
 
        for fileinfo in files:
            #Stat the file
            fullpath = self.dirname + fileinfo.filename
            mtime = self.sftp.lstat(fullpath).st_mtime
            if fileinfo.oldtime != mtime:
                fileinfo.modtime = mtime
                fileinfo.waschanged = True
            else:
                fileinfo.modtime = mtime
 
 
    def finalize(self):
        for gotem in self.found:
            mtime = gotem.modtime
            nowtime = time.time()
            if gotem.waschanged:
                self.log_write("%s was changed during the check!" % gotem.filename)
            #Figure out when the last time it was updated was
            difference = nowtime - mtime
 
            #If it's less than five minutes, seconds for logging
            if difference <= 300:
                self.log_write("%s was updated %d seconds ago" % (gotem.filename, difference))
 
            #Less than an hour?  Minutes with two decimal places
            elif difference <= 3600:
                minutes = difference / 60
                self.log_write("%s was updated %.2f minutes ago" % (gotem.filename, minutes))
 
            #Otherwise, hours with two decimal places    
            else:
                hours = difference / 3600
                self.log_write("%s was updated %.2f hours ago" % (gotem.filename, hours))
 
    def formattime(self, diff):
        #Set timezone for PST, since I don't need to import more 
        #python libs
        timediff = 3600 * diff
 
        #Check if we're in DST
        dsthandle = open(self.DSTPath, 'r')
        dststatus = dsthandle.read().rstrip("\n")
        dsthandle.close()
 
        if dststatus == '1':
            timediff = timediff - 3600
 
        #Get the time in epoch seconds, subtract the diff
        #pass it back in a struct so we can format it
        sftptime = time.gmtime(int(time.time() - timediff))
        sftpstamp = time.strftime("%m%d", sftptime)
 
 
        return sftpstamp
 
    def log_open(self):
        #Set up logging
        logfile = "%s%s" % (self.LogPath, self.LogFilename)
        try:
            file = open(logfile, "a")
            self.LogFileObj = file
        except IOError:
            print "Error: Cannot open log file %s!" % logfile
        else:
            self.log_write("Log file opened: %s!" % logfile)
 
    def log_write(self, message):
        #Actually write to the log
        timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
        message_text = "%s %s\n" % (timestamp, message)
        self.LogFileObj.write(message_text)
 
if __name__ == "__main__":
    #Grab the arguments before we need them
    args = sys.argv[1:]
 
    #Instantiate it (diff is a number of hours, so convert it)
    check = sftpcheck(username=args[0], password=args[1], host=args[2], dirname=args[3], stamp=args[4], diff=int(args[5]))
 
    #Set up logging to troubleshoot the connection
    debugfile = "%s%s" % (check.DebugPath, check.DebugFilename)
    paramiko.util.log_to_file(debugfile)
 
 
    #Let us know when we're starting
    check.log_open()
    check.log_write("--------------------------------------------------------")
    check.log_write("Start sftpcheck.py")
    check.log_write("Try to login")
 
    #Try to get the info
    if check.login(check.host, check.username, check.password):
        check.log_write("Successfully logged in to %s as %s" % (check.host, check.username))
        #Loop 30 times, sleeping for a second inbetween
        for i in range(30):
            check.statfiles()
            time.sleep(1)
        check.finalize()
 
    #If we can't, log it and exit
    else:
        check.log_write("Login to %s as %s failed!  Check %s for more information." % (check.host, check.username, debugfile))
 
    check.log_write("Stop sftpcheck.py")

In Ruby:

#!/usr/bin/ruby
 
require 'net/sftp'
 
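class FoundFiles
  #Set up a class to hold info
  attr_accessor :filename, :modtime, :oldtime, :waschanged

  def initialize(filename = nil)
    @filename = filename
  end
end
 
class SFTPCheck
  #Initialize variables
  attr_accessor :username, :password, :host, :dir, :stamp, :diff

  def initialize(username, password, host, dir, stamp, diff)
    @username, @password, @host = username, password, host
    @dir, @stamp, @diff = dir, stamp, diff.to_i
    @found = Array.new
    @logpath = "/data/testscript/ryan/"
    @dstpath = "/data/.DST_Status"
    @logfilename = "sftpcheck.log"
  end
 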
  def login(host, username, password)
    #Log in
    @sftp = Net::SFTP.start(host, username, :password => password)
    return @sftp
  end
 
  def formattime(diff)
    timediff = 3600 * diff.to_i
    #Read the DST status file.  Sure, Ruby has a .isdst? method in the time 
    #class, but that doesn't necessarily mesh with work
    #Strip the newline
    dststatus = File.read(@dstpath).rstrip
 
    if dststatus == "1"
      #If it's DST, change it
      timediff -= 3600
    end
 
    timenow = Time.now
    sftptime = Time.at(timenow.to_i - timediff).strftime("%m%d")
    return sftptime
  end
 
  def listfiles
    lookfor = formattime(diff)
 
    #Loop through
    @sftp.dir.foreach(@dir) do |entry|
      file = entry.name
      #Ruby supports PCRE matching.  Yay!
      if file =~ /#{lookfor}/
        #Same BS as Perl and Python.  Only add it if we haven't seen it yet
        unless @found.any? { |record| record.filename == file }
          newone = FoundFiles.new
          newone.filename = file
          newone.oldtime = @sftp.stat!(@dir + file).mtime
          #Could also be written as @found.push(newone)
          @found << newone
        end
      end
    end
 
  end
 
  def statfiles
    listfiles()
 
    @found.each do |fileinfo| 
      fullpath = @dir + fileinfo.filename
      mtime = @sftp.stat!(fullpath).mtime
      #If it's changed, modify the object
      if fileinfo.oldtime != mtime
        fileinfo.modtime = mtime
        fileinfo.waschanged = true
      else
        fileinfo.modtime = mtime
      end
    end
  end
 
  def log_open
    #Set up logging
    logfile = @logpath + @logfilename
    @logfile = File.open(logfile, "a")
    log_write("Log file opened: #{logfile}!")
  end
 
  def log_write(message)
    #Actually write to the log; puts only adds a newline if one is missing
    @logfile.puts(Time.now.strftime("%Y-%m-%d %H:%M:%S") + " " + message)
  end
 
  def finalize
    @found.each do |gotem|
      #Figure out when the last time it was updated was
      difference = Time.now.to_i - gotem.modtime
      if gotem.waschanged
        log_write("#{gotem.filename} was changed during the check!\n")
      end
 
 
      #If it's less than five minutes, seconds for logging
      if difference <= 300
        log_write("#{gotem.filename} was updated #{difference} seconds ago\n")
      #Less than an hour?  Minutes with two decimal places
      elsif difference <= 3600
        log_write("#{gotem.filename} was updated #{'%.2f' % (difference / 60.0)} minutes ago\n")
      #Otherwise, hours.  
      else
        log_write("#{gotem.filename} was updated #{'%.2f' % (difference / 3600.0)} hours ago\n")
      end
    end
  end
 
end
 
username = ARGV[0]
password = ARGV[1]
host = ARGV[2]
dirname = ARGV[3]
stamp = ARGV[4]
diff = ARGV[5]
 
#Instantiate it
check = SFTPCheck.new(username, password, host, dirname, stamp, diff)
 
#Let us know when we're starting
check.log_open
check.log_write("-------------------------------------------------------")
check.log_write("Start sftpcheck.rb")
check.log_write("Try to login")
 
#Try to get the info
if check.login(check.host, check.username, check.password)
  #Loop 30 times, sleeping for a second inbetween
  (1..30).each do |nothing|
    check.statfiles
    sleep(1)
  end
  check.finalize
else
  check.log_write("Login to #{check.host} as #{check.username} failed!\n")
end
check.log_write("Stopping sftpcheck.rb")

DISCLAIMER:
Use this code totally at your own risk. The Python and Ruby should work, but I haven’t tested them.
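
Since none of that was tested, here's a small standalone sketch of the one piece of logic all three versions share: turning a modification time into a "seconds/minutes/hours ago" message. describe_age is a name made up for this sketch, and it takes "now" as a parameter so it can be checked without a server.

```python
def describe_age(filename, mtime, now):
    """Format how long ago a file was modified, using the same cutoffs."""
    difference = now - mtime
    # Five minutes or less: plain seconds
    if difference <= 300:
        return "%s was updated %d seconds ago" % (filename, difference)
    # An hour or less: minutes with two decimal places
    elif difference <= 3600:
        return "%s was updated %.2f minutes ago" % (filename, difference / 60.0)
    # Otherwise: hours with two decimal places
    else:
        return "%s was updated %.2f hours ago" % (filename, difference / 3600.0)

print(describe_age("a.csv", 1000, 1120))  # a.csv was updated 120 seconds ago
print(describe_age("a.csv", 1000, 1900))  # a.csv was updated 15.00 minutes ago
print(describe_age("a.csv", 0, 9000))     # a.csv was updated 2.50 hours ago
```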

EDIT:
WTF. The Ruby came in significantly shorter than the Python and the Perl, plus it’s more fun to write? That makes me wish we had it installed on any of our servers here.