
Real World Regexes

Dan mentioned that he wasn’t that knowledgeable about regular expressions (a topic I am intimately familiar with), so I figured I’d put up some examples from code I’ve actually written, along with the text they’re actually supposed to match.

Here are the general rules for regexes. First, “operator” refers to any of these (so \s+, [A-Z], (Word), etc). “Greedy” means a quantifier matches as much as it can: if the character you want to stop on occurs more than once in the string, a greedy match runs past the first occurrence and only stops at the last one.

. Match any character
\w Match “word” character (alphanumeric plus “_”)
\W Match non-word character
\s Match whitespace character
\S Match non-whitespace character
\d Match digit character
\D Match non-digit character
\t Match tab
\n Match newline
\r Match return
\f Match formfeed
\a Match alarm (bell, beep, etc)
\e Match escape
^ Beginning of the line
$ End of the line
+ matches the preceding operator one or more times (greedy)
* matches the preceding operator zero or more times (greedy)
? matches the preceding operator zero or one times, so it doesn’t have to be there. After another quantifier (*? or +?, for instance) it makes that quantifier non-greedy, so it stops at the first match you want instead of the last.
() is used for grouping (either to use later as a backreference or to exclude)
(?<name>) (or (?P<name>) in Python and maybe others) is used for a named backreference. There’ll be some examples of that.
| is used as a logical or
{n} is used to match the preceding operator exactly n times
{n,m} matches n to m times (no space after the comma, or most engines treat the whole thing as literal text)
{n,} matches n or more times ({1,} is the same as +)
[A-Za-z] is used to match whatever is in the middle, but it only counts as one character (so [A-Za-z] would match any of those characters ONCE. Useful if you want [a-f] or [0-5]+ or something).
[^] is used to exclude things. [^word] matches any ONE character that isn’t “w”, “o”, “r”, or “d”. Groups don’t work inside brackets, so [^(word)] just adds the parentheses to the excluded characters; to skip a run of excluded characters, put a quantifier after the class, like [^word]+.
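
Greedy versus non-greedy is easier to see than to describe, so here is a minimal Python sketch. The sample string is made up for illustration; only the operators come from the list above.

```python
import re

text = '<b>one</b> and <b>two</b>'

# Greedy: .* runs to the end of the string, then backtracks to the LAST "</b>".
greedy = re.search(r'<b>(.*)</b>', text).group(1)

# Lazy: .*? grows one character at a time and stops at the FIRST "</b>".
lazy = re.search(r'<b>(.*?)</b>', text).group(1)

# A negated class matches exactly one character that is NOT listed in it.
not_word_letters = re.findall(r'[^word]', 'sword!')

print(greedy)            # one</b> and <b>two
print(lazy)              # one
print(not_word_letters)  # ['s', '!']
```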

Sound confusing? It is, which is why I’ll put up real examples. FYI, these are Perl-style regexes (PCRE, Perl Compatible Regular Expressions, is the best-known library for them) rather than the POSIX basic regular expressions that sed uses by default, but Dan’ll almost certainly never use the sed flavor (which doesn’t have a bare ? operator, among other things).

Using a backreference later depends on the language. .NET uses ${n} in the replacement string, where n is the reference number (note that they start from 1, as the entire string you matched is ${0}). Perl (and a lot of others) use $n. Ruby uses \1, as does Python, but in Python you want a raw string (the r prefix) so the backslash survives, otherwise it’s \\1. .NET has the same problem with backslashes in patterns, which is what the @ prefix on a verbatim string is for. The language reference is your best bet here.
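
For the Python case, a small sketch of the raw-string point. The date string and the variable names are invented for the example.

```python
import re

date = "2008-10-07"
pattern = r'(\d{4})-(\d{2})-(\d{2})'

# Raw string: \3, \2 and \1 reach the regex engine as backreferences.
swapped = re.sub(pattern, r'\3/\2/\1', date)

# Ordinary string: the backslashes have to be doubled to mean the same thing.
swapped_escaped = re.sub(pattern, '\\3/\\2/\\1', date)

# Named groups are written \g<name> in the replacement.
named = re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
               r'\g<day>/\g<month>/\g<year>', date)

print(swapped)  # 07/10/2008
```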

First example.

(Oct6 0423z) Dec4100: C, was acknowledged by, ek
string regexPattern = @".*?\)\s
                      (?<system>\S+?)
                      :\s
                      (?<tape>\w)
                      .*,\s
                      (?<initials>.*)";
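// IgnorePatternWhitespace is needed because the pattern is split across indented lines
Regex re = new Regex(regexPattern, RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);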

It lazily eats everything up to the first right parenthesis (escaped so the regex parser doesn’t try to interpret it) followed by a space, then it gets all non-whitespace characters until the colon as the system name. Ignores the colon and a space, then grabs a single word character ([A-Za-z0-9_]) as the tape name. Ignores any characters (the “.*”, greedy this time) up to the last comma followed by a space, then yanks the rest of the line as the initials.

C is the tape name.

ek are the initials.

This means Dec4100 is available as ${system} (if doing Regex.Replace) or m.Groups["system"].Value if you matched the regex with m = re.Match(logfilestring);
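
For comparison, here is roughly the same pattern as a Python sketch. It assumes re.VERBOSE as the stand-in for spreading the pattern over several lines, and (?P<name>) for the named groups; the constant names are mine.

```python
import re

LOG_LINE = "(Oct6 0423z) Dec4100: C, was acknowledged by, ek"

# re.VERBOSE ignores the layout whitespace and allows trailing comments.
LOG_RE = re.compile(r"""
    .*?\)\s             # lazily skip up to the first ") "
    (?P<system>\S+?)    # non-whitespace up to the colon
    :\s
    (?P<tape>\w)        # a single word character
    .*,\s               # greedy: everything up to the LAST ", "
    (?P<initials>.*)    # the rest of the line
""", re.VERBOSE)

m = LOG_RE.match(LOG_LINE)
print(m.group("system"), m.group("tape"), m.group("initials"))  # Dec4100 C ek
```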

Another example:

	<form action="http://www.climate.weatheroffice.ec.gc.ca/climateData/Interform.cfm" method="post" name="stnRequest1">
		<input type="Hidden" name="hlyRange" value="N/A">
		<input type="Hidden" name="dlyRange" value="1998-4-1|2007-11-30">
		<input type="Hidden" name="mlyRange" value="1998-4-1|2007-11-1">
		<input type="Hidden" name="StationID" value="10700">
		<input type="Hidden" name="prov" value="CA">
		<input type="Hidden" name="urlExtension" value="_e.html">
	<tr id="dataTableOddRow">
		<td id="dataTableRowHeader">(AE) BOW SUMMIT</td>
		<td id="dataTableRowHeader"><abbr title="ALBERTA">ALTA</abbr></td>
		<td>
			<select name="timeframe" size="1" class="formElement75w" onChange="elementChange(document.stnRequest1,1)">
	<option value="2">Daily</option><option value="3">Monthly</option><option value="4">Almanac</option>
			</select>
		</td>
	<td>
	<select name="day" size="1" class="formElement" disabled><option value="1" >1</option><option value="2" >2</option><option value="3" >3</option><option value="4" >4</option><option value="5" >5</option><option value="6" >6</option><option value="7" >7</option><option value="8" >8</option><option value="9" >9</option><option value="10" >10</option><option value="11" >11</option><option value="12" >12</option><option value="13" >13</option><option value="14" >14</option><option value="15" >15</option><option value="16" >16</option><option value="17" >17</option><option value="18" >18</option><option value="19" >19</option><option value="20" >20</option><option value="21" >21</option><option value="22" >22</option><option value="23" >23</option><option value="24" >24</option><option value="25" >25</option><option value="26" >26</option><option value="27" >27</option><option value="28" >28</option><option value="29" >29</option><option value="30" Selected>30</option><option value="31" >31</option>
		</select>
	</td>
	<td>
	<select name="month" size="1" class="formElement" onChange="elementChange(document.stnRequest1,1)" ><option value="1" >Jan</option><option value="2" >Feb</option><option value="3" >Mar</option><option value="4" >Apr</option><option value="5" >May</option><option value="6" >Jun</option><option value="7" >Jul</option><option value="8" >Aug</option><option value="9" >Sep</option><option value="10" >Oct</option><option value="11" Selected>Nov</option><option value="12" >Dec</option>
		</select>
	</td>
	<td>
	<select name="year" size="1" class="formElement" onChange="elementChange(document.stnRequest1,1)"><option value="1998" >1998</option><option value="1999" >1999</option><option value="2000" >2000</option><option value="2001" >2001</option><option value="2002" >2002</option><option value="2003" >2003</option><option value="2004" >2004</option><option value="2005" >2005</option><option value="2006" >2006</option><option value="2007" Selected>2007</option>
	</select>
	</td>
	<td>
	<input type="submit" name="stnSubmit" value="Go" class="formElement">
</td>
</form>

And the parser:

if ($chunk =~ /.*StationID.*?"(\d+)".*?prov.*?"(\w+).*?TableRowHeader">(.*?)<.*abbr title.*?>(\w+).*?/s) {
     my $stationid = $1;
     my $province = $2;
     my $name = $3;
     my $abbrprov = $4;
}

The //s flag makes . match newlines too, so one regex can span the whole multi-line chunk (flags stack: //g is global, //i is case insensitive, //gi is both g and i, etc). It’s also a good example of non-greedy matching. It skips everything up until StationID, then the next quotation mark followed by digits, and captures those digits. It comes out as “10700”.

Does the same thing following “prov” up until the next word characters after a quotation mark, and captures those. Comes out as “CA”. With .* rather than .*?, it would have backtracked from the end of the chunk and grabbed “data”, which precedes the last TableRowHeader (\w+ gives characters back until the rest of the pattern fits).

Grabs everything from the first TableRowHeader”> until the next <. Comes out as “(AE) BOW SUMMIT”.

Drops everything up until the next > after “abbr title”, then captures all word characters. “ALTA”

These are all assigned to variables via backreferences. $1, $2, $3, $4 are the groups in order. It’s worth noting that (at least in .NET), unnamed groups are numbered first, left to right, and named groups get their numbers AFTER them. So with (?<a>a)(b)(?<c>c)(d), ${1} is b, ${2} is d, ${3} is a, and ${4} is c (${0} is the whole match).
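
Here is the same parser as a Python sketch, with re.DOTALL standing in for //s. CHUNK is a trimmed copy of the form above, and GREEDY_RE is a variant I added (not in the original script) to show what happens when the .*? after prov becomes .*.

```python
import re

# Trimmed copy of the form above; just the lines the regex cares about.
CHUNK = '''<input type="Hidden" name="StationID" value="10700">
<input type="Hidden" name="prov" value="CA">
<tr id="dataTableOddRow">
<td id="dataTableRowHeader">(AE) BOW SUMMIT</td>
<td id="dataTableRowHeader"><abbr title="ALBERTA">ALTA</abbr></td>
'''

STATION_RE = re.compile(
    r'StationID.*?"(\d+)".*?prov.*?"(\w+).*?TableRowHeader">(.*?)<.*abbr title.*?>(\w+)',
    re.DOTALL)

# Same pattern, but with a greedy .* after "prov".
GREEDY_RE = re.compile(
    r'StationID.*?"(\d+)".*?prov.*"(\w+).*?TableRowHeader">(.*?)<.*abbr title.*?>(\w+)',
    re.DOTALL)

lazy = STATION_RE.search(CHUNK)
greedy = GREEDY_RE.search(CHUNK)

print(lazy.groups())    # ('10700', 'CA', '(AE) BOW SUMMIT', 'ALTA')
print(greedy.group(2))  # data
```

Python numbers named and unnamed groups together, left to right, so the .NET numbering quirk doesn’t carry over.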

Another example:

04:26:23 [2] Error creating WLAAAP06.FS8 = 1 : Unrecognized KGFXENG Error Code

And the parser:

match = re.match(r'^(?P<time>.*?)\s+\[(?P<engine>\d+)\]\s+(?P<error>.*?(KGFXENG|LeadTools).*)', line)

Grabs everything from the beginning of the line until the first whitespace as “time”. Comes out as “04:26:23”.

Then skips whitespace and a bracket (escaped with \[) and grabs one or more digits (\d+) as "engine". Comes out as "2", of course. Skips the closing bracket and more whitespace, then captures anything which contains "KGFXENG" or "LeadTools" as "error". Basically, the rest of the line.

This line, for instance, wouldn't match, and nothing in the regex would be captured:

00:15:18 [1] Error producing WPATAZ00.FSD = F088 : Error while saving the graphic
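
A runnable sketch of that check against both sample lines. The pattern is the one above; LINE_RE and the two variable names are just labels for the example.

```python
import re

LINE_RE = re.compile(
    r'^(?P<time>.*?)\s+\[(?P<engine>\d+)\]\s+(?P<error>.*?(KGFXENG|LeadTools).*)')

matching = "04:26:23 [2] Error creating WLAAAP06.FS8 = 1 : Unrecognized KGFXENG Error Code"
other = "00:15:18 [1] Error producing WPATAZ00.FSD = F088 : Error while saving the graphic"

m = LINE_RE.match(matching)
print(m.group('time'), m.group('engine'))  # 04:26:23 2
print(m.group('error'))

# No KGFXENG or LeadTools anywhere in the line, so there is no match at all.
print(LINE_RE.match(other))  # None
```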

These are used later with this:

message = "ERROR: %s %s: %s" % (re.sub(r'.*?([A-Za-z]+Engine[A-Za-z]*?)(Errors)?.*', r'\1', 
                         logfilename), 
                         engine, 
                         match.group('error'))

“logfilename” is something like “2008_Oct_07__ProductEngineErrors.log”. The pattern lazily skips everything until it finds one or more letters (A through Z, uppercase or lowercase) followed by Engine, optionally followed by more letters (the *? there, though ? would have worked if I said r’Engine([A-Za-z]+)?’). It stops on Errors, if it exists (the question mark afterwards), and replaces the entire name with the first backreference (“ProductEngine” in this case).
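
The substitution on its own, as a small sketch. The second filename is invented to show the fall-through case: when the pattern doesn’t match, re.sub hands the string back untouched.

```python
import re

ENGINE_RE = r'.*?([A-Za-z]+Engine[A-Za-z]*?)(Errors)?.*'

logfilename = "2008_Oct_07__ProductEngineErrors.log"
engine_name = re.sub(ENGINE_RE, r'\1', logfilename)

# Hypothetical name with no "Engine" in it: nothing matches, nothing changes.
unrelated = re.sub(ENGINE_RE, r'\1', "2008_Oct_07__Misc.log")

print(engine_name)  # ProductEngine
print(unrelated)    # 2008_Oct_07__Misc.log
```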

Last example is a nested bitch of increasingly complicated rules:

#Match plain ol' timezones
if ($brpos =~ /^\[(\w+)\](.*)/)
{
	$DateZone = $1;
	$newname = $2;
}
#Match timezones with a day modification, and grab that along with the +/-
elsif ($brpos =~ /^\[(\w+)(\S\d+)\](.*)/)
{
	$DateZone = $1;
	$TempDay2 = ONE_DAY * $2;
	$newname = $3;
}
#Check for a delete flag
elsif ($brpos =~ /^(\d)\[.*/)
{
	$DeleteFilesStatus = $1;
	#If the status is one, we want to capture everything after the timezone as the DeleteName
	if ($DeleteFilesStatus == 1)
	{
		if ($brpos =~ /^(\d)\[(\w+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$DeleteFilesNames = $3;
			$newname = $3;
		}
		elsif ($brpos =~ /^(\d)\[(\w+)(\S\d+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$TempDay2 = ONE_DAY * $3;
			$DeleteFilesNames = $4;
			$newname = $4;
		}
	}
	#Otherwise, the DeleteName is in more brackets
	elsif ($DeleteFilesStatus == 2)
	{
                #Grab it all, but without a time modification
		if ($brpos =~ /^(\d)\[(\w+)\]\[(.*\.\w+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$DeleteFilesNames = $3;
			$newname = $4;
		}
                #Grab it with a time modification
		elsif ($brpos =~ /^(\d)\[(\w+)(\S\d+)\]\[(.*\.\w+)\](.*)/)
		{
			$DeleteFilesStatus = $1;
			$DateZone = $2;
			$TempDay2 = ONE_DAY * $3;
			$DeleteFilesNames = $4;
			$newname = $5;
		}
	}
}

Examples of what I’m catching (hopefully in order). The stuff in brackets later is filled in for date/time stamps:

[EDT]DOV-F-[MM][dd][yy][hh].csv
[CST-1][MM][dd].act
1[PDT]Actual[yy][MM][dd][hh][mm].csv
1[EST-3]KLGA[yy][MM][dd].mtx
2[EDT][WBD*.txt]WBD[yy][MM][dd]05.txt
2[MST+2][WSM*.txt]WBD[yyyy][MM].txt

Sadly, I’m out of work for the night, but these matches aren’t that complicated. Lots of escaping brackets, and use of the \S character to match “-” or “+”, then grabbing the rest of them. I may write more tomorrow…
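
For what it’s worth, the whole chain can probably be collapsed into one pattern with optional groups. This is a Python sketch, not the Perl that’s actually running: the group names are mine, and it simplifies the rules a bit (the bracketed delete name is recognized whatever the flag is, and for flag 1 it doesn’t copy the new name into the delete name the way the Perl does).

```python
import re

TEMPLATE_RE = re.compile(r"""
    ^(?P<delete>\d)?                          # optional delete flag (1 or 2 in the samples)
    \[(?P<zone>\w+)(?P<offset>[+-]\d+)?\]     # [EDT] or [CST-1]
    (?:\[(?P<delete_names>[^\]]*\.\w+)\])?    # optional [WBD*.txt] style glob
    (?P<newname>.*)$                          # whatever is left
""", re.VERBOSE)


def parse_template(template):
    """Return the named pieces as a dict, or None when nothing matches."""
    match = TEMPLATE_RE.match(template)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["offset"] is not None:
        parts["offset"] = int(parts["offset"])   # "-1" -> -1, "+2" -> 2
    return parts


print(parse_template("2[MST+2][WSM*.txt]WBD[yyyy][MM].txt"))
```

[MM] in “[CST-1][MM][dd].act” isn’t mistaken for a delete name because that group insists on a dot inside the brackets, same as the (.*\.\w+) in the Perl.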

Ruby vs. Perl vs. Python

Sigh. I didn’t check my fileserver before I left home. It booted into a non-xVM kernel, so the code in my CentOS virtual machine isn’t accessible. Not that it’s a big deal, it just means that I’ll only be posting the code I’ve got here at work for a routine to check file modification times via SFTP. The raison d’être seems to be making sure customers aren’t trying to grab files whilst they’re being updated, but I’m not sure on that. They wanted the code, I wrote the code, now it’s doing… something.

At this point, I’m attempting to write every script in Perl, Python, and Ruby simultaneously. It shouldn’t be a surprise that I’m most productive in Perl, but I’d still like to keep up skills in Python (and build them in Ruby). For one reason or another (the fact that HP-UX systems don’t come with Python, for one; Ruby performing about 10 times as slow as Perl/Python on the 1.8 interpreter, and the 1.9 interpreter not being production ready for another [and yes, JRuby is really fast {faster than Ruby 1.9}, but the JVM startup time is a killer for a script running out of cron, and not every system in my datacenter has a JVM installed]), the Perl version is almost always the one that goes onto a production box somewhere.

It’s also worth noting that writing Ruby/Python in a procedural manner, like it’s Perl with methods tacked onto the objects, doesn’t make any sense to me. Sure, it’s kinda fun to write it that way, and blocks in Ruby are really handy (even if they’re slightly more confusing to a non-Rubyist than pointers were the first time I used them), but it feels against the spirit of it somehow. I don’t know.

That being said, this code isn’t nearly what would go into production anyway. The Python really isn’t far off. Perl’s kinda slow when it comes to objects, though, and I’d never use Class::Struct in something which weren’t totally network or I/O bound (probably iterate through arrays or hashes instead). The Ruby? Well, ahh, it’d probably look a lot like this. I’m not exactly a Rubyist at this point, though, and I’m sure it could be cleaned up a lot more without playing Perl Golf with it (hopefully by integrating Rubyisms which don’t make it God-awful unreadable, like the code at the top of this post).

As a total aside, I loathe Coding Horror just a little bit more than I hate Joel on Software. We’re talking about two guys who probably spend as much time blogging as they do working, treat their readers like idiots, preach bad practices (Joel’s company wrote its own fucking programming language rather than a DSL, and he advocates against using Exceptions, among other things. Jeff Atwood [Coding Horror] has no idea what the phrase “use the right tool for the right job” means, and would rather advocate using .NET for everything).

FWIW, I realize now why I changed to this theme. WIDE textarea. Not all the Wordpress themes I’ve played with work nicely once I start fucking with the margins for it in CSS. I don’t give a damn about widget ready or what have you. All that stuff is easy enough to add by hand. I want a theme that’s not going to waste 1/3rd of the page on blankness.

WTB blogs from Dan.

So, here’s one script (the Craigslist parser will get posted tomorrow, I guess).

In Perl:

#!/usr/bin/perl
 
###############
# SFTPCheck 
#
# 1.0
#
# Description:
# Checks update times for files on Omaha's server to troubleshoot scp problems
 
#Import what we need
use Class::Struct;
use Net::SFTP;
use Net::SFTP::Util qw{fx2txt};
use Net::SFTP::Attributes;
use strict;
use utf8;
 
#Set up  a struct we can dump data into
 
struct fileinfo => { filename => '$', 
 			  modtime => '$',
			  oldtime => '$',
			  waschanged => '$'};
 
#Set up logging
open(OUTFILE, ">>", "/data/testscript/ryan/sftpcheck.out") or die "Can't open the log file: $!";
print OUTFILE "-----------------------------------------------------------------------------------------\n";
my $now = localtime(time());
print OUTFILE "Starting at $now\n";
 
#Set up the connection parameters
my ($user, $pass, $host, $dir, $stamp, $diff) = @ARGV;
 
my @listers;
 
 
#Open it
my $sftp = undef;
$sftp = OpenSFTP($sftp, $user, $pass, $host);
 
#Set the directory path, since Net::SFTP->ls doesn't return a fully-qualified one
my $prefix = $dir . "/";
 
#Run through a loop 30 times, sleeping for one second in between
for (my $count = 0; $count < 30; $count++) {
 
	#Do an ls; lookforfiles adds any newly seen files to @listers
	lookforfiles();
 
	foreach my $file (@listers) {
		#Set the fully qualified name so we can stat it
		my $remote = $prefix . $file->filename;
		#Grab the modification time
		my $stat = $sftp->do_lstat($remote);
		next unless $stat;
		#If it changed since we first saw it, set waschanged
		if ($file->oldtime != $stat->mtime) {
			$file->waschanged(1);
		}
		#Either way, put the latest value in
		$file->modtime($stat->mtime);
	}
	sleep(1);
}
 
foreach my $file (@listers) {
	if ($file->waschanged) {
		print OUTFILE $file->filename . " was changed during the check!\n";
	}
 
	#Compare it to now
	my $mtime = $file->modtime;
	my $difference = time() - $mtime;
	#If it's less than five minutes, pass it off in seconds
	if ($difference <= 300) {
		print OUTFILE $file->filename . " was updated $difference seconds ago\n";
	}
	elsif ($difference <= 3600) {
		#Otherwise, minutes with two decimal places should be precise enough
		my $minutes = $difference / 60;
		#Keep the filename out of the format string in case it contains a %
		printf(OUTFILE "%s was updated %.2f minutes ago\n", $file->filename, $minutes);
	}
	else {
		#If it's really that old, just print hours
		my $hours = $difference / 3600;
		printf(OUTFILE "%s was updated %.2f hours ago\n", $file->filename, $hours);
	}
}
 
sub OpenSFTP
{
	my ($sftp, $username, $pass, $host) = @_;
	#utf8::encode works in place and returns nothing, so don't assign its result
	utf8::encode($username);
	utf8::encode($pass);
	my %args = ( 
		user => $username,
		password => $pass,
		debug => 1,
	);
	#Don't echo the password to the terminal
	print "Trying to connect to $host as $username\n";
	$sftp = Net::SFTP->new($host, %args);
	return($sftp);
}
 
sub lookforfiles
{
	#Figure out what the timestamp should be
	my @lookfor = (sftpstamp($stamp, $diff));
	#Check for yesterday's date too, why not.  Shifting the offset by 24 hours
	#keeps the leading zero and copes with month boundaries.
	push(@lookfor, sftpstamp($stamp, $diff + 24));
 
	#Do the LS, which doesn't support globbing, and use a regexp to find the files we need
	#(without a callback, ls returns a list of hashrefs with a filename key)
	my @list = $sftp->ls($dir);
	foreach my $day (@lookfor) {
		foreach my $file (@list) {
			my $name = $file->{filename};
			next unless $name =~ /$day/;
			#If the filename is already in the array, skip it
			next if grep { $_->filename eq $name } @listers;
			my $foundname = fileinfo->new();
			$foundname->filename($name);
			my $stat = $sftp->do_lstat($prefix . $name);
			$foundname->oldtime($stat->mtime);
			push(@listers, $foundname);
		}
	}
	return(@listers);
}
 
sub sftpstamp
{
    (my $stamp, my $diff) = @_;
	#Set the tzinfo so we know what the stamp should be
	my $timediff = 3600 * $diff; 
 
	#Check if we're in DST
	my $dststatus = 0;
	my $dstfile = "/data/.DST_Status";
	open(DST, $dstfile);
	$dststatus = (<DST>);
	chomp($dststatus);
	close(DST);
 
	#If so, modify the time accordingly
	if ($dststatus) {
		$timediff -= 3600;
	}
 
	#Get the time and format it
	my @StampNow = localtime(time() - $timediff);
	my $day = $StampNow[3];
	$day = "0$day" if $day < 10;
	my $month = $StampNow[4] + 1;
	$month = "0$month" if $month < 10;
 
	#Pass the appropriate value back
	$stamp = "$month$day";
	return $stamp;
}

In Python:

import platform
import re
import sys
import time
 
import paramiko
 
class foundfile:
    def __init__(self, filename=None, modtime=None, oldtime=None, waschanged=None):
        self.filename = filename
        self.modtime = modtime
        self.oldtime = oldtime
        self.waschanged = waschanged
 
class sftpcheck:
 
    def __init__(self, host=None, username=None, password=None, dirname=None, stamp=None, diff=None):
 
 
 
        #Set up the initial variables for login
        self.host = host
        self.username = username 
        self.password = password
        self.dirname = dirname
        self.stamp = stamp
        self.diff = diff
        self.sftp = None
        self.found = []
 
        #Set up logging
        if platform.uname()[0] == "Windows":
            self.LogPath = "\\\\filer2\\data\\testscript\\ryan\\"
            self.DSTPath = "\\\\filer2\\data\\.DST_Status"
            self.DebugPath = "\\\\filer2\\data\\testscript\\ryan\\"
        else:
            self.LogPath = "/data/testscript/ryan/"
            self.DSTPath = "/data/.DST_Status"
            self.DebugPath = "/data/testscript/ryan/"
        self.DebugFilename = "sftpdebug.log"
        self.LogFilename = "sftpcheck.log"
        self.LogFileObj = None
 
    def login(self, host, username, password):
        #Connect to the server
        t = paramiko.Transport((host, 22))
        t.connect(username=username, password=password)
        self.sftp = paramiko.SFTPClient.from_transport(t)
 
        return self.sftp is not None
 
    def listfiles(self):
        #Pick up how it's supposed to be formatted
        lookfor = self.formattime(self.diff)
 
        #Get a list of files in the directory
        ls = self.sftp.listdir(self.dirname)
 
        #Check if any match, skipping names we've already recorded
        known = set(record.filename for record in self.found)
        for fileinfo in ls:
            if re.search(lookfor, fileinfo) and fileinfo not in known:
                newone = foundfile(filename=fileinfo)
                #paramiko's SFTPAttributes exposes the mtime as st_mtime
                newone.oldtime = self.sftp.lstat(self.dirname + "/" + fileinfo).st_mtime
                self.found.append(newone)
                known.add(fileinfo)
 
        return self.found
 
    def statfiles(self):
        #Run through the list of files we found and stat them
        files = self.listfiles()
 
        for fileinfo in files:
            #Stat the file
            fullpath = self.dirname + "/" + fileinfo.filename
            mtime = self.sftp.lstat(fullpath).st_mtime
            #Flag it if it changed since we first saw it
            if fileinfo.oldtime != mtime:
                fileinfo.waschanged = True
            fileinfo.modtime = mtime
 
 
    def finalize(self):
        for gotem in self.found:
            mtime = gotem.modtime
            nowtime = time.time()
            if gotem.waschanged:
                self.log_write("%s was changed during the check!" % gotem.filename)
            #Figure out when the last time it was updated was
            difference = nowtime - mtime
 
            #If it's less than five minutes, seconds for logging
            if difference <= 300:
                self.log_write("%s was updated %d seconds ago" % (gotem.filename, difference))
 
            #Less than an hour?  Minutes with two decimal places
            elif difference <= 3600:
                minutes = difference / 60
                self.log_write("%s was updated %.2f minutes ago" % (gotem.filename, minutes))
 
            #Otherwise, hours with two decimal places    
            else:
                hours = difference / 3600
                self.log_write("%s was updated %.2f hours ago" % (gotem.filename, hours))
 
    def formattime(self, diff):
        #Set timezone for PST, since I don't need to import more 
        #python libs.  diff arrives from the command line as a string
        timediff = 3600 * int(diff)
 
        #Check if we're in DST
        dsthandle = open(self.DSTPath, 'r')
        dststatus = dsthandle.read().rstrip("\n")
        dsthandle.close()
 
        if dststatus == '1':
            timediff = timediff - 3600
 
        #Get the time in epoch seconds, subtract the diff
        #pass it back in a struct so we can format it
        sftptime = time.gmtime(int(time.time() - timediff))
        #struct_time has no strftime method; time.strftime takes it as an argument
        sftpstamp = time.strftime("%m%d", sftptime)
 
        return sftpstamp
 
    def log_open(self):
        #Set up logging
        logfile = "%s%s" % (self.LogPath, self.LogFilename)
        try:
            logfileobj = open(logfile, "a")
            self.LogFileObj = logfileobj
        except IOError:
            print("Error: Cannot open log file %s!" % logfile)
        else:
            self.log_write("Log file opened: %s!" % logfile)
 
    def log_write(self, message):
        #Actually write to the log
        timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
        message_text = "%s %s\n" % (timestamp, message)
        self.LogFileObj.write(message_text)
 
if __name__ == "__main__":
    #Grab the arguments before we try to use them
    args = sys.argv[1:]
 
    #Instantiate it
    check = sftpcheck(username=args[0], password=args[1], host=args[2], dirname=args[3], stamp=args[4], diff=args[5])
 
    #Set up logging to troubleshoot the connection
    debugfile = "%s%s" % (check.DebugPath, check.DebugFilename)
    paramiko.util.log_to_file(debugfile)
 
 
    #Let us know when we're starting
    check.log_open()
    check.log_write("--------------------------------------------------------")
    check.log_write("Start sftpcheck.py")
    check.log_write("Try to login")
 
    #Try to get the info
    if check.login(check.host, check.username, check.password):
        check.log_write("Successfully logged in to %s as %s" % (check.host, check.username))
        #Loop 30 times, sleeping for a second in between
        for i in range(30):
            check.statfiles()
            time.sleep(1)
        check.finalize()
 
    #If we can't, log it and exit
    else:
        check.log_write("Login to %s as %s failed!  Check %s for more information." % (check.host, check.username, debugfile))
 
    check.log_write("Stop sftpcheck.py")

In Ruby:

#!/usr/bin/ruby
 
require 'net/sftp'
 
class FoundFiles
  #Set up a class to hold info
  attr_accessor :filename, :modtime, :oldtime, :waschanged
 
  def initialize(filename)
    @filename = filename
  end
end
 
class SFTPCheck
  attr_accessor :username, :password, :host, :dir, :stamp, :diff
 
  #Initialize variables.  They have to be set in initialize; an @variable in
  #the class body belongs to the class, not to the instance
  def initialize(username, password, host, dir, stamp, diff)
    @username = username
    @password = password
    @host = host
    @dir = dir
    @stamp = stamp
    @diff = diff.to_i
    @found = Array.new
    @logpath = "/data/testscript/ryan/"
    @dstpath = "/data/.DST_Status"
    @logfilename = "sftpcheck.log"
  end
 
  def login(host, username, password)
    #Log in.  Without a block, Net::SFTP.start hands back the session
    @sftp = Net::SFTP.start(host, username, :password => password)
    return @sftp
  end
 
  def formattime(diff)
    timediff = 3600 * diff
    #Read the DST status file.  Sure, Ruby has a dst? method in the Time 
    #class, but that doesn't necessarily mesh with work
    dststatus = File.read(@dstpath).strip
 
    #If it's DST, change it (every string is truthy in Ruby, so compare explicitly)
    if dststatus == "1"
      timediff -= 3600
    end
 
    sftptime = Time.at(Time.now.to_i - timediff).strftime("%m%d")
    return sftptime
  end
 
  def listfiles
    lookfor = formattime(@diff)
 
    #Loop through.  dir.foreach and stat! are the net-sftp 2.x calls
    @sftp.dir.foreach(@dir) do |entry|
      file = entry.name
      #Ruby supports Perl-style regex matching.  Yay!
      if file =~ /#{lookfor}/
        there = false
        #Same BS as Perl and Python.  I don't want to use a .contains method
        @found.each do |record|
          there = true if record.filename == file
        end
        unless there
          newone = FoundFiles.new(file)
          newone.oldtime = @sftp.stat!(@dir + "/" + file).mtime
          #Could also be written as @found.push(newone)
          @found << newone
        end
      end
    end
  end
 
  def statfiles
    listfiles
 
    @found.each do |fileinfo| 
      fullpath = @dir + "/" + fileinfo.filename
      mtime = @sftp.stat!(fullpath).mtime
      #If it's changed, flag the object
      fileinfo.waschanged = true if fileinfo.oldtime != mtime
      fileinfo.modtime = mtime
    end
  end
 
  def log_open
    #Set up logging
    logfile = @logpath + @logfilename
    @logfile = File.open(logfile, "a")
    log_write("Log file opened: #{logfile}!")
  end
 
  def log_write(message)
    #Actually write to the log, one timestamped line per message
    @logfile.write(Time.now.strftime("%Y-%m-%d %H:%M:%S") + " " + message + "\n")
  end
 
  def finalize
    @found.each do |gotem|
      #Figure out when the last time it was updated was
      difference = Time.now.to_i - gotem.modtime
      if gotem.waschanged
        log_write("#{gotem.filename} was changed during the check!")
      end
 
      #If it's less than five minutes, seconds for logging
      if difference <= 300
        log_write("#{gotem.filename} was updated #{difference} seconds ago")
      #Less than an hour?  Minutes with two decimal places
      elsif difference <= 3600
        log_write("#{gotem.filename} was updated #{'%.2f' % (difference / 60.0)} minutes ago")
      #Otherwise, hours.  
      else
        log_write("#{gotem.filename} was updated #{'%.2f' % (difference / 3600.0)} hours ago")
      end
    end
  end
 
end
 
username, password, host, dirname, stamp, diff = ARGV
 
#Instantiate it
check = SFTPCheck.new(username, password, host, dirname, stamp, diff)
 
#Let us know when we're starting
check.log_open
check.log_write("-------------------------------------------------------")
check.log_write("Start sftpcheck.rb")
check.log_write("Try to login")
 
#Try to get the info
if check.login(check.host, check.username, check.password)
  #Loop 30 times, sleeping for a second in between
  30.times do
    check.statfiles
    sleep(1)
  end
  check.finalize
else
  check.log_write("Login to #{check.host} as #{check.username} failed!")
end
check.log_write("Stopping sftpcheck.rb")

DISCLAIMER:
Use this code totally at your own risk. The Python and Ruby should work, but I haven’t tested them.

EDIT:
WTF. The Ruby came in significantly shorter than the Python and the Perl, plus it’s more fun to write? That makes me wish we had it installed on any of our servers here.