Palm Desktop, I stab at thee!
Firstly, I’m starting P90X tomorrow. Should be interesting. Secondly, I miss you guys :/ I’m living with somebody who asked me what the Dead Sea Scrolls are this morning, since it was on the news that they’re coming to the Science Museum.
By the way, ever planning on touching your blogs again (Sewpbox and Rattributes not included)?
So I’m migrating Heather’s Palm Desktop crap to Google Calendar (I have no idea why no tool exists to do this). Google Calendar doesn’t really like the CSV I massaged out of it (only importing about half the records), and I’m starting to see why. Half the records are fucking duplicates in every way but one. I wrote a Python script to do it for me anyway.
The long and short of it amounts to this:
If you want the easy way, export the Palm data to a .mda, import it into Yahoo Calendar, then into Google Calendar from there. Otherwise, export it to a CSV, and hit it with this script:
#!/usr/bin/ruby # require 'csv' input = "export.csv" output = "gcal.csv" csvfile = File.open(input) {|f| f.read} puts "Parsing..." csv = CSV::parse(csvfile) fields = csv.shift puts "Writing..." File.open(output, "w") do |f| f.print "Subject, Start Date, Start Time, End Date, End Time\n" csv.each do |line| startdate, starttime = Time.at(line[6].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',') enddate, endtime = Time.at(line[7].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',') f.print "\"#{line[11]}\",#{startdate},#{starttime},#{enddate},#{endtime}\n" end end puts "Done."
If you don’t feel like exporting, and are running on Windows:
#!/usr/bin/ruby # # require 'win32ole' require 'dbi' class Access attr_accessor :mdb, :conn, :data, :fields def initialize(mdb=nil) @mdb = mdb @conn = nil @data = nil @fields = nil end def open connstring = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=#{@mdb}" @conn = WIN32OLE.new('ADODB.Connection') @conn.Open(connstring) end def query(sql) set = WIN32OLE.new('ADODB.Recordset') set.Open(sql, @conn) @fields = [] set.Fields.each do |field| @fields << field.Name end @data = set.GetRows.transpose set.Close end def close @conn.Close end end output = "gcal.csv" rows = Array.new db = Access.new('c:\path\to\mdb') db.open db.query("SELECT * FROM Main;") names = db.fields rows = db.data #Alternatively DBI.connect("DBI:ODBC:driver=Microsoft Access Driver (*.mdb);"+"dbq=c:/path/to/mdb") do |dbh| dbh.select_all('select * from Main') {|row| rows << row} end puts "Writing..." File.open(output, "w") do |f| f.print "Subject, Start Date, Start Time, End Date, End Time\n" rows.each do |line| startdate, starttime = Time.at(line[6].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',') enddate, endtime = Time.at(line[7].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',') f.print "\"#{line[11]}\",#{startdate},#{starttime},#{enddate},#{endtime}\n" end end puts "Done."
If you want the details…
Essentially, Palm’s Datebook dumps everything into an Access database. No keys or relations (granted, only 3 tables, but still), and no idea what most of the columns do. Tools for working with Jet on Linux are minimal, and I didn’t feel like going through win32ole just to get to Jet, plus this sort of thing is nicer to do in downtime at work. So, I exported it via ODBC to a Postgres database on my Solaris box. Not pretty.
access=# \d main TABLE "public.main" COLUMN | Type | Modifiers ----------------+------------------------+----------- record_id | bigint | NOT NULL STATUS | integer | placement | bigint | private | smallint | category | character varying(20) | start_time | bigint | end_time | bigint | untimed | smallint | time_zone | character varying(40) | location | character varying(255) | summary | text | alarm_advance | character varying(10) | alarm_unit | character varying(10) | repeated_event | character varying(255) | alarm | smallint | note | character varying(100) | access=#
Ok, so record_id seems to be some sort of key, and Heather doesn’t bother with notes or alarms, so this doesn’t seem like it’d be so bad. To figure why Google is only taking some of the records, though:
access=$ SELECT count(*) FROM main; count ------- 5094 (1 row) access=$ SELECT count(DISTINCT record_id) FROM main; count ------- 5074 (1 row) access=$ SELECT count(DISTINCT start_time) FROM main; count ------- 2488 (1 row) access=$ SELECT count(DISTINCT end_time) FROM main; count ------- 2490 (1 row) access=$ SELECT count(DISTINCT summary) FROM main; count ------- 2264 (1 row) access=$ SELECT record_id, start_time, end_time, summary FROM main WHERE record_id IN (SELECT record_id FROM main GROUP BY record_id HAVING count(*)>1); record_id | start_time | end_time | summary -----------+------------+------------+------------------------------------------------------------------------- 0 | 1231437600 | 1231441200 | tammy 0 | 1231869600 | 1231873200 | nb chanber lunch 0 | 1229642100 | 1229645700 | tammy AND joe photos st claire broiler 0 | 1231959600 | 1231963200 | dr hunt 0 | 1230505200 | 1230508800 | tilsen photos 0 | 1230568200 | 1230571800 | meet gary at studio 0 | 1230571800 | 1230584400 | bri AND kids 0 | 1230744600 | 1230748200 | tilsen, AND sandy ORDER y membership mail 0 | 1230681600 | 1230681600 | Dan, missy AND the kids. 0 | 1231610400 | 1231614000 | james j hill houseOngoing Daily 11/15/08 - 2/22/09 m-sat 10-4 sun 1-4 0 | 1230663600 | 1230667200 | tammys house glasses shopping 0 | 1229727600 | 1229731200 | ryan help at studio 0 | 1231889400 | 1231893000 | 0 | 1231889400 | 1231903800 | EMS 0 | 1237161600 | 1237161600 | spring break 0 | 1229983200 | 1229986800 | msp WITH the girls 0 | 1241049600 | 1241049600 | DISH 0 | 1232233200 | 1232244000 | jordan senior photos excel AND studio 0 | 1230055200 | 1230058800 | paige studio 0 | 1230314400 | 1230318000 | amanda tg 0 | 1229968800 | 1229972400 | sara AND nolan (21 rows) access=$ SELECT record_id, start_time, end_time, summary FROM main ORDER BY start_time ASC LIMIT 10; record_id | start_time | end_time | summary -----------+------------+----------+--------- 7128069 | 31449600 | 31449600 | c 7128068 | 31449600 | 31449600 | a 7123605 | 31449600 | 31449600 | a 7128070 | 31449600 | 31449600 | 3 7124866 | 31449600 | 31449600 | c 7124107 | 31449600 | 31449600 | 3 7124145 | 31449600 | 31449600 | o 7124141 | 31449600 | 31449600 | ; 7128072 | 31449600 | 31449600 | ; 7128071 | 31449600 | 31449600 | o (10 rows) access=$ SELECT record_id, start_time, end_time, summary FROM main ORDER BY start_time DESC LIMIT 10; record_id | start_time | end_time | summary -----------+------------+------------+----------------------------- 7127485 | 1256774400 | 1256774400 | lawerance wedding 7125815 | 1256774400 | 1256774400 | lawerance wedding 7128114 | 1244167200 | 1244170800 | NB senior ALL night party 7125941 | 1242489600 | 1242493200 | nyquist edding 7125827 | 1242489600 | 1242493200 | nyquist edding 0 | 1241049600 | 1241049600 | DISH 7128073 | 1238079600 | 1238083200 | books IN the woods 7125623 | 1238079600 | 1238083200 | books IN the woods 7125697 | 1238025600 | 1238025600 | gunflint books IN the woods 7126175 | 1238025600 | 1238025600 | gunflint books IN the woods (10 rows) access=$
Oh, yeah! What I’ve gathered:
- There are duplicate record_ids (which I’d hoped would have been unique).
- There are events set to start and end at duplicate times
- Palm, at some point, duplicated a lot of the other records, except for the record_id.
- Times are stored in epoch seconds (oddly, Unix epoch seconds, not Windows)
- Some of the times correlate to 1970? WTF
A working solution:
access=$ SELECT DISTINCT a.start_time, a.end_time, a.summary INTO holdkey FROM main a WHERE EXISTS ( SELECT 'x' FROM main b WHERE a.start_time = b.start_time AND a.end_time = b.end_time AND a.summary = b.summary) ORDER BY a.start_time DESC; SELECT access=$ SELECT count(*) FROM holdkey; count ------- 2597 (1 row) access=$ DELETE FROM main USING holdkey WHERE main.start_time = holdkey.start_time AND main.end_time = holdkey.end_time AND main.summary = holdkey.summary; DELETE 5085 access=$ SELECT record_id, start_time, end_time, summary FROM main; record_id | start_time | end_time | summary -----------+------------+------------+--------- 5280360 | 31536000 | 31536000 | 5280298 | 31536000 | 31536000 | 5280429 | 31536000 | 31536000 | 7125497 | 1193437800 | 1193437800 | 7128378 | 31536000 | 31536000 | 7128376 | 31536000 | 31536000 | 7128374 | 31536000 | 31536000 | 7127620 | 1193437800 | 1193437800 | 0 | 1231889400 | 1231893000 | (9 rows) access=$ DROP TABLE main; DROP TABLE access=$ SELECT * INTO main FROM holdkey; SELECT
That works. Of course there’s the quick and dirty way which doesn’t involve munging about with temp tables:
access=$ DELETE FROM main t1 USING main WHERE EXISTS (SELECT * FROM main t2 WHERE t1.start_time = t2.start_time AND t1.end_time = t2.end_time AND t1.summary = t2.summary AND t1.record_id < t2.record_id); DELETE 2488 access=$ SELECT count(*) FROM test; count ------- 2606 (1 row)
It gives a slightly different result, but operates under the assumption that Palm’s record_id means something (it may not, for all I know). On the upside, it preserves all the columns in case they’re useful for something (doubtful). I could order by start_time and select into another table, add an index, and do the same thing, but it’s easier the quick and dirty way. There’s probably a trivial way to do this with joins, but I couldn’t think of one, and it leaves 9 records with a record_id of 0..
Here’s the code which it turns out I didn’t need, but it might be useful to somebody:
#Rips data from Palm Desktop. Uploads it to Google Calendar #Written with Python 2.5 (though imports should work anyway) # #Currently, the Access MDB Palm Datebook uses has been exported to a #PostgreSQL server via ODBC, so I'll be connecting to that # #There's code in here for getting through Access also, but I haven't tested it. #Use at your own risk (kinda like Access). # #This is mostly due to the Postgres ODBC driver, and the fact that I didn't #want to bother with quoting all the queries for Postgres to allow spaces try: from xml.etree import ElementTree #Python 2.5, probably 2.6/3.0 also except ImportError: from elementtree import ElementTree #Python <2.4 import gdata.calendar.service import gdata.service import atom.service import gdata.calendar import atom import getopt import sys import string import time import psycopg2 #Talk to Postgres class Struct: def __init__(self, *args, **kwargs): for k,v in kwargs.items(): setattr(self, k, v) class GCalMigrate: def __init__(self): self.conn = None self.cur = None self.calendar = None self.records = [] def connect(self): try: self.conn = psycopg2.connect("dbname='whatever' user='yournamehere' host='server'") except: print "Can't connect to the database!\n" sys.exit() self.cur = conn.cursor() query() def accessconnect(self,mdbpath): import odbc self.conn = odbc.odbc("driver=Microsoft Access Driver (*.mdb);DBQ=%s") % mdbpath self.cur = conn.cursor() queryaccess() def queryaccess(self): rows = [] self.cur.execute("SELECT Main.[Start Time], Main.[End Time], Main.[Summary] FROM Main") rows = cur.fetchall() conn.close() parserows(rows) def query(self): rows = [] try: self.cur.execute("SELECT start_time, end_time, summary FROM main") rows = cur.fetchall() except: print "Couldn't query the database.\n" conn.close() parserows(rows) def parserows(self, rows): for row in rows: starttime = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(row[0])) endtime = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(row[1])) title = row[2] record = Struct(start_time=starttime, end_time=enddtime, title=title) self.records.append(record) login() def login(self, username, password): self.calendar = gdata.calendar.service.CalendarService() self.calendar.email = username self.calendar.password = password self.calendar.source = "Palm_Desktop_Migrator" self.calendar.ProgrammaticLogin() batchsubmit() def batchsubmit(self): feed = gdata.calendar.CalendarEventFeed() for record in records: insertme = gdata.calendar.CalendarEventEntry() insertme.title = atom.Title(record.title) insertme.content = atom.Content("") insertme.when.append(gdata.calendar.When(start_time=record.start_time, end_time=record.end_time)) insertme.batch_id = gdata.BatchId(text='Palm_Migration') feed.Add_Insert(entry=insertme) response = self.calendar.ExecuteBatch(feed, gdata.calendar.service.DEFAULT_BATCH_URL) return response if __name__ == "__main__": runner = GCalMigrate() responses = runner.connect() for entry in responses.entry: print "Batch ID: %s" % entry.batch_id.text print "Status: %s" % entry.batch_status.code print "Reason: %s" % entry.batch_status.reason
4 Comments
Other Links to this Post
RSS feed for comments on this post. TrackBack URI
Leave a comment
You must be logged in to post a comment.
By Missy, January 15, 2009 @ 12:59 am
Oh! Just noticed that you blogged. I was actually thinking about overhauling Exquisite Turmoil this week.. I’ve got a bunch of themes picked out, I just need to bug Dan to update it for me again (since all of the new themes are for the new WP =/). When she asked what the Dead Sea Scrolls were, you totally should have made her watch Eva!
By Dan, January 15, 2009 @ 1:04 am
Suggestion (note — I don’t know Google Calendar’s format) –
Foreach loop through the records. Insert individually. When inserting, use SELECT SCOPE_IDENTITY() to get a unique key.
Also, SELECT DISTINCT is an abomination before God. You do know how that works behind the scenes, right?
By Ryan, January 15, 2009 @ 8:41 am
Isn’t Exquisite Turmoil on Dreamhost? Their panel allows for monkeys to upgrade WP (assuming it was installed through their one-click installer thingy). I don’t really think she’d appreciate Eva :/
The gdata APIs (including gcal) pretty much cram things into an ATOM feed, then authenticate to Google/publish via curl or HTTP sockets. Since the API (and Python libraries) are available from Google, it makes it fairly simple.
Yes, SELECT DISTINCT sucks. Well, SELECT DISTINCT sucks on joins and subqueries (I’m not convinced it sucks for single tables), since it tosses the query optimizer and just grabs the entirety of the table, checks it for uniqueness, then sorts it. GROUP BY works just as well in -most- instances.
The problem with the data from Access wasn’t a lack of unique keys, it was duplicate records that already had unique keys. Nothing else really worked as a constraint (since Heather sometimes had things starting or ending at the same time, or with exactly the same summary), so the DELETE FROM … WHERE EXISTS clause or a SELECT DISTINCT a INTO b seemed to be the only way to do it.
As an aside, SCOPE_IDENTITY() is T-SQL specific, and doesn’t that return the identity of the last row inserted? CURRVAL(‘id’) is pretty much the PGSQL equivalent (or NEXTVAL(‘id’). Still, though, there were only 20 duplicate keys out of 5100ish records, but nearly half those had duplicate data:
I don’t know that a unique key would have redeemed it =/ It would have been possible to generate a composite key over a couple of the fields, but composite key on a TEXT field is awful.
By Missy, January 15, 2009 @ 11:23 pm
It is on Dreamhost, but if you’ll recall — the last time Dan updated my blog it crapped out for a while and took extensive work to make it ah.. work.. again.