Palm Desktop, I stab at thee!

Firstly, I’m starting P90X tomorrow. Should be interesting. Secondly, I miss you guys :/ I’m living with somebody who asked me what the Dead Sea Scrolls are this morning, since it was on the news that they’re coming to the Science Museum.

By the way, ever planning on touching your blogs again (Sewpbox and Rattributes not included)?

So I’m migrating Heather’s Palm Desktop crap to Google Calendar (I have no idea why no tool exists to do this). Google Calendar doesn’t really like the CSV I massaged out of it (only importing about half the records), and I’m starting to see why. Half the records are fucking duplicates in every way but one. I wrote a Python script to do it for me anyway.

The long and short of it amounts to this:
If you want the easy way, export the Palm data to a .mda, import it into Yahoo Calendar, then into Google Calendar from there. Otherwise, export it to a CSV, and hit it with this script:

#!/usr/bin/ruby
#
require 'csv'
 
input = "export.csv"
output = "gcal.csv"
 
csvfile = File.open(input) {|f| f.read}
 
puts "Parsing..."
 
csv = CSV::parse(csvfile)
 
fields = csv.shift
 
puts "Writing..."
File.open(output, "w") do |f|
   f.print "Subject, Start Date, Start Time, End Date, End Time\n"
   csv.each do |line|
     startdate, starttime = Time.at(line[6].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     enddate, endtime = Time.at(line[7].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     f.print "\"#{line[11]}\",#{startdate},#{starttime},#{enddate},#{endtime}\n"
   end
end
 
puts "Done."

If you don’t feel like exporting, and are running on Windows:

#!/usr/bin/ruby
#
#
require 'win32ole'
require 'dbi'
 
class Access
   attr_accessor :mdb, :conn, :data, :fields
 
   def initialize(mdb=nil)
       @mdb = mdb
       @conn = nil
       @data = nil
       @fields = nil
   end
 
   def open
       connstring = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=#{@mdb}"
       @conn = WIN32OLE.new('ADODB.Connection')
       @conn.Open(connstring)
   end
 
   def query(sql)
       set = WIN32OLE.new('ADODB.Recordset')
       set.Open(sql, @conn)
       @fields = []
       set.Fields.each do |field|
           @fields << field.Name
       end
       @data = set.GetRows.transpose
       set.Close
   end
 
   def close
       @conn.Close
   end
end
 
output = "gcal.csv"
 
rows = Array.new
 
db = Access.new('c:\path\to\mdb')
db.open
 
db.query("SELECT * FROM Main;")
names = db.fields
rows = db.data
 
#Alternatively
DBI.connect("DBI:ODBC:driver=Microsoft Access Driver (*.mdb);"+"dbq=c:/path/to/mdb") do |dbh|
   dbh.select_all('select * from Main') {|row| rows << row}
end
 
puts "Writing..."
File.open(output, "w") do |f|
   f.print "Subject, Start Date, Start Time, End Date, End Time\n"
   rows.each do |line|
     startdate, starttime = Time.at(line[6].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     enddate, endtime = Time.at(line[7].to_i).strftime("%m/%d/%Y,%I:%M:%S %p").split(',')
     f.print "\"#{line[11]}\",#{startdate},#{starttime},#{enddate},#{endtime}\n"
   end
end
 
puts "Done."

If you want the details…

Essentially, Palm’s Datebook dumps everything into an Access database. No keys or relations (granted, only 3 tables, but still), and no idea what most of the columns do. Tools for working with Jet on Linux are minimal, and I didn’t feel like going through win32ole just to get to Jet, plus this sort of thing is nicer to do in downtime at work. So, I exported it via ODBC to a Postgres database on my Solaris box. Not pretty.

access=# \d main
                 TABLE "public.main"
     COLUMN     |          Type          | Modifiers 
----------------+------------------------+-----------
 record_id      | bigint                 | NOT NULL
 STATUS         | integer                | 
 placement      | bigint                 | 
 private        | smallint               | 
 category       | character varying(20)  | 
 start_time     | bigint                 | 
 end_time       | bigint                 | 
 untimed        | smallint               | 
 time_zone      | character varying(40)  | 
 location       | character varying(255) | 
 summary        | text                   | 
 alarm_advance  | character varying(10)  | 
 alarm_unit     | character varying(10)  | 
 repeated_event | character varying(255) | 
 alarm          | smallint               | 
 note           | character varying(100) | 
access=#

Ok, so record_id seems to be some sort of key, and Heather doesn’t bother with notes or alarms, so this doesn’t seem like it’d be so bad. To figure why Google is only taking some of the records, though:

access=$ SELECT count(*) FROM main;
 count 
-------
  5094
(1 row)
access=$ SELECT count(DISTINCT record_id) FROM main;
 count 
-------
  5074
(1 row)
access=$ SELECT count(DISTINCT start_time) FROM main;
 count 
-------
  2488
(1 row)
access=$ SELECT count(DISTINCT end_time) FROM main;
 count 
-------
  2490
(1 row)
access=$ SELECT count(DISTINCT summary) FROM main;
 count 
-------
  2264
(1 row)
access=$ SELECT record_id, start_time, end_time, summary 
FROM main 
WHERE record_id IN 
    (SELECT record_id 
     FROM main 
     GROUP BY record_id 
     HAVING count(*)>1);
 record_id | start_time |  end_time  |                                 summary                                 
-----------+------------+------------+-------------------------------------------------------------------------
         0 | 1231437600 | 1231441200 | tammy 
         0 | 1231869600 | 1231873200 | nb chanber lunch
         0 | 1229642100 | 1229645700 | tammy AND joe photos st claire broiler
         0 | 1231959600 | 1231963200 | dr hunt
         0 | 1230505200 | 1230508800 | tilsen photos
         0 | 1230568200 | 1230571800 | meet gary at studio
         0 | 1230571800 | 1230584400 | bri AND kids
         0 | 1230744600 | 1230748200 | tilsen, AND sandy ORDER y membership mail
         0 | 1230681600 | 1230681600 | Dan, missy AND the kids.
         0 | 1231610400 | 1231614000 |  james j hill houseOngoing Daily 11/15/08 - 2/22/09  m-sat 10-4 sun 1-4
         0 | 1230663600 | 1230667200 | tammys house glasses shopping
         0 | 1229727600 | 1229731200 | ryan help at studio
         0 | 1231889400 | 1231893000 | 
         0 | 1231889400 | 1231903800 | EMS 
         0 | 1237161600 | 1237161600 | spring break
         0 | 1229983200 | 1229986800 | msp WITH the girls
         0 | 1241049600 | 1241049600 | DISH
         0 | 1232233200 | 1232244000 | jordan senior photos excel AND studio 
         0 | 1230055200 | 1230058800 | paige studio
         0 | 1230314400 | 1230318000 | amanda tg
         0 | 1229968800 | 1229972400 | sara AND nolan 
(21 rows)
 
access=$ SELECT record_id, start_time, end_time, summary 
FROM main 
ORDER BY start_time 
ASC LIMIT 10;
 record_id | start_time | end_time | summary 
-----------+------------+----------+---------
   7128069 |   31449600 | 31449600 | c
   7128068 |   31449600 | 31449600 | a
   7123605 |   31449600 | 31449600 | a
   7128070 |   31449600 | 31449600 | 3
   7124866 |   31449600 | 31449600 | c
   7124107 |   31449600 | 31449600 | 3
   7124145 |   31449600 | 31449600 | o
   7124141 |   31449600 | 31449600 | ;
   7128072 |   31449600 | 31449600 | ;
   7128071 |   31449600 | 31449600 | o
(10 rows)
access=$ SELECT record_id, start_time, end_time, summary FROM main ORDER BY start_time DESC LIMIT 10;
 record_id | start_time |  end_time  |           summary           
-----------+------------+------------+-----------------------------
   7127485 | 1256774400 | 1256774400 | lawerance wedding
   7125815 | 1256774400 | 1256774400 | lawerance wedding
   7128114 | 1244167200 | 1244170800 | NB senior ALL night party
   7125941 | 1242489600 | 1242493200 | nyquist edding
   7125827 | 1242489600 | 1242493200 | nyquist edding
         0 | 1241049600 | 1241049600 | DISH
   7128073 | 1238079600 | 1238083200 | books IN the woods
   7125623 | 1238079600 | 1238083200 | books IN the woods
   7125697 | 1238025600 | 1238025600 | gunflint books IN the woods
   7126175 | 1238025600 | 1238025600 | gunflint books IN the woods
(10 rows)
 
access=$

Oh, yeah! What I’ve gathered:

  • There are duplicate record_ids (which I’d hoped would have been unique).
  • There are events set to start and end at duplicate times
  • Palm, at some point, duplicated a lot of the other records, except for the record_id.
  • Times are stored in epoch seconds (oddly, Unix epoch seconds, not Windows)
  • Some of the times correlate to 1970? WTF

A working solution:

access=$ SELECT DISTINCT a.start_time, a.end_time, a.summary 
INTO holdkey 
FROM main a
WHERE EXISTS 
    ( SELECT 'x' FROM main b WHERE a.start_time = b.start_time
      AND a.end_time = b.end_time
      AND a.summary = b.summary) 
ORDER BY a.start_time DESC;
SELECT
access=$ SELECT count(*) FROM holdkey;
 count 
-------
  2597
(1 row)
access=$ DELETE FROM main 
USING holdkey 
WHERE main.start_time = holdkey.start_time 
    AND main.end_time = holdkey.end_time 
    AND main.summary = holdkey.summary;
DELETE 5085
 
access=$ SELECT record_id, start_time, end_time, summary FROM main;
 record_id | start_time |  end_time  | summary 
-----------+------------+------------+---------
   5280360 |   31536000 |   31536000 | 
   5280298 |   31536000 |   31536000 | 
   5280429 |   31536000 |   31536000 | 
   7125497 | 1193437800 | 1193437800 | 
   7128378 |   31536000 |   31536000 | 
   7128376 |   31536000 |   31536000 | 
   7128374 |   31536000 |   31536000 | 
   7127620 | 1193437800 | 1193437800 | 
         0 | 1231889400 | 1231893000 | 
(9 rows)
access=$ DROP TABLE main;
DROP TABLE
access=$ SELECT * INTO main FROM holdkey;
SELECT

That works. Of course there’s the quick and dirty way which doesn’t involve munging about with temp tables:

access=$ DELETE FROM main t1
USING main 
WHERE EXISTS 
    (SELECT * FROM main t2 
         WHERE t1.start_time = t2.start_time 
         AND t1.end_time = t2.end_time 
         AND t1.summary = t2.summary 
         AND t1.record_id < t2.record_id);
DELETE 2488
 
access=$ SELECT count(*) FROM test;
 count 
-------
  2606
(1 row)

It gives a slightly different result, but operates under the assumption that Palm’s record_id means something (it may not, for all I know). On the upside, it preserves all the columns in case they’re useful for something (doubtful). I could order by start_time and select into another table, add an index, and do the same thing, but it’s easier the quick and dirty way. There’s probably a trivial way to do this with joins, but I couldn’t think of one, and it leaves 9 records with a record_id of 0..

Here’s the code which it turns out I didn’t need, but it might be useful to somebody:

#Rips data from Palm Desktop.  Uploads it to Google Calendar
#Written with Python 2.5 (though imports should work anyway)
#
#Currently, the Access MDB Palm Datebook uses has been exported to a 
#PostgreSQL server via ODBC, so I'll be connecting to that
#
#There's code in here for getting through Access also, but I haven't tested it.
#Use at your own risk (kinda like Access).
#
#This is mostly due to the Postgres ODBC driver, and the fact that I didn't
#want to bother with quoting all the queries for Postgres to allow spaces
 
try:
    from xml.etree import ElementTree #Python 2.5, probably 2.6/3.0 also
except ImportError:
    from elementtree import ElementTree #Python <2.4
import gdata.calendar.service
import gdata.service
import atom.service
import gdata.calendar
import atom
import getopt
import sys
import string
import time
import psycopg2 #Talk to Postgres
 
class Struct:
    def __init__(self, *args, **kwargs):
        for k,v in kwargs.items():
            setattr(self, k, v)
 
class GCalMigrate:
    def __init__(self):
        self.conn = None
        self.cur = None
        self.calendar = None
        self.records = []
 
    def connect(self):
       try:
           self.conn = psycopg2.connect("dbname='whatever' user='yournamehere' host='server'")
       except:
           print "Can't connect to the database!\n"
           sys.exit()
       self.cur = conn.cursor()
       query()
 
    def accessconnect(self,mdbpath):
        import odbc
        self.conn = odbc.odbc("driver=Microsoft Access Driver (*.mdb);DBQ=%s") % mdbpath
        self.cur = conn.cursor()
        queryaccess()
 
    def queryaccess(self):
        rows = []
        self.cur.execute("SELECT Main.[Start Time], Main.[End Time], Main.[Summary] FROM Main")
        rows = cur.fetchall()
        conn.close()
        parserows(rows)
 
    def query(self):
       rows = []
       try:
           self.cur.execute("SELECT start_time, end_time, summary FROM main")
           rows = cur.fetchall()
       except:
           print "Couldn't query the database.\n"
       conn.close()
       parserows(rows)
 
    def parserows(self, rows):
        for row in rows:
            starttime = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(row[0]))
            endtime = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(row[1]))
            title = row[2]
            record = Struct(start_time=starttime, end_time=enddtime, title=title)
            self.records.append(record)
        login()
 
    def login(self, username, password):
        self.calendar = gdata.calendar.service.CalendarService()
        self.calendar.email = username
        self.calendar.password = password
        self.calendar.source = "Palm_Desktop_Migrator"
        self.calendar.ProgrammaticLogin()
        batchsubmit()
 
 
    def batchsubmit(self):
        feed = gdata.calendar.CalendarEventFeed()
 
        for record in records:
            insertme = gdata.calendar.CalendarEventEntry()
            insertme.title = atom.Title(record.title)
            insertme.content = atom.Content("")
            insertme.when.append(gdata.calendar.When(start_time=record.start_time, end_time=record.end_time))
            insertme.batch_id = gdata.BatchId(text='Palm_Migration')
 
            feed.Add_Insert(entry=insertme)
        response = self.calendar.ExecuteBatch(feed, gdata.calendar.service.DEFAULT_BATCH_URL)
        return response
 
if __name__ == "__main__":
    runner = GCalMigrate()
    responses = runner.connect()
    for entry in responses.entry:
        print "Batch ID: %s" % entry.batch_id.text
        print "Status: %s" % entry.batch_status.code
        print "Reason: %s" % entry.batch_status.reason

4 Comments

  • By Missy, January 15, 2009 @ 12:59 am

    Oh! Just noticed that you blogged. I was actually thinking about overhauling Exquisite Turmoil this week.. I’ve got a bunch of themes picked out, I just need to bug Dan to update it for me again (since all of the new themes are for the new WP =/). When she asked what the Dead Sea Scrolls were, you totally should have made her watch Eva!

  • By Dan, January 15, 2009 @ 1:04 am

    Suggestion (note — I don’t know Google Calendar’s format) –

    Foreach loop through the records. Insert individually. When inserting, use SELECT SCOPE_IDENTITY() to get a unique key.

    Also, SELECT DISTINCT is an abomination before God. You do know how that works behind the scenes, right?

  • By Ryan, January 15, 2009 @ 8:41 am

    Isn’t Exquisite Turmoil on Dreamhost? Their panel allows for monkeys to upgrade WP (assuming it was installed through their one-click installer thingy). I don’t really think she’d appreciate Eva :/

    The gdata APIs (including gcal) pretty much cram things into an ATOM feed, then authenticate to Google/publish via curl or HTTP sockets. Since the API (and Python libraries) are available from Google, it makes it fairly simple.

    Yes, SELECT DISTINCT sucks. Well, SELECT DISTINCT sucks on joins and subqueries (I’m not convinced it sucks for single tables), since it tosses the query optimizer and just grabs the entirety of the table, checks it for uniqueness, then sorts it. GROUP BY works just as well in -most- instances.

    The problem with the data from Access wasn’t a lack of unique keys, it was duplicate records that already had unique keys. Nothing else really worked as a constraint (since Heather sometimes had things starting or ending at the same time, or with exactly the same summary), so the DELETE FROM … WHERE EXISTS clause or a SELECT DISTINCT a INTO b seemed to be the only way to do it.

    As an aside, SCOPE_IDENTITY() is T-SQL specific, and doesn’t that return the identity of the last row inserted? CURRVAL(‘id’) is pretty much the PGSQL equivalent (or NEXTVAL(‘id’). Still, though, there were only 20 duplicate keys out of 5100ish records, but nearly half those had duplicate data:

      record_id | start_time |  end_time  |           summary
    -----------+------------+------------+-----------------------------
       7127485 | 1256774400 | 1256774400 | lawerance wedding
       7125815 | 1256774400 | 1256774400 | lawerance wedding

    I don’t know that a unique key would have redeemed it =/ It would have been possible to generate a composite key over a couple of the fields, but composite key on a TEXT field is awful.

  • By Missy, January 15, 2009 @ 11:23 pm

    It is on Dreamhost, but if you’ll recall — the last time Dan updated my blog it crapped out for a while and took extensive work to make it ah.. work.. again.

Other Links to this Post

RSS feed for comments on this post. TrackBack URI

Leave a comment

You must be logged in to post a comment.