Mathieu Fenniak's Weblog

2004/03/12

Distributing a Python Embedding Program

Filed under: programming,python — admin @ 8:39 pm

I got a bit bored at my job yesterday. This happens often, and usually a new random piece of software emerges as a result. This time I ended up hacking on a piece of software that had previously emerged from boredom, which I called the "Difference Machine". I added support for plot datasets generated on the fly through Python code, and in the end learned how to distribute Python within my application.

The purpose of the Difference Machine is very simple: I dislike MS Excel, and wish to never have to use it again. As someone who works in a job involving a lot of engineering, Excel is pretty much standard fare for comparing datasets and stuff of that nature, though. The Difference Machine allows plots generated by the software we develop to be imported, and compared against each other. Additionally, text import is possible from a CSV file or from the clipboard.

[Figure: The Difference Machine comparing two gas supercompressibility (z-factor) correlations.]

Adding in the Python embedding was easy. I’ve done similar things before, and the Python C-API is pretty straightforward (although necessarily verbose). When it came time to put this application up on the network for co-workers to access, the real fun began.

[Figure: The Difference Machine editing Python code with a crude edit control.]

My first attempt was simple. Copy the application onto the network, and include python23.dll in the application’s directory. Behold! It actually worked. But, there were a few minor glitches…

  • No access to the standard library. import sys and import math worked, but not import csv or anything else.
  • No standard library meant no traceback library, which I had depended upon to format tracebacks when things went poorly. The software did not react to the missing library well.
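
One way to make a host application tolerate a missing standard library is to treat traceback as optional. This is only a sketch, written in modern Python 3 rather than the Python 2.3 used in this post, and format_error is a hypothetical helper name, not something taken from the Difference Machine:

```python
# traceback lives in the standard library, so it may be missing when only
# the interpreter DLL has been shipped; treat it as optional.
try:
    import traceback
except ImportError:
    traceback = None


def format_error(exc):
    """Return a printable description of exc, with a traceback if possible."""
    if traceback is not None:
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    # Fallback that needs nothing beyond built-ins.
    return "%s: %s\n" % (type(exc).__name__, exc)
```

With the fallback in place, a script error can still be reported to the user even when the full formatting machinery is unavailable.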

I needed the standard library. I tried copying it into a Lib directory inside my application’s directory, and this worked okay… except that it was a huge number of files, and it was still missing the dynamically linked extension modules, such as _sre.pyd.

I consulted PEP 273, Import Modules from Zip Archives, to try to create a zip file of the entire standard library that would be much easier to manage than my Lib directory. With the zip file named python23.zip and placed in the same directory as my python23.dll, Python should have been able to access the library. But it still failed: zlib is necessary to open the zip file, and zlib is itself one of the dynamically linked libraries that it couldn’t access.

By putting all the .pyd files into my application’s directory, I finally got it to work. It loads zlib to read python23.zip, where it retrieves the rest of the standard library. Yippee! In the end, the directory contains the following files:

DifferenceMachine.exe
... bunch of my dlls ...
python23.dll
python23.zip
zlib.pyd
_csv.pyd
_sre.pyd
... bunch more pyd files ...

And all is good and happy. I’m one small step closer to being Excel-less.
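
For anyone wanting to try the zip approach, here is a rough sketch of building such an archive and importing from it. It is written in modern Python 3, not the Python 2.3 of this post, and the names build_library_zip, demo, library.zip and hello_from_zip are made up for illustration. It packs a throwaway one-module "library" rather than the real standard library:

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_library_zip(lib_dir, zip_path):
    """Pack every .py file under lib_dir into zip_path, keeping relative paths.

    Compiled extension modules (.pyd/.so) are skipped on purpose: they
    cannot be imported from inside a zip and must sit next to the program.
    """
    count = 0
    # ZIP_DEFLATED needs zlib, both to write and later to import from the zip.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(lib_dir):
            for name in sorted(files):
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    archive.write(full, os.path.relpath(full, lib_dir))
                    count += 1
    return count


def demo():
    """Build a one-module zip in a temp directory and import from it."""
    work = tempfile.mkdtemp()
    lib_dir = os.path.join(work, "Lib")
    os.mkdir(lib_dir)
    with open(os.path.join(lib_dir, "hello_from_zip.py"), "w") as f:
        f.write("GREETING = 'imported from a zip archive'\n")
    zip_path = os.path.join(work, "library.zip")
    packed = build_library_zip(lib_dir, zip_path)
    # A zip file on sys.path is searched just like a directory.
    sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    module = importlib.import_module("hello_from_zip")
    return packed, module.GREETING, module.__file__
```

Calling demo() returns the number of files packed, the value read from the imported module, and the module’s path, which points inside the archive.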

2004/02/05

Open Scripting Architecture and iCal

Filed under: programming,python — admin @ 2:22 pm

I was browsing the Daily Python-URL this morning, when I came across a link to a Python module for reading OSX’s iCal files. The module’s web page states:

Apple doesn't document the API for interfacing with iCal, but it does save
it's files as industry standard iCalendar files.  I went looking recently
for a python module to interface with iCal, and couldn't find one.

It’s great to have a good Python module to read standard iCalendar files, but the statement that Apple doesn’t have a documented API for interfacing with iCal is bogus. iCal supports the Open Scripting Architecture (AppleScript’s APIs), the same as every other Apple application does. The Open Scripting Architecture can be used from Python much more easily than one can parse an iCalendar file, and the information is exposed in a very easy-to-access format. I wrote an earlier article on how to do similar actions with iTunes, so in my continuing effort to educate the world on how to use OSA from Python, here’s some more!

The first step in using iCal from a Python script through OSA is to generate an iCal module which wraps up the low-level OSA details. The gensuitemodule.py script is designed to do just that. It is located in the plat-mac directory inside your Python installation’s lib directory. On a stock installation of Panther, that puts it at /System/Library/Frameworks/Python.framework/ pant,pant Versions/Current/lib/python2.3/plat-mac/gensuitemodule.py. In order to interface with the system window manager and access OSA, it must be run with the pythonw executable, rather than python. The --output command line parameter can be used to specify a name for the output module. In the end, generating the module is extremely simple:

pythonw {...}/gensuitemodule.py --output iCal /Applications/iCal.app

Now that you have a beautiful iCal module, what do you do with it? The iCal.iCal class is the basic application, which is used for all OSA communications with the iCal application itself. So, to start things off, an instance of that class should be created so we can do some communicating:

import iCal
app = iCal.iCal()

How do we determine what we can do to this application through OSA? The easiest way is to use OSX’s Script Editor, which allows us to open up an application’s OSA dictionary and view the contents. The Script Editor is located in the /Applications/AppleScript folder, and has an option in the File menu to Open Dictionary....

[Figure: OS X’s Script Editor]

Once you have iCal’s dictionary open, inside the iCal suite you can see all the classes and commands that iCal exposes. For example, the application class has elements like calendar, window, and document. Since we’re interested in getting data out, we know that calendar is where we want to start. An element is a collection of objects, implying that each application has a collection of calendars. Based upon what we know of how iCal works, this sounds very reasonable. Here’s how we can iterate over all the calendars that our iCal has:

numCalendars = app.count(app, each=iCal.calendar)
for i in range(1, numCalendars + 1):
    cal = iCal.calendar(i)

The first line sends an OSA command to iCal asking it to count the number of iCal.calendar objects it currently has. Since Python normally uses zero-based counting, and OSA uses one-based counting, we need to iterate from 1 through numCalendars, which is what range(1, numCalendars + 1) produces. We create an iCal.calendar object from each of those indices. Creating the object doesn’t do any communication with iCal; it just creates an object which can dispatch calls to the appropriate instance of the calendar class.
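
The count-then-index pattern comes up for every OSA collection, so it can be wrapped in a small generator. This is only a sketch in modern Python 3; each_element is a hypothetical helper, not part of the generated iCal module, and it takes the count and the reference-building callable as plain arguments so that it runs without iCal:

```python
def each_element(count, make_reference):
    """Yield (index, reference) pairs for a one-based OSA-style collection.

    count          -- number of elements, e.g. the result of app.count(...)
    make_reference -- callable building a reference from a one-based index,
                      e.g. iCal.calendar
    """
    for index in range(1, count + 1):
        yield index, make_reference(index)


# Stand-in usage with plain strings instead of real OSA references:
for i, ref in each_element(2, lambda n: "calendar(%d)" % n):
    print(i, ref)
```

With the real module, the call would presumably look like each_element(app.count(app, each=iCal.calendar), iCal.calendar).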

What if we wanted to get some data out of the object, now? The Script Editor shows that the calendar objects support a number of properties like tint, title, and description. Let’s print out the title of every calendar:

print "Calendar %s - %s" % (i, app.get(cal.title))

The magic here is in the app.get(cal.title). This tells the application to retrieve the title property of the cal instance, which is already bound to the proper calendar object in iCal. We simply print it out, which outputs something like:

Calendar 1 - Home
Calendar 2 - Work

Let’s put these chunks of code together, and include the ability to print out every event for each calendar:

import iCal

def printAllEvents(app, calendar):
    numEvents = app.count(calendar, each=iCal.event)
    for i in range(1, numEvents + 1):
        event = calendar.event(i)
        print "\tEvent %s - %s" % (i, app.get(event.summary))

def printAllCalendars(app, eventsToo = True):
    numCalendars = app.count(app, each=iCal.calendar)
    for i in range(1, numCalendars + 1):
        calendar = iCal.calendar(i)
        print "Calendar %s - %s" % (i, app.get(calendar.title))
        if eventsToo:
            printAllEvents(app, calendar)

app = iCal.iCal()
printAllCalendars(app)

Yay! We have access to every piece of information iCal stores, and we didn’t need to parse an iCalendar file. Some parts of iCal even begin to integrate with the OSX Address Book, and we can access that information as well.

One last bit of code could be useful, though. What if we wanted to change some of the data in iCal? It’s fairly simple, but not too obvious or well documented anywhere. Here’s a short snippet that will give all your calendars dumb, pointless names:

import iCal
app = iCal.iCal()

numCalendars = app.count(app, each=iCal.calendar)
for i in range(1, numCalendars + 1):
    calendar = iCal.calendar(i)
    app.set(calendar.title, to = "Calendar %s" % i)

2004/01/26

Python Trackback Library

Filed under: programming,python — admin @ 4:24 pm

In addition to the pingback library that I built earlier today, I decided to add Trackback support to my weblog software as well. To that end, I’ve created a new trackback.py library which handles the grunt work of being a Trackback client. It’s not exactly very difficult, anyways. The method of greatest concern for a client:

def ping(trackbackURI, sourceURI, title = None, excerpt = None, siteName = None):
    """Implements a trackback ping.  This method throws exceptions based upon
    the error returned from the trackback server, unless it is the successful
    error code '0'.

    The excerpt parameter should contain plaintext for maximum readability on
    trackback servers.  Providing HTML is not recommended.  This library
    provides a function ``detagHTML`` which will shred HTML text for you."""
    ...
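
To give a feel for what a ping involves: per the Trackback specification, the client sends a form-encoded HTTP POST (fields url, title, excerpt, blog_name) and gets back a small XML document whose error element is 0 on success. The sketch below is not the code from trackback.py. It is a modern Python 3 illustration with made-up names (build_ping_body, check_ping_response, TrackbackError), and it leaves out the actual HTTP request:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


class TrackbackError(Exception):
    """Raised when the server answers with a non-zero error code."""


def build_ping_body(source_uri, title=None, excerpt=None, site_name=None):
    """Return the form-encoded POST body for a Trackback ping."""
    fields = [("url", source_uri)]
    optional = (("title", title), ("excerpt", excerpt), ("blog_name", site_name))
    for key, value in optional:
        if value is not None:
            fields.append((key, value))
    return urlencode(fields)


def check_ping_response(xml_text):
    """Parse the server's XML reply; raise TrackbackError unless error is 0."""
    root = ET.fromstring(xml_text)
    code = (root.findtext("error") or "").strip()
    if code != "0":
        message = (root.findtext("message") or "unknown error").strip()
        raise TrackbackError(message)
```

The body returned by build_ping_body would be POSTed to the Trackback URI with a Content-Type of application/x-www-form-urlencoded, and the reply handed to check_ping_response.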

This library does not offer support for being a Trackback server, nor autodiscovery in any way. I do not believe that the autodiscovery method suggested by the Trackback specification is appropriate. Embedding RDF data in an HTML comment? This is the kind of thing a Perl author would suggest, since they believe they could regex their way out of the hole they dug for themselves. I don’t like it.

An alternative implementation of trackback in Python is available called tblib. I chose to write my own library because:

  1. I was bored at work,
  2. I didn’t want to use a GPL-licensed library in my modified-BSD licensed weblog,
  3. I didn’t like the way tblib used regular expressions to grab data from ping responses rather than using an XML parser,
  4. I didn’t want to implement autodiscovery in a flappy regex perl way. (No disrespect meant towards the author of tblib, of course. I realize that there is no simpler or stronger way to do Trackback autodiscovery.)

Python Pingback Library

Filed under: programming,python — admin @ 1:49 pm

I’ve created a simple and easy-to-use Python library to handle the client implementation of the Pingback 1.0 protocol. The pingback.py script handles all aspects of a client, including automatically parsing HTML and reStructuredText for hyperlinks which should be checked.

Pingback can now be implemented by other Python projects by using one of the following functions:

def autoPingback(sourceURI, reST = None, HTML = None):
    """Scans the input text, which can be in either reStructuredText or HTML
    format, pings every linked website for auto-discovery-capable pingback
    servers, and does an appropriate pingback."""
    ...

def pingback(sourceURI, targetURI):
    """Attempts to notify the server of targetURI that sourceURI refers to
    it."""
    ...
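
For the curious, Pingback autodiscovery works by looking for an X-Pingback HTTP header on the target page, falling back to a link element with rel="pingback" in its HTML. The following is a sketch of that step only, in modern Python 3, and is not the code from pingback.py; discover_pingback_server and the helper class are hypothetical names:

```python
from html.parser import HTMLParser


class _PingbackLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="pingback"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "pingback" and self.href is None:
            self.href = attributes.get("href")


def discover_pingback_server(headers, html_text):
    """Return the pingback server URI for a fetched page, or None.

    headers   -- mapping of HTTP response header names to values
    html_text -- the page body as text
    """
    # The HTTP header takes precedence over the link element.
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value
    finder = _PingbackLinkFinder()
    finder.feed(html_text)
    return finder.href
```

Once a server URI is known, the notification itself is a single XML-RPC call to the method pingback.ping with the source and target URIs, which in Python 3 could be made through xmlrpc.client.ServerProxy.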

The documentation at the top of the module describes how one might implement a Pingback server by using the SimpleXMLRPCServer module, as well.

2003/09/22

Scripting AppleScriptable Applications with Python

Filed under: programming,python — admin @ 1:51 pm

Python 2.3 comes with a number of utilities which make it possible to use the same interface for accessing software as AppleScript uses. It isn’t much more difficult than AppleScript, but using it effectively is very hard due to the lack of documentation. I present to you a mini-tutorial on how to script iTunes with Python through the Open Scripting Architecture (OSA), using gensuitemodule.py.

First of all, you must run the program gensuitemodule.py on your target application (in this case, iTunes) to generate a set of class wrappers for that program. gensuitemodule.py is installed in the plat-mac library directory of your Python installation, and can be used as a module or as a stand-alone application. In order to generate an iTunes package containing the wrapper classes, we run the program like this:

pythonw {...}/lib/python2.3/plat-mac/gensuitemodule.py \
    --output iTunes --resource --creator hook \
    /Applications/iTunes.app/Contents/Resources/iTunes.rsrc

That should automatically create an iTunes directory, wherever you run the command, which contains the files Internet_suite.py, Standard_Suite.py, __init__.py, and iTunes_Suite.py. If you’re familiar with AppleScript, the three suite files should look familiar: they mirror how classes and functions are separated in an application’s script dictionary.

From then on, scripting iTunes is fairly easy. However, accessing properties and iterating through containers takes a bit of extra work. I’ll demonstrate a few different functions. Please note that these scripts must be run with the pythonw executable because they need access to the window manager. Running them with python will cause an error.

In order to do most anything, you should create an instance of the iTunes class inside the iTunes module. This instance is responsible for sending messages to iTunes, and receiving the responses. Essentially, all communication with the application will go through this instance. Here are a few simple operations:

import iTunes
app = iTunes.iTunes()
app.start()          # fires up iTunes
app.play()           # whack the play button
app.stop()           # stop playing

# Retrieve the current track property.
trk = app.get(app.current_track)
# repr(trk) == "file_track(5721,...)"

# Retrieve the track's 'artist' property.
print app.get(trk.artist)

Iterating over a collection requires a bit more work. Here’s an example of iterating through every song in iTunes’ library:

import iTunes
app = iTunes.iTunes()
library = iTunes.library_playlist(1)
# (Note, this iteration sucks.  See 'fixed_indexing' later.)
for i in range(1, app.count(library, each = iTunes.track) + 1):
    trk = library.track(i)
    print app.get(trk.artist)

Notice that indices start at 1, not 0. When we create a track object to represent the track from the playlist, we could create an iTunes.track, iTunes.file_track, iTunes.url_track, or iTunes.shared_track. Each of the more specialized track objects has the same properties as iTunes.track, but also has more specialized properties. They’re subclasses. The easiest way to see the relationship between different classes, as well as the properties and methods they have, is to open up the Script Editor application (/Applications/AppleScript/Script Editor.app), choose the ‘Open Dictionary…’ menu option, and select the iTunes application.

I noticed that the iteration shown above would sometimes return the same tracks for different indices. A bit of research showed that iTunes has a ‘fixed indexing’ property which is usually set to false. I suggest that before iterating through a playlist, this property be set to true:

old_fixed_indexing = app.get(app.fixed_indexing)
app.set(app.fixed_indexing, to = 1)
try:
    code()
finally:
    app.set(app.fixed_indexing, to = old_fixed_indexing)
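
In a newer Python, the same save, set, and restore dance could be packaged as a context manager. This is a sketch in Python 3 (the with statement did not exist in Python 2.3). temporary_setting is a hypothetical helper, and FakeApp is a stand-in that mimics only the get and set calls used above, so the example runs without iTunes:

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(app, prop, value):
    """Set prop to value for the duration of a with block, then restore it."""
    old_value = app.get(prop)
    app.set(prop, to=value)
    try:
        yield old_value
    finally:
        # Runs even if the block raises, like the try/finally version.
        app.set(prop, to=old_value)


class FakeApp:
    """Stand-in for the generated application class (illustration only)."""

    def __init__(self):
        self.values = {"fixed_indexing": 0}

    def get(self, prop):
        return self.values[prop]

    def set(self, prop, to):
        self.values[prop] = to


app = FakeApp()
with temporary_setting(app, "fixed_indexing", 1):
    print(app.get("fixed_indexing"))  # inside the block the property is 1
print(app.get("fixed_indexing"))      # afterwards it is back to 0
```

With the real module, the property argument would be app.fixed_indexing rather than a string key.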

I did all this research while trying to write an application to set the track numbers of every ogg file in my iTunes library properly. The application is short, but contains everything I know about scripting iTunes with Python. It requires that pyogg and pyvorbis be installed. One of the more difficult parts was taking the iTunes.file_track’s location property, which is an instance of Carbon.File.Alias, and getting the path that it points to:

#!/usr/bin/env pythonw2.3

import iTunes
import ogg.vorbis

app = iTunes.iTunes()
playlist = iTunes.library_playlist(1)

# Ensure that the track order will not reset during this script.
old_fixed_indexing = app.get(app.fixed_indexing)
app.set(app.fixed_indexing, to = 1)

try:
    for i in range(1, app.count(playlist, each=iTunes.track) + 1):
        trk = playlist.file_track(i)
        location = app.get(trk.location)
        fsref, wasChanged = location.FSResolveAlias(None)
        path = fsref.FSRefMakePath()
        if path.endswith(".ogg"):
            vc = ogg.vorbis.VorbisFile(path).comment()
            for key, value in vc.items():
                if key.upper() == 'TRACKNUMBER':
                    app.set(trk.track_number, to=int(value))
                    break
finally:
    app.set(app.fixed_indexing, to=old_fixed_indexing)

And so ends my first experience with using Python and OSA. It was not as easy as it should have been, but hey… it was easier than writing an Ogg Vorbis comment decoder in AppleScript.

2003/07/16

Formatting a Simple Function in Python

Filed under: programming,python — admin @ 6:11 pm

I use Python often for writing web based applications, such as this weblog you’re reading right now. I love using Python, but I occasionally have problems with making the code aesthetically pleasing. This article presents a short case study into formatting a simple function from my website’s picture gallery.

Original code (artificially wrapped for web viewing):

def getThumbnailLink(self):
    return '<a href="%s"><img class="thumbnail" src="%s" width="%s" height="%s"
/></a><br /><span class="thumbnail-title">%s</span>' % (self.getHandPageURL(),
self.getWebPath(self.getThumbnailPath()), self.thumbnailDimensions[0],
self.thumbnailDimensions[1], self.title)

This code presents a simple problem. It has one very long line of code. I like to keep lines of code shorter than 79 columns when possible. Additionally, this code is hard to modify. I approached this because I wanted to add an alt attribute to the img tag, but found that where I put the attribute would actually affect the order of elements in the tuple. That’s really unfortunate.

The first attempt to clean up this code was simply by putting a linebreak in between the string literal and the tuple (also artificially wrapped for web viewing):

def getThumbnailLink(self):
    return '<a href="%s"><img class="thumbnail" src="%s" width="%s" height="%s"
/></a><br /><span class="thumbnail-title">%s</span>' % \
        (self.getHandPageURL(), self.getWebPath(self.getThumbnailPath()),
self.thumbnailDimensions[0], self.thumbnailDimensions[1], self.title)

This helped a bit, but both lines were still greater than 79 characters, and it failed entirely to address the problem of the tuple ordering. I whacked at it with the enter and tab keys for a while, but didn’t get anywhere. Fearing I was low on visionary power, I consulted with Lynx. She threw out a chunk of code with shorter lines:

def getThumbnailLink(self):
    ret1 = '<a href="%s"><img class="thumbnail" src="%s" width="%s"'
    ret2 = ' height="%s" /></a><br /><span class="thumbnail-title">%s</span>'
    t = self.getWebPath(self.getThumbnailPath())
    ret1 = ret1 % (self.getHandPageURL(), t, self.thumbnailDimensions[0])
    ret2 = ret2 % (self.thumbnailDimensions[1], self.title)
    return "%s%s" % (ret1, ret2)

This code was a bit of an improvement. It’s a bit more flexible because there aren’t any tuples of arguments that are quite as long as the original, and the code lines are shorter. The function has grown a bunch of extra operations (but who cares?), and a number of extra lines (oh, my poor hard drive…), but it’s definitely an improvement.

I wondered whether there wasn’t a better way to write such a small function. I thought about common templating systems and how they work, such as [A-Z]SP, but I didn’t want to do anything that complex. I like the idea of separating code and HTML, and this current function and server-page technologies do the opposite of that. I pounded at it for a bit, and came up with this function:

def getThumbnailLink(self):
    htmlText = """
        <a href="%(handURL)s">
            <img class="thumbnail" src="%(thumbnailImageURL)s" alt="[%(title)s]"
            width="%(thumbnailImageWidth)s" height="%(thumbnailImageHeight)s" />
        </a>
        <br />
        <span class="thumbnail-title">%(title)s</span>"""
    return htmlText % \
        {'handURL': self.getHandPageURL(),
        'thumbnailImageURL': self.getWebPath(self.getThumbnailPath()),
        'thumbnailImageWidth': self.thumbnailDimensions[0],
        'thumbnailImageHeight': self.thumbnailDimensions[1],
        'title': self.title}

This function shortens the lines of code, addresses the issue of the tuple ordering, and it also separates the HTML from the Python code. It would be easy to get the htmlText variable from a module full of HTML strings which define the entire web page. It wouldn’t be as flexible as a full-out template system, but I like it more.
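
As a rough illustration of that last idea, the template can live at module level (or in a separate module of HTML strings) while the function only supplies the values. This is a standalone sketch in modern Python 3. The names THUMBNAIL_LINK and render_thumbnail_link are hypothetical, and the arguments are passed in directly instead of coming from the gallery object:

```python
# In a real site this constant might live in its own module of HTML strings.
THUMBNAIL_LINK = """\
<a href="%(handURL)s">
    <img class="thumbnail" src="%(thumbnailImageURL)s" alt="[%(title)s]"
    width="%(thumbnailImageWidth)s" height="%(thumbnailImageHeight)s" />
</a>
<br />
<span class="thumbnail-title">%(title)s</span>"""


def render_thumbnail_link(hand_url, image_url, dimensions, title):
    """Fill the template; named placeholders make argument order irrelevant."""
    width, height = dimensions
    return THUMBNAIL_LINK % {
        "handURL": hand_url,
        "thumbnailImageURL": image_url,
        "thumbnailImageWidth": width,
        "thumbnailImageHeight": height,
        "title": title,
    }


print(render_thumbnail_link("/gallery/42", "/thumbs/42.jpg", (120, 90), "Sunset"))
```

Note that title appears twice in the template but only once in the dictionary, which is exactly the kind of change that was painful with a positional tuple.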

2003/05/06

reStructuredText and AT&T 386 UNIX

Filed under: computers — admin @ 8:28 pm

Since I’ve been trying to get active in Python development, I’ve noticed the proliferation of reStructuredText into the world of Python documentation. I’ve taken an interest in reStructuredText. It seems to be an easy way to write documentation and other text that can be converted into HTML painlessly. It is much easier to write than HTML, and yet it provides just enough flexibility for most activities.

I’ve added reStructuredText as an option within Growlmurrdurr as a way to write weblog entries. This entry is the first to be written in reStructuredText, other than my test entries. Thanks to the wonder of XML-based file storage, my reStructuredText entries can co-exist with HTML entries without any problems. Yay!

I’ve converted MOOzilla’s documentation out of the much more dreadful DocBook format and into reStructuredText. Section 3 of the MOOzilla documentation, Building with MOOzilla, is nearly complete due to my renewed documentation efforts. It looks like another MOOzilla release with documentation might happen sometime this century! Qa’pla!

Alright, off the ReST discussion for a minute. Cecil and I had a trying lunch hour. Our local network administrator referred an associate of his to us for help with a ‘Linux, version unspecified’ box which was supposedly failing to boot at a local law firm. Cecil and I went out as Linux consultants to take a look at the problem. Well, it turns out Linux wasn’t involved at all.

The law firm had an ancient AT&T 386 UNIX box with a terrible green monitor. This box occasionally would poll a device attached to it and get accounting information, and store it locally. A nearby DOS machine would connect to the UNIX machine through a serial connection, using kermit, and download files that had accumulated daily, where they could be transferred to a disk.

It was an ancient, awful setup. And it didn’t work. We set about diagnosing the problem, trying to figure out how the ugly AT&T UNIX worked relative to our modern Linux experience, and so on. Within a half hour, we were pretty sure of one thing: We weren’t getting anything done. It was around when I was about to lose all hope that Cecil discovered the problem. "Shouldn’t that machine be connected to something? I see power, and keyboard, and monitor…" Sure enough, the serial connector into the DOS machine was disconnected, lying on the floor nearby underneath a garbage can. Qa’pla!

The moral of the story is: check the hardware too, not just the software.

2003/04/29

Playing with Weblogs

Filed under: programming — admin @ 6:10 pm

I’ve completed a massive overhaul of my growlmurrdurr weblog system. It is now exactly a thousand times more nicely written, expandable, and just generally clean and happy. No longer do I have to cringe every time I want to make a change to it.

Now that it is more generic and clean and happy, after a bit of testing I’ll be releasing it upon the world on my software page. I’ll even write some documentation and stuff, explaining how to set it up. It’s nice and robust now, such that modifications to the source files shouldn’t be required just to get it up and running. Give or take a bit.

I’ve been pondering the best way to create some kind of configuration system for it. Some kind of more robust administration than it currently has, so that it could be expanded to do more in the future. Multiple author support, for example, will require some way for the administrator to create multiple author ‘accounts’ and so on. Anyways, I’m happy with the new setup. It’s up-to-date in the Subversion repository, if you’re interested in taking a look-see.

2003/04/16

Google Search Term Highlighting

Filed under: programming,python — admin @ 12:33 am

Earlier today I was searching around Google for some technical information, and was given Experts Exchange as a link. Experts Exchange has this nifty little feature, implemented as an Apache module, whereby the HTTP referer is examined for search engine footprints. When it finds that you were linked by a search engine, this Apache module, named mod_suru, will highlight those search terms inside the resulting web page.

“Cool,” I thought. So I went to look at mod_suru and discovered that it is only available at a relatively high cost. Certainly not a cost that I’m willing to pay for my own personal website.

So I took another path. I re-implemented the same idea as mod_suru, primarily in JavaScript. My stomphighlight.js file defines a function, highlightWord, which does the magic of actually highlighting the text. The magic of determining which words to highlight is highly dependent upon the setup that the web site has. In the case of stompstompstomp.com, every footer on every page is served by a Python function. I added the following code to my footer:

import os, re, cgi

# Highlight google-ed for words!
if os.environ.has_key("HTTP_REFERER"):
    words = None

    m = re.search("google.[a-z.]+/search\\?(.*)", os.environ['HTTP_REFERER'])
    if m:
        googleQuery = cgi.parse_qs(m.group(1))
        words = ' '.join(googleQuery['q']).split(' ')
        words = filter(lambda x: x.find(":") == -1, words) # remove words like 'site:blahblah.com'

    if words != None:
        print "<!-- Begin magical search term highlighting. -->"
        print """<script language="JavaScript" type="text/javascript" src="/stomphighlight.js"></script>"""
        colors = ("#00eeee", "#eeee00", "#ee00ee", "#ee0000", "#00ee00", "#0000ee")
        print """<script language="JavaScript" type="text/javascript"><!--"""
        for i in range(len(words)):
            try:
                print "highlightWord(%r, %r, document.documentElement);" % (words[i], colors[i])
            except IndexError:
                pass
        print """//--></script>"""
        print "<!-- End magical search term highlighting. -->"

Hopefully, it should be pretty clear how you could take this and use the same idea on your own web site, or modify my code to support more than just Google. Yay!
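
For anyone adapting this today, the referer parsing step could look roughly like the sketch below. It is written in modern Python 3 with urllib.parse instead of the regular expression and cgi.parse_qs used above. The function name search_terms_from_referer is made up, and the host check is a deliberately loose assumption about what counts as a Google search URL:

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referer(referer):
    """Return the plain search words from a Google search referer URL.

    Returns an empty list for anything that does not look like a
    Google search results page.
    """
    parts = urlsplit(referer)
    host_labels = (parts.hostname or "").split(".")
    if "google" not in host_labels or parts.path != "/search":
        return []
    query = parse_qs(parts.query)
    words = " ".join(query.get("q", [])).split()
    # Drop operators such as 'site:example.com'.
    return [word for word in words if ":" not in word]


print(search_terms_from_referer(
    "http://www.google.com/search?q=python+embedding+site:example.com&hl=en"
))
```

Using query.get("q", []) means a referer without a q parameter simply yields no words instead of raising an error.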

2003/02/24

Backup Script in Bash/Python

Filed under: programming,python — admin @ 8:34 pm

It’s been a while since I’ve written anything on ye olde technical weblog, but I thought that a code snippet I wrote a couple days ago might be generally useful. Have you ever had to do simple, incremental backups? I had to write a simple utility to back up a MySQL database server to a network drive. I wanted to do a backup every night, but also keep the daily backups for the past 7 days. Additionally, I thought it’d be nice to keep a copy for every month so I can go way back in time, if necessary.

My mind was in a Python frame today, and so I wrote the shell script partially in Python… in fact, shell variables are substituted into Python code, which is quite interesting:

#!/bin/sh

datestr=`date +%Y-%m-%d`
filename='mysql_database'
netwerk='/netwerk'
directory="${netwerk}/BlahBlah Backup"

mount "${netwerk}"

/usr/bin/mysqldump -u backup_user --databases mysql db1 db2 \
        > "${directory}/${filename}-${datestr}.sql"

python <<END_PYTHON
import os
import re
logfiles = [x for x in os.listdir("${directory}") \
        if re.search("${filename}-[0-9]{4}-[0-9]{2}-[0-9]{2}", x)]
# don't delete logfiles on the first of the month
deletablelogfiles = [x for x in logfiles \
        if not re.search("${filename}-[0-9]{4}-[0-9]{2}-01", x)]
deletablelogfiles.sort()
deletablelogfiles.reverse()
while len(deletablelogfiles) > 7:
    file = deletablelogfiles.pop()
    print "Deleting old log file %s...\n" % (file,)
    os.remove("${directory}/%s" % (file,))
END_PYTHON

umount "${netwerk}"

There’s always something intrinsically cool about code writing code. This script seamlessly integrates shell scripting and Python. Shell scripting is useful for controlling and accessing systems. Python allows a clearer implementation of relatively complex operations.
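
The retention rule buried in the embedded Python can also be pulled out into a pure function, which makes it easy to check without touching any files. This is a sketch in modern Python 3; backups_to_delete is a hypothetical name, and it works on a plain list of file names:

```python
import re


def backups_to_delete(names, prefix, keep=7):
    """Return the backup file names that the retention rule would remove.

    Files dated the first of a month are never deleted; of the rest,
    only the newest `keep` survive. ISO dates sort correctly as text.
    """
    dated = re.compile(re.escape(prefix) + r"-\d{4}-\d{2}-(\d{2})")
    deletable = []
    for name in names:
        match = dated.search(name)
        if match and match.group(1) != "01":
            deletable.append(name)
    deletable.sort()  # oldest first
    return deletable[:max(len(deletable) - keep, 0)]


names = ["mysql_database-2003-02-%02d.sql" % day for day in range(1, 11)]
for name in backups_to_delete(names, "mysql_database"):
    print("would delete", name)
```

With ten daily files for the first ten days of a month, the first-of-month file is protected, the seven newest of the remaining nine are kept, and the two oldest are reported for deletion.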
