Mathieu Fenniak's Weblog

2004/05/11

KMDI's Open Source Conference

Filed under: computers, programming — admin @ 2:43 pm

I’ve been attending an Open Source conference at the University of Toronto, organized by the Knowledge Media Design Institute. The conference has been extremely interesting for a number of reasons, which I will iterate here in chronological order.

I’m here mostly because of Catspaw’s involvement in the conference. I’m also here staying with her while I attend the conference. This is a major important part of understanding why many of the odd events here have occured.

  • The day before the conference, I chatted with a volunteer by the name of Leigh. Although originally I was just listening to a conversation between her and Jason Nolan, eventually just the two of us were talking. At some point the topic of the Internet retaining information came up, and we used Catspaw’s laptop to lookup her name on google. One of the relevant google groups search result was from a newsgroup alt.dragon-net, which she claimed was a MUD she used to use.

    Well, the discussion continued on MUDs, when Leigh made the interesting claim that she had once dated two wizards from a single MOO (although she hadn’t known they were on the same MOO until much later). I dug into this and discovered she was referring to two people from MOO Canada, where she had previously had a character and used. She had even previously attended a MOO picnic. She was even a close friend of a good friend of mine, Joanna.

    This relatively random encounter was with a person who I had probably met before many years prior, and knew of myself and Catspaw (only by the name Catspaw), and was a very close friend-of-a-friend relation. Small world.

  • Sunday morning, we were treated to an amazing speech by Eben Moglen. He made a great, moving, active speach supporting Free software. It was one of the highlights of this conference. "We will win."

  • During the break between the Sunday sessions, I was standing next to the refreshment table when a voice from behind asked, "What’s with the Python shirt?" (referring to my wonderful Python t-shirt, of course). I turned around to be faced with David Ascher, one of the conference speakers and the current manager of product development at ActiveState. I had hoped to talk with him during the conference, and here was my chance. Surprised as I was though, I had nothing insightful or useful to say.

  • Sundary afternoon, it got even better. A panel on the law and politics of open source became a debate on legal issues surrounding free software and the GNU general public license. David McGowan spoke amazingly well to poke a hole in the wonderful balloon of emotion that Eben had earlier generated amoung the audience. Barry Sookman followed with a continuing and more detailed evaluation of the GPL, with specific references to Canadian copyright law.

    Eben was allowed to take the stage to comment. He replied with some very breakneck comments that blew away the earlier points of discussion.

    One of the conference attendees, while posing a question, described the situation very well. "You’re all excellent speakers, with an excellent discussion. I believed everything each of you said as you said it, which is amazing since it all conflicted with each other." (Ian D. Allen).

  • I helped Nancy Frishberg transfer her presentation slides off her laptop (which was running the Java Desktop System) and into the KMDI ePresence network server. I was happy to help, but I was very confused by a couple of points of this interaction:

    1. Why does a Sun employee need help doing such a simple seeming task using their own desktop environment? I mean, clearly she didn’t write the entire system, but I would have thought she might have the channel of input necessary to tell the software developers to make such a thing easy. It was … unintuitive, but understandable.
    2. Nancy kept a copy of the terminal session I used and put it into a StarOffice document after the process was done. I’m very curious what she planned to do with that… send it to the developers and suggest she doesn’t want to do such things ever again? Place it into documentation for the desktop system?
  • After David Ascher’s presentation during the Monday morning panel, a question was asked by a dude in a red plaid shirt. His question related to Komodo, in which he refered to it as a tool aiding the development of things like MUDzilla. This was an extremely odd comment. As one of the developers of MUDzilla, I was shocked to hear anyone else refer to our small, unmaintained, and very limited-audience piece of software.

  • After the morning panel on Monday, I got a chance to intercept David Ascher while he was leaving the conference room. I asked him whether ActiveState had any plans to develop an OSX version of their wonderful Komodo IDE. His response was, "if I had a nickel for every time I was asked that question…" to which I replied "You could have more than a nickel for every time." Anyways, his answer was that it was not as simple as it seemed to port, and market research had indicated it was not likely that many full commercial copies of the software (at $300 each) would be sold. It was not going to be profitable due to a large amount of development and a small potential return.

  • Nancy Frishberg’s presentation included a quotation from Catspaw’s recent paper on Five Fundamental Issues with Open Source Software. Cecil noticed this just shortly before her presentation began as she was zipping through her presentation on screen. We informed Catspaw and someone introduced the two of them. I find it pretty ironic that one of the presenters during this conference was quoting the volunteer organizer of the conference.

2004/04/30

Latin Hypercube Sampling

Filed under: programming — admin @ 9:44 pm

Introduction

Latin hypercube sampling (LHS) is a form of stratified sampling that can be applied to multiple variables. The method commonly used to reduce the number or runs necessary for a Monte Carlo simulation to achieve a reasonably accurate random distribution. LHS can be incorporated into an existing Monte Carlo model fairly easily, and work with variables following any analytical probability distribution.

Monte-Carlo simulations provide statistical answers to problems by performing many calculations with randomized variables, and analyzing the trends in the output data. There are many resources available describing Monte-Carlo (history, examples, software).

The concept behind LHS is not overly complex. Variables are sampled using a even sampling method, and then randomly combined sets of those variables are used for one calculation of the target function.

The sampling algorithm ensures that the distribution function is sampled evenly, but still with the same probability trend. Figure 1 and figure 2 demonstrate the difference between a pure random sampling and a stratified sampling of a log-normal distribution. (These figures were generated using different versions of the same software. Differences within the plot, such as the left axis label and the black lines, are due to ongoing development of the software application and are not related to the issue being demonstrated.)

stratified-cfp.png

Figure 1. A cumulative frequency plot of "recovery factor", which was log-normally distributed with a mean of 60% and a standard deviation of 5%. 500 samples were taken using the stratified sampling method described here, which generated a very smooth curve.

random-cfp.png

Figure 2. A cumulative frequency plot of "recovery factor", which was log-normally distributed with a mean of 60% and a standard deviation of 5%. 500 random samples were taken.

Process

Sampling

To perform the stratified sampling, the cumulative probability (100%) is divided into segments, one for each iteration of the Monte Carlo simulation. A probability is randomly picked within each segment using a uniform distribution, and then mapped to the correct representative value in of the variable’s actual distribution.

A simulation with 500 iterations would split the probability into 500 segments, each representing 0.2% of the total distribution. For the first segment, a number would be chosen between 0.0% and 0.2%. For the second segment, a number would be chosen between 0.2% and 0.4%. This number would be used to calculate the actual variable value based upon its distribution.

P(X <= x) = n is solved for x, where n is the random point in the segment. How this is done is different for every distribution, but it’s generally just a matter of reversing the process of the probability function.

Grouping

Once each variable has been sampled using this method, a random grouping of variables is selected for each Monte Carlo calculation. Independant uniform selection is done on each of the variable’s generated values. Each value must only be used once.

Example

Average Joe has been offered free wood scraps from the local lumber yard. They tend to cut pieces up for customers and always have leftovers that nobody is interested in because the ends are damaged, or other contrived reasons. Joe wants to build U shaped picture frames out of these pieces, and he’s interested in knowing how probable it is that he can build a picture frame big enough for some strange purpose.

Joe’s odd U shaped picture frames require three pieces of wood. A base, and two sides. He’s going to make the base out of a chunk of 2-by-4, and the sides out of baseboards. The 2-by-4 (our first variable) comes in pieces 4cm to 40cm long, and the baseboards come in pieces 2cm to 60cm long. All possibilities are equally likely (uniform distribution).

We’re going to run this simulation with 10 iterations. (A common practice for Monte Carlo simulations is to start with about a thousand iterations, and work upwards depending upon the complexity of the problem.) We’re also going to use Python to do our simulation.

Sampling:

from __future__ import division
import random
iterations = 10
segmentSize = 1 / iterations
for i in range(iterations):
    segmentMin = i * segmentSize
    #segmentMax = (i+1) * segmentSize
    point = segmentMin + (random.random() * segmentSize)
    pointValue = (point * (variableMax - variableMin)) + variableMin

For our 2-by-4s, this might provide values like this:

Sample Segment Min Segment Max Segment Point Value
0 0% 10% 2.132 % 4.767
1 10% 20% 13.837 % 8.981
2 20% 30% 24.227 % 12.722
3 30% 40% 33.716 % 16.137
4 40% 50% 47.659 % 21.157
5 50% 60% 53.636 % 23.308
6 60% 70% 69.006 % 28.842
7 70% 80% 73.106 % 30.318
8 80% 90% 88.888 % 35.999
9 90% 100% 90.812 % 36.692

Once Joe has all three runs of the sampling for each variable, he’ll have a set of output like this:

Sample Variable 1 Variable 2 Variable 3
0 4.767 7.483 4.719
1 8.981 7.731 9.417
2 12.722 13.150 11.852
3 16.137 18.347 15.136
4 21.157 21.233 21.786
5 23.308 23.382 22.630
6 28.842 26.008 26.180
7 30.318 32.452 31.977
8 35.999 36.129 34.198
9 36.692 37.482 39.968

As you can see, even though both variable 2 and variable 3 have the same distribution (they’re the baseboards), the stratified sampling does not cause them to have the same numbers. It just causes a more evenly distributed set of numbers, eliminating the possibility of all 10 samples being the exact same number. In Monte Carlo simulation, we want the entire distribution to be used evenly. We usually use a large number of samples to reduce actual randomness, but the latin hypercube sampling permits us to get the ideal randomness without so much of a calculation.

The only step left is to shuffle these variables around to create some calculations. Joe’s software might randomly pick iteration 9 for variable 1, iteration 3 for variable 2, and iteration 5 for variable 3. This results in a set of (35.999, 13.150, 21.786) for the sizes of the 2-by-4s and the left and right baseboard. (Joe’s got a lopsized U! Not entirely unexpected. He should have been the guy buying the lumber first, the cheap bastard.)

Each number should be discarded after it is used, and others selected for the next calculation.

An alternative at this point is to run one case with each of the possible variable sets, creating 1000 calculations for these simple 10 variables. For complex cases involving many variables and many samplings, this can balloon to unreasonable numbers of calculations quickly. This eliminates all of the benefits that Monte Carlo simulation provides, while still providing approximately the same answer.

Conclusion

Latin hypercube sampling is capable of reducing the number of runs necessary to stablize a Monte Carlo simulation by a huge factor. Some simulations may take up to 30% fewer calculations to create a smooth distribution of outputs. The process is quick, simple, and easy to implement. It helps ensure that the Monte Carlo simulation is run over the entire length of the variable distributions, taking even unlikely extremities into account as one would desire.

2004/04/08

The MutableNumber Class Revisited

Filed under: programming, python — admin @ 5:04 pm

Ian Bicking’s comment suggesting I wasn’t far away from getting the MutableNumber class to work was quite correct. I had never considered using a descriptor, which made it difficult to make this work with a metaclass. The following code, though, does work:

class Operator(object):
    def __init__(self, instance, op):
        self.instance = instance
        self.op = op
    def __call__(self, rhs):
        try:
            return self.op(self.instance.number, rhs.number)
        except AttributeError:
            return self.op(self.instance.number, rhs)

class OperatorDescriptor(object):
    def __init__(self, op):
        self.op = op
    def __get__(self, obj, type=None):
        return Operator(obj, self.op)
    def __set__(self, obj, value):
        raise TypeError
    def __delete__(self, obj):
        raise TypeError

class NumberDelegateMetaclass(type):
    def __init__(cls, name, bases, dct):
        for attr in cls.__operatormap__:
            setattr(cls, attr, OperatorDescriptor(getattr(operator, attr)))

class MutableNumber(object):
    number = None

    __metaclass__ = NumberDelegateMetaclass
    __operatormap__ = "__add__", "__div__", "__eq__", "__floordiv__", "__ge__", \
        "__gt__", "__inv__", "__invert__", "__le__", "__lshift__", "__lt__", \
        "__mod__", "__mul__", "__ne__", "__neg__", "__pos__", "__pow__", \
        "__rshift__", "__sub__", "__truediv__"
    def __init__(self, value=None):
        self.number = value

It’s a miracle.

Overriding Operators in New-Style Classes

Filed under: programming, python — admin @ 3:38 pm

Occasionally, even Python violates the principal of ‘least surprise’ – that is, software should do what you expect it to do. Overriding mathematics operators in new-style classes led me to find one such violation, and investigate the reason behind it.

I setup myself a make-work project a few days ago when I probably should have done a dozen other things. After reading a posting to python-dev entitled Python is faster than C, I was motivated to create a pure-python implementation of a matrix class. I was interested in building it primarily as a refresher of matrix mathematics, and also to experiment with psyco’s x86 optimization capabilities.

I ran into a quirk of new-style classes. I envisioned the Matrix class having two hidden data fields, one to hold a tuple of Vector objects representing each row, and a second one to hold a tuple of Vector objects representing each column. This would allow for easy access to the data based upon rows or columns, whichever was more convenient, with no speed penalty. Each tuple of Vectors ought to point to the same numbers, though. How does one do this with Python?

I’d like self.__cdata[0][0] and self.__rdata[0][0] to always be the same number, even if I only change one. I could manage this with accessor methods that make sure they’re the same, but I decided to wrap each number in an object called a MutableNumber. The class was setup like so:

class MutableNumber(object):
    def __init__(self, value = None):
        self.number = value

Every mathematical operator should have then been overridden like this:

...
    def __mul__(self, rhs):
        try:
            return MutableNumber(self.number * rhs.number)
        except AttributeError:
            return MutableNumber(self.number * rhs)
...

What a pain to type and maintain, though. What if I wanted to add another possibility in the future, or change the operator to not return a new MutableNumber object wrapping the number? There are twenty-two operators I wanted to override (eg. __mul__, __eq__, __div__, __floordiv__…), and I didn’t want to have to type them, even once.

I originally setup a meta-class to delegate every access to a property in the class’s __delegateattributes__ member to self.number. This had some issues:

>>> MutableNumber(2) * 3
6
>>> MutableNumber(2) * MutableNumber(3)
Traceback (most recent call last):
  ...
TypeError: unsupported operand types for *: 'int' and 'MutableNumber'

I had to go a more complex route, rather than just pure delegation. I eventually replaced all the operators with custom instances of an Operator class, which held an operator and a reference to the instance of the class:

class MutableNumber(object):
    __operatormap__ = "__add__", "__div__", "__eq__", "__floordiv__", "__ge__", \
        "__gt__", "__inv__", "__invert__", "__le__", "__lshift__", "__lt__", \
        "__mod__", "__mul__", "__ne__", "__neg__", "__pos__", "__pow__", \
        "__rshift__", "__sub__", "__truediv__"
    class Operator(object):
        def __init__(self, instance, op):
            self.instance = instance
            self.op = op
        def __call__(self, rhs):
            op = getattr(operator, self.op)
            try:
                return op(self.instance.number, rhs.number)
            except AttributeError:
                return op(self.instance.number, rhs)

    def __init__(self, value=None):
        for op in self.__operatormap__:
            setattr(self, op, MutableNumber.Operator(self, op))
        self.number = value

So, we finally reach the part where it gets interesting. What happens? Does this work properly? The code seems good, but it fails to function:

>>> MutableNumber(3) * 2
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for *: 'MutableNumber' and 'int'

Unsupported operand for types? It looks like it’s totally ignoring the fact that the MutableNumber instance has a __mul__ property set. Hm… stranger still:

>>> MutableNumber(3).__mul__(2)
6
>>> MutableNumber(3).__class__.__mul__(2)
Traceback (most recent call last):
  ...
AttributeError: type object 'MutableNumber' has no attribute '__mul__'

Well, it seems that A * B does not call A.__mul__(B). It searches for A.__class__.__mul__, but not A.__mul__. It works fine with an old-style class, but not a new-style class.

It turns out this behaviour has been discovered before. In fact, Jan Decaluwe was trying to do the same thing I am, create an instance object that acts like a mutable number. Alex Martelli wrote:

The idea is that operators should rely on special
methods defined by the TYPE (class) of the object they’re working
on, NOT on special methods defined just by the OBJECT itself and
not by its class. Doing otherwise (as old-style classes did, and
still do for compatibility) is untenable in the general case, for
example whenever you consider a class and its metaclass (given that
the metaclass IS the class object’s type, no more and no less).

Well, too bad. I guess maybe I’ll just backtrack all the way to using accessor methods to make sure the column and row data is both kept constant… or just store one of them.

2004/03/30

Embedding Python Tips

Filed under: moo, programming, python — admin @ 2:46 am

Python is a beautiful programming language. One of it’s most wonderful features is a very clean and simple C API that allows Python to be extended with dynamically loadable C modules. That same C API also allows Python to be embedded in other pieces of software. This means that any program can allow the user to enter Python code interactively (or otherwise) to affect the program in whatever way they wish. This is a powerful capability, but using occasionally requires a few tricks to accomplish the embedder’s goals.

Today’s embedding exercise was allowing a MOO server to execute arbitrary Python code:

;py_runfunction({"import math", "return math.sqrt(2*2*2)"})
> 2.8284271247461903

Of course, a MOO server can already do square roots… that wasn’t the point. There was no point. Anyways, here are a few ideas that might help other people embed Python in a useful way.

Evaluating Statements

One of the first things most people try to do is evaluate an arbitrary statement and get its return value. This is not quite as easy as it sounds. Although Python’s eval builtin does this, it may be more limited than the embedding programmer desires. eval will only permit an expression to be evaluated, not a statement:

>>> eval("x = 2")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1
    x = 2
      ^
SyntaxError: invalid syntax

I suggest that if you want the user to be able to evaluate an arbitrary block of code, wrap an artificial function around it and call the function itself:

def f():
    import math
    class Cylinder:
        def _calcVolume(self):
            return math.pi * \
                self.radius**2 * \
                self.height
        volume = property(_calcVolume)
    c = Cylinder()
    c.radius = 12.2
    c.height = 16.12
    return c.volume

This allows the user to input much more complex functions, like the above example which uses a class and an import statement. All that needed to be artificially added was the ‘def f():’ and an arbitrary but constant amount of whitespace in front of each line of code.

Compiling Code without a Module

So you’ve gotten some code from a user, and you want to compile it. Maybe you’re creating a function to wrap around the user’s code. Where does that function belong? Where do you evaluate your code?

The first instinct I had was to use PyImport_AddModule to get the __main__ module and start importing functions into its module dictionary. I had a block of code similar to this (error checking omitted):

Py_Initialize();
PyObject* module = PyImport_AddModule("__main__");
PyObject* moduleDict = PyModule_GetDict(module);
PyObject* compileRetval = PyRun_String(code, Py_file_input,
    moduleDict, moduleDict);
...
Py_Finalize();

This then allowed me to call functions on the module object and get some code back. The only real downside was the initialize and finalize around my code. I didn’t want code from one compile to mess with another, and since I was using the __main__ module, this caused problems. Eventually I decided to use random strings as the names for my modules so that I could use them all independently, but that sure was ugly.

The solution I stumbled upon was caused by my accidently deleting some lines of code. I eventually realized that I didn’t need the module object at all. I could create a new, empty dictionary, and compile the code ‘into’ that:

PyObject* dict = PyDict_New();
PyObject* compileRetval = PyRun_String(code, Py_file_input,
    dict, dict);

Everything continued to work as before, except now I had to PyDict_GetItem out of dict and use PyObject_CallObject rather than the PyObject_CallMethod that I could have used before. But nothing crashed, the world continued to run, and I no longer needed to initialize and finalize around my evaluation. Yay!

Settings __builtins__

There was one minor problem. Some functionality like builtin functions and classes (like Exception) was missing. Oops:

// Check for __builtins__...
if (PyDict_GetItemString(dict, "__builtins__") == NULL)
{
    // Hm... no __builtins__ eh?
    PyObject* builtinMod = PyImport_ImportModule("__builtin__");
    if (builtinMod == NULL ||
        PyDict_SetItemString(dict, "__builtins__", builtinMod) != 0)
    {
        Py_DECREF(dict);
        Py_XDECREF(dict);
        // error handling
        return;
    }
    Py_DECREF(builtinMod);
}

Hey, that fixed that right up.

I had this problem when I was using random names for modules, as well. It seems PyImport_AddModule does not set __builtins__ on a new module — but it is set up on __main__ always.

Getting Tracebacks using the traceback Module

What happened when things went wrong? Well, at first, a lot of crashing. And things were going wrong a lot, especially when I was trying to use modules that didn’t exist in the system. Heh heh.

Thankfully, Python will setup tracebacks that are useful even when you’re using the C API and screwing things up from the inside. How on earth do you get at those tracebacks, though? You can get a lot of information from the PyErr_* class of functions, but not a properly formatted Python traceback to display to the user. Eventually, I ended up using the traceback module itself to display an error:

char* getPythonTraceback()
{
    // Python equivilant:
    // import traceback, sys
    // return "".join(traceback.format_exception(sys.exc_type,
    //    sys.exc_value, sys.exc_traceback))

    PyObject *type, *value, *traceback;
    PyObject *tracebackModule;
    char *chrRetval;

    PyErr_Fetch(&type, &value, &traceback);

    tracebackModule = PyImport_ImportModule("traceback");
    if (tracebackModule != NULL)
    {
        PyObject *tbList, *emptyString, *strRetval;

        tbList = PyObject_CallMethod(
            tracebackModule,
            "format_exception",
            "OOO",
            type,
            value == NULL ? Py_None : value,
            traceback == NULL ? Py_None : traceback);

        emptyString = PyString_FromString("");
        strRetval = PyObject_CallMethod(emptyString, "join",
            "O", tbList);

        chrRetval = strdup(PyString_AsString(strRetval));

        Py_DECREF(tbList);
        Py_DECREF(emptyString);
        Py_DECREF(strRetval);
        Py_DECREF(tracebackModule);
    }
    else
    {
        chrRetval = strdup("Unable to import traceback module.");
    }

    Py_DECREF(type);
    Py_XDECREF(value);
    Py_XDECREF(traceback);

    return chrRetval;
}

Of course, when one can’t import the traceback module, one can’t generate a traceback explaining why not. :-)

2004/03/16

The Worst C++ Code Ever

Filed under: c++, programming — admin @ 3:51 pm

Yesterday morning, I overheard a couple co-workers and Cecil talking about problems they were having compiling their software this morning. I popped over to Cecil’s desk to take a look, since I like seeing messed up problems. It took about three seconds flat to diagnose the problem:

/wsimages/m_Symbol.png

It looks like someone has #define’d m_Symbol.

Someone had defined m_Symbol as a preprocessor macro. Why on earth would that be? A bit more digging determined that a co-worker had decided to move a bunch of variables in a library class into a small class, to consolidate them. Unfortunately, this class was written with all of these member variables as public, so he decided that writing a bunch of preprocessor macros would be the only way to allow everyone to keep working despite his changes:

#define m_Symbol    m_Sym.m_Type
#define m_SymSize   m_Sym.m_Size
...

He tested it by building one of our larger applications, and then assumed if it worked there it would work everywhere. This was bad code.

The correct possible solutions involved more work:

  • Re-write everyone’s code to use the new classes.
  • Use accessor methods and keep data private so that it can be rearranged in the future.

Of course, both of these would involve a lot of code re-writing. An automated semi-intelligent find and replace could take care of at least 80% of the cases, but some hand writing would need to be done. Since it involved more work, he didn’t do it. He plans to someday fix this when he has a spare programmer lying around.

BAH.

Coding everything in Python would be the best alternative.

2004/03/12

Distributing a Python Embedding Program

Filed under: programming, python — admin @ 8:39 pm

I got a bit bored at my job yesterday. This happens often, and usually a new random piece of software emerges as a result. This time I ended up hacking on a piece of software that had previously emerged from boredom, which I called the "Difference Machine". I added support for plot datasets generated on the fly through Python code, and in the end learned how to distribute Python within my application.

The purpose of the Difference Machine is very simple: I dislike MS Excel, and wish to never have to use it again. As someone who works in a job involving a lot of engineering, Excel is pretty much standard fare for comparing datasets and stuff of that nature, though. The Difference Machine allows plots generated by the software we develop to be imported, and compared against each other. Additionally, text import is possible from a CSV file or from the clipboard.

differencemachine-plot.png

The Difference Machine comparing two gas supercompressibility (z-factor) correlations.

Adding in the Python embedding was easy. I’ve done similar things before, and the Python C-API is pretty straight-forward (although necessarily verbose). When it came time to put this application up on the network for co-workers to access, the real fun began.

differencemachine-edit.png

The Difference Machine editing Python code with a crude edit control.

My first attempt was simple. Copy the application onto the network, and include python23.dll in the application’s directory. Behold! It actually worked. But, there were a few minor glitches…

  • No access to the standard library. import sys and import math worked, but not import csv or anything else.
  • No standard library meant no traceback library, which I had depended upon to format tracebacks when things went poorly. The software did not react to the missing library well.

I needed the standard library. I tried copying it up into a Lib directory in my application’s directory, and this worked okay… except that it was a huge number of files, and it was still missing things like _sre.pyd, the dynamically linked libraries.

I consulted PEP-237, Import Modules from Zip Archives, to try to create a zip file of the entire standard library that would be much easier to manage than my Lib directory. By naming the zip file python23.zip and putting it in the same directory as my python23.dll, it should have been able to access the library. But it still failed because it couldn’t access any dynamically linked libraries, and zlib is necessary to open the zip file.

By putting all the .pyd files into my application’s directory, I finally got it to work. It loads zlib to read python23.zip, where it retrieves the rest of the standard library. Yippee! In the end, the directory contains the following files:

DifferenceMachine.exe
... bunch of my dlls ...
python23.dll
python23.zip
zlib.pyd
_csv.pyd
_sre.pyd
... bunch more pyd files ...

And all is good and happy. I’m one small step closer to being Excel-less.

2004/02/05

Open Scripting Architecture and iCal

Filed under: programming, python — admin @ 2:22 pm

I was browsing the Daily Python-URL this morning, when I came across a link to a Python module for reading OSX’s iCal files. The module’s web page states:

Apple doesn't document the API for interfacing with iCal, but it does save
it's files as industry standard iCalendar files.  I went looking recently
for a python module to interface with iCal, and couldn't find one.

It’s great to have a good Python module to read standard iCalendar files, but the statement that Apple doesn’t have a documented API for interfacing with iCal is bogus. iCal supports the Open Scripting Architecture (AppleScript’s APIs), the same as every other Apple application does. The Open Scripting Architecture can be used from Python much easier than one can read an iCalendar file, and the information is exposed in a very easy to access format. I wrote an earlier article on how to do similar actions with iTunes, so in my continuing effort to educate the world on how to use OSA from Python, here’s some more!

The first step in using iCal from a Python script through OSA is to generate an iCal module which wraps up the low-level OSA details. The gensuitemodule.py script is designed to do just that. It is located in your plat-mac directory inside your Python installation’s lib directory. On a stock installation of Panther, that makes it in /System/Library/Frameworks/Python.framework/ pant,pant Versions/Current/lib/python2.3/plat-mac/gensuitemodule.py. In order to interface with the system window manager and access OSA, it must be run with the pythonw executable, rather than python. The --output command line parameter can be used to specify a name for the output module. In the end, generating the module is extremely simple:

pythonw {...}/gensuitemodule.py --output iCal /Applications/iCal.app

Now that you have a beautiful iCal module, what do you do with it? The iCal.iCal class is the basic application, which is used for all OSA communications with the iCal application itself. So, to start things off, an instance of that class should be created so we can do some communicating:

import iCal
app = iCal.iCal()

How do we determine what we can do to this application through OSA? The easiest way is to use OSX’s Script Editor, which allows us to open up an application’s OSA dictionary and view the contents. The Script Editor is located in the /Applications/AppleScript folder, and has an option in the File menu to Open Dictionary....

OS X's script editor

Once you have iCal’s dictionary open, inside the iCal suite you can see all the classes and commands that iCal exposes. For example, the application class has elements like calendar, window, and document. Since we’re interested in getting data out, we know that calendar is where we want to start. An element is a collection of objects, implying that each application has a collection of calendars. Based upon what we know of how iCal works, this sounds very reasonable. Here’s how we can iterate over all the calendars that our iCal has:

numCalendars = app.count(app, each=iCal.calendar)
for i in range(1, numCalendars + 1):
    cal = iCal.calendar(i)

The first line sends an OSA command to iCal asking it to count the number of iCal.calendar objects it currently has. Since Python normally uses zero-based counting, and OSA uses one-based counting, we need to iterate from 1 to numCalendars + 1. We create an iCal.calendar object from each of those indicies. Creating the object doesn’t do any communication with iCal, it just creates an object which can dispatch calls to the appropriate instance of the calendar class.

What if we wanted to get some data out of the object, now? The Script Editor shows that the calendar objects support a number of properties like tint, title, and description. Let’s print out the title of every calender:

print &quot;Calendar %s - %s&quot; % (i, app.get(cal.title))

The magic here is in the app.get(cal.title). This tells the application to retrieve the title property of the cal instance, which is already bound to the proper calendar object in iCal. We simply print it out, which outputs something like:

Calendar 1 - Home
Calendar 2 - Work

Let’s put these chunks of code together, and include the ability to print out every event for each calendar:

import iCal

def printAllEvents(app, calendar):
    numEvents = app.count(calendar, each=iCal.event)
    for i in range(1, numEvents + 1):
        event = calendar.event(i)
        print "\tEvent %s - %s" % (i, app.get(event.summary))

def printAllCalendars(app, eventsToo = True):
    numCalendars = app.count(app, each=iCal.calendar)
    for i in range(1, numCalendars + 1):
        calendar = iCal.calendar(i)
        print "Calendar %s - %s" % (i, app.get(calendar.title))
        if eventsToo:
            printAllEvents(app, calendar)

app = iCal.iCal()
printAllCalendars(app)

Yay! We have access to every piece of information iCal stores, and we didn’t need to parse an iCalendar file. Some parts of iCal even begin to integrate with the OSX Address Book, and we can access that information as well.

One last bit of code could be useful, though. What if we wanted to change some of the data in iCal? It’s fairly simple, but not too obvious or well documented anywhere. Here’s a short snippet that will give all your calendars dumb, pointless names:

import iCal
app = iCal.iCal()

numCalendars = app.count(app, each=iCal.calendar)
for i in range(1, numCalendars + 1):
    calendar = iCal.calendar(i)
    app.set(calendar.title, to = "Calendar %s" % i)

2004/01/26

Python Trackback Library

Filed under: programming, python — admin @ 4:24 pm

In addition to the pingback library that I built earlier today, I decided to add Trackback support to my weblog software as well. To that end, I’ve created a new trackback.py library which handles the grunt work of being a Trackback client. It’s not exactly very difficult, anyways. The method of greatest concern for a client:

def ping(trackbackURI, sourceURI, title = None, excerpt = None, siteName = None):
    """Implements a trackback ping.  This method throws exceptions based upon
    the error returned from the trackback server, unless it is the successful
    error code '0'.

    The excerpt parameter should contain plaintext for maximum readability on
    trackback servers.  Providing HTML is not recommended.  This library
    provides a function ``detagHTML`` which will shred HTML text for you."""
    ...

This library does not offer support for being a Trackback server, nor autodiscovery in any way. I do not believe that the autodiscovery method suggested by the Trackback specification is appropriate. Embedding RDF data in an HTML comment? This is the kind of thing a perl author would suggest, since they believe they could could regex their way out of the hole they dug for themselves. I don’t like it.

An alternative implementation of trackback in Python is available called tblib. I chose to write my own library because:

  1. I was bored at work,
  2. I didn’t want to use a GPL-licensed library in my modified-BSD licensed weblog,
  3. I didn’t like the way tblib used regular expressions to grab data from ping responses rather than using an XML parser,
  4. I didn’t want to implement autodiscovery in a flappy regex perl way. (No disrespect meant towards the author of tblib, of course. I realize that there is no simpler or stronger way to do Trackback autodiscovery.)

Python Pingback Library

Filed under: programming, python — admin @ 1:49 pm

I’ve created a simple and easy-to-use Python library to handle the client implementation of the Pingback 1.0 protocol. The pingback.py script handles all aspects of a client, including automatically parsing HTML and reStructredText for hyperlinks which should be checked.

Pingback can now be implemented by other Python projects by using one of the following functions:

def autoPingback(sourceURI, reST = None, HTML = None):
    """Scans the input text, which can be in either reStructuredText or HTML
    format, pings every linked website for auto-discovery-capable pingback
    servers, and does an appropriate pingback."""
    ...

def pingback(sourceURI, targetURI):
    """Attempts to notify the server of targetURI that sourceURI refers to
    it."""
    ...

The documentation at the top of the module describes how one might implement a Pingback server by using the SimpleXMLRPCServer module, as well.

« Newer PostsOlder Posts »

Powered by WordPress