Mathieu Fenniak's Weblog

2004/08/04

Vancouver Python Workshop 2004

Filed under: programming,python — admin @ 2:51 pm

I just returned to Calgary from the Vancouver Python Workshop. Cecil and I drove out there last week (Thursday evening/Friday morning). Catsy flew out from Toronto. We all stayed at the Hyatt Regency in downtown Vancouver, where much fun was had watching the Back to The Future trilogy, and making up entertaining stories about elves.

The workshop was reasonably well organized, had a nice venue, and was well attended. The talks varied in quality from okay to excellent – all the speakers were well informed folk with interesting topics, but very few programmers are excellent public speakers. That being said, I have a few specific suggestions for the workshop itself:

  • Some talks could have benefited from the presence of a strict moderator. On a scale from polite to rude, the following things happened:

    • A speaker deferring a question to Guido. This is appropriate, as it is the prerogative of the speaker to defer during his own presentation, and Guido may be an excellent resource for an answer.
    • Guido interjecting a comment like "It’s not happening." while an attendee asks a question (referencing a PEP) is more questionable, but ultimately "polite enough" in the company of geeks.
    • Getting into an audience debate about wxWidgets during a PyObjC talk is not very polite.
    • Nitpicking the usefulness of a contrived optimization example is rude, pointless, and time consuming.
  • An entire track on the second day ended up being dedicated to Plone. I think Plone is cool, but these talks unfortunately ended up being about Plone from the point of view of a user rather than a Python developer. I believe that the workshop did not have quite as many speaker submissions as they wanted, so these Plone talks weighed in based upon how many people were willing to do them. They were good, but badly targeted and abundant.

    I would suggest that the conference organizers be willing to say "no" to speakers if they have an abundance of talks on one such subject. I realize getting submissions for talks must be a difficult process, and the fear of having not enough content at a conference must be pretty big as an organizer.

I enjoyed the workshop greatly. I should have flown out rather than drive 12 hours, but it seemed like a good idea at the time. I picked up a lot of good ideas, started in writing some interesting PyObjC code (a game of interracial life, oooh), enjoyed using SubEthaEdit in public for the first time, and ate a lot of good food. Catsy seems to have had her own kind of fun, too.

2004/07/12

An Introduction to Coalbed Methane

Filed under: petroleum — admin @ 5:46 pm

a.k.a. An Introduction to my Day Job

The process by which plant material is converted to coal generates large quantites of methane gas, which is often trapped within the coal. The presence of this gas is very well known due to underground coal mining, where it presents a serious safety risk. This is coalbed methane, often refered to as CBM.

Coalbed methane differs from a typical sandstone natural gas reservoir in the following ways:

  • Methane is stored within the matrix (the coal) by a process called adsorption. The methane is in a nearly liquid state, lining the inside of the coal pores.
  • The porosity of the matrix typically refers to the size of the cleats (fractures running throughout the coal), not the porosity of the coal rock itself. Porosity is typically very low compared to a traditional reservoir (less than 3%).
  • The gas is often, but not always, sealed in the coal by 100% water saturation of the cleats. The reservoir must be dewatered before the gas can begin to desorp from the coal.

Estimated CBM reserves vary, but are huge. A 1997 estimate from the U.S. Geological Survey predicts more than 700 trillion cubic feet of methane within the US, and at least 100 trillion cubic feet of it being economically viable to produce. At today’s natural gas prices ($6.05 USD / MMBtu), 700 TCF represents $4.37 trillion USD. (1031 Btu / scf)

High natural gas prices are making CBM economically viable where it previously may not have been. Coalbed methane wells produce at low gas rates (typically maxing out around 300 Mcf/d), and can have large inital costs.

The production profiles of CBM wells are typically characterized by a "negative decline" in the gas rate as water is pumped away and gas begins to desorp and flow. A dry CBM well does not look very different from a standard well, except that the gas rates are lower and decline at a much slower rate.

/wsimages/cbm-forecast.png

A typical ten-year CBM gas rate forecast, showing a negative decline for the first couple years of production.

The methane desorption process follows a curve (of gas content vs. reservoir pressure) called a Langmuir isotherm. The isotherm can be analytically described by a maximum gas content (at infinite pressure), and the pressure at which half that gas exists within the coal. These parameters (called the Langmuir volume and Langmuir pressure, respectively) are properties of the coal, and vary widely. A coal in Alabama and a coal in Colorado may have radically different Langmuir parameters, despite similar other coal properties.

/wsimages/cbm-isotherm.png

A typical CBM isotherm, characteristic of a San Juan basin coal.

The increasing gas rates seen in a negative decline are caused by increasing relative permeability as the water saturation around the wellbore decreases. As there is less water in the coal cleats, the gas is able to flow more and more into the wellbore to be produced.

/wsimages/cbm-relperm.png

A set of relative permeability curves. As water saturation decreases, more gas and less water is produced from the coal.

As production occurs from a coal reservoir, the changes in pressure are believed to cause changes in the porosity and permeability of the coal. This is commonly known as matrix shrinkage/swelling. As the gas is desorbed, the pressure exerted by the gas inside the pores decreases, causing them to shrink in size and restricting gas flow through the coal. As the pores shrink, the overall matrix shrinks as well, which may eventually increase the space the gas can travel through (the cleats), increasing gas flow.

/wsimages/cbm-mtxshrink.png

An example of CBM matrix shrinkage. Three matrix shrinkage correlations are shown: Palmer and Mansoori, Shi and Durucan, and Seidle.

Thank you for your interest in this article!

I was bored one day, and wrote this up as a Wikipedia entry for coalbed methane… excluding the self-indulgent advertising. In the future, I may write a bit about building a mathematical model of a CBM reservoir. It’s an interesting and surprisingly simple process.

2004/06/21

iSight Security Camera

Filed under: computers — admin @ 3:14 pm

iSight Security Camera

I live next door to my father and Cheryl, who have been caring for my cat Ophelia for years while I’ve lived in Ottawa and in various no-pets-allowed apartments. This weekend, I’ve been feeding Ophelia since they’ve been out of town. They have some lovely potted flowers on their front steps, and even a small fern tree in a pot.

Apparently some dorks decided to remove the plants from their pots this weekend. I came over Saturday morning, while Cheryl was still there, to be find dirt thrown all over the place. Cheryl had already replaced the plants in their pots, but we found it unthinkable that someone would stop by just to pull plants out of pots.

Sunday morning, on my trip over to visit the cat, I found that all the plants had been de-potted again. After replacing them as best I could, I setup a security camera to watch the front steps:

I discovered a couple interesting things:

  • EvoCam has a very nifty motion sensor feature, which works very well.
  • The normal stands that come with the iSight are not well suited for being placed on a flat surface. They are geared purely towards being placed on a computer monitor.
  • When using a thick firewire cable, rather than Apple’s recommended thin firewire cable, the cut off top of a pop bottle is an effective stand for an iSight camera.
  • An empty casing for the Simpsons’ DVDs works well to boost the camera up a bit, as well as providing a nice escape route for the firewire cable.

Fortunately, the plants were undisturbed overnight. This is good, since I’m not entirely sure how well the camera would have worked in the darkness. I left the porch lights on to give some light, and the iSight is generally pretty good in the dark, but I never tested it in this security camera configuration.

2004/05/20

Plotting in Excel through Python/COM

Filed under: programming,python — admin @ 1:41 pm

For the past couple weeks, I’ve been thinking about mathematical model development. There are lots of great tools out there to help with such tasks, like Mathcad and Mathematica. But if you’re doing software development, once you’ve built and tested a model, what you really want is code. Your Mathcad files are great for documentation, testing, and development of your model, but they can’t be embedded in your Java or C++ application.

Additionally, it’s very easy to use functions from those pieces of software which can’t be easily replicated in your software. They have very optimized methods for root solving, matrix math, symbolic derivative calculations, and other such tasks that you can’t reproduce without years worth of effort. So, when you’re developing a model that’s going to be used in software, what’s the easiest way to do it?

Python, of course. Python code tends to be very legible and terse, while still being well suited for mathematical programming. In fact, the syntax and control structures even look pretty similar between Mathcad and Python:

/wsimages/python-riddler.png

Part of a root solver implemented in Python.

/wsimages/mathcad-riddler.png

Part of a root solver implemented in Mathcad.

So, it seems natural to me to use Python to develop models. My code can easily be translated into other programming languages after the fact, and it tends to have a more proper programming structure if it’s not translated from Mathcad. Plus, translating a big chunk code from Mathcad can be a real pain in the ass to do manually – there’s just too much to miss, and it can be very time consuming to confirm it’s been done correctly.

Python is not quite a silver bullet in this case, though. Mathcad has a lot of tools for visualization which are very useful when developing a new model. Python’s wonderful abillity to interoperate makes it fairly easy to leverage plotting capabilities of an application like Microsoft Excel though. So, without any further leadup, here’s a quick process which will get you up and running with an Excel plotting Python program:

  1. Get and install Python, and the PythonWin extensions (or install ActivePython, which contains all the necessary tools and more).

  2. Run the PythonWin IDE program, and generate static COM wrappers for Microsoft Excel. This is as easy as selecting the COM Makepy Utility option from the Tools menu of PythonWin, then selecting the most recent version of the Microsoft Excel n.m Object Library available:

    /wsimages/pythonwin-comwrapper.png

    Selecting COM Makepy utility from PythonWin’s menu.

    The static COM wrappers must be used in order to access the Excel constants (of which there are hundreds).

  3. Import the COM Dispatch function and the constants namespace into your application:

    
    from win32com.client import Dispatch, constants
    
  4. At this point, all that’s left is to go to town on automating Excel. There is a lot of documentation that comes along with Excel, as well as a big chunk of MSDN content that will also help. Let’s dive in, and I’ll just throw an XY scatter plot at you:

    def plot(x, y, xAxisLog=False, yAxisLog=False):
        # acquire application object, which may start application
        application = Dispatch("Excel.Application")
    
        # create new file ('Workbook' in Excel-vocabulary)
        workbook = application.Workbooks.Add()
    
        # store default worksheet object so we can delete it later
        defaultWorksheet = workbook.Worksheets(1)
    
        # build new chart (on seperate page in workbook)
        chart = workbook.Charts.Add()
        chart.ChartType = constants.xlXYScatter
        chart.Name = "Plot"
    
        # create data worksheet
        worksheet = workbook.Worksheets.Add()
        worksheet.Name = "Plot data"
    
        # install data
        xColumn = addDataColumn(worksheet, 0, x)
        yColumn = addDataColumn(worksheet, 1, y)
    
        # create series for chart
        series = chart.SeriesCollection().NewSeries()
        series.XValues = xColumn
        series.Values = yColumn
        series.Name = "Data"
        series.MarkerSize = 3
    
        # setup axises
        xAxis = chart.Axes()[0]
        yAxis = chart.Axes()[1]
        xAxis.HasMajorGridlines = True
        yAxis.HasMajorGridlines = True
        if xAxisLog:
            xAxis.ScaleType = constants.xlLogarithmic
        if yAxisLog:
            yAxis.ScaleType = constants.xlLogarithmic
    
        # remove default worksheet
        defaultWorksheet.Delete()
    
        # make stuff visible now.
        chart.Activate()
        application.Visible = True
    
    def genExcelName(row, col):
        """Translate (0,0) into "A1"."""
        if col < 26:
            colName = chr(col + ord('A'))
        else:
            colName = chr((col / 26)-1 + ord('A')) +\
                chr((col % 26) + ord('A'))
        return "%s%s" % (colName, row + 1)
    
    def addDataColumn(worksheet, columnIdx, data):
        range = worksheet.Range("%s:%s" % (
            genExcelName(0, columnIdx),
            genExcelName(len(data) - 1, columnIdx),
            ))
        for idx, cell in enumerate(range):
            cell.Value = data[idx]
        return range
    

    I suppose that at least an explanation of what this is doing would be a good thing. This program will cause Excel to be run, and a new file to be created. The new file will contain one worksheet with the contents of the x/y arrays in columns A and B. It will also contain one sheet-location chart, an XY scatter, with the x/y points plotted. The x and y arguments to plot are sequences of numbers, and the two flags indicate whether the X and Y axis should be logarithmic or linear.

  5. Some examples of using plot:

    # A simple example:
    plot( (1,2,3,4,5), (6,7,8,9,10) )
    
    # Some more data:
    x, y = [], []
    for i in range(100):
        x.append(i)
        y.append(i ** 2)
    plot(x, y)
    
    # Using log axises:
    plot(x, y, True, True)
    
  6. Remember, there are good tools available if you’re not in Windows too. gnuplot and gnuplot-py provide a pretty nice graphing environment which is probably quite a bit more capable than Excel.

2004/05/11

KMDI's Open Source Conference

Filed under: computers,programming — admin @ 2:43 pm

I’ve been attending an Open Source conference at the University of Toronto, organized by the Knowledge Media Design Institute. The conference has been extremely interesting for a number of reasons, which I will iterate here in chronological order.

I’m here mostly because of Catspaw‘s involvement in the conference. I’m also here staying with her while I attend the conference. This is a major important part of understanding why many of the odd events here have occured.

  • The day before the conference, I chatted with a volunteer by the name of Leigh. Although originally I was just listening to a conversation between her and Jason Nolan, eventually just the two of us were talking. At some point the topic of the Internet retaining information came up, and we used Catspaw’s laptop to lookup her name on google. One of the relevant google groups search result was from a newsgroup alt.dragon-net, which she claimed was a MUD she used to use.

    Well, the discussion continued on MUDs, when Leigh made the interesting claim that she had once dated two wizards from a single MOO (although she hadn’t known they were on the same MOO until much later). I dug into this and discovered she was referring to two people from MOO Canada, where she had previously had a character and used. She had even previously attended a MOO picnic. She was even a close friend of a good friend of mine, Joanna.

    This relatively random encounter was with a person who I had probably met before many years prior, and knew of myself and Catspaw (only by the name Catspaw), and was a very close friend-of-a-friend relation. Small world.

  • Sunday morning, we were treated to an amazing speech by Eben Moglen. He made a great, moving, active speach supporting Free software. It was one of the highlights of this conference. "We will win."

  • During the break between the Sunday sessions, I was standing next to the refreshment table when a voice from behind asked, "What’s with the Python shirt?" (referring to my wonderful Python t-shirt, of course). I turned around to be faced with David Ascher, one of the conference speakers and the current manager of product development at ActiveState. I had hoped to talk with him during the conference, and here was my chance. Surprised as I was though, I had nothing insightful or useful to say.

  • Sundary afternoon, it got even better. A panel on the law and politics of open source became a debate on legal issues surrounding free software and the GNU general public license. David McGowan spoke amazingly well to poke a hole in the wonderful balloon of emotion that Eben had earlier generated amoung the audience. Barry Sookman followed with a continuing and more detailed evaluation of the GPL, with specific references to Canadian copyright law.

    Eben was allowed to take the stage to comment. He replied with some very breakneck comments that blew away the earlier points of discussion.

    One of the conference attendees, while posing a question, described the situation very well. "You’re all excellent speakers, with an excellent discussion. I believed everything each of you said as you said it, which is amazing since it all conflicted with each other." (Ian D. Allen).

  • I helped Nancy Frishberg transfer her presentation slides off her laptop (which was running the Java Desktop System) and into the KMDI ePresence network server. I was happy to help, but I was very confused by a couple of points of this interaction:

    1. Why does a Sun employee need help doing such a simple seeming task using their own desktop environment? I mean, clearly she didn’t write the entire system, but I would have thought she might have the channel of input necessary to tell the software developers to make such a thing easy. It was … unintuitive, but understandable.
    2. Nancy kept a copy of the terminal session I used and put it into a StarOffice document after the process was done. I’m very curious what she planned to do with that… send it to the developers and suggest she doesn’t want to do such things ever again? Place it into documentation for the desktop system?
  • After David Ascher’s presentation during the Monday morning panel, a question was asked by a dude in a red plaid shirt. His question related to Komodo, in which he refered to it as a tool aiding the development of things like MUDzilla. This was an extremely odd comment. As one of the developers of MUDzilla, I was shocked to hear anyone else refer to our small, unmaintained, and very limited-audience piece of software.

  • After the morning panel on Monday, I got a chance to intercept David Ascher while he was leaving the conference room. I asked him whether ActiveState had any plans to develop an OSX version of their wonderful Komodo IDE. His response was, "if I had a nickel for every time I was asked that question…" to which I replied "You could have more than a nickel for every time." Anyways, his answer was that it was not as simple as it seemed to port, and market research had indicated it was not likely that many full commercial copies of the software (at $300 each) would be sold. It was not going to be profitable due to a large amount of development and a small potential return.

  • Nancy Frishberg’s presentation included a quotation from Catspaw’s recent paper on Five Fundamental Issues with Open Source Software. Cecil noticed this just shortly before her presentation began as she was zipping through her presentation on screen. We informed Catspaw and someone introduced the two of them. I find it pretty ironic that one of the presenters during this conference was quoting the volunteer organizer of the conference.

2004/04/30

Latin Hypercube Sampling

Filed under: programming — admin @ 9:44 pm

Introduction

Latin hypercube sampling (LHS) is a form of stratified sampling that can be applied to multiple variables. The method commonly used to reduce the number or runs necessary for a Monte Carlo simulation to achieve a reasonably accurate random distribution. LHS can be incorporated into an existing Monte Carlo model fairly easily, and work with variables following any analytical probability distribution.

Monte-Carlo simulations provide statistical answers to problems by performing many calculations with randomized variables, and analyzing the trends in the output data. There are many resources available describing Monte-Carlo (history, examples, software).

The concept behind LHS is not overly complex. Variables are sampled using a even sampling method, and then randomly combined sets of those variables are used for one calculation of the target function.

The sampling algorithm ensures that the distribution function is sampled evenly, but still with the same probability trend. Figure 1 and figure 2 demonstrate the difference between a pure random sampling and a stratified sampling of a log-normal distribution. (These figures were generated using different versions of the same software. Differences within the plot, such as the left axis label and the black lines, are due to ongoing development of the software application and are not related to the issue being demonstrated.)

stratified-cfp.png

Figure 1. A cumulative frequency plot of "recovery factor", which was log-normally distributed with a mean of 60% and a standard deviation of 5%. 500 samples were taken using the stratified sampling method described here, which generated a very smooth curve.

random-cfp.png

Figure 2. A cumulative frequency plot of "recovery factor", which was log-normally distributed with a mean of 60% and a standard deviation of 5%. 500 random samples were taken.

Process

Sampling

To perform the stratified sampling, the cumulative probability (100%) is divided into segments, one for each iteration of the Monte Carlo simulation. A probability is randomly picked within each segment using a uniform distribution, and then mapped to the correct representative value in of the variable’s actual distribution.

A simulation with 500 iterations would split the probability into 500 segments, each representing 0.2% of the total distribution. For the first segment, a number would be chosen between 0.0% and 0.2%. For the second segment, a number would be chosen between 0.2% and 0.4%. This number would be used to calculate the actual variable value based upon its distribution.

P(X <= x) = n is solved for x, where n is the random point in the segment. How this is done is different for every distribution, but it’s generally just a matter of reversing the process of the probability function.

Grouping

Once each variable has been sampled using this method, a random grouping of variables is selected for each Monte Carlo calculation. Independant uniform selection is done on each of the variable’s generated values. Each value must only be used once.

Example

Average Joe has been offered free wood scraps from the local lumber yard. They tend to cut pieces up for customers and always have leftovers that nobody is interested in because the ends are damaged, or other contrived reasons. Joe wants to build U shaped picture frames out of these pieces, and he’s interested in knowing how probable it is that he can build a picture frame big enough for some strange purpose.

Joe’s odd U shaped picture frames require three pieces of wood. A base, and two sides. He’s going to make the base out of a chunk of 2-by-4, and the sides out of baseboards. The 2-by-4 (our first variable) comes in pieces 4cm to 40cm long, and the baseboards come in pieces 2cm to 60cm long. All possibilities are equally likely (uniform distribution).

We’re going to run this simulation with 10 iterations. (A common practice for Monte Carlo simulations is to start with about a thousand iterations, and work upwards depending upon the complexity of the problem.) We’re also going to use Python to do our simulation.

Sampling:

from __future__ import division
import random
iterations = 10
segmentSize = 1 / iterations
for i in range(iterations):
    segmentMin = i * segmentSize
    #segmentMax = (i+1) * segmentSize
    point = segmentMin + (random.random() * segmentSize)
    pointValue = (point * (variableMax - variableMin)) + variableMin

For our 2-by-4s, this might provide values like this:

Sample Segment Min Segment Max Segment Point Value
0 0% 10% 2.132 % 4.767
1 10% 20% 13.837 % 8.981
2 20% 30% 24.227 % 12.722
3 30% 40% 33.716 % 16.137
4 40% 50% 47.659 % 21.157
5 50% 60% 53.636 % 23.308
6 60% 70% 69.006 % 28.842
7 70% 80% 73.106 % 30.318
8 80% 90% 88.888 % 35.999
9 90% 100% 90.812 % 36.692

Once Joe has all three runs of the sampling for each variable, he’ll have a set of output like this:

Sample Variable 1 Variable 2 Variable 3
0 4.767 7.483 4.719
1 8.981 7.731 9.417
2 12.722 13.150 11.852
3 16.137 18.347 15.136
4 21.157 21.233 21.786
5 23.308 23.382 22.630
6 28.842 26.008 26.180
7 30.318 32.452 31.977
8 35.999 36.129 34.198
9 36.692 37.482 39.968

As you can see, even though both variable 2 and variable 3 have the same distribution (they’re the baseboards), the stratified sampling does not cause them to have the same numbers. It just causes a more evenly distributed set of numbers, eliminating the possibility of all 10 samples being the exact same number. In Monte Carlo simulation, we want the entire distribution to be used evenly. We usually use a large number of samples to reduce actual randomness, but the latin hypercube sampling permits us to get the ideal randomness without so much of a calculation.

The only step left is to shuffle these variables around to create some calculations. Joe’s software might randomly pick iteration 9 for variable 1, iteration 3 for variable 2, and iteration 5 for variable 3. This results in a set of (35.999, 13.150, 21.786) for the sizes of the 2-by-4s and the left and right baseboard. (Joe’s got a lopsized U! Not entirely unexpected. He should have been the guy buying the lumber first, the cheap bastard.)

Each number should be discarded after it is used, and others selected for the next calculation.

An alternative at this point is to run one case with each of the possible variable sets, creating 1000 calculations for these simple 10 variables. For complex cases involving many variables and many samplings, this can balloon to unreasonable numbers of calculations quickly. This eliminates all of the benefits that Monte Carlo simulation provides, while still providing approximately the same answer.

Conclusion

Latin hypercube sampling is capable of reducing the number of runs necessary to stablize a Monte Carlo simulation by a huge factor. Some simulations may take up to 30% fewer calculations to create a smooth distribution of outputs. The process is quick, simple, and easy to implement. It helps ensure that the Monte Carlo simulation is run over the entire length of the variable distributions, taking even unlikely extremities into account as one would desire.

2004/04/08

The MutableNumber Class Revisited

Filed under: programming,python — admin @ 5:04 pm

Ian Bicking’s comment suggesting I wasn’t far away from getting the MutableNumber class to work was quite correct. I had never considered using a descriptor, which made it difficult to make this work with a metaclass. The following code, though, does work:

class Operator(object):
    def __init__(self, instance, op):
        self.instance = instance
        self.op = op
    def __call__(self, rhs):
        try:
            return self.op(self.instance.number, rhs.number)
        except AttributeError:
            return self.op(self.instance.number, rhs)

class OperatorDescriptor(object):
    def __init__(self, op):
        self.op = op
    def __get__(self, obj, type=None):
        return Operator(obj, self.op)
    def __set__(self, obj, value):
        raise TypeError
    def __delete__(self, obj):
        raise TypeError

class NumberDelegateMetaclass(type):
    def __init__(cls, name, bases, dct):
        for attr in cls.__operatormap__:
            setattr(cls, attr, OperatorDescriptor(getattr(operator, attr)))

class MutableNumber(object):
    number = None

    __metaclass__ = NumberDelegateMetaclass
    __operatormap__ = "__add__", "__div__", "__eq__", "__floordiv__", "__ge__", \
        "__gt__", "__inv__", "__invert__", "__le__", "__lshift__", "__lt__", \
        "__mod__", "__mul__", "__ne__", "__neg__", "__pos__", "__pow__", \
        "__rshift__", "__sub__", "__truediv__"
    def __init__(self, value=None):
        self.number = value

It’s a miracle.

Overriding Operators in New-Style Classes

Filed under: programming,python — admin @ 3:38 pm

Occasionally, even Python violates the principal of ‘least surprise’ – that is, software should do what you expect it to do. Overriding mathematics operators in new-style classes led me to find one such violation, and investigate the reason behind it.

I setup myself a make-work project a few days ago when I probably should have done a dozen other things. After reading a posting to python-dev entitled Python is faster than C, I was motivated to create a pure-python implementation of a matrix class. I was interested in building it primarily as a refresher of matrix mathematics, and also to experiment with psyco‘s x86 optimization capabilities.

I ran into a quirk of new-style classes. I envisioned the Matrix class having two hidden data fields, one to hold a tuple of Vector objects representing each row, and a second one to hold a tuple of Vector objects representing each column. This would allow for easy access to the data based upon rows or columns, whichever was more convenient, with no speed penalty. Each tuple of Vectors ought to point to the same numbers, though. How does one do this with Python?

I’d like self.__cdata[0][0] and self.__rdata[0][0] to always be the same number, even if I only change one. I could manage this with accessor methods that make sure they’re the same, but I decided to wrap each number in an object called a MutableNumber. The class was setup like so:

class MutableNumber(object):
    def __init__(self, value = None):
        self.number = value

Every mathematical operator should have then been overridden like this:

...
    def __mul__(self, rhs):
        try:
            return MutableNumber(self.number * rhs.number)
        except AttributeError:
            return MutableNumber(self.number * rhs)
...

What a pain to type and maintain, though. What if I wanted to add another possibility in the future, or change the operator to not return a new MutableNumber object wrapping the number? There are twenty-two operators I wanted to override (eg. __mul__, __eq__, __div__, __floordiv__…), and I didn’t want to have to type them, even once.

I originally setup a meta-class to delegate every access to a property in the class’s __delegateattributes__ member to self.number. This had some issues:

>>> MutableNumber(2) * 3
6
>>> MutableNumber(2) * MutableNumber(3)
Traceback (most recent call last):
  ...
TypeError: unsupported operand types for *: 'int' and 'MutableNumber'

I had to go a more complex route, rather than just pure delegation. I eventually replaced all the operators with custom instances of an Operator class, which held an operator and a reference to the instance of the class:

class MutableNumber(object):
    __operatormap__ = "__add__", "__div__", "__eq__", "__floordiv__", "__ge__", \
        "__gt__", "__inv__", "__invert__", "__le__", "__lshift__", "__lt__", \
        "__mod__", "__mul__", "__ne__", "__neg__", "__pos__", "__pow__", \
        "__rshift__", "__sub__", "__truediv__"
    class Operator(object):
        def __init__(self, instance, op):
            self.instance = instance
            self.op = op
        def __call__(self, rhs):
            op = getattr(operator, self.op)
            try:
                return op(self.instance.number, rhs.number)
            except AttributeError:
                return op(self.instance.number, rhs)

    def __init__(self, value=None):
        for op in self.__operatormap__:
            setattr(self, op, MutableNumber.Operator(self, op))
        self.number = value

So, we finally reach the part where it gets interesting. What happens? Does this work properly? The code seems good, but it fails to function:

>>> MutableNumber(3) * 2
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for *: 'MutableNumber' and 'int'

Unsupported operand for types? It looks like it’s totally ignoring the fact that the MutableNumber instance has a __mul__ property set. Hm… stranger still:

>>> MutableNumber(3).__mul__(2)
6
>>> MutableNumber(3).__class__.__mul__(2)
Traceback (most recent call last):
  ...
AttributeError: type object 'MutableNumber' has no attribute '__mul__'

Well, it seems that A * B does not call A.__mul__(B). It searches for A.__class__.__mul__, but not A.__mul__. It works fine with an old-style class, but not a new-style class.

It turns out this behaviour has been discovered before. In fact, Jan Decaluwe was trying to do the same thing I am, create an instance object that acts like a mutable number. Alex Martelli wrote:

The idea is that operators should rely on special
methods defined by the TYPE (class) of the object they’re working
on, NOT on special methods defined just by the OBJECT itself and
not by its class. Doing otherwise (as old-style classes did, and
still do for compatibility) is untenable in the general case, for
example whenever you consider a class and its metaclass (given that
the metaclass IS the class object’s type, no more and no less).

Well, too bad. I guess maybe I’ll just backtrack all the way to using accessor methods to make sure the column and row data is both kept constant… or just store one of them.

2004/03/30

Embedding Python Tips

Filed under: moo,programming,python — admin @ 2:46 am

Python is a beautiful programming language. One of it’s most wonderful features is a very clean and simple C API that allows Python to be extended with dynamically loadable C modules. That same C API also allows Python to be embedded in other pieces of software. This means that any program can allow the user to enter Python code interactively (or otherwise) to affect the program in whatever way they wish. This is a powerful capability, but using occasionally requires a few tricks to accomplish the embedder’s goals.

Today’s embedding exercise was allowing a MOO server to execute arbitrary Python code:

;py_runfunction({"import math", "return math.sqrt(2*2*2)"})
> 2.8284271247461903

Of course, a MOO server can already do square roots… that wasn’t the point. There was no point. Anyways, here are a few ideas that might help other people embed Python in a useful way.

Evaluating Statements

One of the first things most people try to do is evaluate an arbitrary statement and get its return value. This is not quite as easy as it sounds. Although Python’s eval builtin does this, it may be more limited than the embedding programmer desires. eval will only permit an expression to be evaluated, not a statement:

>>> eval("x = 2")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1
    x = 2
      ^
SyntaxError: invalid syntax

I suggest that if you want the user to be able to evaluate an arbitrary block of code, wrap an artificial function around it and call the function itself:

def f():
    import math
    class Cylinder:
        def _calcVolume(self):
            return math.pi * \
                self.radius**2 * \
                self.height
        volume = property(_calcVolume)
    c = Cylinder()
    c.radius = 12.2
    c.height = 16.12
    return c.volume

This allows the user to input much more complex functions, like the above example which uses a class and an import statement. All that needed to be artificially added was the ‘def f():’ and an arbitrary but constant amount of whitespace in front of each line of code.

Compiling Code without a Module

So you’ve gotten some code from a user, and you want to compile it. Maybe you’re creating a function to wrap around the user’s code. Where does that function belong? Where do you evaluate your code?

The first instinct I had was to use PyImport_AddModule to get the __main__ module and start importing functions into its module dictionary. I had a block of code similar to this (error checking omitted):

Py_Initialize();
PyObject* module = PyImport_AddModule("__main__");
PyObject* moduleDict = PyModule_GetDict(module);
PyObject* compileRetval = PyRun_String(code, Py_file_input,
    moduleDict, moduleDict);
...
Py_Finalize();

This then allowed me to call functions on the module object and get some code back. The only real downside was the initialize and finalize around my code. I didn’t want code from one compile to mess with another, and since I was using the __main__ module, this caused problems. Eventually I decided to use random strings as the names for my modules so that I could use them all independently, but that sure was ugly.

The solution I stumbled upon was caused by my accidently deleting some lines of code. I eventually realized that I didn’t need the module object at all. I could create a new, empty dictionary, and compile the code ‘into’ that:

PyObject* dict = PyDict_New();
PyObject* compileRetval = PyRun_String(code, Py_file_input,
    dict, dict);

Everything continued to work as before, except now I had to PyDict_GetItem out of dict and use PyObject_CallObject rather than the PyObject_CallMethod that I could have used before. But nothing crashed, the world continued to run, and I no longer needed to initialize and finalize around my evaluation. Yay!

Settings __builtins__

There was one minor problem. Some functionality like builtin functions and classes (like Exception) was missing. Oops:

// Check for __builtins__...
if (PyDict_GetItemString(dict, "__builtins__") == NULL)
{
    // Hm... no __builtins__ eh?
    PyObject* builtinMod = PyImport_ImportModule("__builtin__");
    if (builtinMod == NULL ||
        PyDict_SetItemString(dict, "__builtins__", builtinMod) != 0)
    {
        Py_DECREF(dict);
        Py_XDECREF(dict);
        // error handling
        return;
    }
    Py_DECREF(builtinMod);
}

Hey, that fixed that right up.

I had this problem when I was using random names for modules, as well. It seems PyImport_AddModule does not set __builtins__ on a new module — but it is set up on __main__ always.

Getting Tracebacks using the traceback Module

What happened when things went wrong? Well, at first, a lot of crashing. And things were going wrong a lot, especially when I was trying to use modules that didn’t exist in the system. Heh heh.

Thankfully, Python will setup tracebacks that are useful even when you’re using the C API and screwing things up from the inside. How on earth do you get at those tracebacks, though? You can get a lot of information from the PyErr_* class of functions, but not a properly formatted Python traceback to display to the user. Eventually, I ended up using the traceback module itself to display an error:

char* getPythonTraceback()
{
    // Python equivilant:
    // import traceback, sys
    // return "".join(traceback.format_exception(sys.exc_type,
    //    sys.exc_value, sys.exc_traceback))

    PyObject *type, *value, *traceback;
    PyObject *tracebackModule;
    char *chrRetval;

    PyErr_Fetch(&type, &value, &traceback);

    tracebackModule = PyImport_ImportModule("traceback");
    if (tracebackModule != NULL)
    {
        PyObject *tbList, *emptyString, *strRetval;

        tbList = PyObject_CallMethod(
            tracebackModule,
            "format_exception",
            "OOO",
            type,
            value == NULL ? Py_None : value,
            traceback == NULL ? Py_None : traceback);

        emptyString = PyString_FromString("");
        strRetval = PyObject_CallMethod(emptyString, "join",
            "O", tbList);

        chrRetval = strdup(PyString_AsString(strRetval));

        Py_DECREF(tbList);
        Py_DECREF(emptyString);
        Py_DECREF(strRetval);
        Py_DECREF(tracebackModule);
    }
    else
    {
        chrRetval = strdup("Unable to import traceback module.");
    }

    Py_DECREF(type);
    Py_XDECREF(value);
    Py_XDECREF(traceback);

    return chrRetval;
}

Of course, when one can’t import the traceback module, one can’t generate a traceback explaining why not. :-)

2004/03/16

The Worst C++ Code Ever

Filed under: c++,programming — admin @ 3:51 pm

Yesterday morning, I overheard a couple co-workers and Cecil talking about problems they were having compiling their software this morning. I popped over to Cecil’s desk to take a look, since I like seeing messed up problems. It took about three seconds flat to diagnose the problem:

/wsimages/m_Symbol.png

It looks like someone has #define’d m_Symbol.

Someone had defined m_Symbol as a preprocessor macro. Why on earth would that be? A bit more digging determined that a co-worker had decided to move a bunch of variables in a library class into a small class, to consolidate them. Unfortunately, this class was written with all of these member variables as public, so he decided that writing a bunch of preprocessor macros would be the only way to allow everyone to keep working despite his changes:

#define m_Symbol    m_Sym.m_Type
#define m_SymSize   m_Sym.m_Size
...

He tested it by building one of our larger applications, and then assumed if it worked there it would work everywhere. This was bad code.

The correct possible solutions involved more work:

  • Re-write everyone’s code to use the new classes.
  • Use accessor methods and keep data private so that it can be rearranged in the future.

Of course, both of these would involve a lot of code re-writing. An automated semi-intelligent find and replace could take care of at least 80% of the cases, but some hand writing would need to be done. Since it involved more work, he didn’t do it. He plans to someday fix this when he has a spare programmer lying around.

BAH.

Coding everything in Python would be the best alternative.

« Newer PostsOlder Posts »

Powered by WordPress