March 2004 archive

Embedding Python Tips

Python is a beautiful programming language. One of its most wonderful features is a very clean and simple C API that allows Python to be extended with dynamically loadable C modules. That same C API also allows Python to be embedded in other pieces of software. This means that any program can allow the user to enter Python code interactively (or otherwise) to affect the program in whatever way they wish. This is a powerful capability, but using it occasionally requires a few tricks to accomplish the embedder’s goals.

Today’s embedding exercise was allowing a MOO server to execute arbitrary Python code:

;py_runfunction({"import math", "return math.sqrt(2*2*2)"})
> 2.8284271247461903

Of course, a MOO server can already do square roots… that wasn’t the point. There was no point. Anyway, here are a few ideas that might help other people embed Python in a useful way.

Evaluating Statements

One of the first things most people try to do is evaluate an arbitrary piece of code and get its result. This is not quite as easy as it sounds. Python’s eval builtin does this, but it may be more limited than the embedding programmer wants: eval only accepts an expression, not a statement such as an assignment:

>>> eval("x = 2")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1
    x = 2
      ^
SyntaxError: invalid syntax

If you want the user to be able to evaluate an arbitrary block of code, I suggest wrapping an artificial function around it and calling that function:


def f():
    import math
    class Cylinder:
        def _calcVolume(self):
            return math.pi * \
                self.radius**2 * \
                self.height
        volume = property(_calcVolume)
    c = Cylinder()
    c.radius = 12.2
    c.height = 16.12
    return c.volume

This allows the user to input much more complex code, like the example above, which uses a class and an import statement. All that has to be added artificially is the ‘def f():’ line and a constant (but otherwise arbitrary) amount of whitespace in front of each line of code.
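The wrapping step can also be sketched in Python itself. This is a minimal Python 3 sketch, not the MOO server's actual code. The helper name run_function and the wrapper name _wrapped are hypothetical, loosely modeled on the py_runfunction call above. The helper indents every user line by the same amount, puts a def line in front, executes the result in a fresh dictionary, and calls the function to get its return value.

```python
import textwrap


def run_function(lines, indent="    "):
    """Wrap user-supplied lines in an artificial function and call it.

    Hypothetical helper: the wrapper is named _wrapped and is defined in
    a brand-new dictionary, so nothing leaks between calls.
    """
    body = "\n".join(lines)
    # Put a constant amount of whitespace in front of every line.
    source = "def _wrapped():\n" + textwrap.indent(body, indent) + "\n"
    namespace = {}
    exec(compile(source, "<user code>", "exec"), namespace)
    return namespace["_wrapped"]()


print(run_function(["import math", "return math.sqrt(2*2*2)"]))
# prints 2.8284271247461903
```

The user's code has to end with a return statement to hand a value back. Blank lines are left unindented by textwrap.indent, and Python accepts them inside a function body.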

Compiling Code without a Module

So you’ve gotten some code from a user, and you want to compile it. Maybe you’re creating a function to wrap around the user’s code. Where does that function belong? Where do you evaluate your code?

My first instinct was to use PyImport_AddModule to get the __main__ module and start defining functions in its module dictionary. I had a block of code similar to this (error checking omitted):


Py_Initialize();
PyObject* module = PyImport_AddModule("__main__");
PyObject* moduleDict = PyModule_GetDict(module);
PyObject* compileRetval = PyRun_String(code, Py_file_input,
    moduleDict, moduleDict);
...
Py_Finalize();

This allowed me to call functions on the module object and get results back. The only real downside was having to initialize and finalize around my code. I didn’t want code from one compile to interfere with another, and because everything shared the __main__ module, it did. Eventually I decided to use random strings as module names so that each compile got its own independent module, but that sure was ugly.

I stumbled on the solution after accidentally deleting some lines of code. I realized that I didn’t need the module object at all. I could create a new, empty dictionary and compile the code ‘into’ that:


PyObject* dict = PyDict_New();
PyObject* compileRetval = PyRun_String(code, Py_file_input,
    dict, dict);

Everything continued to work as before. The only difference was that I now had to fetch functions out of dict with PyDict_GetItem and call them with PyObject_CallObject, where before I could use PyObject_CallMethod on the module. But nothing crashed, the world continued to run, and I no longer needed to initialize and finalize around each evaluation. Yay!
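Here is the same pattern from the Python side. This is a minimal Python 3 sketch: exec() stands in for PyRun_String, and a plain dict stands in for PyDict_New(). The helper name compile_into_fresh_dict is hypothetical. Each call gets its own dictionary, so definitions from one evaluation never appear in another.

```python
def compile_into_fresh_dict(code):
    """Run code in a brand-new dictionary (like PyDict_New + PyRun_String)."""
    namespace = {}
    # Same dictionary for globals and locals, as in PyRun_String(..., dict, dict).
    exec(code, namespace, namespace)
    return namespace


first = compile_into_fresh_dict("def f():\n    return 'first'\n")
second = compile_into_fresh_dict("def g():\n    return 'second'\n")

# Look the function up in the dictionary (the PyDict_GetItem step) and call
# it directly (the PyObject_CallObject step); there is no module involved.
print(first["f"]())    # prints first
print("f" in second)   # prints False: the two evaluations are isolated
```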

Setting __builtins__

There was one minor problem. Some functionality was missing, such as builtin functions and builtin classes like Exception. Oops:


// Check for __builtins__...
if (PyDict_GetItemString(dict, "__builtins__") == NULL)
{
    // Hm... no __builtins__ eh? Import the builtin module
    // (called "builtins" in Python 3) and install it.
    PyObject* builtinMod = PyImport_ImportModule("__builtin__");
    if (builtinMod == NULL ||
        PyDict_SetItemString(dict, "__builtins__", builtinMod) != 0)
    {
        Py_DECREF(dict);
        Py_XDECREF(builtinMod);
        // error handling
        return;
    }
    // PyDict_SetItemString keeps its own reference, so drop ours.
    Py_DECREF(builtinMod);
}

Hey, that fixed that right up.

I had the same problem when I was using random names for modules. It seems PyImport_AddModule does not set __builtins__ on a new module, while __main__ always has it set up.
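The effect is easy to reproduce in Python 3, where the builtin module is called builtins instead of __builtin__. One caveat: Python 3's exec() automatically inserts the builtins namespace when the globals dictionary has no __builtins__ key. This sketch therefore simulates the missing case with an empty mapping, which is an assumption for illustration rather than what the C API does.

```python
import builtins

user_code = "result = len(str(Exception('oops')))"

# Simulate missing builtins with an empty mapping: names such as len, str
# and Exception can no longer be resolved.
bare = {"__builtins__": {}}
try:
    exec(user_code, bare)
except NameError as err:
    print("without builtins:", err)

# Install the builtins module under __builtins__, which is what the C code
# above does with PyDict_SetItemString.
fixed = {"__builtins__": builtins}
exec(user_code, fixed)
print("with builtins:", fixed["result"])  # prints with builtins: 4
```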

Getting Tracebacks using the traceback Module

What happened when things went wrong? Well, at first, a lot of crashing. And things were going wrong a lot, especially when I was trying to use modules that didn’t exist in the system. Heh heh.

Thankfully, Python sets up tracebacks that are useful even when you’re using the C API and screwing things up from the inside. How on earth do you get at those tracebacks, though? The PyErr_* family of functions gives you a lot of information, but not a properly formatted Python traceback to display to the user. Eventually, I ended up using the traceback module itself to format the error:


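#include <Python.h>
#include <string.h>

// Returns the current Python exception as a formatted traceback in a
// malloc'd string (the caller must free() it). Clears the exception.
char* getPythonTraceback()
{
    // Python equivalent:
    // import traceback, sys
    // return "".join(traceback.format_exception(sys.exc_type,
    //    sys.exc_value, sys.exc_traceback))

    PyObject *type, *value, *traceback;
    PyObject *tracebackModule;
    char *chrRetval;

    PyErr_Fetch(&type, &value, &traceback);
    if (type == NULL)
        return strdup("No Python exception is set.");
    // Turn a lazily created exception into a proper instance.
    PyErr_NormalizeException(&type, &value, &traceback);

    tracebackModule = PyImport_ImportModule("traceback");
    if (tracebackModule != NULL)
    {
        PyObject *tbList, *emptyString, *strRetval = NULL;
        const char *text = NULL;

        tbList = PyObject_CallMethod(
            tracebackModule,
            "format_exception",
            "OOO",
            type,
            value == NULL ? Py_None : value,
            traceback == NULL ? Py_None : traceback);

        emptyString = PyString_FromString("");
        if (tbList != NULL && emptyString != NULL)
            strRetval = PyObject_CallMethod(emptyString, "join",
                "O", tbList);
        if (strRetval != NULL)
            text = PyString_AsString(strRetval);

        // Copy the text before strRetval is released below.
        chrRetval = strdup(text != NULL ? text
                                        : "Unable to format traceback.");

        Py_XDECREF(tbList);
        Py_XDECREF(emptyString);
        Py_XDECREF(strRetval);
        Py_DECREF(tracebackModule);
    }
    else
    {
        chrRetval = strdup("Unable to import traceback module.");
    }

    // Discard any error raised while formatting, then release the
    // references handed to us by PyErr_Fetch.
    PyErr_Clear();
    Py_DECREF(type);
    Py_XDECREF(value);
    Py_XDECREF(traceback);

    return chrRetval;
}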

Of course, when one can’t import the traceback module, one can’t generate a traceback explaining why not. :-)
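Inside Python, the whole C function reduces to a few lines. This is a minimal Python 3 sketch. It uses sys.exc_info() in place of the sys.exc_type, sys.exc_value and sys.exc_traceback attributes, which Python 3 removed. The name get_python_traceback is simply a Python spelling of the C helper above.

```python
import sys
import traceback


def get_python_traceback():
    """Return the exception currently being handled, formatted as Python prints it."""
    exc_type, exc_value, exc_tb = sys.exc_info()
    if exc_type is None:
        return "No exception is being handled."
    return "".join(traceback.format_exception(exc_type, exc_value, exc_tb))


try:
    import module_that_does_not_exist  # provoke an ImportError on purpose
except ImportError:
    text = get_python_traceback()

print(text)
```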

The Worst C++ Code Ever

Yesterday morning, I overheard a couple of co-workers and Cecil talking about problems they were having compiling their software. I popped over to Cecil’s desk to take a look, since I like seeing messed-up problems. It took about three seconds flat to diagnose the problem:

[Figure (/wsimages/m_Symbol.png): It looks like someone has #define’d m_Symbol.]

Someone had defined m_Symbol as a preprocessor macro. Why on earth would anyone do that? A bit more digging turned up the reason. A co-worker had decided to consolidate a bunch of variables from a library class by moving them into a small class of their own. Unfortunately, the original class exposed all of these member variables publicly, so everyone’s code accessed them directly. He decided that a bunch of preprocessor macros was the only way to let everyone keep working despite his changes:

#define m_Symbol    m_Sym.m_Type
#define m_SymSize   m_Sym.m_Size
...

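He tested it by building one of our larger applications, and then assumed that if it worked there, it would work everywhere. It didn’t. The preprocessor rewrites every occurrence of the identifier m_Symbol, so any other code that happened to use that name (for instance, an unrelated class with its own m_Symbol member) broke. This was bad code.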

The correct solutions involved more work:

  • Re-write everyone’s code to use the new classes.
  • Use accessor methods and keep data private so that it can be rearranged in the future.

Either one would mean rewriting a lot of code. An automated, semi-intelligent find and replace could take care of at least 80% of the cases, but the rest would have to be fixed by hand. Since that meant more work, he didn’t do it. He plans to fix this someday, when he has a spare programmer lying around.

BAH.

Coding everything in Python would be the best alternative.

Distributing a Python Embedding Program

I got a bit bored at my job yesterday. This happens often, and usually a new random piece of software emerges as a result. This time I ended up hacking on a piece of software that had previously emerged from boredom, which I called the "Difference Machine". I added support for plot datasets generated on the fly by Python code, and in the end I learned how to distribute Python with my application.

The purpose of the Difference Machine is very simple: I dislike MS Excel, and wish never to have to use it again. In a job involving a lot of engineering, though, Excel is pretty much standard fare for comparing datasets and the like. The Difference Machine imports plots generated by the software we develop so they can be compared against each other. It can also import text from a CSV file or from the clipboard.

[Figure (differencemachine-plot.png): The Difference Machine comparing two gas supercompressibility (z-factor) correlations.]

Adding the Python embedding was easy. I’ve done similar things before, and the Python C API is pretty straightforward (although necessarily verbose). The real fun began when it came time to put this application on the network for co-workers to access.

[Figure (differencemachine-edit.png): The Difference Machine editing Python code with a crude edit control.]

My first attempt was simple: copy the application onto the network and include python23.dll in the application’s directory. Behold! It actually worked. But there were a few minor glitches…

  • No access to the standard library. Modules compiled into python23.dll, such as sys and math, could be imported, but import csv and everything else failed.
  • No standard library also meant no traceback module, which I had depended upon to format tracebacks when things went wrong. The software did not react well to the missing module.
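A quick way to see this split is to ask the interpreter where each module would come from. This is a minimal Python 3 sketch. describe_module is a hypothetical helper, and importlib.util.find_spec did not exist back in Python 2.3. Modules listed in sys.builtin_module_names are compiled into the interpreter itself. Everything else has to be found on sys.path, either as a .py file or as an extension module. Which modules are built in varies by platform and build.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where this interpreter would load a top-level module from."""
    if name in sys.builtin_module_names:
        return "built into the interpreter"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    # For modules found on sys.path this is a file path (or "frozen").
    return spec.origin


for name in ("sys", "math", "csv", "_sre", "no_such_module"):
    print(name, "->", describe_module(name))
```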

I needed the standard library. I tried copying it into a Lib directory in my application’s directory, and this worked okay… except that it was a huge number of files, and it was still missing the extension modules (dynamically linked libraries such as _sre.pyd).

I consulted PEP 273, Import Modules from Zip Archives, to try to create a zip file of the entire standard library that would be much easier to manage than my Lib directory. With the zip file named python23.zip and placed in the same directory as my python23.dll, Python should have been able to access the library. But it still failed, because it couldn’t load any extension modules, and zlib is needed to open the zip file.
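You can try the zip mechanism itself from Python. This is a minimal Python 3 sketch. The archive name stdlib_subset.zip and the module name zipped_greeting are hypothetical stand-ins for python23.zip and the standard library. Once the archive is on sys.path, zipimport loads pure-Python modules straight out of it. Extension modules are another matter: zipimport only handles .py and .pyc files, and it refuses to load .pyd or .so files from an archive.

```python
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a tiny stand-in for python23.zip with one pure-Python module.
# ZIP_DEFLATED compression relies on zlib, just like reading such an archive.
tmpdir = Path(tempfile.mkdtemp())
archive = tmpdir / "stdlib_subset.zip"
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zipped_greeting.py",
                "def hello():\n    return 'hello from a zip'\n")

# Putting the archive on sys.path lets zipimport find modules inside it.
sys.path.insert(0, str(archive))
import zipped_greeting

print(zipped_greeting.hello())   # prints hello from a zip
print(zipped_greeting.__file__)  # a path inside stdlib_subset.zip
```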

By putting all the .pyd files into my application’s directory, I finally got it to work. Python loads zlib to read python23.zip, and from there it retrieves the rest of the standard library. Yippee! In the end, the directory contains the following files:

DifferenceMachine.exe
... bunch of my dlls ...
python23.dll
python23.zip
zlib.pyd
_csv.pyd
_sre.pyd
... bunch more pyd files ...

And all is good and happy. I’m one small step closer to being Excel-less.