Mathieu Fenniak's Weblog

2006/06/04

“import zlib” vs. .NET Framework

Filed under: pdf,programming,python — admin @ 9:44 pm

During my current period of unemployment, I’ve been preparing for some contract development work that I expect to be doing in the near future. Inspired by the article series on IronPython and .NET GUI development over at The Voidspace Techie Blog, I’ve been looking into what kinds of development struggles I might face using IronPython and .NET as a platform. To that end, I began to look at making pyPdf work under IronPython.

The first struggle I encountered was that the “zlib” module was not available in IronPython. “No problem,” I think to myself. “There’s got to be access to a DEFLATE library through .NET, somehow.”

“Yes, younger-self,” my older-self now says. “There is a .NET way to do this, but apparently it requires an annoyingly large amount of code.”

Here’s the original Python code that was used to implement the FlateDecode streams in pyPdf:

import zlib
def decompress(data):
    return zlib.decompress(data)
def compress(data):
    return zlib.compress(data)

Okay, that was simple and straightforward. Here’s the IronPython solution (note, if you have suggestions to make this shorter, please do let me know):

import System
from System import IO, Collections, Array
def _string_to_bytearr(buf):
    retval = Array.CreateInstance(System.Byte, len(buf))
    for i in range(len(buf)):
        retval[i] = ord(buf[i])
    return retval
def _bytearr_to_string(bytes):
    retval = ""
    for i in range(bytes.Length):
        retval += chr(bytes[i])
    return retval
def _read_bytes(stream):
    ms = IO.MemoryStream()
    buf = Array.CreateInstance(System.Byte, 2048)
    while True:
        bytes = stream.Read(buf, 0, buf.Length)
        if bytes == 0:
            break
        else:
            ms.Write(buf, 0, bytes)
    retval = ms.ToArray()
    ms.Close()
    return retval
def decompress(data):
    bytes = _string_to_bytearr(data)
    ms = IO.MemoryStream()
    ms.Write(bytes, 0, bytes.Length)
    ms.Position = 0  # fseek 0
    gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Decompress)
    bytes = _read_bytes(gz)
    retval = _bytearr_to_string(bytes)
    gz.Close()
    return retval
def compress(data):
    bytes = _string_to_bytearr(data)
    ms = IO.MemoryStream()
    gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Compress, True)
    gz.Write(bytes, 0, bytes.Length)
    gz.Close()
    ms.Position = 0 # fseek 0
    bytes = ms.ToArray()
    retval = _bytearr_to_string(bytes)
    ms.Close()
    return retval

Basically, the code grew in length for a few reasons. First of all, the original compress and decompress functions took string arguments, but those strings were really being used as arrays of bytes. In .NET, there is a clear difference between an array of bytes and a string, so conversion functions were necessary to turn a string into a byte array and back again. I actually like this, because it forces you to encode and decode strings whenever you use them, making you aware of their Unicode nature (something CPython largely lets you ignore).

The other added complexity was the use of streams, rather than just basic functions that can be called. A nice object-oriented stream library is actually quite flexible and powerful, but as you can see it can make things a little more verbose. But, you know… both have their advantages.

Finally, I had to write a function just to read an entire stream into a byte array. MemoryStream has a simple “ToArray()” function on it; I wish this were standard on all Stream objects. Regardless, this function only needs to be written once and can be used for many different purposes. So it isn’t really adding to the length of the deflate encoding; it belongs in my toolbox somewhere else. Note that my implementation is fairly wasteful of memory, but it is a simple approach that will not fail if Read returns partial buffers, or anything like that.
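One detail worth checking when porting: as far as I know, .NET’s DeflateStream reads and writes raw DEFLATE data (RFC 1951), while zlib.compress, zlib.decompress and the PDF FlateDecode filter use the zlib format (RFC 1950). That format is a 2-byte header, the DEFLATE data, and a 4-byte big-endian Adler-32 checksum of the uncompressed bytes. The CPython sketch below (the helper names are hypothetical) shows the wrapping a DeflateStream-based port would likely have to add when compressing and strip when decompressing:

```python
import struct
import zlib

def wrap_raw_deflate(raw, original):
    """Wrap raw DEFLATE bytes (RFC 1951) in a zlib container (RFC 1950).

    raw      -- compressed bytes with no header or trailer
    original -- the uncompressed bytes, needed for the Adler-32 trailer
    """
    header = b"\x78\x9c"  # CMF/FLG: deflate, 32K window, no preset dictionary
    trailer = struct.pack(">I", zlib.adler32(original) & 0xFFFFFFFF)
    return header + raw + trailer

def unwrap_zlib(data):
    """Strip the 2-byte zlib header and 4-byte Adler-32 trailer.

    Assumes no preset dictionary (FDICT bit clear), the normal case for PDF.
    The checksum is discarded here, not verified.
    """
    cmf, flg = data[0], data[1]
    if (cmf * 256 + flg) % 31 != 0 or cmf & 0x0F != 8:
        raise ValueError("not a zlib stream")
    if flg & 0x20:
        raise ValueError("preset dictionaries are not handled here")
    return data[2:-4]

# CPython can produce raw DEFLATE itself by passing negative wbits.
data = b"stream data stream data stream data"
packer = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
raw = packer.compress(data) + packer.flush()
```

With these, zlib.decompress(wrap_raw_deflate(raw, data)) accepts the raw stream, and zlib.decompress(unwrap_zlib(zlib.compress(data)), -15) reads a normal zlib stream as raw DEFLATE.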

IronPython is interesting. One hurdle is down for PyPdf, but a few still exist. We’ll see what happens next.

2006/01/10

Python PDF Split/Merge Library

Filed under: pdf,programming,python — admin @ 12:48 pm

When you have good tools, working with PDF files can be fun. When you have no tools – it’s time to build a pure-Python library for working with PDF files.

Enter the challenge: create a website that can split and merge PDF files on demand. Given a PDF file of a few hundred pages, split the PDF file and store individual pages as separate PDF files. On demand, merge any set of individual pages to create and serve a new PDF file.

Rejected solution #1: activePDF Toolkit, a COM based library that receives excellent reviews from a co-worker. Sounds super! However, my deployment platform is Linux, making a Windows COM library virtually unusable.

Rejected solution #2: pdftk, a command-line utility that allows splitting and merging PDF files. pdftk is based on a modified version of the Java iText library, which I am familiar with. However, spawning processes on every page view to merge PDF files is probably relatively slow. When you add in the fact that my pdftk process kept dying with SIGABRT when running it through os.system, os.spawnl, and popen (in other words, I couldn’t get it to work), this solution was rejected.

Rejected solution #3: Use the iText Java library, which is capable of splitting and merging files. However, my web server is somewhat memory limited at the moment. Adding a JRE would not help. Plus, who wants to code in Java when it can be done in Python? Nobody, that’s who.

Enter the solution: a pure-Python library for working with PDF files. It may not be perfect (okay, okay, it definitely is not), but it does work with the PDF files I was most interested in splitting and merging. I’ve also tested it lightly with other random PDF files I’ve found on my system and it seems to work pretty happily with them.

I’ve created a pyPdf project page and uploaded it to PyPI.

2004/12/08

Java Web Start 1.5

Filed under: java,programming — admin @ 3:37 pm

The Java Network Launching Protocol (JNLP), commonly known as Java Web Start, is some very cool technology. In my opinion, it hits a sweet spot between web applications and local applications:

  • Software is run locally on the client machine. JNLP is supported on every platform where Sun’s Java Runtime Environment is available, including Linux, Windows, and Mac OS X.
  • The software can be updated on the remote server, and it is updated locally as soon as it is next run.
  • Desktop integration is possible, meaning that the user of the software has a nice double-clickable application shortcut on their desktop, start menu, applications folder, or wherever you would store such a thing in Linux.
  • New in Java 1.5^H^H^H5.0 – file associations can be specified in JNLP files, meaning that I can open a “.whatever” file with the Whatever Java Web Start application by clicking it. Cool!

So, basically JNLP has some of the advantages of Web applications (immediate upgrades available, doesn’t need to be rolled out by network administrators [whether this is really an advantage is questionable]), and some of the advantages of desktop applications (run locally, quick to run, shortcuts and file associations). Cool.

However, the changes to JNLP in Java 1.5 are somewhat poorly documented. I’ve discovered three tidbits of information that I believe are pretty useful, so I’m going to share them with you.

First of all, I wanted my users to know that by using Java 1.5 they’re going to get some additional functionality (file associations), but my application still runs in Java 1.4. I updated my web start launch page to check for 1.5.0 (thanks to Sun’s documentation), and then made it link to my download page with some useful text if they don’t have 1.5. However, if the user went to the download page when running Java 1.4, it did not upgrade to Java 1.5 – it saw that Java Web Start was available, and launched the application.

Adding a #Version tag to the download page fixed this, forcing an upgrade to 1.5.0:

<!--
Automatically installs Java 1.5.0 and runs the PDA application with
Java Web Start.  The addition of the #Version section on the object's
codebase will cause an upgrade to JRE 1.5.0 even if the user already
has a Java Runtime installed.
-->
<object
  codebase="http://java.sun.com/update/1.5.0/jinstall-1_5_0-windows-i586.cab#Version=1,5,0,0"
  classid="clsid:5852F5ED-8BF4-11D4-A245-0080C6F74284"
  height="0"
  width="0">
    <param name="app" value="http://yoursite.com/app.jnlp">
    <param name="back" value="true">
    <!-- Alternate HTML for browsers which cannot instantiate the object -->
    <a href="http://java.sun.com/j2se/1.5.0/download.html">
          Download Java Web Start
    </a>
</object>

So, now my users had the option to upgrade to Java 1.5. Some of them might even do it. Now, how does one get the file associations to work? At the time I was looking at it (and still as I write this), Sun’s getting started guide is grossly incorrect when discussing the <association> tag in the JNLP file. The getting started guide also does not explain what changes to make to your application to get it to actually open the file when you double click on it.

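The first part was easy – Keith Lea already documented the problem, and Google led me straight to it. Add an <association> element, with extensions and mime-type attributes naming the file type to register, inside the <information> element of your JNLP file.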

One step closer! Now I can click on my files and the application launches, but it doesn’t do anything with the file I opened. I took a guess that it was probably passing the filename in through the command line parameters, and put some message boxes into my main function. Sure enough, that’s how it’s being done. I’m being passed two arguments: -open, and then the filename to open. The following code in my main function dealt with this:

// Handle JNLP association command line argument file opening.
// Format: '-open' 'path-to-file'
boolean openFlag = false;
for (int i = 0; i < args.length; i++)
{
    if (openFlag)
        openPath(args[i]);

    // The argument following "-open" is the file to open.
    openFlag = args[i].equals("-open");
}

Now I’m happily taking advantage of the new features in JNLP 1.5, without forcing my users to upgrade if they don’t want to. It’s a happy day.

2004/11/23

Adapting Classes

Filed under: java,programming — admin @ 12:55 am

A few days ago, the wisdom of the java.io.Reader interface dawned on me suddenly, and at the same moment the world of interfaces came into a new light. I’ve always understood what an interface (or pure virtual class) is, and the purpose of them – they allow you to change the implementation of your class without changing the calling code. Some people have even told me that the use of interfaces can replace multiple inheritance – but I never really got how.

For those of you who are unaware, Reader is a basic abstract class (an interface in spirit) that reads arrays of character data from “some source”. This seems like a good idea, of course. You get the data, and you don’t care about the source. Yay, nice and simple, and everyone is happy. Typically one creates a java.io.FileInputStream, creates a java.io.InputStreamReader (which extends Reader), and you’re off to the races.

One day, another class caught my eye: java.io.BufferedReader. This class extends Reader, but doesn’t specify in the name any kind of data source. How does this class work? A BufferedReader takes another Reader instance as part of its constructor, and adapts it.

Why is this such a special idea? Because BufferedReader is not derived from InputStreamReader. As a result, any Reader can be buffered by this simple class. In the same way, other classes can adapt a BufferedReader to add additional functionality. A LineNumberReader can take a BufferedReader, an InputStreamReader, a KeyboardJunkReader, a RandomDataReader, or a ManagementBullshitReader – whatever Reader one wants to count the lines of. (The fact that LineNumberReader is derived from BufferedReader is irrelevant [and frankly, pointless...])

So, you’re thinking “fine, but so what?”. Let me give you an example of another simple interface that this kind of adapting would be cool for:

public interface XYDataset
{
    public int getCount();
    public Number getX(int index);
    public Number getY(int index);
}

This interface is pretty simple, and would be good as part of a plotting package. Any set of X and Y values could be plotted easily by implementing XYDataset wherever your data lives. This is simple, effective, and cool. A basic implementation of this could have two List objects, or one List object, or … whatever, who cares – storing data is boring.

What if you find that some users are plotting thousands of points, and it’s very slow? Let’s create a filtering adapter class:

public class FilteringXYDataset implements XYDataset
{
    private XYDataset delegate;
    private int maxPoints = 1000; // only plot this many points.

    public FilteringXYDataset(XYDataset initDelegate)
    {
        delegate = initDelegate;
    }

    public int getCount()
    {
        int realCount = delegate.getCount();
        if (realCount < maxPoints)
            return realCount;
        else
            return maxPoints;
    }

    // Map a filtered index onto an evenly spaced index in the delegate.
    private int mapIndex(int index)
    {
        int correctItemCount = delegate.getCount();
        if (correctItemCount <= maxPoints)
            return index;
        return (int)(correctItemCount * ((double) index / (double) maxPoints));
    }

    public Number getX(int index)
    {
        return delegate.getX(mapIndex(index));
    }

    public Number getY(int index)
    {
        return delegate.getY(mapIndex(index));
    }
}

Cool, the dataset is filtered now. It’s crude filtering, but when you’re plotting a thousand points it’ll do nicely – it’s hard to tell a thousand points from ten thousand points on a normal screen sized plot.

What other dataset adapters could you use?

  • Add an extra 50% to the number of points, and generate them from a Bézier smoothing curve.
  • Add a (0, 0) point to every dataset. Pretend the count is one greater, and then add the point in at the appropriate index.
  • Plotting arbitrary X-Y points on a log-log plot is impossible if the points are negative – so chop them out in another adapter class.

Adapter classes like this are simple and nifty – you take one interface, and provide the same interface back to the library user.

The really great part is that, without multiple inheritance, you can later create a dataset that is smoothed, filtered, and has negative points chopped out – all with one line of code. The smoothing and filtering algorithms are only written once, but can be applied in various orders and with various other tools. … and you still don’t know how the XYDataset is being stored. Good!
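The same adapter idea carries over to Python, where duck typing stands in for the interface. This is a minimal sketch, not code from any plotting package: ListXYDataset and PositiveOnlyXYDataset are hypothetical names (the latter is the “chop out negative points” adapter from the list above), and FilteringXYDataset mirrors the Java class. The last lines stack two adapters in a single expression:

```python
class ListXYDataset:
    """A plain dataset stored as two lists (hypothetical example class)."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)

    def get_count(self):
        return len(self.xs)

    def get_x(self, index):
        return self.xs[index]

    def get_y(self, index):
        return self.ys[index]


class PositiveOnlyXYDataset:
    """Adapter hiding points with x <= 0 or y <= 0, e.g. for log-log plots.

    The surviving indexes are computed once, when the adapter is created.
    """

    def __init__(self, delegate):
        self.delegate = delegate
        self.keep = [i for i in range(delegate.get_count())
                     if delegate.get_x(i) > 0 and delegate.get_y(i) > 0]

    def get_count(self):
        return len(self.keep)

    def get_x(self, index):
        return self.delegate.get_x(self.keep[index])

    def get_y(self, index):
        return self.delegate.get_y(self.keep[index])


class FilteringXYDataset:
    """Adapter exposing at most max_points evenly spaced points."""

    def __init__(self, delegate, max_points=1000):
        self.delegate = delegate
        self.max_points = max_points

    def get_count(self):
        return min(self.delegate.get_count(), self.max_points)

    def _map(self, index):
        # Evenly spaced mapping like the Java version, in integer arithmetic.
        real = self.delegate.get_count()
        if real <= self.max_points:
            return index
        return real * index // self.max_points

    def get_x(self, index):
        return self.delegate.get_x(self._map(index))

    def get_y(self, index):
        return self.delegate.get_y(self._map(index))


# Stack adapters: chop non-positive points, then thin to 100 points.
xs = range(-5, 5000)
data = ListXYDataset(xs, [x * x for x in xs])
plotted = FilteringXYDataset(PositiveOnlyXYDataset(data), max_points=100)
```

Neither adapter knows how the underlying data is stored, so they can be applied in either order.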

2004/10/22

Exponential Secant Root Finder

Filed under: programming — admin @ 7:25 pm

Given a function ƒ, find x such that ƒ(x) = 0. Sounds simple enough…

Root finding is a hobby of mine. It’s kind of a lame hobby. It can be lots of fun though, and of course there are many real-world uses for it. Once in a while, I’ll read through papers and methods on the general subject and implement new algorithms. I once even had an “algorithm war” program which would pit two algorithms against each other: given an arbitrary function with one root, which method would find it first? Most consistently first? Highest accuracy with fewest function calls?

I was faced with an interesting, more specific problem yesterday. A simple secant solver had been used on a function with great results, except at a few extreme software inputs. In general, the solver only knew that 0 < x, with no known upper bound, and, in fact, ƒ could not be evaluated for values <= 0. “Hm…. the secant method worked so well for this nice, smooth function… if only there was a way to prevent it from going negative,” I thought.

What if the secant method were drawn on a semi-log plot, rather than a Cartesian plot? That would prevent the root finder from ever going negative. It would have greater resolution at smaller values with fewer iterations, but could suffer from requiring more iterations at higher values (for the same tolerance).

Figure A: Function plotted on a Cartesian scale.

The function being solved looked similar to figure A. In figure A, if the secant method were to hit points between 5 and 10 billion, it would quickly be thrown off by the straight slope into values less than zero, where the function cannot be evaluated.

Figure B: Function plotted on a semi-log scale.

If it is instead plotted on a semi-log scale, as shown in figure B, the flattened area would throw the values off to very small numbers (around 1E-10). This function would have a very high slope at those values, and the secant method would throw it back into a reasonable range very quickly.

Implementing this modified secant method idea was pretty simple. The standard secant root solver is used, but with a variation of how the next point is chosen:

  • Rather than fitting y = m * x + b to two points on the curve, the logarithmic curve y = m * log(x) + b (equivalently, x = exp((y - b) / m), an exponential in y) was fitted to the two points.
    1. The m value was calculated with m = (y1 - y2) / (log(x1) - log(x2)),
    2. the b value was calculated as b = y1 - log(x1) * m.
  • A new x coordinate was found by solving the equation for y == 0, which simplifies down to x = exp((y1 * log(x2) - log(x1) * y2) / (y1 - y2)).
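The closed form in the last step follows from setting the fitted curve y = m * log(x) + b to zero, with log as the natural logarithm (the inverse of exp):

```latex
\begin{aligned}
y &= m \log x + b, \qquad m = \frac{y_1 - y_2}{\log x_1 - \log x_2}, \qquad b = y_1 - m \log x_1 \\
0 &= m \log x + b \;\Rightarrow\; \log x = -\frac{b}{m} = \log x_1 - y_1 \, \frac{\log x_1 - \log x_2}{y_1 - y_2} \\
  &= \frac{y_1 \log x_2 - y_2 \log x_1}{y_1 - y_2}
  \;\Rightarrow\; x = \exp\!\left(\frac{y_1 \log x_2 - y_2 \log x_1}{y_1 - y_2}\right)
\end{aligned}
```

Since the new x is always an exponential, it is positive no matter how steep or flat the secant line is.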

This method proved to be as quick as a normal secant solution, and very effective for functions which cannot be evaluated at values <= 0.
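A minimal Python sketch of the modified secant iteration, assuming both starting guesses are positive. The name log_secant, the relative tolerance test, and the iteration limit are illustrative choices, not the original solver:

```python
import math

def log_secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Find a root of f on x > 0 with a secant step on a semi-log scale.

    Each step fits y = m*log(x) + b through the two most recent points and
    jumps to the x where that fit is zero. Because the new x is exp(...),
    f is never evaluated at x <= 0.
    """
    if x0 <= 0 or x1 <= 0:
        raise ValueError("starting points must be positive")
    y0, y1 = f(x0), f(x1)
    for _ in range(max_iter):
        if y1 == 0:
            return x1
        if y1 == y0:
            raise ZeroDivisionError("flat secant; cannot take a step")
        # x = exp((y1*log(x0) - y0*log(x1)) / (y1 - y0)): the closed form
        # above, with the two points labelled (x0, y0) and (x1, y1)
        x2 = math.exp((y1 * math.log(x0) - y0 * math.log(x1)) / (y1 - y0))
        if abs(x2 - x1) <= tol * x2:  # relative step size is small enough
            return x2
        x0, y0 = x1, y1
        x1, y1 = x2, f(x2)
    raise RuntimeError("no convergence after %d iterations" % max_iter)

root = log_secant(lambda x: x * x - 4, 1.0, 3.0)
```

For a function that really is linear in log(x), such as log(x) - 3, the first step lands on the root.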

2004/10/01

Some Nifty Things

Filed under: java,programming,python — admin @ 1:08 am

Lately I seem to have pushed a large number of important projects to the side to make room for some smaller personal projects. Here’s what I’ve been thinking about lately that’s kinda nifty:

  • I gave notice of resignation to my employer earlier this week. October 8th is my final work day. I’ll be free for a few months to pursue some contract work that I’ve got waiting in the wings, and then I’m into a new job in the new year. I’ll be working as the head of software development at a small start-up company. Exciting!

  • I built a new photo gallery system. This one builds on a number of features in flickr. It has tag-based image categorizing, EXIF photo information, and a small number of different image sizes that can be viewed. You can see it in action on my pictures page.

  • A new version of Java was released today. Java 1.5, Tiger, contains a number of boring features: autoboxing (saves some typing), a new for/in loop (saves some typing), and generics (saves some typing – may find some ClassCastExceptions waiting to happen at compile time). These seem to be the features everyone thinks are cool, but I think they’re pretty lame.

    That said, there are a couple of features that are cool. Java 1.5 adds the ability to change a function’s return type when it is being overridden. You can only change it to a subclass of the original type, but this makes a lot of sense. When you really think about it… it turns out that all it does is save you some typing. In any situation where you’d actually use this, you’d be doing some casting that you wouldn’t have to do anymore.

    Hmmm… so… what does it have that’s cool? Variable length argument lists… I’ve never missed these in Java before. Annotations look pretty cool, but given how static their applications are unless you write your own compiler, they seem to be pretty much just a couple of nice built-in features and a new way to add documentation. So… static imports? Yeah, that’s nice. Yay!

    I really wanted to be impressed with Java 1.5’s new features, since I’m doing a lot of Java development these days. But I can just type faster, and I’ll still retain backwards compatibility with people using Java 1.4.

  • I added HTTP Digest authentication into my Twisted based weblog aggregator. This allows me to view LiveJournal RSS feeds as a logged-in user, and hence get links to protected LiveJournal entries that my user can see. I submitted a small patch to urllib2 to make it work with those same LiveJournal feeds, and I may add real authentication support to twisted.web.client rather than the hacked support I’m currently using. Maybe this weekend, if I have the inclination.
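For reference, a minimal sketch of how Python’s standard library wires up HTTP Digest authentication, using urllib.request (the Python 3 successor to urllib2, which has the same class names). This is not the aggregator’s Twisted code or the urllib2 patch; the helper name, URL, and credentials are hypothetical:

```python
import urllib.request

def build_digest_opener(uri, user, password):
    """Return an opener that answers HTTP Digest challenges for uri."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials whatever realm the server names
    passwords.add_password(None, uri, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(passwords)
    return urllib.request.build_opener(handler)

# Hypothetical feed location and account, for illustration only.
opener = build_digest_opener("https://example.com/feeds/", "someuser", "secret")
# feed_xml = opener.open("https://example.com/feeds/rss").read()
```

The handler only sends credentials after the server replies 401 with a Digest challenge, so nothing is transmitted until a request is made.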

2004/08/29

Roundup

Filed under: programming,python — admin @ 2:47 am

Roundup is some damn beautiful software. It’s a very nice and simple package for software bug tracking (oh, pardon me… issue tracking). It can be customized very easily, and in fact from a minimal ‘tracker’ just about any web-based database application could be built with a minimum of fuss. The mail gateway is a beautiful design too. Oh, and I love the fact that e-mailing the system creates a user “account” for that e-mail address (unless it’s associated with an existing account, of course). No fuss bug tracking.

I’d love it if it supported some e-mail security, though. Digitally signed messages, for example. The current complete lack of e-mail security makes me irrationally scared – a bad person couldn’t do much, but they could do some.

Here’s a neat trick – for nice clean URLs, place the roundup.cgi script wherever you want it to be, rename it to just roundup, and add a couple lines to your Apache configuration:

<Location /blah/roundup>
    SetHandler cgi-script
</Location>

And you’ll magically get the CGI interface of roundup working without the minor annoyance of having ‘roundup.cgi’ in your URLs. Go Apache!

2004/08/26

DevEnv vs. the Programmer

Filed under: programming,python — admin @ 3:57 pm

How can you capture the console output of a program, when it buffers that output if you’re not using a console to view it? This was a problem I ran into when building an automation tool for MS Visual Studio .NET. In the end, the programmer subjugated his tool (as it should be) by beating it over the head with a pipe.

A few of us programmers work in the unfortunately unfriendly environment of MS Windows. It might look pretty and have lots of applications written for it, but it’s basically an unfriendly environment for a software developer. Even MS Visual Studio .NET can be unfriendly to a developer, which is unfortunate since it’s the one program you’d expect would be really friendly.

Visual Studio allows you to provide command-line options which start a software build. Running inside a command prompt, all you need to do is pass a solution file and a build configuration to the program, and you’re off. In fact, Visual Studio even gives you more command-line flexibility by providing two executables, devenv.com and devenv.exe – the former will tend towards printing console output all the time, while the latter will avoid it if a build log file is provided instead.

In the creation of a complete build tool, I wanted to run devenv.com and capture the output so I could display the progress to a user. That’s when it became tough. Running the executable through os.popen (or any other popen function) didn’t accomplish what I wanted – the output being printed to the console (and now being read through a pipe) was buffered inside the devenv process and only printed after the build was completed. Clearly this didn’t accomplish the goal of providing a progress display for the user.

devenv.com provides an option which I thought might have some promise: /out. This writes the build output to a specified file. Great! All I need to do is start it writing to a file, and read through the file at the same time. I wasn’t sure of the implementation details, but it seemed feasible. Unfortunately, the devenv process locks the output file exclusively. Python’s open() was unable to read it, and even trying to find obscure parameters to win32file’s functions failed to give me the necessary access to the file.

In the UNIX world, the solution would be obvious. Create a pipe, and write the build output into the pipe while reading the pipe. In the Windows world though, a pipe is not a filesystem object. It can’t be created in a specific location, and so devenv wouldn’t be able to open it like a normal file and write to it. I considered for a while that there are a bunch of standard reserved file names, like CON and PRN. Might one of them help me? Could one of them be used to connect to a pipe? Well, no. Not really. They’re ancient history, a relic from years gone past, and they don’t have any concept of a pipe.

I started digging around for more information about named pipes, which seem to be the preferred mechanism for IPC in Windows software. Could a named pipe be referenced through a file location? Yes, it can! \\%(host)s\pipe\%(name)s (in Python format notation) refers to the named pipe called name on the machine host. And as a bonus, the host name . always refers to the local machine. Now I finally have a plan of action: create a named pipe, make devenv write to \\.\pipe\buildOutput, and read the output on the fly.

In the end, I wrapped the named pipe code into a module, NamedPipe, and the code to read devenv output on the fly was easy:

import os
import threading
from NamedPipe import AnonymousNamedPipeReader

pipe = AnonymousNamedPipeReader()

# Application command line...
# (build application cmd line: devenv.com x.sln /build Release, etc.)
# {code omitted}
cmd = cmd + r' /out \\.\pipe\%s' % pipe.name

# Okay, one of us needs to loop and accept a pipe connection, read
# data, display it to the user, and so on.
# The other of us needs to run the build command.
class ExecThread(threading.Thread):
    def __init__(self, cmdLine):
        threading.Thread.__init__(self)
        self.cmdLine = cmdLine
    def run(self):
        self.retval = os.system(self.cmdLine)
thread = ExecThread(cmd)
thread.start()

buildLog = ""
line = ""

for data in pipe:
    buildLog += data
    # {code omitted - display output on the fly}

# Wait for devenv to exit; its exit status is then in thread.retval.
thread.join()

Now, obviously this code snippet has left out all the magic. It’s a bit long and boring, so I thought maybe you’d just like a link to NamedPipe.py instead. Through the magic of functions like CreateNamedPipe and ConnectNamedPipe, you can read data being written to a file on the fly. It even works when the writer is a jerk, locking the file.
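For comparison, the UNIX approach mentioned earlier in this post can be sketched with a FIFO, a pipe that has a filesystem name. This is a minimal POSIX-only sketch, not the NamedPipe module: run_with_fifo_log and the file name buildOutput are hypothetical, and a short Python script stands in for a build tool that writes its log to a path it is given:

```python
import os
import subprocess
import sys
import tempfile

def run_with_fifo_log(make_argv, on_line=None):
    """Run a program that writes its log to a path, reading the log live.

    make_argv(path) returns the argv for a program that opens path for
    writing (the role devenv's /out option plays above). POSIX-only.
    Caveat: if the program exits without ever opening the FIFO, open()
    below blocks forever; a real tool would need a timeout.
    """
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "buildOutput")
    os.mkfifo(fifo_path)  # a pipe with a name in the filesystem
    proc = subprocess.Popen(make_argv(fifo_path))
    lines = []
    try:
        with open(fifo_path) as pipe:  # blocks until the writer opens its end
            for line in pipe:  # lines arrive as the writer produces them
                lines.append(line)
                if on_line is not None:
                    on_line(line)
    finally:
        proc.wait()
        os.remove(fifo_path)
        os.rmdir(tmpdir)
    return proc.returncode, "".join(lines)

# Stand-in "build tool": writes two log lines to the path it is given.
writer = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as out:\n"
    "    out.write('Build started\\nBuild succeeded\\n')\n"
)
code, log = run_with_fifo_log(lambda path: [sys.executable, "-c", writer, path])
```

The writer simply sees an ordinary file path, which is exactly what made this the obvious answer outside Windows.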

2004/08/04

Vancouver Python Workshop 2004

Filed under: programming,python — admin @ 2:51 pm

I just returned to Calgary from the Vancouver Python Workshop. Cecil and I drove out there last week (Thursday evening/Friday morning). Catsy flew out from Toronto. We all stayed at the Hyatt Regency in downtown Vancouver, where much fun was had watching the Back to The Future trilogy, and making up entertaining stories about elves.

The workshop was reasonably well organized, had a nice venue, and was well attended. The talks varied in quality from okay to excellent – all the speakers were well informed folk with interesting topics, but very few programmers are excellent public speakers. That being said, I have a few specific suggestions for the workshop itself:

  • Some talks could have benefited from the presence of a strict moderator. On a scale from polite to rude, the following things happened:

    • A speaker deferring a question to Guido. This is appropriate, as it is the prerogative of the speaker to defer during his own presentation, and Guido may be an excellent resource for an answer.
    • Guido interjecting a comment like "It’s not happening." while an attendee asks a question (referencing a PEP) is more questionable, but ultimately "polite enough" in the company of geeks.
    • Getting into an audience debate about wxWidgets during a PyObjC talk is not very polite.
    • Nitpicking the usefulness of a contrived optimization example is rude, pointless, and time consuming.
  • An entire track on the second day ended up being dedicated to Plone. I think Plone is cool, but these talks unfortunately ended up being about Plone from the point of view of a user rather than a Python developer. I believe that the workshop did not have quite as many speaker submissions as they wanted, so these Plone talks weighed in based upon how many people were willing to do them. They were good, but badly targeted and too numerous.

    I would suggest that the conference organizers be willing to say "no" to speakers if they have an abundance of talks on one such subject. I realize getting submissions for talks must be a difficult process, and the fear of having not enough content at a conference must be pretty big as an organizer.

I enjoyed the workshop greatly. I should have flown out rather than driven 12 hours, but it seemed like a good idea at the time. I picked up a lot of good ideas, started writing some interesting PyObjC code (a game of interracial life, oooh), enjoyed using SubEthaEdit in public for the first time, and ate a lot of good food. Catsy seems to have had her own kind of fun, too.

2004/05/20

Plotting in Excel through Python/COM

Filed under: programming,python — admin @ 1:41 pm

For the past couple weeks, I’ve been thinking about mathematical model development. There are lots of great tools out there to help with such tasks, like Mathcad and Mathematica. But if you’re doing software development, once you’ve built and tested a model, what you really want is code. Your Mathcad files are great for documentation, testing, and development of your model, but they can’t be embedded in your Java or C++ application.

Additionally, it’s very easy to use functions from those pieces of software which can’t be easily replicated in your software. They have very optimized methods for root solving, matrix math, symbolic derivative calculations, and other such tasks that you can’t reproduce without years worth of effort. So, when you’re developing a model that’s going to be used in software, what’s the easiest way to do it?

Python, of course. Python code tends to be very legible and terse, while still being well suited for mathematical programming. In fact, the syntax and control structures even look pretty similar between Mathcad and Python:

Part of a root solver implemented in Python.

Part of a root solver implemented in Mathcad.

So, it seems natural to me to use Python to develop models. My code can easily be translated into other programming languages after the fact, and it tends to have a more proper programming structure if it’s not translated from Mathcad. Plus, translating a big chunk of code from Mathcad can be a real pain in the ass to do manually – there’s just too much to miss, and it can be very time consuming to confirm it’s been done correctly.

Python is not quite a silver bullet in this case, though. Mathcad has a lot of tools for visualization which are very useful when developing a new model. Python’s wonderful ability to interoperate makes it fairly easy to leverage the plotting capabilities of an application like Microsoft Excel, though. So, without any further lead-up, here’s a quick process which will get you up and running with an Excel plotting Python program:

  1. Get and install Python, and the PythonWin extensions (or install ActivePython, which contains all the necessary tools and more).

  2. Run the PythonWin IDE program, and generate static COM wrappers for Microsoft Excel. This is as easy as selecting the COM Makepy Utility option from the Tools menu of PythonWin, then selecting the most recent version of the Microsoft Excel n.m Object Library available:

    Selecting COM Makepy utility from PythonWin’s menu.

    The static COM wrappers must be used in order to access the Excel constants (of which there are hundreds).

  3. Import the COM Dispatch function and the constants namespace into your application:

    
    from win32com.client import Dispatch, constants
    
  4. At this point, all that’s left is to go to town on automating Excel. There is a lot of documentation that comes along with Excel, as well as a big chunk of MSDN content that will also help. Let’s dive in, and I’ll just throw an XY scatter plot at you:

    def plot(x, y, xAxisLog=False, yAxisLog=False):
        # acquire application object, which may start application
        application = Dispatch("Excel.Application")
    
        # create new file ('Workbook' in Excel-vocabulary)
        workbook = application.Workbooks.Add()
    
        # store default worksheet object so we can delete it later
        defaultWorksheet = workbook.Worksheets(1)
    
        # build new chart (on a separate sheet in the workbook)
        chart = workbook.Charts.Add()
        chart.ChartType = constants.xlXYScatter
        chart.Name = "Plot"
    
        # create data worksheet
        worksheet = workbook.Worksheets.Add()
        worksheet.Name = "Plot data"
    
        # install data: x values in column A, y values in column B
        xColumn = addDataColumn(worksheet, 0, x)
        yColumn = addDataColumn(worksheet, 1, y)
    
        # create series for chart
        series = chart.SeriesCollection().NewSeries()
        series.XValues = xColumn
        series.Values = yColumn
        series.Name = "Data"
        series.MarkerSize = 3
    
        # set up axes; on an XY scatter chart, xlCategory is the X axis
        xAxis = chart.Axes(constants.xlCategory)
        yAxis = chart.Axes(constants.xlValue)
        xAxis.HasMajorGridlines = True
        yAxis.HasMajorGridlines = True
        if xAxisLog:
            xAxis.ScaleType = constants.xlLogarithmic
        if yAxisLog:
            yAxis.ScaleType = constants.xlLogarithmic
    
        # remove default worksheet
        defaultWorksheet.Delete()
    
        # make stuff visible now.
        chart.Activate()
        application.Visible = True
    
    def genExcelName(row, col):
        """Translate (0,0) into "A1". Handles columns A through ZZ."""
        if col < 26:
            colName = chr(col + ord('A'))
        else:
            # integer division, so this also works under Python 3
            colName = chr((col // 26) - 1 + ord('A')) + \
                chr((col % 26) + ord('A'))
        return "%s%s" % (colName, row + 1)
    
    def addDataColumn(worksheet, columnIdx, data):
        """Write data down one column, starting at row 1; return the Range."""
        cells = worksheet.Range("%s:%s" % (
            genExcelName(0, columnIdx),
            genExcelName(len(data) - 1, columnIdx),
            ))
        for idx, cell in enumerate(cells):
            cell.Value = data[idx]
        return cells
    

    I suppose that at least an explanation of what this is doing would be a good thing. This program will cause Excel to be run, and a new file to be created. The new file will contain one worksheet with the contents of the x/y arrays in columns A and B. It will also contain one chart sheet, an XY scatter, with the x/y points plotted. The x and y arguments to plot are sequences of numbers, and the two flags indicate whether the X and Y axes should be logarithmic or linear.

  5. Some examples of using plot:

    # A simple example:
    plot( (1,2,3,4,5), (6,7,8,9,10) )
    
    # Some more data:
    x, y = [], []
    for i in range(100):
        x.append(i)
        y.append(i ** 2)
    plot(x, y)
    
    # Using log axes:
    plot(x, y, True, True)
    
  6. Remember, there are good tools available if you’re not in Windows too. gnuplot and gnuplot-py provide a pretty nice graphing environment which is probably quite a bit more capable than Excel.
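genExcelName in step 4 stops at two-letter columns (A through ZZ, the first 702 columns), while newer Excel versions go up to three-letter names such as XFD. A general version treats the column letters as a bijective base-26 numeral, with no zero digit. This is a sketch with a hypothetical function name that could stand in for genExcelName:

```python
def excel_cell_name(row, col):
    """Translate zero-based (row, col) into an A1-style name.

    Column letters count A..Z, then AA..ZZ, then AAA..., i.e. bijective
    base 26, where each digit runs from 1 (A) to 26 (Z).
    """
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    letters = []
    n = col + 1  # switch to 1-based so that A=1 ... Z=26
    while n > 0:
        n, rem = divmod(n - 1, 26)  # peel off the last letter
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters)) + str(row + 1)
```

For example, excel_cell_name(9, 27) gives "AB10", and column index 702 is the first three-letter column, "AAA".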
