Mathieu Fenniak’s Weblog

“import zlib” vs. .NET Framework

Filed under: programming, python, pdf — Mathieu Fenniak @ June 4, 2006 9:44 pm

During my current period of unemployment, I’ve been preparing for some contract development work that I expect to start in the near future. Inspired by the article series on IronPython and .NET GUI development over at The Voidspace Techie Blog, I’ve been looking into what kinds of development struggles I might face using IronPython and .NET as a platform. To that end, I began trying to make pyPdf work under IronPython.

The first struggle I encountered was that the “zlib” module was not available in IronPython. “No problem,” I thought to myself. “There’s got to be access to a DEFLATE library through .NET, somehow.”

“Yes, younger-self,” my older-self now says. “There is a .NET way to do this, but apparently it requires an annoyingly large amount of code.”

Here’s the original Python code that was used to implement the FlateDecode streams in pyPdf:

import zlib
def decompress(data):
    return zlib.decompress(data)
def compress(data):
    return zlib.compress(data)

Okay, that was simple and straightforward. Here’s the IronPython solution (note, if you have suggestions to make this shorter, please do let me know):

import System
from System import IO, Array
def _string_to_bytearr(buf):  # str -> System.Byte[], one byte per character
    retval = Array.CreateInstance(System.Byte, len(buf))
    for i in range(len(buf)):
        retval[i] = ord(buf[i])
    return retval
def _bytearr_to_string(bytes):  # System.Byte[] -> str, one character per byte
    retval = ""
    for i in range(bytes.Length):
        retval += chr(bytes[i])
    return retval
def _read_bytes(stream):  # read a Stream to its end, return System.Byte[]
    ms = IO.MemoryStream()
    buf = Array.CreateInstance(System.Byte, 2048)
    while True:
        count = stream.Read(buf, 0, buf.Length)  # may fill only part of buf
        if count == 0:  # zero bytes read means end of stream
            break
        else:
            ms.Write(buf, 0, count)
    retval = ms.ToArray()
    ms.Close()
    return retval
def decompress(data):
    bytes = _string_to_bytearr(data)
    ms = IO.MemoryStream()
    ms.Write(bytes, 0, bytes.Length)
    ms.Position = 0  # fseek 0
    gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Decompress)
    bytes = _read_bytes(gz)
    retval = _bytearr_to_string(bytes)
    gz.Close()
    return retval
def compress(data):
    bytes = _string_to_bytearr(data)
    ms = IO.MemoryStream()
    gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Compress, True)  # True: leave ms open
    gz.Write(bytes, 0, bytes.Length)
    gz.Close()  # flushes the remaining compressed data into ms
    ms.Position = 0  # fseek 0
    bytes = ms.ToArray()
    retval = _bytearr_to_string(bytes)
    ms.Close()
    return retval

The code grew in length for a few reasons. First of all, the original compress and decompress functions took string arguments, but those strings were really being used as arrays of bytes. In .NET, there is a clear difference between an array of bytes and a string, so conversion functions were necessary to create a byte array from a string and back again. I actually like this, because it forces you to encode and decode strings whenever you use them, making you aware of their Unicode nature (an awareness that is essentially optional in CPython).
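
For comparison, here is a minimal sketch of the same string/byte conversion in CPython 3, where the str/bytes split is enforced much as it is in .NET. It assumes Python 3 (which this post predates) and that every character has a code point below 256; the function names are made up for illustration.

```python
def string_to_bytes(text):
    """Convert a str whose characters are all in the range 0-255 to bytes."""
    # Latin-1 maps code points 0-255 one-to-one onto byte values, which
    # mirrors the ord()-per-character loop in _string_to_bytearr.
    return text.encode("latin-1")


def bytes_to_string(data):
    """Inverse of string_to_bytes: produce one character per byte."""
    return data.decode("latin-1")
```

A character outside that range (for example "\u20ac") makes string_to_bytes raise UnicodeEncodeError, which is exactly the kind of forced awareness described above.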

The second source of added complexity was the use of streams, rather than plain function calls. A nice object-oriented stream library is quite flexible and powerful but, as you can see, it can make things a little more verbose. Both approaches have their advantages.
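
CPython’s zlib module has a stream-style interface too, and using it shows the same verbosity trade-off. The sketch below assumes Python 3; compress_stream and decompress_stream are hypothetical helpers, not part of pyPdf. The raw flag is included because DeflateStream appears to work with bare DEFLATE data (RFC 1951), whereas zlib.compress wraps that data in a zlib header and checksum (RFC 1950), so the two outputs may not be interchangeable without extra handling.

```python
import zlib


def compress_stream(chunks, raw=False):
    """Compress an iterable of byte chunks incrementally.

    raw=True asks zlib for a bare DEFLATE stream (negative wbits),
    with no zlib header or trailing checksum.
    """
    wbits = -zlib.MAX_WBITS if raw else zlib.MAX_WBITS
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    pieces = [compressor.compress(chunk) for chunk in chunks]
    pieces.append(compressor.flush())  # emit whatever is still buffered
    return b"".join(pieces)


def decompress_stream(chunks, raw=False):
    """Decompress an iterable of byte chunks incrementally."""
    wbits = -zlib.MAX_WBITS if raw else zlib.MAX_WBITS
    decompressor = zlib.decompressobj(wbits)
    pieces = [decompressor.decompress(chunk) for chunk in chunks]
    pieces.append(decompressor.flush())
    return b"".join(pieces)
```

The one-shot zlib.compress and zlib.decompress calls are shorter, but the object form can handle data that arrives in pieces and never fits in memory all at once.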

Finally, I had to write a function just to read an entire stream into a byte array. MemoryStream has a simple “ToArray()” method; I wish this were standard on all Stream objects. Regardless, this function only needs to be written once and can be used for many different purposes, so it isn’t really adding to the length of the deflate encoding; it belongs in my toolbox somewhere else. Note that my implementation is fairly wasteful of memory, but it is a simple approach that will not fail if Read returns partial buffers, or anything like that.
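
The same read-until-empty loop can be sketched in CPython against any file-like object. This is an illustration only, assuming Python 3 and a binary stream whose read() returns an empty bytes object at end of stream; read_all is a hypothetical helper name.

```python
import io


def read_all(stream, chunk_size=2048):
    """Read a binary stream to its end and return the contents as bytes."""
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty result signals end of stream
            break
        buffer.write(chunk)  # a short chunk is fine; keep reading
    return buffer.getvalue()
```

Like _read_bytes, it holds a second copy of the data while accumulating, but it does not care how many bytes each read call hands back.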

IronPython is interesting. One hurdle is down for pyPdf, but a few still exist. We’ll see what happens next.

8 Comments

  1. On the other hand, the source code for the zlib module is 1000 lines long, so 45 lines for a partial implementation isn’t all that bad…

    Comment by Fredrik — June 5, 2006 @ 1:02 am

  2. AFAIK with http://msdn2.microsoft.com/en-us/system.io.compression.aspx it should be shorter.

    Comment by Lawrence Oluyede — June 5, 2006 @ 7:35 am

  3. Lawrence - The .NET version of this code does use the System.IO.Compression.DeflateStream class.

    Comment by Mathieu Fenniak — June 5, 2006 @ 7:42 am

  4. Ouch. That means it can’t be shortened :-D

    Comment by Lawrence Oluyede — June 5, 2006 @ 8:21 am

  5. You shouldn’t look at this as something that is so much harder to do in IronPython than CPython. Instead, as Fredrik pointed out, it’s a much simpler implementation of zlib than that for CPython, so wrap this up in a zlib for .NET and everyone else can reap the benefits of your work!

    Comment by Calvin Spealman — June 5, 2006 @ 7:11 pm

  6. IronPython really needs to provide simpler class and namespace names, maybe some aliases or an emulation of the sys, os, and other essentials. Otherwise it is still almost as much typing as C#!

    Comment by casey — June 12, 2006 @ 12:59 pm

  7. This is untested, but it should do exactly what you want in a much more concise way. Note that you can actually leave out the explicit Close statements and rely on the GC to reclaim the memory and save yourself 3 further lines of code instantly :). In fact, if you do that you can write decompress in one line ;).

    from System.IO import StreamReader, MemoryStream
    from System.IO.Compression import CompressionMode, DeflateStream
    from System.Text import Encoding

    def decompress(data):
        gz = DeflateStream(MemoryStream(Encoding.ASCII.GetBytes(data)), CompressionMode.Decompress)
        str = StreamReader(gz).ReadToEnd()
        gz.Close()
        return str

    def compress(data):
        ms = MemoryStream()
        gz = DeflateStream(ms, CompressionMode.Compress, True)
        bytes = Encoding.ASCII.GetBytes(data)
        gz.Write(bytes, 0, bytes.Length)
        gz.Close()
        string = StreamReader(ms).ReadToEnd()
        ms.Close()
        return string

    Comment by Max — August 23, 2006 @ 3:42 am

  8. Unfortunately the example posted by Max doesn’t work, because the StreamReader loses information when you read the compressed data in as a string (‘StreamReader(ms).ReadToEnd()’).

    The best solution (IMHO) is to leave the compressed data as bytes (which is what it is).

    http://www.ironpython.info/index.php/Compression_with_DeflateStream

    Comment by Fuzzyman — June 2, 2007 @ 7:21 am
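
The information loss described in comment 8 can be reproduced in CPython. This is a minimal sketch, assuming Python 3 and using zlib in place of DeflateStream; round_trip_through_text is a hypothetical helper that mimics reading compressed bytes through a text decoder.

```python
import zlib


def round_trip_through_text(data, encoding="utf-8"):
    """Decode bytes to text, replacing undecodable bytes, then encode back."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


packed = zlib.compress(b"pyPdf " * 50)
damaged = round_trip_through_text(packed)
# Compressed output contains bytes that are not valid text in the chosen
# encoding, so the round trip does not give back the original data.
print(damaged == packed)
```

Plain ASCII input survives the same round trip unchanged, which is why the problem only shows up once the data is actually compressed.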
