pg8000 v1.00 — a new PostgreSQL/Python interface

pg8000 is a Pure-Python interface to the PostgreSQL database engine. Yesterday, it was released to the public for the first time.

pg8000’s name comes from the belief that it is probably about the 8000th PostgreSQL interface for Python. However, pg8000 is somewhat distinctive in that it is written entirely in Python and does not rely on any external libraries (such as a compiled Python module, or PostgreSQL’s libpq library). As such, it is quite small and easy to deploy. It is well suited to distributing software where a compiled libpq might not be available, and it is a great alternative to bundling one with your package.

Why use pg8000?

  • No external dependencies other than Python’s standard library.
  • Pretty cool to hack on, since it is 100% Python with no C involved.
  • Being entirely written in Python means it should work with Jython, PyPy, or IronPython without too much difficulty.
  • libpq reads the entire result set into memory immediately following a query. pg8000 uses cursors to read chunks of rows into memory, attempting to strike a balance between speed and memory usage for large result sets. You could accomplish this yourself with libpq by declaring cursors and then fetching from them in batches, but this has two disadvantages:
    • You have to do it yourself.
    • You have to know whether your query returns rows, because you can’t DECLARE CURSOR on an INSERT, UPDATE, DELETE, CREATE, ALTER, etc.
  • pg8000 offers objects to represent prepared statements. This makes them easy to use, which should encourage their use and improve your application’s performance (see the sketch after this list).
  • It has some pretty nice documentation, I think.
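
For a rough feel of the interface, here is a sketch of connecting and running a prepared statement. The class and method names below (Connection, prepare, execute, iteration) are illustrative guesses rather than the documented API; see the pg8000 docs for the real interface:

    import pg8000

    # Connect over TCP (the parameter names here are assumptions, not the real API).
    conn = pg8000.Connection(host="localhost", user="jack",
                             password="secret", database="testdb")

    # A prepared-statement object using a PostgreSQL $1 bind variable.
    ps = conn.prepare("SELECT id, name FROM users WHERE id > $1")
    ps.execute(100)
    for row in ps:   # rows are fetched in chunks rather than all at once
        print(row)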

Now, that being said, reality kicks in. Here’s why not to use pg8000:

  • It’s pretty new. This means there are likely bugs that haven’t been found yet. It will mature over the next couple of weeks with community feedback and internal testing.
  • It doesn’t support the DB-API interface. I didn’t want to limit myself to DB-API, so I created a slightly different interface that made more sense to me. I intend to include a DB-API wrapper in the next release, v1.01.
  • It isn’t thread-safe. When a sequence of messages is sent to the PG backend, it often needs to occur in a specific order. The next release, v1.01, will address this by protecting critical sections of the code.
  • It doesn’t support every PostgreSQL type, or even the majority of them. Notably lacking are: parameter send for float, datetime, decimal, and interval; data receive for interval. Filling these gaps is just a matter of time, and hopefully of user patches adding more type support. For interval, I expect to optionally link in mxDateTime, with a reasonable fallback if it is not available.
  • It doesn’t support UNIX sockets for connecting to the PostgreSQL backend. I just don’t know how to reliably find the socket location; that information seems to be compiled into libpq. Support could be added very easily if the socket location were simply provided by the user.
  • It only supports authentication to the PG backend via trust, ident, or md5 hashed password.
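
For reference, the md5 method defined by the PostgreSQL protocol never sends the password itself; the client answers the server’s 4-byte salt with a nested hash. A minimal sketch of that exchange (not pg8000’s internal code):

    import hashlib

    def md5_password_response(user, password, salt):
        # PostgreSQL md5 auth: "md5" + hex(md5(hex(md5(password + user)) + salt))
        inner = hashlib.md5(password.encode() + user.encode()).hexdigest()
        outer = hashlib.md5(inner.encode() + salt).hexdigest()
        return b"md5" + outer.encode()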

pg8000’s website is http://pybrary.net/pg8000/. The source code is directly accessible through SVN at http://svn.pybrary.net/pg8000/.

20 Comments on pg8000 v1.00 — a new PostgreSQL/Python interface

  1. Jonathan Ellis
    2007/03/09 at 10:39 am

    How much of a speed hit is pg8000 over, say, psycopg2? Did you consider using ctypes + libpq? (Installing the many psycopg2 dependencies can be a pita but installing just libpq has never been an obstacle in my experience.)

  2. Mathieu Fenniak
    2007/03/09 at 10:51 am

    That’s an excellent question, Jonathan. I haven’t done any benchmarking yet, but I am very interested in doing some in the next day or so. I wouldn’t be surprised if the performance was slightly slower, but I don’t think it will be significant.

    ctypes/libpq would work pretty well, but I thought there would be value in building the library using nothing but the documented network protocol for PG. I feel there are some advantages to this approach.
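
    As a flavour of what speaking the documented protocol involves, the first thing a frontend sends is a length-prefixed StartupMessage listing its connection parameters. A minimal sketch (not pg8000’s actual code):

        import struct

        def startup_message(user, database):
            # Protocol 3.0 StartupMessage: int32 length, int32 version code,
            # then null-terminated name/value pairs, ending in a null byte.
            params = b""
            for key, value in (("user", user), ("database", database)):
                params += key.encode() + b"\x00" + value.encode() + b"\x00"
            params += b"\x00"
            body = struct.pack("!i", 196608) + params  # 196608 == version 3.0
            return struct.pack("!i", len(body) + 4) + body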

  3. Mathieu Fenniak
    2007/03/09 at 12:40 pm

    Initial benchmark results for some basic queries returning around 15,000 rows of data show that pg8000 is between 6% and 9% slower than psycopg2. I’m pretty happy with this. I’m sure that with some optimizations this result could be improved slightly, but without any work on “speed” it is fast enough to be competitive with other PG interfaces.

  4. Cliff Wells
    2007/03/09 at 4:19 pm

    For Unix domain sockets, requiring the user to specify the socket location is both acceptable and, IMO, desirable. Also, Unix domain socket support is absolutely critical for many people (myself being one).

    Sounds like an interesting project.

  5. Dictionary Boya
    2007/03/09 at 8:30 pm

    Wow, this would be nice to have bundled with Zope.

  6. Jonathan Ellis
    2007/03/09 at 9:25 pm

    It would be cool if your DB-API layer were drop-in compatible with psycopg2 (as long as you don’t use whacky psycopg2 extensions) — e.g. using %s for bind variables and detecting the types automatically.

    (I only care because SQLAlchemy uses psycopg2, and it would be cool to have an easier-to-install driver. But not cool enough to give up SA. :)

  7. michael schurter
    2007/03/09 at 10:14 pm

    I’m a Mono/.Net dev just learning Python, and this sounds exciting! In the .Net world I used Npgsql, which was a pure .Net PostgreSQL client implementation, and it worked beautifully!

    The only problem I see is the lack of SSL support. I connect via SSL to remote PostgreSQL servers throughout my development process.

    Once you add SSL and a DB-API wrapper, I would hope this would replace the other 7999 implementations! It’s nice to have options, but I would hope projects like SQLAlchemy would choose a pure Python solution by default.

    Thanks!

  8. Luis Bruno
    2007/03/10 at 9:05 am

    You might want to run pg_config to get the socket path; that gives you a user-replaceable default. I don’t need to change pg’s default socket path, and those who do can just change the compile-time default, which is picked up by pg_config and then by your code.

    What d’ya think?

  9. Mathieu Fenniak
    2007/03/10 at 10:10 am

    Cliff — UNIX domain socket support was added in a new pg8000 release, version 1.01, by providing the ‘unix_sock’ parameter to the Connection object.

    Jonathan — I’m not sure how compatible with psycopg2 it will be. It does automatically detect types (coming and going, except for prepared statements, which are optional anyway). The %s bind variable format is not my favourite, but if I’m going to write code to rewrite a statement into PG’s variable format, I might as well support all the DB-API standard formats. It may be simpler to write a new engine for SQLAlchemy, but I’m not sure what that involves.
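
    Such a rewrite can be fairly mechanical. A naive sketch for the DB-API “format” style (it ignores %s occurring inside string literals, which a real rewriter would have to handle):

        import itertools
        import re

        def format_to_numbered(query):
            # Turn DB-API "%s" placeholders into PostgreSQL's $1, $2, ...
            counter = itertools.count(1)
            return re.sub(r"%s", lambda m: "$%d" % next(counter), query)

        format_to_numbered("SELECT * FROM t WHERE a = %s AND b = %s")
        # -> 'SELECT * FROM t WHERE a = $1 AND b = $2'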

    Michael — Npgsql is an excellent project, and one I’m happy to say I’ve used in the past as well. The Python standard library supplies enough functionality for basic SSL support, but it does not cover client certificates or checking the validity of the server certificate. So basic SSL support is very easy, but I will aim for the best SSL support eventually.
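
    The protocol side of SSL is small: the client sends an SSLRequest message and the server answers with a single byte before any TLS handshake begins. A sketch using today’s standard library, with certificate checks switched off to match the basic support described above:

        import socket
        import ssl
        import struct

        def connect_ssl(host, port=5432):
            sock = socket.create_connection((host, port))
            # SSLRequest: int32 length 8, int32 magic code 80877103.
            sock.sendall(struct.pack("!ii", 8, 80877103))
            if sock.recv(1) == b"S":               # server agrees to TLS
                ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
                ctx.check_hostname = False
                ctx.verify_mode = ssl.CERT_NONE    # no certificate validation
                sock = ctx.wrap_socket(sock, server_hostname=host)
            return sock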

    Luis — It sounds like a good idea, but I couldn’t find the socket path in the output of my pg_config. Also, I am concerned about the DB wrapper exec’ing “pg_config” blindly and hoping it is found on the PATH. This might be a security concern if the wrong executable is run, especially if the program using pg8000 wasn’t aware it was taking this action.

    Thanks to everyone for your interest in this project! A new pg8000 release was cut last night that fixed some bugs, added some new types (woo, floats — let the good times begin), added UNIX socket support, and made the library thread-safe.

  10. Roberto Mello
    2007/03/10 at 10:54 am

    I would also like pg8000 to be a drop-in replacement for psycopg2 so it can be used with SQLalchemy. This looks like a cool project. Thanks for opening it up.

  11. Kye
    2007/03/10 at 12:06 pm

    May I suggest a more standard layout (trunk, tags, branches)?

    Anyway, congratulations on your pure Python approach. To my understanding, the Java people are far, far ahead of us in this field, because they *believe* in their language. Just in the DB case, they have all those type 4 drivers for all major DB vendors. That’s pure Java! No clients, no deployment nightmare. And it’s even worse in the Python world, because creating C extensions for Windows can be a headache (because of compiler dependencies).

    So I hope the best for this project (if you open up development or try to join the psycopg people, there will be more chances of success, but it’s your call).

  12. Mathieu Fenniak
    2007/03/11 at 12:24 am

    Seo — I wasn’t aware of bpgsql when I began this project. A quick review shows quite a few similarities between the modules, but I prefer the approach I have taken.

    For example, bpgsql escapes strings on the client side and builds a complete command to upload to the server, whereas pg8000 uses PG bind variables to avoid having to escape the string. If both are implemented right, I don’t think there’s a practical difference between the two. I believe my approach is more conservative in letting the database server handle what has traditionally been a source of security problems in databases (SQL injection possibilities). On the other hand, creating a DB-API compliant interface that supports another parameter quoting style means that I need to rewrite the query into a new parameter style, which could be a source of obscure bugs too.

    pg8000 uses the PostgreSQL “extended query” protocol for queries, which means that prepared statements and cursors come for free. I’ve built the API to represent that, and it has allowed me to reduce memory consumption by caching a limited number of rows from the server response. I think that’s a valuable feature in the library. But it could be done with some clever middleware around any PostgreSQL library.
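
    To illustrate, every extended-query message shares the same framing: a type byte, an int32 length, and a payload. A Parse message, for instance, can be built like this (a sketch of the wire format, not pg8000’s code):

        import struct

        def parse_message(name, query):
            # Parse ('P'): statement name and query text as null-terminated
            # strings, then an int16 count of pre-declared parameter types
            # (0 lets the server infer them). Bind, Describe, Execute and
            # Sync messages use the same type-byte + length framing.
            body = name.encode() + b"\x00" + query.encode() + b"\x00"
            body += struct.pack("!h", 0)
            return b"P" + struct.pack("!i", len(body) + 4) + body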

    There are lots of differences between the two libraries. My opinion about them is bound to be biased, as I am proud of my work. Maybe once pg8000 supports DB-API 2.0 we’ll be on even ground to compare the two.

  13. Paul Boddie
    2007/03/12 at 5:24 am

    How about using various techniques employed by pyPgSQL to get cursors working properly? I’ve been using psycopg2 and have had to patch it to have “server-side” cursors, mostly because my client balloons like an overfed python with the amounts of data I’m working with if it all gets transferred back to the client.

  14. Mathieu Fenniak
    2007/03/12 at 8:21 am

    Paul — pyPgSQL has always been my favourite PostgreSQL wrapper library. So, to anyone reading this, I’m not criticizing. I like the approach taken in pg8000 because it is simpler, and requires less code.

    For example, pyPgSQL compares queries against regular expressions to determine whether a query can be wrapped in DECLARE CURSOR. This obviously works, but I believe it is inflexible. The SQL command EXECUTE, which runs a prepared statement, would not get the cursor functionality because it doesn’t match the “SELECT” regular expression. And even assuming the regular expressions in the library are perfect, they wouldn’t be future-proof: what if a new version of PostgreSQL adds some new statement that returns a result set? pg8000 would work the same as usual with the new statement.

    One of my favourite facts about pg8000 is that it uses the binary data transfer format selectively, for some types only. For example, floats are transmitted in binary format, which preserves their exact value in a way the text format does not. The bytea type, which isn’t implemented in pg8000 yet, will not require escaping the text sent across the wire. The timestamp type does not work well with the binary transfer format, since it can sometimes be a float8 and sometimes not, so the text format is used there. It’s a very flexible type implementation, and I’m happy with it.
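
    A small demonstration of the float point: the binary float8 format is just the IEEE 754 double, while a text rendering limited to 15 significant digits (roughly the server’s default text output) can drop the last bits:

        import struct

        v = 0.1 + 0.2                       # 0.30000000000000004
        assert float("%.15g" % v) != v      # 15-digit text loses precision
        wire = struct.pack("!d", v)         # binary float8: the double itself
        assert struct.unpack("!d", wire)[0] == v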

  15. Paul Boddie
    2007/03/12 at 8:37 am

    I think the PostgreSQL people really need to provide some kind of API which can say whether a statement can use cursors. It took a while for bind parameters/variables to be supported properly via the client APIs, as far as I recall, but anything of this nature is really important for anyone doing stuff outside the database. One suggestion with psycopg2 was to use “named cursors” explicitly, but that’s really inconvenient if you’re keeping an eye on portability with other database modules – I’d have to employ another layer just to hide such incompatibilities.

    I’ll keep an eye on your project, however, and hope that I can use it in preference to the other modules in the future. It’s true that pyPgSQL does some tests on statements to see if they’re likely to work with cursors, and my patch to psycopg2 does similar things. The big problem with PostgreSQL, in my opinion, is that it’s often very good technically in so many areas, only to be let down by one or two minor details which can “sour the deal”.

  16. Joel Lawhead
    2007/03/12 at 8:57 am

    Mathieu,

    This is great. Pure Python libraries are way underrated. Python’s ability to interface with C libraries is great but it’s nice to have pure Python alternatives with no dependencies.

  17. Luis Bruno
    2007/03/12 at 11:32 am

    I meant: use pg_config during “setup.py install”, not at runtime; then “hardcode” the value you find.

    I was *so* sure that pg_config provides the socket path. My experience with pg_config comes from psycopg2, which is a different use case.

    I guess I’ll have a look around.

  18. Luis Bruno
    2007/03/12 at 11:35 am

    Speaking for myself, I don’t think binary client libraries are a “bad” dependency. OTOH, I strongly dislike having to *compile* something to access a database.

    I actually like the Windows model: Next -> Next -> Next -> Install. .oO(OT: except that now I use .msi files and a silent installation)

1 Pingback & Trackback on pg8000 v1.00 — a new PostgreSQL/Python interface

  1. Strictly Business » Blog Archive » Antithesis
    2007/08/03 at 8:54 pm

    [...] I haven’t tried it yet, but assuming that I can make easy use of pg8000, a PostgreSQL interface that throws out C-based bindings and speaks postgresql-ish in native Python, then it’s an example of the antithesis of “dragon tea”. It doesn’t appear to require hunting down obscure dependencies, it doesn’t require that you build ten other equally obscure things before you can use it, and if it delivers what it promises it will make developers’ lives much easier. [...]