Python Data Access patterns, part 1

Recently I’ve been reading Clifton Nock’s book Data Access Patterns: Database Interactions in Object-Oriented Applications, which, as you’d expect from the title, covers a lot of patterns to do with data access. The first part of the book is dedicated to decoupling patterns, which seem to me pretty fundamental to the good design of applications that connect to databases. If you want an application to interact with a database, then in general it’s a good idea to place the code handling data access into a separate component. The idea is to encapsulate physical database calls in logical operations, which the application can then use, rather than mixing them into the rest of the application code.

The benefits of keeping application code (code which needs to know about objects in the application domain) separate from data access code (code which needs to know about how to get at the database) are many. Maintenance is a lot easier – by exposing only logical operations to your application you can change anything related to the detail of how these are implemented without worrying about breaking code elsewhere in the application. The application code itself is cleaner, as it can focus on the objects without worrying about how these are stored physically. This also affords a nice division of labour, so you could have someone working on the business logic in the application who knows nothing of the database behind it.

Here’s an example of how such a data access component could be implemented in Python, where the application is interacting with a SQLite database. The DataAccessor class (apologies to Clifton Nock here, I both stole the class name and modelled the code pretty closely on his Java version) handles connecting to the database and exposes the operations the application might require: read, insert, update, delete. For the sake of brevity I’ve only included the read and insert operations here – update and delete follow a pretty similar pattern to insert.

import apsw
import os

def sql_trace(stmt, bindings):
    '''Echoes all SQL executed'''
    print('SQL:', stmt)
    if bindings:
        print('Bindings:', bindings)
    return True

class DataAccessor(object):
    '''Class to handle data access using the apsw sqlite wrapper'''
    def __init__(self, dbpath, echo=False):
        try:
            if os.path.exists(dbpath):
                self.conn = apsw.Connection(dbpath)
                self.cur = self.conn.cursor()
                if echo:
                    # install the tracer defined above so all SQL is echoed
                    self.cur.setexectrace(sql_trace)
            else:
                raise IOError('Database not found: ' + dbpath)
        except apsw.CantOpenError as detail:
            print('Unable to open db file:', dbpath, detail)

    def read(self, table, columns=None, where_row=None, sort_cols=None):
        '''Executes a SELECT statement against table.

        table                 -- name of the table to be read
        columns (optional)    -- list of columns to be read
                              from table
        where_row (optional)  -- dict used to build WHERE
        sort_cols (optional)  -- list of (column, order) pairs
                              used to specify order of the
                              rows returned. Needs to be of
                              the form ('<column>', 'ASC'|'DESC')

        Returns: rows returned from the SELECT statement.
        '''
        try:
            stmt = 'SELECT '
            if columns:
                stmt += ', '.join(columns)
            else:
                stmt += '*'

            # from clause
            stmt += "\nFROM " + table

            # where clause
            if where_row:
                stmt += "\nWHERE "
                stmt += "\n  AND ".join([col + "=:" + col \
                                    for col in where_row])

            # order clause
            if sort_cols:
                stmt += "\nORDER BY "
                stmt += ', '.join([col[0] + ' ' + col[1] \
                                    for col in sort_cols])

            stmt += ';'

            # submit and return results
            args = (stmt, where_row) if where_row else (stmt,)

            if columns:
                results = [dict(zip(columns, row))
                           for row in self.cur.execute(*args)]
            else:
                results = [row for row in self.cur.execute(*args)]

            return results

        except apsw.SQLError as sql:
            print('Error in SQL submitted:', sql)
            print('SQL:', stmt)
            if where_row:
                print('Bindings:', where_row)

        except apsw.Error as error:
            print('APSW Error:', error)

        except Exception as error:
            print('Error reading from database:', error)


    def insert(self, table, values):
        '''Executes an INSERT statement against table.

        table           -- name of the table to be written to
        values          -- list of rows (dicts) to be inserted

        Returns: number of database rows changed.
        '''
        try:
            # build list of column names
            cols = list(values[0].keys())

            # generate insert statement
            stmt = 'INSERT INTO ' + table + ' ('
            stmt += ', '.join(cols)
            stmt += ') VALUES ('
            stmt += ', '.join([":%s" % col for col in cols])
            stmt += ')'

            # submit inside an explicit transaction
            self.cur.execute('BEGIN IMMEDIATE')
            self.cur.executemany(stmt, values)
            self.cur.execute('COMMIT')

            return self.conn.changes()

        except apsw.SQLError as sql:
            # leave the database as we found it
            self.cur.execute('ROLLBACK')
            print('Error in SQL submitted:', sql)
            print('SQL:', stmt)

        except apsw.Error as error:
            print('APSW Error:', error)

        except Exception as error:
            print('Error submitting insert:', error)

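To see what read() actually generates, here is a standalone sketch of its statement-building step (build_select is a name I’ve introduced just for illustration; it mirrors the logic in the class above):

```python
def build_select(table, columns=None, where_row=None, sort_cols=None):
    '''Builds the same SELECT statement that read() submits.'''
    stmt = 'SELECT '
    # column list, or * if none given
    stmt += ', '.join(columns) if columns else '*'
    stmt += '\nFROM ' + table
    # named-parameter WHERE clause, one condition per key
    if where_row:
        stmt += '\nWHERE '
        stmt += '\n  AND '.join(col + '=:' + col for col in where_row)
    if sort_cols:
        stmt += '\nORDER BY '
        stmt += ', '.join(col[0] + ' ' + col[1] for col in sort_cols)
    return stmt + ';'

print(build_select('albums', columns=['title', 'artist'],
                   where_row={'artist': 'Miles Davis'},
                   sort_cols=[('title', 'ASC')]))
# Prints:
# SELECT title, artist
# FROM albums
# WHERE artist=:artist
# ORDER BY title ASC;
```

The actual values never appear in the statement; they are passed separately as bindings, which is what lets the cursor handle quoting and escaping.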

I’ve used the apsw SQLite wrapper here, but if at some point I decided to switch to pysqlite, or to use a MySQL database, I could do so without greatly affecting the calling code. The potential drawback is that the simplicity you gain is offset by a loss of control. It may well be that all your application needs to do is read from or write to a single table at a time, but what if it needs to execute a join? Drop or create a table or view? What if you need to specify a more complex WHERE condition than “column=value”?
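To illustrate how little the calling code cares about the backend, here is a minimal sketch of the same logical operations implemented over the standard library’s sqlite3 module (the class name Sqlite3Accessor and the in-memory database are my choices for the example, not part of the original design):

```python
import sqlite3

class Sqlite3Accessor(object):
    '''Same logical operations as DataAccessor, different physical backend.'''
    def __init__(self, dbpath):
        self.conn = sqlite3.connect(dbpath)

    def insert(self, table, values):
        cols = list(values[0].keys())
        stmt = ('INSERT INTO ' + table + ' (' + ', '.join(cols) +
                ') VALUES (' + ', '.join(':%s' % col for col in cols) + ')')
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(stmt, values)
        return self.conn.total_changes

    def read(self, table, columns=None, where_row=None):
        stmt = 'SELECT ' + (', '.join(columns) if columns else '*')
        stmt += ' FROM ' + table
        if where_row:
            stmt += ' WHERE ' + ' AND '.join(c + '=:' + c for c in where_row)
        args = (stmt, where_row) if where_row else (stmt,)
        rows = self.conn.execute(*args).fetchall()
        if columns:
            return [dict(zip(columns, row)) for row in rows]
        return rows

# calling code looks exactly the same as with the apsw version
dao = Sqlite3Accessor(':memory:')
dao.conn.execute('CREATE TABLE albums (title TEXT, artist TEXT)')
dao.insert('albums', [{'title': 'Kind of Blue', 'artist': 'Miles Davis'}])
print(dao.read('albums', columns=['title']))
# → [{'title': 'Kind of Blue'}]
```

The application only ever sees read and insert, so swapping one accessor class for the other is invisible to it.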

It may well be possible to rewrite this class to expose these operations, but I could see this getting to be a lot of work. In the next part I’ll look at some ORM (Object-Relational Mapping) solutions, which get around this nicely by mapping an object-oriented structure onto a relational model.


4 thoughts on “Python Data Access patterns, part 1”

  1. Roger says:

    I’m the author of APSW and your code could be a lot shorter. When opening the database you can specify flags including that it must exist already, so there is no need for testing for file existence.

    There is no need for tracing code – see the apswtrace utility which will do all the work for you, including generating reports and showing executing code plus result rows.

    For the insert function you can use context managers (with statement). It will automatically rollback on error.
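    As a sketch of that rollback-on-error behaviour (shown here with the stdlib sqlite3 module, whose connection context manager behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE albums (title TEXT)')

try:
    with conn:  # opens a transaction; rolls back if the block raises
        conn.execute("INSERT INTO albums VALUES ('Kind of Blue')")
        raise RuntimeError('something went wrong mid-transaction')
except RuntimeError:
    pass

# the insert was rolled back automatically
print(conn.execute('SELECT count(*) FROM albums').fetchone())  # → (0,)
```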

    There is also no need to close the cursor. This was a requirement several years ago because SQLite was not threadsafe so you had to ensure closing happened in the same thread as other operations. If you let the Python garbage collector run then it could close/free the objects in a different thread. SQLite has been threadsafe for quite a while now as has APSW. (Note that pysqlite still requires all operations in the same thread.)

    There is also no need to have a cursor instance as part of the object. Cursors are very very cheap, make plenty of them. The danger in trying to only have one is that another function may reuse it which will discard any remaining rows some other function is expecting. (This is discussed in the first note on the documentation page for cursors.)

    As a general style issue it is very bad practice to catch exceptions, print something and then just carry on. This means callers will not get the exceptions and will carry on as though nothing had happened. At least adding some ‘raise’ statements to re-raise the exceptions would be a good idea.
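    For example, a handler that prints and then re-raises still reports the problem but lets the caller react (the names here are illustrative, not from the code above):

```python
def risky_read():
    '''Stands in for a data access method that hits an error.'''
    try:
        raise ValueError('simulated database error')
    except ValueError as error:
        print('Error reading from database:', error)
        raise  # re-raise so the caller sees the failure too

try:
    risky_read()
except ValueError:
    print('caller can now handle or abort')
```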

    There is a python-sqlite mailing list which the pysqlite author and myself manage as a helpful source of discussion and hints.

    With respect to your last sentence, also have a look at document oriented databases. They let you store Python objects directly without having intermediate layers trying to map them relationally. I like MongoDB for performance and scale, and CouchDB for replication and web friendliness.

    • geoffness says:

      Thank you Roger, this is very helpful feedback. I was actually a little hesitant to post this, not having used APSW before, but I now feel a lot better informed! I will put a rewrite based on your comment in a new post.

      • Roger says:

        I’m happy to review your rewrite – feel free to send me personal email or to the python-sqlite mailing list.

        BTW everything I said above is actually covered in the APSW documentation. Unfortunately there is a fair amount of it so that makes it less likely it would all be read or remembered. I’d appreciate any suggestions on improving it (structure and content).
