Python Data Access patterns, part 1

Recently I’ve been reading Clifton Nock’s book Data Access Patterns: Database Interactions in Object-Oriented Applications, which as you’d expect from the title, covers a lot of patterns to do with data access. The first part of the book is dedicated to decoupling patterns, which seems to me to be pretty fundamental to good design of applications which connect with databases. If you want an application to interact with a database, then in general it’s a good idea to place the code handling data access into a separate component. The idea is to encapsulate physical database calls in logical operations, which the application can then use, as opposed to mixing them into the rest of the application code.

The benefits of keeping application code (code which needs to know about objects in the application domain) separate from data access code (code which needs to know about how to get at the database) are many. Maintenance is a lot easier – by exposing only logical operations to your application you can change anything related to the detail of how these are implemented without worrying about breaking code elsewhere in the application. The application code itself is cleaner, as it can focus on the objects without worrying about how these are stored physically. This also affords a nice division of labour, so you could have someone working on the business logic in the application who knows nothing of the database behind it.

Here’s an example of how such a data access component could be implemented in Python, where the application is interacting with a SQLite database. The DataAccessor class (apologies to Clifton Nock here, I both stole the class name and modelled the code pretty closely on his Java version) handles connecting to the database and exposes the operations the application might require: read, insert, update, delete. For the sake of brevity I’ve only included the read and insert operations here – update and delete follow a pretty similar pattern to insert.

import apsw
import os

def sql_trace(stmt, bindings):
    'Echoes all SQL executed'
    print "SQL:", stmt
    if bindings:
        print "Bindings:", bindings
    return True

class DataAccessor(object):
    Class to handle data access using apsw sqlite wrapper
    def __init__(self, dbpath, echo=False):
            if os.path.exists(dbpath):
                self.conn = apsw.Connection(dbpath)
                self.cur = self.conn.cursor()
                if echo:
                raise IOError('Database not found: ' + dbpath)
        except apsw.CantOpenError as detail:
            print "Unable to open db file: ", dbpath, detail

    def read(self, table, columns=None, where_row=None, sort_cols=None):
        '''Executes a SELECT statement against table.

        table                 -- name of the table to be read
        columns (optional)    -- list of columns to be read
                              from table
        where_row (optional)  -- dict used to build WHERE
        sort_cols (optional)  -- list of (column, order) pairs
                              used to specify order of the
                              rows returned. Needs to be of
                              the form ('<column>', 'ASC'|'DESC')

        Returns: rows returned from the SELECT statement.
            stmt = 'SELECT '
            if columns:
                stmt += ', '.join(columns)
                stmt += '*'

            # from clause
            stmt += "\nFROM " + table

            # where clause
            if where_row:
                stmt += "\nWHERE "
                stmt += "\n  AND ".join([col + "=:" + col \
                                    for col in where_row])

            # order clause
            if sort_cols:
                stmt += "\nORDER BY "
                stmt += ', '.join([col[0] + ' ' + col[1] \
                                    for col in sort_cols])

            stmt += ';'

            # submit and return results
            args = where_row and (stmt, where_row) or (stmt,)

            results = columns and [dict(zip(columns, row)) \
                for row in self.cur.execute(*args)] \
                or [row for row in self.cur.execute(*args)]

            return results

        except apsw.SQLError as sql:
            print 'Error in SQL submitted:', sql
            print 'SQL:', stmt
            if where_row:
                print 'Bindings:', where_row

        except apsw.Error as error:
            print 'APSW Error: ', error

        except Exception as error:
            print 'Error reading from database:', error


    def insert(self, table, values):
        '''Executes an INSERT statement against table.

        table           -- name of the table to be written to
        values          -- list of rows (dicts) to be inserted

        Returns: None
            # build list of column names
            cols = values[0].keys()

            # generate insert statement
            stmt = 'INSERT INTO ' + table + ' ('
            stmt += ', '.join(cols)
            stmt += ') VALUES ('
            stmt += ', '.join([":%s" % col for col in cols])
            stmt += ')'

            # submit

            self.cur.execute('BEGIN IMMEDIATE')
            self.cur.executemany(stmt, values)

            return self.conn.changes()

        except apsw.SQLError as sql:
            print 'Error in SQL submitted:', sql
            print 'SQL:', stmt

        except apsw.Error as error:
            print 'APSW Error: ', error

        except Exception as error:
            print 'Error submitting insert:', error


I’ve used the apsw SQLite wrapper here, but if at some point I decided to switch to pysqlite, or to use a MySQL database, I could do so without greatly affecting the calling code. The potential drawback here is that what you gain is offset by the loss of control. It may well be that all your application needs to do is to read from or write to a single table at a time, but what about if it needs to execute a join? Drop or create a table or view? What if you need to do specify a more complex where condition than “column=value”?

It may well be possible to rewrite this class to expose these operations, but I could see this getting to be a lot of work. In the next part I’ll look at some ORM (Object Relational Map) solutions which get around this nicely by mapping an OO structure to a relational model.

SQLite in Ubuntu 9.04

There are a lot of good choices for a free/open-source DBMS in Ubuntu (or anywhere for that matter): MySQL and PostgreSQL would be the two that stand out most prominently. But when you’re developing a small-scale application which requires a small-scale database (whether or not you intend to scale up later), a client-server DBMS like that is the proverbial sledgehammer, with your data structure as the walnut. SQLite offers a file-based transactional SQL database engine, without the installation/configuration/administration time associated with a server-based DB. Think MS Access, cross-platform, minus all the forms.

I’m currently building a proof-of-concept application (for migrating a spreadmart system), using the Django framework to put a web form presentation layer across a db backend. SQLite is ideal for this sort of task, in that the development time is shortened – I can pretty much get down to writing code as soon as the design is complete. An interface to SQLite (sqlite3) is built in to Python since 2.5, so if you wish to create an SQLite database using Python, all you have to do is write the appropriate code to do so. Without using Python you’ll need to enable command-line access by installing the sqlite3 library:

sudo apt-get install sqlite3

or just open Synaptic and search for sqlite3.

So for instance to create a database file ‘expenses.db’, create a category table and populate one row in it, I can do the following from the command line:

$ sqlite3 expenses.db
SQLite version 3.6.10
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table category(
   ...> id integer primary key,
   ...> cat_code varchar(3),
   ...> cat_name varchar(20)
   ...> );
sqlite> insert into category values(
   ...> null, 'GRO', 'groceries'
   ...> );
sqlite> select * from category;

To do the same thing in Python:

>>> import sqlite3
>>> conn = sqlite3.connect('expenses.db')
>>> conn.execute('''create table category(
... id integer primary key,
... cat_code varchar(3),
... cat_name varchar(20)
... )''')
<sqlite3.Cursor object at 0xb77ce2c0>
>>> conn.commit()
>>> conn.execute('''insert into category values(
... null, 'GRO', 'groceries')''')
<sqlite3.Cursor object at 0xb77ce290>
>>> conn.commit()
>>> for row in conn.execute('select * from category'):
...   print row
(1, u'GRO', u'groceries')

If you prefer to have a GUI, you can install the SQLite Database Browser:

sudo apt-get install sqlitebrowser

This is a very handy (albeit rather basic) visual tool which gives you access to pretty much all the same functionality as you have from the command line:

So that’s SQLite – simple, easy to use and very handy. Incidentally there’s a great blog post here extolling the virtues of SQLite in itself and also in combination with Excel. Intriguing reading and makes a great case for the benefits of using a “fractional horsepower” database. I’ll post more shortly on what this approach has allowed me to do with both Django and Excel.