I’ve spent quite a bit of time over the past couple of days transforming the data access layer of one of my Flask applications from a purely psycopg2-based mess to an SQLAlchemy-based not-quite-such-a-mess, and I thought I’d share some thoughts on the process.
Introduction
As a former data analyst, my face has been permanently blackened by long days spent at the SQL coalface, rattling off an endless succession of raw queries in the faint hope of unearthing something pertinent from the dark mine that is the database.
Then, when I became a web developer, I learned about database abstractions. These initially filled me with rage – we use a bastardised version of Laravel’s Eloquent ORM at my place of work, and I remember thinking it was the devil when I first heard about it. At that point, SQL was about all I had in my skills locker, and so the idea that it might be rendered practically redundant was a frightening one, luddite that I am.
I’ve grown more used to ORMs over time, and no longer need convincing of their benefits. However, when I coded the data access layer for the Flask app under discussion in this article six months ago, I simply used the PostgreSQL database driver library psycopg2, which requires raw SQL – specific to one relational database management system (RDBMS) – to be passed as text to an execute method. SQLAlchemy felt unnecessary at the time, as I had no desire to impose an object-oriented architecture on the core algorithm, which lives or dies by the speed at which it can process input.
Recently, however, it became plain that layering SQLAlchemy over psycopg2 was the way forward for the backend of the app. The three major motivators for this were:
- Not wanting to be tied down to a particular RDBMS
- Cleaner, more readable data access
- Database migrations – the Flask-Migrate library seems to be the standard way to handle migrations for a Flask app, and this library is built on top of SQLAlchemy. Therefore, to avoid reinventing the wheel by building my own migration logic, it seemed SQLAlchemy was required
I should clarify that even after the transition to SQLAlchemy, the core algorithm is still for the most part free of complex, custom objects, as it should be – all that’s really changed is the way data is persisted to and loaded from the database.
The transition to SQLAlchemy – a short story
Transitioning to SQLAlchemy turned out to be a bit of a lesson in how little I know about databases – but having your lack of knowledge exposed is the best motivation to learn! I’ve divided up my notes on the experience into three separate parts. I hope that together they provide for a relatively interesting read.
Part 1 – Why do they want everything in one file?!
Being fairly familiar with Laravel’s Eloquent ORM, I found setting up classes that map to database tables, and defining fields and their datatypes in those classes, fairly straightforward. Defining relationships was a little trickier – I had gone out of my way to normalise the database when I revamped it a few months ago, which meant I now had nine interconnecting tables, with one-to-many relationships galore. Initially I defined the one-to-many (and many-to-one) relationships on both parent and child classes using the back_populates parameter, but eventually abandoned that in favour of the more succinct backref parameter, which, when used in the parent class, removes the need to define the relationship again in the child class.

I found the way one-to-one relationships work in SQLAlchemy interesting. What you effectively have is a parent-child relationship under the hood, but by passing uselist=False to the relationship() function in the parent class, you convert an attribute that would have returned a list of children into one that returns a single child. And of course, in object-oriented programming children tend to have only one parent (multiple inheritance aside!), so the relationship attribute on the child returns a scalar value too – thus, you end up with a kind of pseudo-equality between the two classes.
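The patterns above can be sketched in a few lines. The Parent, Child and Profile models below are illustrative stand-ins, not the app’s real nine tables, and the in-memory SQLite database exists only so the sketch runs:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # backref creates Child.parent for free -- no need to declare the
    # relationship a second time on the child class.
    children = relationship("Child", backref="parent")
    # uselist=False turns the one-to-many into a one-to-one:
    # Parent.profile is a single Profile, not a list.
    profile = relationship("Profile", backref="owner", uselist=False)

class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id"))

class Profile(Base):
    __tablename__ = "profiles"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

parent = Parent(name="p", children=[Child(), Child()], profile=Profile())
session.add(parent)
session.commit()
```

After the commit, parent.children is a list of two Child objects, while parent.profile and each child’s .parent are scalars – the pseudo-equality in action.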
One thing I find odd about SQLAlchemy is that it appears to be standard to define multiple data model classes in a single file. Call me old-fashioned, but that seems a bit wrong to me – I was brought up with the notion that each class should have its own file. The yuppies of today, honestly! And so, being belligerent, I stuck to my guns and put each of my data model classes in its own file. The programming gods immediately punished me for my disobedience, offering up a new problem to wrap my head around – because I was importing declarative_base and creating a separate Base class in each file (for the data model class in that file to extend), database metadata was not being shared amongst the classes, and attempting to create relationships between tables led to the NoReferencedTableError.
What I needed to do was have a single Base variable that could be accessed by all of my files. And intriguingly, the simplest solution seemed to be to put that variable inside the __init__.py file in the directory containing all of the data model class files. Being an OOP acolyte, I had of course used __init__.py files – for the uninitiated, they turn directories into Python packages – but I had never put any code into them before – heavens no! My __init__.py files had always been virginal things, unsullied by my horrible code. Well, it turns out that the contents of the __init__.py file are actually executed when its containing package is imported, and variables defined in the file can be conveniently grabbed from the package namespace – two facts which make the file a perfect candidate to host a shared Base variable!
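The layout looks roughly like this. In the real app each class lives in its own file under a models package (the package and model names here are illustrative); the file boundaries are marked with comments so the sketch runs as a single script:

```python
# --- models/__init__.py ---
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()  # one Base, therefore one shared MetaData

# --- models/author.py ---
# In its own file this would start with: from models import Base
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import relationship

class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    books = relationship("Book", backref="author")

# --- models/book.py ---
# In its own file this would start with: from models import Base
class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    # With a separate Base (and therefore a separate MetaData) per
    # file, this ForeignKey could not see the authors table, and
    # creating the tables would raise NoReferencedTableError.
    author_id = Column(Integer, ForeignKey("authors.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)  # both tables live on the one MetaData
```

Because every model file imports the same Base, every table ends up registered on the same MetaData, and the foreign keys resolve.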
Part 2 – As Matthew Crawley implored of Mary Crawley in season one of Downton Abbey: “Just commit already!”
Another confusing aspect of the transition to SQLAlchemy came courtesy of confrontation with the realities of database transactions, and attempting to understand the difference between the Session methods add, flush and commit. Despite using SQL day in, day out for almost two years whilst working in data, I’m ashamed to admit that I never really gained much of an appreciation for database transactions. After all, nobody at my company ever seemed to use them, and by the time I had found out all about their benefits, it all seemed rather academic – I felt I could already do everything I needed to do in the database, and the additional statements required to support transactional behaviour seemed rather unnecessary. In short, I was A Clod In Denial.
Even when working with database abstractions, I’d got used to writing statements like this…
$user = User::create([
    'username' => $username,
    'password' => md5($password)
]);
…whereby the id property on the $user variable is immediately available for use in subsequent statements, indicating that the User::create function runs the query against the database there and then – with all of the adding, flushing and committing apparently done behind the scenes.
Because of all this, the requirement to explicitly commit transactions in SQLAlchemy threw me initially. And I did indeed require almost all of my transactions to be committed immediately, as I needed last insert IDs for building up foreign key relationships across tables.
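The distinction between the three Session methods can be shown in a few lines, using a throwaway in-memory SQLite database and an illustrative User model (names are mine, not the app’s):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    username = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

user = User(username="alice")
session.add(user)   # pending only: nothing has been sent to the database
assert user.id is None

session.flush()     # the INSERT is emitted, so the new id is populated...
assert user.id is not None

session.commit()    # ...but only commit makes the row permanent
```

Note that a flush inside an open transaction is actually enough to obtain a last insert ID – committing is only necessary once you want the work made durable.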
As I’d been learning about decorators recently, I took this opportunity to get busy with them, creating a @commit_transaction decorator that I used to wrap SQLAlchemy statements and append to them a session.commit() statement, to save having to explicitly write out that statement multiple times. I’m still not sure I’m fully converted to the ways of the decorator – they do seem awfully unintuitive, and are probably misused in situations (like this one) where a simpler and cleaner solution might have been more appropriate – but clearly somebody thinks they are a good idea, so I’d better get used to them!
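For illustration, here is roughly the shape such a decorator might take – this is a sketch, not the app’s actual code, and it assumes the wrapped function receives the session as its first argument. The FakeSession class exists only so the sketch runs without a database; in practice a real SQLAlchemy Session would be passed in:

```python
import functools

def commit_transaction(func):
    """Run the wrapped data-access function, then commit the session."""
    @functools.wraps(func)
    def wrapper(session, *args, **kwargs):
        result = func(session, *args, **kwargs)
        session.commit()  # the appended commit the decorator exists for
        return result
    return wrapper

# Stand-in for a real SQLAlchemy Session (an assumption for this sketch).
class FakeSession:
    def __init__(self):
        self.added, self.committed = [], False
    def add(self, obj):
        self.added.append(obj)
    def commit(self):
        self.committed = True

@commit_transaction
def create_user(session, username):
    session.add({"username": username})

session = FakeSession()
create_user(session, "alice")  # the commit happens inside the wrapper
```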
Part 3 – Fixing a leak
Getting to grips with SQLAlchemy sessions was a bit of a nightmare. I wanted to avoid using Flask-SQLAlchemy (“an extension for Flask that adds support for SQLAlchemy to your application”), primarily because I already felt I was operating at a sufficient level of abstraction, and didn’t want to learn a new API if it was not strictly necessary. But there is a price to avoiding Flask-SQLAlchemy, and that price is having to handle sessions yourself. And, as I didn’t have much of an idea of what sessions actually were, this proved somewhat painful.
Because I wasn’t handling sessions properly in my initial SQLAlchemy code, I ended up leaking database connections. This meant that after several web requests had been processed by the app, it would just stop working completely, and the error “psql: FATAL: too many connections for role ‘xyz’” would present itself in the logs.
My eventual solution involved the use of built-in decorators provided by Flask – specifically, @app.before_request and @app.teardown_request. When you decorate a function with one of these decorators, you are basically instructing the Flask app to execute the code inside the function at a specific point in the web request lifecycle – either before the request, or after the request. By creating a global database variable in the @app.before_request function, passing this variable around the application, and then disposing of the database’s engine in the @app.teardown_request function with database.engine.dispose(), I was able to prevent connection leakage. The nifty thing about @app.teardown_request is that it executes the decorated code regardless of whether or not an exception was raised during the handling of the request. And with that, the transition to SQLAlchemy had been successful!
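The wiring looks roughly like this – the Database class below is a stand-in for the app’s real database object (which I’m not reproducing), and the route exists only to make the sketch exercisable:

```python
from flask import Flask, g
from sqlalchemy import create_engine

app = Flask(__name__)

class Database:
    """Stand-in for the app's real database object (an assumption)."""
    def __init__(self):
        self.engine = create_engine("sqlite://")

@app.before_request
def open_database():
    g.database = Database()  # fresh database machinery for this request

@app.route("/")
def index():
    return "ok"  # real handlers would use g.database here

@app.teardown_request
def close_database(exception):
    # Runs even if the request raised an exception.
    database = g.pop("database", None)
    if database is not None:
        database.engine.dispose()  # release all pooled connections

response = app.test_client().get("/")
```

Storing the database on Flask’s per-request g object (rather than a true module-level global) keeps concurrent requests from trampling each other’s connections.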