Django is Dangerous
Django is a web framework that bundles everything a backend engineer might think a website needs: an ORM, a templating engine, a content management system, an admin console, and a collection of utilities. It was invented to power a small-town newspaper's website[0], for which it may have been well-suited. But it has two massive flaws that cripple many of the websites that use it today.
No Data Integrity
Data integrity problems are particularly insidious, because they tend to lie dormant until a site starts to get a lot of traffic and accumulate a lot of data, and because bad data, unlike bad code, often cannot be fixed. Django provides plenty of data integrity footguns for the careless.
ATOMIC_REQUESTS = False by Default
Among many other things, Django provides an ORM: instead of using SQL directly, you create "models", which are Python classes with "fields" that represent various pieces of data, and the framework handles their serialization and deserialization. In doing so, it also tries to hide the finer points of database interaction from you.
Among these points is transaction management, where you decide which series of database queries need to take place as atomic transactions[1][2].
Since the framework has no insight into the conceptual relationships between queries, it has essentially two choices: the most fine-grained strategy (only individual queries are atomic) or the most coarse-grained strategy (every HTTP request is processed atomically).
These choices offer the classic trade-off between performance and safety, which everyone agrees should default towards safety.
Django allows the user to control this behavior via the ATOMIC_REQUESTS setting, but it defaults to False, the unsafe but more performant choice.
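Flipping the default back toward safety is a one-line, per-database setting. A sketch of a settings.py fragment (the engine and database name are placeholders):

```python
# settings.py (sketch; connection details are placeholders)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        # Wrap every view in a transaction: commit on success,
        # roll back if the view raises.
        "ATOMIC_REQUESTS": True,
    }
}
```

Views that genuinely cannot afford a transaction can then opt out individually with the `transaction.non_atomic_requests` decorator, which is the right polarity: safe by default, fast by exception.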
Common Django usage patterns exacerbate this terrible design decision. Most Django views[3] look like the following:
def my_view(request):
    instance = GrabBagOfData.objects.get(...)   # retrieve and deserialize a model instance from the database
    instance.some_godawful_expensive_method()   # modify the instance object
    instance.save()                             # serialize and store the modified instance
    return HttpResponse(...)
Unless explicitly told otherwise, the save method writes every field on the model back to the database, including those which have not been modified, silently overwriting any changes that have been made since the model instance was retrieved.
Since GrabBagOfData probably serves numerous distinct purposes, all sorts of AJAX requests are firing off concurrently to modify different fields, trampling each other's updates. Hell, some_godawful_expensive_method is often so slow that the user will manually trigger new requests before it has finished.
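The lost update is easy to reproduce outside Django. This stdlib-only sketch simulates two interleaved requests that each read a whole row and then write the whole row back, the way an unqualified save() does (the table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grabbag (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
con.execute("INSERT INTO grabbag VALUES (1, 'old_a', 'old_b')")

# Two "concurrent" requests each read the full row...
row1 = con.execute("SELECT a, b FROM grabbag WHERE id = 1").fetchone()
row2 = con.execute("SELECT a, b FROM grabbag WHERE id = 1").fetchone()

# Request 1 modifies a, but writes back every field, like save()
con.execute("UPDATE grabbag SET a = ?, b = ? WHERE id = 1", ("new_a", row1[1]))
# Request 2 modifies b, also writes back every field, clobbering request 1
con.execute("UPDATE grabbag SET a = ?, b = ? WHERE id = 1", (row2[0], "new_b"))

print(con.execute("SELECT a, b FROM grabbag WHERE id = 1").fetchone())
# → ('old_a', 'new_b'): request 1's write to a is silently lost
```

Django's partial remedy, save(update_fields=["b"]), writes only the named fields; it just isn't the default, so nobody uses it until after the data is gone.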
get_or_create and update_or_create are non-atomic
These methods get or update a model instance specified by certain parameters if it exists, and create such an instance if it does not.
What is the point of using these methods instead of, say, trying a get and falling back to a create?
One would probably expect them to prevent data races: if an instance is created after the get fails but before the create is issued, these methods should prevent a duplicate instance from being created.
But the Django documentation gently notes otherwise, in an unhighlighted paragraph of text more than a page into the notes for get_or_create:

This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.
In fact, the implementation makes no attempt to prevent data races, relying entirely on the database[4]. To add insult to injury, the documentation recommends lowering the MySQL isolation level[5], thereby making your entire system less safe.
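The race is mechanical, not exotic. This stdlib sqlite3 sketch interleaves two get-or-create attempts by hand: both SELECTs run before either INSERT, and without a unique constraint the database happily stores a duplicate (the widget table is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# No UNIQUE constraint, as in a model field without unique=True
con.execute("CREATE TABLE widget (name TEXT)")

# Two "concurrent" requests each run the get step...
r1 = con.execute("SELECT rowid FROM widget WHERE name = 'spam'").fetchone()
r2 = con.execute("SELECT rowid FROM widget WHERE name = 'spam'").fetchone()

# ...and, both having found nothing, each runs the create step.
if r1 is None:
    con.execute("INSERT INTO widget (name) VALUES ('spam')")
if r2 is None:
    con.execute("INSERT INTO widget (name) VALUES ('spam')")

count = con.execute("SELECT COUNT(*) FROM widget WHERE name = 'spam'").fetchone()[0]
print(count)  # → 2: get-or-create semantics produced a duplicate
```

With a UNIQUE constraint on name, the second INSERT would raise sqlite3.IntegrityError instead, which is exactly the database-level enforcement Django's implementation silently depends on[4].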
Validation defined on the model is only enforced on the form
Django models can be associated with one or more "forms", which are Python representations of HTML forms used to create and update model instances.
Forms have a lot of default behavior, particularly validation, which is controlled by properties of the model.
In fact, many properties of the model exist only to control the forms' default behavior.
Because nobody ever modifies a model instance except through its form, right?
And good luck keeping track of which constraints are where. Non-null? That's on the model. Define choices on a field, explicitly enumerating what values it can have? That's on the form. Uniqueness? On the model. Decimal field? The model will take any string!
This inconsistent validation also results in a classic terrible user experience: forms, pre-populated with the existing data for an object, that cannot be submitted because the existing data is invalid.
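One partial defense is to force model-level validation on every write by overriding save() to call full_clean(), which does run field-type, choices, and uniqueness checks. This is a common workaround, not Django's default behavior; the model and field below are invented for illustration:

```python
from django.db import models

class Vehicle(models.Model):
    wheel_count = models.IntegerField()

    def save(self, *args, **kwargs):
        # full_clean() runs the field, choices, and uniqueness validation
        # that otherwise only happens on the form.
        self.full_clean()
        super().save(*args, **kwargs)
```

It costs an extra query for uniqueness checks, and it will break any code path that relies on saving invalid values, which may be precisely the point.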
Invalid values are silently coerced to None
Not consistently of course. I can't tell you how many times I've run something like the following:
>>> Model.objects.filter(field__isnull=True)  # retrieve every model instance where field is None
[]                                            # looks like there are no such instances
>>> Model.objects.get(id=12345).field is None # check whether a particular model instance has field set to None
True                                          # it does
What has happened here is that the serialized model instance has an invalid value, but the value is not NULL, so the first query finds nothing. The second query deserializes the instance and, upon encountering the invalid value, silently treats it as if it were NULL, instead of raising an error like any sane function would.
This tragedy is particularly common with datetimes, since poorly-behaved applications tend to ignore all the nastiness involved in handling datetimes correctly[6][7].
Duplicate fields abound
Since queries can't use computed columns and adding a real column is only a makemigrations away, models steadily accrue duplicate fields that represent the same data in different ways.
Since Django doesn't support complex constraints, these fields inevitably drift out of sync.
Pretty soon half of your Vehicles have is_motorcycle == True and wheel_count == 4, and you're not sure which field to trust (hint: neither).
One of the great things about Python is that you can refactor inconsistent properties like this with the @property decorator. But while the ORM allows you to access columns as properties, the reverse is not true, so you have to manually refactor every query.
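In plain Python the derived field costs one decorator. A minimal sketch, with the class standing in for the model:

```python
class Vehicle:
    def __init__(self, wheel_count):
        self.wheel_count = wheel_count

    @property
    def is_motorcycle(self):
        # Derived from wheel_count, so the two can never disagree.
        return self.wheel_count == 2

print(Vehicle(wheel_count=2).is_motorcycle)  # → True
print(Vehicle(wheel_count=4).is_motorcycle)  # → False
```

The catch described above: the ORM cannot filter on a Python property, so every query that used the stored is_motorcycle column has to be rewritten against wheel_count by hand.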
Dynamic Templating
Dynamic templating gives your views an enormous surface area and makes them effectively untestable[8].
But suppose that you come up with a hack to kinda-sorta test a page. Django's templating engine offers two different kinds of template inheritance, the extends and include tags. This alone means that a single template may have content spread over many files, but it gets worse.
While a template can only extend one other template, it can include arbitrarily many, and can even include templates dynamically, making it nearly impossible to tell what the template looks like after resolving inheritance, or even how many distinct templates you have.
I've worked on systems where nobody knew the number of templates to within an order of magnitude, and nobody would have been surprised to learn it was in the millions.
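Dynamic inclusion looks innocuous in isolation. Both lines below are legal Django template syntax; the second resolves the template name from a context variable at render time, so no static analysis can tell you which file gets pulled in (the template names are invented for illustration):

```html
{% include "sidebar.html" %}        {# static: the dependency is at least greppable #}
{% include widget_template_name %}  {# dynamic: resolved per-request from the context #}
```

One dynamic include anywhere in the tree is enough to make the full set of reachable templates a runtime question.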
Forms, our friends from the previous section, are also the kings of dynamic templating, large, critical chunks of entirely dynamically generated HTML, and as such they bring their own pains. Change the name of a field? Since <input> names and ids are dynamically generated, now your JavaScript and your CSS are broken! A designer wants to make an adjustment? Chances are they'll have to touch the Python code too. What could go wrong?
Performance naturally suffers too. In addition to the obvious inefficiency of building every page per-request, you have to compress them per-request as well! Or you would, if you didn't have to abandon compression completely to prevent BREACH attacks, and as a result send several times as much data with each response.
Conclusion
These are far from all the problems with Django[9][10], but what makes them the most insidious is that they grow with scale. Most of these are minor issues when you are small and your project is simple. And Django is useful, which is why so many projects pick it up early on. Then as your project grows, these problems get worse and affect more users.
Can Django be fixed? Most of the data integrity issues are implementation mistakes that could be fixed with relatively little work and without introducing too much backwards incompatibility.
Performance would certainly suffer, particularly for get_or_create and update_or_create, but performance has never been a priority for Django.
Moving validation to the model would be the primary source of backwards incompatibility, but it seems likely that any system which relies on storing invalid values is already broken.
It would be difficult to offer a good way around creating duplicate fields, but this at least is a problem with most frameworks.
Dynamic templating, on the other hand, was a fundamental design mistake, and migrating away from it would be almost as much work as switching frameworks altogether, so it will probably never leave Django.
In the best possible outcome, where Django fixes most of its data integrity issues and various other warts, I still would recommend against using it.
[0] The Lawrence Journal-World, served HTTP-only and from the "www2" subdomain.

[1] Atomicity is a central concept in concurrent computing, and a full introduction would not fit in a footnote. In brief, an atomic transaction is a collection of database queries which cannot be interrupted by another query, and which from the perspective of another query all appear to execute at the same time.

[2] Making this choice for the developer requires turning autocommit on, in direct contradiction of PEP 249.

[3] Views are the functions that process incoming requests and return responses.

[4] The implementation assumes that creating a duplicate will raise an IntegrityError, which is only the case if duplicates violate a database constraint.

[5] From the documentation: "If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won't appear in a subsequent get() call."

[6] Leap Day, Daylight Savings Time, and months indexed from 0 (thanks, JavaScript) are all common causes.

[7] A surprisingly common response is "don't have poorly behaved applications modifying your database". I wonder what kind of utopia these people live in, where they control, or even know about, all the applications modifying their database.

[8] I would like to thank Tim Best for pointing this out.

[9] Like the queryset methods first() and earliest(field), where earliest(field) is like order_by(field).first() except when the queryset is empty, in which case first() returns None while earliest(field) raises an ObjectDoesNotExist exception.

[10] Or the FileField and FieldFile API, where the documentation claims that FieldFile behaves like Python's file object, for instance documenting FieldFile.open() as: "Behaves like the standard Python open() method and opens the file associated with this instance in the mode specified by mode." However, instead of returning an open file (which is helpfully a context manager) like the builtin open(), FieldFile.open() returns None, so none of the same patterns apply.