
N.B. This is a cross-post from the YPlan tech blog.
During PyCon UK I had the opportunity to work on open-sourcing our in-house Django performance testing tool, which has now been released as django-perf-rec. We created it over two years ago, and have been using and improving it since. It has been helping us to pre-emptively fix performance problems in our code, and now it can help you!
In the old days we would often see performance regressions when we introduced a new feature to existing code, and we'd have to retroactively understand and fix them. For example, we might add a feature that accesses a new foreign key on a model, and because select_related / prefetch_related hadn't been added to the appropriate QuerySet, we'd see an N+1 query problem appear. Often the problems would only manifest as real slowness in production, as our test and development environments don't contain much data.
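As a rough sketch of that pattern (using hypothetical Book and Author models, not our actual code), the N+1 problem and its usual one-line fix look like this:

# Hypothetical models: Book has a ForeignKey to Author.
books = Book.objects.all()
for book in books:
    print(book.author.name)   # N+1: one extra query per book

# Adding select_related fetches the authors in the same query:
books = Book.objects.select_related('author')
for book in books:
    print(book.author.name)   # no extra queries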
We tried to lock these down with Django's assertNumQueries (docs) in tests like:
def test_books(self):
    with self.assertNumQueries(4):
        self.client.get('/books/')
This worked on a basic level, but if the test failed on you, you'd be left with little information about what caused the failure, and be forced to manually trace the code path, thinking along the way about where the change had come from.
Failures happened often enough for us that we started adding comments next to assertNumQueries to track roughly what the expected queries were, to make retracing easier:
def test_books(self):
    with self.assertNumQueries(4):
        # - Get User
        # - Get Books
        # - Prefetch Authors
        # - Update Stats counters
        self.client.get('/books/')
Suffice it to say, this wasn't fun. If a query was added or removed, you'd have to manually edit all the comments. They could also still 'rot' and become inaccurate: if one query was removed whilst another was added elsewhere, the test would continue to pass but the comments would be out of sync, making debugging at the next failure harder.
Then we had the insight that the comments actually contained data, and this data could be captured and written down by a tool automatically…
From this idea, django-perf-rec was born. When active it intercepts all database queries (and also cache operations!), and writes them out to a YAML file that lives next to the test. Then when the test runs again, it compares the newly captured data to the record in the file, and fails if there are any differences. Thus the above test can now be written as:
import django_perf_rec

# ...

def test_books(self):
    with django_perf_rec.record():
        self.client.get('/books/')
It also deals with variable data changing in your SQL and cache keys by fingerprinting it. For example, the YAML for the above test might look like the following - note that the SQL parameters have been replaced with # and the column lists with ...:
MyTests.test_books:
- cache|get: session.#
- db: 'SELECT ... FROM myapp_users WHERE (myapp_users.id = #)'
- db: 'SELECT ... FROM myapp_books ORDER BY # LIMIT #'
- db: 'SELECT ... FROM myapp_authors WHERE (myapp_authors.id IN #)'
- db: 'UPDATE myapp_stats SET # WHERE #'
When a failure happens, you get the exact comparison between the old and new lists, making it easy to understand why a change has happened. (We're using pytest, which gives us nice output - PRs accepted for improving the output on other test runners!) If the changes are acceptable, you can just delete the YAML file and rerun the test to check them in as part of the diff. If not, you have a lot more information to use in finding the problematic code.
It works with parallel test running (we use pytest-xdist), so the YAML files don't get corrupted whilst multiple processes write to them. It also comes with a TestCase mixin so you don't have to import it in every file you want to use it in!
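For example, the mixin version of the earlier test might look roughly like this - a sketch assuming the mixin is exposed as TestCaseMixin and adds a record_performance() method (check the repo's README for the exact names):

from django.test import TestCase
from django_perf_rec import TestCaseMixin


class BookTests(TestCaseMixin, TestCase):
    def test_books(self):
        # Equivalent to django_perf_rec.record(), without repeating the
        # import in every test file.
        with self.record_performance():
            self.client.get('/books/')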
Check it out today at https://github.com/YPlan/django-perf-rec and if you need an improvement please open an issue, or better yet, a pull request!
Tags: django