Monday, July 04, 2011

Optimizing Django database access : Part 2

Originally published on BootstrapToday Blog 

A few months ago we moved to Django 1.2 and immediately realized that our method of 'optimizing' database queries on foreign key fields no longer worked. The main reason for this break was the set of changes Django 1.2 made to support multiple databases, specifically the way the 'get' function is called to access the value of a ForeignKey field.

In Django 1.2, ForeignKey fields are accessed through rel_mgr. See the call below.

rel_obj = rel_mgr.using(db).get(**params)

However, the manager.using() call returns a queryset. Hence, if the manager implements a custom 'get' function, that custom 'get' is not called. Since our database access optimization is based on a custom 'get' function, this change broke it.
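
To see why the custom 'get' is skipped, here is a minimal sketch in plain Python, not Django itself. The classes QuerySet, Manager and CachingManager are simplified stand-ins invented for illustration; they only mimic the one behaviour that matters here, namely that using() hands back a queryset rather than a manager.

```python
class QuerySet(object):
    """Stand-in for a queryset; remembers which database it targets."""
    def __init__(self, db=None):
        self.db = db

    def using(self, db):
        return QuerySet(db)

    def get(self, **params):
        return "QuerySet.get"


class Manager(object):
    """Stand-in for a manager that delegates to a fresh queryset."""
    def get_query_set(self):
        return QuerySet()

    def using(self, db):
        # Returns a QuerySet, not a manager.
        return self.get_query_set().using(db)

    def get(self, **params):
        return self.get_query_set().get(**params)


class CachingManager(Manager):
    """Stand-in for a manager with a custom 'get'."""
    def get(self, **params):
        return "CachingManager.get"


rel_mgr = CachingManager()
direct = rel_mgr.get(pk=1)                      # goes through the override
via_using = rel_mgr.using("default").get(pk=1)  # bypasses the override
```

In this sketch, calling get directly on the manager reaches the override, while going through using() first lands on the queryset's own get.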

Check Django Ticket 16173 for the details.

Also, our original code didn't consider multi-db scenarios. Hence the query 'signature' computation now has to take the db name into account as well.

from django.db import connection, models
from django.core import signals

def install_cache(*args, **kwargs):
    setattr(connection, 'model_cache', dict())

def clear_cache(*args, **kwargs):
    delattr(connection, 'model_cache')

signals.request_started.connect(install_cache)
signals.request_finished.connect(clear_cache)

class YourModelManager(models.Manager):
    def get(self, *args, **kwargs):
        '''
        Adds single-row caching per connection. Since each request
        opens a new connection, this essentially translates to per-request
        caching.
        '''
        if args:
            # Positional arguments (e.g. Q objects) are not part of the
            # cache key, so such lookups bypass the cache entirely.
            return super(YourModelManager, self).get(*args, **kwargs)

        # The key includes the db alias so that the same lookup against
        # different databases is cached separately.
        model_key = (self.db, self.model, repr(kwargs))

        model = connection.model_cache.get(model_key)
        if model is not None:
            return model

        # Cache miss, so we have to get the object from the database:
        model = super(YourModelManager, self).get(*args, **kwargs)
        if model is not None and model.pk is not None:
            connection.model_cache[model_key] = model
        return model

There are minor changes from the first version: the manager class now also stores the 'db' name in the key.
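
As a small illustration of the key, here is a sketch of a hypothetical helper, make_cache_key, which is not part of the original code. It assumes the same three ingredients (db name, model class, lookup keywords) and additionally sorts the keywords, so that the order in which they are passed does not produce different keys. Ticket is a stand-in class used only for the example.

```python
def make_cache_key(db, model, kwargs):
    """Hypothetical helper: build a hashable cache key for a 'get' lookup."""
    # Sort by keyword name so that get(a=1, b=2) and get(b=2, a=1)
    # map to the same key.
    params = tuple(sorted((name, repr(value)) for name, value in kwargs.items()))
    return (db, model, params)


class Ticket(object):
    """Stand-in for a model class; only used as part of the key."""
    pass


key_default = make_cache_key("default", Ticket, {"pk": 1})
key_other = make_cache_key("other", Ticket, {"pk": 1})

cache = {key_default: "row from default"}
hit = cache.get(key_default)
miss = cache.get(key_other)  # same lookup, different db: no hit
```

The same lookup against two databases gives two distinct keys, so a row cached from one database is never returned for the other.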

Along with this change, you have to make one change to the core Django code. In the file django/db/models/fields/related.py, in the __get__ function of the 'ReverseSingleRelatedObjectDescriptor' class, replace the line
         rel_obj = rel_mgr.using(db).get(**params)

with the line
         rel_obj = rel_mgr.db_manager(db).get(**params)

This ensures that YourModelManager's get function is called when querying foreign keys.
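
The reason db_manager() helps is that it returns a manager rather than a queryset. The sketch below is plain Python, not Django itself: Manager and CachingManager are stand-ins invented for illustration, and the db_manager body is an assumption modeled on the idea of copying the manager and pinning the copy to a database.

```python
import copy


class Manager(object):
    """Stand-in for a manager; _db is the database it is pinned to."""
    def __init__(self):
        self._db = None

    def db_manager(self, using):
        # Shallow-copy the manager itself, so the subclass (and with it
        # any custom 'get') is preserved on the returned object.
        obj = copy.copy(self)
        obj._db = using
        return obj

    def get(self, **params):
        return "Manager.get"


class CachingManager(Manager):
    """Stand-in for a manager with a custom 'get'."""
    def get(self, **params):
        return "CachingManager.get on %s" % self._db


rel_mgr = CachingManager()
routed = rel_mgr.db_manager("other")
result = routed.get(pk=1)
```

Here the routed object is still a CachingManager, so the custom get runs and can see which database it was pinned to, while the original manager is left unchanged.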