March 13, 2014

Debugging a ZODB bloat

We just earned a larger Plone 4.0 project where the majority of the code appeared in a very bad shape  - basically a summary of all coding mistakes you could do. While trying to migrated the codebase to Plone 4.2.6 we figured out that rendering a simple read-only view of an Archetypes based content-type caused a ZODB write operation of 3-4 KB each time. Inspecting the code of the content class did not directly reveal any pointer to the problem. The methods called during the rendering of the 'view' view did not assign to 'self' - so there was no obvious reason for any badly written code. But how to track this down?

A good start is the troubleshooting section of the Plone documentation - in particular the usage of the debug.py script at the end of the page. Looking at the latest transaction of the ZODB showed us that the transaction is really caused by the 'Artikel' content-type that made us suspicious. But the output of debug.py did not reveal which attributes of the persistent 'Artikel' instance really changed.

The solution is to look into the __setattr__() calls of the related object. So I added a custom __setattr_() implementation to the content-type 'Artikel':

class Artikel(ATDocument):
    def __setattr__(self, k, v):
        print k,v
        return super(Artikel, self).__setattr__(k, v)

This __setattr_() implementation would be called for every change of an attribute on the 'Artikel' instance. Re-running the instance and re-rendering the object produced the following output:

_p_serial ���:S��
_p_estimated_size 3634
__provides__ 
__provides__ 
_p_serial ��v(w
_p_estimated_size 3301

The _p_ attributes are internal attributes used by the ZODB - they appeared normal however there were also changes to the __provides__ attribute of the instance. This happens usually when you attach a marker interface to a persistent object using alsoProvides(). So we grepped through the code and found this:

def get_average_rating(self):
     alsoProvides(self, IUserRatable)
     adapted = IUserRating(self)
     return adapted.averageRating

def get_rating_count(self):
    alsoProvides(self, IUserRatable)
    adapted = IUserRating(self)
    return adapted.numberOfRatings

Both methods are called during the rendering of the 'view' view and with every call it re-attaches/modifies the IUserRatable interface to the instance of the Artikel.....BAD BAD BAD.....moving the code to a place where is is called only once per object resolved the issue and the ZODB bloat.