Fixing POSKeyError: ‘No blob file’ content in Plone

Plone CMS 4.x onwards stores uploaded files and images in the ZODB as BLOBs. They live in the var/blobstorage folder structure on the file system, with files named after persistent object ids (not human-readable). The objects themselves, without the file payload, are stored in an append-only database file called the filestorage, which is usually named Data.fs.
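
To get a feel for what is inside the blobstorage folder, here is a minimal sketch that counts the blob payload files below it. It assumes the payload files carry a ".blob" suffix, which is what I have seen in ZODB blob directories; the function name count_blob_files is my own and not part of Plone or ZODB.

```python
import os


def count_blob_files(blobstorage_dir):
    """Count files ending in .blob anywhere below blobstorage_dir.

    Assumption: blob payload files use the ".blob" suffix. Marker
    files such as .layout are not counted.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(blobstorage_dir):
        # Only the file names matter here; os.walk handles the nesting
        count += sum(1 for name in filenames if name.endswith(".blob"))
    return count
```

Running it against both the source and the copied var/blobstorage folder gives a rough sanity check that a copy is complete. Equal counts do not prove the two folders match, but different counts are a clear warning sign.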

If you copy the Plone site database (Data.fs) and forget to copy the blobstorage folder, or the data gets out of sync during the copy, various problems appear on the Plone site:

  • You cannot access a content item which has a corresponding blob file missing on the file system
  • You cannot rebuild the portal_catalog indexes
  • Database packing may fail

Instead, you’ll see an evil POSKeyError exception (POS stands for Persistent Object Storage), something like this:

Traceback (most recent call last):
  File "/fast/xxx/eggs/ZODB3-3.10.3-py2.6-macosx-10.6-i386.egg/ZODB/Connection.py", line 860, in setstate
    self._setstate(obj)
  File "/fast/xxx/eggs/ZODB3-3.10.3-py2.6-macosx-10.6-i386.egg/ZODB/Connection.py", line 922, in _setstate
    obj._p_blob_committed = self._storage.loadBlob(obj._p_oid, serial)
  File "/fast/xxx/eggs/ZODB3-3.10.3-py2.6-macosx-10.6-i386.egg/ZODB/blob.py", line 644, in loadBlob
    raise POSKeyError("No blob file", oid, serial)
POSKeyError: 'No blob file'

The proper way to fix this problem is to

  • Re-copy the blobstorage folder
  • Restart Plone twice in foreground mode (sometimes a freshly copied blobstorage folder does not get picked up; some kind of timestamp issue?)
  • Follow the Plone site copy instructions
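
The re-copy step can be sketched in Python as follows. This is only an illustration under a few assumptions: both folders are reachable as local paths, Plone is stopped while the copy runs, and the helper name replace_blobstorage is made up for this example. It replaces the whole destination folder rather than only its contents, so hidden files such as .layout (mentioned in the comments below) come from the source as well.

```python
import os
import shutil


def replace_blobstorage(source_dir, dest_dir):
    """Replace dest_dir with a full copy of source_dir.

    The destination folder is removed entirely before copying, so no
    stale files (including hidden ones like .layout) survive.
    """
    if not os.path.isdir(source_dir):
        raise ValueError("Source blobstorage not found: %s" % source_dir)
    if os.path.isdir(dest_dir):
        # Remove the folder itself, not just its visible contents
        shutil.rmtree(dest_dir)
    # copytree copies hidden files too and creates dest_dir for us
    shutil.copytree(source_dir, dest_dir)
```

The source is checked first on purpose: if the source path is wrong, the function fails before the destination has been touched.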

However, you may have failed. You may have damaged or lost your blobstorage forever. To get the Plone site back to a working state, all content with bad BLOB data must be deleted (usually meaning you lose some of the site's images and uploaded files).

Below is Python code for a Grok view which you can drop into your own Plone add-on product. It creates an admin view which you can call directly through a URL. The code walks through all the content on your Plone site and tries to delete content items whose BLOBs are missing.

The code handles both Archetypes and Dexterity subsystems’ content types.

Note: Fixing Dexterity blobs with this code has never been tested. Please feel free to update the code in collective.developermanual on GitHub if you find it does not work properly.

The code, fixblobs.py:

"""

    A Grok admin view to delete content with missing BLOBs in Plone, which cause
    POSKeyErrors when content is accessed or during a portal_catalog rebuild.

    Tested on Plone 4.1 + Dexterity 1.1.

    http://stackoverflow.com/questions/8655675/cleaning-up-poskeyerror-no-blob-file-content-from-plone-site

    Also see:

    http://pypi.python.org/pypi/experimental.gracefulblobmissing/

"""

# Zope imports
from ZODB.POSException import POSKeyError
from zope.component import getMultiAdapter
from zope.component import queryUtility
from Products.CMFCore.interfaces import IPropertiesTool
from Products.CMFCore.interfaces import IFolderish, ISiteRoot

# Plone imports
from five import grok
from Products.Archetypes.Field import FileField
from Products.Archetypes.interfaces import IBaseContent
from plone.namedfile.interfaces import INamedFile
from plone.dexterity.content import DexterityContent

def check_at_blobs(context):
    """ Archetypes content checker.

    Return True if purge needed
    """

    if IBaseContent.providedBy(context):

        schema = context.Schema()
        for field in schema.fields():
            id = field.getName()
            if isinstance(field, FileField):
                try:
                    field.get_size(context)
                except POSKeyError:
                    print "Found damaged AT FileField %s on %s" % (id, context.absolute_url())
                    return True

    return False

def check_dexterity_blobs(context):
    """ Check Dexterity content for damaged blob fields

    XXX: NOT TESTED - THEORETICAL, GUIDELINING, IMPLEMENTATION

    Return True if purge needed
    """

    # Assume dexterity content inherits from Item
    if isinstance(context, DexterityContent):

        # Iterate through all Python object attributes
        # XXX: Might be smarter to use zope.schema introspection here?
        for key, value in context.__dict__.items():
            # Ignore non-contentish attributes to speed up us a bit
            if not key.startswith("_"):
                if INamedFile.providedBy(value):
                    try:
                        value.getSize()
                    except POSKeyError:
                        print "Found damaged Dexterity plone.app.NamedFile %s on %s" % (key, context.absolute_url())
                        return True
    return False

def fix_blobs(context):
    """
    Iterate through the object variables and see if they are blob fields
    and if the field loading fails then poof
    """

    if check_at_blobs(context) or check_dexterity_blobs(context):
        print "Bad blobs found on %s" % context.absolute_url() + " -> deleting"
        parent = context.aq_parent
        parent.manage_delObjects([context.getId()])

def recurse(tree):
    """ Walk through all the content on a Plone site """
    for id, child in tree.contentItems():

        fix_blobs(child)

        if IFolderish.providedBy(child):
            recurse(child)

class FixBlobs(grok.CodeView):
    """
    A management view to clean up content with damaged BLOB files

    You can call this view by

    1) Starting Plone in debug mode (console output available)

    2) Visit site.com/@@fix-blobs URL

    """
    grok.name("fix-blobs")
    grok.context(ISiteRoot)
    grok.require("cmf.ManagePortal")

    def disable_integrity_check(self):
        """  Content HTML may have references to this broken image - we cannot fix that HTML
        but link integrity check will yell if we try to delete the bad image.

        http://collective-docs.readthedocs.org/en/latest/content/deleting.html#bypassing-link-integrity-check
        """
        ptool = queryUtility(IPropertiesTool)
        props = getattr(ptool, 'site_properties', None)
        self.old_check = props.getProperty('enable_link_integrity_checks', False)
        props.enable_link_integrity_checks = False

    def enable_integrity_check(self):
        """ """
        ptool = queryUtility(IPropertiesTool)
        props = getattr(ptool, 'site_properties', None)
        props.enable_link_integrity_checks = self.old_check

    def render(self):
        #plone = getMultiAdapter((self.context, self.request), name="plone_portal_state")
        print "Checking blobs"
        portal = self.context
        self.disable_integrity_check()
        try:
            recurse(portal)
        finally:
            # Restore the original setting even if the walk fails
            self.enable_integrity_check()
        print "All done"
        return "OK - check console for status messages"


6 thoughts on “Fixing POSKeyError: ‘No blob file’ content in Plone”

  1. If (for some crazy reason) you want to know the full path to the blob in the filesystem, I managed to do it like this:

    import Globals
    oid = blob._p_oid
    serial = blob._p_serial
    conn = Globals.DB.open()
    filename = conn._storage.fshelper.getBlobFilename(oid, serial)

  2. The line:
    class FixBlobs(grok.CodeView):

    didn’t work; it chokes with an AttributeError: ‘module’ object has no attribute ‘CodeView’.

    Looks like one has to change it to:

    class FixBlobs(grok.View):

    At least that worked for me.

  3. I have a zeoserver at host1 and 2 instances on separate machines,
    for example client1 at host2 and client2 at host3.
    How can I set up the buildout so that the clients share the same blobstorage located on host1?
    Is there an example buildout.cfg with such a configuration?
    Thanks in advance.

  4. I discovered the same POSKeyError: ‘No blob file’ error after copying Data.fs and the whole blobstorage (or so I thought!). Trying to pack the database failed too, with this error (Plone 4.1.6 and Zope 2.13.8):

    Site Error

    An error was encountered while publishing this resource.

    Error Type: OSError
    Error Value: (21, ‘Is a directory’)

    Solution:

    When you overwrite the blobstorage, you might use::

    rm -rf [desthost/instancepath]/var/blobstorage/*
    (s)cp -r [sourcehost/instancepath]/var/blobstorage/* [desthost/instancepath]/var/blobstorage/

    This will fail because the * glob does not match hidden files, so the [desthost/instancepath]/var/blobstorage/.layout file may be left unchanged!

    Instead use::

    rm -rf [desthost/instancepath]/var/blobstorage
    (s)cp -r [sourcehost/instancepath]/var/blobstorage [desthost/instancepath]/var/

    This worked for me, and it preserves far more of your content than deleting possibly needed items and shooting yourself in the foot.

    Hope this helps…

    Armin

  5. Hi all,
    I found this great post and I’ve modified the code to be used through the debug console in order to investigate ZODB blobstorage consistency (no grok view, only a simple `bin/instance run fixblob.py`). I’ve also commented out the manage_delObjects line and added a counter to print the state.

    The problem is that after around 100k objects returned by the `recurse` function, the system runs out of memory and kills the process.

    I suppose that with the `delete` line commented out, there should not be any pending transaction to be committed, right? So memory is full of.. what?

    Portal_catalog reports more than 200k objects in the ZODB (I know that there are catalogs with millions of objects; mine should be considered a mid-to-low sized db, after all).

    Now the question is, how can I free up the memory during the run so that the script can actually finish?

    Any help is much appreciated.
    alessandro.

  6. I reply to myself.

    It was enough to do a `savepoint(optimistic=True)` every 1000 items.
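
The batching idea from the last comment can be sketched roughly like this. The helper walk_in_batches and its parameters are hypothetical names for illustration; the checkpoint is passed in as a plain callable so the snippet runs without Zope installed.

```python
def walk_in_batches(items, check, checkpoint, batch_size=1000):
    """Call check(item) for every item and checkpoint() after each
    full batch of batch_size items.

    Returns the number of items processed.
    """
    seen = 0
    for item in items:
        check(item)
        seen += 1
        if seen % batch_size == 0:
            # Give the caller a chance to release memory
            checkpoint()
    return seen
```

In a `bin/instance run` script the checkpoint would presumably be something like `lambda: transaction.savepoint(optimistic=True)` from the `transaction` package, with `check` being the blob check for a single content item. I have not measured how much memory this frees beyond what the comment reports.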
