Automatically removing old items from a Plone site

Below is an advanced version of the date-based deletion code for old items, suitable for huge sites. This snippet is from the Products.feedfeeder package. It looks for Feedfeeder items (automatically generated from RSS) which are older than X days and deletes them.

It’s based on Zope 3 page registration (sidenote: I noticed that views do not need to be based on the BrowserView base class).

  • Transaction thresholds keep any single ZODB transaction from growing too large (in time and in memory)
  • Progress is logged to the Plone event log files
  • The number of days to look into the past is not hardcoded
  • The Manage portal permission is needed to execute the code
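
The transaction threshold idea can be sketched in isolation, without Plone. This is a minimal illustration only: `delete_in_batches`, `delete` and `commit` are hypothetical names, and the lists stand in for the ZODB. In a real view `commit` would typically be `transaction.commit`.

```python
def delete_in_batches(items, delete, commit, threshold=100):
    """Delete every item, calling commit() after each `threshold` deletions.

    `delete` and `commit` are hypothetical callables supplied by the caller.
    """
    count = 0
    for item in items:
        delete(item)
        count += 1
        if count % threshold == 0:
            # Flush the pending changes so the open transaction stays small
            commit()
    return count


# Stand-alone demonstration with in-memory stand-ins for the database
deleted = []
commits = []
total = delete_in_batches(
    range(250),
    delete=deleted.append,
    commit=lambda: commits.append(len(deleted)),
    threshold=100,
)
```

With 250 items and a threshold of 100 the sketch commits twice inside the loop. The remaining 50 deletions are left for whoever closes the transaction, which in a Zope request is normally the publisher.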

You can call this view like:

http://localhost:9999/plonecommunity/@@feed-mega-cleanup?days=90

… and hook it to the Zope clock server or run it as a cron job.
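
For the cron route, something has to make an authenticated HTTP request, because the view requires the Manage portal permission. Below is a rough Python 3 sketch of such a script. It assumes the site accepts HTTP Basic authentication, and the function names, the user name and the password are placeholders, not part of Products.feedfeeder.

```python
import base64
import urllib.parse
import urllib.request


def build_cleanup_request(site_url, days, username, password):
    """Build an authenticated GET request for the clean-up view."""
    query = urllib.parse.urlencode({"days": days})
    url = "%s/@@feed-mega-cleanup?%s" % (site_url.rstrip("/"), query)
    # Assumption: the site accepts HTTP Basic authentication
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def run_cleanup(site_url, days, username, password, timeout=3600):
    """Call the view and return the message it responds with."""
    request = build_cleanup_request(site_url, days, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A weekly cron entry could then run a small script that calls `run_cleanup("http://localhost:9999/plonecommunity", 90, ...)` with real credentials. The long timeout is a guess; the first run on a big site may take a while.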

Here is the view Python source code:

import logging

import DateTime
import transaction
import zExceptions

logger = logging.getLogger("feedfeeder")

class MegaClean(object):
    """ Clean-up old feed items by deleting them on the site.

    This is intended to be called from cron weekly.
    """

    def __init__(self, context, request):
        self.context = context
        self.request = request

    def clean(self, days, transaction_threshold=100):
        """ Perform the clean-up by looking up old objects and deleting them.

        Commit the ZODB transaction for every N objects so that the commit buffer
        does not grow too large (timewise, memory wise).

        @param days: if the item was created more than this many days ago, it is deleted

        @param transaction_threshold: How often we commit - for every nth item
        """

        logger.info("Beginning feed clean up process")

        context = self.context.aq_inner
        count = 0

        # DateTime deltas are days as floating points
        end = DateTime.DateTime() - days
        start = DateTime.DateTime(2000, 1,1)

        date_range_query = { 'query':(start,end), 'range': 'min:max'}

        items = context.portal_catalog.queryCatalog({"portal_type":"FeedFeederItem",
                                             "created" : date_range_query,
                                             "sort_on" : "created"
                                            })

        items = list(items)

        logger.info("Found %d items to be purged" % len(items))

        for b in items:
            count += 1
            obj = b.getObject()
            logger.info("Deleting:" + obj.absolute_url() + " " + str(obj.created()))
            obj.aq_parent.manage_delObjects([obj.getId()])

            if count % transaction_threshold == 0:
                # Prevent transaction becoming too large (memory buffer)
                # by committing now and then
                logger.info("Committing transaction")
                transaction.commit()

        msg = "Total %d items removed" % count
        logger.info(msg)

        return msg

    def __call__(self):

        days = self.request.form.get("days", None)
        if not days:
            # A missing parameter is a client error (HTTP 400), not a server error
            raise zExceptions.BadRequest("Bad input. Please give days=60 as HTTP GET query parameter")

        days = int(days)

        return self.clean(days)
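
One thing to watch in `__call__`: the string "0" is truthy, so `days=0` (or a negative number) passes the check and the date range then covers every item created up to now. A stricter parser could look roughly like the sketch below. `parse_days` and its `minimum` floor are hypothetical and not part of the original view; inside the view the `ValueError` could be translated to a bad request error.

```python
def parse_days(raw, minimum=1):
    """Convert the raw `days` query value to an int, rejecting unsafe input.

    `minimum` is a hypothetical safety floor, not part of the original view.
    """
    if raw is None or not str(raw).strip():
        raise ValueError("Missing days parameter, e.g. days=60")
    try:
        days = int(raw)
    except (TypeError, ValueError):
        raise ValueError("days must be a whole number, got %r" % (raw,))
    if days < minimum:
        # Zero or negative values would select every item on the site
        raise ValueError("days must be at least %d, got %d" % (minimum, days))
    return days
```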

Then we have the view ZCML registration:

<page
    name="feed-mega-cleanup"
    for="Products.CMFCore.interfaces.ISiteRoot"
    permission="cmf.ManagePortal"
    class=".feed.MegaClean"
    />