Below is a sample script that automatically generates descriptions from page body text. It is written for Plone CMS, but should be applicable to any Python-based CMS with some modifications.
The idea is to take the first three sentences of the body text and use them as the description.
Use case: People are often too lazy to write descriptions (descriptions as in Dublin Core metadata). You can generate some kind of description by taking the first few sentences of the text. This is not perfect, but it is far better than an empty description. Also, the script is thoroughly commented, which should be helpful for beginner Plone programmers.
Please comment if you have other simple ideas to generate descriptions.
Usage
- Add a Script (Python) item through the Zope Management Interface to any Plone folder
- Paste in the code below
- Hit the Test tab or type in the script URL manually. Note that the operation is one shot only
- The script iterates through all content items in that folder
- The script writes logging output to the standard Plone log (var/log, and stdout if Plone is run in debug mode).
Since Zope uses RestrictedPython for scripts created through the web, the user of this script cannot breach server security (they cannot make Python calls they have no permission for). This places some limitations on automating tasks like this, but we don't hit those limitations in our use case.
def create_automatic_description(content, text_field_name="text"):
    """ Creates an automatic description from HTML body by taking three first sentences.

    Takes the body text

    @param content: Any Plone contentish item (they all have description)
    @param text_field_name: Which schema field is used to supply the body text (may vary depending on the content type)
    """

    # Body is Archetype "text" field in schema by default.
    # Accessor can take the desired format as a mimetype parameter.
    # The accessor call below should trigger conversion from text/html -> text/plain automatically using portal_transforms
    field = content.Schema()[text_field_name]

    # Returns a Python method which you can call to get the field's value
    # for a certain content type. This is also security aware
    # and does not breach field-level security provided by Archetypes
    accessor = field.getAccessor(content)

    # body is UTF-8
    body = accessor(mimetype="text/plain")

    # Now let's take three first sentences or the whole content of body
    sentences = body.split(".")

    if len(sentences) > 3:
        intro = ".".join(sentences[0:3])
        intro += "."  # Don't forget closing the last sentence
    else:
        # Body text is shorter than 3 sentences
        intro = body

    content.setDescription(intro)

# context is the reference of the folder where this script is run
for id, item in context.contentItems():
    # Iterate through all content items (this ignores Zope objects like this script itself)

    # Use RestrictedPython safe logging.
    # plone_log() method is permission aware and available on any contentish object
    # so we can safely use it from through-the-web scripts
    context.plone_log("Fixing:" + id)

    # Check that the description has never been saved (None)
    # or it is empty, so we do not override a description someone has
    # set before automatically or manually
    desc = item.Description()  # Like all Archetypes accessor methods, returns UTF-8 encoded string

    if desc is None or desc.strip() == "":
        # We use the HTML of field called "text" to generate the description
        create_automatic_description(item, "text")

# This will be printed in the browser when the script completes successfully
return "OK"
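If you want to try the sentence-cutting logic outside Plone, it can be pulled out into a plain function. The sketch below is only an illustration: the function name `first_sentences` and its `count` and `separator` parameters are my own and not part of the script above. Passing `". "` (dot plus space) as the separator is one possible way to avoid cutting at dots inside version numbers or decimals, assuming sentences in the body are separated by a single space.

```python
def first_sentences(body, count=3, separator="."):
    """Return the first `count` sentences of body, or body itself if it is shorter.

    Hypothetical helper for illustration; it mirrors the split/join
    logic of create_automatic_description() without touching Plone.
    """
    sentences = body.split(separator)
    if len(sentences) > count:
        # Rejoin the leading pieces and close the last sentence with a dot
        return separator.join(sentences[:count]) + "."
    # Body text has no more than `count` pieces, use it as is
    return body


text = "Version 2.5 is out. It is fast. It is small. It is free."

# A plain dot also splits inside "2.5", so we get fewer real sentences
plain = first_sentences(text)

# Dot plus space keeps "2.5" intact
spaced = first_sentences(text, separator=". ")
```

With the plain dot the result is cut after the second real sentence, because "2.5" counts as a sentence boundary. With dot plus space you get the three sentences you would expect.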
Hey,
nice idea — perhaps splitting on “. ” to get whole sentences would be a better idea, at least that’s what I use. Simple dots are far too common in normal text, IMHO.