Putting breakpoints to HTML templates in Python

Python offers many different template engines for web application development to turn your view logic to HTML code on the server and then send the resulting HTML code to a web browser. When you are dealing with complex page templates, especially when using third party libraries, it often gets confusing what template context variables are available and what they have eaten.  In this case, the best method of debugging is to insert a breakpoint into your page template and inspect the variables in runtime context. This very useful “advanced” debugging method is not known to many people, especially if they come from the background where command-line debugging tools have not been the norm.

1. Python debugging

Python offers pdb command-line debugger out of the box. I recommend to use more advanced version ipdb, which comes with proper command history, colored tracebacks and tab completion / arrow key support (though there seems to be an issue using ipdb with multithreaded / autoreloading programs). There also exist more GUIful, but still terminal based, pudb. Eclipse + PyDev offer remote debugging support with full GUI.

Read some command-line debugging tutorials if you are not familiar with the concept. Especially how to inspect local variables with commands of locals(), dir(object), for i in dict.items(): print i come to hand.

2. Setting a breakpoint in Django templates

Here is an example how to create a breakpoint tag for Django templates and then go around and debug your templates. In the example, I debug the field rendering look of django-crispy-forms.

First we need to create a custom Django template tag invoking pdb, because you cannot run arbitrary Python snippets directly from Django templates. The code here is based on the example on djangosnippets.org.

templatetags/debug.py

# -*- coding: utf-8 -*-
#
# http://djangosnippets.org/snippets/1550/
#

import pdb as pdb_module

from django.template import Library, Node

register = Library()

class PdbNode(Node):

    def render(self, context):
        pdb_module.set_trace()
        return ''

@register.tag
def pdb(parser, token):
    return PdbNode()

Then we use this tag our template:

{% load crispy_forms_field %}
{% load debug %}

{% if field.is_hidden %}
    {{ field }}
{% else %}

    <div>
        <div>
            {{ field.long_help }}
            {% pdb %}
        </div>
    </div>
{% endif %}

We run Django normally with manage.py runserver and encounter this breakpoint when rendering a template.

Screen Shot 2013-05-16 at 9.36.06 AM

Now we can go around and see what variables are avaialble in the template and what they have eaten.

(Pdb) locals().keys()
['self', 'context']
(Pdb) context.__class__
<class 'django.template.context.RequestContext'>

Ok, so we have available template variables in a local variable called context (makes sense, we are now in our templatetag.py code).

(Pdb) for i in dir(context): print i
...
_reset_dicts
autoescape
current_app
dicts
get
has_key
new
pop
push
render_context
update
use_l10n
use_tz

There seems to be a way to access all template variables:

(Pdb) for d in context.dicts: print d.keys()
[]
['csrf_token']
['request']
['perms', 'user']
['debug', 'sql_queries']
['LANGUAGES', 'LANGUAGE_BIDI', 'LANGUAGE_CODE']
['MEDIA_URL']
['STATIC_URL']
['TIME_ZONE']
['messages']
['downtime']
['block']
['flat_attrs', 'inputs', 'csrf_token', 'form_id', 'form_error_title', 
'form_action', 'error_text_inline', 'html5_required', 'help_text_inline', 
'formset_error_title', 'form_style', 'form_method', 'form_show_errors', 
'is_formset', 'form_tag', 'attrs', 'form_attrs', 'form_class']

Now we can see what come through to our template as the field and widget in the field rendering loop.

(Pdb) context["field"]
<django.forms.forms.BoundField object at 0x1038ae590>

(Pdb) for i in context["field"].__dict__.items(): print i
('html_initial_name', u'initial-ad-xxx_type')
('form', <xxx.Form object at 0x103064c10>)
('html_name', 'ad-xxx_type')
('html_initial_id', u'initial-ad-id_ad-xxx_type')
('label', u'Foobar')
('field', <django.forms.fields.TypedChoiceField object at 0x103062e50>)
('help_text', '')
('name', 'xxx_type')

And after you see this, you can figure out if your data is coming through the templating stack and is it coming through correctly and what you need to do in order to bend it to your will.

3. Other templating engines

See also an example for Chameleon templates. Unfortunately Plone, out of the box, cannot support this debugging method due to the security.

 

 

 

\"\" Subscribe to RSS feed Follow me on Twitter Follow me on Facebook Follow me Google+

Exporting and sharing Sublime Text configuration

Sublime Text is a very powerful and popular text editor. But it’s more than a text editor… it’s an ecosystem of programmer’s tools where you can go to armory and choose the winning set for every code you’ll face. However, entering this ecosystem is long walk; learning all the plugins, tricks and such will take time. In this blog post I’ll show how more experienced Sublime Text user can share his/her setup with fellow peers still learning Sublime Text. Also, I have created a shell script collection which helps copying configurations into a fresh Sublime Text installation.

Screen Shot 2013-05-09 at 1.21.35 AM

If you are new to Sublime Text 2 read my crash course article. Instructions in this blog post have been tested with Sublime Text 2. Sublime Text 3 is in beta, but it is not recommended yet for every day usage yet as some of the workflow critical plugins have not yet been ported for this version.

Sublime Text configuration mainly consists of following

  • Packages/ folder where all your installed plugins and setting files lie (shortcut into this available through preferences menu)
  • Packages/User/ folder where are the files specific to your configuration (other folders in Packages/ folder are pristine plugin files and should not be directly edited)
  • Packages/User/Preferences.sublime-settings stores the preferences you have changed
  • Packages/User/Package Control.sublime-settings contains the list of installed addons

1. Configuration files needing copying

To transfer your Sublime Text configuration to fellow user you only need to copy

It is not safe to copy plugins as is between environments, as plugins may contain operating system or computer specific dependencies. It’s better to copy Package Control.sublime-settings file only. On the next Sublime Text start up Package Control will automatically read the list of packages from this file and install any missing packages from Github.

Manually installed packages, those not through Package Control, must be installed by hand on the target computer.

For personal backup purposes you can sync Sublime Text configuration using Dropbox.

2. Automated setups

I have created a sublime-helper, a collection of shell scripts, which will help copying Sublime Text configurations (hosted on Github). It comes with my own configuration and package list; please free to clone and do whatever you want it with to make your and your peers’ workflow smooth.

The script will also setup subl shell alias to open Sublime Text from command line. Also, your UNIX EDITOR environment variable will point to special subl-wrapper-wait-exit command which integrates Sublime Text for git and svn commit message editing.

I tested the script by transferring my Sublime Text 2 configuration from OSX to Ubuntu Linux. A person new to Sublime Text was able to configure the editor and grasp some basics in 5 minutes. After seeing SublimeLinter jshint warning highlighting and JsFormat plugins the test person reaction was: “Oh my god, oh my god, oh my god.”

I think he is not switching back to Aptana + Eclipse.

Screen Shot 2013-05-09 at 1.47.51 AM

 

\"\" Subscribe to RSS feed Follow me on Twitter Follow me on Facebook Follow me Google+

Converting presentation slides to HTML blog post with images

Here is a Python script to convert a PDF to series of HTML <img> tags with alt texts. It makes the presentation suitable embedded for a blog post and reading on a mobile device and such.

The motivation for creating this script is that services creating embedded slideshow viewers (SlideShare, Speaker Deck) create <iframe> based output that is not RSS reader friendly. For example, blog aggregation services and soon defunct Google Reader may strip out the presentations <iframe> from RSS feeds. Also, generated <iframe> viewers are not mobile friendly. If you want to share your slides in a blog post using plain old HTML and images is the most bullet-proof way of doing it.

The script also shows the power of Python scripting, when combined with UNIX command line tools like Ghostscript PDF renderer and high availability of ready-made Python libraries from PyPi repository.

Example Workflow:

  • Export presentation from Apple Keynote to PDF file. On Export dialog untick include date and add borders around slides.
  • Run the script against generated PDF file to convert it to a series of JPEG files and a HTML snippet with <img> tags
  • Optionally, the scripts adds a full URL prefix to <img src>, so you don’t need to manually link images to your hosting service absolute URL
  • Copy-paste generated HTML to your blog post

Tested with Apple Keynote exported PDFs, but the approach should work for any PDF content.

See example blog post and presentation.

slide7

Source code below. The full source code with README and usage instructions is available on Github.

"""
    PDF to HTML converter.

"""

import os
import sys

import pyPdf
from pyPdf.pdf import ContentStream
from pyPdf.pdf import TextStringObject

SLIDE_TEMPLATE = u'<p><img src="{prefix}{src}" alt="{alt}" /></p>'

# You can pass Ghostscript binary to the script as an environment variable.
GHOSTSCRIPT = os.environ.get("GHOSTSCRIPT", "gs")

def create_images(src, target, width=620, height=480):
    """ Create series of images from slides.

    http://right-sock.net/linux/better-convert-pdf-to-jpg-using-ghost-script/

    :param src: Source PDF file

    :param target: Target folder
    """

    if target.endswith("/"):
        target = target[0:-1]

    # Generated filenames
    ftemplate = "%(target)s/slide%%d.jpg" % locals()

    # gs binary
    ghostscript = GHOSTSCRIPT

    # Export magic of doing
    # Note: Ghostscript 9.06 crashed for me
    # had to upgrade 9.07
    # This command does magic of anti-aliasing text and settings output JPEG dimensions correctly
    cmd = "%(ghostscript)s -dNOPAUSE -dPDFFitPage -dTextAlphaBits=4 -sDEVICE=jpeg -sOutputFile=%(ftemplate)s -dJPEGQ=80 -dDEVICEWIDTH=%(width)d -dDEVICEHEIGHT=%(height)d  %(src)s -c quit"
    cmd = cmd % locals()  # Apply templating

    if os.system(cmd):
        raise RuntimeError("Command failed: %s" % cmd)

def extract_text(self):
    """ Patched extractText() from pyPdf to put spaces between different text snippets.
    """
    text = u""
    content = self["/Contents"].getObject()
    if not isinstance(content, ContentStream):
        content = ContentStream(content, self.pdf)
    # Note: we check all strings are TextStringObjects.  ByteStringObjects
    # are strings where the byte->string encoding was unknown, so adding
    # them to the text here would be gibberish.
    for operands, operator in content.operations:
        if operator == "Tj":
            _text = operands[0]
            if isinstance(_text, TextStringObject):
                text += _text
        elif operator == "T*":
            text += "\n"
        elif operator == "'":
            text += "\n"
            _text = operands[0]
            if isinstance(_text, TextStringObject):
                text += operands[0]
        elif operator == '"':
            _text = operands[2]
            if isinstance(_text, TextStringObject):
                text += "\n"
                text += _text
        elif operator == "TJ":
            for i in operands[0]:
                if isinstance(i, TextStringObject):
                    text += i

        if text and not text.endswith(" "):
            text += " "  # Don't let words concatenate

    return text

def scrape_text(src):
    """ Read a PDF file and return plain text of each page.

    http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text

    :return: List of plain text unicode strings
    """

    pages = []

    pdf = pyPdf.PdfFileReader(open(src, "rb"))
    for page in pdf.pages:
        text = extract_text(page)
        pages.append(text)

    return pages

def create_index_html(target, slides, prefix):
    """ Generate HTML code for `<img>` tags.
    """

    out = open(target, "wt")

    print >> out, "<!doctype html>"
    for i in xrange(0, len(slides)):
        alt = slides[i]  # ALT text for this slide
        params = dict(src=u"slide%d.jpg" % (i+1), prefix=prefix, alt=alt)
        line = SLIDE_TEMPLATE.format(**params)
        print >> out, line.encode("utf-8")

    out.close()

def main():
    """ Entry point. """

    if len(sys.argv) < 3:
        sys.exit("Usage: pdf2html.py mypresentation.pdf targetfolder [image path prefix]")

    src = sys.argv[1]
    folder = sys.argv[2]

    if len(sys.argv) > 3:
        prefix = sys.argv[3]
    else:
        prefix = ""

    if not os.path.exists(folder):
        os.makedirs(folder)

    alt_texts = scrape_text(src)

    target_html = os.path.join(folder, "index.html")

    create_index_html(target_html, alt_texts, prefix)

    create_images(src, folder)

if __name__ == "__main__":
    main()

\"\" Subscribe to RSS feed Follow me on Twitter Follow me on Facebook Follow me Google+