How to split() strings of undefined item count to Python variables elegantly?

A common problem when parsing space-separated strings is that a fixed number of items is followed by some number of optional items. You want to extract the parsed values into Python variables and initialize the optional values with None if they are missing.

E.g. we have an option format description string with 2 or 3 items: value1 value2 [optional value]

s = "foo bar" # This is a valid example option string
s2 = "foo bar optional" # This is also, with one optional item

Usually we’d like to parse this so that non-existing options are set to None. Traditionally one writes Python code like below:

# Split the string and extract the required and optional values
parts = s.split(" ")
value1 = parts[0]
value2 = parts[1]
if len(parts) >= 3:
    optional_value = parts[2]
else:
    optional_value = None

However, this looks a little bit ugly, considering that if there were always exactly two options one could write:

value1, value2 = s.split(" ")

When the number of optional options grows, the boilerplate code starts to annoy greatly.

I have been thinking about how to write split() code that parses options and sets non-existing items to None. Below is my approach for a string with a total of 4 options, of which any number can be optional.

parts = s.split(" ", 4) # Will raise exception if too many options 
parts += [None] * (4 - len(parts)) # Assume we can have max. 4 items. Fill in missing entries with None.
value1, value2, optional_value, optional_value2 = parts
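
As a minimal sketch, the approach above can be wrapped in a small function. The name parse_options and its max_items parameter are hypothetical, chosen only for illustration. The sketch also shows what happens with too many items: str.split() with a maxsplit argument does not raise, it simply leaves the remainder in the last element, so any error would come from the tuple unpacking afterwards.

```python
def parse_options(s, max_items=4):
    """Split s on single spaces and pad the result with None up to max_items."""
    # With maxsplit=max_items the result has at most max_items + 1 elements.
    parts = s.split(" ", max_items)
    # Multiplying a list by zero or a negative number gives an empty list,
    # so this is a no-op when there are already enough parts.
    parts += [None] * (max_items - len(parts))
    return parts


value1, value2, optional_value, optional_value2 = parse_options("foo bar optional")

# Too many items: split() itself succeeds and returns five parts here.
too_many = parse_options("a b c d e")
```

Unpacking too_many into four variables would fail with a ValueError, because the list has five elements.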

Any ideas for something more elegant?


25 thoughts on “How to split() strings of undefined item count to Python variables elegantly?”

  1. In Python 2.x you can do

    txt = "A B"
    a, b, c = (txt.split() + [None]*99)[:3]

    3.x has a new feature that doesn’t help in this particular situation, but is nice in general:

    a, b, *optional = txt.split()

  2. I might be missing some of your goals, and I think Python 3 can actually do it simpler, but something like this works pretty well for me:

    parts = s.split() #Space isn’t necessary by default
    value1, value2, optionals = parts[0], parts[1], parts[2:]

    Hope that helps.

  3. It’s basically ok, just wrap it in a function if you use it often and give it optional parameters, something like:

    def split(s, numsplits, default=None, sep=None, ignore_extra=False):
        parts = s.split(sep, numsplits)
        if len(parts) > numsplits:
            if ignore_extra:
                del parts[numsplits:]
            else:
                raise ValueError('too many values to split')
        else:
            parts.extend(default for i in range(numsplits - len(parts)))
        return parts

  4. Oops, indentation was messed up but you can probably infer it.

  5. The “will raise” comment is incorrect.

    split() is different from split(" "): try both on "a  b" (two spaces between words).

    What you want is probably

    def split(s, n):
        return (s.split() + [None] * n)[:n]

    a, b, c, d = split(s, 4)

    Add error handling as appropriate (too few items, too many items).

  6. The split() method itself doesn’t raise an exception as implied by the comment in your example.

    >>> '1 2 3 4 5'.split(' ', 3)
    ['1', '2', '3', '4 5']

    which will raise an exception when you attempt to break apart the list later in the example (‘too many values to unpack’ error).

    Following the “explicit is better than implicit” principle, the following, I think, would be more Pythonic and still make the point of your blog:

    parts = s.split()
    if len(parts) > 4:
        raise ValueError("Too many options")
    parts += [None] * (4 - len(parts))
    value1, value2, opt_value1, opt_value2 = parts

    If you just want to ignore the extra options, replace the first line:

    parts = s.split()[:4]

    BTW, thanks for posting this. This is cleaner than the way I had been handling optional values in the past.

  7. I can write a helper function for this problem in my application any day, and I have already done so *many times*. However, what happens inside my application does not interest the Python community as a whole, as the world cannot use the function :)

    So from this perspective, if there is a helper function:

    * This helper function should end up in the Python standard library

    * This helper function should end up as a PyPI egg (quite a heavy option for one function, eh?)

    * …other idea…?

  8. >>> parts = 'a b c'.split(' ')
    >>> if len(parts) > 4: raise Exception

    >>> value1, value2, optional_value, optional_value2 = [parts[i] if i < len(parts) else None for i in range(4)]
    >>> value1, value2, optional_value, optional_value2
    ('a', 'b', 'c', None)

  9. def run(v1, v2, opt1=None, opt2=None):
        pass  # …

    run(*s.split())

  10. Simplest? With one small extension, I agree with Juho.

    >>> maxparts = 4
    >>> parts = s.split()[:maxparts]
    >>> p1, p2, opts = parts[0], parts[1], parts[2:] or None

    Note: this works [silently] when the number of parts is greater than maxparts.

  11. Ah, but opts is a list, is it not? Are the options then not already split? You can select them individually from there without any more manipulation.

  12. Larry: But I want to see them in Python variables, and if an option is not given the variable must be None.

    Otherwise you need to do if len(opts) > 1: … checks.

  13. I use this line in one of my programs to normalize input row length:

    if length!=tcols: row=row[:tcols]+['']*(tcols-length)

  14. What James Thiele probably meant was:

    parts = 'a b c'.split(' ')
    var1, var2, var3, var4 = [parts[i] if i < len(parts) else None for i in range(4)]

  15. An example using izip_longest:

    import itertools  # izip_longest is Python 2; Python 3 names it zip_longest
    parts = 'a b c'.split(' ')
    var1, var2, var3, var4 = [x[0] for x in itertools.izip_longest(parts, range(4))]

    (Mikko, how do you mark up code properly in this blog to preserve indentation and quotes?)

  16. Here’s a wacky one:

    v1, v2, v3, v4 = (parts.pop(0) if parts else None for i in range(4))

  17. itertools madness:

    from itertools import islice, chain, repeat
    parts = 'a b c'.split(' ')
    v1, v2, v3, v4 = islice(chain(parts, repeat(None)), 4)

  18. Jack Diederich’s solution with an additional check for missing required parts:

    For 3 required and 2 optional parts, use:

    v1, v2, v3, v4, v5 = (s.split(' ', 4) + [None] * 2)[:5]

  19. Here’s Jack Diederich’s solution enhanced with a check for missing required items.

    For 3 required and 2 optional items, use:

    try:
        v1, v2, v3, o1, o2 = ('a b c d'.split(' ', 4) + [None] * 2)[:5]
    except ValueError:
        print('too few items')

    Extra items will all end up in o2.
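
Pulling the ideas from the comments together, one possible helper might look like the sketch below. The name split_options and its required, optional and default parameters are hypothetical and not part of any library. It assumes whitespace-separated input (split() without arguments), pads with the itertools approach shown above, and raises ValueError for both too few and too many items instead of silently dropping or merging extras.

```python
from itertools import chain, islice, repeat


def split_options(s, required, optional, default=None):
    """Split s on whitespace into required + optional fields, padded with default."""
    parts = s.split()
    total = required + optional
    if len(parts) < required:
        raise ValueError("too few items: expected at least %d" % required)
    if len(parts) > total:
        raise ValueError("too many items: expected at most %d" % total)
    # Chain the parts with an endless stream of defaults, then cut to length.
    return list(islice(chain(parts, repeat(default)), total))


value1, value2, optional_value, optional_value2 = split_options("foo bar optional", 2, 2)
```

Because the function always returns exactly required + optional elements, the unpacking on the last line cannot fail once the call itself succeeds.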
