How to split() strings with an indefinite item count into Python variables elegantly?

Posted on 2010-04-17 by Mikko Ohtamaa

A common problem when parsing (space-)separated strings is that there is a number of fixed items followed by some optional items. You want to extract the parsed values into Python variables and initialize the optional values to None if they are missing.

For example, we have an option format with 2 or 3 items: value1 value2 [optional value]

s = "foo bar" # This is a valid example option string

s2 = "foo bar optional" # This is also valid, with one optional item

Usually we’d like to parse this so that non-existing options are set to None. Traditionally one writes Python code like the following:

# Split into parts: two required items, one optional item
parts = s.split(" ")
value1 = parts[0]
value2 = parts[1]
if len(parts) >= 3:
    optional_value = parts[2]
else:
    optional_value = None

However, this looks a little ugly, considering that if there were always exactly two options, one could simply write:

value1, value2 = s.split(" ")

As the number of optional items grows, the boilerplate code becomes really annoying.

I have been thinking about how to write split() code that sets non-existing items to None. Below is my approach for a string with a total of 4 options, any number of which can be optional.

parts = s.split(" ", 4) # Will raise exception if too many options 
parts += [None] * (4 - len(parts)) # Assume we can have max. 4 items. Fill in missing entries with None.
value1, value2, optional_value, optional_value2 = parts
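As a quick check, here is the same three-line padding approach wrapped in a hypothetical parse_options() helper (the name is ours, not the post's) and applied to the two example strings s and s2 from above. It is only a sketch that keeps the split(" ", 4) call and the four variables unchanged.

```python
def parse_options(s):
    """Split s into 4 values, filling missing ones with None.

    Hypothetical helper wrapping the three lines from the post.
    """
    parts = s.split(" ", 4)             # at most 5 parts
    parts += [None] * (4 - len(parts))  # pad up to 4 (no-op for 4 or 5 parts)
    # With 5 parts, this unpacking raises ValueError.
    value1, value2, optional_value, optional_value2 = parts
    return value1, value2, optional_value, optional_value2


print(parse_options("foo bar"))           # ('foo', 'bar', None, None)
print(parse_options("foo bar optional"))  # ('foo', 'bar', 'optional', None)
```

With five or more items, split(" ", 4) itself succeeds and returns five parts; the ValueError only appears at the unpacking step.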

Any ideas for something more elegant?


25 thoughts on “How to split() strings with an indefinite item count into Python variables elegantly?”

Jack Diederich on 2010-04-17 at 15:19 said:

In Python 2.x you can do:

txt = "A B"
a, b, c = (txt.split() + [None]*99)[:3]

3.x has a new feature that doesn’t help in this particular situation, but is nice in general:

a, b, *optional = txt.split()
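In Python 3 the starred target can still be combined with padding: append one None per optional item and let a throwaway *_ absorb whatever is left over. A minimal sketch, assuming two required items and one optional one (split_padded is a hypothetical name):

```python
def split_padded(txt):
    # One None per optional item guarantees at least 3 values when the
    # 2 required items are present; *_ swallows the padding and any extras.
    a, b, c, *_ = txt.split() + [None]
    return a, b, c


print(split_padded("A B"))      # ('A', 'B', None)
print(split_padded("A B C D"))  # ('A', 'B', 'C'): the extra "D" is ignored
```

Too few required items still raise ValueError, because fewer than three values are then available to unpack.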
Alec Munro on 2010-04-17 at 15:37 said:

I might be missing some of your goals, and I think Python 3 can actually do it more simply, but something like this works pretty well for me:

parts = s.split() # No need to pass a space: split() splits on whitespace by default
value1, value2, optionals = parts[0], parts[1], parts[2:]

Hope that helps.
Mikko Ohtamaa on 2010-04-17 at 15:50 said:

Hi Alec. You don’t set missing optionals to None, which was the point of the blog post.
George Sakkis on 2010-04-17 at 16:17 said:

It’s basically ok, just wrap it in a function if you use it often and give it optional parameters, something like:

def split(s, numsplits, default=None, sep=None, ignore_extra=False):
    parts = s.split(sep, numsplits)
    if len(parts) > numsplits:
        if ignore_extra:
            del parts[numsplits:]
        else:
            raise ValueError('too many values to split')
    else:
        parts.extend(default for i in range(numsplits - len(parts)))
    return parts
George Sakkis on 2010-04-17 at 16:19 said:

Oops, indentation was messed up but you can probably infer it.
Juho Vepsäläinen on 2010-04-17 at 17:07 said:

You can try “or”:
…
value1, value2, optionals = parts[0], parts[1], parts[2:] or None
Marius Gedminas on 2010-04-17 at 17:59 said:

The “will raise” comment is incorrect.

split() is different from split(" "): try both on "a  b" (two spaces between words).
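The difference is easy to see in the interpreter: with no argument, split() treats any run of whitespace as a single separator and drops empty strings, while split(" ") splits on every single space.

```python
s = "a  b"  # two spaces between the words

print(s.split())     # ['a', 'b']
print(s.split(" "))  # ['a', '', 'b']
```

With split(" "), the empty string would end up in one of the variables as if it were a real value.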

What you want is probably

def split(s, n):
    return (s.split() + [None] * n)[:n]

a, b, c, d = split(s, 4)

Add error handling as appropriate (too few items, too many items).
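One way to add that error handling is sketched below. split_checked and its required parameter are hypothetical names, and raising ValueError in both cases is just one possible choice.

```python
def split_checked(s, n, required):
    """Split s on whitespace into exactly n items, padding with None.

    Raises ValueError if there are fewer than `required` or more than `n` items.
    """
    parts = s.split()
    if len(parts) < required:
        raise ValueError("too few items: expected at least %d, got %d"
                         % (required, len(parts)))
    if len(parts) > n:
        raise ValueError("too many items: expected at most %d, got %d"
                         % (n, len(parts)))
    return parts + [None] * (n - len(parts))


a, b, c, d = split_checked("foo bar baz", 4, required=2)
print(a, b, c, d)  # foo bar baz None
```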
Jason Peacock on 2010-04-17 at 18:32 said:

The split() method itself doesn’t raise an exception, as implied by the comment in your example:

>>> '1 2 3 4 5'.split(' ', 3)
['1', '2', '3', '4 5']

The exception only comes when you attempt to unpack the list later in the example (a 'too many values to unpack' error).

Following the “explicit is better than implicit” principle, the following, I think, would be more Pythonic and still make the point of your blog:

parts = s.split()
if len(parts) > 4:
    raise ValueError("Too many options")
parts += [None] * (4 - len(parts))
value1, value2, opt_value1, opt_value2 = parts

If you just want to ignore the extra options, replace the first line with:

parts = s.split()[:4]

BTW, thanks for posting this. This is cleaner than the way I had been handling optional values in the past.
Benjamin Riggs on 2010-04-17 at 18:34 said:

Look at itertools.izip_longest: http://docs.python.org/library/itertools.html#itertools.izip_longest

dict(itertools.izip_longest(("value1", "value2", "option1", "option2"), s.split()))
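In Python 3, izip_longest is called itertools.zip_longest. A sketch of the dictionary idea, assuming the field names are given as strings and passed first so that they become the keys (NAMES and parse_to_dict are hypothetical names):

```python
from itertools import zip_longest

# Hypothetical field names; missing values are filled with None.
NAMES = ("value1", "value2", "option1", "option2")


def parse_to_dict(s):
    # zip_longest pads the shorter iterable with fillvalue (None by default).
    return dict(zip_longest(NAMES, s.split()))


print(parse_to_dict("foo bar"))
# {'value1': 'foo', 'value2': 'bar', 'option1': None, 'option2': None}
```

If the string has more items than there are names, the surplus values collide under the key None, so a length check is still worthwhile.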
Mikko Ohtamaa on 2010-04-17 at 18:36 said:

I can write a helper function for my application any day, and I have already done so *many times*. However, what happens inside my application does not interest the Python community as a whole, as the world cannot use the function 🙂

So, from this perspective, if there is to be a helper function:

* This helper function should end up in the Python standard library

* This helper function should end up as a PyPI egg (quite a heavy option for one function, eh?)

* …other idea…?
James Thiele on 2010-04-17 at 20:08 said:

>>> parts = 'a b c'.split(' ')
>>> if len(parts) > 4: raise Exception
...
>>> value1, value2, optional_value, optional_value2 = [parts[i] if i < len(parts) else None for i in range(4)]
>>> value1, value2, optional_value, optional_value2
('a', 'b', 'c', None)
James Thiele on 2010-04-17 at 20:09 said:

oops! >> should be >
Tom Lynn on 2010-04-17 at 23:41 said:

def run(v1, v2, opt1=None, opt2=None):
    pass  # …

run(*s.split())
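Here the function signature itself does the parsing: default arguments supply None for missing optional items, and Python raises TypeError for too few or too many. A small sketch, where run returns its arguments purely for demonstration:

```python
def run(v1, v2, opt1=None, opt2=None):
    # Return the arguments so the effect of the defaults is visible.
    return v1, v2, opt1, opt2


print(run(*"foo bar".split()))           # ('foo', 'bar', None, None)
print(run(*"foo bar optional".split()))  # ('foo', 'bar', 'optional', None)
```

Calling run(*"a b c d e".split()) raises TypeError, because run accepts at most four positional arguments.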
Larry on 2010-04-18 at 01:42 said:

Simplest? With one small extension, I agree with Juho.

>>> maxparts = 4
>>> parts = s.split()[:maxparts]
>>> p1, p2, opts = parts[0], parts[1], parts[2:] or None

Note: this works [silently] when the number of parts is greater than maxparts.
Mikko Ohtamaa on 2010-04-18 at 01:55 said:

Larry: What if there are opts1 and opts2? You don’t split them and that was the point of the blog post.
Larry on 2010-04-18 at 02:47 said:

Ah, but opts is a list, is it not? Are they not split, then? You can select them individually from there without any more manipulation.
Mikko Ohtamaa on 2010-04-18 at 05:16 said:

Larry: But I want to see them in Python variables and if opt is not given the variable must be None.

Otherwise you need to do if len(opts) > 1: … checks.
Mikko Ohtamaa on 2010-04-18 at 05:17 said:

See previous comments for more proper solutions.
nes on 2010-04-19 at 19:16 said:

I use this line in one of my programs to normalize input row length:

if length != tcols: row = row[:tcols] + [''] * (tcols - length)
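The same normalization as a small function, to show both directions at once (normalize_row and its fill parameter are hypothetical names). Slicing trims rows that are too long, and multiplying a list by a negative count gives an empty list, so the length check in the one-liner only avoids a copy.

```python
def normalize_row(row, tcols, fill=""):
    # row[:tcols] trims long rows; [fill] * (negative number) is [],
    # so long rows get no padding and short rows are padded up to tcols.
    return row[:tcols] + [fill] * (tcols - len(row))


print(normalize_row(["a", "b"], 4))                 # ['a', 'b', '', '']
print(normalize_row(["a", "b", "c", "d", "e"], 4))  # ['a', 'b', 'c', 'd']
print(normalize_row(["a"], 3, fill=None))           # ['a', None, None]
```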
Antti Kaihola on 2010-05-05 at 23:20 said:

What James Thiele probably meant was:

parts = 'a b c'.split(' ')
var1, var2, var3, var4 = [parts[i] if i < len(parts) else None for i in range(4)]
Antti Kaihola on 2010-05-05 at 23:25 said:

An example using izip_longest:

import itertools
parts = 'a b c'.split(' ')
var1, var2, var3, var4 = [x[0] for x in itertools.izip_longest(parts, range(4))]

(Mikko, how do you mark up code properly in this blog to preserve indentation and quotes?)
Antti Kaihola on 2010-05-05 at 23:31 said:

Here’s a wacky one:

v1, v2, v3, v4 = (parts.pop(0) if parts else None for i in range(4))
Antti Kaihola on 2010-05-05 at 23:37 said:

itertools madness:

from itertools import islice, chain, repeat
parts = 'a b c'.split(' ')
v1, v2, v3, v4 = islice(chain(parts, repeat(None)), 4)
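The islice/chain/repeat combination generalizes to any iterable and any count. A sketch of it as a helper (padded is a hypothetical name); note that unlike the list-padding versions, it silently drops items beyond n instead of raising:

```python
from itertools import chain, islice, repeat


def padded(iterable, n, fill=None):
    # chain(...) yields the items, then `fill` forever; islice stops after n.
    return islice(chain(iterable, repeat(fill)), n)


v1, v2, v3, v4 = padded("a b c".split(" "), 4)
print(v1, v2, v3, v4)  # a b c None
```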
Antti Kaihola on 2010-05-05 at 23:42 said:

Jack Diederich’s solution with an additional check for missing required parts:

For 3 required and 2 optional parts, use:

v1, v2, v3, v4, v5 = (s.split(' ', 4) + [None] * 2)[:5]
Antti Kaihola on 2010-05-05 at 23:46 said:

Here’s Jack Diederich’s solution enhanced with a check for missing required items.

For 3 required and 2 optional items, use:

try:
    v1, v2, v3, o1, o2 = ('a b c d'.split(' ', 4) + [None] * 2)[:5]
except ValueError:
    print('too few items')

Extra items will all end up in o2.
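To make that behaviour concrete, here is the same expression wrapped in a hypothetical parse5() function. Because split(' ', 4) stops after four splits, everything past the fourth separator stays together in the last part:

```python
def parse5(s):
    # At most 5 parts from split; pad with 2 Nones so that 3 required
    # items still yield 5 values, then cut back to exactly 5.
    v1, v2, v3, o1, o2 = (s.split(" ", 4) + [None] * 2)[:5]
    return v1, v2, v3, o1, o2


print(parse5("a b c"))        # ('a', 'b', 'c', None, None)
print(parse5("a b c d e f"))  # ('a', 'b', 'c', 'd', 'e f')
```

With only two items, the padded list has four values, so the unpacking raises ValueError just as in the try/except version.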

Open Source Hacker

Pushing the boundaries of free technology
