A common problem when parsing space-separated strings is that a number of fixed items are followed by some number of optional items. You want to extract the parsed values into Python variables and initialize the optional values with None if they are missing.
E.g. we have an option format description string with 2 or 3 items: value1 value2 [optional value]
s = "foo bar" # This is a valid example option string
s2 = "foo bar optional" # This is also, with one optional item
Usually we’d like to parse this so that non-existing options are set to None. Traditionally one writes Python code like below:
# Extract the fixed items and the optional item
parts = s.split(" ")
value1 = parts[0]
value2 = parts[1]
if len(parts) >= 3:
    optional_value = parts[2]
else:
    optional_value = None
However, this looks a little bit ugly, considering that if there were always just two options one could write:
value1, value2 = s.split(" ")
When the number of optional options grows, the boilerplate code starts to annoy greatly.
I have been thinking about how to write split() code that parses options so that non-existing items are set to None. Below is my approach for a string with a total of 4 options, of which any number can be optional.
parts = s.split(" ", 4) # Will raise exception if too many options
parts += [None] * (4 - len(parts)) # Assume we can have max. 4 items. Fill in missing entries with None.
value1, value2, optional_value, optional_value2 = parts
Any ideas for something more elegant?
In python 2.x you can do
txt = "A B"
a, b, c = (txt.split() + [None]*99)[:3]
3.x has a new feature that doesn’t help in this particular situation, but is nice in general:
a, b, *optional = txt.split()
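Star-unpacking can still be combined with padding when each optional item needs its own variable. The following is only a minimal sketch for Python 3; the names `opt1` and `opt2` and the limit of two optional slots are assumptions made for illustration:

```python
txt = "A B"

# Star-unpacking collects everything after the two fixed items into a list.
a, b, *optional = txt.split()

# Pad the collected list with None and cut it to the two optional slots
# assumed here, so each optional item gets its own variable.
opt1, opt2 = (optional + [None] * 2)[:2]

print(a, b, opt1, opt2)
```

Note that the slice silently drops any items beyond the assumed two optional slots.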
I might be missing some of your goals, and I think Python 3 can actually do it simpler, but something like this works pretty well for me:
parts = s.split() #Space isn’t necessary by default
value1, value2, optionals = parts[0], parts[1], parts[2:]
Hope that helps.
Hi Alec. You don’t set missing optionals to None, which was the point of the blog post.
It’s basically ok, just wrap it in a function if you use it often and give it optional parameters, something like:
def split(s, numsplits, default=None, sep=None, ignore_extra=False):
    # maxsplit=numsplits yields at most numsplits + 1 parts,
    # so one part too many signals extra input
    parts = s.split(sep, numsplits)
    if len(parts) > numsplits:
        if ignore_extra:
            del parts[numsplits:]
        else:
            raise ValueError('too many values to split')
    else:
        # Pad the missing entries with the default value
        parts.extend(default for i in range(numsplits - len(parts)))
    return parts
You can try “or”:
…
value1, value2, optionals = parts[0], parts[1], parts[2:] or None
The “will raise” comment is incorrect.
split() is different from split(" "): try both on "a  b" (two spaces between words).
What you want is probably
def split(s, n):
    return (s.split() + [None] * n)[:n]
a, b, c, d = split(s, 4)
Add error handling appropriately (too few items, too many items).
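As a rough illustration of what such error handling might look like, here is a sketch of a helper that checks both bounds before padding. The name `split_padded` and its `required` and `total` parameters are hypothetical, not something proposed in the thread. The two print calls show the whitespace difference mentioned above.

```python
def split_padded(s, required, total):
    """Split s on whitespace and pad the result with None up to total items.

    Hypothetical helper: required is the number of mandatory items,
    total is the maximum number of items allowed.
    """
    parts = s.split()  # no argument: runs of whitespace count as one separator
    if len(parts) < required:
        raise ValueError("expected at least %d items, got %d" % (required, len(parts)))
    if len(parts) > total:
        raise ValueError("expected at most %d items, got %d" % (total, len(parts)))
    return parts + [None] * (total - len(parts))


# The difference between the two split() forms on a doubled space:
print("a  b".split())     # ['a', 'b']
print("a  b".split(" "))  # ['a', '', 'b']

a, b, c, d = split_padded("foo bar optional", 2, 4)
print(a, b, c, d)
```

Raising on both bounds keeps the unpacking line from failing with a less descriptive "too many values to unpack" message.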
The split() method itself doesn’t raise an exception, as the comment in your example implies.
>>> '1 2 3 4 5'.split(' ',3)
['1', '2', '3', '4 5']
which will raise an exception when you attempt to break apart the list later in the example (‘too many values to unpack’ error).
Following the “explicit is better than implicit” principle, the following, I think, would be more Pythonic and still make the point of your blog:
parts = s.split()
if len(parts) > 4:
    raise ValueError("Too many options")
parts += [None] * (4 - len(parts))
value1, value2, opt_value1, opt_value2 = parts
If you just want to ignore the extra options, just replace the first line:
parts = s.split()[:4]
BTW, thanks for posting this. This is cleaner than the way I had been handling optional values in the past.
Look at itertools.izip_longest: http://docs.python.org/library/itertools.html#itertools.izip_longest
dict(itertools.izip_longest(("value1", "value2", "option1", "option2"), s.split()))
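In Python 3 the same function is called `itertools.zip_longest`. The sketch below shows one way the dictionary idea could look there; the key names in `names` are illustrative assumptions, and the result is a dict rather than separate variables:

```python
from itertools import zip_longest  # named izip_longest in Python 2

s = "foo bar optional"
names = ("value1", "value2", "option1", "option2")  # illustrative key names

# Pair each name with the item in the same position; names that run out of
# items are paired with the default fillvalue, None.
options = dict(zip_longest(names, s.split()))

print(options["value1"], options["option1"], options["option2"])
```

One caveat: if the string has more items than there are names, the surplus items are paired with a None key, so a length check beforehand is probably still needed.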
I can write a helper function for this problem in my application any day, and I have already done so *many times*. However, what happens inside my application is of no interest to the Python community as a whole, as the world cannot use the function 🙂
So from this perspective, if there is a helper function:
* This helper function should end up in the Python standard library
* This helper function should end up as a PyPI egg (quite a heavy option for one function, eh?)
* …other idea…?
>>> parts = 'a b c'.split(' ')
>>> if len(parts) > 4: raise Exception
…
>>> value1, value2, optional_value, optional_value2 = [parts[i] if i >> value1, value2, optional_value, optional_value2
('a', 'b', 'c', None)
oops! >> should be >
def run(v1, v2, opt1=None, opt2=None):
    pass #…
run(*s.split())
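To see what this approach does with different inputs, here is a small sketch in which `run` simply returns its arguments; the function body is an assumption for demonstration, since the original leaves it open:

```python
def run(v1, v2, opt1=None, opt2=None):
    # Return the values so the caller can see which defaults were applied.
    return v1, v2, opt1, opt2


print(run(*"foo bar".split()))           # ('foo', 'bar', None, None)
print(run(*"foo bar optional".split()))  # ('foo', 'bar', 'optional', None)

# Too few or too many items surface as a TypeError from the call itself.
try:
    run(*"a b c d e".split())
except TypeError as exc:
    print("rejected:", exc)
```

The parsed values live only inside the function, so this fits best when the code that consumes the options can be placed there too.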
Simplest? With one small extension, I agree with Juho.
>>> maxparts = 4
>>> parts = s.split()[:maxparts]
>>> p1, p2, opts = parts[0], parts[1], parts[2:] or None
Note: this works [silently] when the number of parts is greater than maxparts.
Larry: What if there are opts1 and opts2? You don’t split them into separate variables, and that was the point of the blog post.
Ah, but opts is a list, is it not? Are they then not split? You can then select them individually from there without any more manipulation.
Larry: But I want to see them in Python variables, and if an option is not given the variable must be None.
Otherwise you need to do if len(opts) > 1:… checks.
See previous comments for more proper solutions.
I use this line in one of my programs to normalize input row length:
if length!=tcols: row=row[:tcols]+['']*(tcols-length)
What James Thiele probably meant was:
parts = 'a b c'.split(' ')
var1, var2, var3, var4 = [parts[i] if i < len(parts) else None for i in range(4)]
An example using izip_longest:
parts = 'a b c'.split(' ')
var1, var2, var3, var4 = [x[0] for x in itertools.izip_longest(parts, range(4))]
(Mikko, how do you mark up code properly in this blog to preserve indentation and quotes?)
Here’s a wacky one:
v1, v2, v3, v4 = (parts.pop(0) if parts else None for i in range(4))
itertools madness:
from itertools import islice, chain, repeat
parts = 'a b c'.split(' ')
v1, v2, v3, v4 = islice(chain(parts, repeat(None)), 4)
Jack Diederich’s solution with an additional check for missing required parts:
For 3 required and 2 optional parts, use:
v1, v2, v3, v4, v5 = (s.split(' ', 4) + [None] * 2)[:5]
Here’s Jack Diederich’s solution enhanced with a check for missing required items.
For 3 required and 2 optional items, use:
try:
    v1, v2, v3, o1, o2 = ('a b c d'.split(' ', 4) + [None] * 2)[:5]
except ValueError:
    print('too few items')
Extra items will all end up in o2.