We all write bad code from time to time. I tend to do it deliberately when I’m experimenting in the interpreter or the debugger. I can’t say why I do it exactly except that it’s fun to see what I can fit in one line of code. Anyway, this week I needed to parse a file that looked like this:
00000 TM YT TSA1112223 0000000000000000000000000020131007154712PXXXX_Name_test_test_section.txt data line 1 data line 2
You can download it here if you don’t want to scroll around the above snippet. Anyway, the objective was to parse out the part of the filename in the header that is labeled “section” in the above. The filename itself is XXXX_Name_test_test_section.txt. That part of the filename is used to identify something in a database. So in the Python interpreter, I came up with the following hack:
header.split("|")[0][125:].split("_")[-1].split(".")[0]
Let’s give this some more context. The above snippet fits into a function like this:
#---------------------------------------------------------------------- def getSection(path): """""" with open(path) as inPath: lines = inPath.readlines() header = lines.pop(0) section = header.split("|")[0][125:].split("_")[-1].split(".")[0]
That’s pretty ugly code and would require a fairly lengthy comment explaining what I’m doing. Not that I planned to actually use it in the first place, but it was fun to write. Anyway, I ended up breaking it down into some code that’s something like the following:
#---------------------------------------------------------------------- def getSection(path): """""" with open(path) as inPath: lines = inPath.readlines() header = lines.pop(0) filename = header[125:256].strip() fname = os.path.splitext(filename)[0] section = fname.split("_")[-1] print section print header.split("|")[0][125:].split("_")[-1].split(".")[0] #---------------------------------------------------------------------- if __name__ == "__main__": path = "test.txt" getSection(path)
I’m leaving the “bad” code in the example above to show that the result is the same. Yes, it’s still a little ugly, but it’s much easier to follow and the code is self-documenting.
What funky code have you written lately?
Pingback: Mike Driscoll: Python: Bad Code of the Day (Oct 30th, 2013 Ed) | The Black Velvet Room
I would have used a regular expression – much cleaner, more general and extensible.
Scott, give the regular expression then? If the rules are this simple and don’t have to be more generic, I don’t see the need for regex.
In this case I would simply do:
with open(path) as f_in:
header = f_in.readline()
section = header[header.rfind(‘_’) + 1: header.rfind(‘.’)].
A RegEx seems to be overkill. I like melker’s solution, it’s simple and elegant. I might have done a `os.path.splitext(fname)` before and simply used `header[header.rfind(‘_’) + 1:]`, but it seems unnessesary in this case.
I did mention what the input was at the very beginning of the article. You can even download it.
Thanks for taking the time to come up with some regex examples.
Oh, yeah…that makes sense. Thanks for clarifying. I should do that in my example code more often.
Why to read all the lines? Files that are extracts from a database may be very large. Why not to read only the header line as
line = inPath.readline()
you mean you don’t know how to write the RE? There is nothing simple or elegant about “header.split(“|”)[0][125:].split(“_”)[-1].split(“.”)[0]”. That could take someone an hour to figure out. Especially with no comments or example input line to reference. And it’s hard-coded (125). It’s fragile and brittle as hell. Bad, bad, bad.