
Processing text files with mixed line endings in Python

Line endings are the bane of any cross-platform developer’s life. It’s sad but true that no single line ending works for all your files. Whatever you choose is a bad compromise: if you force CRLF conversion everywhere, for example, you break any Unix scripts you might have.

Python has “universal newline” support when reading files. If you add ‘U’ to the file mode when you open a file, Python automagically converts every line ending it finds to ‘\n’. Here’s a Python 2.x snippet:

f = open("mixed.txt", "rU")
for line in f:
    print line,  # trailing comma: each line already ends in "\n"
# f.newlines is None, a single line-ending string, or a tuple of the line endings seen
f.close()

The one downside is that you lose the original ending of each individual line. Most of the time you don’t care, because you’re only processing the contents. But if you want to faithfully reproduce the files you read, you’ll need to do something extra when the newlines attribute returns a tuple rather than a single value, because a tuple means the file mixed several line endings and you no longer know which line had which.

But this is cool, because lots of scripts process text files, and this simplifies such processing tremendously.
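For the faithful-reproduction case, here is a minimal sketch in Python 3 (not the 2.x of the snippet above). It relies on the documented behaviour of open() with newline="": lines are still split on any of the three endings, but each one is handed back untranslated. The helper names read_lines_keep_endings and split_ending, and the demo data, are made up for illustration.

```python
import os
import tempfile


def read_lines_keep_endings(path):
    """Read a text file line by line, leaving each line's original ending in place."""
    # newline="" still recognises \n, \r\n and \r as line breaks,
    # but returns every line untranslated.
    with open(path, "r", newline="") as f:
        return list(f)


def split_ending(line):
    """Split a line into (text, ending); ending is "" if the line has none."""
    for ending in ("\r\n", "\n", "\r"):
        if line.endswith(ending):
            return line[:-len(ending)], ending
    return line, ""


# Demo on a throwaway file that mixes all three conventions.
raw = b"unix\nwindows\r\nold mac\rno ending"
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(raw)
    lines = read_lines_keep_endings(path)
    # Process the text, then re-attach the ending each line came with.
    rebuilt = "".join(
        text.upper() + ending for text, ending in map(split_ending, lines)
    )
finally:
    os.remove(path)
```

If you write the result back out, open the output file with newline="" as well, so that Python does not translate the endings a second time.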

Grokking SCons

SCons is an interesting idea, but frustrating to work with. Here’s the documentation I’ve been amassing as I’ve figured things out.

SCons 2.2.0 User Guide: http://www.scons.org/doc/2.2.0/HTML/scons-user.html. SCons Wiki is at http://www.scons.org/wiki.

Nothing has changed in the user guide in a while (e.g., the diff between the 2.2.0 user guide and the 2.3.0 user guide is a bunch of changes of 2.2.0 to 2.3.0 in the text). It’s also too lightweight to serve as a guide for building large projects. It’s still worth reading for terminology, but not for usage.

Google has the Software Construction Toolkit, a set of extensions to SCons, at https://code.google.com/p/swtoolkit/. Read this because Google usually writes very good code.

You can only set a few options via SetOption; the rest must be set on the command line. This is really frustrating. See http://stackoverflow.com/questions/13881178/how-to-force-quiet-mode-from-sconstruct.
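For reference, a small sketch of what SetOption does allow, written as an SConstruct fragment rather than a standalone script. It assumes num_jobs and implicit_cache are among the settable options in your SCons version (the SCons documentation lists both as settable); the values are arbitrary.

```python
# SConstruct fragment: SCons supplies SetOption and GetOption when it
# executes this file, so nothing needs to be imported here.
SetOption('num_jobs', 4)        # same effect as passing -j4
SetOption('implicit_cache', 1)  # same effect as --implicit-cache

# GetOption reads back whatever value is currently in effect.
print("building with %s jobs" % GetOption('num_jobs'))
```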

People have been trying to optimize SCons builds for a while. See http://stackoverflow.com/questions/1318863/how-to-optimize-an-scons-script for some tips.

I can’t find a way to share a single top-level site_scons folder among multiple SConstruct files in other directories. The idea would be to avoid the overhead of working out dependencies for the entire build when you’re just working on a piece of it. But either scons changes the working directory, or it looks for site_scons in the directory where it found the SConstruct file instead of in the working directory. A quick test reveals the latter: the working directory is still “the top”, where my site_scons folder is, but the folder is not processed. And even if I pass --site-dir=MY_SITE_SCONS, scons wants the path to be relative to where the SConstruct file is, unless I go to the trouble of specifying an absolute path. So I would have to wrap my sub-SConstruct folders in some kind of script to do this for me (it’s too awkward for users to type), or duplicate my site_scons folder (bad for maintenance), or do some kind of hard-linking at source-control time (not sure how I’d do that), or?
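The wrapper-script option could look something like the sketch below. It only uses the --site-dir option mentioned above and turns the shared folder into an absolute path so that it no longer matters which SConstruct directory scons runs in. The function names and the directory layout (a site_scons folder directly under the top-level directory) are assumptions for illustration, and it presumes a scons executable on the PATH.

```python
import os
import subprocess


def build_scons_command(top_dir, extra_args=()):
    """Return a scons command line that points at the shared site_scons folder."""
    # An absolute path sidesteps the "relative to the SConstruct file" problem.
    site_dir = os.path.abspath(os.path.join(top_dir, "site_scons"))
    return ["scons", "--site-dir=" + site_dir] + list(extra_args)


def run_in_subproject(top_dir, sub_dir, extra_args=()):
    """Run scons inside one sub-project directory and return its exit code."""
    cmd = build_scons_command(top_dir, extra_args)
    # cwd is the directory that holds the sub-project's SConstruct file.
    return subprocess.call(cmd, cwd=os.path.join(top_dir, sub_dir))


# Show the command that would be run for a hypothetical checkout in ".".
print(" ".join(build_scons_command(".", ["-j4"])))
```

Users would then call something like run_in_subproject(".", "libfoo") from a small script instead of typing the option themselves, where "libfoo" stands in for whatever sub-project directory they are working in.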

Various

And things to read to get up to speed

And scons source (and forked projects)

Python debugging

Oddly enough, this is both better and worse than Perl debugging: better architected, in my opinion, but poorly explained and evangelized, to the point where Python book and article writers say “use print statements” to debug code instead of the debugger.

Quick overview

From the command-line:

python -m pdb yourcode.py

This runs your program under the control of the pdb module and stops at the first line, as if a pdb.set_trace() call had been issued before the first line of your code.

From inside your code:

import pdb
...
pdb.set_trace()

Importing the pdb module gives you access to the debugger features. set_trace() will set an unconditional breakpoint, dropping you into the interactive debugger prompt. Obviously, this requires a console to exist.
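To see what a session looks like without typing at a prompt, here is a small Python 3 sketch that feeds pdb its commands from a string and captures what it prints. It uses the stdin and stdout arguments of the pdb.Pdb class; the scripted_session helper and the compute function are made up for the example, and in real use you would simply call pdb.set_trace() and type the commands yourself.

```python
import io
import pdb


def scripted_session(commands):
    """Run a tiny function under pdb, feeding it commands instead of a keyboard."""
    stdin = io.StringIO("\n".join(commands) + "\n")
    stdout = io.StringIO()
    # readrc=False skips any personal .pdbrc; nosigint=True leaves Ctrl-C handling alone.
    debugger = pdb.Pdb(stdin=stdin, stdout=stdout, readrc=False, nosigint=True)

    def compute():
        total = sum(range(5))   # total is 10
        debugger.set_trace()    # breakpoint: pdb takes over here
        return total * 2

    result = compute()
    return result, stdout.getvalue()


# "p total" prints the variable; "continue" resumes normal execution.
result, transcript = scripted_session(["p total", "continue"])
print(transcript)
```

The transcript shows the (Pdb) prompt, the location where execution stopped, and the value printed by p total.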

Reading list

Simple introduction: Debugging in Python.

From the Python Documentation: 26.2. pdb – The Python Debugger.

Another article: Interactive Debugging in Python

A more in-depth article by Doug Hellmann: pdb – Interactive Debugger.

Adding features to pdb: Python pdb (debugger) disp equivalent?

Python Debugger Cheatsheet – one-page PDF