High-polish use of subprocess.Popen

Python has a pretty decent facility to launch and operate a child process, subprocess.Popen. However, as with many “scripting systems”, it’s easy to write something that mostly works but is rough around the edges and not all that robust, because sub-processes don’t all run in 100 milliseconds without errors.

First off, avoid the use of subprocess.call. It waits for the process to terminate before returning, which means that if your subprocess hangs, your Python program will hang.
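One way to avoid that hang is to start the child with Popen and poll it against a deadline. The following is a minimal sketch rather than a definitive implementation: run_with_deadline, its parameters, and the use of sys.executable as a stand-in child are assumptions made for illustration. (On Python 3.3+ and subprocess32, call() also accepts a timeout argument.)

```python
from __future__ import print_function

import subprocess
import sys
import time


def run_with_deadline(args, deadline_seconds, poll_interval=0.05):
    """Run args, waiting at most deadline_seconds before killing the child.

    Hypothetical helper. Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(args)
    give_up_at = time.time() + deadline_seconds
    while proc.poll() is None:  # None means the child is still running
        if time.time() >= give_up_at:
            proc.kill()
            proc.wait()  # reap the killed child
            return proc.returncode, True
        time.sleep(poll_interval)  # sleep between checks instead of spinning
    return proc.returncode, False


# The Python interpreter itself stands in for a quick child and a hung child.
quick = run_with_deadline([sys.executable, '-c', 'pass'], 10.0)
slow = run_with_deadline(
    [sys.executable, '-c', 'import time; time.sleep(30)'], 0.5)
print('quick:', quick)
print('slow timed out:', slow[1])
```

The parent stays in control the whole time, so the loop body is also a natural place to do other work while the child runs.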

Second, if you’re using Python 2.7 on POSIX, use subprocess32, which is a backport of subprocess from Python 3.

Third, stop using os.popen in favor of subprocess.Popen. It’s a little more complicated, but worth it.

Fourth, keep in mind that Popen.communicate() also blocks until the process terminates, so don’t use it either. Also, communicate() doesn’t seem to handle large amounts of output on some systems (reports of “no more than 65535 bytes of output due to Linux pipe implementation”).

Reading stdout

Now, on to actual details. Let’s call dir on Windows and number each line in the output

ldir.py
from __future__ import print_function

import subprocess
import sys

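# dir is a cmd.exe built-in, so it has to run through the shell (shell=True).
# stderr=subprocess.STDOUT merges the child's stderr into its stdout pipe.
proc = subprocess.Popen(args=['dir'] + sys.argv[1:],
             stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
linenum = 1
while True:
  line = proc.stdout.readline()
  if len(line) == 0:  # an empty read means the child closed its end of the pipe
    break
  print("%d: %s" % (linenum, line), end='')
  linenum += 1
proc.wait()  # reap the child once its output is exhausted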

We are merging stderr and stdout together in this example (stderr=subprocess.STDOUT). If we run this on C:\Windows\System32 like so

ldir.py /s C:\Windows\System32

we’ll start seeing output like this

1:  Volume in drive C is OSDisk
2:  Volume Serial Number is 062F-8F58
3:
4:  Directory of c:\Windows\System32
5:
6: 04/23/2014  06:09 PM    <DIR>          .
7: 04/23/2014  06:09 PM    <DIR>          ..
8: 04/12/2011  12:38 AM    <DIR>          0409
9: 01/14/2014  11:21 AM    <DIR>          1033
10: 06/10/2009  02:16 PM             2,151 12520437.cpx
11: 06/10/2009  02:16 PM             2,233 12520850.cpx
12: 02/14/2013  09:34 PM           131,584 aaclient.dll
13: 11/20/2010  08:24 PM         3,727,872 accessibilitycpl.dll

And since this is under our control, we can pipe to more, we can control-C to stop it, and so on.

There are still complications, mostly around buffering. In Python 2 the default for Popen is to not buffer data (bufsize=0; Python 3 defaults to buffered pipes), but that only affects the reader – the source process can still buffer its own output, and many programs switch to block buffering when they detect that stdout is a pipe. You can trick programs into thinking they are writing to a console, which usually means that output will be line-buffered or unbuffered. You can use the low-level pty module directly (on Unix) or something higher-level like pexpect

  • Unix: http://pexpect.sourceforge.net/pexpect.html
  • Windows: https://bitbucket.org/mherrmann_at/wexpect
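To make the pty trick concrete, here is a small Unix-only sketch that runs the same child once on a pipe and once on a pseudo-terminal, and asks it whether stdout looks like a terminal. The helper name child_sees_tty and the child one-liner are made up for illustration; only pty.openpty, os.read and subprocess.Popen are real library calls.

```python
from __future__ import print_function

import os
import pty  # Unix only
import subprocess
import sys

# The child reports whether it believes its stdout is a terminal.
CHILD_CODE = ('import sys; sys.stdout.write(str(sys.stdout.isatty())); '
              'sys.stdout.flush()')


def child_sees_tty(use_pty):
    """Hypothetical helper: run the child on a pipe or on a pty."""
    args = [sys.executable, '-c', CHILD_CODE]
    if use_pty:
        master_fd, slave_fd = pty.openpty()
        proc = subprocess.Popen(args, stdout=slave_fd)
        os.close(slave_fd)  # the child holds its own copy of the slave end
        data = os.read(master_fd, 1024)  # blocks until the child writes
        proc.wait()
        os.close(master_fd)
    else:
        proc = subprocess.Popen(args, stdout=subprocess.PIPE)
        data = proc.stdout.read()
        proc.stdout.close()
        proc.wait()
    return data.decode('ascii').strip() == 'True'


print('pipe:', child_sees_tty(False))
print('pty: ', child_sees_tty(True))
```

A program that picks its buffering mode from isatty() will behave as if it were interactive in the pty case, which is the effect pexpect relies on.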

Of course, not all processes write lines. You can use a more generalized approach by reading bytes from the stdout pipe. The previous program modified to read 128 bytes at a time looks like this

while True:
  line = proc.stdout.read(128)
  if len(line) == 0:
    break
  print("<%d>: %s" % (linenum, line), end='')
  linenum += 1

and produces this output (with numbers changed to stand out more)

<1>:  Volume in drive C is OSDisk
 Volume Serial Number is 062F-8F58

 Directory of c:\Windows\System32

04/23/2014  06:09 PM    <DIR<2>: >          .
04/23/2014  06:09 PM    <DIR>          ..
04/12/2011  12:38 AM    <DIR>          0409
01/14/2014  11:21 AM    <DIR><3>:           1033
06/10/2009  02:16 PM             2,151 12520437.cpx
06/10/2009  02:16 PM             2,233 12520850.cpx
02/14/201<4>: 3  09:34 PM           131,584 aaclient.dll
11/20/2010  08:24 PM         3,727,872 accessibilitycpl.dll

And of course this would work for programs that are reading and writing octet streams, not just text.
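As a rough illustration of the octet-stream case, the sketch below reads a child's binary output in 128-byte chunks. The helper read_chunks and the child one-liner (which writes all 256 byte values plus a short tail) are assumptions for the example, and the Python interpreter stands in for a real binary-producing program.

```python
from __future__ import print_function

import subprocess
import sys

# Child that writes all 256 byte values plus a short tail to stdout.
CHILD_CODE = ("import sys; out = getattr(sys.stdout, 'buffer', sys.stdout); "
              "out.write(bytes(bytearray(range(256))) + b'tail'); out.flush()")


def read_chunks(args, chunk_size=128):
    """Hypothetical helper: collect the child's stdout in fixed-size chunks."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    chunks = []
    while True:
        chunk = proc.stdout.read(chunk_size)
        if not chunk:  # empty read: the child closed its stdout
            break
        chunks.append(chunk)
    proc.stdout.close()
    proc.wait()
    return chunks


chunks = read_chunks([sys.executable, '-c', CHILD_CODE])
print([len(chunk) for chunk in chunks])
```

Because read(n) on the pipe waits for a full chunk or end-of-file, only the last chunk can be short.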

Reading stdout and stderr

Sometimes you want to read from stderr and stdout independently, because you need to react to output on stderr. You can’t just call read or readline on each in turn, because the call could block on one handle while data is waiting on the other.

On Unix systems, you can call select on the stdout and stderr handles, because select works on any file descriptor, including pipes. On Windows, select only works on sockets, so you need to use some threads and a queue to have a blocking read per handle. Since this works on Unix as well, that is the approach used here.
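For comparison, the Unix-only select approach might look like the sketch below. The helper read_both and the child one-liner are hypothetical; select.select and os.read are the real calls doing the work. The portable thread-and-queue version follows it.

```python
from __future__ import print_function

import os
import select  # select() on pipes works on Unix only
import subprocess
import sys

# Child that writes one line to each stream and exits with status 3.
CHILD_CODE = ("import sys; sys.stdout.write('to stdout\\n'); "
              "sys.stderr.write('to stderr\\n'); sys.exit(3)")


def read_both(args):
    """Hypothetical helper: gather stdout and stderr separately via select."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    streams = {proc.stdout.fileno(): ('STDOUT', proc.stdout),
               proc.stderr.fileno(): ('STDERR', proc.stderr)}
    collected = {'STDOUT': b'', 'STDERR': b''}
    while streams:
        # Block until at least one pipe has data or has reached end-of-file.
        ready, _, _ = select.select(list(streams), [], [])
        for fd in ready:
            identifier, stream = streams[fd]
            data = os.read(fd, 4096)
            if data:
                collected[identifier] += data
            else:  # empty read: this pipe is finished
                stream.close()
                del streams[fd]
    proc.wait()
    return proc.returncode, collected


returncode, collected = read_both([sys.executable, '-c', CHILD_CODE])
print(returncode, collected)
```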

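import Queue
import threading

# Assumes proc was started with stdout=subprocess.PIPE and stderr=subprocess.PIPE,
# not stderr=subprocess.STDOUT as in ldir.py; otherwise proc.stderr is None.
io_q = Queue.Queue(5) # somewhat arbitrary, readers block when queue is full
def read_from_stream(identifier, stream):
  # Runs in its own thread, so a blocking read on one stream cannot stall the other.
  # readline() avoids the read-ahead buffering of file iteration in Python 2.
  for line in iter(stream.readline, b''):
    io_q.put((identifier, line))
  if not stream.closed:
    stream.close()

threading.Thread(target=read_from_stream, name='stdout-stream', args=('STDOUT', proc.stdout)).start()
threading.Thread(target=read_from_stream, name='stderr-stream', args=('STDERR', proc.stderr)).start()

while True:
  try:
    item = io_q.get(False)  # non-blocking get
  except Queue.Empty:
    if proc.poll() is not None:  # queue is empty and the child has exited
      break
  else:
    identifier, line = item
    print(identifier + ':', line, end='')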

This works well, but has a flaw: it is basically busy-waiting, burning CPU while waiting for input to come in. We’re doing this because we don’t want to block at the reader level; in a more complex situation, we might want to do processing while waiting for input to come in. There’s also a race condition here: we could check the queue and find it empty, then a reader could put something in the queue while we are checking proc.poll(), and if the process has exited by then we break out of the loop and miss that item.

We could do something like this, which is not clean, but works

import Queue
import threading

io_q = Queue.Queue(5)
def read_from_stream(identifier, stream):
  if not stream:
    # e.g. proc.stderr is None when stderr was not redirected to a pipe
    print('%s does not exist' % identifier)
    io_q.put(('EXIT', identifier))
    return
  # readline() avoids the read-ahead buffering of file iteration in Python 2
  for line in iter(stream.readline, b''):
    io_q.put((identifier, line))
  if not stream.closed:
    stream.close()
  print('%s is done' % identifier)
  io_q.put(('EXIT', identifier))  # sentinel: this reader has finished

active = 2  # number of reader threads still running
threading.Thread(target=read_from_stream, name='stdout-stream', args=('STDOUT', proc.stdout)).start()
threading.Thread(target=read_from_stream, name='stderr-stream', args=('STDERR', proc.stderr)).start()

while True:
  try:
    item = io_q.get(True, 1)  # block for up to one second
  except Queue.Empty:
    if proc.poll() is not None:
      break
  else:
    identifier, line = item
    if identifier == 'EXIT':
      active -= 1
      if active == 0:  # both readers are done, nothing more can arrive
        break
    else:
      print(identifier + ':', line, end='')

proc.wait()
print(proc.returncode)

Now there’s no busy-waiting, and we exit as soon as both readers report that they are done. This is also a lot of scaffolding to write each time we use subprocess.Popen(). One answer would be to wrap this up into a helper class, or rather a set of helper classes.
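As one possible shape for such a helper, here is a sketch of a small class that owns the reader threads and the queue and exposes the merged output as an iterator. The class name StreamMerger, its methods, and the child one-liner are all hypothetical; this is an illustration of the idea under those assumptions, not a finished design.

```python
from __future__ import print_function

import subprocess
import sys
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2


class StreamMerger(object):
    """Hypothetical helper: yields (identifier, line) from a Popen object."""

    _DONE = object()  # sentinel posted by each reader thread when it finishes

    def __init__(self, proc, maxsize=5):
        self._proc = proc
        self._queue = queue.Queue(maxsize)
        self._active = 0
        for identifier, stream in (('STDOUT', proc.stdout),
                                   ('STDERR', proc.stderr)):
            if stream is None:  # this stream was not redirected to a pipe
                continue
            self._active += 1
            reader = threading.Thread(target=self._pump,
                                      args=(identifier, stream))
            reader.daemon = True
            reader.start()

    def _pump(self, identifier, stream):
        try:
            for line in iter(stream.readline, b''):
                self._queue.put((identifier, line))
        finally:
            stream.close()
            self._queue.put((self._DONE, identifier))

    def __iter__(self):
        while self._active:
            identifier, payload = self._queue.get()  # blocks, no busy-wait
            if identifier is self._DONE:
                self._active -= 1
            else:
                yield identifier, payload
        self._proc.wait()  # all pipes are closed, so this returns promptly


CHILD_CODE = ("import sys; sys.stdout.write('out 1\\nout 2\\n'); "
              "sys.stderr.write('err 1\\n')")
proc = subprocess.Popen([sys.executable, '-c', CHILD_CODE],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
lines = list(StreamMerger(proc))
for identifier, line in lines:
    print(identifier, line)
```

Because every reader posts its sentinel in a finally block, the consumer can use a blocking get without a timeout and still be sure to terminate.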

stdin and stdout and stderr

There are two cases here

  1. Feeding a pipe that takes input and returns output.
  2. Running an interactive process

For the former, you could just have a file or pseudo-file feed the Popen process instead of subprocess.PIPE. For the latter, you definitely need to trick your Popen process into thinking that it’s writing to a TTY, otherwise the buffering will kill you.
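The first case can be sketched as follows, assuming the input is small enough to stage in a temporary file. The helper filter_file and the upper-casing child are made up for the example; the point is only that an open file object can be passed as stdin in place of subprocess.PIPE.

```python
from __future__ import print_function

import os
import subprocess
import sys
import tempfile

# Child that upper-cases whatever arrives on its stdin.
CHILD_CODE = 'import sys; sys.stdout.write(sys.stdin.read().upper())'


def filter_file(path, args):
    """Hypothetical helper: run args with the file at path as its stdin."""
    with open(path, 'rb') as source:
        proc = subprocess.Popen(args, stdin=source, stdout=subprocess.PIPE)
        # A single blocking read is acceptable here only because the output
        # is tiny; the reader threads shown earlier suit long-running children.
        output = proc.stdout.read()
        proc.stdout.close()
        proc.wait()
    return proc.returncode, output


handle, path = tempfile.mkstemp()
try:
    with os.fdopen(handle, 'wb') as temp:
        temp.write(b'hello\nworld\n')
    returncode, output = filter_file(path, [sys.executable, '-c', CHILD_CODE])
finally:
    os.remove(path)
print(returncode, output)
```

Since the operating system feeds the file to the child directly, the parent never has to write to a pipe and cannot deadlock on a full one.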

TBD


 
