Finding overloaded files

Many systems have the concept of a search path, which is really a set of paths, used to find something. For example, Unix and Windows have a PATH environment variable, which is a list of directories that are searched one after the other to find a program to run.

Whenever you have a set of paths, you also have the chance that you’re overloading names. Sometimes this is good, because you can “patch” in the right file by arranging your search path properly. Sometimes this is bad, because you can hide files and not know it. You can find single overloads by using the where command on Windows or the which command on Unix.
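As a rough illustration of what where and which do for a single name, here is a minimal sketch that lists every copy of one file along a search path, in search order (the first entry is the one that wins). The function name find_all is made up for this example, and unlike the real where command it does not try PATHEXT extensions; it only looks for the exact name given.

```python
import os

def find_all(name, search_dirs):
    """Return every path in search_dirs (in order) that holds a file called name."""
    matches = []
    for directory in search_dirs:
        if not directory:
            continue  # empty search-path entries are skipped in this sketch
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            matches.append(candidate)
    return matches

# Example: every copy of sort.exe reachable through PATH, first match first.
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
copies = find_all('sort.exe', path_dirs)
```

More than one entry in the result means the name is overloaded.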

Here’s a Python program to find all the items that you’re overloading. This handles both PATH-style (where you only reference objects by a base name) and INCLUDE-style, where there are subpaths in each base path.

When run without any parameters, this defaults to searching the PATH variable. If you run it with a set of --path parameters (or with @cmdfile, where the file holds one parameter per line), then it will search that set of paths. This should also work on Unix machines as-is (Linux and Mac).

"""
Find overloaded files. Default to env['PATH'], or search in supplied
set of folders.

TBD - change it to do sub-paths from each root, e.g. this would be
useful for finding overloaded include files or libraries.
"""

from __future__ import print_function

import argparse
import os
import sys

# -------------------------------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description='Find overloaded files',
        fromfile_prefix_chars='@')
    parser.add_argument('--kind', default='PATH',
            help='the kind of search to do: PATH, INCLUDE, LIB, LIBPATH (default to PATH)')
    parser.add_argument('--path', action='append', help='path to search')
    parser.add_argument('--case-sensitive', action='store_true',
            help='do case-sensitive compares')

    args = parser.parse_args()

    # Fill in from os.environ if we didn't pass explicit paths in
    if args.path is None:
        args.path = []
        args.kind = args.kind.upper()
        if args.kind in os.environ:
            args.path = os.environ[args.kind].split(os.pathsep)

    run(args)

def run(args):

    # If this is not a PATH search, then we want sub-paths too
    subpaths = args.kind.upper() != 'PATH'

    # Find all files and paths to those files
    filemap = {}
    for base in args.path:
        if base == '':
            continue # this is a hack to fix empty paths

        if subpaths:
            print("searching in path %s" % base)
            for root, dirs, files in os.walk(base):
                for f in files:
                    epath = os.path.join(root, f)
                    # relpath copes with a trailing separator on base
                    suffix = os.path.relpath(epath, base)
                    if not args.case_sensitive:
                        suffix = suffix.lower()
                    if suffix not in filemap:
                        filemap[suffix] = []
                    filemap[suffix].append(epath)
        else:
            print("Looking in path %s" % base)
            if not os.path.isdir(base):
                continue # stale entries in PATH are common, skip them
            entries = os.listdir(base)
            for entry in entries:
                epath = os.path.join(base, entry)
                if os.path.isfile(epath):
                    if not args.case_sensitive:
                        entry = entry.lower()
                    if entry not in filemap:
                        filemap[entry] = []
                    filemap[entry].append(epath)

    # Now output duplicates
    for f in filemap:
        if len(filemap[f]) > 1:
            print("File %s found in multiple paths:" % f)
            for subpath in filemap[f]:
                print("  %s" % subpath)

# -------------------------------------------------------------------------------------------------

if __name__ == '__main__':
    main()

When I run this on my system, I find a number of overloaded files and some of these overloads are problematic; a different ordering in PATH would produce a different (and bad) runtime behavior.

>find-overloads.py
searching in C:\Apps\Araxis\Araxis Merge
searching in C:\Chocolatey\bin
searching in C:\Dev\Perl64\site\bin
searching in C:\Dev\Perl64\bin
searching in C:\Dev\Git\cmd
searching in C:\Dev\Git\bin
searching in C:\Dev\SlikSvn\bin
searching in C:\HashiCorp\Vagrant\bin
searching in C:\Program Files\Oracle\VirtualBox
searching in C:\Python27
searching in C:\Python27\Scripts
searching in C:\windows\system32
searching in C:\windows
searching in C:\windows\System32\Wbem
searching in C:\windows\System32\WindowsPowerShell\v1.0\
searching in C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common
searching in C:\Program Files\Windows Imaging\
searching in C:\Program Files\Microsoft\Web Platform Installer\
searching in C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\
searching in C:\Program Files\Microsoft SQL Server\110\Tools\Binn\
searching in C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\
searching in C:\Program Files\Microsoft SQL Server\100\Tools\Binn\
searching in C:\Program Files\Microsoft SQL Server\100\DTS\Binn\
searching in C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit\
File SQLSCM.DLL found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SQLSCM.DLL
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLSCM.DLL
File xmlrw.dll found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\xmlrw.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\xmlrw.dll
File wimserv.exe found in multiple paths:
  C:\windows\system32\wimserv.exe
  C:\Program Files\Windows Imaging\wimserv.exe
File wimgapi.dll found in multiple paths:
  C:\windows\system32\wimgapi.dll
  C:\Program Files\Windows Imaging\wimgapi.dll
File find.exe found in multiple paths:
  C:\Dev\Git\bin\find.exe
  C:\windows\system32\find.exe
File explorer.exe found in multiple paths:
  C:\windows\system32\explorer.exe
  C:\windows\explorer.exe
File sort.exe found in multiple paths:
  C:\Dev\Git\bin\sort.exe
  C:\windows\system32\sort.exe
File dbghelp.dll found in multiple paths:
  C:\Apps\Araxis\Araxis Merge\dbghelp.dll
  C:\windows\system32\dbghelp.dll
File hh.exe found in multiple paths:
  C:\windows\system32\hh.exe
  C:\windows\hh.exe
File msvcr100.dll found in multiple paths:
  C:\Program Files\Oracle\VirtualBox\msvcr100.dll
  C:\windows\system32\msvcr100.dll
File SqlManager.dll found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SqlManager.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SqlManager.dll
File SQLSVC.DLL found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SQLSVC.DLL
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLSVC.DLL
File write.exe found in multiple paths:
  C:\windows\system32\write.exe
  C:\windows\write.exe
File sqlresld.dll found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\sqlresld.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\sqlresld.dll
File git.exe found in multiple paths:
  C:\Dev\Git\cmd\git.exe
  C:\Dev\Git\bin\git.exe
File msvcp100.dll found in multiple paths:
  C:\Program Files\Oracle\VirtualBox\msvcp100.dll
  C:\windows\system32\msvcp100.dll
File SqlResourceLoader.dll found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SqlResourceLoader.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SqlResourceLoader.dll
File batchparser.dll found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\batchparser.dll
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\batchparser.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\batchparser.dll
  C:\Program Files\Microsoft SQL Server\100\DTS\Binn\batchparser.dll
File regedit.exe found in multiple paths:
  C:\windows\system32\regedit.exe
  C:\windows\regedit.exe
File SEMMAP.DLL found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SEMMAP.DLL
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SEMMAP.DLL
File perl.exe found in multiple paths:
  C:\Dev\Perl64\bin\perl.exe
  C:\Dev\Git\bin\perl.exe
File SQLCMD.EXE found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\SQLCMD.EXE
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLCMD.EXE
File bcp.exe found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\bcp.exe
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\bcp.exe
File notepad.exe found in multiple paths:
  C:\windows\system32\notepad.exe
  C:\windows\notepad.exe
File license.rtf found in multiple paths:
  C:\Apps\Araxis\Araxis Merge\license.rtf
  C:\windows\system32\license.rtf
  C:\Program Files\Microsoft\Web Platform Installer\license.rtf

For the most part, these collisions are ok, because of the search algorithm Windows uses to find DLLs

  • The directory containing the exe for the current process
  • The current directory
  • The Windows system directory – GetSystemDirectory()
  • The Windows directory – GetWindowsDirectory()
  • The directories in PATH (Windows does not use LIBPATH)

OK, it’s not quite that simple, see http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx for more details.
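To make the effect of that ordering concrete, here is a minimal sketch that models only the simplified list above; it is not what the Windows loader actually does. The names dll_search_dirs and first_match are invented for this example, and the directories are passed in as plain arguments rather than queried from the system.

```python
import os

def dll_search_dirs(exe_dir, current_dir, system_dir, windows_dir, path_dirs):
    """Build the simplified search order from the list above, dropping repeats."""
    ordered = [exe_dir, current_dir, system_dir, windows_dir] + list(path_dirs)
    seen = set()
    result = []
    for directory in ordered:
        if not directory:
            continue
        key = os.path.normcase(os.path.normpath(directory))
        if key not in seen:
            seen.add(key)
            result.append(directory)
    return result

def first_match(name, dirs):
    """Return the first existing file called name in dirs, or None."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this model, a DLL that exists both in the system directory and in a PATH entry always resolves to the system copy, no matter where that entry sits in PATH.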

For an executable, it’s simpler

  • The current directory
  • The directories in PATH

Some programs bundle copies of Windows DLLs either because the DLL might not be on all systems, or the right version of the DLL might not be on all systems. Since the first place searched is the directory containing the current process’s executable, that works well. Note in the output above that my VirtualBox install, while theoretically injecting its version of the DLL, isn’t doing that in reality because the Windows system directory is searched before PATH. It’s still bad form to have it earlier in the path, though. See http://www.flounder.com/whereis.htm for a Windows-specific whereis program that handles the DLL search paths properly. I’ll update my program to do that properly for Windows and Unix.

Also note that my Perl install is at a higher priority than the Git tools – I need this, otherwise the Git Perl (intended for Git scripts) would be found instead of my Perl install, and that would be bad (the Git perl will automatically be used by Git internals). On the other hand, I have two Git tools – find and sort – that are overloading the Windows ones, but I’m fine with that.

Using Visual Studio toolchains

This is a collection of information about how to use Visual Studio toolchains from command-lines or from other build systems. It’s probably also useful for people who want to know how things are configured – because when something is broken, you either fix it, or reset and start over.

I’m also only going to cover Visual C++, since that’s what I care about. And this is a little disjoint, but it is a blog post, after all – I’ll try to turn it into actual documentation at some point. Or rather, this is half a blog post, since I’m going to update it multiple times.

Visual Studio 2013

This is also known as Visual Studio 12.

Default install path: C:\Program Files (x86)\Microsoft Visual Studio 12.0\

Location to vcvarsall.bat: $(VSTUDIO)\VC\vcvarsall.bat. This is useful to read or run because it contains all the environment variables needed to run tools from the command line. I’m presuming that the Visual Studio IDE does something equivalent.
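One way to see exactly what the batch file changes is to run it in a child cmd.exe, dump the environment with set, and compare against the environment you started with. The following is a sketch under assumptions: the helper names are made up, capture_batch_env only makes sense on Windows, and it relies on the common "call the script, then set" idiom rather than on anything Visual Studio documents.

```python
import subprocess

def parse_set_output(text):
    """Turn the NAME=value lines printed by the cmd.exe set command into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition('=')
        if sep and name:
            env[name] = value
    return env

def env_changes(before, after):
    """Return {name: (old, new)} for variables that were added or changed."""
    changes = {}
    for name, new in after.items():
        old = before.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes

def capture_batch_env(batch_file, *batch_args):
    """Windows only: run batch_file, then dump the resulting environment."""
    command = 'call "%s" %s && set' % (batch_file, ' '.join(batch_args))
    output = subprocess.check_output(command, shell=True, universal_newlines=True)
    return parse_set_output(output)
```

On a Windows machine, env_changes(dict(os.environ), capture_batch_env(path_to_vcvarsall, 'x86')) would give the kind of before/after listing shown in the sections that follow (after importing os and filling in the path yourself).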

Environment variables

Pre-existing

These already existed in my environment, but were updated by vcvarsall.bat.

ChocolateyInstall=C:\Chocolatey
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
ComSpec=C:\windows\system32\cmd.exe
PROCESSOR_ARCHITECTURE=AMD64
ProgramData=C:\ProgramData
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
PSModulePath=C:\windows\system32\WindowsPowerShell\v1.0\Modules\
SystemDrive=C:
SystemRoot=C:\windows
TEMP=C:\Users\bfitz\AppData\Local\Temp
TMP=C:\Users\bfitz\AppData\Local\Temp
windir=C:\windows
windows_tracing_flags=3
windows_tracing_logfile=C:\BVTBin\Tests\installpackage\csilogfile.log

Common

These are common to the x86 and amd64 toolchains.

ExtensionSdkDir=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1\ExtensionSDKs
Framework40Version=v4.0
FrameworkVersion=v4.0.30319
INCLUDE=
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\INCLUDE;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\ATLMFC\INCLUDE;
  C:\Program Files (x86)\Windows Kits\8.1\include\shared;
  C:\Program Files (x86)\Windows Kits\8.1\include\um;
  C:\Program Files (x86)\Windows Kits\8.1\include\winrt;
LIBPATH=
  C:\Program Files (x86)\Windows Kits\8.1\References\CommonConfiguration\Neutral;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1\ExtensionSDKs\Microsoft.VCLibs\12.0\References\CommonConfiguration\neutral;
Path=
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\CommonExtensions\Microsoft\TestWindow;
VCINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\
VisualStudioVersion=12.0
VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio 12.0\
WindowsSdkDir=C:\Program Files (x86)\Windows Kits\8.1\
WindowsSDK_ExecutablePath_x64=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\
WindowsSDK_ExecutablePath_x86=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\

x86-specific

These are specific to x86 toolchains.

DevEnvDir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\
FrameworkDir=C:\windows\Microsoft.NET\Framework\
FrameworkDIR32=C:\windows\Microsoft.NET\Framework\
FrameworkVersion32=v4.0.30319
LIB=
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\LIB;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\ATLMFC\LIB;
  C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x86;
LIBPATH=
  C:\windows\Microsoft.NET\Framework\v4.0.30319;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\LIB;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\ATLMFC\LIB;
Path=
  C:\Program Files (x86)\MSBuild\12.0\bin;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\BIN;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\Tools;
  C:\windows\Microsoft.NET\Framework\v4.0.30319;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\VCPackages;
  C:\Program Files (x86)\HTML Help Workshop;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Team Tools\Performance Tools;
  C:\Program Files (x86)\Windows Kits\8.1\bin\x86;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\

amd64-specific

These are specific to amd64 toolchains.

CommandPromptType=Native
FrameworkDir=C:\windows\Microsoft.NET\Framework64
FrameworkDIR64=C:\windows\Microsoft.NET\Framework64
FrameworkVersion64=v4.0.30319
LIB=
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\LIB\amd64;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\ATLMFC\LIB\amd64;
  C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64;
LIBPATH=
  C:\windows\Microsoft.NET\Framework64\v4.0.30319;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\LIB\amd64;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\ATLMFC\LIB\amd64;
Path=
  C:\Program Files (x86)\MSBuild\12.0\bin\amd64;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\BIN\amd64;
  C:\windows\Microsoft.NET\Framework64\v4.0.30319;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\VCPackages;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\Tools;
  C:\Program Files (x86)\HTML Help Workshop;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Team Tools\Performance Tools\x64;
  C:\Program Files (x86)\Microsoft Visual Studio 12.0\Team Tools\Performance Tools;
  C:\Program Files (x86)\Windows Kits\8.1\bin\x64;
  C:\Program Files (x86)\Windows Kits\8.1\bin\x86;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\
Platform=X64

Note that many of these are disjoint, so could probably be set in a unified environment.

Some pre-existing environment variables were touched. I think it decided to make sure they were correct, because in one case it’s clear that the previous environment variable was incorrect. Before I ran vcvarsall.bat, I had these

CommonProgramFiles=C:\Program Files (x86)\Common Files
PROCESSOR_ARCHITECTURE=AMD64

which is clearly wrong. I also had this:

PROCESSOR_ARCHITECTURE=x86
PROCESSOR_ARCHITEW6432=AMD64

which turned to this for x86

PROCESSOR_ARCHITECTURE=AMD64

and this for amd64

PROCESSOR_ARCHITECTURE=AMD64

I’m guessing this is supposed to be recording the architecture of the host machine, not the toolchain target. This

http://blog.differentpla.net/post/38

indicates that I’m doing something wrong: I’m using 64-bit tools from a 32-bit cmd.exe. So now this makes slightly more sense. Except Task Manager says otherwise; it shows no 32-bit cmd.exe processes (32-bit processes get a *32 annotation). So was my machine set up incorrectly? Something to look into down the road.
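A quick way to check which case you are in is to look at the two variables together: a 32-bit process running on 64-bit Windows sees PROCESSOR_ARCHITECTURE=x86 plus PROCESSOR_ARCHITEW6432 set to the real machine architecture, while a native process has only PROCESSOR_ARCHITECTURE. A small sketch (the function name is invented for this example):

```python
import os

def describe_process_bitness(env=None):
    """Classify the current process from the two Windows architecture variables."""
    env = os.environ if env is None else env
    arch = env.get('PROCESSOR_ARCHITECTURE', '')
    wow64 = env.get('PROCESSOR_ARCHITEW6432', '')
    if wow64:
        # Only set for a 32-bit process running on a 64-bit system.
        return '32-bit process on %s Windows (WOW64)' % wow64
    if arch:
        return 'native %s process' % arch
    return 'unknown (not Windows, or variables unset)'
```

Running this from the same prompt that launches the build tools shows whether that prompt is really a 32-bit one.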

LG N2B1 NAS with Blu-ray writer

I bought one of these about 4 years ago. It’s discontinued, but still works, and I wanted to use it recently, so I had to find software for it. Besides 500 GB of backups from 2010 on it, it has a Blu-ray writer and I’ll see if it still works (it’s been packed away in its box, so everything should be good – it powered up nicely and I can copy files to/from it).

Product support page

There’s new firmware and a new UI, but this requires the drive to be wiped. I probably won’t do that unless something fails to work.

Some better links, although this isn’t the exact model (but I bet it’s the same software)

I downloaded a Mac NAS Detector from Softzilla

While I wasn’t paying attention, Mac NAS boxes had a bit of a kerfuffle in 2011: Apple transitioned protocols (it decided the old one was insecure), and this means that out of the box, Mac OS X 10.7 and up can’t talk AFP to older NAS boxes. There is a way to re-enable the old DHCAST128 protocol.

On a related note, MuCommander might be useful. Note: written in Java.

 

Fun with Git tags

Git tags have several uses to me.

There’s the classic use of “here’s something we released in the past”. It doesn’t need a branch, because it’s no longer under development, but you may need to refer to it at some point. Presumably you have some regular patterns for naming tags, and perhaps you use annotated tags to contain release information.  It’s just good release practice to have branches be for active development only, because you can always create a branch from a tag if you need to start doing work on it again.

There’s another use of “I have some dead/obsolete development work, but I’d still like it to stick around in the permanent record”. I prefer this to spelunking in the reflog, because sooner or later you’ll garbage-collect, and if there are no live references to commits, those commits will go away. Obviously, you should not keep actual garbage, but a historical record can be a valuable thing. And when you tire of that history, you can delete it just by removing the tags. I switched to this instead of keeping branches around, and it makes my repos feel a bit cleaner.

Lightweight tags have the advantage of not being separate objects in the repository; they simply associate a name with a commit. Annotated tags are real tag objects, which lets you add extra information in the form of a tag message, and there are other benefits as well (you can sign tags, for example). I see both as valuable for both kinds of tags. Some projects only use annotated tags – for example, in looking through the Git source itself, it seems like all the tags are annotated tags. My preference is to just have annotated tags.

There’s one trouble spot when it comes to sharing tags, and that is that tags are in a single namespace, unlike branch refs. Since people rarely share tags, this usually isn’t an issue. But if you fetch tags from a remote repository, they go into the same .git/refs/tags location as your local tags. One interesting suggestion I saw was to have a pattern for naming tags based on remotes, so that you could keep your tags separate from pulled-in remote tags. It’s not automatic, though; you have to do it manually. There aren’t common workflows yet around sharing tags, as far as I know.

While tags are normally stored in .git/refs/tags, if you look in that directory, you might only see a few tag files. Refs (tags and branches) can be packed up into a single .git/packed-refs file for efficiency’s sake, and this works very well for tags, since tag refs normally never change. A ref will get unpacked if it needs to change. Packing can be done manually with git pack-refs, or a git gc will also do it when it runs automatically.
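The packed file is plain text, so it is easy to peek inside. Each ref is a line of the form "object-id refname", comment lines start with #, and an annotated tag may be followed by a line starting with ^ that gives the commit the tag object points at. A minimal sketch of a reader (parse_packed_refs is a name invented for this example, not a Git API):

```python
def parse_packed_refs(text):
    """Parse the contents of .git/packed-refs into {refname: {'sha', 'peeled'}}."""
    refs = {}
    last = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # header comment or blank line
        if line.startswith('^'):
            # Peeled value: the commit that the preceding annotated tag points at.
            if last is not None:
                refs[last]['peeled'] = line[1:]
            continue
        sha, _, name = line.partition(' ')
        refs[name] = {'sha': sha, 'peeled': None}
        last = name
    return refs
```

Feeding it the text of .git/packed-refs from a repository gives a dictionary keyed by names such as refs/tags/release-1.5.1.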

As of Git 1.9.0, git fetch --tags fetches both branches and tags. By itself, git fetch will only get tags referenced by commits that are brought down, but it won’t bring down new tags pointing to commits that you already have. One downside to git fetch --tags is that it will fetch and replace all tags. Normally this is fine, but it may be dangerous if you have multiple remotes attached to a single repository, especially if those remotes are disjoint. Just keep in mind that you may need to explicitly pull tags in some cases.

See a separate post I have yet to write about git log/git rev-list and proper use of --all, --branches, --tags and --remotes.

Examples

Create an annotated tag (assumes that the tag message is in the file <tagmessage>):

git tag -a release-1.5.1 -F <tagmessage>

Show the tag and/or related commit (for annotated tag, will show the annotated tag and then the commit; for lightweight tag, will show just the commit):

git show release-1.5.1

Show tags in <remote> repository, where <remote> is the name of a remote attached to your local repository:

git ls-remote --tags <remote>

Show the most recent annotated tag on the current branch:

git describe

Push a specific tag (and related objects) to a remote repository:

git push <remote> release-1.5.1

Push all tags not already in the remote repository:

git push <remote> --tags

Delete a tag in the local repository

git tag -d release-1.5.1

Delete a tag in a remote repository (note: this has the same perils as rebasing, others could be depending on this tag, but it’s not bad in and of itself):

git push <remote> :refs/tags/release-1.5.1

Reference

Git: git-tag

Git book: Git Basics – Tagging

Git Tag Mini Cheat Sheet Revisited

Git Tip of the Week: Tags

On the Perils of Importing Remote Tags in Git

Git Data File Formats

Git Internals – Maintenance and Data Recovery

StackOverflow: Git: distinguish between local and remote tags

Docker again

Docker is something like 14 months old, and it’s already got lots of adoption. And a conference.

I really want Docker for Windows. Where is it? The closest is boot2docker, which runs Docker in a VM.

Some random bits about Docker-like functionality for Windows.

Docker removed Vagrant support, but Vagrant is adding Docker support.

People are working on extending Docker functionality (in fact, this is where boot2docker came from):

Here’s a post about Vagrant, Docker and Ansible that’s relevant (still VMs)

 

High-polish use of subprocess.Popen

Python has a pretty decent facility to launch and operate a child process, subprocess.Popen. However, like many “scripting systems”, it’s easy to do something that mostly works but is rough around the edges and not all that robust, because sub-processes don’t all run in 100 milliseconds without errors.

First off, avoid the use of subprocess.call. It waits for the process to terminate before returning, which means that if your subprocess hangs, your Python program will hang.

Second, if you’re using Python 2.7 on POSIX, use subprocess32, which is a backport of subprocess from Python 3.

Third, stop using os.popen in favor of subprocess.Popen. It’s a little more complicated, but worth it.

Fourth, keep in mind that Popen.communicate() also blocks until the process terminates, so don’t use it either. Also, communicate() doesn’t seem to handle large amounts of output on some systems (reports of “no more than 65535 bytes of output due to Linux pipe implementation”).
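If a blocking call is acceptable as long as it is bounded, Python 3.3 and later (and the subprocess32 backport) let you pass a timeout to communicate(). The sketch below assumes Python 3; run_with_timeout is a name invented for this example. It still collects all output in memory, so it does not address the large-output concern.

```python
import subprocess

def run_with_timeout(args, timeout_seconds):
    """Run args, giving up and killing the child after timeout_seconds."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        output, _ = proc.communicate(timeout=timeout_seconds)
        timed_out = False
    except subprocess.TimeoutExpired:
        proc.kill()
        # A second communicate() reaps the child and collects what it wrote.
        output, _ = proc.communicate()
        timed_out = True
    return proc.returncode, output, timed_out
```

The caller gets the return code, the merged stdout/stderr bytes, and a flag saying whether the child had to be killed.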

Reading stdout

Now, on to actual details. Let’s call dir on Windows and number each line in the output

ldir.py
from __future__ import print_function

import subprocess
import sys

proc = subprocess.Popen(args=['dir'] + sys.argv[1:], stdin=subprocess.PIPE,
             stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
linenum = 1
while True:
  line = proc.stdout.readline()
  if len(line) == 0:
    break
  print("%d: %s" % (linenum, line), end='')
  linenum += 1

We are merging stderr and stdout together in this example (stderr=subprocess.STDOUT). If we run this on C:\Windows\System32 like so

ldir.py /s C:\Windows\System32

we’ll start seeing output like this

1:  Volume in drive C is OSDisk
2:  Volume Serial Number is 062F-8F58
3:
4:  Directory of c:\Windows\System32
5:
6: 04/23/2014  06:09 PM    <DIR>          .
7: 04/23/2014  06:09 PM    <DIR>          ..
8: 04/12/2011  12:38 AM    <DIR>          0409
9: 01/14/2014  11:21 AM    <DIR>          1033
10: 06/10/2009  02:16 PM             2,151 12520437.cpx
11: 06/10/2009  02:16 PM             2,233 12520850.cpx
12: 02/14/2013  09:34 PM           131,584 aaclient.dll
13: 11/20/2010  08:24 PM         3,727,872 accessibilitycpl.dll

And since this is under our control, we can pipe to more, we can control-C to stop it, and so on.

There are still complications, mostly around buffering. The default for Popen is to not buffer data, but that only affects the reader – the source process can still buffer. You can trick programs into thinking they are writing into a console, which usually means that output will be unbuffered. You can use the low-level pty module directly (on Unix) or something higher-level like pexpect

  • Unix: http://pexpect.sourceforge.net/pexpect.html
  • Windows: https://bitbucket.org/mherrmann_at/wexpect
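For the Unix case, the low-level route looks roughly like the sketch below: open a pseudo-terminal pair, hand the slave end to the child as stdout and stderr, and read from the master end. This assumes Python 3 on a Unix system; run_on_pty is a name invented for this example, and pexpect does a more thorough job of the same idea.

```python
import os
import pty
import subprocess

def run_on_pty(args):
    """Run args with stdout/stderr attached to a pseudo-terminal; return its output."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(args, stdin=subprocess.DEVNULL,
                                stdout=slave_fd, stderr=slave_fd)
    except Exception:
        os.close(master_fd)
        raise
    finally:
        os.close(slave_fd)  # the child holds its own copy of the slave end

    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports EIO once the child side is closed
        if not data:
            break  # other systems report a plain end-of-file
        chunks.append(data)
    os.close(master_fd)
    proc.wait()
    return b''.join(chunks)
```

A child that checks whether its stdout is a terminal will answer yes when run this way. Expect \r\n line endings in the captured output, since the terminal layer translates newlines.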

Of course, not all processes write lines. You can use a more generalized approach by reading bytes from the stdout pipe. The previous program modified to read 128 bytes at a time looks like this

while True:
  line = proc.stdout.read(128)
  if len(line) == 0:
    break
  print("<%d>: %s" % (linenum, line), end='')
  linenum += 1

and produces this output (with numbers changed to stand out more)

<1>:  Volume in drive C is OSDisk
 Volume Serial Number is 062F-8F58

 Directory of c:\Windows\System32

04/23/2014  06:09 PM    <DIR<2>: >          .
04/23/2014  06:09 PM    <DIR>          ..
04/12/2011  12:38 AM    <DIR>          0409
01/14/2014  11:21 AM    <DIR><3>:           1033
06/10/2009  02:16 PM             2,151 12520437.cpx
06/10/2009  02:16 PM             2,233 12520850.cpx
02/14/201<4>: 3  09:34 PM           131,584 aaclient.dll
11/20/2010  08:24 PM         3,727,872 accessibilitycpl.dll

And of course this would work for programs that are reading and writing octet streams, not just text.

Reading stdout and stderr

Sometimes you want to read from stderr and stdout independently, because you need to react to output on stderr. You can’t just call read or readline, because it could block waiting for input on a handle.

On Unix systems, you can call select on the stdout and stderr handles, because select works on file-like objects, including pipes. On Windows, select only works on sockets, so you need to use some threads and a queue to have a blocking read per handle. Since this works on Unix as well, we can do it for both.

import Queue
io_q = Queue.Queue(5) # somewhat arbitrary, readers block when queue is full
def read_from_stream(identifier, stream):
  for line in stream:
    io_q.put((identifier, line))
  if not stream.closed:
    stream.close()

import threading
threading.Thread(target=read_from_stream, name='stdout-stream', args=('STDOUT', proc.stdout)).start()
threading.Thread(target=read_from_stream, name='stderr-stream', args=('STDERR', proc.stderr)).start()

while True:
  try:
    item = io_q.get(False)
  except Queue.Empty:
    if proc.poll() is not None:
      break
  else:
    identifier, line = item
    print(identifier + ':',  line, end='')

This works well, but has a flaw – it is basically busy-waiting, burning CPU while waiting for input to come in. We’re doing this because we don’t want to block at the reader level – consider that in a more complex situation, we might want to do processing while waiting for input to come in. There’s also a race condition here, in that we could check the queue, it could be empty, then a reader could put something in the queue while we are checking proc.poll(), and then we could miss that item.

We could do something like this, which is not clean, but works

import Queue
io_q = Queue.Queue(5)
def read_from_stream(identifier, stream):
  if not stream:
    print('%s does not exist' % identifier)
    io_q.put(('EXIT', identifier))
    return
  for line in stream:
    io_q.put((identifier, line))
  if not stream.closed:
    stream.close()
  print('%s is done' % identifier)
  io_q.put(('EXIT', identifier))

import threading
active = 2
threading.Thread(target=read_from_stream, name='stdout-stream', args=('STDOUT', proc.stdout)).start()
threading.Thread(target=read_from_stream, name='stderr-stream', args=('STDERR', proc.stderr)).start()

while True:
  try:
    item = io_q.get(True, 1)
  except Queue.Empty:
    if proc.poll() is not None:
      break
  else:
    identifier, line = item
    if identifier == 'EXIT':
      active -= 1
      if active == 0:
        break
    else:
      print(identifier + ':',  line, end='')

proc.wait()
print(proc.returncode)

Now there’s no busy-waiting, and we exit instantly. This is also a lot of scaffolding to write for each time we use subprocess.Popen(). One answer would be to wrap this up into a helper class, or rather a set of helper classes.
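As a starting point for such a helper, here is a minimal sketch of a single class that owns the process, the reader threads and the queue, and exposes the tagged lines through iteration. The class name TaggedOutput is invented for this example. It uses a blocking get with end-of-stream sentinels instead of a timeout, and the queue is unbounded by default, so it trades the back-pressure of the bounded queue above for simplicity.

```python
import subprocess
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class TaggedOutput(object):
    """Run a command and iterate over ('STDOUT' or 'STDERR', line) pairs."""

    def __init__(self, args, maxsize=0, **popen_kwargs):
        self.proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                     stderr=subprocess.PIPE, **popen_kwargs)
        self._q = queue.Queue(maxsize)
        self._readers = 0
        for name, stream in (('STDOUT', self.proc.stdout),
                             ('STDERR', self.proc.stderr)):
            thread = threading.Thread(target=self._pump, args=(name, stream))
            thread.daemon = True
            thread.start()
            self._readers += 1

    def _pump(self, name, stream):
        for line in iter(stream.readline, b''):
            self._q.put((name, line))
        stream.close()
        self._q.put((name, None))  # sentinel: this stream is finished

    def __iter__(self):
        active = self._readers
        while active:
            name, line = self._q.get()  # blocks, so no busy-waiting
            if line is None:
                active -= 1
            else:
                yield name, line
        self.proc.wait()
```

Usage is then a plain loop, for example `for name, line in TaggedOutput(['dir'], shell=True): ...`, with the return code available afterwards as `.proc.returncode`.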

stdin and stdout and stderr

There are two cases here

  1. Feeding a pipe that takes input and returns output.
  2. Running an interactive process

For the former, you could just have a file or pseudo-file feed the Popen process instead of subprocess.PIPE. For the latter, you definitely need to trick your Popen process into thinking that it’s writing to a TTY, otherwise the buffering will kill you.
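The first case can be sketched by opening the input file and passing the file object as stdin; the child then reads straight from the file and the parent only has to drain stdout. The function name feed_file is invented for this example.

```python
import subprocess

def feed_file(args, input_path):
    """Run args with stdin taken from input_path; return (returncode, output lines)."""
    with open(input_path, 'rb') as source:
        proc = subprocess.Popen(args, stdin=source, stdout=subprocess.PIPE)
        # Read line by line rather than calling communicate().
        lines = [line for line in iter(proc.stdout.readline, b'')]
        proc.stdout.close()
        proc.wait()
    return proc.returncode, lines
```

Because the parent never writes to the child's stdin itself, there is no risk of the two sides deadlocking on full pipes in this arrangement.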

TBD

Reference

http://pymotw.com/2/subprocess/

http://sharats.me/the-ever-useful-and-neat-subprocess-module.html

http://pexpect.readthedocs.org/en/latest/FAQ.html#whynotpipe

 

SCons Environment in depth, part 3

I’m going to focus on the Microsoft toolchain, with the aim of being able to put a Microsoft toolchain into a package that can be loaded at build time. The plus side to this is that you don’t need toolchains installed on systems, but it requires a little finagling of SCons. And to do that, we need to understand what it’s doing. I covered individual Microsoft-specific tools in the previous part, but in isolation, and with less understanding than I have now. So, onwards.

Note – this is super-sketchy and should be filled in. I started keeping notes for myself as I was working on Visual-C++-in-a-package, and after the initial exploration, I started working. I need to circle back and update this.

How does SCons configure Microsoft Visual C++?

There is a debugging environment variable that you can set that will enable some SCons spew from Tool/MSCommon/vc.py. If you do that with a simple SConstruct

env = Environment(tools=[], platform='win32', MSVC_VERSION='11.0')

then you’ll get some output that will guide you. Since we’re trying to use specific Microsoft products, there are well-known registry keys pointing to each version. Visual Studio 2012 has a registry key pointing to the on-disk location for Visual C++:

Software\Wow6432Node\Microsoft\VisualStudio\11.0\Setup\VC\ProductDir = C:\dev\VC11\VC\

If you don’t specify a Visual C++ version, SCons will enumerate every possible version of Visual Studio going back to the dawn of time, and then pick the first one it finds – since the list it searches is ordered from newest to oldest, this will find the most recent Visual C++ that you have installed.
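The selection rule is simple enough to sketch. This is not the SCons code, just the same idea with made-up names: walk a newest-first list and stop at the first hit, unless a specific version was requested. The CANDIDATES list and the is_installed probe are assumptions for illustration.

```python
# Illustrative newest-to-oldest list; the real SCons list is much longer.
CANDIDATES = ['11.0', '10.0', '9.0', '8.0']


def pick_msvc_version(candidates, is_installed, requested=None):
    """Return the version to use, or None if nothing suitable is found.

    is_installed is a caller-supplied probe (registry lookup, disk check,
    whatever); it is a hypothetical hook, not an SCons API.
    """
    if requested is not None:
        # An explicit request skips the enumeration entirely.
        return requested if is_installed(requested) else None
    for version in candidates:
        # The list is ordered newest first, so the first hit is the newest.
        if is_installed(version):
            return version
    return None
```

With a probe that only knows about 10.0 and 9.0, the default pick is 10.0, and asking for 11.0 explicitly yields None rather than silently falling back.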

If you do this while specifying a specific Visual C++ version, you’ll see that it skips the registry scanning and goes straight to enumerating the hard disk. However, something later forgets this, and it scans anyway. This is because vc.msvc_exists() is defective – it uses the (cached) list of versions as proof that Visual C++ exists, but nothing set it up for the case where you bypass it. This is an easy fix. I’ll add it to the list of things I want to patch.

Another nit is that find_vc_pdir is not memoized – it’s called at least three times during setup. The only reason I care is that SCons on Linux (even in a VM) is about 0.5 sec faster at startup than on Windows – this might be Python overhead on the two systems, or it could be the Microsoft tools init. I’ll profile it at some point.
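Memoizing a lookup like this is cheap. Here is a sketch of the idea using functools.lru_cache from Python 3; expensive_lookup is a stand-in I made up for the real registry and disk probing, and find_vc_pdir_memo is not the SCons function, just a wrapper with a similar shape.

```python
import functools

LOOKUPS = []  # records every real (uncached) lookup, for illustration only


def expensive_lookup(msvc_version):
    """Hypothetical stand-in for the registry and disk probing."""
    LOOKUPS.append(msvc_version)
    return 'C:\\dev\\VC%s\\VC\\' % msvc_version.split('.')[0]


@functools.lru_cache(maxsize=None)
def find_vc_pdir_memo(msvc_version):
    """Return the product dir, doing the real lookup once per version."""
    return expensive_lookup(msvc_version)
```

Calling find_vc_pdir_memo('11.0') three times in a row leaves a single entry in LOOKUPS. Whether this accounts for any of the startup difference is something only a profile will show.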

Then it finds the magic BAT file that Microsoft supplies for command-line use, which sets up all the environment variables that the toolchains need to run. There is a boolean SCons construction variable, MSVC_USE_SCRIPT, that lets you disable the use of the Microsoft script – if this is set to False (it defaults to True), then SCons assumes you have done all the setup yourself.

And it scans for installed SDKs. This part is missing a preconfigure step to let you select a specific SDK. In general, SDKs are coupled with the Visual Studio install, but only very loosely.

Visual C++ vcvarsall.bat

This is a batch file that Microsoft has been supplying for a while, as a convenience for configuring an environment for building with Visual C++. It takes an optional architecture parameter that, if not supplied, defaults to ‘x86’. And if you’re curious, for amd64 this just runs a different batch file at \bin\amd64\vcvars64.bat, which makes registry queries and calls another batch file, Common7\Tools\VCVarsQueryRegistry.bat, which does most of the real work.

If you run it like this:

"C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\vcvarsall.bat" amd64

then it will set the following environment variables:

CommandPromptType=Native
ExtensionSdkDir=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs
Framework35Version=v3.5
FrameworkDir=C:\windows\Microsoft.NET\Framework64
FrameworkDIR64=C:\windows\Microsoft.NET\Framework64
FrameworkVersion=v4.0.30319
FrameworkVersion64=v4.0.30319
FSHARPINSTALLDIR=C:\Program Files (x86)\Microsoft SDKs\F#\3.0\Framework\v4.0\
INCLUDE=C:\dev\VC11\VC\INCLUDE;
  C:\dev\VC11\VC\ATLMFC\INCLUDE;
  C:\Program Files (x86)\Windows Kits\8.0\include\shared;
  C:\Program Files (x86)\Windows Kits\8.0\include\um;
  C:\Program Files (x86)\Windows Kits\8.0\include\winrt;
LIB=C:\dev\VC11\VC\LIB\amd64;
  C:\dev\VC11\VC\ATLMFC\LIB\amd64;
  C:\Program Files (x86)\Windows Kits\8.0\lib\win8\um\x64;
LIBPATH=C:\windows\Microsoft.NET\Framework64\v4.0.30319;
  C:\windows\Microsoft.NET\Framework64\v3.5;
  C:\dev\VC11\VC\LIB\amd64;
  C:\dev\VC11\VC\ATLMFC\LIB\amd64;
  C:\Program Files (x86)\Windows Kits\8.0\References\CommonConfiguration\Neutral;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs\Microsoft.VCLibs\11.0\References\CommonConfiguration\neutral;
Path=C:\dev\VC11\Common7\IDE\CommonExtensions\Microsoft\TestWindow;
  C:\dev\VC11\VC\BIN\amd64;
  C:\windows\Microsoft.NET\Framework64\v4.0.30319;
  C:\windows\Microsoft.NET\Framework64\v3.5;
  C:\dev\VC11\VC\VCPackages;
  C:\dev\VC11\Common7\IDE;
  C:\dev\VC11\Common7\Tools;
  C:\Program Files (x86)\HTML Help Workshop;
  C:\dev\VC11\Team Tools\Performance Tools\x64;
  C:\dev\VC11\Team Tools\Performance Tools;
  C:\Program Files (x86)\Windows Kits\8.0\bin\x64;
  C:\Program Files (x86)\Windows Kits\8.0\bin\x86;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\x64;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\x64;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\
Platform=X64
VCINSTALLDIR=C:\dev\VC11\VC\
VisualStudioVersion=11.0
VSINSTALLDIR=C:\dev\VC11\
WindowsSdkDir=C:\Program Files (x86)\Windows Kits\8.0\
WindowsSdkDir_35=C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\
WindowsSdkDir_old=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\

If environment variables already exist, it prepends to them.
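One way to see exactly what the batch file did is to capture the output of the set command before and after running it in the same cmd session, then diff the two. The following is a sketch of that diff, with function names I made up; it folds variable names to upper case because Windows treats Path and PATH as the same variable, and it assumes ';' as the list separator.

```python
def parse_set_output(text):
    """Parse NAME=value lines, as printed by the Windows 'set' command."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition('=')
        if sep and name:
            # Windows variable names are case-insensitive, so fold them.
            env[name.upper()] = value
    return env


def env_changes(before, after, sep=';'):
    """Describe each variable that is new or changed in 'after'.

    Values are ('prepended', [entries]) when the old value survives as the
    tail of the new one, otherwise ('set', new_value).
    """
    changes = {}
    for name, new in after.items():
        old = before.get(name)
        if old == new:
            continue
        if old and new.endswith(old):
            added = new[:-len(old)].rstrip(sep)
            changes[name] = ('prepended', added.split(sep))
        else:
            changes[name] = ('set', new)
    return changes
```

For a PATH that started as C:\windows\System32, this reports the Visual C++ directories as prepended entries, and a brand-new variable such as Platform shows up as a plain set.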

Now, this may not be entirely accurate, because I had a few environment variables already set for some reason (I’m assuming the Visual Studio installer did this):

VS100COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools\
VS110COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\Tools\

I removed these from an environment and ran vcvars64.bat for VC11, and got the VS110COMNTOOLS environment variable. I think this comes from the “Visual Studio Tools” folder which contains Spy++ and other top-level tools that you would run from the IDE, not as part of the build environment.

This may be a side-light to you, but I want to package Visual C++ into a downloadable tool that is used by the build system to allow builds on arbitrary machines. Yes, we’ll have to make sure we only do this where we’re appropriately licensed.

HKLM\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder

Path to the installed Windows SDK, put into environment variable WindowsSdkDir. The default is C:\Program Files (x86)\Windows Kits\8.0\

Alternate locations

  • HKCU\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder
  • HKLM\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder
  • HKCU\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder

HKLM\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0A\InstallationFolder

Path to an older Windows SDK (for Visual Studio 2012), put into environment variable WindowsSdkDir_old.

Alternate locations

  • HKCU\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0a\InstallationFolder
  • HKLM\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0a\InstallationFolder
  • HKCU\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0a\InstallationFolder
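The primary-then-alternates lookup can be sketched like this. It is my own approximation of the search order listed above, not a translation of the batch file: candidate_keys, read_registry and find_sdk_dir are made-up names, and the real registry read uses the standard winreg module, which only exists on Windows.

```python
try:
    import winreg  # Windows only (Python 3 name)
except ImportError:
    winreg = None

SDK_KEY = r'Microsoft\Microsoft SDKs\Windows'


def candidate_keys(sdk_version):
    """Yield (hive, subkey) pairs: primary location first, then alternates."""
    for prefix in ('SOFTWARE', r'SOFTWARE\Wow6432Node'):
        for hive in ('HKLM', 'HKCU'):
            yield hive, '%s\\%s\\%s' % (prefix, SDK_KEY, sdk_version)


def read_registry(hive, subkey, value_name):
    """Return a registry value, or None if it is missing or not on Windows."""
    if winreg is None:
        return None
    root = {'HKLM': winreg.HKEY_LOCAL_MACHINE,
            'HKCU': winreg.HKEY_CURRENT_USER}[hive]
    try:
        with winreg.OpenKey(root, subkey) as key:
            value, _kind = winreg.QueryValueEx(key, value_name)
            return value
    except OSError:
        return None


def find_sdk_dir(sdk_version, reader=read_registry):
    """Return the first InstallationFolder found, e.g. for 'v8.0' or 'v8.0A'."""
    for hive, subkey in candidate_keys(sdk_version):
        value = reader(hive, subkey, 'InstallationFolder')
        if value:
            return value
    return None
```

The reader parameter is there so the search order can be exercised without a registry, by passing in a function backed by a dictionary.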

Environment variables

Microsoft build tools need to have some environment variables set up.

PATH

PATH needs to contain the paths to the various tools that will be invoked. For example, it might look something like this. I edited it a tiny bit for clarity, where C:\dev\VC11 is the installation folder for Visual Studio 2012 (typically C:\Program Files (x86)\Microsoft Visual Studio 11.0), and C:\dev\SDKs is the installation folder for Microsoft SDKs (typically C:\Program Files (x86)\Microsoft SDKs).

'PATH':
  C:\dev\VC11\Common7\IDE\CommonExtensions\Microsoft\TestWindow
  C:\dev\VC11\VC\BIN\amd64
  C:\windows\Microsoft.NET\Framework64\v4.0.30319
  C:\windows\Microsoft.NET\Framework64\v3.5
  C:\dev\VC11\VC\VCPackages
  C:\dev\VC11\Common7\IDE
  C:\dev\VC11\Common7\Tools
  C:\dev\VC11\Team Tools\Performance Tools\x64
  C:\dev\VC11\Team Tools\Performance Tools
  C:\Program Files (x86)\Windows Kits\8.0\bin\x64
  C:\Program Files (x86)\Windows Kits\8.0\bin\x86
  C:\dev\SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\x64
  C:\dev\SDKs\Windows\v7.0A\bin\x64
  C:\dev\SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools
  C:\dev\SDKs\Windows\v7.0A\bin\
  C:\windows\System32

As mentioned above, the paths come from executing vcvarsall.bat.
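Since the goal is a relocatable package, the same substitution I did by hand (default install root swapped for C:\dev\VC11) can be sketched as a small function. The name relocate is made up, and this is only an approximation of what a packaged toolchain would need; it uses ntpath so that Windows-style paths are handled the same way on any host.

```python
import ntpath


def relocate(entries, old_root, new_root):
    """Rewrite path entries that live under old_root to live under new_root.

    Comparison is case-insensitive, as on Windows; entries outside old_root
    are returned normalized but otherwise untouched.
    """
    old = ntpath.normcase(ntpath.normpath(old_root))
    result = []
    for entry in entries:
        norm = ntpath.normpath(entry)
        folded = ntpath.normcase(norm)
        if folded == old:
            result.append(ntpath.normpath(new_root))
        elif folded.startswith(old + '\\'):
            # Keep the original spelling of the part below the root.
            result.append(ntpath.join(new_root, norm[len(old) + 1:]))
        else:
            result.append(norm)
    return result
```

Joining the result with ';' gives a PATH string for the relocated toolchain.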