Finding overloaded files

Many systems have the concept of a search path, which is really a set of paths, used to find something. For example, Unix and Windows have a PATH environment variable, which is a list of directories that are searched one after the other to find a program to run.

Whenever you have a set of paths, you also have the chance that you’re overloading names. Sometimes this is good, because you can “patch” in the right file by arranging your search path properly. Sometimes this is bad, because you can hide files and not know it. You can check a single name with the where command on Windows or which -a on Unix; both list every match along the search path.
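
To make the single-name lookup concrete, here is a minimal Python sketch of the same idea. The function name find_all is hypothetical, and it expects the full file name (it does not try extensions the way where does on Windows).

```python
import os

def find_all(name, search_path=None):
    """Return every file called `name` on the search path, in search order."""
    if search_path is None:
        search_path = os.environ.get('PATH', '')
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits
```

The first element of the returned list is the file that wins; anything after it is hidden by the search order.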

Here’s a Python program to find all the items that you’re overloading. It handles both PATH-style searches (where you only reference objects by a base name) and INCLUDE-style searches (where there are subpaths in each base path).

When run without any parameters, this defaults to searching the PATH variable. If you run it with a set of --path parameters (or with an argument file passed as @filename, one parameter per line), then it will search that set of paths. This should also work on Unix machines as-is (Linux and Mac).

"""
Find overloaded files. Default to env['PATH'], or search in supplied
set of folders.

Any kind other than PATH also compares sub-paths under each root, e.g. this
is useful for finding overloaded include files or libraries.
"""

from __future__ import print_function

import argparse
import os
import sys

# -------------------------------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description='Find overloaded files',
        fromfile_prefix_chars='@')
    parser.add_argument('--kind', default='PATH',
            help='the kind of search to do: PATH, INCLUDE, LIB, LIBPATH (default to PATH)')
    parser.add_argument('--path', action='append', help='path to search')
    parser.add_argument('--case-sensitive', action='store_true',
            help='do case-sensitive compares')

    args = parser.parse_args()

    # Fill in from os.environ if we didn't pass explicit paths in
    if args.path is None:
        args.path = []
        args.kind = args.kind.upper()
        if args.kind in os.environ:
            args.path = os.environ[args.kind].split(os.pathsep)

    run(args)

def run(args):

    # If this is not a PATH search, then we want sub-paths too
    subpaths = args.kind.upper() != 'PATH'

    # Find all files and paths to those files
    filemap = {}
    for base in args.path:
        if not os.path.isdir(base):
            continue # skip empty or nonexistent entries

        if subpaths:
            print("searching in %s" % base)
            for root, dirs, files in os.walk(base):
                for f in files:
                    epath = os.path.join(root, f)
                    suffix = os.path.relpath(epath, base)  # correct even if base ends with a separator
                    if not args.case_sensitive:
                        suffix = suffix.lower()
                    if suffix not in filemap:
                        filemap[suffix] = []
                    filemap[suffix].append(epath)
        else:
            print("searching in %s" % base)
            entries = os.listdir(base)
            for entry in entries:
                epath = os.path.join(base, entry)
                if os.path.isfile(epath):
                    if not args.case_sensitive:
                        entry = entry.lower()
                    if entry not in filemap:
                        filemap[entry] = []
                    filemap[entry].append(epath)

    # Now output duplicates
    for f in filemap:
        if len(filemap[f]) > 1:
            print("File %s found in multiple paths:" % f)
            for subpath in filemap[f]:
                print("  %s" % subpath)

# -------------------------------------------------------------------------------------------------

if __name__ == '__main__':
    main()
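
The argument-file form relies on argparse’s fromfile_prefix_chars option. As a rough sketch of how that behaves: each line of the file becomes exactly one argument, so an option and its value need to be joined with = (or put on separate lines). The helper parse_with_file and the example paths here are made up for illustration.

```python
import argparse
import os
import tempfile

def parse_with_file(lines):
    """Write `lines` to a temporary argument file and parse it via @file."""
    parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
    parser.add_argument('--path', action='append', help='path to search')
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
        f.write('\n'.join(lines) + '\n')
    try:
        # argparse reads the file and treats every line as one argument
        return parser.parse_args(['@' + f.name])
    finally:
        os.remove(f.name)

args = parse_with_file(['--path=/usr/local/include', '--path=/usr/include'])
print(args.path)
```

Because --path uses action='append', the paths come back as a list in the same order as the lines in the file.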

When I run this on my system, I find a number of overloaded files and some of these overloads are problematic; a different ordering in PATH would produce a different (and bad) runtime behavior.

>find-overloads.py
searching in C:\Apps\Araxis\Araxis Merge
searching in C:\Chocolatey\bin
searching in C:\Dev\Perl64\site\bin
searching in C:\Dev\Perl64\bin
searching in C:\Dev\Git\cmd
searching in C:\Dev\Git\bin
searching in C:\Dev\SlikSvn\bin
searching in C:\HashiCorp\Vagrant\bin
searching in C:\Program Files\Oracle\VirtualBox
searching in C:\Python27
searching in C:\Python27\Scripts
searching in C:\windows\system32
searching in C:\windows
searching in C:\windows\System32\Wbem
searching in C:\windows\System32\WindowsPowerShell\v1.0\
searching in C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common
searching in C:\Program Files\Windows Imaging\
searching in C:\Program Files\Microsoft\Web Platform Installer\
searching in C:\Program Files (x86)\Microsoft ASP.NET\ASP.NET Web Pages\v1.0\
searching in C:\Program Files\Microsoft SQL Server\110\Tools\Binn\
searching in C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\
searching in C:\Program Files\Microsoft SQL Server\100\Tools\Binn\
searching in C:\Program Files\Microsoft SQL Server\100\DTS\Binn\
searching in C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit\
File SQLSCM.DLL found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SQLSCM.DLL
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLSCM.DLL
File xmlrw.dll found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\xmlrw.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\xmlrw.dll
File wimserv.exe found in multiple paths:
  C:\windows\system32\wimserv.exe
  C:\Program Files\Windows Imaging\wimserv.exe
File wimgapi.dll found in multiple paths:
  C:\windows\system32\wimgapi.dll
  C:\Program Files\Windows Imaging\wimgapi.dll
File find.exe found in multiple paths:
  C:\Dev\Git\bin\find.exe
  C:\windows\system32\find.exe
File explorer.exe found in multiple paths:
  C:\windows\system32\explorer.exe
  C:\windows\explorer.exe
File sort.exe found in multiple paths:
  C:\Dev\Git\bin\sort.exe
  C:\windows\system32\sort.exe
File dbghelp.dll found in multiple paths:
  C:\Apps\Araxis\Araxis Merge\dbghelp.dll
  C:\windows\system32\dbghelp.dll
File hh.exe found in multiple paths:
  C:\windows\system32\hh.exe
  C:\windows\hh.exe
File msvcr100.dll found in multiple paths:
  C:\Program Files\Oracle\VirtualBox\msvcr100.dll
  C:\windows\system32\msvcr100.dll
File SqlManager.dll found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SqlManager.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SqlManager.dll
File SQLSVC.DLL found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SQLSVC.DLL
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLSVC.DLL
File write.exe found in multiple paths:
  C:\windows\system32\write.exe
  C:\windows\write.exe
File sqlresld.dll found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\sqlresld.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\sqlresld.dll
File git.exe found in multiple paths:
  C:\Dev\Git\cmd\git.exe
  C:\Dev\Git\bin\git.exe
File msvcp100.dll found in multiple paths:
  C:\Program Files\Oracle\VirtualBox\msvcp100.dll
  C:\windows\system32\msvcp100.dll
File SqlResourceLoader.dll found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SqlResourceLoader.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SqlResourceLoader.dll
File batchparser.dll found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\batchparser.dll
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\batchparser.dll
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\batchparser.dll
  C:\Program Files\Microsoft SQL Server\100\DTS\Binn\batchparser.dll
File regedit.exe found in multiple paths:
  C:\windows\system32\regedit.exe
  C:\windows\regedit.exe
File SEMMAP.DLL found in multiple paths:
  C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\SEMMAP.DLL
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SEMMAP.DLL
File perl.exe found in multiple paths:
  C:\Dev\Perl64\bin\perl.exe
  C:\Dev\Git\bin\perl.exe
File SQLCMD.EXE found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\SQLCMD.EXE
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLCMD.EXE
File bcp.exe found in multiple paths:
  C:\Program Files\Microsoft SQL Server\110\Tools\Binn\bcp.exe
  C:\Program Files\Microsoft SQL Server\100\Tools\Binn\bcp.exe
File notepad.exe found in multiple paths:
  C:\windows\system32\notepad.exe
  C:\windows\notepad.exe
File license.rtf found in multiple paths:
  C:\Apps\Araxis\Araxis Merge\license.rtf
  C:\windows\system32\license.rtf
  C:\Program Files\Microsoft\Web Platform Installer\license.rtf

For the most part, these collisions are OK because of the order in which Windows searches for DLLs:

  • The directory containing the exe for the current process
  • The current directory
  • The Windows system directory – GetSystemDirectory()
  • The Windows directory – GetWindowsDirectory()
  • The directories in PATH (Windows does not use LIBPATH)

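OK, it’s not quite that simple: with safe DLL search mode enabled (the default on current Windows versions), the current directory moves down to just before the PATH directories. See http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx for more details.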
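
The simplified order listed above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the caller supplies the directories rather than the code querying GetSystemDirectory() or GetWindowsDirectory().

```python
import os

def dll_search_order(exe_dir, current_dir, system_dir, windows_dir, path_dirs):
    """Return the directories in the simplified search order, without repeats."""
    ordered = []
    seen = set()
    for d in [exe_dir, current_dir, system_dir, windows_dir] + list(path_dirs):
        if not d:
            continue  # skip empty PATH entries
        key = os.path.normcase(os.path.normpath(d))
        if key not in seen:
            seen.add(key)
            ordered.append(d)
    return ordered

def resolve_dll(name, search_dirs):
    """Return the first existing file called `name`, or None if there is none."""
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this ordering, a DLL in a PATH directory only wins if none of the four earlier locations has a file with the same name.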

For an executable, it’s simpler:

  • The current directory
  • The directories in PATH
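
A minimal sketch of that executable lookup follows. The extensions parameter is an assumption added here to stand in for PATHEXT, which the Windows command prompt consults when you type a name without an extension; the function name resolve_command is hypothetical.

```python
import os

def resolve_command(name, current_dir, path_dirs, extensions=('',)):
    """Return the first match for `name`, trying current_dir and then path_dirs.

    Each entry in `extensions` is appended to `name` in turn; the default
    of a single empty string means the name is used exactly as given.
    """
    for directory in [current_dir] + list(path_dirs):
        if not directory:
            continue  # skip empty entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None
```

For example, resolve_command('find', os.getcwd(), os.environ['PATH'].split(os.pathsep), ('.com', '.exe', '.bat')) would report which find runs from the current directory.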

Some programs bundle copies of Windows DLLs, either because the DLL might not be on all systems or because the right version might not be. Since the first place searched is the directory containing the current process’s executable, that works well. Note in the output above that my VirtualBox install, while theoretically injecting its versions of msvcr100.dll and msvcp100.dll, isn’t doing that in reality because the Windows system directory is searched before PATH. It’s still bad form to have it earlier in the path, though. See http://www.flounder.com/whereis.htm for a Windows-specific whereis program that handles the DLL search paths properly. I’ll update my program to do that properly for Windows and Unix.

Also note that my Perl install is at a higher priority than the Git tools – I need this, otherwise the Git Perl (intended for Git scripts) would be found instead of my Perl install, and that would be bad (the Git Perl will still be used automatically by Git internals). On the other hand, I have two Git tools – find and sort – that are overloading the Windows ones, but I’m fine with that.
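
If an ordering like this matters, it can be checked rather than remembered. Here is a small sketch, assuming a hypothetical helper named comes_before that compares the positions of two directories on the search path.

```python
import os

def _norm(directory):
    """Normalize case and separators so equivalent spellings compare equal."""
    return os.path.normcase(os.path.normpath(directory))

def comes_before(first, second, search_path=None):
    """Return True if `first` is searched before `second`.

    Returns False if either directory is not on the search path at all.
    """
    if search_path is None:
        search_path = os.environ.get('PATH', '')
    entries = [_norm(d) for d in search_path.split(os.pathsep) if d]
    try:
        return entries.index(_norm(first)) < entries.index(_norm(second))
    except ValueError:
        return False
```

On my setup the check would be comes_before(r'C:\Dev\Perl64\bin', r'C:\Dev\Git\bin'), which should stay true for as long as I want my own Perl to win.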
