Fun with Mac/Unix binaries

$PATH and its directories

As of Mac OS X 10.7, there is a magic file named /private/etc/paths that contains the initial list of directories for the $PATH variable. It looks like this on a clean install:

/usr/bin
/bin
/usr/sbin
/sbin
/usr/local/bin

Each newline is turned into a ‘:’ character so that $PATH looks like /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin.

There is also a directory named /private/etc/paths.d/, which contains an arbitrary number of files that also contain entries for the $PATH variable. The files are read in alphabetical order and their contents appended to the $PATH variable. On my system, I have a 50-X11 file and a git file, because I installed X11 (probably when I installed Mac OS X 10.7) and then I installed a new version of git from https://code.google.com/p/git-osx-installer/. As a result, my $PATH looks like this: /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11/bin:/usr/local/git/bin.
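The assembly described above can be imitated in a few lines of Python. This is only a sketch: the function name build_path is made up for illustration, skipping duplicate entries is an assumption, and on a real system the work is done by the OS itself (as far as I know by /usr/libexec/path_helper), not by anything like this code.

```python
from pathlib import Path


def build_path(paths_file, paths_dir):
    """Join the entries of paths_file, then of each file in paths_dir
    (taken in name order), into a ':'-separated string."""
    entries = []

    def add_from(file):
        for line in Path(file).read_text().splitlines():
            line = line.strip()
            # Assumption of this sketch: blank lines and duplicates are skipped.
            if line and line not in entries:
                entries.append(line)

    if Path(paths_file).is_file():
        add_from(paths_file)

    directory = Path(paths_dir)
    if directory.is_dir():
        for file in sorted(directory.iterdir(), key=lambda p: p.name):
            if file.is_file():
                add_from(file)

    return ":".join(entries)
```

Calling build_path("/private/etc/paths", "/private/etc/paths.d") on a Mac should give the initial $PATH, before any shell startup files modify it.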

Some people would suggest that /usr/local/bin come first, but that really doesn’t fix problems, at least not for complex programs like git, because, as you can see, the git that I installed to /usr/local was actually installed as a directory named /usr/local/git, and that folder has a bin folder that needs to be in the path. And I like that there is a paths.d directory so that new programs can be installed and removed easily.

What this means is that you can’t install a new version of a program without removing the old version. Xcode 4.0 installed git, which is a good thing. But it installed git to the main set of directories: /usr/bin, /usr/libexec and so on. So, if I want my new git to take precedence, I have to either install it on top of the existing one (a bit messy), mess with a bunch of paths to get it to be seen first, or… remove the old one. See below for that, but first a note on some other Unix filesystem bits.

MAN pages

In olden days of yore, all the man pages were installed to /usr/share/man. However, that was then and this is now – man pages have a system like $PATH where new programs can keep their man pages in their own hierarchy, but stitch them together so that the man viewer can find them.

First off, there is a config file for man, located by default at /private/etc/man.conf. This contains the default list of directories for man to search. As with the other parts, you can edit this directly, but you then run two risks: having your changes wiped out when someone else changes this file, and not being able to easily uninstall specific man files.

Second, there is a file just for man page paths, at /private/etc/manpaths, and there is a directory containing files that contain man file paths at /private/etc/manpaths.d; this is the same mechanism as used to set $PATH, just with different config files. This means that man.conf should never need to be edited. My /private/etc/manpaths looks like this:

/usr/share/man
/usr/local/share/man

We have the same issue with man pages that we do with $PATH – if we install man pages for a newer version of a program that’s already installed, we won’t see our new man pages if the already-installed program is higher in the man paths hierarchy. And the solutions are the same as above – install on top, fiddle with the basic manpaths.d file, or remove the older program.

libexec

Note that this is BSD-centric, and that includes Mac; many Linux distributions don’t do this, and it’s fallen out of the latest FHS. On Linux, git-core is in /usr/lib/libexec/, and not /usr/libexec/git-core/.

There is a hierarchy of programs that actually run from a libexec path, and here I don’t know how this is extended. In git, for example, all the old “git-something” programs are in libexec, and most of them are just symlinked or hardlinked to the main git executable (e.g. when Apple built and installed git, it used hardlinks, whereas the googlecode Mac installer uses symlinks; same thing really).
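To see which kind of link a given install used, you can compare inodes. The following is a minimal sketch; link_kind is a hypothetical helper name, and the paths you pass in depend on where git was installed.

```python
import os


def link_kind(alias, target):
    """Return 'symlink', 'hardlink', or 'unrelated' for alias relative to target."""
    if os.path.islink(alias):
        # A symlink counts only if it ultimately resolves to the same file.
        same = os.path.realpath(alias) == os.path.realpath(target)
        return "symlink" if same else "unrelated"
    a, t = os.lstat(alias), os.lstat(target)
    # Hard links are directory entries sharing one inode on one device.
    # (Passing the same path twice also reports 'hardlink'.)
    if (a.st_dev, a.st_ino) == (t.st_dev, t.st_ino):
        return "hardlink"
    return "unrelated"
```

For example, link_kind("/usr/libexec/git-core/git-add", "/usr/bin/git") would report how that alias relates to the main binary, assuming both files exist at those locations.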

The original intention of libexec was “a directory that contains daemons and utilities that can’t be used directly by the user”. These are not in the $PATH, they are magically located by other programs that just know where they are. I’m assuming that these other programs have the paths hardcoded in source, or are working from paths relative to their location. And knowing how Unix programs typically work, the paths are probably determined at build time and built into binaries.

Updating git in 10.7

With all that said, there are really only two good ways to update git in 10.7 once you’ve installed Xcode 4.

  • build from source and install into /usr/bin
  • run the googlecode installer and delete the existing version in /usr/bin

I decided to initially do the latter. What I actually did was write a quick script to move the git in the system folders out of the way: a script because there are too many files scattered in too many folders to want to do it by hand, and because I could then theoretically put this version of git back in place if necessary (maybe an Xcode upgrade would be confused if it saw files missing?).

Writing this was interesting if you want to preserve hard links (Apple’s install of git uses hardlinks of libexec git aliases to the git binary, instead of symlinks), and if you want to transpose absolute symlinks so they still point to the same relative object.

For symlinks, there are four cases:

  • The symlink is an absolute path to something outside of the set of files you are moving. In that case, you want to leave the symlink alone.
  • The symlink is a relative path to something in the set of files you are moving. As long as you are moving the whole set somewhere else (e.g. preserving the local hierarchy, just moving the root), you want to leave the symlink alone.
  • The symlink is an absolute path to something inside the set of files you are moving. In that case, you need to adjust the symlink so that it points to the new destination of the parent.
  • The symlink is a relative path to something outside the set of files you are moving. This is probably an error in that you should have moved the parent too, but if not, you need to either turn this into an absolute path, or adjust the relative path so it is still valid.
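The four cases above can be sketched as one function that decides what a symlink's target text should be after the move. This is not the script I actually used; retarget_symlink and its parameters are names made up for this illustration, it assumes link_path, old_root and new_root are absolute and normalized, and it resolves ".." lexically, so it can be fooled if an intermediate directory is itself a symlink.

```python
import os


def retarget_symlink(link_path, target, old_root, new_root):
    """Return the target a symlink should have once the tree at old_root
    has been moved to new_root. link_path is the link's ORIGINAL location."""

    def inside(path):
        return os.path.commonpath([path, old_root]) == old_root

    if os.path.isabs(target):
        resolved = os.path.normpath(target)
        if not inside(resolved):
            return target  # case 1: absolute, outside the set: leave alone
        # case 3: absolute, inside the set: re-root onto the new location
        rel = os.path.relpath(resolved, old_root)
        return os.path.normpath(os.path.join(new_root, rel))

    resolved = os.path.normpath(os.path.join(os.path.dirname(link_path), target))
    if inside(resolved):
        return target  # case 2: relative, inside the set: leave alone
    # case 4: relative, outside the set: pin it to the unmoved object
    return resolved
```

For case 4 this sketch picks the "turn it into an absolute path" option, since that stays valid no matter where the set ends up.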

For hardlinks, it’s easy if the dest location is on the same filesystem as the source location; a mv will just move the dirent and leave it pointing to the same inode. If you’re moving across filesystems, it’s a lot more interesting; you need to pick one file as the original and copy it, and then hardlink all the other entries to that new inode.
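For the cross-filesystem case, the usual trick is to remember which inodes have already been copied. Here is a minimal sketch under that assumption; the function name is made up, it only copies (deleting the source afterwards is left to the caller), and it ignores ownership, extended attributes and special files.

```python
import os
import shutil


def copy_tree_preserving_hardlinks(src_root, dst_root):
    """Copy src_root to dst_root, recreating hard links and symlinks."""
    seen = {}  # (st_dev, st_ino) -> first destination path written for that inode
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)

        # os.walk does not descend into symlinked directories; recreate them.
        for name in dirnames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                os.symlink(os.readlink(src), os.path.join(dst_dir, name))

        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if os.path.islink(src):
                os.symlink(os.readlink(src), dst)  # target text copied as-is
            elif key in seen:
                os.link(seen[key], dst)  # later name of an already-copied inode
            else:
                shutil.copy2(src, dst)  # first name seen: copy the data
                if st.st_nlink > 1:
                    seen[key] = dst
```

Symlink targets are copied verbatim here; adjusting them for the new location is the separate symlink problem described earlier.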

I’ll have to create a separate writeup for this, because these are likely the same kinds of things that archive programs do, and it would be interesting to abstract this out to a new kind of file operation; while less common, moving a related group of files is still something people periodically do. Preserving as much metadata as possible is always a good thing.

 

NTFS Alternate Data Streams

NTFS Streams were introduced in Windows NT 3.1 to enable Services for Macintosh (SFM) to store Macintosh resource forks and Finder information. At a technical level, the implementation was cool, because it was a generic solution that could be expanded. Microsoft did very little with it over the years, until very recently. It’s been a source of annoyance and even used as a vector for viruses. You can write an executable to an alternate data stream of any file, and then even execute it, and the stock Windows file system tools don’t really acknowledge the existence of alternate data streams: explorer.exe and others just show the size of the default stream.
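A stream is addressed by appending a colon and the stream name to the file name. As a small sketch of that naming, assuming Windows and an NTFS volume (stream_path and write_stream are hypothetical helper names):

```python
def stream_path(path, stream):
    """Return the '<file>:<stream>' name that addresses an alternate data stream."""
    if not stream or any(ch in stream for ch in ':\\/'):
        raise ValueError("invalid stream name: %r" % stream)
    return "%s:%s" % (path, stream)


def write_stream(path, stream, data):
    """Write bytes to an alternate data stream of path.

    Only meaningful on Windows with NTFS. On other systems this simply
    creates a separate file whose name happens to contain a colon.
    """
    with open(stream_path(path, stream), "wb") as f:
        f.write(data)
```

After write_stream("notes.txt", "hidden", b"secret") on NTFS, the size reported for notes.txt is unchanged, which is exactly the blind spot described above.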

Sadly, Microsoft has abandoned SFM as of Windows Server 2008, but third parties such as ExtremeZ-IP still offer support, by using alternate data streams. Mac literature for this refers to it just as “named streams”.

Mac OS X v10.5 and up writes Mac metadata and resource forks to named streams now, instead of using AppleDouble ._<filename> files. You can enable or disable this as a default, or per mount point.

There is one bug I’ve seen so far – if you’re copying a file with no data fork, this confuses Windows, because it doesn’t try to create the default stream before attaching the alternate data stream. The one case where I’ve seen this happen is in aliases copied to netatalk servers. The netatalk server stores its data using .AppleDouble folders, because it expects to run on file systems that lack support for multiple streams.

 

Windows 7 tweaks

Enable administrative shares on Windows 7

Add this registry value. It is formatted as a .reg file, so you can save it as “localaccount.reg” and just run it from the shell.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System]
"LocalAccountTokenFilterPolicy"=dword:00000001

This enables remote access to the administrative shares of the form C$, D$, etc. for local accounts. It assumes that “File and Printer Sharing” is enabled (you’ll find this in Control Panel : Network and Internet : Network and Sharing Center : Advanced sharing settings).

 

Handling “path too long” on Windows

This is one of my posts of “study the mistakes of others, so you can learn from them and not repeat them directly or indirectly”.

This is a common thing to see, at least for me:

The directory name y:\backup-2012-09-01-2230\Applications\Adobe Acrobat 9 Pro\Adobe Acrobat Pro.app\Contents\Plug-ins\PaperCapture.acroplugin\Contents\Frameworks\OCRLibrary.framework\Versions\A\Frameworks\iDRS.framework\Versions\A\Resources\Asian.framework\Versions\A\.AppleDouble is too long.

(ignore the fact I’m cataloging Mac apps on a Windows share).

The 260 character limit is so deeply embedded in the Windows system that it’s been almost impossible for Microsoft to eradicate it; it persists even into theoretically new frameworks like .NET (mainly because .NET isn’t actually a completely new system, it’s in many cases a thin shim on top of the older Win32 libraries). While there are APIs that let you access paths up to 32767 characters long, there are enough parts of the system that use the older MAX_PATH limit that you really can’t do anything about it. So while you can use Windows NT-style paths in many cases (paths prefixed with \\?\) in your own code, some core APIs in Windows use the old functions (like LoadLibrary, for example).

Note that you can use \\?\ paths with cmd.exe, but these must be drive-letter paths; UNC paths don’t work (I don’t know why; maybe one needs to use the NT file namespace or a volume GUID path?). It’s also useless, because the resultant path is still subject to MAX_PATH limitations. You also can’t set the current working directory to a UNC path (unless you employ a registry hack, see Microsoft KB 156276).
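For your own code, converting an ordinary absolute path to the prefixed form is mechanical. Here is a minimal sketch; to_extended_path is a made-up name, and the \\?\UNC\server\share form used for network paths is the one Microsoft documents for the Win32 file APIs (whether a particular tool such as cmd.exe accepts it is a separate question).

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"             # the literal text \\?\
UNC_PREFIX = EXTENDED_PREFIX + "UNC\\"  # the literal text \\?\UNC\
DEVICE_PREFIX = "\\\\.\\"               # the literal text \\.\


def to_extended_path(path):
    """Return an absolute Windows path in its extended-length form."""
    if path.startswith((EXTENDED_PREFIX, DEVICE_PREFIX)):
        return path  # already prefixed, or a device path: leave untouched
    # Prefixed paths are not normalized by Windows, so resolve '.', '..'
    # and forward slashes before adding the prefix.
    norm = ntpath.normpath(path)
    if norm.startswith("\\\\"):
        return UNC_PREFIX + norm[2:]  # \\server\share\... becomes \\?\UNC\server\share\...
    drive, rest = ntpath.splitdrive(norm)
    if len(drive) == 2 and drive[1] == ":" and rest.startswith("\\"):
        return EXTENDED_PREFIX + norm  # y:\dir\... becomes \\?\y:\dir\...
    raise ValueError("path must be absolute: %r" % path)
```

Relative paths are rejected on purpose: the prefix only works with fully qualified paths, so the caller has to decide what the path is relative to.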

There are a few workarounds I’ve used in the past:

  • working directories
  • subst and/or pushd (auto-subst, don’t forget the matching popd)
  • hard links, junctions, and symbolic links (although these modify the filesystem)

All of these create new virtual hierarchies so that you can read a deeply nested file without needing a path longer than MAX_PATH; the first two can at best double the length, though.

There are some wacky things people have done in code.

Starting in Windows Vista, the shell shrinks individual elements in the path, substituting their short names, until the whole thing fits within MAX_PATH. This is why you can browse a really deep hierarchy in Explorer.exe, and usually copy it or delete it. This, of course, requires that NT short name creation is not disabled.

Of course, the real answer is “stop using Windows, and just use Unix (Linux or Mac OS X)”, but there’s still a lot of programs that run on Windows and not Unix…
