NTFS Alternate Data Streams

NTFS streams were introduced in Windows NT 3.1 to enable Services for Macintosh (SFM) to store Macintosh resource forks and Finder information. At a technical level, the implementation was cool, because it was a generic solution that could be expanded. Microsoft did very little with it over the years, until very recently. It’s been a source of annoyance and has even been used as a vector for viruses. You can write an executable to an alternate data stream of any file, and then even execute it, and the stock Windows file system tools don’t really acknowledge the existence of alternate data streams: explorer.exe and others just show the size of the default stream.

Sadly, Microsoft has abandoned SFM as of Windows Server 2008, but third parties such as ExtremeZ-IP still offer support, by using alternate data streams. Mac literature for this refers to it just as “named streams”.

Mac OS X v10.5 and up writes Mac metadata and resource forks to named streams now, instead of using AppleDouble ._<filename> files. You can enable or disable this as a default, or per mount point.

There is one bug I’ve seen so far: if you’re copying a file with no data fork, this confuses Windows, because it doesn’t try to create the default stream before attaching the alternate data stream. The one case where I’ve seen this happen is in aliases copied to netatalk servers. The netatalk server stores its data using .AppleDouble folders, because it expects to run on file systems that lack support for multiple streams.

Addendum

Resource forks

Streams

NTFS Alternate data streams

 

AppleSingle and AppleDouble file formats

AppleSingle and AppleDouble are methods to save the extra Mac metadata on systems that otherwise could not (SMB, NFS, NTFS, ext2, etc). AppleSingle and AppleDouble are the same file format; the only difference is that the data fork is stored as a separate file in AppleDouble and included in the AppleSingle file. As such, AppleSingle is more intended for archive or transmission, whereas AppleDouble allows for non-Mac applications to easily access the data fork, ignoring the metadata. Since most applications store their file’s data in the data fork, especially for anything considered to be cross-platform, this means AppleDouble is a fairly useful system.

There are two naming conventions for AppleDouble files. The data fork is generally stored with the name of the file itself. The metadata is stored either in a folder named .AppleDouble parallel with the file (and the metadata itself named the same as the file inside that folder) or in a file whose name is prefixed with “._” next to the file itself. In the .AppleDouble folder case, the .AppleDouble folders also generally have the .DS_Store file (contains directory metadata unique to a Mac) and a .Parent file, which is also an AppleSingle-format file containing metadata about the parent directory itself (name, dates, icon, finder info). Interestingly, the .DS_Store file is also an AppleSingle-format file, and I’m not sure why there are both. Time to look in the source?
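To make the two naming conventions concrete, here is a small Python sketch that, given the path of a data-fork file, returns where the metadata would live under each convention. The function name and the example path are made up for illustration; it only builds the paths and doesn’t check that anything exists.

```python
from pathlib import PurePosixPath

def appledouble_candidates(data_path):
    """Return the two conventional metadata locations for a data-fork file."""
    p = PurePosixPath(data_path)
    return {
        # "._" prefix convention: metadata sits next to the file
        "dot_underscore": str(p.parent / ("._" + p.name)),
        # .AppleDouble folder convention: same leaf name, inside the folder
        "appledouble_dir": str(p.parent / ".AppleDouble" / p.name),
    }

for kind, path in appledouble_candidates("/share/docs/report.txt").items():
    print(kind, path)
```

A cleanup or backup script could use something like this to find (or skip) the sidecar file that belongs to each data file.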

Here’s the appledouble.h header file from opensource.apple.com (pretty sure I can reproduce this in full since it’s under Apple’s open-source license):

/* Information pulled from:
 * "AppleSingle/AppleDouble Formats for Foreign Files Developer's Note"
 * (c) Apple Computer 1990
 * File assembled by Rob Braun (bbraun@synack.net)
 */

#ifndef __APPLEDOUBLE__
#define __APPLEDOUBLE__

#include <sys/types.h>
#include <stdint.h>

/* Structure of an AppleSingle file:
 *   ----------------------
 *   | AppleSingleHeader  |
 *   |--------------------|
 *   | ASH.entries # of   |
 *   | AppleSingleEntry   |
 *   | Descriptors        |
 *   |         1          |
 *   |         .          |
 *   |         .          |
 *   |         n          |
 *   |--------------------|
 *   |   Datablock 1      |
 *   |--------------------|
 *   |   Datablock 2      |
 *   |--------------------|
 *   |   Datablock n      |
 *   ----------------------
 */

struct AppleSingleHeader {
	uint32_t     magic;       /* Magic Number (0x00051600 for AS) */
	uint32_t     version;     /* Version #.  0x00020000 */
	char         filler[16];  /* All zeros */
	uint16_t     entries;     /* Number of entries in the file */
};

#define XAR_ASH_SIZE 26   /* sizeof(struct AppleSingleHeader) will be wrong
                           * due to padding. */

#define APPLESINGLE_MAGIC 0x00051600
#define APPLEDOUBLE_MAGIC 0x00051607

#define APPLESINGLE_VERSION 0x00020000
#define APPLEDOUBLE_VERSION 0x00020000

struct AppleSingleEntry {
	uint32_t     entry_id;    /* What the entry is.  See defines below */
	uint32_t     offset;      /* offset of data, offset beginning of file */
	uint32_t     length;      /* length of data.  can be 0 */
};

/* Valid entry_id values */
/* Entries 1, 3, and 8 are typically created for all files.
 * Macintosh Icon entries are rare, since those are typically in the resource 
 * fork.
 */
#define AS_ID_DATA       1  /* Data fork */
#define AS_ID_RESOURCE   2  /* Resource fork */
#define AS_ID_NAME       3  /* Name of the file */
#define AS_ID_COMMENT    4  /* Standard Macintosh comment */
#define AS_ID_BWICON     5  /* Standard Macintosh B&W icon */
#define AS_ID_COLORICON  6  /* Standard Macintosh Color icon */
/* There is no 7 */
#define AS_ID_DATES      8  /* File creation date, modification date, etc. */
#define AS_ID_FINDER     9  /* Finder Information */
#define AS_ID_MAC       10  /* Macintosh File information, attributes, etc. */
#define AS_ID_PRODOS    11  /* ProDOS file information */
#define AS_ID_MSDOS     12  /* MS-DOS file information */
#define AS_ID_SHORTNAME 13  /* AFP short name */
#define AS_ID_AFPINFO   14  /* AFP file information */
#define AS_ID_AFPDIR    15  /* AFP directory id */
/* 1-0x7FFFFFFF are reserved by Apple */

/* File Dates are stored as the # of seconds before or after
 * 12am Jan 1, 2000 GMT.  The default value is 0x80000000.
 */
struct MacTimes {
	uint32_t  creation;
	uint32_t  modification;
	uint32_t  backup;
	uint32_t  access;
};

/* Finder Information is two 16 byte quantities. 
 * Newly created files have all 0's in both entries.
 */

/* Macintosh File Info entry (10) a 32 bit bitmask. */

/* Entries can be placed in any order, although Apple recommends:
 * Place the data block (1) last.
 * Finder Info, File Dates Info, and Macintosh File Info first.
 * Allocate resource for entries in 4K blocks.
 */

/* AppleDouble files are simply AppleSingle files without the data fork.
 * The magic number is different as a read optimization. 
 */

#endif /* __APPLEDOUBLE__ */

The old tech note describing the AppleSingle and AppleDouble file formats is AppleSingle/AppleDouble Formats: Developer’s Note, but the URL has moved several times, so I’ll try to not rely on it. And this should all maybe go in Wikipedia?
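Two details in that header are easy to get wrong, so here is a minimal Python sketch of them: the header is 26 bytes on disk with no padding, and the file dates are seconds before or after Jan 1, 2000 GMT. I’m assuming the date fields should be read as signed (the comment says “before or after”, even though the struct declares uint32_t), and the sample blob is synthetic, built only for this illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

HEADER = struct.Struct(">II16xH")  # magic, version, 16 filler bytes, entry count
ENTRY = struct.Struct(">III")      # entry_id, offset, length
DATES = struct.Struct(">iiii")     # creation, modification, backup, access
EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)
UNSET = -0x80000000                # the default 0x80000000, read as signed
AS_ID_DATES = 8

def decode_date(seconds):
    """Convert an AppleSingle date to a datetime, or None if it is unset."""
    if seconds == UNSET:
        return None
    return EPOCH_2000 + timedelta(seconds=seconds)

def parse(blob):
    """Return (magic, version, entries, dates) for an AppleSingle/AppleDouble blob."""
    magic, version, count = HEADER.unpack_from(blob, 0)
    entries = [ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
               for i in range(count)]
    dates = None
    for entry_id, offset, length in entries:
        if entry_id == AS_ID_DATES and length >= DATES.size:
            dates = [decode_date(s) for s in DATES.unpack_from(blob, offset)]
    return magic, version, entries, dates

# Synthetic AppleDouble file holding a single dates entry.
sample = (HEADER.pack(0x00051607, 0x00020000, 1)
          + ENTRY.pack(AS_ID_DATES, HEADER.size + ENTRY.size, DATES.size)
          + DATES.pack(-86400, 0, UNSET, UNSET))

magic, version, entries, dates = parse(sample)
print(hex(magic), hex(version), entries)
print(dates)
```

Because the format string starts with ">", struct packs the fields big-endian with no alignment padding, which is why HEADER.size comes out as 26 rather than whatever sizeof() would give in C.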

I’m going to write a quick AppleSingle disassembler. Here it is

#!/usr/bin/perl
# =======================================================================================
# dump-applesingle.pl
#
# disassemble an AppleSingle file to text.
#
# Maybe I should call this dump-appledouble since that's the more likely file to
# encounter?
# =======================================================================================

use 5.014; # also does strict
use warnings;
use feature ':5.14'; # I think this is redundant?

use Getopt::Long qw();

Dump->new->run();

# =======================================================================================
# =======================================================================================

package Dump;
use parent -norequire, 'Object';

# ---------------------------------------------------------------------------------------

sub run
{
    my ($self) = @_;

    $self->{'option'} = Option->new()->read_options();
    $self->{'diag'} = Diag->new()->init($self->{'option'}->{'verbose'});

    $self->dump();
}

sub dump
{
    my ($self) = @_;

    foreach my $file (@{$self->{'option'}->{'files'}})
    {
        $self->{'file'} = $file;
        $self->{'diag'}->status("Parsing $file\n");
        local $/ = undef;
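        # slurp the whole file in binary mode ($/ is undef at this point)
        open(my $fh, '<:raw', $file) or die "Can't open $file: $!\n";
        $self->{'blob'} = <$fh>;
        close $fh;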
        $self->dump_blob();
    }
}

sub dump_blob
{
    my ($self) = @_;

    my $header = substr($self->{'blob'}, 0, 26);
    my @header = unpack("L> L> x[16] S>", $header);
    my $magic = $header[0] == 0x51600 ? "APPLESINGLE_MAGIC"
                : $header[0] == 0x51607 ? "APPLEDOUBLE_MAGIC"
                : "** ERROR **";
    my $version = $header[1] == 0x20000 ? "APPLESINGLE_VERSION" : "** ERROR **";

    my $blob_length = length($self->{'blob'});
    print "\n$self->{'file'}\nlength=$blob_length\n";
    print         "      AppleSingleHeader\n";
    print         "      {\n";
    print sprintf("%04X:     uint32_t  magic      = 0x%08X  $magic\n", 0, $header[0]);
    print sprintf("%04X:     uint32_t  version    = 0x%08X  $version\n", 4, $header[1]);
    print sprintf("%04X:     char      filler[16] = {0}\n", 8);
    print sprintf("%04X:     uint16_t  entries    = %d\n", 24, $header[2]);
    print         "      }\n";

    my $count = $header[2];
    my @descriptors = unpack("(L> L> L>)[$count]", substr($self->{'blob'}, 26, $count * 12));

    # some of these are from the netatalk source, their adouble.h file

    my @annotated_offsets;
    print "\n";
    foreach my $i (0 .. $count-1)
    {
        print "        entry[$i] =\n";
        print "        {\n";

        my $addr = 26 + 12 * $i;
        my $entry_id = $descriptors[3*$i + 0];
        my $entry_id_str = ($entry_id == 1) ? " (AS_ID_DATA)"
                            : ($entry_id == 2) ? " (AS_ID_RESOURCE)"
                            : ($entry_id == 3) ? " (AS_ID_NAME)"
                            : ($entry_id == 4) ? " (AS_ID_COMMENT)"
                            : ($entry_id == 5) ? " (AS_ID_BWICON)"
                            : ($entry_id == 6) ? " (AS_ID_COLORICON)"
                            : ($entry_id == 8) ? " (AS_ID_DATES)"
                            : ($entry_id == 9) ? " (AS_ID_FINDER)"
                            : ($entry_id == 10) ? " (AS_ID_MAC)"
                            : ($entry_id == 11) ? " (AS_ID_PRODOS)"
                            : ($entry_id == 12) ? " (AS_ID_MSDOS)"
                            : ($entry_id == 13) ? " (AS_ID_SHORTNAME)"
                            : ($entry_id == 14) ? " (AS_ID_AFPINFO)"
                            : ($entry_id == 15) ? " (AS_ID_AFPDIR)"
                            : ($entry_id == 0x80444556) ? " (AD_DEV - netatalk)"
                            : ($entry_id == 0x80494E4F) ? " (AD_INO - netatalk)"
                            : ($entry_id == 0x8053594E) ? " (AD_SYN - netatalk)"
                            : ($entry_id == 0x8053567E) ? " (AD_ID - netatalk)" : "";
        my $offset = $descriptors[3*$i + 1];
        my $length = $descriptors[3*$i + 2];

        my $entry_val_and_label = ($entry_id < 16)
                ? sprintf("%d$entry_id_str", $entry_id)
                : sprintf("0x%08X$entry_id_str", $entry_id);

        push @annotated_offsets, [$offset, $length, $entry_val_and_label];

        print sprintf("%04X:       uint32_t  entry_id = $entry_val_and_label\n", $addr + 0);
        print sprintf("%04X:       uint32_t  offset   = 0x%08X\n", $addr + 4, $offset);
        print sprintf("%04X:       uint32_t  length   = %d\n", $addr + 8, $length);
        print "        }\n";
    }

    # walk the data area in offset order so each block can be labelled
    my @sorted_offsets = sort { $a->[0] <=> $b->[0] } @annotated_offsets;
    my $pos = 26 + $count * 12;

    my $bytes = 0;
    my $need_newline = 0;
    while ($pos < $blob_length)
    {
        # if we have run into one of the objects, print its name
        if (@sorted_offsets && $pos == $sorted_offsets[0]->[0])
        {
            print "\n" if $need_newline;
            $need_newline = 0;
            $bytes = 0;
            print sprintf("\n        $sorted_offsets[0]->[2]  length = %d\n",  $sorted_offsets[0]->[1]);
            shift @sorted_offsets;
        }

        if ($bytes == 16) { print "\n"; $need_newline = 0; $bytes = 0; }
        print sprintf("%04X:", $pos) unless $need_newline;
        print sprintf(" %02X", unpack("C", substr($self->{'blob'}, $pos, 1)));
        $need_newline = 1;

        $bytes += 1;
        $pos += 1;
    }
}

# =======================================================================================
# Diag - diagnostic output
#
# char, line: print if verbose >= 3 (-v -v)
# status: print if verbose >= 2 (-v)
# progress: print if verbose > 0 (no option)
# =======================================================================================

package Diag;
use parent -norequire, 'Object';

sub init
{
    my $self = shift;
    $self->{'diag'} = $_[0] // 0;
    $self->{'need_newline'} = 0;
    $self->{'last_progress'} = 0;
    return $self;
}

sub char
{
    my ($self, $char) = @_;
    return unless $self->{'diag'} > 2;

    print STDERR $char;
}

sub line
{
    my ($self, $line) = @_;
    return unless $self->{'diag'} > 2;

    print STDERR "\n" and $self->{'need_newline'} = 0 if $self->{'need_newline'};
    print STDERR $line;
    $self->{'need_newline'} = 1 unless substr($line, -1) eq "\n";
}

sub status
{
    my ($self, $line) = @_;
    return unless $self->{'diag'} > 1;

    print STDERR "\n" and $self->{'need_newline'} = 0 if $self->{'need_newline'};
    print STDERR $line;
    $self->{'need_newline'} = 1 unless substr($line, -1) eq "\n";
}

# =======================================================================================
# Option - parse command-line options
# =======================================================================================

package Option;
use parent -norequire, 'Object';

sub read_options
{
    my ($self) = @_;

    my @opts = ('quiet|q', 'verbose|v+');
    $self->{'verbose'} = 1;

    my $status = Getopt::Long::GetOptions($self, @opts);
    @{$self->{'files'}} = @ARGV; # anything left is a file operand
    $self->{'verbose'} = 0 if $self->{'quiet'};

    if ($self->{'verbose'} > 2)
    {
        print STDERR "options:\n";
        map { print STDERR "  $_ = $self->{$_}\n"; } sort keys %$self;
    }

    return $self;
}

# =======================================================================================
# Object - base class
# =======================================================================================

package Object;

sub new
{
    my $self = shift;
    my ($class) = ref($self) || $self; # allow both virtual (member) and static (class)
    $self = {};
    bless $self, $class;

    return $self;
}

# =======================================================================================

=head1 NAME

dump-applesingle: disassemble AppleSingle/AppleDouble files

=head1 SYNOPSIS

dump-applesingle [options] [args]

dump-applesingle -v ._appledoublefile

=head1 DESCRIPTION

dump-applesingle disassembles AppleSingle and AppleDouble files into a textual
representation. Currently that's all it does, but it will probably evolve into
something to do bulk operations on them (find, change, delete, create etc)

=cut

I guess I need to (1) get some WordPress plugins to format code better and (2) change my theme so that people can view code better (narrow margins are good for text but not quite as good for code).

More links

Windows 7 tweaks

Enable administrative shares on Windows 7

Add this registry key. This is formatted as a reg key, so you can save it to a file “localaccount.reg” and just run it from the shell.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System]
"LocalAccountTokenFilterPolicy"=dword:00000001

This enables shares of the form C$, D$ etc. This assumes that “File and Printer Sharing” is enabled (you’ll find this in Control Panel : Network and Internet : Network and Sharing Center : Advanced sharing settings).

 

Western Digital MyBook Live tweaks

The MyBook Live series is based around Linux, so you can fix some of WD’s design decisions and improve performance. You can do just about anything, but there are some simple performance improvements you can do.

Enable SSH

Assuming you are blocking port 22 from outside your network, enable SSH by logging in to the web dashboard, then switch to the page UI/SSH, and enable from there. If your NAS had the address 192.168.1.101, you would first go to

http://192.168.1.101

log in, then go to

http://192.168.1.101/UI/ssh

to enable SSH. At that point, you can ssh to your box; ssh is built into Mac and Linux boxes, but you’ll need to use PuTTY or some other client on Windows.

I got this from What are the steps to enable SSH from the WD Community forums.

Fix monitorio.sh

WD has a monitor script as a daemon. That in itself is fine, but it periodically does an ls of the entire drive to build up some simple statistics. Once your drive has a large number of files on it, this will cripple performance. Since I don’t ever look at their stats, I prefer to disable this part of monitorio.sh. I want to repeat this: if you do this, then the web front end will no longer show accurate disk space usage (it will show whatever you had up to the point where you nerf the file_tally function).

I suggest the following: ssh to your NAS, edit /usr/local/sbin/monitorio.sh, rename file_tally to file_tally_old, and insert an empty file_tally function. Your file will look like this:

file_tally() {
        :   # no-op; the shell rejects a completely empty function body
}

file_tally_old() {
        if [ ! -p $TALLY_PIPE ]; then

You’ll want to restart the monitorio daemon (or reboot the NAS, which is more dramatic).

/etc/init.d/monitorio restart

You can also use ps and kill to stop any existing ls process if you’re impatient.

I didn’t figure this out, I got it from Solving the MyBook Live insane load.

Change to CFQ scheduler

Some people think that performance is better if you switch to the CFQ scheduler. This will definitely depend on how you use the NAS. WD has defaulted to the Anticipatory scheduler. You can check to see what yours is set at.

cd /sys/block/sda/queue
cat scheduler

and if you see

noop [anticipatory] deadline cfq

then your NAS is currently using the Anticipatory scheduler. You can switch by writing CFQ to the scheduler file like this

echo cfq >scheduler

And, if you have multiple drives in your WD Live box (for example, a Live Duo), you’ll need to do this for sdb as well.
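If you want to check the scheduler from a script instead of eyeballing the output, the bracketed entry is the active one. Here is a minimal Python sketch of that parsing; the function names are mine, and reading /sys/block/sda/queue/scheduler into the line is left to the caller.

```python
import re

def active_scheduler(line):
    """Return the scheduler shown in [brackets], or None if there isn't one."""
    match = re.search(r"\[([^\]]+)\]", line)
    return match.group(1) if match else None

def available_schedulers(line):
    """Return every scheduler named on the line, with brackets stripped."""
    return [name.strip("[]") for name in line.split()]

line = "noop [anticipatory] deadline cfq"
print(active_scheduler(line))
print(available_schedulers(line))
```

Running the same check over sda and sdb is then just a loop over the two scheduler files.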

I got this from Performance problems? Read this first.

You can read more about the various scheduling algorithms in this Red Hat page: Choosing an I/O scheduler for Red Hat Enterprise Linux 4 and the 2.6 Kernel. At some point, this will be out of date, but it was valid as of late 2012.

 

FAQ

  • How do I show hidden files in the Mac OS X Finder?
  • What are .AppleSingle and .AppleDouble files and folders?
  • Why do I see “short names” from Windows for some files? (copied to a file server from a Mac)
  • How do I unlock files (in bulk) on the Mac?

How do I show hidden files in the Mac OS X Finder?

defaults write com.apple.finder AppleShowAllFiles TRUE
killall Finder

But this only controls showing “dot” files (like .git). Finder Info might also exist and have the hidden bit set (FinderInfo came from the classic Mac OS system). Or, an ACL might exist that doesn’t allow browsing for the current user. In ls -l output, a ‘@’ appended to the file mode means “this file has extended attributes”, and a ‘+’ means the file has an ACL. You can show the size and kind of the extended attributes with the ‘@’ option to ls, print the ACL with the ‘e’ option, and show file flags (such as hidden) in a friendlier fashion with the ‘O’ option.

bfitzair:~ ls -l@Oed /Users/bfitz
...
drwx------@ 41 bfitz  staff  hidden 1394 Sep 23 21:03 Library
        com.apple.FinderInfo      32
 0: group:everyone deny delete
...

You can manipulate some flags with chflags

chflags nohidden /path/to/file

or you can remove the FinderInfo from extended attributes with

xattr -d com.apple.FinderInfo /path/to/file

You can manipulate ACLs with chmod. It’s moderately complicated to edit them (see the man page for chmod), but you can nuke one by using the -N flag

sudo chmod -N /path/to/file
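For scripting, two of the “why is this hidden” checks above can be sketched in Python: the leading dot, and the hidden file flag that chflags manipulates. This is only a rough sketch: the function name is mine, st_flags only exists on BSD-derived systems such as the Mac (elsewhere I treat it as 0), and it doesn’t look inside the FinderInfo extended attribute or at ACLs.

```python
import os
import stat

def hidden_reasons(path):
    """Return a list of reasons the Finder might hide `path`."""
    reasons = []
    leaf = os.path.basename(os.path.normpath(path))
    if leaf.startswith("."):
        reasons.append("dot file")
    # st_flags is missing on systems without BSD file flags; default to 0.
    flags = getattr(os.lstat(path), "st_flags", 0)
    if flags & stat.UF_HIDDEN:
        reasons.append("hidden flag")
    return reasons

print(hidden_reasons(os.curdir))
```

An empty list means neither check fired, which still leaves the FinderInfo and ACL cases to rule out by hand with ls.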

 What are .AppleSingle and .AppleDouble files and folders?

These are files (or files in folders) that preserve Mac resource forks and extended attributes that cannot be otherwise preserved on the file system. For example, if you copy Mac files to an NFS or SMB file server volume, the Mac will by default create these for files with resource forks or extended attributes.

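If you don’t ever want .DS_Store files written to network volumes, you can disable that

defaults write com.apple.desktopservices DSDontWriteNetworkStores true

Note that, as far as I can tell, this setting only covers .DS_Store files; it doesn’t stop the Mac from writing AppleDouble files. Suppressing those entirely would be the drastic fix, but then you’ll be unable to save classic Mac files (which generally have resource forks) or you’ll lose file attributes that you might actually want.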

http://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_formats

AppleSingle combined everything into one file, in a way similar to the older MacBinary. This has an advantage (single file) but a big disadvantage in that the data fork (usually the one most applications care about) is now shrouded inside the AppleSingle file.

AppleDouble separates the data fork and resource fork + finderinfo into two files. In some instances, the second file is stored as ._{filename}. In other cases, it’s stored in .AppleDouble/{filename}.

In times past, I actually wrote code on different platforms to work on their file formats (e.g. converting Apple IIgs resource forks to and from Mac resource forks, or even doing the same between Windows and Mac). It’s a lot easier when you have an AppleDouble-style layout.

Why do I see “short names” from Windows for some files? (copied to a file server from a Mac)

Windows has a slightly more strict view of what characters can be in a file name. Unix is pretty permissive, and only bans \0 (nul) and ‘/’ (and even the directory separator can be stored in the low-level file system, it’s just a pain to manipulate it if you do that). However, Windows disallows the following characters from being in a leaf name (file or directory)

\ / : * ? " < > |

(the explorer shell also disallows the typing of any “control” character, e.g. a character <= \x1F, but these can be used in filenames). The Mac is a lot more permissive; for example, ‘|’ is a legal character in leaf names.

It gets slightly weird due to a compatibility decision in the transition from Mac OS 9 (“Classic Mac”) to Mac OS X. Before Mac OS X, Apple used ‘:’ as the path component separator character. However, Unix uses ‘/’. Apple decided that the underlying file system would use ‘/’ as the separator, but the Finder would keep treating ‘:’ as the reserved character. So any file whose name contains a ‘/’ in the Finder is stored in the underlying filesystem with a ‘:’ in its place, and conversely a ‘:’ in the underlying name is displayed by the Finder as ‘/’.

Now, neither character is legal on Windows. But SMB will accept the full path and store it, which the Mac side can use. Windows, however, doesn’t like seeing a file name with ‘:’ in it, so you get the synthesized short name when you view the file on a Windows machine. This is better than not being able to store files (from the Mac) or work on them (from Windows), but it’s awkward.

The only solution, if you want to work on files conveniently from both operating systems, is to rename files and directories to the legal subset of characters. In most cases, this means using the Windows set of legal characters as the set of allowed characters for file names. Or, you could write your own explorer replacement?

There are further restrictions in Windows using the Win32 API – files can’t be named CON, PRN, AUX, CLOCK$, NUL, COM[0-9], LPT[0-9].
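If you go the renaming route, it helps to find the offending names first. Here is a minimal Python sketch that flags leaf names using the character list and reserved names given above. The function name is mine, and treating a reserved name as still reserved when it has an extension (NUL.txt) is my assumption, not something stated above.

```python
import re

# Characters listed above as illegal in a Windows leaf name.
FORBIDDEN = set('\\/:*?"<>|')
# Device names listed above as reserved by the Win32 API.
RESERVED = re.compile(r"^(CON|PRN|AUX|CLOCK\$|NUL|COM[0-9]|LPT[0-9])$", re.IGNORECASE)

def windows_name_problems(leaf):
    """Return a list of reasons why `leaf` would be awkward on Windows."""
    problems = []
    bad = sorted({c for c in leaf if c in FORBIDDEN})
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if any(ord(c) <= 0x1F for c in leaf):
        problems.append("control characters")
    # Assumption: a reserved name stays reserved with an extension (NUL.txt).
    stem = leaf.split(".", 1)[0]
    if RESERVED.match(stem):
        problems.append("reserved device name")
    return problems

for name in ["report.txt", "Icon\r", "a|b:c", "nul.txt"]:
    print(repr(name), windows_name_problems(name))
```

Running this over an os.walk of a Mac-side tree gives a list of candidates to rename before they ever show up as short names on the Windows side.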

Of course, there are a few files that you should not rename. For example, a file named “Icon\r” (yes, there is a carriage-return character in the filename) is used by the Mac Finder to display a custom icon for a file. For me, at least, the short name for “Icon\r” is “I7CIPB~N”, so they are easy to recognize (the short name algorithm is deterministic, I hope).

And finally, copying these short-name-aliased files with a Windows machine tends to mutate filenames, because many Windows clients don’t use the underlying NT APIs, and so end up garbling things (e.g. copying an Icon\r file to an SMB share with Mac using SMB and then duplicating it to a different SMB server using the Windows explorer changes it to a non-hidden file named “Icon”).

And more finally, we won’t even talk about real file names that have characters outside the limited ASCII/ISO-646 range. These tend to be garbled very easily, for reasons that may or may not be obvious to you (read up on encoding).

How do I unlock files (in bulk) on the Mac?

Assuming you mean “Mac Finder lock bit” and not Unix permissions (which are set with chmod), you use chflags. For example, to unlock recursively (and see what you unlocked), you would do

chflags -v -R nouchg <folder>

and all files rooted at <folder> would be examined and unlocked if needed. The -v flag will print a line for each file/folder that gets unlocked.

This is actually a FreeBSD command and changes “file flags”, which on the Mac has been interpreted to be finder flags. In fact, the man page on Mac OS X 10.7.4 and the man page in FreeBSD look very similar (perhaps even identical). A slightly more Mac-specific page can be found at http://ss64.com/osx/chflags.html.

 

Parsing

GLR parsing

GLR is the way to go. Really.

Marpa

active as of late 2012. http://jeffreykegler.github.com/Marpa/. Paper: http://cloud.github.com/downloads/jeffreykegler/Marpa-theory/recce.pdf. Google groups: https://groups.google.com/forum/#!forum/marpa-parser

Accent

last update in 2006? http://accent.compilertools.net/

Elkhound

Not really active any more.

Elkhound home page and Google talk that Scott McPeak gave in 2006. Also 2004 paper. http://www.cs.berkeley.edu/~necula/Papers/elkhound_cc04.pdf

DParser

http://dparser.sourceforge.net/

Scannerless GLR

http://oai.cwi.nl/oai/asset/12772/12772A.pdf

ASF+SDF -> Rascal

http://www.meta-environment.org/ and http://www.rascal-mpl.org/. See http://en.wikipedia.org/wiki/ASF%2BSDF_Meta_Environment too.

GLL Parsing

Maybe GLL is the way to go?

GLL Parsing paper by Elizabeth Scott and Adrian Johnstone. Other papers

Stratego/XT and Spoofax.

http://strategoxt.org/ and http://strategoxt.org/Spoofax.

Books

Dick Grune wrote a book on just parsing, named Parsing Techniques.

Articles

http://tratt.net/laurie/tech_articles/articles/parsing_the_solved_problem_that_isnt

Packrat/PEG

Not sure that these are worth studying, I think they are a dead end. Yes, you can compose two PEG grammars together, but a PEG grammar is more limited than LR. This is the canonical page: http://bford.info/packrat/. ANTLR is probably the most significant parser generator in this neighborhood, although strictly speaking it uses LL(*) rather than PEG; with backtracking turned on it behaves a lot like a PEG parser.

Other

http://dinosaur.compilertools.net/ – home page for Lex and Yacc.

Converge is interesting: http://convergepl.org/

Cling is an interactive C++ interpreter based on LLVM and clang.

Some useful notes on parsing and compilers from Lutz Hamel at University of Rhode Island: CSC 402 – Lecture Notes

Why I prefer LALR parsers – “[elimination of left-recursion] is achieved by converting left-recursion to right-recursion; which will always convert an S-attribute grammar into an L-attribute grammar; and seriously complicate any L-attribute grammar.”

http://code.google.com/p/blacc/wiki/DesignLeftRecursion
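The point in that quote is easier to see with subtraction, which is left-associative. A naive right-recursive rewrite that only passes values back up (synthesized attributes) computes the wrong answer; to get it right you have to thread the running value down into the tail rule as an inherited attribute. A minimal Python sketch, with grammars and function names of my own choosing:

```python
def eval_right_recursive_naive(tokens):
    """E -> NUM '-' E | NUM, synthesized values only (wrong associativity)."""
    value = int(tokens[0])
    if len(tokens) > 1:  # tokens[1] is assumed to be '-'
        return value - eval_right_recursive_naive(tokens[2:])
    return value

def eval_with_inherited(tokens):
    """E -> NUM E' ; E' -> '-' NUM E' | empty, with an inherited accumulator."""
    def tail(inherited, pos):
        if pos < len(tokens) and tokens[pos] == "-":
            # pass the partial result down into the rest of the tail
            return tail(inherited - int(tokens[pos + 1]), pos + 2)
        return inherited
    return tail(int(tokens[0]), 1)

tokens = "10 - 3 - 2".split()
print(eval_right_recursive_naive(tokens))  # 10 - (3 - 2) = 9
print(eval_with_inherited(tokens))         # (10 - 3) - 2 = 5
```

With a left-recursive rule in an LR parser, the second result falls out of plain synthesized attributes, which is the convenience the quote is defending.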

 

Handling “path too long” on Windows

This is one of my posts of “study the mistakes of others, so you can learn from them and not repeat them directly or indirectly”.

This is common to see, at least for me

The directory name y:\backup-2012-09-01-2230\Applications\Adobe Acrobat 9 Pro\Adobe Acrobat Pro.app\Contents\Plug-ins\PaperCapture.acroplugin\Contents\Frameworks\OCRLibrary.framework\Versions\A\Frameworks\iDRS.framework\Versions\A\Resources\Asian.framework\Versions\A\.AppleDouble is too long.

(ignore the fact I’m cataloging Mac apps on a Windows share).

The 260 character limit is so deeply embedded in the Windows system that it’s been almost impossible for Microsoft to eradicate it; it persists even into theoretically new frameworks like .NET (mainly because .NET isn’t actually a completely new system, it’s in many cases a thin shim on top of the older Win32 libraries). While there are APIs that let you access paths up to 32767 characters long, there are enough bits in the system that use the older MAX_PATH limit that you really can’t do anything about it. So while you can use Windows NT-style paths in many cases (paths prefixed with \\?\) in your own code, some core APIs in Windows use the old functions (like LoadLibrary, for example).

Note that you can use \\?\ paths with cmd.exe, BUT these must be drive letter paths; UNC paths don’t work (I don’t know why, maybe one needs to use the NT file namespace or a volume GUID path?). It’s also useless, because the resultant path is still subject to MAX_PATH limitations. You also can’t set the current working directory to a UNC path (unless you employ a registry hack, see Microsoft KB 156276).
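In your own code, building the prefixed form is just string manipulation. Here is a minimal Python sketch (function names are mine) that assumes the input is already an absolute path. For UNC paths, the documented Win32 form is \\?\UNC\server\share rather than \\?\ stuck in front of the two leading backslashes. Whether a given API or tool accepts the result is a separate question, as noted above.

```python
import ntpath

MAX_PATH = 260  # the classic Win32 limit discussed above

def to_extended_length(path):
    # Already prefixed: leave it alone.
    if path.startswith("\\\\?\\"):
        return path
    # The prefix turns off path normalization, so tidy separators and ".." first.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # \\server\share\x  ->  \\?\UNC\server\share\x
        return "\\\\?\\UNC\\" + path[2:]
    # C:\x  ->  \\?\C:\x
    return "\\\\?\\" + path

def needs_prefix(path):
    """True when the plain path would hit the MAX_PATH limit."""
    return len(path) >= MAX_PATH

print(to_extended_length("C:\\backup\\deep\\file.txt"))
print(to_extended_length("\\\\nas\\share\\deep\\file.txt"))
```

I use ntpath rather than os.path so the sketch behaves the same when run on a Mac or Linux box against a list of Windows paths.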

There are a few workarounds I’ve used in the past:

  • using working directories
  • use of subst and/or pushd (auto-subst, don’t forget the matching popd)
  • hard links, junctions, and symbolic links (although these modify the filesystem)

All of these create new virtual hierarchies so that you can read a deeply nested file without needing a path longer than MAX_PATH; the first two can at best double the length, though.

There are some wacky things people have done in code.

Starting in Windows Vista, the shell starts shrinking individual elements in the path until the whole thing fits within MAX_PATH. This is why you can browse a really deep hierarchy in Explorer.exe, and usually copy it or delete it. This, of course, requires that NT short name creation is not disabled.

Of course, the real answer is “stop using Windows, and just use Unix (Linux or Mac OS X)”, but there’s still a lot of programs that run on Windows and not Unix…

Reference