InternetCrackUrl – quirky to the point of defective

InternetCrackUrl is a simple function in WinInet that splits a URL into its constituent pieces. Should be simple, right? But does it work properly? That’s hard to say.

For example, try “http://” as input. InternetCrackUrl will return FALSE, and GetLastError will return error 12006, which is ERROR_INTERNET_UNRECOGNIZED_SCHEME. But the URL does have a valid scheme – this is clearly an http URL. One would think the error code ERROR_INTERNET_INVALID_URL (12005) would be more appropriate in this case. One wonders what the developer was thinking.
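
Here’s a minimal sketch of that first case, assuming a Win32 build linked against wininet.lib; the buffer sizes and the printing are mine, but the 12006 result is exactly what the paragraph above describes.

```c
// Sketch only: what happens when you crack "http://".
#include <windows.h>
#include <wininet.h>
#include <stdio.h>
#pragma comment(lib, "wininet.lib")

int main(void)
{
    char scheme[32] = {0}, host[256] = {0}, path[256] = {0};

    URL_COMPONENTSA uc = {0};
    uc.dwStructSize   = sizeof(uc);
    uc.lpszScheme     = scheme;  uc.dwSchemeLength   = sizeof(scheme);
    uc.lpszHostName   = host;    uc.dwHostNameLength = sizeof(host);
    uc.lpszUrlPath    = path;    uc.dwUrlPathLength  = sizeof(path);

    if (!InternetCrackUrlA("http://", 0, 0, &uc))
    {
        // Prints 12006 (ERROR_INTERNET_UNRECOGNIZED_SCHEME), even though
        // the scheme is the one scheme everybody recognizes.
        printf("failed, GetLastError() = %lu\n", GetLastError());
    }
    return 0;
}
```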

But then try “notvalid://” as input. You’ll actually get back success! Sure, the scheme in the returned structure is set to INTERNET_SCHEME_UNKNOWN (-1), but this is not documented anywhere that I can find; it’s something you have to discover for yourself, either by being systematic during development or by running into it in the field (there’s a sketch of this case after the URL form below). Based on these two data points, I assume that in the first case the code sees “http://”, recognizes the scheme, and knows there must be more to the URL, whereas with “notvalid://” it doesn’t recognize the scheme and so skips further parsing, because maybe “notvalid://” is a complete, legal URL for the notvalid scheme. That’s probably not a good practical choice for library code, though, because the generic URL form is pretty well defined:

scheme://username:password@domain:port/path?query_string#fragment_id
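
And here’s the same kind of sketch for the “notvalid://” case – again just an illustration, not production code – showing the undocumented INTERNET_SCHEME_UNKNOWN result you have to discover for yourself.

```c
// Sketch only: what happens when you crack "notvalid://".
#include <windows.h>
#include <wininet.h>
#include <stdio.h>
#pragma comment(lib, "wininet.lib")

int main(void)
{
    char scheme[32] = {0};

    URL_COMPONENTSA uc = {0};
    uc.dwStructSize   = sizeof(uc);
    uc.lpszScheme     = scheme;
    uc.dwSchemeLength = sizeof(scheme);

    if (InternetCrackUrlA("notvalid://", 0, 0, &uc))
    {
        // Succeeds, and nScheme comes back as INTERNET_SCHEME_UNKNOWN (-1) --
        // a result you only learn about by trying it.
        printf("success, nScheme = %d\n", (int)uc.nScheme);
    }
    return 0;
}
```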

Or take MSDN’s dire warnings against passing file:// URLs with spaces in them, “because the value returned in the dwUrlPathLength member is too large”. What exactly does this mean? Well, calling it with a 256-byte path buffer, it parses “file:///my path/with spaces/” into the path component “\my path\with spaces\”, with dwUrlPathLength set to 21; the string is 21 characters long, so that looks correct. It’s true that file:// URLs are not supposed to contain ‘ ’ characters; those are supposed to be encoded as %20. It’s also true that InternetCrackUrl can’t do any encoding of its own, or it might double-encode an already-encoded string. But there’s no reason InternetCrackUrl can’t still function properly here, since the encoding restrictions exist because URLs get embedded in other protocols, and it’s those protocols that limit which characters can be used. Of course, the other problem is that InternetCrackUrl turns my “/” characters into “\” characters, because that’s the Windows convention – except that “/” paths work perfectly fine on Windows, and have ever since MS-DOS 2.0. The function’s developer evidently assumed you would pass the path component to a Windows function, or maybe put it in a shell command, so it shouldn’t contain forward slashes.
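
Here’s a sketch of that file:// test, assuming the same 256-byte path buffer described above; the path and length in the comment are the values from my run, not anything the documentation promises.

```c
// Sketch only: a file:// URL with spaces, cracked into a 256-byte path buffer.
#include <windows.h>
#include <wininet.h>
#include <stdio.h>
#pragma comment(lib, "wininet.lib")

int main(void)
{
    char path[256] = {0};

    URL_COMPONENTSA uc = {0};
    uc.dwStructSize    = sizeof(uc);
    uc.lpszUrlPath     = path;
    uc.dwUrlPathLength = sizeof(path);

    if (InternetCrackUrlA("file:///my path/with spaces/", 0, 0, &uc))
    {
        // Observed result: path = "\my path\with spaces\", dwUrlPathLength = 21 --
        // the length matches the string, despite MSDN's warning, but the
        // forward slashes have been rewritten as backslashes.
        printf("path = \"%s\", dwUrlPathLength = %lu\n", path, uc.dwUrlPathLength);
    }
    return 0;
}
```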

There are two points I’m trying to get across. The first is for library designers: you want to make things work as much as possible, and as naturally as possible, but there’s a balance to strike. If you’re too permissive, the caller will never be sure what’s going to happen; if you’re too strict or too legalistic, the caller won’t be guided to the best way to use your library.

But the bigger point I’m trying to get across is that most of the time we don’t understand the code we’re calling, yet we act as if we do, and then we’re surprised when things go wrong. You should systematically discover, up front, how the code you’re using actually behaves for every case you will ever see. If you can rule out a case, that’s fine; if you can’t, you need to know how it will behave.

This is what you need in order to write “practically perfect code”: full understanding. But don’t let this trick you into replacing all library code; if you can understand it fully, it’s always better to use someone else’s code than to write it yourself.

Well, except for InternetCrackUrl.