Via: CableTV.com
Monthly Archives: December 2012
Novel printf debugging technique
I think it’s a bit nasty, but it’s interesting – print to an invalid file descriptor, and use a trace tool that shows such printing.
It’s pretty ghetto.
Martin Majoor’s type philosophy
http://www.martinmajoor.com/6_my_philosophy.html
libabc – guidelines for library writers
This is worth looking at, not for specifics (very tied to its context), but for lessons and guidelines
http://0pointer.de/blog/projects/libabc.html
bup
https://github.com/apenwarr/bup/tree/master/lib
Web design for programmers
How to Make Your Site Look Half-Decent in Half an Hour
Summary
- Use Bootstrap (grid system), maybe buy a custom theme.
- Use better fonts like Google Web Fonts and pair them properly.
- Add textures and icons
- Add CSS and color themes
- Read more tutorials
Still, a decent write-up.
ATerm – term manipulation library
ATerms for manipulation and exchange of sharing data: it’s all about sharing
ATerms used in the Meta-Environment (ASF+SDF)
This is a library available in C/C++ and Java.
Software deployment
Well worth reading
The Purely Functional Software Deployment Model
It’s in English, honest. This is a 2006 Ph.D. thesis by Eelco Dolstra, and his thesis advisor was Eelco Visser (who’s written lots of papers I have found interesting or relevant). Eelco Dolstra’s home page has links to his other publications.
Knit, component definition and linking
Good paper
Knit: Component Composition for Systems Software
I want something better than what is currently available, at least to people writing system software or “shrinkwrap” desktop software. The problems with libraries and object files are numerous. Some of the things I’ve struggled with:
I want to override something in a library. For example, and for good reasons, I want to replace the global new function in C++ with myown function; there is no way to override this in a sane and error-free fashion. There are some tricks that work, but they are tricks. But this is true for any function. I really do want to replace malloc, because the ones given to me are crap.
Get out of my namespace! Namespaces are global, and namespace pollution becomes an ever bigger problem as programs get bigger.
Ten versions of Zlib. There are many libraries that link to specific versions of other libraries. When you compose all those together into one program, likely results include at the least bloat, and more often broken behavior, because library revisions change interfaces or add or remove bugs.
Configuration is a nightmare. I have to physically locate specific versions of libraries, which means either building them myself, or revisiting builds as I add libraries or change versions of libraries.
Library boundaries are very granular, meaning significant fractions of the code are involved with building up and tearing down passed parameters.
I want to insert stuff in the middle. If library A calls library C, but I want it to call library B which calls library C, I can’t do that. You could design library A to allow it, but then it’s slower and more complicated to use, especially when it’s a minority use case.
Grammars for arbitrary bitstrings
Note: this was an older direction that I might return to.
Grammars are defined on a string, but that “string” is usually a subset of all possible strings; e.g. it is usually a string of “human-readable characters”. This bias is due to the fact that grammars and parsing started with languages like English and Fortran. Instead, what we want is a way to write a precise grammar for any language, not just languages expressed as human-readable characters. Also, we want grammars, because they can both be used as recognizers and generators, and potentially recognize/parse files in linear time.
To avoid encoding shortcuts (and thus limiting the expressiveness of our grammars), we want to define grammars on bitstrings. Thus, we only need two terminals, 0 and 1. We’ll build up some common primitives, but they are for convenience only.
However, it’s easy to define grammars as being on bitstrings, albeit a little more cumbersome. Our alphabet is just the two 1-bit values 0 and 1. We’re going to start with the syntax of ABNF, as clumsy as that is (although we’ll abandon it soon, since we’ll want the freedom to declare what a character in a string actually means). We might define a 1-byte value and a 4-byte value like this:
OCTET = 8*8(0|1) DWORD = 4*4OCTET
Note that in neither definition do we require that an OCTET or DWORD starts on an 8-bit bounds. Padding has to be explicit in our grammars.
But right off the bat, we have issues with bit-endianness and byte-endianness once we want to interpret values (which we need to do in order to make an actual language). Yes, an OCTET is 8 consecutive bits from the string, but if you want to reason about it as a value in the range 0..255, you would need to decide if the subset bitstring is big-endian (most significant bit is leftmost in the bitstring) or little-endian. And likewise, a DWORD is 4 consecutive OCTETs, but are those organized in a big-endian fashion or little-endian fashion.
We could tediously construct our values (with semantic actions)
b0 = (0|1) b1 = (0|1) b2 = (0|1) b3 = (0|1) b4 = (0|1) b5 = (0|1) b6 = (0|1) b7 = (0|1) OCTECT_BE = b7 b6 b5 b4 b3 b2 b1 b0 { $$ = ($1 << 7) | ... ($8); }
Or something like that. It’s likely that we will quickly fall back on some predefined primitives, but it’s nice to know that we can define all the primitives ourselves. We might even want to steal the “little” and “big” prefixes from DataScript, so we might simply have
. = (0|1) OCTET = big ........
and then the nonterminal’s value is converted as a big-endian bitstring.
Another challenge is that grammars involve a left-to-right parse, but many files’ interpretation involves moving about in the bitstring. A sequence of bits early on might be an offset to a different section in the bitstring that needs to be understood before more bits can be parsed.
Appendix
Why not ASN.1? Because it’s not suitable. It’s complex, and yet incomplete. It separates abstract notation from encoding, which we don’t want (we want to keep them together), and it can’t handle random-access formats.
Why not BFF (“A Grammar for Binary File Formats”)? It limits itself to a bytestring, not a bitstring.
Something intriguing: Synalyze It!. (sigh, people who put exclamation marks in names). In looking at the “grammar” for ZIP files, it falls back to a fair amount of LUA. It’s really a decompiler, not a grammar; I’m not seeing how this can be used to generate strings (e.g. legal ZIP files). But still, it’s interesting.
Another interesting commercial candidate is 010 Editor, with the brag “Edit Anything”. It’s an interesting approach, use C structs with embedded semantic actions. Again, not a grammar, not generative. It’s really a specialized language for parsing binary data.
Links
Parsing of Context Sensitive Grammars
Earley Parsing for Context-Sensitive Grammars
Sample Parsers from Parsing Techniques
The Convergence of Mildly Context-Sensitive Grammar Formalisms
Genetic programming with context-sensitive grammars
A new method for dependent parsing (Yakker)
xtc (eXTensible Compiler) project.
- SuperC: Parsing All of C by Taming the Preprocessor
- The SuperC Technical Manual
- Better Extensibility through Modular Syntax
PacketTypes: Abstract Specification of Network Protocol Messages