Teaching with Quines

I’ve decided to try to (laboriously) work through Hacker’s Delight by Henry S. Warren, Jr. The book’s preface cites a quine by Vlad Taeerov and Rashit Fakhreyev. Here it is:

main(a){printf(a,34,a="main(a){printf(a,34,%c%s%c,34);}", 34);}

Though the general recursive structure is obvious, this 64-character program introduces a host of C weirdness. I gave an aborted five-minute version of this for an interview to TF (“teaching fellow”) CS50 at Yale next semester. Here are the things I wish I’d gotten to finish exploring on that occasion. As befits my own (in)experience, this will assume very little prior knowledge of C.

One liners

A quine’s output must replicate its source code exactly. If we broke up the program with newlines, we’d need to cleverly insert \n escape sequences in order to preserve this property. But since whitespace does NOT have any syntactic meaning in C, this (and indeed all) C programs can be written as one liners. N.B. that when dealing with quines, good programming style goes out the window.

Entry points

A valid C program must have exactly one main() function. After preprocessing, compiling, and linking, this function is what gets called first by the executable file. Usually you will see int main(int argc, char *argv[]). But even though supplying a type signature and passing in the command line arguments is good programming, you don’t need to do either.

What surprised me is that text other than void did not throw a compiler error. Turns out, you can put any kind of garbage inside the parentheses and it will default to a (char *). Try it:

#include <stdio.h>
	 printf(garbage, garbage="Hello, world!\n");

We can smush a string into garbage and C is fine with it. This raises an interesting point about assignment within printf(). It works fine with a single string (a gets assigned to well, itself), but the C standard warns that you will get undefined behavior if you, for example, try to assign more than one string to a in a printf() call. Or do multiple increment operations on an int, say.

ASCII mapping

What are all those 34’s doing in the quine? Well, the ASCII encoding maps the number 34 to the quotation mark character. Since strings are parsed by quotation marks in C, the quine will need to somehow generate an extra set of quotes for when it replicates at runtime. The 34’s match with the %c format specifiers in the inner call to printf.

What happens is that the first a in the first printf() gets assigned the inner printf string and its format specifiers then grab the three remaining arguments. Bizarrely, the third argument, which matches the %s, is that inner string itself. The 34’s ensure that when the statement prints, the inner part is within quotation parts.

More to come soon! :eyeglasses:

17 Mar 2015