Programming languages

Published 2019-10-31 on A Heroic Pixel

Programming languages are weird, without exception. Each language is odd in its own way, but all languages can quite easily be categorized. C is “low-level” and a “systems language,” C++ is “high-level” and full of “meta-programming” and “zero-overhead abstractions”, JavaScript is “garbage collected”, “interpreted”, and so on.

Once in a while, I like to try new languages and see if their particular bundle of oddities has any appeal to me. This is largely rooted in the fact that two of the first languages I learned were JavaScript and C++. I didn’t really have many plans in mind - I was, I believe, around ten years old at the time. I was fascinated at how a bunch of tiny physical objects could work together to produce games, the internet, word processors, and everything else I loved to play around with at the time. My biggest dream was to fully understand how it all worked. How does moving a weird plastic box we call a “mouse” cause a pointer to move on a screen? How does it know when I’m clicking? How can amazing 3D worlds be rendered on tiny handheld screens?

In that regard, I deeply regret starting with C++ and JavaScript. I don’t like to get involved in the “language wars;” I think they’re juvenile and pointless. That said, languages tend to have different strengths and weaknesses. JavaScript is easy to learn, which is why it was quickly recommended to me. However, both JS and C++ are nearly impossible to master, for a number of reasons. C++’s templates alone would take a very long time to even come close to fully understanding. JS has a lot of “features” that are often affectionately referred to as wat.

js> 1 + "1"
"11"
js> 1 - "1"
0

While many of these were deliberate decisions which make sense in context, there are still many “footguns” present throughout these languages. The irony is, JS was designed to be completely accessible to newcomers to programming, and yet many of the features that are meant to make life easier just make things more difficult.

After learning JS, I eventually moved elsewhere. Processing.org, a “language” built over Java and meant more for artists than engineers, was one of the next languages I learned. It introduced me to concepts like OOP and inheritance, and got me further interested in the underlying mechanisms. How did functions like rect actually draw to the screen? How does the computer take words like class, interface, and extends and turn them into a program? That search for understanding brought me to C++.

C++ was next, and became the language I used the most for at least three or four years (I’m a tad ashamed to admit I even tried writing an x86 kernel in C++ at one point). Now, I haven’t used C++ in a serious project in approximately half a year. That was when I first tried to really use C (in spite of the people who insisted there was no point and that I should just stick to using C++ when I first got curious). Since then, I’ve actually completely rewritten some of my projects to use C instead for a number of reasons (which are worthy of their own complete post). I honestly don’t expect to choose C++ over C for any project again. C99 has proven itself to be much more useful for my general projects.

Over the past eight years, I’ve learned a number of languages. I’ve dabbled with C++, PHP, Java/Processing, Lua, JavaScript/Node.JS, C, variants of BASIC (don’t ask), Assembly (Z80, x86/x64 Intel and AT&T variants, 6502), and more that I’m probably forgetting. This isn’t me bragging; I definitely haven’t mastered most of these languages, and some of them I never plan to (if I never touch JS or PHP again, it’ll be too soon). Some of them I still play around with (Java), some I use on a regular basis (C, Lua, Assembly).

The reason I bring this up is to explain why I make sure to try a new language every now and then. Learning new languages has helped me get better at using the languages I already know, and has helped me further my understanding of the underlying concepts. Last year, I touched Python for the first time while doing some work on the KnightOS SDK. Despite having no preexisting knowledge of the language, I was able to quickly figure out the syntax and finish the task.

Recently, I decided to try the Zig programming language. I’d been meaning to try another language for about a month or two, and was considering Rust or Go when I stumbled upon Zig. I don’t entirely remember who directed me towards it, but to whoever it was, thank you!

Zig is one of those ambitious languages that seeks to compete against C, instead of being written in it. At the moment, its compiler is written in C++, but the self-hosting compiler is under development, and the ultimate goal is to maintain the C++ compiler only as much as is needed to build the stage2 compiler.

I first tried Zig with low expectations. I mean, it was trying to compete against C. I honestly didn’t expect to play around with it for more than a day or two. Now, it’s a week later, and I’ve started working on a kernel in Zig, which I intend to continue working on for at least the next few months. I’m honestly planning on getting involved with the language itself when I get a chance.

The thing is, Zig is not what I expected. It falls into a number of categories - it’s a “simple,” “low-level” “systems language” with a “C-like syntax” and “manual memory management.” However, it also falls into completely opposing categories: it’s “high-level” with “runtime protections” against “detectable illegal behavior” (what C calls “undefined behavior” - the name is used to indicate that the behavior is defined: without the equivalent of -O3, it’s a runtime panic); it’s “strongly typed” with “compile time reflection” and general “compile time code execution” (the single most powerful compile time functionality I’ve ever seen, and that includes C++’s metaprogramming).

Its syntax is very similar to that of C. The language is very simple. It provides nice, high-level abstractions and the most expansive form of compile time execution I’ve ever seen. It has a neat form of generics, concurrency that doesn’t suck, good error handling, and more. It also has manual memory management, amazing C interoperability (no need for FFI or bindings), and better performance than C (no, seriously). In spite of all that, it’s still quite possibly the simplest language I’ve ever used. Instead of trying to push in every feature imaginable (cough C++ cough), it focuses on simplicity. As a new language, it is changing rapidly (so it’s not quite as stable as C, as even the release notes for Zig 0.5 make clear). Zig definitely needs some more work, but it’s already extremely impressive.

First off, the syntax: as I mentioned earlier, it’s very similar to C. For instance, the C function

static int32_t add(int32_t a, int32_t b) {
	return a + b;
}

is in Zig

fn add(a: i32, b: i32) i32 {
	return a + b;
}

One interesting point there is that parameters have their names specified before their types, with a colon separating them. Similarly, variables and constants are declared using

const name: type = value;
var a: u8 = 210;
// If no type is specified, one is inferred - and yes, single-line comments are
// the same as they are in C
const b = a + 2;

Also, the C function there is declared static. In Zig, the default visibility for functions is local; to export a function (for instance, if you’re designing a library), you use the aptly named export keyword, which is equivalent to an unadorned C function.

One intriguing feature of Zig is that you can specify arbitrary integer sizes. Variables can have type u2, i20, etc. One practical application of this also lies in another key difference between C and Zig: in Zig, integer wrapping is undefined behavior for both signed and unsigned integers, while in C, it’s only UB for signed integers. In addition to the performance implications (unsigned addition is noticeably more efficient under Zig), it also means that the following code will throw a compiler error:

const a = 3; 
const b: u2 = a + 1; 
> cat a.zig
const a = 3;
const b: u2 = a + 1;
> zig build-exe a.zig
/home/noam/t/a.zig:2:17: error: integer value 4 cannot be implicitly casted to type 'u2'
const b: u2 = a + 1;
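
For contrast, here is a minimal C sketch of the wrapping behavior described above, assuming a hosted C99 compiler. The struct and function names are made up for illustration. C has no u2, so the closest stand-in I can think of is a two-bit unsigned bit-field, and storing an out-of-range value there wraps silently instead of being rejected.

```c
#include <limits.h>

/* Hypothetical stand-in for Zig's u2: a two-bit unsigned bit-field. */
struct two_bits {
	unsigned int value : 2;
};

/* Stores `value` in two bits. C reduces it modulo 4 without complaint. */
unsigned int store_in_two_bits(unsigned int value) {
	struct two_bits t;
	t.value = value;
	return t.value;
}

/* Unsigned overflow is defined in C: UINT_MAX + 1 wraps around to 0. */
unsigned int max_plus_one(void) {
	unsigned int max = UINT_MAX;
	return max + 1u;
}
```

Calling store_in_two_bits(4) quietly hands back 0, which is exactly the kind of mistake the Zig compiler error above refuses to let through.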

Here are some of my favorite things about Zig so far:

Zig has amazing support for freestanding code - see my LIMNOS project. I was able to modify the standard library so that the standard code for opening files worked with /dev/stderr and /dev/ttyS0. To put it more simply, the equivalent of

FILE *standard_error = fopen("/dev/stderr", "w");
if (standard_error != NULL) {
	fwrite("Hello!\n", 1, 7, standard_error);
	fflush(standard_error);
	fclose(standard_error);
}

dumps “Hello!” straight to the screen. Furthermore, this works with Zig’s standard library’s formatting, so I can basically printf() over VGA and COM1. Major parts of Zig’s standard library work without change. Even better, upon suggestion from Andrew K - the Zig maintainer - I’ve begun work to make the standard library integrate flawlessly with freestanding code without the heavy burden of maintaining a fork.

Cons: None. I did have to submit a few patches to the Zig compiler to get everything working, for instance disabling SSE in freestanding mode (and a big thanks to Andrew for then following up and doing the same for C code compiled using Zig), but those no longer affect me and should never be necessary for anyone else again.

Some pieces of code which, in a C kernel, would have to be in raw assembly, can be written straight in Zig - and even better, thanks to the awesome compile-time functionality, a lot of boilerplate can be removed. For instance,

// Set up 16KB for the stack, in the .bss section (set aside by bootloader),
// with a 128-bit alignment. `= undefined` means "don't bother initializing it."
var stack: [16 * 1024]u8 align(16) linksection(".bss") = undefined;

// Explicitly don't return - if this function can return, it's a compiler error
export nakedcc fn start() noreturn {
    // Switch to the stack set aside earlier, and call kmain
    @newStackCall(stack[0..], kmain);
}

Thanks to mq32 for pointing out that the interrupt cleanup was possible!

Good backtracing, which works in freestanding code. Zig’s standard library includes decoding of the DWARF format (the debugging info used in ELF files), so a small tweak to the linker script and a custom panic function results in ridiculous debugging for a toy OS.

This is a screenshot from the kernel running under QEMU: [Image: Stack Trace Example from QEMU]

“Cross-compilation is a first class use case,” as the docs claim. For me right now, this means that I can compile this kernel under any supported OS - Windows, MacOS, Linux - with ease. There’s no need to set up a cross-compiler; literally just adding -target i386-freestanding to the compilation command line is all that’s needed.

Cons: while the Zig compiler is big, it’s not nearly as big as one might expect, given the sheer number of targets that are supported. Most of the heavy lifting is done by LLVM, so the Zig compiler itself is relatively small.

Zig has, as touched upon earlier, simply insane compile-time functionality. This is another feature worthy of a post of its own. Fortunately, someone else already covered it.

Zig is faster than C, while still being safer. Drew DeVault mentioned on his blog towards the end of March that “[Rust] solves problems by adding more language features… C solves problems by writing more C code.” This is generally true of Zig as well - the language itself is not as safe as Rust, and is not intended to be; instead, it provides simplicity.

It’s worth noting that Zig doesn’t quite meet every point that Drew brings up. Zig’s target list is identical to Rust’s since they both use LLVM, but again, it’s not quite as portable as C. Zig doesn’t have a spec, but its documentation is quickly catching up (the standard library’s documentation was released last week!). Zig currently only has one implementation, though the stage2 compiler (a Zig compiler written in Zig) is underway. However, on just about every point, Zig is either as good as or better than C.

Zig contains built-in testing. This has come in very handy, especially on the kernel. I was able to test natively that kernel code would be correct. No need to try to embed a simplistic testing framework to run within the kernel; any test that doesn’t depend on runtime values can be run natively.

Zig doesn’t have NULL pointers, but replaces them with something better. Normal pointers in Zig cannot be null.

/home/noam/t/a.zig:2:21: error: expected type '*i32', found '(null)'
    const a: *i32 = null;

Instead, Zig introduces optional types, which are good for more than just pointers. By prefixing a type name with a question mark, a type becomes optional, which means it can either hold a value or null. This has a number of advantages over existing methods.
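
To make the comparison concrete, here is a rough C sketch of what an optional value looks like when built by hand. The struct and function names are invented for illustration and are not part of any library. As far as I understand it, Zig’s ?T gives the same “value or nothing” shape, except that the compiler, rather than programmer discipline, makes you unwrap it before the value can be used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hand-rolled "optional size_t": either holds a value or holds nothing. */
struct optional_index {
	bool has_value;
	size_t value; /* Only meaningful when has_value is true. */
};

/* Returns the index of the first negative element, or "nothing" if none. */
struct optional_index find_first_negative(const int *items, size_t count) {
	struct optional_index result = { false, 0 };
	for (size_t i = 0; i < count; i++) {
		if (items[i] < 0) {
			result.has_value = true;
			result.value = i;
			break;
		}
	}
	return result;
}
```

Nothing in C stops a caller from reading value without checking has_value first, which is the gap the built-in optional closes.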

Exceptional error handling (pun intended). Zig’s error handling feels like - at least to me - what exceptions (in e.g. C++/Java) were supposed to be. Zig has error types, which are basically enumerations of what can go wrong in a function. Any function that returns an error needs to specify it in the return type using what’s called an error union, which is basically a hybrid type. Unlike in C++, there’s no hidden control flow; if you call a function a();, it cannot magically jump out of the function call into an exception handler. Zig’s error handling is very, very powerful, but importantly, it’s also very simple. One of the advantages Zig provides is that “passing along an error is the laziest thing a programmer can do.”
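
For a sense of what an error union saves you, here is a minimal C sketch of the same idea done manually. Everything in it (the enum, the struct, both functions) is hypothetical and exists only for this example. The second function shows “passing along an error” by hand: check the result, and return it unchanged if something went wrong. In Zig, that check-and-return is what the try keyword does for you.

```c
#include <stddef.h>

/* Hypothetical error set: everything that can go wrong in parse_digit. */
enum parse_error {
	PARSE_OK = 0,
	PARSE_EMPTY_INPUT,
	PARSE_NOT_A_DIGIT
};

/* Hand-rolled "error union": an error code plus a value valid only on PARSE_OK. */
struct parse_result {
	enum parse_error error;
	int value;
};

/* Parses the first character of `text` as a decimal digit. */
struct parse_result parse_digit(const char *text) {
	struct parse_result result = { PARSE_OK, 0 };
	if (text == NULL || text[0] == '\0') {
		result.error = PARSE_EMPTY_INPUT;
	} else if (text[0] < '0' || text[0] > '9') {
		result.error = PARSE_NOT_A_DIGIT;
	} else {
		result.value = text[0] - '0';
	}
	return result;
}

/* Passing an error along by hand: check, then return it unchanged. */
struct parse_result parse_and_double(const char *text) {
	struct parse_result result = parse_digit(text);
	if (result.error != PARSE_OK) {
		return result;
	}
	result.value *= 2;
	return result;
}
```

As with exceptions, the error reaches whoever wants to handle it, but every step of the journey is visible in the code.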

The defer keyword. One of the biggest sources of memory leaks in C programs is the programmer simply forgetting to free memory. Maybe there are complex paths, and the programmer thinks all allocated memory is freed in any given path but they missed one. No matter how careful you are, you will mess up eventually. That’s the premise behind smart pointers, RAII, and similar features. Zig’s method blows all of those out of the water. Imagine for a second that the following C code was valid:

int calculate_the_thing() {
	int *i = malloc(sizeof(int));
	int *j = malloc(sizeof(int) * 4);
	defer free(i);
	defer free(j);
	if (i && j) {
		if (a) {
			get_blob(j);
			if (j[2]) {
				// Even more convoluted paths...
			}
			else {
				return 18;
			}
		}
		else {
			return 3;
		}
	}
}

Notice the lack of a free before the return statements: that’s the power of Zig’s defer keyword. No matter what path is taken out of the function, free(i) and free(j) will be invoked right before the function returns. This feature is extremely useful.
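
For comparison, this is roughly what the same function has to look like in C that actually compiles today, using the common goto cleanup idiom. It is a sketch under a few assumptions of mine: get_blob is a placeholder that just zeroes the buffer, the parameter a replaces the undeclared variable from the snippet above, -1 is an arbitrary result for allocation failure, and the counting wrappers around malloc and free exist only to make leaks visible.

```c
#include <stdlib.h>

/* Counts outstanding allocations so leaks are easy to spot in this sketch. */
static int live_allocations = 0;

static void *counted_malloc(size_t size) {
	void *memory = malloc(size);
	if (memory != NULL) {
		live_allocations++;
	}
	return memory;
}

static void counted_free(void *memory) {
	if (memory != NULL) {
		live_allocations--;
	}
	free(memory);
}

/* Placeholder for the get_blob() call above: fills the blob with zeroes. */
static void get_blob(int *blob) {
	for (int k = 0; k < 4; k++) {
		blob[k] = 0;
	}
}

/* Every exit funnels through the cleanup label instead of returning directly. */
int calculate_the_thing(int a) {
	int result = -1; /* Assumed result when an allocation fails. */
	int *i = counted_malloc(sizeof(int));
	int *j = counted_malloc(sizeof(int) * 4);
	if (i == NULL || j == NULL) {
		goto cleanup;
	}
	if (!a) {
		result = 3;
		goto cleanup;
	}
	get_blob(j);
	if (!j[2]) {
		result = 18;
		goto cleanup;
	}
	result = 0; /* Even more convoluted paths would go here... */
cleanup:
	counted_free(j);
	counted_free(i);
	return result;
}
```

It works, but the cleanup sits at the bottom of the function, far from the allocations it belongs to, and every new early exit has to remember to jump there.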

Given all of these advantages, what’s the catch? Every language has flaws, as I mentioned earlier. Well, here’s a short list of things that have either bugged me or that I can easily see bugging others:

Zig contains compiler-enforced styling. In order to stop the many lengthy discussions on “tabs vs spaces”, it was arbitrarily decided that only spaces would be allowed, with the sole purpose of getting people to shut up about freaking indents. (It also enforces Unix-style newlines, but that’s probably far less of a concern.) Positive side: you don’t have to do anything in your editor or switch from tabs to spaces, the compiler comes with a lightning fast fmt (format) command that will take care of it for you (plus, for any editor that automatically uses the formatting already present in the document being edited, that’s not even needed).

Zig’s memory allocation is a complete paradigm shift - unlike in C, the standard library does not provide a single default malloc/free pair, instead providing a number of different allocators to choose from depending on the situation. Personally, I love this, but I can see this being a nuisance to others. Any standard library function that needs to allocate memory takes an allocator parameter. The upside of this is that it’s rather trivial to provide a custom allocator, and a number of useful allocators are provided in the standard library.
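
The allocator-as-parameter idea can be sketched in plain C, which may help if you haven’t seen it before. This is only an illustration of the pattern, not Zig’s actual interface: the struct layout and every name below are my own invention, and the bump allocator ignores alignment because it only ever hands out bytes for strings here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical allocator interface: callers only ever see this struct. */
struct allocator {
	void *(*alloc)(struct allocator *self, size_t size);
};

/* A bump allocator that hands out slices of a fixed buffer, never freeing. */
struct fixed_buffer_allocator {
	struct allocator base; /* Must stay first so the cast in bump_alloc is valid. */
	unsigned char *buffer;
	size_t capacity;
	size_t used;
};

static void *bump_alloc(struct allocator *self, size_t size) {
	struct fixed_buffer_allocator *fba = (struct fixed_buffer_allocator *)self;
	if (size > fba->capacity - fba->used) {
		return NULL; /* Out of memory: the caller decides what to do. */
	}
	void *memory = fba->buffer + fba->used;
	fba->used += size;
	return memory;
}

struct fixed_buffer_allocator fixed_buffer_allocator_init(unsigned char *buffer, size_t capacity) {
	struct fixed_buffer_allocator fba = { { bump_alloc }, buffer, capacity, 0 };
	return fba;
}

/* Library-style function: it never picks an allocator itself, it is handed one. */
char *duplicate_string(struct allocator *allocator, const char *text) {
	size_t size = strlen(text) + 1;
	char *copy = allocator->alloc(allocator, size);
	if (copy != NULL) {
		memcpy(copy, text, size);
	}
	return copy;
}
```

Because duplicate_string only knows about the interface, the same function works whether the memory comes from a static buffer in a kernel or from the heap in a normal program.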

Zig does not allow multiline comments, only single line comments. On a related note, triple-slash comments (///) are treated as DocComments by the compiler, and as such are an error in invalid places. This is in some ways an advantage though: as a result, Zig has line-independent tokenization. It is trivial to tokenize a line of Zig entirely out of context.

Identifier shadowing is forbidden. This is another bold decision that I personally am a fan of that others might not like. In C, this is perfectly valid:

int a = 3;
void do_the_thing() {
	char a = 'h';
}

(Sorry, I’m bad at coming up with quick examples of random code).

In Zig, the equivalent code is completely invalid: two identifiers cannot share the same name within any given scope. However, the following is perfectly fine:

fn do_the_thing() void {
	var a: i32 = 3;
}

fn blob() void {
	const b = packed struct {
		a: u32,
		c: i8
	};
	const a = b{
		.a = 0,
		.c = 1
	};
}

The best part of Zig is that through all of the many features it provides, it’s still an amazingly simple language. In the same blog post I mentioned earlier, Drew finishes with “[C’s] replacement will be simpler [than C], not more complex.” I don’t know if Zig is that language, but it definitely has a lot of potential, and I look forward to seeing how this project continues.

Thanks to Drew DeVault and Andrew Kelley for helping me revise this!

