It is a peculiar experience to know more and have a better understanding than the CS celebrity whose book you read as a teenager.
As I wrote before, C was the PHP of the 80s, if you will – a bunch of clever hacks (whereas PHP has nothing clever about it and was a Fractal Of Bad Design by unqualified and ignorant amateurs) which allowed one to quickly “get shit done” in a way that wasn’t possible before, which is the reason behind its popularity. It also appeared to be way more “cool”, “practical” and clever than the “theoretical” Algol or PL/I.
While some hacks were, indeed, clever (at the time), and the “pragmatic” aspect cannot be denied, the overall design was by no means consistent, systematic or even well thought out, unlike, let’s say, CLU or ML, which were already “Out There”. The mature LISP implementations were also completely ignored.
Let’s try to see things as they [really] are.
C is the ultimate low-level language: it is “high-level” only relative to assembly and machine-level data representations. When one programs in C, one is forced to think and model in terms of representations and implementation details, just like in an assembly language. Yes, C generalized a lot – the imperative control-flow constructs and the primitive types, notably arrays and machine-level, array-based ASCII strings – but all of this was, again, in terms of the details of the underlying hardware and data representations.
The really big, fundamental ideas of truly high-level programming – Abstract Data Types and data abstraction (Liskov, CLU) and abstract control flow via composable abstract interfaces (ML, LISP) – were completely ignored.
This results in poor support for abstraction and non-leaking ADTs in C, with the focus on how everything maps to underlying hardware representations (how the crappy enums – instead of proper algebraic sum types – map to integers, how structs map to memory layouts, etc.).
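To make the contrast concrete, here is a minimal sketch of a proper sum type (in Rust, purely as an illustration; the type and its variants are hypothetical):

```rust
// A proper algebraic sum type: each variant carries its own payload,
// and the compiler forces exhaustive handling of every case.
// A C enum, by contrast, is just an integer in disguise.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}

fn main() {
    println!("{}", area(&Shape::Circle { radius: 1.0 }));
}
```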
The “pragmatic” C programmers, who are not aware of the proper theoretical FP concepts derived from advanced abstract algebra – of the very way in which these generalized algebraic structures have been captured and defined – are forced to think in terms of these low-level details and representations, which is a leakage of abstraction and a violation of the abstraction principle itself. Just like webshit coders were unaware of what a fucking abomination PHP 2.0 was.
Again, this is not optional, you know. This is a fundamental flaw of the language design – no proper support for abstraction, in principle. There is no way to program in a high-level way, in terms of abstract data types and abstract control flow, even if one wishes to – everything leaks into everything else, literally.
The last 60 years of programming language semantics research, however, have converged on the conclusion that the only way to properly support abstraction is through a systematic and consistent design of the language semantics, which is what modern languages like ML, Haskell, Scala, Rust, etc. do.
The most important principle is that one has to think, model and program at the same high level of abstraction as the concepts of the problem domain, and not in terms of the conceptually irrelevant underlying implementation details and machine representations. Ideally, there should be a one-to-one mapping between the concepts of the problem domain and the data abstractions defined in terms of the programming language semantics. “Data dominates”, you know.
Not just that, but it was well understood that Algebraic Data Types – sum types, product types and function types – are all you need, and that these have to be truly abstract, parameterized and composable to be able to model any complex data abstraction in a systematic way. This is what ML, Haskell, Scala, etc. do.
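A hedged sketch of all three building blocks composed (again in Rust; the names are illustrative, not from any real codebase):

```rust
// Product type: a record bundling values together.
struct Point { x: f64, y: f64 }

// Sum type, parameterized: either a value or an error message.
enum Outcome<T> {
    Ok(T),
    Err(String),
}

// Function type: a transformation taken as a first-class value.
fn map_outcome<A, B>(o: Outcome<A>, f: impl Fn(A) -> B) -> Outcome<B> {
    match o {
        Outcome::Ok(a) => Outcome::Ok(f(a)),
        Outcome::Err(e) => Outcome::Err(e),
    }
}

fn main() {
    let d = map_outcome(Outcome::Ok(Point { x: 3.0, y: 4.0 }),
                        |p: Point| (p.x * p.x + p.y * p.y).sqrt());
    if let Outcome::Ok(n) = d { println!("{n}"); } // prints 5
}
```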
The last real major innovation was the formulation of type-classes by Wadler, which is just a very clever formalization of the universal mathematical “such that” at the level of abstract interfaces – proper bounds on sets of abstract interfaces. Notice that “abstract” here is absolutely necessary. Notice also that all the “individual components” of the type-class formulation are just well-understood, rigorous mathematical notions, such as Equality or an Ordering.
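In Rust terms (traits standing in for type-classes), a minimal sketch of such a bound:

```rust
// `T: Ord` is the "such that" constraint: the function accepts any T
// such that a total ordering is defined for it.
fn largest<T: Ord>(items: &[T]) -> Option<&T> {
    items.iter().max()
}

fn main() {
    assert_eq!(largest(&[3, 1, 4, 1, 5]), Some(&5));
    assert_eq!(largest::<i32>(&[]), None);
}
```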
Even without these, the use of proper Algebraic Data Types alone (plus the “new-types”) leads to an enlightening experience, as in Scott Wlaschin’s “Where Is The Code?”. There is nothing random or arbitrary in it.
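A quick illustration of the “new-type” idea (the units here are hypothetical, chosen only to show the pattern):

```rust
// Newtypes: zero-cost wrappers that turn invalid mixing of plain
// numbers into a compile-time type error.
struct Meters(f64);
struct Seconds(f64);

fn speed(d: Meters, t: Seconds) -> f64 {
    d.0 / t.0
}

fn main() {
    // speed(Seconds(9.58), Meters(100.0)); // rejected at compile time
    println!("{} m/s", speed(Meters(100.0), Seconds(9.58)));
}
```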
One more time – all the errors, complexity, verbosity and imperative “unsafety” arise from mixing irrelevant levels of abstraction: high-level concepts of the problem domain with low-level language and implementation details and representations. This is the source of Java’s retarded verbosity, C’s inherent unsafety, C++’s explosive complexity, etc. Staying abstract and high-level, while maintaining the one-to-one correspondence between the hierarchy of the layers of complexity in the problem-domain concepts and the modular, layered structure of the hierarchy of abstractions in terms of a programming language, at the same level of abstraction, is not optional – it is a fundamental requirement for any non-trivial software system.
Yes, here I am basically paraphrasing Barbara Liskov’s book, but that does not make these principles any less fundamental, or any less ignored by C’s designers. Kernighan embarrassed himself as an ignorant old fool (ignorant of all the fundamental results of the last 60 years of programming language semantics research).
This is not the end of the story. The proper understanding of “what Rust did right”, and why it is the right way (and the only way, which cannot be refuted), is the next level up.
Let’s face some reality for a change. Strings are no longer ASCIIZ. Period. The type “char” itself, and the assumption that 1 char = 1 byte, are no longer valid. This breaks “C strings” and all the clever imperative looping hacks and indexing tricks around them. It is gone; just accept reality as it is. The far more complicated proper Unicode strings require proper abstraction barriers and proper, non-leaking abstractions (high-level libraries based on ADTs), but C lacks the required semantics and even the necessary typing support – both C and C++ have primitive, ad-hoc, unsound type systems.
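A quick Rust sketch of why 1 char ≠ 1 byte once strings are UTF-8:

```rust
fn main() {
    let s = "héllo, мир";                      // UTF-8: non-ASCII chars take 2+ bytes
    println!("bytes: {}", s.len());             // 14: length in bytes
    println!("chars: {}", s.chars().count());   // 10: Unicode scalar values
    // Byte-level indexing can land inside a character:
    // let t = &s[1..2]; // panics at runtime: not a char boundary
}
```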
Processes are no longer totally isolated. Enforced shared mutability destroys all the naive assumptions about the validity of data in an imperative, memory-location-overwriting (destructively updating) language.
Any process can be “swapped out” between any two imperative instructions (by the kernel), and god knows what will happen before it resumes, so all the assumptions baked into an imperative code flow are broken – by the time the very next imperative command executes, the data could have been invalidated in many ways.
Your data can be moved in memory at a whim (by a reallocating allocator, a compacting runtime, or the OS), invalidating your addresses and any address arithmetic – but not the offsets, which are the proper notion of a reference within a more realistic, RAII-like, restricted memory model.
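A hedged Rust sketch of “offsets, not addresses”:

```rust
fn main() {
    let mut v = Vec::with_capacity(1);
    v.push(10);
    let idx = 0;                        // an offset: survives relocation
    let addr = &v[0] as *const i32;     // a raw address: may not
    v.push(20);                         // may reallocate and move the buffer
    println!("{}", v[idx]);             // fine: the offset is still meaningful
    // unsafe { println!("{}", *addr); } // UB if the buffer moved
    let _ = addr;
}
```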
As a direct consequence of the above, all structs with malloc’ed addresses in them (rather than the proper “abstract” offsets) are fucked up in principle. The resulting UB is a matter of when, not if.
In short, the simplistic “C model” is totally broken, so all its assumptions are no longer valid, and all the clever hacks – imperative looping and indexing, the char and string “abstractions” – are flawed and, indeed, unsafe. It is that simple.
Notice that the imperative, memory-location-based, OO-meme’d C++ cannot solve the “invalidation problem”: unrestricted (by the type system) arbitrary mutation of arbitrary (unrestricted) shared stateful “objects” (potentially containing addresses of other mutable data and a “hidden state”) constitutes the very same intractable (unsolvable in principle) problem, even with the RAII meme (but an unsound, permissive type system). This is a well-understood, fundamental finding, which goes all the way back to Joe Armstrong and the Erlang research, and even to the foundations of the classic ML – mutation and concurrency cannot safely coexist, in principle.
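Rust turns this invalidation problem into a compile-time error; a minimal sketch:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];   // shared borrow into the buffer
    // v.push(4);        // ERROR: cannot mutate `v` while `first` is alive –
    //                   // push may reallocate, moving the data out from
    //                   // under the reference
    println!("{first}");
}
```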
Yes, yes, we have a whole Windows 11, with billions of lines of imperative OO C++ crap – but every once in a while… everyone gets properly fucked.
Notice also that no such problem ever arises in properly designed languages (designed by actual math and occasionally physics majors), runtimes included, with the principal goal of maintaining the fundamental Referential Transparency property, from which all the necessary semantic properties simply follow (not by random coincidence, of course). And, again, given that, the proper Algebraic Data Types, packaged into non-leaking ADTs with modules that establish and enforce clear, impenetrable (as in a Monad) abstraction barriers, are enough for everything.
This, by the way, is as valid as Set Theory.
And as for C, everything that was, indeed, very cool and clever in the late 70s – the “generalized” imperative looping and indexing constructs for manipulating weakly-typed memory locations and “ASCIIZ C strings” – on all the simplistic hardware of the time, is no longer relevant or even usable. Just try to comprehend these simple statements of fact.
Once the underlying hardware and OS “guarantees” and representations no longer hold, everything collapses, and continuing to use an inherently, by-design low-level language for high-level ADT-based programming (for which it is not suitable at all and was never designed) is just plain, dangerous stupidity, fueled by the Sunk Cost Fallacy and a simple refusal to see things as they are, to avoid the pain of the Cognitive Dissonance (with What Is).
So, what Rust did “just right” (a sketch illustrating several of these points follows the list):
- the move semantics by default (one of its major innovations), which prevents the nasty and subtle aliasing bugs with which even Java is infested.
- formalizing and restricting the semantics of references to be offsets only, to cope with the possibility of the data being moved to a different memory location.
- establishing a clear type-level distinction between mutable references (modeled after the FP ref types) and immutable references (pointers as offsets, not addresses).
- enforcing, at compile time (at the AST level), the necessary constraint of either at most one mutable reference or any number of read-only “references”.
- adding traits – Wadler’s type-classes, slightly misunderstood – and building the standard library on top of them (another major innovation: the necessary bounds on abstract interfaces).
- the emphasis on the iterator abstraction (B. Liskov, again) and on other standardized method chaining and composition of abstract interfaces.
- the impl blocks, which are a way of reducing cognitive load by clearly separating the implementation details from the (more-or-less) abstract interfaces.
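Here is that compact, hedged sketch of several of these decisions at once (the names are hypothetical):

```rust
#[derive(Debug)]
struct Account { balance: i64 }

// An impl block: implementation details kept apart from the data definition.
impl Account {
    fn new(balance: i64) -> Self { Account { balance } }
    fn deposit(&mut self, amount: i64) { self.balance += amount; }
}

// A trait bound as an abstract interface: any T "such that" it is Debug.
fn report<T: std::fmt::Debug>(x: &T) { println!("{x:?}"); }

fn main() {
    let mut a = Account::new(100);

    {
        let r = &mut a;     // at most ONE mutable borrow at a time
        r.deposit(50);
        // let r2 = &a;     // ERROR while `r` is still in use
    }

    report(&a);             // any number of shared borrows afterwards

    let b = a;              // move semantics: `a` is consumed here
    // report(&a);          // ERROR: borrow of moved value `a`
    report(&b);
}
```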
The list is by no means exhaustive, but even these alone are major, fundamental advances over C, C++, or any other Algol-style, memory-location-based imperative language. This is what Kernighan missed out on and is ignorant of.
Together, these mutually complementing decisions at the level of the language semantics allow one to define and implement data abstractions out of proper fundamental building blocks (slightly crippled Algebraic Data Types) and to stay high-level, so that the one-to-one correspondence between the layers of abstraction (and the necessary abstraction barriers) of the domain and of the code is established and maintained, and the code is less cluttered with irrelevant low-level implementation and representation details – just as per the fundamental Abstraction Principle of Barbara Liskov. Now you know this too.