I have to rush to finish my series before AI completely enshittificates written knowledge and the classic discipline of programming, which is based on understanding and mathematical rigour.
Yes, I seemingly overuse this word beyond the limits of good style, but I like how it is awkward and ugly and yet perfectly captures what generative AI is doing to written knowledge itself.
So, let’s try to build it all top-down, for a change. Bottom-up is probably the better way, but it takes so long before one begins to see the whole picture, which is, of course, one and the same mountain (or elephant) viewed from different angles and perspectives.
In the golden age of programming, the ancient sages discovered LISP. It has the unique property that its uniform syntactic forms (including the special forms) are represented in terms of LIST structures (the special term for this is “homoiconicity”), which means that it is not only a programming language, but also a meta-language for (and of) itself. Intuitively, this connects it to molecular biology, where, somewhat similarly, everything is “defined” in terms of just 20 amino acids.
But this is not the only non-coincidental connection. The uniform prefix notation used in LISP structurally corresponds to an AST, which is not merely a tree, but an acyclic graph of a program, and which exhibits all the universal “primitive” structural patterns or “properties” – a “step”, a “fork”, and a “join” (yes, yes, a composition, a conditional and a curried application). There is also recursion, which has a “higher-level” spiral-like complex shape.
It turns out that these “structural elements” are enough for everything – the same underlying “building blocks” are used to express proofs in logic, expressions in functional programming, and even any algorithm using the structured algorithm-charting techniques of the 70s (and electrical schematics in the EE field).
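To make this “step / fork / join” vocabulary a bit more tangible, here is a minimal sketch in Rust (the toy expression type and all names are mine, purely illustrative): the conditional is a “fork”, a two-argument node is a “join”, and ordinary composition of reductions is a “step”.

```rust
// A toy expression "graph" showing the three primitive shapes.
enum Expr {
    Lit(i64),
    If(Box<Expr>, Box<Expr>, Box<Expr>), // "fork": only one branch will be taken
    Add(Box<Expr>, Box<Expr>),           // "join": two sub-expressions converge into one result
}

// Evaluation is a reduction: each node is substituted with the value it denotes.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::If(c, t, f) => if eval(c) != 0 { eval(t) } else { eval(f) },
        Expr::Add(a, b) => eval(a) + eval(b),
    }
}

fn main() {
    let double = |x: i64| x * 2; // "step": plain composition, one reduction after another
    let e = Expr::If(
        Box::new(Expr::Lit(1)),                                               // the condition holds...
        Box::new(Expr::Add(Box::new(Expr::Lit(2)), Box::new(Expr::Lit(3)))),  // ...so this "join" is reduced
        Box::new(Expr::Lit(0)),
    );
    println!("{}", double(eval(&e))); // 10
}
```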
The question is “why?” Why are these “primitive” structures enough for everything? Why are they universal?
Well, it has something to do with the proper (and the only) philosophy, which has only one question to consider – “What Is?”. Most degens, when they hear the word “philosophy”, think of some abstract and useless “metaphysical” bullshit and hand-waving, but the real philosophy is about “reality” and “What Is”, which is not a subjective opinion, but an objective fact, which sometimes can be expressed in terms of the “universal” language of mathematics, and sometimes can only be grasped as an intuitive (but not wrong) understanding, which is what we are doing here.
There are some universal fundamental notions, which have been captured in an unambiguous human language and studied in what we call “logic”. There is something called Modus Ponens, which is a universal rule of inference that, among other things, intuitively captures the universal notion of a “necessary and sufficient condition”, which underlies everything – not just the biology of Life Itself, but the “chemistry” and “physics” that serve as its “runtime”.
Again, no one knew anything about physics, chemistry and biology (which turn out to be layered DSLs on top of each other) back then, and the only tools they had were observation, capturing of commonalities and similar appearances, generalization, abstraction and intuition.
Somehow unsurprisingly, the lambda calculus, which is enough (“necessary and sufficient”) to compute any computable function, is based on the captured and generalized notions of “abstraction” and “application” (and “bindings” and nested “scopes”).
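Just to show how little that actually is, here is the whole calculus written down as a data structure (a sketch in Rust; the names are mine): three constructors, and nothing else.

```rust
// The lambda calculus as data: variables, abstraction and application are all there is.
enum Term {
    Var(String),               // a variable, resolved in the enclosing (nested) scope
    Abs(String, Box<Term>),    // abstraction: binds a name over a nested scope
    App(Box<Term>, Box<Term>), // application: the eventual "use" of an abstraction
}

fn main() {
    // The identity function \x. x, applied to itself: (\x. x) (\x. x)
    let id = || Term::Abs("x".into(), Box::new(Term::Var("x".into())));
    let _expr = Term::App(Box::new(id()), Box::new(id()));
}
```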
Do you know that every single enzyme produced by biology is there for its “use”? In the same way, every single abstraction (in terms of the lambda calculus) is introduced for its eventual application. Unsurprisingly, there is something called Gentzen’s Introduction and Elimination Rules (of Natural Deduction), which accidentally captured the proper generalizations of the above. It seems to be yet another universal principle (at least in Biology) that something “produced” is there to be eventually “used” (eliminated or “consumed”).
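In programming terms the pairing is simply “every way to construct a value has a matching way to take it apart”. A tiny sketch (nothing here beyond the standard library):

```rust
fn main() {
    // Introduction of a conjunction (a pair) – and its elimination (projection).
    let pair = (2, 3); // intro: both parts have to be present
    let (a, b) = pair; // elim: take them apart again

    // Introduction of an implication (a function) – and its elimination (application).
    let add = |x: i32, y: i32| x + y; // intro: the abstraction is "produced"...
    let sum = add(a, b);              // elim: ...in order to be eventually "used"

    println!("{sum}"); // 5
}
```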
This abstract notion (that the logical rules come in pairs), of course, goes way deeper. Every valid rule captures some aspect of What Is, and since it actually Is, it can be used somehow, just like a proper definition can be used within a proof. If you want to dive into metaphysics, which I don’t, think of it this way: for every “plus” there is a corresponding “minus”, otherwise the dichotomy simply would not exist.
Molecular biology has something very similar – an enzyme, which is a particular structure (a sequence of amino acids “folded” into a unique “shape” or “form”), which performs certain “functions” when all the necessary and sufficient conditions are met. Not dissimilar to sequences of LISP structures (CONSes) which form particular lambda expressions, parameterized by the “arguments”.
And composition of these lambda expressions forms distinct “pathways”, quite similar to the biological “pathways” which control everything in “higher-level” systems biology (this is not just an empty analogy – the pathways use a form of “structural message passing”, where the “messages” are just particular molecular structures produced by enzymes when the necessary conditions are met, and the conditions, in turn, can be the presence of another molecular structure in a particular “slot”, so the “chaining” of reactions occurs).
Yes, this “crazy back-and-forth” is intentional. Yes, nothing here is incidental or arbitrary. What the different “blind philosophers” have captured is one and the same elephant.
The abstract algebra that we are trying to discover, which underlies logic, mathematical proofs, functional programming, sequential processes, and even partially captures Causality itself, and its “dual” Potentiality, is one and the same (just because the universe is one and the same unfolding process, and thus contradictions cannot exist in principle). This is what it is all about.
Again, this isn’t just an analogy; it’s a deep structural equivalence at the level of “structural primitives”, which we all know intuitively but never really think about in a systematic way.
The logical connectives, the algebraic data types, the structural elements of a directed acyclic graph, are one and the same “primitive patterns”, which emerge again and again at all the different levels of any properly captured abstraction, which is natural, if you think about it for a second.
These simple “forks” and “joins” can be combined (composed) at all levels to form more complex layered structures, creating a hierarchy of languages and abstractions. And this is why Layered DSLs are such a universal idea.
Where else have we seen DAGs? In the implementations of Artificial Neural Networks which support back-propagation. There is a theoretical result that a very simple such network – a Multi-Layer Perceptron (or MLP) – can, in principle, approximate any continuous function. A universality result similar in spirit to that of the lambda calculus, by the way.
Why is that? Maybe it is because the whole Universe itself, as an unfolding process (as well as any sub-process in it), can be seen as a directed acyclic graph, with its “fringe” being what we call “now”. The actual fact is that Biology, after endless trial-and-error, came up with a “representation” which can capture some aspects of What Is (everything in biology is conditioned by the constraints of the environment it happened to evolve in) inside a piece of meat called the “brain” (and the central nervous system).
A DAG looks like a universal structure, doesn’t it? At least it underlies (can be the basis of) any computation and can be “conditioned” (“trained” using simple feedback loops) to represent any possible relations. It has also been used by SPJ as the way to implement lazy pure functional languages – the so-called G-Machine, which performs “graph reduction” efficiently. Seems like not a random coincidence.
Anyway, just remember that what looks like an expanding sphere (a mature tree, seen from above) is actually a DAG. So is what you call the Universe.
Ok, more abstract data types form the basis of a higher-level DSL over the basic “graph structures”. The “abstraction barrier” is the boundary between these layers – the functional interface of a module, for instance, hides the specific implementation (the concrete DAG structure) from the user, exposing only the types (the propositions). This modularity is a direct consequence of the isomorphism, allowing for clean, stateless interfaces.
Abstraction, Modularity, Abstract Data Types – all the way back to Barbara Liskov, the 70s, 80s, and the early 90s – the golden age of programming, of ML, Scheme and CLU – the intuitive knowledge.
The ADTs correspond to the universal notion of the existential quantifier (there exists an … / such that) – hiding a concrete type behind an abstract interface, an abstraction barrier, exposing only some property (which can be captured by a predicate – such that).
The universal quantifier (forall … the property holds) corresponds to parametric polymorphism (of some functions) and to generics. Again, nothing is arbitrary or accidental.
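A sketch of both quantifiers in Rust terms (the function names are mine, for illustration): `impl Trait` in return position plays the existential – “there exists some concrete type such that it implements Iterator”, and the caller never learns which one – while a generic parameter is the universal – “for all T such that T: Ord, this holds”.

```rust
// Existential: the concrete iterator type stays hidden behind the abstraction barrier.
fn evens_up_to(n: u32) -> impl Iterator<Item = u32> {
    (0..=n).filter(|x| x % 2 == 0)
}

// Universal: "for all T such that T is Ord, the property (function) holds".
fn largest<T: Ord>(xs: &[T]) -> Option<&T> {
    xs.iter().max()
}

fn main() {
    let evens: Vec<u32> = evens_up_to(10).collect();
    println!("{:?} {:?}", evens, largest(&evens)); // [0, 2, 4, 6, 8, 10] Some(10)
}
```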
There is much more. Nesting of Sets and composition of Functions (which is also nesting) form the same universal patterns at the most abstract level, which has been noticed by some Category Theorists, etc., etc.
Okay. The Curry-Howard Isomorphism reveals that logic, mathematics, and functional programming share the same underlying “primitive” abstract structures, ultimately reducible to “forks” (branching, possibility) and “joins” (convergence, causality) in a directed acyclic graph (while evaluation is defined as a graph-reduction process based on substituting an expression with the value it denotes). These structures underpin algebraic data types, proofs, and programs, forming a hierarchy of modular, stateless “universal” abstractions.
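Here is the correspondence written down as plain Rust types (a sketch; `Either`, `Both` and `modus_ponens` are my own illustrative names, not anyone’s library):

```rust
// Disjunction A \/ B ~ a sum type (a "fork"): exactly one alternative is present.
#[allow(dead_code)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

// Conjunction A /\ B ~ a product type (a "join"): both parts are present at once.
struct Both<A, B>(A, B);

// Implication A -> B ~ a function type: given evidence for A, it produces evidence for B.
fn modus_ponens<A, B>(premise: A, implication: impl Fn(A) -> B) -> B {
    implication(premise)
}

fn main() {
    let fork: Either<u32, &str> = Either::Left(42);
    let join = Both("now", 42);
    let conclusion = modus_ponens(join.1, |n: u32| n + 1);
    let _ = (fork, join.0, conclusion);
}
```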
These (and only these) shall be the basic building blocks of our programs if we want them to be correct, manageable and short. Most of the bloat of modern programming is the direct consequence of unnecessary, redundant “intermediate” abstractions, which mismatch the actual structure of the complexity of the domain, appear at a wrong level of abstraction (usually too low), and require unnecessary, redundant, error-prone conversions back and forth between them.
The math majors who designed the ML family of languages understood this intuitively, and designed the languages appropriately. For instance, for any “introduction” of a proper sum type (a tagged disjoint union) there is at least one corresponding “elimination” using structural pattern-matching (on the tags), so no “case” can be left out or forgotten in principle. Options (Maybe) and Results are just proper specializations (of the universal notion of an algebraic sum type).
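A sketch of that discipline in Rust (the `Shape` type is mine, purely for illustration): every data constructor introduced by the sum type must be handled by the match, or the compiler rejects the program outright.

```rust
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

// Elimination by structural pattern matching on the tags.
// Remove any arm below and it becomes a compile-time error: no case can be forgotten.
fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}

fn main() {
    // Option and Result are the same idea, specialized to "absence" and "failure".
    let shapes = [Shape::Circle { radius: 1.0 }, Shape::Rect { w: 2.0, h: 3.0 }];
    let areas: Vec<f64> = shapes.iter().map(area).collect();
    println!("{:?}", areas); // [3.141592653589793, 6.0]
}
```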
Currying and partial application are just another instance. We define a curried lambda abstraction as a “join”, and it acts as one when eventually being applied – composed with another such lambda abstraction.
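Rust has no built-in currying, but the idea is easy to sketch with closures (the names are mine): each application fills one “slot”, and the result remains a function until all the conditions are met.

```rust
// A curried two-argument function: a chain of one-argument abstractions.
fn add(x: i64) -> impl Fn(i64) -> i64 {
    move |y| x + y
}

fn main() {
    let add3 = add(3);    // partial application: one slot filled, one still open
    let result = add3(4); // the last condition is met – the "join" finally fires
    println!("{result}"); // 7
}
```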
“Everything is an expression which denotes a value of a particular type” (no statements or commands) is another universal principle from mathematics and logic. Immutability of both variable bindings and data leads to referential transparency and (as a direct consequence) to persistent, “immutable” data structures.
Classic Functional Programming is a well-developed and well-understood subject, so I won’t restate all the principles here (I already did so many times).
Here is some more philosophy (everyone loves some non-bullshit philosophy!). Do you recall when Gerald Jay Sussman, the great wizard, cut his fingernail on live TV and asked “whether or not he is the same entity?!” It was so dramatic, I still feel it after so many years. Mr. Sussman mocked philosophy that day, maybe only to realize that one day philosophy will mock him. Anyway, the answers are simple – one is that no two entities of the same kind can occupy the same location unless they are nested (nesting is a very subtle topic – one of the most universal universals, and one has to be very careful). This principle is as good as any mathematical or geometric axiom, as good as “no two values can be located at the same address” – except that in the crappy imperative programming context they can! They will be destructively modified, with information loss, behind your back!
When you violate some fundamental philosophical principle, all bets are off. Such embarrassing things never arise in the context of mathematics and logic, so they shouldn’t arise in the context of serious programming.
Another answer, by the way, is that Mr. Sussman is a process, not a static “object”, so he continuously undergoes changes and transformations within – he continuously “unfolds”, like any other process, so cutting off a piece of a nail is just another change or transformation among many. The location-based identity of a process remains the same. This is what a classic `ref` type from FP properly captures – the reference remains the same, while changes may occur within [the particular location in memory] in a strictly ordered manner.
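A rough Rust analogue of the classic ML-style `ref` cell, using `std::cell::Cell` as the stand-in (a sketch, not a recommendation to sprinkle state around): the binding – the identity, the “location” – never changes, while its contents are replaced in a strictly ordered manner.

```rust
use std::cell::Cell;

fn main() {
    // The "entity" is the cell itself; its contents are the current state of the process.
    let nail_length_mm = Cell::new(12);

    nail_length_mm.set(nail_length_mm.get() - 3); // a transformation "within"
    nail_length_mm.set(nail_length_mm.get() + 1); // another one, strictly after the first

    // Same identity (same location), a different value observed "now".
    println!("{}", nail_length_mm.get()); // 10
}
```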
Now consider a particular apple – what color does it have? Well, it is green at first, then it is red when it is ripe, and then it may become brown or even black eventually. The apple is the same, the color is different – how do we cope with this? The answer is that the color of an apple is only meaningful when we actually observe it, and when we eventually observe it, its color will be whatever it is. The trick is that it will always be “now”, not in the past or in the future, because there is always now (after now, after now).
This is how Functional Programming is actually “timeless” – we just declare that the color of an apple has to be observed eventually, and then compared to (or matched with) some pre-defined and anticipated [set of] possible outcomes, with some “otherwise” clause, perhaps. Only one of the potential clauses will be “selected”, and only one of the potential “branches” (paths) will be evaluated (taken). The whole expression will be substituted with the value that the selected branch reduces to.
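A sketch of this “observe now, then match” idea (the `Color` type and `observe_color` are hypothetical stand-ins for a real observation):

```rust
#[allow(dead_code)]
#[derive(Debug)]
enum Color {
    Green,
    Red,
    Brown,
}

// A stand-in for "observing the apple now"; in a real program this value
// would come from the outside world at the moment of observation.
fn observe_color() -> Color {
    Color::Red
}

fn main() {
    // Only one branch is taken, and the whole match expression is substituted
    // with the value that the selected branch reduces to.
    let verdict = match observe_color() {
        Color::Green => "not ripe yet",
        Color::Red => "ripe",
        _ => "too late", // the "otherwise" clause
    };
    println!("{verdict}"); // ripe
}
```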
Time is just an abstract “scale” of equally spaced “notches” (just like any other scale) superimposed by the mind of an observer onto some observation of some aspect of What Is, and this is another very subtle topic. No observer, no Mind – no time. Math and logic, being collections of properly captured and generalized abstractions, have no notion of time. Neither should our “serious” programs.
Let’s try to see why this is the only right way to program, as opposed to mere coding, like that in webshit or some other imperative crap generated by an AI.
Let’s start with Category Theory! Who wouldn’t like to be Bartosz, slowly and methodically explaining the inhabitants of the highest realms of abstraction to a bunch of idiots.
- Functors map between layers of abstraction (e.g., from one DSL to another).
- Monads are higher-level abstractions (ADTs) over “forks” and “joins” on a DAG.
- Composition is not arbitrary, and it has its “laws” (rules), which have to be preserved.
The most interesting and useful part of Category Theory is not the categories themselves, but what they are built (constructed) from and how. There are, unsurprisingly, some informally defined “Abstract Data Types”, specified in terms of abstract mathematical interfaces (sets of function type-signatures), such as Monoid and Functor.
The DAG structure corresponds to a category itself, where nodes are “objects” and edges are “morphisms”. “Forks” and “joins” are specific morphisms – co-products (sums) and products – which define the overall structure of the category.
- Forks (sum types, disjunctions) capture possibility: a system can take one of several paths (e.g., `Left` or `Right` in `Either`).
- Joins (function types, conjunctions) capture causality: a system requires all inputs to produce an output (e.g., a curried function application).
- Product types are another kind of “join”, which captures the universal notion of “being present in the same physical locality”, which implies “at the same time”.
- Partial application (when not all the parameters are present – not all the slots of an enzyme are filled) captures the notion of… well, “not all necessary conditions are met”.
These are the smallest pieces of the DAG, forming the building blocks of all logical and computational structures. Higher-level abstractions (e.g., lists, trees, monads) are compositions of “forks” and “joins” (on a DAG), organized hierarchically.
At a higher level (above the DAG) the Algebraic Data Types form a hierarchy of abstractions, where higher-level types are built from lower-level ones, creating modular, composable systems.
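A small sketch of what such a higher-level abstraction looks like in practice: `Option` packages the Some/None “fork” behind a stateless interface, `map` (the Functor part) and `and_then` (the Monad part) compose the “steps”, and the pipeline reads as if the fork were not there at all. The helper names (`parse_number`, `half`) are mine.

```rust
// Each step is a small pure function; the Some/None "fork" is threaded through
// by the abstraction itself, so the pipeline reads as a straight chain of "steps".
fn parse_number(s: &str) -> Option<i64> {
    s.trim().parse().ok()
}

fn half(n: i64) -> Option<i64> {
    if n % 2 == 0 { Some(n / 2) } else { None }
}

fn main() {
    let result = parse_number(" 42 ")
        .and_then(half)       // monadic bind: the next step runs only on Some
        .map(|n| n + 1);      // functor map: lift a plain function into the pipeline
    println!("{:?}", result); // Some(22)
}
```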
Why wouldn’t they also define the structure of our programs? They do in the better, well-designed languages, which have proper support for algebraic data types – tagged disjoint unions and universal structural pattern-matching on tags, along with curried functions and universal partial application. The products (the record or “struct” types) are not such a big deal. All the subtleties are within the data constructors for a sum type (a co-product), each of which corresponds to a “tag” within a type (which, by the way, is just a set of all possible values), and in their “natural” parameterization, the constructors being pure functions.
Type-classes and “traits” come as a straightforward level of abstraction over types (the notion of “such that”).
So, how do we actually program, given all this “abstract BS”? This, actually, is the easiest question – we program with the proper algebraic data types inside abstract data types. End of story.
- package the concepts from the “problem domain” into self-contained, orthogonal, loosely coupled modules, one at a time.
- export the simplest, minimal, high-level, stateless abstract interfaces from each such module (proper non-leaking abstractions)
- use distinct layers of such abstractions, separated by clear abstraction boundaries (stateless high-level interfaces)
- these layers have to be in a one-to-one correspondence with the structure of the problem, otherwise it will all be wrong and flawed.
- make sure that the two hierarchies (of the problem and of the program) match perfectly (1-to-1) at all levels of abstraction.
- develop several layers of REPL-friendly functional DSLs, each of which only uses abstractions one level below it – enforce the abstraction barriers
- go “mostly functional” and always return a new value from a function (unless it is a 10000x10000 matrix or something)
- use the “functional idioms” and the advanced typing support which underlies these idioms.
- avoid any hidden mutable state “encapsulated in objects” like the plague. Use `ref` types if you must have some state. Stateful OOP is flawed in principle.
- use “plain” structured data values, like the “on-the-wire representations”. Remember that biology “works” because every molecule of a given kind is identical.
- use iterators over collections and chaining of operators (methods) instead of imperative looping constructs over indices.
- make user-defined types collection-like, by implementing standardized interfaces (traits, typeclasses).
- make your collections immutable (as in Scala or Clojure) and thus your data structures persistent
- use the Option (Maybe) and Result types consistently everywhere (no nulls in principle, no “forgotten” checks)
- avoid “naked” primitive type values – always use “new type” wrappers with bounds checking
- use “smart constructors” to make an illegal state unrepresentable (yes, yes, that very meme – see the sketch after this list)
- use type-classes (or traits) to further constrain overly general types. The more constraints – the better.
- use the pure subsets of a language (even of Rust) to maintain the referential transparency of your code.
- avoid muts in Rust in principle, use “moving pipelines” (which “consume”) instead.
- use wrapper “new types” to extend or further restrict other types by implementing some “missing” traits.
- use the standard mathematical notions packaged as type-classes or traits (Eq, Ord, Functor, Monoid). They underlie everything.
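A compact sketch pulling a few of these together (the domain – temperature readings – and every name in it are mine, purely illustrative): a “new type” wrapper, a smart constructor returning a `Result`, derived standard traits, and an iterator pipeline instead of an index loop.

```rust
// A "new type" wrapper: no naked f64s escape into the rest of the program.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Celsius(f64);

impl Celsius {
    // Smart constructor: an illegal state (below absolute zero) is unrepresentable.
    fn new(value: f64) -> Result<Celsius, String> {
        if value < -273.15 {
            Err(format!("{value} is below absolute zero"))
        } else {
            Ok(Celsius(value))
        }
    }
}

fn main() {
    let readings = ["21.5", "-300.0", "oops", "19.0"];

    // Iterator chaining instead of an index loop; nothing is mutated in place.
    let valid: Vec<Celsius> = readings
        .iter()
        .filter_map(|s| s.parse::<f64>().ok())  // parse failures simply drop out
        .filter_map(|v| Celsius::new(v).ok())   // impossible states never get in
        .collect();

    println!("{:?}", valid); // [Celsius(21.5), Celsius(19.0)]
}
```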