DESCRIPTION: Idiots, idiots everywhere.jpg
I am getting old and my cognitive abilities are slowly declining, and I will have no pension or social security, so I need a “Turing Award” or something (only partially kidding). And yes, I have “hacked” these “longevity” memes and “protocols”, but it seems that fighting an increase in entropy and the second law of thermodynamics is not as efficient as some meme guys on YouTube claim.
So, let’s settle one fundamental meme-question once and for all, so we can let it all go and focus on “bio-hacking” and whatnot.
There is a notion of a vector in a 2D space. It can be conceptualized (thought of) as the Cartesian coordinates of a Point on a plane, and/or as the distance from the Origin to that Point on such a plane. Notice that the Origin can be set arbitrarily, anywhere; the only requirement is that all the notches (of an arbitrary length) have to be exactly equal.
In a 3D space the same principles hold. The vector can be imagined and visualized as an arrow from the Origin to the Point.
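To make the two views concrete, here is a trivial sketch in plain Python (the coordinates are made up; nothing here goes beyond ordinary arithmetic):

```python
import math

# A 2D point as Cartesian coordinates relative to an arbitrary Origin.
point = (3.0, 4.0)

# The same pair of numbers read as a vector: an arrow from the Origin.
# Its length is the Euclidean distance from the Origin to the Point.
length = math.sqrt(point[0] ** 2 + point[1] ** 2)
print(length)  # 5.0

# In 3D the same arithmetic applies, with one more coordinate.
point3d = (1.0, 2.0, 2.0)
length3d = math.sqrt(sum(c ** 2 for c in point3d))
print(length3d)  # 3.0
```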
Now, some abstract bullshitters (and thus the socially constructed and maintained “consensus”) claim that the same notions of a “distance” would hold in “higher-dimensional spaces”. The exact wording is that “the mathematical notions are still valid for \[n > 3\]”.
This is, of course, abstract bullshit. Adding another imaginary dimension (by just extending a vector with one more number) is an illegal operation. What is produced is a purely imaginary abstract system, which has no existence apart from being built in the way I just described – by adding one more number to a vector.
Yes, the operations of addition (combining) and multiplication (scaling) would “work”, and all the mathematical properties of the resulting abstract system will hold; the math is perfectly valid. This mathematical fact, however, does not imply any existence or applicability whatsoever.
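The purely mechanical nature of “adding one more number” is easy to demonstrate: component-wise addition and scaling are length-agnostic, so they “work” for any number of components (a sketch with made-up values, not an endorsement of any interpretation):

```python
def add(u, v):
    # Component-wise addition works for any number of components.
    return [a + b for a, b in zip(u, v)]

def scale(k, v):
    # Scaling multiplies each component; again, length-agnostic.
    return [k * a for a in v]

v3 = [1.0, 2.0, 3.0]
v4 = [1.0, 2.0, 3.0, 4.0]      # "one more number" added
print(add(v4, v4))              # [2.0, 4.0, 6.0, 8.0]
print(scale(0.5, v3))           # [0.5, 1.0, 1.5]
```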
LLM bullshitters claim a “constructive” existence (by construction) of an n-dimensional space in which each “token” is a Point. They claim that in such a space the notions of a distance (from the Origin, and between any two points) are valid notions. The fact is that this “existence” is as imaginary as the n-dimensional spaces themselves. So are the “distances”.
The whole gospel of abstract explanations of what LLMs are is just the same as explanations of the existence of gods and angels and whatnot. The clever trick is that math is involved, and the math itself can be verified to be correct (which only means that applying the well-known rules produces an expected answer).
What are LLMs, really? The answer is not in the realm of imagined abstract systems, but in the actual algorithms, representations and code being used, because this is what is real.
So, what are the algorithms? A process of systematic updates of “weights”, no more, no less.
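Stripped of everything else, one such “systematic update” is a single line of arithmetic. A minimal gradient-descent step, with made-up numbers:

```python
# One gradient-descent step on a single weight: w <- w - lr * gradient.
def update(weight, gradient, lr=0.1):
    return weight - lr * gradient

w = 0.5
g = 2.0          # gradient of some loss with respect to w (made up)
w = update(w, g)
print(w)         # 0.3
```

Repeat this over every weight, many times, and that is the whole “training” process.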
There is a proper explanation: inside the brain, which is a tissue, two basic processes occur at the level of individual neuron cells (among many others) – myelination of the axons and transformation of the “synaptic gaps”, which affect the rate of “firing”. This is how “conditioning” actually works – the brain is a “muscle”, after all.
Now pay attention. Evolution came up with this “arrangement” without any n-dimensional bullshit. The process of biological evolution at the level of cell biology has no notion of a “number” or of “counting”, and thus no “overflows”, no “counters”, no notion of time (as an abstraction of evenly spaced intervals), and no clocks. And “it just works”.
Similarly, by just systematically updating “weights” or “gradients” on what is essentially a DAG (which is what biological neural nets are) everything can be captured and “represented”. This is the ultimate finding of non-bullshit Artificial Intelligence – that every computable function can be approximated by a simple neural net, and that the “backprop” process captures the notion of biological feedback loops and thus in a sense is “universal”. This is exactly what the Nobel has been given for.
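The “weights plus feedback loop” idea can be shown at toy scale: a one-weight “net” fit to samples of y = 2x by repeated error-driven updates (all numbers here are made up; this is a sketch of the feedback principle, not of a real LLM):

```python
# Toy "net": y_hat = w * x. The feedback loop: measure error, nudge w.
w = 0.0
lr = 0.1
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x

for _ in range(50):
    for x, y in data:
        y_hat = w * x
        grad = 2 * (y_hat - y) * x   # d/dw of the squared error
        w -= lr * grad               # the systematic weight update

print(round(w, 3))  # converges to 2.0
```

No geometry was consulted at any point; the loop only ever touches concrete numbers stored in concrete variables.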
Notice that another “universal” notion is that of a “fixed point”, which captures the process of “convergence to What Is” – an essential part of any evolutionary process of trial and error.
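A “fixed point” in the most literal sense: keep applying a map until the output stops changing. The classic illustration is the Babylonian map for square roots, whose fixed point satisfies x = (x + 2/x)/2, i.e. x = √2:

```python
# Fixed-point iteration: apply f until x stops changing (within tolerance).
def fixed_point(f, x, tol=1e-12):
    while abs(f(x) - x) > tol:
        x = f(x)
    return x

# Babylonian map for sqrt(2); convergence to the fixed point is rapid.
root = fixed_point(lambda x: (x + 2.0 / x) / 2.0, 1.0)
print(round(root, 6))  # 1.414214
```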
Again – evolution achieved its results at the level of brain structures without any notion of the existence of any bullshit n-dimensional spaces. They are irrelevant and ultimately non-existent. Yes, it is possible to make up a story, and to prop up that story with valid rules of mathematics, but this does not make the story any more “real”.
Yes, it is possible to say that there are n-dimensional vectors in an n-dimensional space, but that would be just saying – a story. A gospel.
In reality there is a particular “information artifact” – a concrete data structure which represents a fully-connected neural network, with each edge having a particular weight associated with it. The artifact itself has been loosely modeled on how we have so far discovered, and currently think, the brain works.
There are two distinct sets of algorithms – one for producing (or training) such an information artifact (an actual data structure), and the other for using (or doing “inference” on) such a pre-trained data structure. Notice that the actual algorithms, not the imagined abstract explanations, are the “ultimate source of truth”.
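The training/inference split, reduced to its bare shape (the names `train`, `infer` and the tiny linear model are hypothetical illustrations; the point is that the “artifact” is just a plain data structure that can be written to and read from a file):

```python
import json

# Training: produce the "information artifact" (here, a dict of floats).
def train(data, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}          # this dict IS the artifact

# Inference: use the pre-trained artifact; no training logic involved.
def infer(artifact, x):
    return artifact["w"] * x + artifact["b"]

artifact = train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])  # samples of y = 2x + 1
blob = json.dumps(artifact)          # serialize: only the file is real
loaded = json.loads(blob)
print(round(infer(loaded, 10.0), 2))  # 21.0
```

Notice the asymmetry: `train` needs the error feedback and the update rule; `infer` only needs the stored numbers.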
Conclusion? Every time you see some Chuddie on the Internet write a blog post about n-dimensional vectors in n-dimensional spaces, clustering, “distances” and whatnot, just tell them to kindly fuck off. Nothing of what they spew out to attract attention and to claim (an unwarranted) higher social status of being “smart” and “valuable” is real.
The fact that the tokens are “one-hot” encoded does not imply that the whole interpretation (the story) is valid. It has to do with the fact that matrices are used to represent the data in order to perform the fundamental “weighted sum” operations efficiently. Alternative interpretations and representations (as mere weights on a DAG) are even “more valid”, but the point is that no abstract interpretation is necessary for the whole thing to work, just as it is not necessary for any biological system.
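The “one-hot” point is easy to check by hand: multiplying a one-hot vector by a matrix just selects one row, i.e. a plain table lookup dressed up as linear algebra (toy numbers):

```python
# A tiny "embedding matrix": one row of weights per token.
E = [
    [0.1, 0.2, 0.3],   # row for token 0
    [0.4, 0.5, 0.6],   # row for token 1
    [0.7, 0.8, 0.9],   # row for token 2
]

one_hot = [0, 1, 0]    # "token 1", one-hot encoded

# one_hot @ E, written out as the underlying weighted sums.
result = [sum(one_hot[i] * E[i][j] for i in range(3)) for j in range(3)]
print(result)           # [0.4, 0.5, 0.6]

# Identical to a plain row lookup:
print(E[1])             # [0.4, 0.5, 0.6]
```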
Only the actual data-structure within a given file is real, and the code that works with it according to a particular pre-defined algorithm.