The LLM mania is still going on, with no sign of the bubble bursting. This will be (already is) way larger than even the DotCom bubble. Grab your popcorn.
I already wrote this on the old site, and, of course, because I haven’t followed the rules I got “canceled”, as they do nowadays with anyone who disagrees with their current set of beliefs.
Let’s talk about it again, even with millions of views behind every Karpathy or Fridman video.
We have to start with Deep Learning – what it actually is (instead of what all the normies think it is and what the talking heads tell us it is).
When we consider what is actually going on, we find a two-stage process: constructing a mathematical artifact (with a particular representation), and then using (“prompting”) this artifact to output (generate) some information.
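To make the two stages concrete, here is a minimal sketch in Python – a toy least-squares fit standing in for the training run. The names `generate` and the data are illustrative assumptions, not any particular library’s API:

```python
import numpy as np

# Stage 1: construct the artifact -- fit fixed parameters to labelled data.
# Here the "artifact" is just a least-squares line; for an LLM it is
# billions of weights, but the two-stage structure is the same.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])   # labelled data (y = 2x + 1)
w, b = np.polyfit(xs, ys, deg=1)      # the "training" step

# Stage 2: use ("prompt") the frozen artifact to generate output.
# Nothing is learned here; the parameters never change.
def generate(x):
    return w * x + b

print(generate(10.0))                 # -> approximately 21.0
```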
The vague terminology is what sells and generates hype (and profits) – just like any organized religion, but it is exactly what destroys the meaning and creates dogmas and debates about them (again, this is how everything actually works as a large social construction).
First of all, we do not “talk” to the large binary blobs, and they never “speak” to us. The additional layers of software generate this experience as a form of User Experience (UX).
Let’s state some facts. These facts are at different levels of abstraction, but, however general these statements are, they are not wrong (which makes them statements of fact).
When we study the actual mathematics involved, we realize that this is a representation learning problem. The training process has been shown to be general enough to learn a representation of (an approximation to) any function.
This means that, given enough correctly labelled data (and a correctly applied back-propagation algorithm), a mathematical artifact will be produced, such that it can be used as a black-box implementation of a procedure which computes a given function.
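A minimal sketch of that claim, assuming nothing but numpy: plain back-propagation on four correctly labelled examples produces a black-box artifact that computes XOR. The architecture and hyperparameters here are arbitrary toy choices:

```python
import numpy as np

# The labels ARE the function: four input/output pairs defining XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # tiny hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # backward pass (squared error)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round())   # should recover the XOR table, as a black box
```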
The crucial part is that the mathematical techniques at all levels (both of abstraction and of actual representation and implementation) are correct and even “straightforward”, but there is no “intelligence” in them.
The illusion of intelligence (exactly as with a parrot – a talking bird) is inside the heads of the consumers, and literally nowhere else to be found. Just as there is no “mind” inside a personal computer.
Let’s see where it all goes wrong (which is not a mere opinion or a “research exploration”, but actual facts). There is some non-bullshit philosophy required, but I will keep it simple.
People say that a function is a special kind of relation. This is too abstract and actually wrong. A function captures a relation – it formally defines a relation, so it is at a more “concrete” level.
A function, in turn, could have more than one procedure for actually computing (calculating) it – a sequence of finite steps, usually called an algorithm. An actual implementation of an algorithm in some formalism (a programming or a machine language) is one more level down to Earth (from Platonic heights). Everything is well-understood since Turing.
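A trivial illustration of the distinction – the function is the fixed input/output pairing, while the procedures are interchangeable ways of calculating it:

```python
# One function (factorial on the naturals), two different procedures.

def factorial_recursive(n: int) -> int:
    return 1 if n == 0 else n * factorial_recursive(n - 1)

def factorial_iterative(n: int) -> int:
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

# Both procedures compute the very same function.
assert all(factorial_recursive(n) == factorial_iterative(n)
           for n in range(10))
```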
The point so far is this – we already have a whole tradition of how to implement mathematics on machines, and we have studied this process at many levels. There are no “miracles” in there – just properly captured (mathematical) abstractions and their proper implementations, with layers upon layers of Abstract Data Types (and their actual implementations).
The first take-home message is this – there is nothing more out there. All the talk about “emergent properties” is utter bullshit. Just as there is no mind in a PC.
Let’s make it more clear. What we call a relation is not an arbitrary, ephemeral, abstract Platonic idea. It has to be observed and captured with an informal human language and then translated into a formalism of mathematics – which captures what is already known.
Here we have to invoke the first piece of a popular philosophy – according to the nature (laws) of our Universe, contradictions do not exist. They just cannot arise within a single unfolding process. This means that every new “piece of knowledge” must not contradict what is already known (non-bullshit mathematics and first-order logic).
So, a sort of type-checking for a relation is that it has to be observed and properly captured, and it must not contradict what is already known (the math, physics and biology).
When we have an actual relation, we could have a corresponding function, and then one or more algorithms or any other representation. Representation is where it gets interesting.
Abstractly, a function could be thought of as a table, just like the multiplication table in an elementary school. The fundamental property is not that it can be drawn as a table, but that the inputs and outputs are fixed (and can be thought of as pairs or “arrows”).
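For instance, here is the same function written extensionally (as a table of fixed pairs) and intensionally (as a rule) – what makes it a function is that each input is paired with exactly one output:

```python
# The multiplication table as an explicit table of input/output pairs.
table = {(a, b): a * b for a in range(1, 10) for b in range(1, 10)}

# The same function as a rule (one procedure that computes it).
def multiply(a: int, b: int) -> int:
    return a * b

# Extensionally identical: same pairs, same function.
assert all(table[(a, b)] == multiply(a, b) for (a, b) in table)
```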
There is another a-ha moment – not everything can be properly captured as a mathematical function. One could say that only “the stable aspects of the Universe” (which make Life Itself possible) are.
Here is an operational definition of non-bullshit – something which can be actually observed and captured and turned into math (surprise! – proper mathematics is not a bunch of abstract Platonic ideas, and never has been). All math so far has been built exactly like this.
Now here is the main point – almost nothing (except for What actually Is) can be properly captured as a relation and a function. Building representations of what is not a relation or a function (of what isn’t Out There) using mathematical techniques is, indeed, no different (in principle) from alchemy or astrology – and that is exactly what it is as a social construction, a mass-hysteria at the social level.
While each step – mathematical, algorithmic, representational, implementational – can be well-understood and well-defined, the result is just bullshit, because all the correct, proper, known-so-far methodology has been applied to bullshit in a particular social setting – exactly as with alchemy.
One more time – pay attention – there is no “intelligence” out there, no actual “emergent properties”, just as there are no angels or gods. This is nothing but information processing.
Information processing, in principle, cannot be a source of truth, just as mere applied math cannot yield an experimental science. An experimental science is another level of abstraction above mere information processing or calculations. Just as mere texts (as sequences of words and symbols) are never (by themselves) sources of truth (most of the time they are just verbalized dogmas).
Now let’s explain it as if to a 5-year-old. Mathematics (and information processing using computers), however pure and flawless, applied to bullshit yields bullshit. The relations have to be out there.
The only non-bullshit applications of Deep Learning and other AI techniques have been to measurements (using various instruments) of some aspects of actual reality – biology, mostly.
What cannot be adequately measured and empirically validated cannot be used as training data for a deep learning model. Well, it obviously can, but it will yield bullshit. It is this simple and this fundamental.
Here is another infallible principle (even a law of the Universe) – non-stable environments, in which the factors keep evolving (new ones emerge, old ones disappear or diminish), cannot be properly captured using mathematics, because the function you captured will keep giving the same answers while the environment changes underneath it.
The same principle has been observed when imperative programs crash after some data “changed behind their backs”. Immutability (of so-called “stable intermediate forms”) is a requirement, not an academic fancy. In short – the environment has to be stable enough (for Life to emerge and sustain itself), as the sketch below shows.
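A toy sketch of this principle – the drifting `environment` function here is a made-up stand-in, not real data. An artifact fitted and frozen at t=0 keeps answering as if nothing has changed:

```python
import numpy as np

# A hypothetical drifting environment: the true relation between
# input and output evolves over time.
def environment(x, t):
    return (2.0 + 0.5 * t) * x   # the relation itself changes with t

rng = np.random.default_rng(1)
xs = rng.uniform(0, 10, 100)

# Fit (and freeze) the artifact on observations made at t=0.
w = np.polyfit(xs, environment(xs, t=0), deg=1)[0]

for t in (0, 1, 2):
    true = environment(5.0, t)
    pred = w * 5.0               # the frozen function never updates
    print(f"t={t}: model says {pred:.1f}, reality says {true:.1f}")
# The captured function is only correct while the environment stays stable.
```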
The principle is this – application of any Deep Learning technique to bullshit will yield bullshit. Language (unless it is a very strict mathematical formalism or a scientific discipline) captures bullshit, not “the whole world at once” (as naive liberal arts majors would tell us).
No, all the written texts do not capture reality. They actually capture all the bullshit that a human mind is capable of.
Last, but probably the most important fact – any AI system which is currently capable of generating code that compiles and runs never does it from first principles (reality -> math -> abstract data types -> representation -> implementation); it just generates something that sounds like human speech (code) – exactly what a parrot does. There is no “understanding” involved.
Just as it is with a parrot, the actual understanding is at another level, many layers of abstraction away from the soundwaves the bird emits (or the binary numbers a software+model pipeline produces). Let this fact sink in.
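If the parrot analogy sounds too rhetorical, here is the smallest possible parrot – a bigram model that emits code-shaped token sequences purely from co-occurrence statistics. The corpus and behavior are toy assumptions, but the principle is the same: no meaning, no first principles, no checking against reality is involved.

```python
import random

# Learn which token follows which, from a tiny "code" corpus.
corpus = "for i in range ( n ) : total = total + i".split()
follows = {}
for a, b in zip(corpus, corpus[1:]):
    follows.setdefault(a, []).append(b)

# Emit tokens by sampling successors -- pure statistics, zero semantics.
random.seed(0)
token = "for"
out = [token]
for _ in range(10):
    token = random.choice(follows.get(token, corpus))
    out.append(token)

print(" ".join(out))   # plausible-sounding token soup, not understanding
```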