And while passing by…

Yet another “look, look at us, we are soooo smart and clever, give us much more money just because we are so cool” article dropped.

https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

The lack of exact precision is not the fundamental issue here. Even if one overcomes the “numerical instability issues” and manages to always reproduce the same structural output from the same linguistic (or otherwise structured) input, the “hallucinations” and the “subtle bullshitting” won’t go away in principle.

As long as sampling from conditional probability distributions, obtained at the level of “tokens” (mere syntax), is the underlying model, no technical trick, no larger amount of training data and no increased precision (not even totally exact math at all levels) will make such a model accurately capture even the simplest aspect of reality. What a probabilistic model “tries to capture” is at another level of abstraction, semantically very distant from the “training data”.
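To make the point concrete, here is a toy sketch of what “sampling at the level of tokens” amounts to (plain Python, with made-up logits chosen purely for illustration; no real model involved):

```python
import math
import random

# A toy next-"token" distribution. The numbers are hypothetical:
# the point is that the model only ever sees scores over symbols,
# never the things the symbols refer to.
logits = {"Paris": 2.0, "Lyon": 0.5, "banana": -1.0}

def sample(logits, temperature=1.0, rng=random):
    # Softmax over logits, then draw one token at random:
    # even a "correct" high-probability answer is just one
    # arm of a lottery.
    exps = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(exps.values())
    r = rng.random() * total
    for token, weight in exps.items():
        r -= weight
        if r <= 0:
            return token
    return token

# Two runs with different RNG states can legitimately disagree,
# by design -- no precision fix changes that.
print(sample(logits, rng=random.Random(1)))
print(sample(logits, rng=random.Random(7)))
```

Lowering the temperature merely concentrates the lottery; it does not turn the lottery into knowledge.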

And no, exact precision is never the issue in Biology. Mother Nature relies on crude, gross approximations, on actually capturing, somehow, the actual, relevant constraints of the environment, and on reusing the “stable” recurring patterns in What Is.

Yes, floating point math is inexact. Yes, there are “numerical stability issues”, well understood since the 70s or so, but this is not /“the why” your “AI” is only an appearance of intelligence.
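For the record, here is the whole “well understood since the 70s” story in a dozen lines of Python: the non-associativity, and one of the textbook remedies (a compensated summation in the Neumaier variant of Kahan’s algorithm):

```python
# Floating point addition is not associative: the order of
# summation changes the result.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False on IEEE-754 doubles

# The textbook remedy is compensated summation (Neumaier's
# variant of Kahan's algorithm):
def neumaier_sum(xs):
    total = 0.0
    comp = 0.0                       # running compensation
    for x in xs:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x  # low-order bits of x were lost
        else:
            comp += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + comp

xs = [1e16, 1.0, -1e16]
print(sum(xs))           # 0.0 -- the 1.0 is swallowed by rounding
print(neumaier_sum(xs))  # 1.0 -- the lost bits are recovered
```

Solvable, solved, and beside the point.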

Let’s however “contribute” some, just for kicks.

The “non-determinism” you are “trying to defeat” is not just inherent in the world, it is inherent in the algorithmic approach at all levels, not just in the floating point math.

The weight “initialization” is non-deterministic. There are some “right answers” from biology: the brain has a pre-defined structure of distinct specialized areas, which is a non-random “initialization” before the actual “training” (fine-tuning) begins.

The “ordering” in which the training occurs (it is inherently sequential, after all) matters a lot (contrary to what some popular bullshitters postulate) and is another source of non-determinism. One does not have to be Hinton to see that if you “feed in” the carefully written children’s English textbooks (pre-school and elementary school) before all the random slop from reddit and 4chan, there will be way fewer hallucinations, especially when training on peer-reviewed written materials.

The reason is that the earlier, “proper” weights would be less prone to distortion by linguistic noise.
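That ordering matters at all is trivial to demonstrate on a one-parameter toy (a scalar “model” updated by one sequential SGD pass; the data values are arbitrary stand-ins, and with a constant step size the examples seen last dominate):

```python
def sgd_one_pass(data, lr=0.5, w=0.0):
    # One sequential pass of SGD on the trivial scalar model f() = w,
    # minimizing (w - y)^2 for each example y in turn.
    for y in data:
        w += lr * (y - w)   # step toward the current example
    return w

# The same multiset of "examples", in two different orders:
order_a = [1.0] * 5 + [9.0] * 5
order_b = [9.0] * 5 + [1.0] * 5

print(sgd_one_pass(order_a))  # pulled toward the block seen last
print(sgd_one_pass(order_b))  # a different "trained model"
```

Same data, different order, different model. Scale this up to trillions of tokens and the “ordering” becomes a design decision, not a nuisance.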

When applied to code, the common, recurring patterns are “intuitively” captured in the standard library code of classic, concise, math-inspired, less cluttered and less verbose languages (of the ML family), and ideally the “coding models” should be trained and operate at the level of “pure ASTs”: some evolved intermediate representation, just like the advanced compilers written by really smart people (GHC) do.
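The syntax-versus-structure distinction is easy to see with Python’s own `ast` module: many surface strings, one tree.

```python
import ast

# Two surface forms of the same program: different token sequences,
# identical structure.
src_a = "result=f( x,y )  # a comment"
src_b = "result = f(\n    x,\n    y,\n)"

# At the level of mere syntax, these are different texts:
print(src_a == src_b)  # False

# At the AST level, whitespace, comments and formatting vanish:
print(ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b)))  # True
```

A model trained on the strings must spend capacity learning that these are the same thing; a model operating on the tree gets it for free.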

Operating at the level of mere syntax in the context of source code will always, in principle, yield subtle bullshit here and there.

Let’s address the “in LLM inference” part.

Selection of one of many possibilities based on estimated probability is not /“the how” knowledge is obtained. Yes, there is always a less wrong answer (up to some “fixed point”), and it is not “probable”, it is “just how it is”.

Individual particular outcomes of complex processes are non-deterministic and thus unpredictable, so we think in terms of similarity and of the likelihood of outcomes coming out similar (similar but not the same), which is the only adequate notion of probability. It is not even about the frequencies of observations (mere counting of outcomes); it is about the inherent non-determinism within the process which produces the individual outcomes.
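A two-minute simulation of that distinction (a seeded Gaussian process standing in for any non-deterministic process; the seeds and thresholds are illustrative only):

```python
import random
import statistics

def noisy_process(rng, n=1000):
    # The same underlying non-deterministic process, run independently:
    # it produces similar, but never the same, outcomes.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

run_a = noisy_process(random.Random(1))
run_b = noisy_process(random.Random(2))

# The individual outcomes are different...
print(run_a[:3] == run_b[:3])  # False
# ...while the aggregate shape comes out similar:
print(abs(statistics.mean(run_a) - statistics.mean(run_b)) < 0.5)  # True
```

The similarity lives in the process, prior to any counting of its outcomes.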

In short, it is prior to counting the outcomes, even the complicated counting with respect to the “previous outcomes”. Ironically, what you will end up with is a “common belief or conditioning” (not What actually Is), which is an accurate interpretation of “conditional probability” as a measure of belief.

As for precision, again, it is not the fundamental requirement. The fundamental requirement remains the same since the times of the Buddha: to see (capture) things as they [really] are. This is what is actually necessary, and this is what makes mathematics whatever it is.

And no, no amount of computation (or precision) will turn verbiage from reddit “into mathematics”, simply because the process of determining which statement is true or false requires reproducible experiments (the actual, rigorous application of the scientific method) or mechanically verifiable proofs in a properly chosen (matching reality) axiomatic system.
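What “mechanically verifiable” means, in one line of Lean 4 (a minimal sketch; the theorem name is mine):

```lean
-- A mechanically verifiable proof: the kernel either accepts it or it
-- doesn't; plausible-sounding verbiage is not an input it understands.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- And a false statement simply cannot be proved: no amount of
-- computation over text will make `2 + 2 = 5` type-check.
```

That binary accept/reject, not any probability estimate, is the standard.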

It is not the loss of associativity that ruins the naive promises of “intelligence”.

But yes, your writing is so detailed and the pictures are so neat. How much are you entitled to be paid for this?