Here is a typical quote from some random Chud on the internet:
The location within the [high-dimensional hyper-] space represents the semantic meaning of the content, according to the embedding model’s weird, mostly incomprehensible understanding of the world.
This is bullshit on so many levels, and delivered in such a “lecturing” tone.
First of all, there is no “understanding” whatsoever and it is not “of the world”.
“The world” is at a fundamentally different level of abstraction from what has been used as “inputs” to a language model.
What that location represents is not how the world is, but some information encoded in a particular way, drawn from a huge garbage pile, a verbiage landfill, so to speak.
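To be concrete about what such a “location” actually is: an embedding is a vector of numbers, and “closeness” is plain arithmetic over those numbers. A minimal sketch, with made-up toy vectors standing in for whatever a real model would output (no real embedding API is called here):

```python
import math

# Made-up stand-ins for the vectors an embedding model would return.
# The numbers are arbitrary; only the arithmetic below is real.
embedding = {
    "dog":   [0.9, 0.1, 0.3],
    "puppy": [0.8, 0.2, 0.4],
    "tax":   [0.1, 0.9, 0.7],
}

def cosine_similarity(a, b):
    """The usual measure of 'closeness' between two vectors: pure arithmetic."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

print(cosine_similarity(embedding["dog"], embedding["puppy"]))  # a number, nothing more
print(cosine_similarity(embedding["dog"], embedding["tax"]))    # a smaller number
```

Whatever the model, this arithmetic is all that the celebrated “semantic closeness” amounts to computationally.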
Let’s see what is actually going on at a higher level.
There is some “world”, in which there are human beings, who have evolved an ability to use sequences of sounds to communicate with one another about some particular aspects of it.
The basis of any human language is “things”, their “attributes” and the words used to describe and characterize “processes”.
There is something that has evolved since ancient times, called “predicate logic” and later “first-order logic”, which defines, or at least shows, what a precise and correct use of a language should be.
Notice that any human language is in principle sequential, and to partially overcome this limitation a set of rules has been invented to indicate relations among individual words.
I intentionally do not use the term “semantic relations” here, because it is from juggling ill-defined terminology in the wrong (inapplicable) contexts that bullshit arises.
So, in every language there is the same abstract structure, which is intended to mimic the actual structure of the world: there are “things” (“objects”) and their “attributes”.
So, the structure of the world is being collapsed into a linear structure with some implicit relations, encoded according to the grammar rules of a particular language.
Traditionally, linguists use trees to represent the “full structure” of a sentence.
It has been well understood that these abstract tree-like structures alone are not enough to reconstruct the intended meaning of a communication, and that a shared context is required for a communication to be meaningful.
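For concreteness, such a tree is nothing more than a nested data structure over the same word sequence; a toy sketch (the sentence and the S/NP/VP labels are a textbook-style example of my own, not taken from anywhere in particular):

```python
# The linear sequence of words, and the tree a linguist would draw over it.
sentence = ["the", "dog", "chased", "the", "cat"]

parse_tree = (
    "S",
    ("NP", "the", "dog"),
    ("VP", "chased", ("NP", "the", "cat")),
)

def leaves(node):
    """Flatten the tree back into the word sequence it was built from."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [word for child in children for word in leaves(child)]

# The tree adds grammatical structure and nothing else: no dog, no cat, no world.
assert leaves(parse_tree) == sentence
```

Flattening the tree gives back exactly the original sequence; the structure it adds is grammatical, not worldly.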
The Chuds think that the contexts could be “learned” from (by observing) the sequences of words, while the results show that the shared context comes first.
What then is actually going on there?
Well, structural transformations of information (at the level of a language without a well-defined context).
It is exactly like what a search engine does: it indexes (conceptually, rearranges) the words of a language and builds a complex data structure (informally called a “global index”). This index is then used to find “similar” or “related” pages (texts).
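A toy version of such a “global index”, a plain inverted index over a few made-up “pages” (no claim that any real engine is this crude, but the principle is the same rearranging and counting):

```python
from collections import defaultdict

# Three made-up "pages".
documents = {
    "page1": "the cat sat on the mat",
    "page2": "the dog chased the cat",
    "page3": "tax forms are due in april",
}

# Conceptually "rearrange" the words: word -> set of pages containing it.
inverted_index = defaultdict(set)
for page, text in documents.items():
    for word in text.split():
        inverted_index[word].add(page)

def related_pages(query):
    """Pages sharing at least one word with the query: 'similarity' by overlap."""
    hits = defaultdict(int)
    for word in query.split():
        for page in inverted_index.get(word, ()):
            hits[page] += 1
    return sorted(hits, key=hits.get, reverse=True)

print(related_pages("cat on a mat"))  # page1 ranks above page2; page3 never appears
```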
Does a search engine or its “global index” have any understanding of the world? Hardly. Would we call a “global index” a (form of) intelligence? Probably not, unless you are a Chud.
Now what about that “semantic meaning”?
Does “semantic meaning” arise or emerge when one re-arranges sequential (ok, tree-like) pieces of data into a high-dimensional structure? Probably not.
The “meaning” exists only at the level of the world for which words of a language are mere crude “labels”.
Now the most fundamental question: Does the resulting multi-dimensional structure actually capture or reproduce, and then represent, the actual structure of the world? No, it does not.
It cannot, in principle, not just because “the map is not the territory” (especially a totally abstract map of an arbitrary structure), but because the “relations” are being broken at each level of abstraction.
Here is how. One takes a lot of text and feeds it into a “structural transformation” process (a procedure) which spits out a high-dimensional hyper-index of abstract pieces (not even sounds).
The notion of distance is very intuitive and “natural”, but unwarranted and misleading, because it is being applied at the wrong level of abstraction.
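And the arithmetic behind such a “distance” will happily produce a number for any coordinates whatsoever; a toy illustration with two equally made-up coordinate assignments for the same three words (neither assignment comes from any real model):

```python
import math

def euclidean_distance(a, b):
    """Plain distance between two coordinate lists: defined for anything numeric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two arbitrary ways of assigning coordinates to the same three words.
coords_a = {"bird": [0.0, 1.0], "tree": [0.1, 0.9], "tax": [1.0, 0.0]}
coords_b = {"bird": [1.0, 0.0], "tree": [0.0, 1.0], "tax": [0.9, 0.1]}

# Under the first assignment "bird" sits next to "tree"; under the second, next to "tax".
# The numbers come from the assignment procedure, not from birds or trees.
print(euclidean_distance(coords_a["bird"], coords_a["tree"]),
      euclidean_distance(coords_a["bird"], coords_a["tax"]))
print(euclidean_distance(coords_b["bird"], coords_b["tree"]),
      euclidean_distance(coords_b["bird"], coords_b["tax"]))
```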
All the results are no different from search-engine output based on frequency statistics, and mostly concern nouns, which are closest to the actual reality (the world).
Nouns are “sorted” or “indexed” and the index somehow “mimics” the world.
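Here is “frequency-based statistics over words” in its simplest possible form, a toy co-occurrence count over three made-up sentences:

```python
from collections import Counter, defaultdict
from itertools import combinations

# Three made-up sentences standing in for the "huge garbage pile".
corpus = [
    "the dog chased the cat",
    "the cat sat on the mat",
    "the dog slept on the mat",
]

# Count which words appear in the same sentence: nothing but frequencies.
cooccurrence = defaultdict(Counter)
for line in corpus:
    words = set(line.split())
    for a, b in combinations(sorted(words), 2):
        cooccurrence[a][b] += 1
        cooccurrence[b][a] += 1

# "dog" ends up "associated" with "the", "cat", "mat" and so on, purely because
# those labels happened to sit in the same lines; nothing here knows what a dog is.
print(cooccurrence["dog"].most_common(5))
```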
At the level of mere words, no such mimicry happens. The individual “maps” within our brains are what create a “coherent picture” and recreate the meaning from the output, slightly differently for each person, due to the differences in their inner “maps”.
The “meaning” is not in the information itself; it is inside our heads, which continuously update and maintain inner representations (“maps”) of our environment (“the world”).
So, what exactly are all the Chuds doing?
They play games with their own brains, by feeding the brain a highly distorted set of clues, completely disconnected from reality, and watching how it reconstructs reality back from the given abstract bullshit.
This is what they do. They give the brain abstract bullshit to interpret: text that went through a shredder first, was then arranged into a hyper-plane (or whatever they prefer to call it) by “learning” (counting) the frequencies of the individual pieces, and is then used as the “correct” map of the world.
Now pay attention. All the semantic meaning and understanding is NOT within the models, but within the interpretations our brains give to what we “see” (the outputs of the models).
The abstract notion of a “distance” between pieces is only “meaningful” because the world has its own structure, and there are, indeed, causal relations between “things” and “objects” (sub-processes).
In short, all these complex “structural transformations” of information (and only information) have no more meaning than Hegelian “philosophy” or astrology.
Calculations, being just arrangements of abstract concepts, can be completely meaningless, like multiplying birds by trees or calculating “distances” between words taken from a huge garbage dump.
The fact that none of these transformations are applicable to mathematical texts is my informal “proof by contradiction”.
No “new meanings” can be “generated” by merely shuffling the shredded pieces of verbiage. The meaning has to be “Out There Prior To That” and has to be captured in the texts which are used to train a model.
In short, bullshit, bullshit everywhere. And nothing more “semantically meaningful” than frequency-based statistics, used to “mimic” the actual world for the brain of an external observer, which can then barely interpret all this Chud nonsense.