So I watched some of it. The launch video, I mean. I closed the window when they began that “voice” thing.

The “PhD level across all subjects” claim is just a meme. This can easily be shown by asking questions that require reasoning beyond memorized textbooks.

I don’t want to “register” for “Grok4”, but here are some example problems which will break the “PhD level” meme.

  • Ask for a recursive function in, say, OCaml or Scala, without explicitly mentioning the accumulator pattern as the required way to avoid stack overflows in languages which do not do TCO at compile time. This is very basic stuff, which every “PhD” has to know. The inner lambda with an extra argument, and the “trampoline”, are such classic patterns that some compilers apply them automatically. Again, without mentioning it, the model will fail by writing non-TCO code.

  • In Rust there is a well-known and well-understood simple problem: calling borrow_mut on some struct as a whole and then trying to compute something with its individual fields, within a backward_fn closure in a classic à la Karpathy micrograd toy engine. The issue is that one has to either use more indirection (for each individual slot which can potentially be updated during backprop), or, as an idiomatic solution, wrap each borrow_mut().val = in a setter, which lives only just long enough to do the assignment, and chain these “setters” to implicitly serialize the mutable borrows within any higher-level function. The model will fail to “recognize” such a simple issue.
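To make the first problem concrete, here is a minimal sketch of the accumulator-plus-trampoline pattern, written in Rust for consistency with the second bullet (Rust, like the JVM, does not guarantee TCO). The `Trampoline` enum, `run`, and `sum_to` are illustrative names of my own, not anything a model produced:

```rust
// A thunk-based trampoline: recursion is reified as data, and a plain
// loop "bounces" the thunks, so stack depth stays constant.
enum Trampoline<T> {
    Done(T),
    More(Box<dyn FnOnce() -> Trampoline<T>>),
}

fn run<T>(mut t: Trampoline<T>) -> T {
    loop {
        match t {
            Trampoline::Done(v) => return v,
            Trampoline::More(f) => t = f(), // one bounce per iteration
        }
    }
}

// Sum 0..=n, carrying the accumulator as an extra argument: each call
// returns a thunk immediately instead of growing the call stack.
fn sum_to(n: u64, acc: u64) -> Trampoline<u64> {
    if n == 0 {
        Trampoline::Done(acc)
    } else {
        Trampoline::More(Box::new(move || sum_to(n - 1, acc + n)))
    }
}

fn main() {
    // Deep enough to overflow the stack if written as naive recursion.
    let s = run(sum_to(1_000_000, 0));
    assert_eq!(s, 500_000_500_000); // n(n+1)/2
    println!("{}", s);
}
```

The naive version, `fn sum(n: u64) -> u64 { if n == 0 { 0 } else { n + sum(n - 1) } }`, is not a tail call (the `+` happens after the recursive return), which is exactly the code a model tends to write when the prompt does not name the pattern.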
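And a minimal sketch of the second, Rust problem, assuming a stripped-down micrograd-style node (`Value`, `ValueData`, and `add_grad` are hypothetical names of mine): holding a `borrow_mut()` on the whole cell across a read of that same cell panics at runtime, while short-lived setters that scope each borrow to a single assignment serialize the borrows implicitly.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct ValueData {
    data: f64,
    grad: f64,
}

// Shared, mutable node, as in a toy autodiff engine.
#[derive(Clone)]
struct Value(Rc<RefCell<ValueData>>);

impl Value {
    fn new(data: f64) -> Self {
        Value(Rc::new(RefCell::new(ValueData { data, grad: 0.0 })))
    }
    // Short-lived accessors: each borrow ends before the call returns,
    // so any sequence of calls serializes the borrows implicitly.
    fn data(&self) -> f64 { self.0.borrow().data }
    fn grad(&self) -> f64 { self.0.borrow().grad }
    fn add_grad(&self, g: f64) { self.0.borrow_mut().grad += g; }
}

fn main() {
    let a = Value::new(2.0);
    let b = a.clone(); // the same node reached twice, as in x + x

    // BAD (panics at runtime): hold a borrow_mut across a read of the
    // same cell, as a backward_fn closure naturally tempts you to do:
    //   let mut am = a.0.borrow_mut();
    //   am.grad += b.grad(); // RefCell: already mutably borrowed

    // GOOD: read first, then mutate through a setter whose mutable
    // borrow lives only for the assignment itself.
    let g = b.grad() + 1.0;
    a.add_grad(g);
    a.add_grad(b.data());

    assert_eq!(a.grad(), 3.0); // a.grad() is now 3.0
}
```

The borrow checker cannot see through `Rc<RefCell<...>>` at compile time, which is why the failure only shows up as a panic, and why the setter-chaining discipline matters.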

I could probably come up with lots and lots of such cases which, again, go beyond memorizing textbooks. The idea is that it has been shown for almost a hundred years (since “serious” classic technical schools like MIT, Stanford, or Caltech were established) that only one’s own understanding – an inner representation, built bottom-up from first principles – is what is required for anything beyond textbooks. This is exactly how Richard Feynman did what he did: by building up his own inner understanding (an inner representation, literally), and then applying (using) it in the process of writing.

Grok4 has no such capacity. It is still, in principle, a next-token predictor based on estimated probabilities, trained on less-bullshit data – textbooks instead of Reddit and 4chan. But, again, memorizing a textbook was never the way.

I will compare what it spits out for my highly specific prompts for some Rust boilerplate. Grok3 failed subtly in almost every case (by not being “aware” of the nuances). Gemini, by the way, produced less error-prone code, but the code itself is “primitive”, if you will (which is not necessarily a bad thing).

And, yes, in the humanities it will definitely look like a PhD.