Deepseek R1
DESCRIPTION: Memes and mirrors. Nowadays things are moving way too fast. It is not just controlled trial-and-error, it is literally throwing everything at the wall (to see what sticks). It started with that meme “Attention Is All You Need”, when they just came up with an “architecture” that sticks. That “attention” and “multi head attention” turned out to be just a few additional layers of a particular kind. No one can explain the actual mechanisms of how exactly or even why the layers are as they are (abstract bullshit aside)....