This is sort of an answer to this question:

https://news.ycombinator.com/item?id=38425475

So, what if you are late to the party?

Unfortunately, nowadays it is even harder to get through all the utter bullshit and hype, but there is a sort of shortcut, or rather “the Hard Way”.

There are two and a half key figures: Geoffrey Hinton, who did most of the mathematical heavy lifting; Andrew Ng, who not only did all the derivations but also became the most famous practitioner; and Andrej Karpathy, who is just a narcissistic asshole, similar to Lex Fridman.

Anyway, here are the essential video courses and readings.

  • Hinton’s early course on Neural Networks (originally on early Coursera)
  • The recordings of Andrew Ng’s course at Stanford (where he did all the derivations on camera)
  • The first-ever MOOC by Andrew Ng, taught in Octave (I am proud of having passed it)
  • The “reboot” of it as a whole “specialization” by Andrew Ng
  • The “classic” lecture notes for the Stanford course by Andrew Ng
  • The YouTube videos by Karpathy, who is, arguably, one of Hinton’s students
  • The “Deep Learning” book by John D. Kelleher (the best beginner’s book)

All this is non-bullshit, top-quality learning material, including the YouTube videos.

Hinton’s course is terse and theoretical, but it shows how everything was bootstrapped from the most fundamental and well-understood (even simple) mathematical building blocks.

The Stanford course and the corresponding lecture notes are self-contained and go through all the mathematical derivations.
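
As a taste, here is the classic derivation those notes work through (sketched in LaTeX, using the usual notation of the notes): the gradient of the logistic-regression cost collapses into an almost trivial form.

    \sigma(z) = \frac{1}{1+e^{-z}}, \qquad
    \sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr)

    J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
      \Bigl[ y^{(i)} \log h_\theta(x^{(i)})
           + \bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr) \Bigr],
    \qquad h_\theta(x) = \sigma(\theta^{\top} x)

    % applying the chain rule and \sigma' above, almost everything cancels:
    \frac{\partial J(\theta)}{\partial \theta_j}
      = \frac{1}{m}\sum_{i=1}^{m}
        \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}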

The MOOC is well-designed and beginner-friendly. It is absolutely required to get through the Octave part and actually write all the vectorized “oneliners”.
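
Those “oneliners” are vectorized expressions. A minimal sketch of the same idea in Python with NumPy (the course itself used Octave; the names and shapes here are my own illustration):

    import numpy as np

    # Toy data: m examples, n features (made-up shapes for illustration).
    m, n = 100, 3
    X = np.random.randn(m, n)   # design matrix
    y = np.random.randn(m)      # targets
    theta = np.zeros(n)         # parameters

    # The whole batch gradient of least-squares linear regression,
    # as a single vectorized "oneliner" -- no explicit loops:
    grad = X.T @ (X @ theta - y) / m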

The “specialization” suffers from “the second system” syndrome, but it is absolutely required to see the practical side (in the “ML DevOps” part): especially the necessity of constantly retraining on the changing data, and the realization that young people’s voices are different from adults’ (so a model trained on adults performs like crap), etc.
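
A minimal sketch of that retraining discipline, assuming a stand-in train routine and a crude mean-shift drift check (both are my own illustration, not anything from the course):

    import numpy as np

    def train(X, y):
        # Hypothetical stand-in for a real training routine:
        # ordinary least squares via lstsq.
        return np.linalg.lstsq(X, y, rcond=None)[0]

    def drifted(X_old, X_new, threshold=0.5):
        # Crude drift check: has the feature mean moved too far?
        return np.linalg.norm(X_old.mean(axis=0) - X_new.mean(axis=0)) > threshold

    # Initial model on "adult voices" data (toy placeholders).
    X_old = np.random.randn(200, 3)
    y_old = X_old @ np.array([1.0, -2.0, 0.5])
    theta = train(X_old, y_old)

    # New data arrives ("young voices"): a shifted distribution,
    # on which the old model degrades -- so retrain on everything.
    X_new = np.random.randn(200, 3) + 2.0
    y_new = X_new @ np.array([1.0, -2.0, 0.5])

    if drifted(X_old, X_new):
        theta = train(np.vstack([X_old, X_new]), np.hstack([y_old, y_new]))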

Karpathy’s videos are good for showing all the intermediate steps of actually coding a model from first principles. If you are smart enough, you will realize from the data structures and the actual algorithms being used that there is no “intelligence” in it; everything is just “information processing”.
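
To see what “from first principles” means, here is a complete toy “model” in plain Python: a single neuron fitted by hand-derived gradient descent (a sketch in the spirit of those videos, not Karpathy’s actual code):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # One neuron: y_hat = sigmoid(w*x + b). Toy data: y = 1 iff x > 0.
    data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
    w, b, lr = 0.0, 0.0, 0.5

    for _ in range(1000):
        for x, y in data:
            y_hat = sigmoid(w * x + b)
            # Gradient of the cross-entropy loss wrt w and b,
            # derived by the chain rule -- no framework, no magic:
            dw = (y_hat - y) * x
            db = (y_hat - y)
            w -= lr * dw
            b -= lr * db

    print(w, b, [round(sigmoid(w * x + b), 2) for x, _ in data])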

This is an important point. All the abstract word2vec “encodings” are just a form of “indexing”, which, in turn, is a set of ways of sorting the data (and of searching efficiently over information that has already been sorted).
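
Concretely, a “similar words” query over an embedding table is nothing but sorting by a similarity score (a toy sketch with made-up vectors and vocabulary, not real word2vec output):

    import numpy as np

    # A made-up "embedding table": one vector per word.
    vocab = ["king", "queen", "apple", "orange"]
    E = np.random.randn(len(vocab), 8)
    E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit vectors

    # "Similar words" = sort the whole table by cosine similarity
    # to the query vector -- indexing and sorting, nothing more.
    query = E[vocab.index("king")]
    scores = E @ query                 # cosine similarities
    for i in np.argsort(-scores):      # descending order
        print(vocab[i], round(float(scores[i]), 3))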

Clustering is a form of “topological” sorting (a bucket sort), which is loosely related to the abstract conceptual framework of having probabilities over a complete, fully observable, fixed set of possible events.
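
The bucket-sort analogy is almost literal: the assignment step of k-means just drops each point into the bucket of its nearest centroid (a toy sketch, not a full k-means implementation):

    import numpy as np

    points = np.random.randn(20, 2)
    centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])  # two fixed "buckets"

    # Distances from every point to every centroid, then bucket
    # each point by its nearest centroid -- a bucket sort by distance.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    buckets = {k: points[dists.argmin(axis=1) == k] for k in range(len(centroids))}

    for k, members in buckets.items():
        print(f"bucket {k}: {len(members)} points")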

Neither clustering nor probabilities in themselves are “sources of truth”. Only verifiable formal proofs and reproducible experiments are.

Both proofs and experiments ultimately reveal and describe observable aspects of “What Is” or, as in the case of pure mathematics, properly generalize over the observations (producing a proper abstraction, like a Set or a Number).

The “end of knowledge” is in the realization that there is no “knowledge” in mere information processing and statistical “inference”; knowledge is in measurable observations of so-called Reality and in studying the properties of properly generalized abstractions, which is what non-bullshit mathematics is.

And, of course, there is no intelligence whatsoever in these particular algorithms applied to these particular data structures (which is nothing but information processing). And there won’t be, in principle.

Just information is not enough (look at all the “information” in your Bible or the Vedas).