I am looking for a reference to learn probability theory which satisfies the following criteria:
Why am I looking for this: I would like to get a first understanding of machine learning and other data science techniques. I do not want to rush copy-pasting code and applying recipes without any understanding of them, so I am trying to get some grounding in probability and statistics, two subjects I know exceedingly little about. On the other hand, I am not trying to prepare to do research, and it should not take exceedingly long (4-6 weeks is roughly the amount of time I am planning to spend on this).
I hope that there is something satisfying these criteria out there!
EDIT: I have opened a similar question asking for references in statistics here.
Little late, but I’d say it depends on your situation/context a little more. Personally, I’d say that if you want grounding in ML, then you should read a book on ML, many of which cover the probability and statistics background in more or less detail. For instance,
Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning takes a very statistical approach and may actually be a great way to learn certain areas of statistics in general. It’s practically oriented.
Shalev-Shwartz and Ben-David’s Understanding Machine Learning talks more about the formal theory of ML. The focus is still algorithmic and computational, though. It is a little less applied and more theoretical, but still accessible, with very little heavy probability theory.
In fact, for “practical” ML specifically, I’d say optimization theory and knowledge of matrix computations and numerical linear algebra are more important than probability theory. It is only for “theoretical” ML (e.g. VC dimensions, Rademacher complexities) that you need some probability theory.
In any case, to answer your actual question, the above commenter’s suggestion of Durrett’s Probability: Theory and Examples might be a good choice. My own suggestion might be Williams’ Probability and Martingales.
But both are probably overkill, as they focus on continuous probability whereas ML focuses on discrete theory.
More appropriate, though I haven’t read it, might be Grimmett and Welsh’s Probability: An Introduction. Looks like it covers the right topics at the right level for your purposes.
I had to settle on some references relatively quickly. As noted in some comments here and in my other question, the ability to follow a mathematically advanced presentation does not necessarily imply that such an exposition is the best intro to a new area, which I found a good point.
Mostly I am following the book “Statistical Inference” by Casella and Berger. It treats both probability, in the first 4-5 chapters, and statistics, which I regard as a plus in my current situation. It is not particularly challenging mathematically but at the same time not too sloppy (my impression anyway).
Occasionally I look at the book “Elementary Probability” by Stirzaker. It contradicts most of what I asked for: it is long, does not cover much ground, and is not at all mathematically sophisticated. However, it provides a huge number of non-trivial solved examples and problems, which should be useful in developing some intuition for the field.
The book by Durrett seems fine, but I have only skimmed it. I am also well-disposed towards Feller’s “An Introduction to Probability Theory and Its Applications” (2 volumes), but those will have to wait for another time.
Hopefully at some point I will also have time to check other suggested references, and some other people may be interested too, so do not let this answer stop you from posting other answers.