So, I don’t like proofs.
To me building a proof feels like constructing a steel trap out of arguments to make true what you’re trying to assert.
Oftentimes the proof in the book is something that I get if I study, but hard to come up with on my own. In other words I can’t make steel traps, but I feel fine buying them from others.
How does one acquire the ability to create steel traps with fluency and ease? Are there any particular reference books that you found helped you really get how to construct a proof fluently? Or is it just practice?
I’d like to second one part of Qiaochu Yuan’s answer: the recommendation to read Polya’s book. Unlike many other books I’ve seen (albeit none of the others recommended above), it actually does contain guidance on how to construct a proof “out of nothing”.
And that’s one problem with the “practise, practise, practise” mantra. Practise what? Where are the lists of similar-but-not-quite-identical things to prove to practise on? I can find lists of integrals to do and lists of matrices to solve, but it’s hard coming up with lists of things to prove.
Of course, practise is correct. But just as with anything else in mathematics, there’s guidelines to help get you started.
The first thing to realise is that reading others proofs is not guaranteed to give you any insight as to how the proof was developed. A proof is meant to convince someone of a result, so a proof points to the theorem (or whatever) and knowing how the proof was constructed does not (or at least, should not) lend any extra weight to our confidence in the theorem. Proofs can be written in this way, and when teaching we should make sure to present some proofs in this way, but to do it every time would be tedious.
So, what are the guidelines for constructing a proof? You’ll probably get different answers from different mathematicians so these should be construed as being my opinion and not a(n attempt at a) definitive answer.
My recommendation is that you take the statement that you want to prove and apply the following steps to it as often as you can:
Once you’ve done all that, the hope is that the proof will be much clearer.
Here’s an example.
Original statement:
The composition of linear transformations is again linear.
Replace generic statements:
If $S$ and $T$ are two composable linear transformations then their composition, $S T$, is again linear.
It is important to be precise here. The word “composable” could have been left out, as the statement only makes sense if $S$ and $T$ are composable, but until you are completely familiar with this kind of process, it is better to be overly precise than otherwise. In this case, leaving in the word “composable” reminds us that there is a restriction on the domains and codomains which will be useful later. (However, one has to draw the line somewhere: even the word “composable” is not quite enough since it leaves open the question as to whether it is $S T$ or $T S$!)
Include implicit information:
If $S \colon V \to W$ and $T \colon U \to V$ are linear transformations then $S T \colon U \to W$ is again linear.
Here’s where remembering that $S$ and $T$ are composable in the previous step helps keep things clear. As $S$ and $T$ are composable, we only need $3$ vector spaces. Then, since we explicitly have the vector spaces the fact that $S$ and $T$ are composable is plain, though some may prefer to keep that fact in the statement. Also, some may like to have the fact that $U$, $V$, and $W$ are vector spaces explicitly stated.
Expand out definitions:
If $S \colon V \to W$ and $T \colon U \to V$ are such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$, then $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$ for all $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$.
Note that I have been careful not to repeat myself with the newly introduced symbols. It would be technically alright to reuse $u_1$ and $u_2$ in place of $x_1$ and $x_2$ since these are local declarations (restricted by the phrases “for all …”). However, humans are not good at differentiating between local and global declarations so it is best not to reuse symbols unless the scope is very clear.
Replace generic statements:
If $S \colon V \to W$ and $T \colon U \to V$ are such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$, then whenever $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$, $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$.
Up to now, the rephrasing has not taken into account the fact that there is a conclusion and a hypothesis. This rephrasing modifies a part of the conclusion to turn it from a generic statement “$P(p)$ is true for all $p \in Q$” to a statement about a generic object “whenever $p \in Q$ then $P(p)$ is true”. We do not do this for the similar statements in the hypothesis. This is because these two pieces are treated differently in the proof.
Replace generic statements, and reorganise to bring choices to the fore:
Let $S \colon V \to W$ and $T \colon U \to V$ be such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$. Let $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. Then $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$.
In this form, the distinction between hypothesis and conclusion is all the clearer. Parts of the hypothesis use the word “Let”, parts of the conclusion use the word “Then”.
With this formulation, the proof essentially writes itself. With all it’s gory details:
Proof
Let $S \colon V \to W$ and $T \colon U \to V$ be such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$. Let $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$.[^quick] Then:
$$
S T(x_1 + \eta x_2) = S \big( T(x_1) + \eta T(x_2)\big)
$$
using the hypothesis on $T$ as $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. So:
$$
S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)
$$
using the hypothesis on $S$ as $T(x_1), T(x_2) \in V$ and $\eta \in \mathbb{R}$. Hence the conclusion is true.
Notes:
1. This could be condensed, but the important thing here is how to find it, not what the final form should be.
2. Notice that I wrote “as $x_1, x_2 \in U$” rather than “with $u_1 = x_1$ and $u_2 = x_2$”. This is partly style, and partly because in the statement of linearity, $u_1$ and $u_2$ are placeholders into which we put $x_1$ and $x_2$. So saying $u_1 = x_1$ is semantically incorrect as it equates a virtual vector with an actual vector. This is a very minor point, though.
Finally, I would like to disagree with one part of Qiaochu’s answer. I actually like the imagery of a steel trap. A proof is a bit like a trap: we want to capture the theorem in a trap so that it can’t wriggle out. We construct the proof so that there is no possibility of escape. Eventually, yes, we want the proof to be beautiful but when it’s first constructed we just want it to do the job. Only once the theorem is caught can we spend a little time decorating the cage to make it look pretty and set it off to its best advantage. So build the trap because theorems can be dangerous! An escaped theorem can do untold damage, rampaging across the countryside, laying waste like an unchecked viking.
(Okay, not quite finally. The step-by-step proof above was taking from a page I wrote for my students on the nature of proof. The original can be found here.)
You get better at proofs the same way you get better at basketball or carpentry: lots and lots of practice. (In particular, like in basketball and carpentry, you can only get so far by reading books.) Of course, there’s good practice and bad practice. For general experience writing and coming up with proofs I think the same kind of material that people use to prepare for Olympiad-style mathematics is very helpful. To that end, here are a few references you might find helpful:
For writing proofs specific to a course (e.g. real analysis) you’ll generally find that the same basic ideas are being used over and over again, and once you learn to recognize when these ideas will be useful your life will be much easier. But I think this is a hard skill to teach.
It doesn’t help that many proofs in textbooks are written in a style that makes it nearly impossible to see how someone could have come up with the proof from first principles. This is an unfortunate tendency, and you should find a different textbook if this is too much of an issue. (Alternately, you should see if there are better proofs online, for example on someone’s blog. Tim Gowers and Terence Tao are fond of writing up conceptual proofs of things, and more generally their blogs are a great source of insight into how mathematicians think.) Once you build up a little problem-solving skill, another way to fix this is to reprove things yourself. It helps if you can’t remember what the proof in the textbook is.
But I’m serious about the practice. $10000$ hours and all that.
Edit: I also find this “steel trap” analogy very depressing. While it might seem that way in some textbooks, a properly presented proof should feel more like a poem.
Edit #2: I’d also like to mention that the general outline of Polya’s advice is on Wikipedia.
It seems you are asking two questions. The first is how to get better at doing proofs, and the previous answers are better than I can do. The second is why to do them, which has not been addressed. From your questions on this site, you seem more an engineer than a mathematician. This is not a negative-I am educated as a physicist and practicing as an engineer(ing manager), but I enjoy proof-type math. It just is a different view.
In response to your question “How can I characterize the type of solution vector that comes out of a matrix?” Greg Graviton not only gave an answer of what matrices met your criterion, he proved that matrices that did not met his answer did not work. This is a major improvement as you know there aren’t others out there.
Proofs are designed to make sure you covered all the loose ends. Ideally, each statement in a proof would be derived from the previous statements through an accepted rule of logic. But aside from the automated theorem provers, this is unachievable because it would make the proofs too long. So you only need to display enough steps to convince the audience it can be done. How big a leap is acceptable varies with the audience.
As physicists/engineers our numbers and functions are better behaved than the ones the mathematicians worry about. The mathematical world is not uniformly continuous, but ours is. As such, we interchange limits, integrals, derivatives, and sums freely.
As Arturo and Qiaochu have pointed out you get better at proving things by practicing a lot, solving exercises and by seeing how other people have proved things. There’s a big part in learning how to prove statements by repeating certain strategies that may have worked for particular types of problems.
For instance, a particular argument that is used countless times in analysis is the “technique” of adding and substracting a quantity in a way that let’s you group things in a useful way. And there are lots and lots of tricks such as this one that you will learn during the way by seeing them being used several times (and also by using them yourself).
But there’s certainly something about proving things that I must say. I don’t know how common this is for people studying at “top universities” around the world were only really talented and smart people are admitted, but in my country it is relatively easy to get accepted in a mathematics program.
As a consequence, I’ve seen lots of people fail several times in the first mathematics course in my university for mathematics students (where actually students are introduced to proof techniques and basic logic and set theory, basically what in some American universities is referred to as a transition course from the usual calculus courses to proof oriented courses).
The main reason for most of them failing such a course from what I’ve seen, is that some of them just don’t seem to get used to the idea of having to prove something rigorously (as a mathematician will expect). I remember very well when one my classmates argued with the professor because he was asked in the test to prove an equality of sets, and my classmate draw a Venn diagram which showed the sets in question. The professor told him repeatedly that as an aid for intuition the drawing was perfectly fine, but that the drawing didn’t constitute a proof by itself.
In the end (as you’ll imagine) my classmate didn’t win the argument (or extra points for that matter) and he ended up going home frustrated and finally he gave up on the course.
My point is that for some reason some people are just better than others at proving things, and I’m not saying that they’re just smarter, it’s only that the thought processes involved just seem to come more naturally to some persons.
That being said, there are some books that specifically address the topic of introducing students to the task of proving things. What they usually do is to begin by explaining some basic logic and them they build up some easy facts from set theory, binary relations and functions and in the way they introduce some proof techniques, such as “proof by contradiction”, “proof by contrapositive”, etc.
For example, one book that helped me a lot when I was starting is Proofs and Fundamentals: A First Course in Abstract Mathematics, but there a lot of other books that I’m pretty sure work really well when making your first steps in proving mathematical statements, such as How To Read and Do Proofs by Daniel Solow or How To Prove It by Daniel Velleman.
I recommend the book by Lay, “Analysis with introduction to proofs”. The first chapter focuses on logic and proofs and personally I found it rather helpful. At least it helped me. Of course to master the art of proofs you have to keep practicing, but knowledge of some basics such that instead of proving the statement “A implies B you can prove contrapositive “not B implies not A”, and how it differs from “Proof by contradiction”, and the fact that negation of “A implies B” is “A and not B”, and how to deal with existential and universal quantifiers, and so forth and so forth. Those are useful techniques one should master.
Incidentally, as an introduction to analysis, the aforementioned book is quite mediocre. I had a first hand experience with that book, since it was a textbook when I took the “low level introduction to analysis (wasn’t sure if I was ready for Rudin’s Principles of Analysis). So, this is only the chapter on logic and proof which is worth reading. Of course, purchasing the whole book just for a single chapter is funny, but may be you can find the used one or get one in the library.
But that chapter on proofs did help me. When subsequently I took a course on analysis and we used Baby Rudin as a textbook, I was in a very good shape and in fact did quite well.
And, of course, another suggestion: learn by example. See how other people write the proofs. Rudin’s style, for instance, is very terse, you literally have to dig through his proofs, but after a while you start getting used to it, and still after a while you start writing similar proofs.
Good luck.