Intuition behind Cantor-Bernstein-Schröder

The book I am working from (Introduction to Set Theory, Hrbacek & Jech) gives a proof of this result, which I can follow as a chain of implications, but which does not make natural, intuitive sense to me. At the end of the proof, I found myself quite confused, and had to look carefully at the build-up to see how the conclusion followed. I get the steps, now – but not the intuition.

The authors took sets $X$ and $Y$ and assumed injections $f: X \rightarrow Y$, $g: Y \rightarrow X$. Since $X \sim g[f[X]]$ and $X \supseteq g[Y] \supseteq g[f[X]]$, and $Y \sim g[Y]$, the authors went to prove the lemma that $A \sim A_1, A \supseteq B \supseteq A_1 \implies A \sim B$.

For the lemma, they defined recursive sequences $\{A_n\}_{n \in \omega}, \{B_n\}_{n \in \omega}$ by $A_0 = A, A_{n+1} = f[A],$ $B_0 = B, B_{n+1} = f[B].$ Since $A_0 \supseteq B_0 \supseteq A_1 \implies f[A_0] \supseteq f[B_0] \supseteq f[A_1]$, we get from induction that $A_{n+1} \subseteq A_n$. Putting $\{C_n\}_{n \in \omega} = \{A_n – B_n\}_{n \in \omega},$ they noted that $C_{n+1} = f[C_n]$ (since $A_n \supseteq B_n$ inductively, $f[A_n – B_n] = f[A_n] – f[B_n]$). Putting

$$C = \bigcup_{n=0}^\infty C_n, \text{ } D = A – C,$$

they noted that $f[C] = \bigcup_{n=1}^\infty C_n$, and that $h(x): A \rightarrow f[C] \cup D$ can be defined by sending $x \in C \rightarrow f(x),$ and $x \in D \rightarrow x$. It is clear that $h$ is one-to-one and onto.

Inductively, for $n > 0$, $C_0 \cap C_n = \varnothing$; it follows that $C_0 \cap \bigcup_{n=1}^\infty C_n = C_0 \cap f[C] = \varnothing$. We know that $C_0 \cup f[C] \cup D$ = A, and since all three sets are disjoint, we may conclude that $f[C] \cup D = A – C_0 = A – (A – B) = B$. Thus, our bijection $h$ maps $A$ to $B$, as we wanted.

What is the intuition here? What were H&J, or Cantor, Bernstein, etc. thinking when they went to prove this – the “high-level” idea? Is there a clearer proof, in the sense of thought process (not necessarily shorter)?

Solutions Collecting From Web of "Intuition behind Cantor-Bernstein-Schröder"

$\newcommand{\ran}{\operatorname{ran}}$Here’s one way to think about it. Suppose that $y\in\ran f$; then we can pull $y$ back to $f^{-1}(y)\in X$. If $f^{-1}(y)\in\ran g$, we can pull it back to $g^{-1}(f^{-1}(y))\in Y$. If we continue this pulling back, one of two things must happen: either we reach a dead end at a point of $X$ or $Y$ that can’t be pulled back (because it’s in $Y\setminus\ran f$ or $X\setminus\ran g$), or we don’t.

Let $X_0=X\setminus\ran g$, the set of points of $X$ that cannot be pulled back at all, and let $Y_0=Y\setminus\ran f$. More generally, for each $n\in\omega$ let $X_n$ be the set of points of $X$ that can be pulled back exactly $n$ times, and let $Y_n$ be the set of points of $Y$ that can be pulled back exactly $n$ times. Finally, let $X_\omega$ and $Y_\omega$ by the subsets of $X$ and $Y$, respectively whose points can be pulled back infinitely many times.

At this point a sketch helps; it should show the partitions $\{X_n:n\le\omega\}$ of $X$ and $\{Y_n:n\le\omega\}$ of $Y$, and it should include arrows indicating what parts of $X$ get mapped to what parts of $Y$ and vice versa. To avoid having arrows crossing, I’ve taken $X$ and $Y$ apart in the following diagram.

$$\begin{array}{} X_0&\overset{f}\longrightarrow&Y_1&\overset{g}\longrightarrow&X_2&\overset{f}\longrightarrow& Y_3&\overset{g}\longrightarrow&X_4&\dots&X_\omega\\ Y_0&\overset{g}\longrightarrow&X_1&\overset{f}\longrightarrow&Y_2&\overset{g}\longrightarrow&X_3&\overset{f}\longrightarrow&Y_4&\dots&Y_\omega \end{array}$$

Each of the arrows is a bijection, so I can break up the diagram into $\omega$ self-contained parts. The first two parts are:

$$\begin{array}{} X_0&\overset{f}\longrightarrow&Y_1\\ Y_0&\overset{g}\longrightarrow&X_1 \end{array}\qquad \begin{array}{} X_2&\overset{f}\longrightarrow&Y_3\\ Y_2&\overset{g}\longrightarrow&X_3 \end{array}$$

Ignoring $X_\omega$ and $Y_\omega$ for the moment, I can rearrange the rest of the diagram to give my a bijection from $X\setminus X_\omega$ to $Y\setminus Y_\omega$:

$$\begin{array}{ccc} X_0&\overset{f}\longrightarrow&Y_1\\ X_1&\overset{g^{-1}}\longrightarrow&Y_0\\ X_2&\overset{f}\longrightarrow&Y_3\\ X_3&\overset{g^{-1}}\longrightarrow&Y_2\\ \vdots&\vdots&\vdots\\ X_{2k}&\overset{f}\longrightarrow&Y_{2k+1}\\ X_{2k+1}&\overset{g^{-1}}\longrightarrow&Y_{2k}\\ \vdots&\vdots&\vdots \end{array}$$

Finally, I claim that $f[X_\omega]=Y_\omega$: everything in $X_\omega$ can be pulled back infinitely often, so everything in $f[X_\omega]$ can be pulled back infinitely often, and therefore $f[X_\omega]\subseteq Y_\omega$. On the other hand, if $y\in Y_\omega$, then $y$ can be pulled back infinitely often, so it must be possible to pull $f^{-1}(y)$ back infinitely often, and therefore $f^{-1}(y)\in X_\omega$. Thus, $Y_\omega\subseteq f[X_\omega]$ as well. The diagram above can now be completed to show a bijection from $X$ onto $Y$:

$$\begin{array}{ccc} X_0&\overset{f}\longrightarrow&Y_1\\ X_1&\overset{g^{-1}}\longrightarrow&Y_0\\ X_2&\overset{f}\longrightarrow&Y_3\\ X_3&\overset{g^{-1}}\longrightarrow&Y_2\\ \vdots&\vdots&\vdots\\ X_{2k}&\overset{f}\longrightarrow&Y_{2k+1}\\ X_{2k+1}&\overset{g^{-1}}\longrightarrow&Y_{2k}\\ \vdots&\vdots&\vdots\\ X_\omega&\overset{f}\longrightarrow&Y_\omega \end{array}$$

The bijection is defined piecewise, but that’s no problem.

There are a few details to be filled in to make this a fully rigorous proof, but I think that it does give a reasonable idea of one possible intuition.

Added: Here’s a very rough sketch. Arrows from left to right are (parts of) $f$, and arrows from right to left are (parts of) $g$.

It is a reasonably long and seemingly complicated proof, but actually the idea is very simple (or at least it is in the proof I’ve seen). We have injections $f:X\to Y$ and $g:Y\to X$. To build a bijection $h:X\to Y$ we need to send each point in $X$ to a unique point in $Y$, making sure we hit everything.

We can start by saying $h(x) = f(x)$. But of course, this doesn’t hit everything if f is not surjective. But , for each element $y$ we don’t hit in $Y$ (i.e. the set ${Y\setminus{f(X)}}$) there is a unique element $g(y)$ in $X$ which we can make map to $y$. So we edit $h$ such that now $$h(x) = \begin{cases} g^{-1}(x) & \mbox{if } g^{-1}(x) \notin f(X) \\ f(x) & \mbox{otherwise} \end{cases}$$

So this time we hit everything we didn’t get first time from the $g^{-1}(x)$ part. But now we’re missing some other parts! (namely the values $f(x)$ where $x \in g(Y\setminus f(X))$ but $f(x) \notin Y\setminus f(X)$.) We can define the sets that we “miss” in each iteration of improving $h$ as $C_n$ and say $C = \displaystyle \bigcup_1^{\infty}C_n$. Then defining
$$h(x) = \begin{cases} g^{-1}(x) &\mbox{if } g^{-1}(x) \in C \\ f(x) &\mbox{otherwise} \end{cases}$$

And by thinking of how we built it up, it kind of makes a bit more sense as to why it is a bijection.

Please note that this isn’t the most efficient way to carry out the proof, but I explained it in the way that I would come up with it whilst trying to see why it works, rather than what I’d write down formally. Also please note it’s been a long time since I’ve seen the proof and I may have gotten some of the specifics wrong (please anyone correct me if this is the case!), but this is the right general idea (I hope!)

My favorite proof is due to R. H. Cox, and I learned it from the wonderful Handbook of Analysis and its Foundations by Schechter, from which I’ve taken the illustration below. The basic idea is that an injection from $Y$ to $X$ identifies $Y$ with a subset of $X$. So we can reduce the problem to showing that if we have $Y\subseteq X$ and $f:X\to Y$ is an injection, then $|X|=|Y|$ and this is less confusing. let $C=X\backslash Y$. The sequence of sets $\big(f^n(C))$ is disjoint since $f(X)\subseteq Y$ and $f$ is injective. Let $S=\bigcup_{n=0}^\infty f^n(C)$. Define $g:X\to Y$ by

$$g(x) = \begin{cases} f(x)&\mbox{if } x\in S \\ x & \mbox{if }x\notin S. \end{cases}$$

It is easily verified that $g$ is a bijection.

Remark: According to Kunen’s book on the foundations of mathematics, Dedekind already proved this version of the Cantor-Bernstein theorem in 1887.

When Y is a subset of X, Y can be assumed a proper subset of X otherwise the result is trivial. f is then called a reflection. If Z is a subset of X that contains Y then X is partitioned into Z and the rest of X which can be imagined as a frame of Z. This partitioning is carried over by f to Y, and then again to the image of Y under f and so on. Thus we obtain a sequence of disjoint frames and a sequence of nested subsets of Z. Picture the frames as an infinite stack. f then pushes the stack one level down. So if you take f on the stack of frames and the identity on the subsets you get a 1-1 mapping from X onto Z.

This is the essence of Dedekind’s proof mentioned above. cf. my book http://link.springer.com/book/10.1007/978-3-0348-0224-6/page/1

Consider the function

$\tag 1 \ F: \{ \cdots,-3,-2,-1\} \to \{0, 1,2,\cdots\}$
$\qquad \qquad \qquad \qquad\qquad \qquad n \mapsto -n$

$F$ is an injection of one countable set into another that misses exactly one point, $0$, of the target (codomain). We can easily remedy this and define a bijection.

Let $U^-$ be the predecessor operator, $n \mapsto n – 1$. The composition $U^- \circ F$ now gives a 1:1 correspondence between the domain and target of $F$.

Intuition flash: Modifying an injective function so that it becomes bijective is all about translating $U^-$ to the given context, with the multiplicity required to fix all the missed elements. So given injections,

$\tag 2 f: X \rightarrow Y$
$\tag 3 g: Y \rightarrow X$

we want to “$U^-$ fix” $f$ for every point in $Y$ that is missed by $f$.

Exercise: Let $\hat y$ be an element of $Y$ that is not in the image of $f$. Define a new function $\hat f$ satisfying $\hat {f}(X) = f(X) \cup \{\hat y \}$.

At this point you could prove the theorem by extending this exercise in several ways. The set theoretic technique supplied by Michael Greinecker is abstract and elegant, and the final word. We apply it here, constructing a bijective map between $X$ and $Y$ using (2) and (3).

Let $C=Y \backslash f(X)$ and set $h = f \circ g$ to the easily obtained injective mapping of $Y$ into itself. We know that $C$ must be disjoint from $h(C)$, and so $h(C)$ is disjoint from $h^2(C)$. But by definition, $h^2(C)$ must also be disjoint with $C$. We can now state the following

$\tag 4 \text {The sequence of sets }\big(h^n(C)) \, n \in \mathbb{N} \, \text{are disjoint and if } n\gt 0 \text{,} \; h^n(C) \subset f(X).$

$\tag 5 h^{-1} \text{ is a bijective mapping } h^n(C) \to h^{n-1}(C) \; \text {for all } n \gt 0$

Exercise: Construct a bijective transformation $V^-: f(X) \to Y$ so that $V^- \circ f$ becomes a bijection between $X$ and $Y$. (Hint – modify the insertion map by redefining it on each $h^{n}(C)$ for $n \gt 0$).