Precision of operations on approximations

If $ x $ and $ y $ have $ n $ significant places, how many significant places do $ x + y $, $ x - y $, $ x \times y $, $ x / y $, $ \sqrt{x} $ have?

I want to evaluate expressions like $ \frac{ \sqrt{ \left( a - b \right) + c } - \sqrt{ c } }{ a - b } $ to $ n $ significant places, where $ a $, $ b $, $ c $ are nonnegative integers. I was thinking about doing it recursively, i.e., if I want to evaluate $ x / y $ to $ n $ places, I need to evaluate $ x $, $ y $ to $ m $ places; if I want to evaluate $ x - y $ to $ n $ places, I need to evaluate $ x $, $ y $ to $ m $ places…

What book should I be reading?


Let us look at your specific example, because it is very interesting. We want to evaluate
$$\frac{\sqrt{(a-b)+c}-\sqrt{c}}{a-b},$$
where $a$, $b$, and $c$ are integers. There can be serious loss of precision if $c$ is huge and $a$ and $b$ are very close to each other.

However, there is a straightforward workaround. Suppose that $c$ is very large and $|a-b|$ is small, the kind of situation that can lead to catastrophic loss of precision.

Imagine multiplying “top” and “bottom” by $\sqrt{(a-b)+c} +\sqrt{c}.$
After a small amount of algebra, we obtain
$$\frac{1}{\sqrt{(a-b)+c}+\sqrt{c}}.$$
Our new expression no longer involves subtracting nearly equal very large numbers, so the substantial loss of precision disappears.
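To see the effect in ordinary double-precision arithmetic, here is a minimal sketch comparing the two forms. The function names `naive` and `rationalized` and the sample values of $a$, $b$, $c$ are illustrative choices, not part of the question.

```python
import math

def naive(a, b, c):
    """Evaluate (sqrt((a-b)+c) - sqrt(c)) / (a-b) exactly as written."""
    d = a - b
    return (math.sqrt(d + c) - math.sqrt(c)) / d

def rationalized(a, b, c):
    """Evaluate the algebraically equivalent form 1 / (sqrt((a-b)+c) + sqrt(c))."""
    d = a - b
    return 1.0 / (math.sqrt(d + c) + math.sqrt(c))

# Illustrative inputs: c is huge and a - b = 1, the problematic situation.
a, b, c = 3, 2, 10**16

# The true value is extremely close to 1 / (2 * 10**8) = 5e-9.
print("naive:       ", naive(a, b, c))
print("rationalized:", rationalized(a, b, c))
```

With these inputs the two square roots in the naive form agree to roughly the full precision of a double, so their difference carries almost no correct digits, while the rationalized form only adds two positive numbers.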

Comment: The expression you mentioned is very close to the kind of expression we obtain when solving the quadratic equation $ax^2+bx+c=0$. (Of course $a$, $b$, and $c$ no longer have the same meaning as in your example.)
The familiar Quadratic Formula
$$\frac{-b\pm\sqrt{b^2-4ac}}{2a}$$
can give numerical evaluation issues when $|4ac|$ is very small in comparison with $b^2$.

Multiply “top” and “bottom” by $-b\mp\sqrt{b^2-4ac}$. After the smoke clears, we obtain
$$\frac{2c}{-b \mp \sqrt{b^2-4ac}}.$$

So we have a new formula for solving the quadratic equation. This formula is sometimes called the Citardauq Formula.

Suppose that our quadratic equation has two real roots, and $|4ac|$ is very small compared to $b^2$. If we want to find the root of larger absolute value, use the Quadratic Formula. For the root of smaller absolute value, we get less, sometimes much less, loss of precision by using the Citardauq Formula.
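A minimal sketch of this combination in double precision follows. The function names and the sample coefficients are illustrative, and the code assumes real coefficients, $a \neq 0$, a nonnegative discriminant, and that $b$ and $c$ are not both zero.

```python
import math

def roots_textbook(a, b, c):
    """Both roots from the Quadratic Formula, with no attention to cancellation."""
    s = math.sqrt(b * b - 4 * a * c)
    return (-b + s) / (2 * a), (-b - s) / (2 * a)

def roots_stable(a, b, c):
    """Larger-magnitude root from the Quadratic Formula, smaller from Citardauq.

    Assumes a != 0, a nonnegative discriminant, and q != 0 below.
    """
    s = math.sqrt(b * b - 4 * a * c)
    # Choose the sign so that -b and the square root term have the same sign:
    # they are then added in magnitude, and nothing cancels.
    q = -b - s if b >= 0 else -b + s
    return q / (2 * a), (2 * c) / q

# Illustrative case: x^2 + 1e8 x + 1 = 0, with roots near -1e8 and -1e-8.
print(roots_textbook(1.0, 1e8, 1.0))
print(roots_stable(1.0, 1e8, 1.0))
```

Here $4ac = 4$ while $b^2 = 10^{16}$, so the textbook computation of the small root subtracts two numbers that agree in almost every digit.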

In general, when we are planning a computation, it is very important to set things up so that the standard precision issues do not arise. Numerical differentiation is particularly problematic. And even something as simple as row reduction can give problems if we blindly use the process taught in first Linear Algebra courses.
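As a small illustration of the numerical differentiation remark, the sketch below applies a forward difference to $\sin$ at $x = 1$ for shrinking step sizes. The helper name `forward_difference`, the test function, and the step sizes are assumptions chosen for the example.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) by (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)  # derivative of sin at x = 1

# Smaller h reduces the truncation error, but f(x + h) - f(x) then
# subtracts two nearly equal numbers, so rounding error takes over.
for h in (1e-2, 1e-5, 1e-8, 1e-11, 1e-15):
    approx = forward_difference(math.sin, 1.0, h)
    print(f"h = {h:.0e}   error = {abs(approx - exact):.3e}")
```

The error first shrinks with $h$ and then grows again, which is the same subtraction of nearly equal quantities discussed above.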

$x\times y$, $x/y$, and $\sqrt{x}$ all have about $n$ significant places. $x+y$ and $x-y$ can have up to $n$ significant places, but depending on cancellation, one of them might have fewer. For example, suppose we know both $\pi$ and $22/7$ to $6$ significant places. Then we know $22/7-\pi$ to at most $3$ significant places: $3.14286-3.14159=0.00127$. However, we know $22/7+\pi$ to $6$ significant places:
$3.14286+3.14159=6.28445$
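The same example can be checked with exact decimal arithmetic. This is a sketch using Python's `decimal` module; the variable names are illustrative, and the error bound assumes each input was rounded to the nearest value with 5 decimal places.

```python
from decimal import Decimal

x = Decimal("3.14286")  # 22/7 rounded to 6 significant places
y = Decimal("3.14159")  # pi rounded to 6 significant places

# Each input may be off by up to half a unit in its last place,
# so the sum and the difference may each be off by up to 0.00001.
worst_abs_error = 2 * Decimal("0.000005")

diff = x - y
total = x + y

rel_error_diff = worst_abs_error / diff
rel_error_total = worst_abs_error / total

print("difference:", diff, " relative error bound:", rel_error_diff)
print("sum:       ", total, " relative error bound:", rel_error_total)
```

The absolute error bound is the same for both results, but relative to the tiny difference it is thousands of times larger than relative to the sum, which is why the difference keeps so few significant places.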

$ \left[ a , b \right] + \left[ c , d \right] = \left[ a + c , b + d \right] $

$ \left[ a , b \right] - \left[ c , d \right] = \left[ a - d , b - c \right] $

$ \left[ a , b \right] \times \left[ c , d \right] = \left[ \min \left( a \times c , a \times d , b \times c , b \times d \right) , \max \left( a \times c , a \times d , b \times c , b \times d \right) \right] $

If $ 0 \notin \left[ c , d \right] $, then $ \left[ a , b \right] / \left[ c , d \right] = \left[ \min \left( a / c , a / d , b / c , b / d \right) , \max \left( a / c , a / d , b / c , b / d \right) \right] $

If $ a \geq 0 $, then $ \sqrt { \left[ a , b \right] } = \left[ \sqrt a , \sqrt b \right] $
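These interval rules translate almost directly into code. The class below is a minimal sketch: the name `Interval` and its fields `lo` and `hi` are illustrative, and it assumes `lo <= hi`. It does not round endpoints outward, which a rigorous floating-point interval library would have to do.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float  # lower endpoint
    hi: float  # upper endpoint, assumed >= lo

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result: our low minus their high; largest: our high minus their low.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains 0")
        q = [self.lo / other.lo, self.lo / other.hi,
             self.hi / other.lo, self.hi / other.hi]
        return Interval(min(q), max(q))

    def sqrt(self):
        if self.lo < 0:
            raise ValueError("square root needs a nonnegative interval")
        return Interval(math.sqrt(self.lo), math.sqrt(self.hi))

print(Interval(1, 2) - Interval(3, 5))
print(Interval(-1, 2) * Interval(3, 5))
```

Evaluating an expression with such objects yields an enclosure of the true value, and the width of the final interval shows how much precision survived.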

$ a \approx a' $ to $ n $ decimal places (that is, $ n $ digits after the decimal point) if and only if $ a \in \left[ a' - 5 \times 10^{-(n+1)}, a' + 5 \times 10^{-(n+1)} \right) $, assuming I’m rounding half up.

If $ a \approx a' $ and $ b \approx b' $ to $ n + 1 $ decimal places, then $ a + b \approx a' + b' $ and $ a - b \approx a' - b' $ to $ n $ decimal places. To see this, write $ \varepsilon = 5 \times 10^{-((n+1)+1)} $, so that $ 2\varepsilon = 10^{-(n+1)} $. For the sum,
$$\begin{aligned}
a + b &\in \left[ a' - \varepsilon + b' - \varepsilon , \; a' + \varepsilon + b' + \varepsilon \right) \\
&= \left[ a' + b' - 10^{-(n+1)} , \; a' + b' + 10^{-(n+1)} \right) \\
&\subset \left[ a' + b' - 5 \times 10^{-(n+1)} , \; a' + b' + 5 \times 10^{-(n+1)} \right),
\end{aligned}$$
and for the difference,
$$\begin{aligned}
a - b &\in \left[ a' - \varepsilon - \left( b' + \varepsilon \right) , \; a' + \varepsilon - \left( b' - \varepsilon \right) \right) \\
&= \left[ a' - b' - 10^{-(n+1)} , \; a' - b' + 10^{-(n+1)} \right) \\
&\subset \left[ a' - b' - 5 \times 10^{-(n+1)} , \; a' - b' + 5 \times 10^{-(n+1)} \right).
\end{aligned}$$
So, to calculate $ a + b $ or $ a - b $ to $ n $ decimal places, I need to first calculate $ a $ and $ b $ to $ n + 1 $ decimal places.
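The rule can be spot-checked with Python's `decimal` module. This is only a sketch for one pair of values, not a proof: the helpers `round_places` and `within` are hypothetical names, and $\sqrt 2$, $\sqrt 3$ computed to 50 digits stand in for the exact $a$ and $b$.

```python
from decimal import Decimal, ROUND_HALF_UP, getcontext

getcontext().prec = 50  # working precision for the stand-in "exact" values

def round_places(x, places):
    """Round x to the given number of decimal places, rounding half up."""
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

def within(x, approx, n):
    """Test approx - 5e-(n+1) <= x < approx + 5e-(n+1), the definition used above."""
    half = Decimal(5).scaleb(-(n + 1))
    return approx - half <= x < approx + half

n = 4
a = Decimal(2).sqrt()
b = Decimal(3).sqrt()

# Approximate the inputs to n + 1 decimal places.
a1 = round_places(a, n + 1)
b1 = round_places(b, n + 1)

print(a1, b1)
print(within(a + b, a1 + b1, n))  # sum agrees to n places in the sense above
print(within(a - b, a1 - b1, n))  # difference agrees to n places in the sense above
```

Note that $a' + b'$ itself still carries $n + 1$ digits after the decimal point; the check only confirms that the exact sum and difference lie in the intervals from the definition.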

I’m still working on multiplication, division, root. My brain is frying…