TL;DR: Given 4 points on a two-dimensional plane, representing a rectangle seen from an unknown perspective, can we deduce the width/height ratio of the rectangle?

Details:

From a picture and some OpenCV work (Canny, Hough lines, bucketing to tell apart “lines” and “columns”, choosing interesting lines, math to deduce line intersections), I get this:

lines http://x.mdk.fr/lines.png

From this step, it’s easy to warp it to a “from the top” view, using OpenCV’s `getPerspectiveTransform` and `warpPerspective` to “remove” the perspective, as if looking at the rectangle from directly above.

My goal now is to keep its aspect ratio, which I lose in my current warping because I don’t know what ratio it should have.

For this I have to give `getPerspectiveTransform` the 4 destination points where I want my 4 found red points to be after warping, not just 4 arbitrary points like `(0, 0), (0, 100), (100, 100), (100, 0)`, which lead to a deformation if the rectangle marked by my 4 red points is not actually a square.
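
To make the role of the ratio concrete, here is a minimal sketch of how the destination points could be built once a width/height ratio is known. The helper name `destination_points` and its parameters are hypothetical (not part of OpenCV), and the corner order simply mirrors the example above; it must match the order of the 4 source points.

```python
def destination_points(ratio, height=100):
    """Corners of an axis-aligned rectangle with the given width/height ratio.

    Hypothetical helper: the order (top-left, bottom-left, bottom-right,
    top-right) follows the example in the question and must match the
    order in which the 4 detected corners are listed.
    """
    width = round(ratio * height)
    return [(0, 0), (0, height), (width, height), (width, 0)]


# A 3:2 rectangle warped to a height of 100 pixels.
print(destination_points(1.5))
```

The resulting list, converted to a float32 array, is what would be passed as the destination argument of `getPerspectiveTransform`, with `(width, height)` as the output size for `warpPerspective`.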

So is there a known way to compute the width/height ratio, or even better the size, of this rectangle seen through a perspective?

For the record and the curious, work-in-progress is here: https://github.com/JulienPalard/grid-finder

Dropbox has an extensive article on their tech blog where they describe how they solved the problem for their scanner app.

https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/

Rectifying a Document

We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will be a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here’s a diagram of the capture scenario:

In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:

- the 3D location of the camera relative to the document (3 degrees of freedom),
- the 3D orientation of the camera relative to the document (3 degrees of freedom),
- the dimensions of the document (2 degrees of freedom), and
- the focal length of the camera (1 degree of freedom).

On the flip side, the x- and y-coordinates of the four detected document corners give us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables: one could imagine scaling the document physically and placing it further from the camera, to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to be solved. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)
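
One way to see how such a system can yield the ratio is the sketch below. It is not necessarily what Dropbox implements; it is a standard argument under stated assumptions: a pinhole camera with square pixels, no skew, and a known principal point (often taken to be the image centre). The homography from the unit square to the four corners has columns proportional to $K r_1$ and $K r_2$; requiring $r_1 \perp r_2$ gives the focal length, and the ratio of the column norms gives width/height. All function names are hypothetical, and `project_rectangle` exists only to generate synthetic test data.

```python
import math


def unit_square_homography_columns(quad):
    """First two columns of the homography (normalised so H[2][2] = 1) that
    maps (0,0), (1,0), (1,1), (0,1) onto the four points of quad, in order."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    sx, sy = x0 - x1 + x2 - x3, y0 - y1 + y2 - y3
    dx1, dy1 = x1 - x2, y1 - y2
    dx2, dy2 = x3 - x2, y3 - y2
    det = dx1 * dy2 - dx2 * dy1
    g = (sx * dy2 - dx2 * sy) / det
    h = (dx1 * sy - sx * dy1) / det
    col1 = (x1 - x0 + g * x1, y1 - y0 + g * y1, g)
    col2 = (x3 - x0 + h * x3, y3 - y0 + h * y3, h)
    return col1, col2


def estimate_aspect_ratio(corners, principal_point, eps=1e-9):
    """Width/height of the rectangle whose corners project to `corners`.

    corners: images of (0,0), (w,0), (w,h), (0,h), in that order.
    Assumes square pixels, zero skew and a known principal point.
    """
    cx, cy = principal_point
    quad = [(x - cx, y - cy) for x, y in corners]
    (a, d, g), (b, e, h) = unit_square_homography_columns(quad)
    if abs(g) < eps and abs(h) < eps:
        # No perspective distortion: the image is already a parallelogram.
        return math.hypot(a, d) / math.hypot(b, e)
    if abs(g * h) < eps * eps:
        raise ValueError("one vanishing point at infinity: focal length not recoverable")
    focal_sq = -(a * b + d * e) / (g * h)  # from r1 . r2 = 0
    if focal_sq <= 0:
        raise ValueError("corners are inconsistent with a rectangle under this model")
    return math.sqrt((a * a + d * d + focal_sq * g * g) /
                     (b * b + e * e + focal_sq * h * h))


def project_rectangle(width, height, focal, principal_point, tilt_x, tilt_y, offset):
    """Synthetic pinhole projection of a tilted rectangle (test data only)."""
    ca, sa = math.cos(tilt_x), math.sin(tilt_x)
    cb, sb = math.cos(tilt_y), math.sin(tilt_y)
    r1 = (cb, 0.0, -sb)            # image of the rectangle's x axis
    r2 = (sb * sa, ca, cb * sa)    # image of the rectangle's y axis
    cx, cy = principal_point
    image = []
    for X, Y in [(0, 0), (width, 0), (width, height), (0, height)]:
        px, py, pz = (X * r1[i] + Y * r2[i] + offset[i] for i in range(3))
        image.append((cx + focal * px / pz, cy + focal * py / pz))
    return image


corners = project_rectangle(2.0, 1.0, 800.0, (320.0, 240.0), 0.4, 0.3, (-1.0, -0.5, 6.0))
print(estimate_aspect_ratio(corners, (320.0, 240.0)))
```

Only the ratio is recovered, not the absolute size, which matches the scale ambiguity described above. With noisy corner detections the estimate degrades quickly when the rectangle is nearly facing the camera, since the focal length is then poorly constrained.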

Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: one would look up, for each output pixel, the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform—which happens to be the inverse of the camera transform we just solved for!—with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)

Lastly, recall that there was an ambiguity with respect to scale: we can’t tell whether the document was a letter-sized paper (8.5” x 11”) or a poster board (17” x 22”), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image, and set the output resolution as to match this pixel count. The idea is that we don’t want to upsample or downsample the image too much.
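
As a rough illustration of that last step, the sketch below picks an output size whose pixel count matches the area of the detected quadrilateral while respecting a given width/height ratio. It approximates "counting pixels" with the polygon area from the shoelace formula; the function names are hypothetical and this is not taken from the Dropbox article.

```python
import math


def quad_area(corners):
    """Area of a simple polygon given as ordered (x, y) corners (shoelace formula)."""
    twice_area = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def output_size(corners, ratio):
    """(width, height) in pixels with width/height = ratio and
    width * height approximately equal to the quadrilateral's area."""
    area = quad_area(corners)
    width = math.sqrt(area * ratio)
    height = math.sqrt(area / ratio)
    return round(width), round(height)


# A quadrilateral covering 10,000 pixels, rectified to a 4:1 rectangle.
print(output_size([(0, 0), (100, 0), (100, 100), (0, 100)], 4.0))
```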

Yes, here’s a pen and pencil method:

Find the points $P,Q$ where the two pairs of “parallel” sides intersect. The line through $P,Q$ is the “horizon” of the plane containing the rectangle. Find $R$ such that $\angle QRP=90^\circ$ and $RP=RQ$. Then the parallel to $PQ$ through $R$ intersects your pairs of “parallels” $AB,CD$ resp. $BC,AD$ in points whose distances are proportional to the rectangle’s side lengths.
