# Normalizing a matrix with row and column swapping

How do you canonicalize a matrix over column- and row-swap operations?

Or more specifically, does there exist a function f(M) such that f(M)=f(N) iff there is set of column- and row-swap operations (i.e. a permutation matrix A) on M that would transform M into N (i.e. AMA’)?

@Coffeemath and I are discussing a few ways to isolate the problem:

Delete duplicate rows or columns
| 1  3  2  2  2 |      | 1  3  2  2 |
| 1  4  2  2  2 |  =>  | 1  4  2  2 |
| 0  1  0  1  1 |      | 0  1  0  1 |
| 0  1  1  0  0 |      | 0  1  1  0 |

Sorting rows based on lexicographical order of their multisets
| 1  3  2  2 |      |[0  1  0  1]|
| 1  4  2  2 |  =>  |[0  1  1  0]|
| 0  1  0  1 |      | 1  3  2  2 |
| 0  1  1  0 |      | 1  4  2  2 |
but... the remaining top two rows' order is undefined by this step

Sorting cols based on lexicographical order of their multisets
| 0  1  0  1 |      | 0 [0  1] 1 |
| 0  1  1  0 |  =>  | 0 [1  0] 1 |
| 1  3  2  2 |      | 1 [2  2] 3 |
| 1  4  2  2 |      | 1 [2  2] 4 |
but... the remaining mid two cols' order is undefined by this step

Recurse for any sections not defined:
| 0  0  1  1 |    | .  0  1  . |
| 0  1  0  1 | => | .  1  0  . |
| 1  2  2  3 |    | .  .  .  . |
| 1  2  2  4 |    | .  .  .  . |
but... the remaining mid two cols' order is undefined by this step

Recurse for any sections not defined:
| 0  0  1  1 |    | .  0  1  . |
| 0  1  0  1 | => | .  1  0  . |
| 1  2  2  3 |    | .  .  .  . |
| 1  2  2  4 |    | .  .  .  . |
but... the remaining mid two cols' order is undefined by this step

But how to handle this?
123456789
215347698 => This can be sorted into a specific pattern
341268957    But considering M(1,1) could be the "1" from
432179865    any of the nine rows, this becomes messy.
568912374
657891243    Note: not every matrix with rows & cols having
789524136    identical multisets and no duplicated
896735412    rows or columns must be a square matrix.
974683521


#### Solutions Collecting From Web of "Normalizing a matrix with row and column swapping"

The following is only an idea, which maybe can be made into some kind of algorithm, and may need some tweaking…

Let $M$ be an $m \times n$ matrix. Each row of $M$ may be ordered as a multiset, and one may choose the lexicographically least of these and require that the first row of $f(M)$ should be that multiset (in order). Of course there may be several rows of $M$ whose multisets give this same “lex-least” multiset. One then looks at each such row and asks what the rows below it would look like if that row were placed on top, and column swaps were done in any way preserving the multiset.

Suppose e.g. two rows had lex-least multisets $[1,1,1,2,2,3].$ In $M$ itself these rows might appear as respectively $a=[1,1,2,1,2,3]$ and $b=[2,2,3,1,1,1].$ In either case we can initially do a row swap to put that row on top and then a column swap so the first row is now $[1,1,1,2,2,3].$ The appearance of rows beyond the first will likely be different depending on whether $a$ or $b$ was initially put on top, and we need do decide which to do.

Now note that irrespective to which of $a$ or $b$ was initially placed on top, the first three columns have the same number $1$ on top, and so these first three columns may be permuted among themselves without disturbing the first row $[1,1,1,2,2,3]$. Similarly one may permute the two rows underneath the two $2$’s. Then we can try each of rows $a$ and $b$, with all ways of permuting the columns not disturbing the pattern $[1,1,1,2,2,3]$ of the first row, and see for which one the rest of the matrix is lexicographically least, as a “tie-breaker” on whether to select row $a$ or row $b$ to swap initially to the top row.

What needs work is to choose in which sense the matrix from rows 2 on should be lexicographically ordered. I think it should be based on the fact that row swaps only should be done once row 1 is fixed by the initial single row swap and the column swaps to put it into order. On the other hand maybe one needs something more elaborate such as calling a part of the algorithm recursively.

Added: More detail is needed to see which of the initial rows having the minimal lexicographic multiset one should move to the top via one row swap and a choice of column swaps to put the chosen row 1 into lex order. Suppose the rows of $M$ having the min lex order are put into a collection $R$ of rows. For each $r$ in $R$ we do a subcalculation like the following: We form a copy of $M$ with row $r$ placed at the top via a row switch. We do some column swaps so now row 1 is in nondecreasing order.

Now any columns which are under a block of identical elements of row 1 may be permuted among themselves, still keeping row 1 as it is. Call these “allowable column swaps”. Then for each row beyond the first, determine which of the allowable column swaps would put that row in least lex order. Choose the least among these now lex-ordered rows, and call that the “score” of the chosen $r$. We do the above for each candidate $r \in R,$ and see which $r$ gives us the minimal “score”. This tells us which $r$ in $R$ we should switch to the top, and also which column swaps to carry out to put the now chosen row 1 into lex order.

I think now that we have row 1 chosen and in order, the same kind of procedure can be used to get row 2, only now we have restricted all column swaps to those keeping row 1 in its (fixed by now) lex order. It may not be the case that say row 2 is in lex order, only that each of its substrings under a constant block from row 1 is in lex order.