How complicated is the set of tautologies?

Consider the set $\mathcal T$ of all tautologies in the propositional calculus in which the only operators allowed are $\to$ and $\neg$, and involving only the two variables $x$ and $y$.

How complicated is $\mathcal T$? Is it a context-free language?

Does the answer depend on the fixed number of variables? I guess not… If one removes the limitation of the number of variables (and somehow adapts the notions of context-freeness to play nice with this…), does the answer change? I guess it does…

This question is a natural outcome of Doug’s recent question here.

Solutions Collecting From Web of "How complicated is the set of tautologies?"

For a fixed number of variables, it’s context free, regardless of the exact connectives used. To show this, we will construct a context-free grammar for the set of tautologies.

Let the number of variables be $n$. For each $f\colon \{0,1\}^n \rightarrow \{0,1\}$, there corresponds a non-terminal $S_f$ generating all formulas with truth table $f$. The starting symbol is $S_1$, where $1$ is the constant $1$ function.

For your connectives, here is the construction. For each projection function $p_i$, we add the production $$S_{p_i} \rightarrow x_i$$ ($x_i$ is a variable). For each $f$, we add the production $$S_f \rightarrow \lnot S_{1-f}.$$ For each $f_1,f_2$, we add the production $$S_{1-f_1(1-f_2)} \rightarrow S_{f_1} \Rightarrow S_{f_2}.$$

Now consider the case of an infinite number of variables. If there were a CFG for the set of all tautologies (with some fancy encoding), then you could decide whether a particular formula is a tautology in polynomial-time, so P=NP.

More to the point, suppose that the encoding we wish to use is strings in some appropriate (finite alphabet). If the language of all tautologies were context-free, then so would $\{ x_i \Rightarrow x_i : i \in \mathbb{N} \}$ (intersect with an appropriate regular language). It follows that the language of squares in the digits is context-free, which is false.

Yes, it’s context-free. A formula is a tree, which can be traversed with a stack. As we traverse, we are evaluating four things:

  • x = 0, y = 0;
  • x = 0, y = 1;
  • x = 1, y = 0;
  • x = 1, y = 1.

In effect, we are evaluating four “substantiated formulas” in parallel. A substantiated formula is one with variables replaced by 0 or 1.

Let’s get more precise and define the transition table. We assume that the formula is in pre-order, that is, operator before operands.

  • The initial state:
    • If the current symbol is “$\lnot$”, push onto the stack the symbol “($\lnot$, operator, ?, ?, ?, ?)” and go to the initial state.
    • If the current symbol is “$\to$”, push onto the stack the symbol “($\to$, operator, ?, ?, ?, ?)” and go to the initial state.
    • If the current symbol is “x”, go to the (0, 0, 1, 1) state.
    • If the current symbol is “y”, go to the (0, 1, 0, 1) state.
  • The (a, b, c, d) state:
    • If the top symbol of the stack is “($\lnot$, operator, ?, ?, ?, ?)”, pop the stack and go to the ($\lnot$a, $\lnot$b, $\lnot$c, $\lnot$d) state.
    • If the top symbol of the stack is “($\to$, operator, ?, ?, ?, ?)”, pop the stack, push “($\to$, first operand, a, b, c, d)”, go to the initial state.
    • If the top symbol of the stack is “($\to$, first operand, e, f, g, h)”, pop the stack, go to the (e $\to$ a, f $\to$ b, g $\to$ c, h $\to$ d) state.

The (a, b, c, d) state is a template. There are actually 16 such states: the (0, 0, 0, 0) state, the (0, 0, 0, 1) state, the (0, 0, 1, 0) state, the (0, 0, 1, 1) state and so on.

Similarly, the third rule in the generic (a, b, c, d) state is a generic rule. There are 16 such rules, with (e, f, g, h) = (0, 0, 0, 0), (0, 0, 0, 1), and so forth.

The (1, 1, 1, 1) state has one extra rule:

  • If the stack is empty, go to the accepting state.


OK, this program has a weak point as discussed in the second comment. Now let’s define a perfect version.

Assume two things: 1) The formula is in pre-order, and 2) there is a pair of parenthesis around each sub-formula.

Now, the transition table:

  • start:
    • if read = (, go to operator.
  • operator:
    • if read = ~, push (~, ?, ?, ?, ?), go to start.
    • if read = ->, push (->, ?, ?, ?, ?), go to start.
    • if read = x, push (ret, 0, 0, 1, 1), go to eval.
    • if read = y, push (ret, 0, 1, 0, 1), go to eval.
  • eval:
    • if read = ), let the top element be (ret, a, b, c, d), pop, go to (a, b, c, d).
  • (a, b, c, d):
    • if read = ) and top = (~, ?, ?, ?, ?), pop, go to (~a, ~b, ~c, ~d).
    • if read = ) and top = (->, e, f, g, h), pop, go to (e -> a, f -> b, g -> c, h -> d).
    • if read = ( and top = (->, ?, ?, ?, ?), pop, push (->, a, b, c, d), go to operator.
  • (1, 1, 1, 1) has one extra rule:
    • if the stack is empty, accept.

Generalization of Yuval’s answer:

For any finite set $A$ and operations $o_i \colon A^2 \to A$ the language of expressions using elements of $A$ and infix operators $o_i$ that evaluate to fixed $a \in A$ is context-free. Here $A = \{0,1\}^n \to \{0,1\}$.

If you regard the expressions as trees not words, it will be a regular tree language (recognizable with finite tree automaton).

I remember an exercise in automata theory: the set of Boolean formulas (as words) using 0,1,$\vee$,$\wedge$,$\neg$,(,) evaluating to 1 is regular, provided that arbitrary answer is allowed when input has mismatched brackets.

As a consequence of this there should exist a regular language $L$ such that the set of tautologies is $L \cap R$ where $R$ is the context-free language of all well-formed formulas (tautologies or not).