Non-Context-Free Languages

The Context-Free Languages include the Regular languages and more, but not all languages are Context-Free.

For example, { anbn  n0  }\{\ a^nb^n\ |\ n\ge 0\ \ \} is Context-Free (but not Regular), but { anbncn  n0 }\{\ a^nb^nc^n\ |\ n\ge 0\ \} is neither Regular nor Context-Free.

Similarly, the language { wwR   w{0,1} }\{\ ww^R\ |\ \ w\in\{0,1\}^{*}\ \} of all even-length binary palindromes is Context-Free (but not Regular), but { ww  w{0,1}  }\{\ ww\ |\ w\in\{0,1\}^{*}\ \ \} is neither Regular nor Context-Free.

This makes a certain intuitive sense; anbna^nb^n is context free because a PDA can use the stack to store the aa’s, and then match them up with the bb‘s. But anbncna^nb^nc^n is not context-free because once we’ve matched the aa’s with the bb’s, the stack is empty and we don’t know how to check the number of cc‘s.

But that’s not a proof; how do we know there’s no cleverer way to store aa’s on the stack that lets us match the number of bb’s and cc’s? To get a proof, we need to find a property shared by all context-free languages, and then show that the anbncna^nb^nc^n lacks that property.

CF-Pumping

There is no nice analogue of “Distinguishable Strings” for context-free languages. but context-free languages still have a property similar to (but slightly different from) the pumping property of regular languages, and so there is a pumping lemma for context-free languages.

Definition

Given a language LL and a “pumping length” p1p\geq 1,
we say a string sLs\in L (with sp|s| \geq p) can be CF-pumped if

  1. s=uvxyzs = uvxyz
  2. vyϵvy \not= \epsilon and vxyp|vxy| \leq p
  3. uvixyizLuv^ixy^iz \in L for every i0i\geq 0

Example

If L:={ anbn  n0 }L := \{\ a^nb^n\ |\ n\geq 0\ \} and p:=4p := 4,
then   s:=aaabbb  ~~s:= \mathrm{aaabbb~~} can be CF-pumped.

There are many ways to show this, including

  • Set u=au=\mathrm{a}, v=av=\mathrm{a}, x=abx = \mathrm{ab}, y:=by:=\mathrm{b}, and z=bz=\mathrm{b}
  • Set u=au=\mathrm{a}, v=aav=\mathrm{aa}, x=ϵx = \epsilon, y:=bby:=\mathrm{bb}, and z=bz=\mathrm{b}
  • Set u=aau=\mathrm{aa}, v=av=\mathrm{a}, x=bbx = \mathrm{bb}, y:=by:=\mathrm{b}, and z=ϵz=\epsilon

In all cases vxy4=p|vxy|\leq 4 = p and uvixyizLuv^ixy^iz\in L for every i0i\geq 0.

The Pumping Lemma for Context-Free Languages

The utility of this definition comes from the following fact:

Lemma (Pumping Lemma for CFLs)

If LL is a context-free language, then there exists p1p\geq 1 such that every string sLs\in L with sp|s|\geq p can be CF-pumped.

More importantly the contrapositive holds; if for every pp\geq there is a string sLs\in L with sp|s|\geq p that cannot be CF-pumped, then LL is not context-free.

Example

The language L:={ anbncn  n0 }L := \{\ a^nb^nc^n\ |\ n \geq 0\ \} is not context-free.

Proof.
Let p1p \geq 1 be given.
Put s:=apbpcps := a^pb^pc^p. Note that sLs\in L and sp|s|\geq p.
To show: ss is not CF-pumpable.
Let s:=uvxyzs := uvxyz be any partition of ss into five parts with vyϵvy\not=\epsilon and vxyp|vxy|\leq p.\

There are five cases for vxyvxy:

  1. vxyvxy contains all aa’s;
  2. vxyvxy contains all bb’s;
  3. vxyvxy contains all cc’s;
  4. vxyvxy contains aa’s and bb’s;
  5. vxyvxy contains bb’s and cc’s;

vxyvxy cannot contain aa’s and bb’s and cc’s, because vxyp|vxy|\leq p and the aa’s and cc’s have pp bb’s in-between them. In all five cases, it is clear that uv2xy2zuv^2xy^2z will increase the count of one or two letters (whatever letters occur in vxyvxy) but not all three, and so uv2xy2z∉Luv^2xy^2z\not\in L.\

Thus no partition works to pump ss.
By the Pumping Lemma for Context-Free Languages, LL is not regular.

Example

The language L:={ ww   w{0,1}}L := \{\ ww\ |\ \ w\in\{0,1\}^{*} \} is not context-free.

Proof.
Let p1p \geq 1 be given.
Put s:=0p1p0p1ps := 0^p1^p0^p1^p. Note that sLs\in L and sp|s|\geq p.
To show: ss is not CF-pumpable.
Let s:=uvxyzs := uvxyz be any partition of ss into five parts with vyϵvy\not=\epsilon and vxyp|vxy|\leq p.\

There are three cases for vxyvxy:

  1. vxyvxy is entirely in the first half of ss; then uv2xy2z∉Luv^2xy^2z\not\in L because we have changed the first half but not the second half of the string, and uv2xy2zuv^2xy^2z does not have the form wwww;
  2. vxyvxy is entirely in the middle (1p0p1^p0^p) part of ss; then uv2xy2z∉Luv^2xy^2z\not\in L because we have changed the number of 1’s in the first half and/or the number of 0’s in the second half, and uv2xy2zuv^2xy^2z does not have the form wwww;
  3. vxyvxy is entirely in the second half of ss; then uv2xy2z∉Luv^2xy^2z\not\in L because we have changed the second half but not the first half of the string, and uv2xy2zuv^2xy^2z does not have the form wwww.

The fact that vxyp|vxy|\leq p rules out any other possibility.

Thus no partition works to pump ss.
By the Pumping Lemma for Context-Free Languages, LL is not regular.

Extra: Justifying the Pumping Lemma for CFLs

We don’t need to know the proof of the Pumping Lemma for CFLsin order to use it effectively. But it is not hard to see why it is true, once you know the trick.

Lemma (Pumping Lemma for CFLs)

If LL is a context-free language, then there exists p1p\geq 1 such that every string sLs\in L with sp|s|\geq p can be CF-pumped.

Proof.
Suppose LL is a context-free language with grammar G=(Σ,V,S,R)G=(\Sigma,{\cal V},S,{\cal R}).

Then there is a value pp (depending on the number of nonterminals V|{\cal V}| in the grammar and the maximum “branchiness” of parse trees in this grammar) such that every string ss with sp|s|\geq p is so long, that its parse tree is so tall, that there is a path with at least V+1|{\cal V+1}| nonterminals.

By the Pigeonhole Principle, this means that given an arbitrary sp|s|\geq p, a parse tree for ss has a path where some nonterminal RR appears more than once:

Parse tree for uvxyz; a path contains R twice

So there is a big legal subtree with RR at the root, and a small legal subtree with RR at the root.

When a parse tree contains a node RR, any valid subtree (anything the grammar allows us to derive from RR) can appear below it. It follows (by “tree surgery”) that infinitely many other legal parse trees must exist, including:

Parse trees for uxy and uvvxyyz

Since these are all valid parse trees, the corresponding strings (of the form uvixyizuv^ixy^iz) must also be in the language LL.