It is easy to see that not all languages can be regular. Given an alphabet, there are uncountably many possible formal languages (possible sets of strings), but there are countably many regular expressions and hence countably many regular languages. That is, the number of possible languages is infinitely larger than the number of regular languages, and many (and in fact most) languages are not regular.
Intuitively, any problem where an automaton has to “count arbitrarily
high” is not feasible with bounded memory, e.g., for , the
automaton has to count arbitrarily many a’s to make sure there are
the right number of b’s following.
Similarly, any problem where an automaton has to “remember arbitrarily
long parts of the input” is not feasible with bounded memory, e.g., for
being the set of palindromic strings of a’s and b’s, we must
remember the entire first half in order to match it against the second
half.
But intuition is not a proof that these languages aren’t regular. Maybe there’s some clever trick! After all, the language of strings whose length is a multiple of 3 might initially seem like we need to count the arbitrarily high number of input characters (in which case the language would not be regular), but a bit more thought shows we only need to keep track of the number of characters modulo 3, and the language is regular.
We now turn to the first of two methods for conclusively proving that a language is not regular, and no DFA for the langauge can possibly exist.
Given an arbitrary language , we say that strings are distinguishable if there’s a suffix (the distinguishing suffix) such that and , or vice-versa.
When ,
a is distinguishable from aa, because
but
(or
because but
)a is distinguishable from aab, because
but
.aab isn’t distinguishable from aaabb, because
and ; and ;
and
;
and ; etc.If , i.e., sequences of 1’s whose length is a multiple of 3, we have:
1 is distinguishable from 11 (e.g., we can use the suffix 1);11 is distinguishable from 111 (e.g., we can use the suffix
);1 is distinguishable from 111 (e.g., we can use the suffix
);Let be a language, and let be a deterministic state machine for (finite or infinite).
If and take us to the same state in ), they aren’t distinguishable. (because for any suffix , and lead to the same state in this deterministic machine and so they both accept or both reject.
Conversely, if and can be distinguished by at least one suffix , then it must be that and take us to different states in the machine.
Let be a language.
If you can find a set of strings all distinguishable from each
other,
then in any deterministic machine for these strings lead from start to different
states (since otherwise they wouldn’t all be distinguishable); in this case,
no deterministic machine for can have fewer than states.
And if you can find an infinite set of strings all distinguishable from
each other,
then in any deterministic machine these strings all lead to different
states,
so no finite deterministic machine (DFA) for can exist, so is
not regular.
Note that the set of distinguishable strings can consist of arbitrary strings; they won’t necessarily be in the language , but they will typically be strings that are prefixes (initial segments) of strings in .
The set is not regular.
Proof:
Consider the infinite set of strings .
To show: for all pairs of strings in , the strings are
distinguishable.
Let and (with ) be two arbitrary strings in
.
Then
but , so they’re distinguishable.
Thus is an infinite set of pairwise-distinguishable strings, and
is not regular.
The set is not regular.
Proof:
Consider the infinite set of strings .
To show: for all pairs of strings in , the strings are
distinguishable.
Let and (with ) be two arbitrary strings in .
Then but
, so they’re
distinguishable.
Thus is an infinite set of pairwise-distinguishable strings, and
is not regular.