Regular Grammars

Context-Free Grammars are just one kind of grammar. In this section we briefly consider an even more restricted kind of grammar, known as a Regular Grammar.

As one would expect, the language of a Regular Grammar is a Regular Language. In fact, any regular language has a regular grammar, so now we have a four ways to show a language is regular: find an NFA, a DFA, a regular expression, or a regular grammar.

Definition

A Regular Grammar is a context-free grammar in which all rules have one of the following forms:

Production Shape(In English)
XaYX\to aYNonterminal goes to terminal + nonterminal
XYX\to YNonterminal goes to nonterminal
XaX\to aNonterminal goes to terminal
XϵX\to \epsilonNonterminal goes to empty string

Example

The grammar

S0S  1AA1A  ϵ\begin{array}{lcl} S & \to & 0S\ |\ 1A\\ A & \to & 1A\ |\ \epsilon\\ \end{array}

is a regular grammar (as well as being a context-free grammar).

Every Regular Language has a Regular Grammar

Suppose we have a regular language LL. There must be an NFA for LL; it is fairly straightforward to turn an NFA recognizing LL into a regular grammar that can produce any string in LL.

The trick is to have one nonterminal for each state in the NFA. Each nonterminal will produce all the strings in the language of the corresponding NFA state (i.e., the strings that would be accepted if we started at that NFA state instead of at the start state.) The definitions of these nonterminals can then be read off from the transitions of the NFA.

Example

Consider the following state machine:

3-state DFA

If we look at state XX, we see that the strings that would be accepted starting at state XX either:

  • start with an aa and have a remainder that can be accepted starting from state BB, or else
  • start with bb and continue a remainder that can be accepted starting from state XX.

Similarly, the strings we can accept starting from state BB either has an aa or bb followed by a string that can accept starting from state XX, or else is the empty string (which would accept since BB is an accepting state).

In this fashion, we can construct a (regular!) grammar for the languages of all the states:

AaX  bAXaB  bXBaX  bX  ϵ\begin{array}{lcl} A & \to & aX\ |\ bA\\ X & \to & aB\ |\ bX\\ B & \to & aX\ |\ bX\ |\ \epsilon\\ \end{array}

and since the language of a state machine is (by definition) the language of its start state, this grammar (where AA is the start symbol) produces exactly the strings in the language of this state machine.

One can also go the other direction; any regular grammar can be directly translated into an NFA with one state per nonterminal:

Example

The regular grammar

S0S  1AA1A  ϵ\begin{array}{lcl} S & \to & 0S\ |\ 1A\\ A & \to & 1A\ |\ \epsilon\\ \end{array}

can be translated into the following NFA:

NFA from Regular Grammar

Note that the state corresponding to start symbol SS is the start state, and state(s) that can produce the empty string are accepting states.

Be Careful with Terminology

  • A context-free grammar is a grammar whose rules have a nonterminal on the left-hand-side.
  • A context-free language is a language (set of strings) that can be described with a CFG (or PDA)
  • A regular grammar is a grammar whose rules are particularly simple (all having one of the four allowed forms).
  • A regular language is a language (set of strings) that can be described with a regular grammar (or NFA, DFA, regex).

Every regular grammar is also a context-free grammar, so every regular language is also a context-free language.

Similarly, a grammar can be context-free but not regular, and a language can be context-free but not regular.

What is slightly less obvious is that a non-regular grammar can describe a regular language!

Example

The grammar

SSS  1\begin{array}{lcl} S & \to & SS\ |\ 1\\ \end{array}

is not a regular grammar (the rule SSSS\to SS is not one of the allowable forms for a regular grammar), but the language of this grammar is { 1n  n1  }\{\ 1^n\ |\ n\geq 1\ \ \}, which is easily seen to be regular.

How is this possible? Well, a language is regular iff it can be described by at least one regular grammar, e.g.,

S1S  1\begin{array}{lcl} S & \to & 1S\ |\ 1\\ \end{array}

The definition of regular language doesn’t say anything about whether there might be non-regular (“unnecessarily complex”) grammars for the language too.