6.4 Regular Languages

A Regular Language is a formal language (set of finite strings) that can be built from very simple sets with a small number of allowable operations. Regular languages are interesting because we will show that the language of any DFA (or NFA) is regular, and that every regular language is the language of some DFA (and some NFA).

Definition

The collection of all regular (formal) languages is defined as follows: > >- \(\emptyset\) is a regular language. >- \(\{\epsilon\}\) is a regular language. >- \(\{a\}\) is a regular language (for any symbol \(a\in\Sigma\)). >- If \(L_1\) and \(L_2\) are regular languages, then so is \(L_1L_2\). >- If \(L_1\) and \(L_2\) are regular languages, then so is \(L_1\cup L_2\). >- If \(L\) is a regular language, then so is \(L^*\).

Example

Here are some examples of regular languages, assuming \(\Sigma = \{0,1\}\): > >1. \(\{01\}\) > \(\{0\}\{1\}\) >2. \(\{000,11\}\) > \(\{0\}\{0\}\{0\}\cup > \{1\}\{1\}\) >3. \(\{\epsilon, 01, 0101, 010101, \ldots \}\)
> \((\{0\}\{1\})^{*}\) >4. \(\{11,110,1100,11000, \ldots \}\)
> \(\{1\}\{1\}\{0\}^{*}\) >5. \(\{\epsilon, 0,1,00,01,10,11,000,001,\ldots\}\)
> \((\{0\}\cup\{1\})^{*}\)

Regular Expressions

Every regular language has a finite description (in terms of union, concatenation, and star) of singleton sets. However, writing all the curly braces rapidly grows tedious. Therefore, we will use a shorthand for describing a regular set, called a regular expression.

Definition

Regular expressions are formulas, define as follows: > >- \(\emptyset\) is a regex >- \(\epsilon\) is a regex >- \(a\) is a regex for any \(a \in\Sigma\). >- If \(R_1\) and \(R_2\) are regexes, then so is \((R_1R_2)\) and \((R_1|R_2)\). >- If \(R\) is a regex, then so is \((R^*)\). > >As we did with logical formulas, we will almost always write regular expressions with a more relaxed approach to parentheses than the official definition suggests. In terms of precedence, star binds more tightly than concatenation, which binds more tightly than the vertical bar, so, e.g., \(ab^*|c^*\) means \((a(b^*))\ |\ (c^*)\).

Example

Here are some regular expressions, assuming \(\Sigma=\{0,1\}\): > >- \(01\) >- \(000\,|\,11\) > > Concatenation binds more tightly than choice, so this means \((000)\ | (11)\) > >- \((01)^*\) >- \(110^*\) > > Star binds more tightly than concatenation, so this means \(11(0^*)\). >

Example

In terms of where the implicit parentheses are, there is a strong analogy between >- the regular expression \(a\,|\,bc^*\) >- the algebraic expression \(x + yz^2\) > >So \(a|bc^*\) means \(a\,|\,\bigl(b(c^*)\bigr)\), just as \(x+yz^2\) means \(x+(y(z^2))\).

Definition

Given a regular expression \(R\), we denote the language of \(R\) by \(L(R)\), defined as follows: > >- \(L(\emptyset) = \emptyset\) >- \(L(\epsilon) = \{\epsilon\}\) >- \(L(a) = \{a\}\) >- \(L(R_1R_2) = L(R_1) L(R_2)\) >- \(L(R_1|R_2) = L(R_1) \cup L(R_2)\) >- \(L(R^*) = L(R)^*\) > >We say that a string \(w\in \Sigma^*\) matches \(R\) if \(w\in L(R)\).

Example

The languages of the regular expressions in the previous Example are exactly the regular sets in the Example before that. That is, > >- \(L(01) = \{01\}\) >- \(L(000|11) = \{000,11\}\) > >- \(L((01)^*) = \{\epsilon, 01, 0101, 010101, \ldots \}\) >- \(L(110^*) = \{11,110,1100,11000, \ldots \}\) >- \(L((0|1)^*) = \{\epsilon,0,1,00,01,10,11,000,001,\ldots\}\)

Previous: 6.3 State Machines

Next: 6.5 Kleene’s Theorem