“There is geometry in the humming of the strings” - Pythagoras
Definition
An alphabet is a nonempty finite set, usually denoted \(\Sigma\). > >Elements of the set \(\Sigma\) are called letters or symbols or characters.
Here \(\Sigma\) is just a Greek letter being used for the name of a particular set; it has nothing to do with summation.
Example
Here are some common examples of alphabets: > >- \(\{0,1\}\) (the set of binary digits) >- \(\{\mathtt{a},\ldots,\mathtt{z}\}\) (the set of lowercase ASCII letters) >- All ASCII characters >- All Unicode characters
Definition
A string over \(\Sigma\) is a finite, but possibly empty, sequence of characters from \(\Sigma\). > If the alphabet is obvious from context, we just say string.
In Python, to write a string we need to use quotes, e.g.,
'37', or "37". This lets us tell the
difference between, e.g., the string '37' and the integer
37.
In Theory of Computability, though, we are always working with
strings, and using quotation marks all the time quickly becomes tedious.
Therefore, when studying Computability we use the convention that we
omit all the quotation marks. For example, we would write \(\mathtt{abc}\) instead of
"abc". #### Is x a symbol or a string of
length 1?
It’s true that if we’ve dropped the quotes, the notation
x is ambiguous; it could be a character from our alphabet
\(\Sigma\), or it could be a string of
length 1 containing that character.
In practice, the difference rarely matters. And in those very rare cases where it does matter, we can tell from context which interpretation was intended.
If we omit the quotes, then the empty string "" (a
sequence of zero characters from our alphabet) turns invisible, and we
can’t see whether there’s a string there at all! This is awkward, so for
the empty string we will use different notation.
We could continue to write "" as is done in many
programming languages, but the tradition in computability is to use a
greek letter. In this class, we will use the symbol \(\epsilon\) to stand for the empty
string.
(Some books use \(\lambda\) instead of \(\epsilon\) but there are better uses for \(\lambda\) in theoretical computer science, as you will see in the Programming Languages course.)
Note that \(\epsilon\) is notation for a string (a sequence of characters), rather than a character. Consequently, we can be sure that \(\epsilon\not\in\Sigma\). #### But what if my alphabet is \(\{\alpha, \beta, \gamma, \delta, \epsilon, \ldots \omega\}\) ?
In the rare case where you want to use the Greek letter epsilon as a
character in your strings, you just need to choose a different symbol
(such as "") to represent the empty string. ### Operations
on Strings
Definition
If \(w\) and \(v\) represent strings, then > >- \(wv\) is the result of concatenating \(w\) and \(v\) together. >- \(|w|\) is the length of \(w\), the number of characters. >- \(w^R\) is the reverse of \(w\), the same characters in the opposite order.
Example
Various properties of this operation immediately follow, e.g., > >- \(w\epsilon = w = \epsilon w\) >- \(|w| = |w^R|\) >- \(|wv| = |w| + |v|\) >- \((wv)^R= v^R w^R\)
Definition
A formal language (over \(\Sigma\)) is a set of strings (over \(\Sigma\)) > Formal languages can be empty, finite or infinite, but (by definition of “string”) they only contain finite strings.
Example
Since all the languages we’ll discuss in Computability Theory are formal (i.e., mathematically precise), we will often just say language to mean a set of strings.
Because languages are sets, we can union two languages (\(L\cup M\)) intersect two languages (\(L\cap M\)), or take the set-difference (\(L\setminus M\)). But since they are more specifically sets of strings, we can define additional operations.
Definition
If \(L\) and \(M\) are languages, then > >- \(LM := \{\ wv\ |\ w\in L,\ v\in M\ \ \}\)
>- \(L^1 := L\)
>\(L^2 := LL = \{\ wv\ |\ w\in L, v\in L\
\}\)
>\(L^3 := LLL = \{\ wvu\ |\ w\in L, v\in L,
u\in L\ \}\)
>In general, \(L^{n+1} := L\,L^n\)
where we define \(L^0 :=
\{\epsilon\}\). >- \(L^* := L^0 \cup
L^1 \cup L^2 \cup L^3 \cup \cdots\)
> i.e., \(L^*\) is all the strings
we can construct by picking strings from \(L\) zero or more times (but finitely many)
and concatenating them. >- \(L^+ := L^1
\cup L^2 \cup L^3 \cup \cdots\)
> i.e., \(L^+\) is all the strings
we can construct by picking strings from \(L\) one or more times (but
finitely many) and concatenating them.
Example
If \(L = \{\mathtt{ba}, \mathtt{da}\}\) and \(M = \{\mathtt{da}, \mathtt{rk}, \epsilon\}\), then > >- \(L\cup M = \{\mathtt{ba}, \mathtt{da}, \mathtt{rk}, \epsilon\}\) >- \(L\cap M = \{\mathtt{da}\}\) >- \(L\setminus M = \{\mathtt{ba}\}\) >- \(LM = \{\mathtt{bada}, \mathtt{bark}, \mathtt{ba}, \mathtt{dada}, \mathtt{dark}, \mathtt{da}\}\) >- \(L^3 = \{\mathtt{bababa}, \mathtt{babada}, \mathtt{badaba}, \ldots, \mathtt{dadaba},\mathtt{dadada}\}\) >- \(L^{*} = \{\epsilon, \mathtt{ba}, \mathtt{da}, \mathtt{baba}, \ldots, \mathtt{dada}, \mathtt{bababa}, \ldots\}\) >- \(L^+ = \{\mathtt{ba}, \mathtt{da}, \mathtt{baba}, \ldots, \mathtt{dada}, \mathtt{bababa},\ldots\}\) >- \(M^{*} = \{\epsilon, \mathtt{da}, \mathtt{rk}, \mathtt{dada}, \ldots, \mathtt{rkrk}, \mathtt{dadada}, \ldots\}\) >- \(M^+ = \{\epsilon, \mathtt{da}, \mathtt{rk}, \mathtt{dada}, \ldots, \mathtt{rkrk}, \mathtt{dadada}, \ldots\}\) > Note that \(\epsilon\in L^*\) and \(\epsilon\in M^*\) by definition of the star operator. On the other hand, \(\epsilon\not\in L^+\) (because there’s no way to construct an empty string by concatenating one or more elements of \(L\)), but \(\epsilon\in M^+\) (because we can construct an empty string by concatenating one or more elements of \(M\), as long as those elements are \(\epsilon\)).
Example
Given the definitions of the operations, the following equivalences all hold. > >- \(L\emptyset = \emptyset\) > > There is no way to concatenate something from \(L\) with something from \(\emptyset\). > >- \(L\{\epsilon\} = L\) >- \(\{\epsilon\}^{*} = \{\epsilon\}\) > > Appending zero or more empty strings only yields the empty string. > >- \(\{\epsilon\}^+ = \{\epsilon\}\) > > Appending one or more empty strings only yields the empty string. > >- \(\emptyset^{*} = \{\epsilon\}\) > >Appending zero strings from \(\emptyset\) yields an empty string by definition. > >- \(\emptyset^+ = \emptyset\) > > There are no ways to concatenate one or more strings taken from the empty set. > >- \((L\cup M)N = LN \cup MN\) >- \((L^*)^* = L^*\)
Previous: 6.1 Introduction to Computability
Next: 6.3 State Machines