-----

Strings

-----

This page summarizes operations involving strings and characters.

Characters

A variable of type char holds a single character. Declare and assign one as follows:

char ch;       // declare ch
ch = 'a';      // store the character a in it

Normally, characters are written using single quotes, e.g. 'a' is the character a. Certain special characters, however, must be written using special "escape sequences." The commonly used ones are:

'\n'
newline
'\t'
tab
'\0'
null character
'\\'
backslash
'\''
single quote
'\"'
double quote

Variables of type char can also be used to store integers that fit in one byte of storage. If you store a character (e.g. 'a') in such a variable, the variable actually contains the character's ASCII code (97). Characters can also be specified via their ASCII codes in octal, e.g. 'a' can also be written as '\141'. Type "man ascii" at the Unix prompt to get a table of character codes.
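As a small illustration of the point that a char is really a small integer, the sketch below returns and prints the code stored in a char. The function names char_code and show_char are made up for this example, and the values in the comments assume an ASCII system.

```cpp
#include <stdio.h>

// Hypothetical helper: returns the integer code stored in a char.
// On an ASCII system, char_code('a') is 97.
int char_code(char ch) {
    return (int)ch;
}

// Hypothetical helper: prints a character with its decimal and octal codes.
void show_char(char ch) {
    unsigned code = (unsigned char)ch;   // avoid negative values for %o
    printf("'%c' = %u decimal = %o octal\n", ch, code, code);
}
```

Calling show_char('a') on an ASCII system prints 97 and 141, which is why 'a' and '\141' denote the same character.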

Character utilities

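The following functions are useful for classifying characters. They return zero (false) or a nonzero value (true); the true value is not guaranteed to be 1. Each function takes its argument as an int, so cast a plain char to unsigned char before passing it, since negative values other than EOF are not valid arguments.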

isalpha(char)
character is alphabetic
isupper(char)
character is uppercase
islower(char)
character is lowercase
isdigit(char)
character is a digit
isalnum(char)
character is alphabetic or a digit
isspace(char)
character is whitespace (blank, tab, newline, return, form feed)

See "man isalpha" for more functions of this type. To use these functions, you must include the ctype header file.

#include <ctype.h>
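The classification functions are typically used in a loop over a string. The following is a minimal sketch, not a definitive implementation. The struct CharCounts and the function count_chars are names invented for this example.

```cpp
#include <ctype.h>

// Hypothetical record of how many characters of each kind were seen.
struct CharCounts {
    int letters;
    int digits;
    int spaces;
};

// Hypothetical helper: classifies every character of a null-terminated string.
CharCounts count_chars(const char *s) {
    CharCounts counts = {0, 0, 0};
    for (; *s != '\0'; s++) {
        // Cast first: the ctype functions expect a non-negative value.
        unsigned char c = (unsigned char)*s;
        if (isalpha(c)) {
            counts.letters++;
        } else if (isdigit(c)) {
            counts.digits++;
        } else if (isspace(c)) {
            counts.spaces++;
        }
    }
    return counts;
}
```

Note that the results are tested for truth (nonzero) rather than compared against 1.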

Also defined by ctype.h are case conversion functions:

toupper(char)
change lowercase characters to uppercase, leave other inputs unchanged
tolower(char)
change uppercase characters to lowercase, leave other inputs unchanged
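As a sketch of how the conversion functions might be applied to a whole string, the hypothetical helper upcase below converts a string to uppercase in place. The name upcase is an assumption of this example, not part of ctype.h.

```cpp
#include <ctype.h>

// Hypothetical helper: converts a null-terminated string to uppercase in place.
// Characters that are not lowercase letters are left unchanged by toupper.
void upcase(char *s) {
    for (; *s != '\0'; s++) {
        *s = (char)toupper((unsigned char)*s);
    }
}
```

The string must be writable (an array or heap storage); a string literal must not be passed to a function like this one.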

Declaration and allocation of strings

Strings are, from C's point of view, simply arrays of characters. Therefore, you declare and create them as follows.

char str[36];          // declare and allocate a fixed-length string
                       // (allocated on the stack)
char str[10] = "foo";  // set up a 10-element string and
                       // initialize it with the value "foo"
char *str;             // declare a string variable
str = new char[k];     // allocate space (from the heap) for a
                       // string with k characters
delete [] str;         // free the string

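Strings whose length is not known at compile time, or which must be returned from a function to its caller, must be allocated on the heap. A fixed-length array declared inside a function disappears when the function returns, so in these cases you must use the last of the above methods (new char[k]).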
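One way this might look in practice is sketched below: a function that builds a string whose length is only known at run time and hands it back to its caller. The name copy_string is hypothetical, and the sketch uses strlen and strcpy from the string library.

```cpp
#include <string.h>

// Hypothetical helper: returns a heap-allocated copy of src.
// The caller owns the result and must release it with delete [].
char *copy_string(const char *src) {
    size_t n = strlen(src);          // number of real characters
    char *copy = new char[n + 1];    // one extra element for the null character
    strcpy(copy, src);               // copies the characters and the null
    return copy;
}
```

A caller would write char *s = copy_string("foo"); and later delete [] s; once the string is no longer needed.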

Length of strings and null characters

Many string handling functions assume that the last character in a string is the null character '\0'. This character tells C where the string ends: C does not store the length of the string. If you pass a string without a null character to a string-handling utility function, the function will keep reading (or writing) past the end of the array, which typically produces garbage results or a crash.

Strings that do not end in a null character are typically found in contexts where your code keeps a pointer to the end of the string. For example, the string might be a stack, in which you are storing characters one-by-one as they are read from a file. Do not pass such strings to C string-handling functions without first adding a null character to the end!
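The sketch below illustrates this situation under simplifying assumptions: characters arrive one at a time (here from another string, standing in for a file), the code tracks the end with a pointer, and the null character is added before the buffer is given to strlen. The function name fill_and_terminate is made up for this example.

```cpp
#include <string.h>

// Hypothetical helper: stores at most bufsize - 1 characters of input in buf,
// one at a time. While filling, the pointer `end` is the only record of where
// the string stops.
size_t fill_and_terminate(const char *input, char *buf, size_t bufsize) {
    if (bufsize == 0) {
        return 0;                    // no room even for the null character
    }
    char *end = buf;
    while (*input != '\0' && (size_t)(end - buf) < bufsize - 1) {
        *end++ = *input++;
    }
    *end = '\0';                     // now buf is safe to pass to strlen
    return strlen(buf);
}
```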

For most purposes, notably when allocating storage, the "length" of a string must include the null character. That is, the length is one larger than the number of real characters in the string. An exception is the function strlen (see below), which returns the number of real characters.
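The difference between the two notions of length can be sketched as follows. The helper storage_needed is a hypothetical name used only for this illustration.

```cpp
#include <string.h>

// Hypothetical helper: number of array elements needed to store s,
// counting the null character at the end.
size_t storage_needed(const char *s) {
    return strlen(s) + 1;
}
```

For the string "foo", strlen reports 3 while storage_needed reports 4. For an array declared as char str[10] = "foo";, sizeof(str) is 10 (the size of the array) and strlen(str) is still 3.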

The string library

The usual assignment and comparison operators (=, <, >, etc.) do not work on strings. When they compile at all, they operate on the addresses of the arrays rather than on the characters stored in them, so they rarely do "the right thing." Instead, use functions from the C string library. See "man strcpy" for full details. Commonly used functions include:

#include <string.h>
char *strcpy(char *dst, const char *src);
size_t strlen(const char *s);
int strcmp(const char *s1, const char *s2);

#include <strings.h>   // strcasecmp is a POSIX function, not standard C
int strcasecmp(const char *s1, const char *s2);

The function strcpy copies the string src to the string dst, including the terminating null character, stopping after the null character has been copied. It alters the contents of the first input (dst) and also returns dst. Notice that it does not check whether dst has enough room to store the contents of src: you must check before calling strcpy.
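A check of this kind might be written as in the sketch below. The function safe_copy and its parameter dstsize (the number of elements in dst) are assumptions of this example, not part of the string library.

```cpp
#include <string.h>

// Hypothetical helper: copies src into dst only if dst has room for every
// character of src plus the null character. Returns true if the copy was made.
bool safe_copy(char *dst, size_t dstsize, const char *src) {
    if (strlen(src) + 1 > dstsize) {
        return false;                // src would overflow dst; leave dst alone
    }
    strcpy(dst, src);
    return true;
}
```

With char buf[4];, the call safe_copy(buf, sizeof buf, "foo") succeeds, while safe_copy(buf, sizeof buf, "food") is refused because "food" needs five elements.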

strlen takes a string as input and returns its length, not including the null character at the end of the string.

strcmp compares two strings byte-by-byte, according to the ordering of your machine's character set. It returns an integer. The sign of this value (zero, negative, positive) indicates whether the two strings are identical or whether the first string is before or after the second string. The ordering is similar to alphabetical ordering, but uses the ASCII character order. So all capital letters go before all lowercase letters.

strcasecmp is similar, but treats lowercase and uppercase letters as equivalent.

These functions assume that all input strings end in null characters.
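The comparison functions can be wrapped as in the following sketch. Only the sign of strcmp's result is meaningful, so the hypothetical helper compare_sign reduces it to -1, 0, or 1. The helper same_ignoring_case relies on strcasecmp, which this sketch assumes is available from strings.h on a POSIX system.

```cpp
#include <string.h>
#include <strings.h>   // strcasecmp (POSIX; assumed available)

// Hypothetical helper: -1 if a sorts before b, 0 if identical, 1 if after.
int compare_sign(const char *a, const char *b) {
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

// Hypothetical helper: true if a and b differ at most in letter case.
bool same_ignoring_case(const char *a, const char *b) {
    return strcasecmp(a, b) == 0;
}
```

On an ASCII system, compare_sign("Zebra", "apple") is -1, because every capital letter sorts before every lowercase letter.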

Numbers written as strings

For various reasons, particularly when reading input parameters from the command line, you may find yourself with a number written out as a character string. To translate these into real numbers, use

atoi(const char *str)
Returns the integer (int) corresponding to the string.
atof(const char *str)
Returns the floating point number (double) corresponding to the string.

To use atoi and atof, you must include the stdlib header file:

#include <stdlib.h> 
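As a rough sketch of typical use, the hypothetical function scaled_total below converts two strings, such as might come from the command line, and multiplies the results. Neither atoi nor atof reports errors: a string that does not start with a number simply yields 0, so code that needs to detect bad input may prefer strtol or strtod (also declared in stdlib.h).

```cpp
#include <stdlib.h>

// Hypothetical helper: interprets count_str as an integer and price_str as a
// floating point number, and returns their product.
double scaled_total(const char *count_str, const char *price_str) {
    int count = atoi(count_str);     // e.g. "3"   becomes 3
    double price = atof(price_str);  // e.g. "2.5" becomes 2.5
    return count * price;
}
```

In a program, the two strings would usually be elements of argv, for example scaled_total(argv[1], argv[2]) after checking that argc is at least 3.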

This page is maintained by Geoff Kuenning.