Week 09

Introduction To One-Dimensional Arrays And Searching In Arrays
Version 1


 


Introduction to Arrays

 
Arrays Store Groups Of Related Values

If we were limited to individual named variables, there are many programs that would be inordinately difficult or annoying to write. For example, we can easily write the following program to add up and average five grades:

// Program: FiveGrades
// Author:  Joshua S. Hodas
// Date:    October 31, 1996
// Purpose: To demonstrate the need for arrays
 
import HMC.HMCSupport;
 
class FiveGrades {
 
  public static void main(String args[]) {
 
    double g1, g2, g3, g4, g5, average;
 
    HMCSupport.out.println("Please enter five grades:");
    g1 = HMCSupport.in.nextDouble();
    g2 = HMCSupport.in.nextDouble();
    g3 = HMCSupport.in.nextDouble();
    g4 = HMCSupport.in.nextDouble();
    g5 = HMCSupport.in.nextDouble();
 
    average = (g1 + g2 + g3 + g4 + g5) / 5;
 
    HMCSupport.out.println("The average of " + g1 +
                           ", " + g2 + ", " + g3 +
                           ", " + g4 + ", " + g5 +
                           " is " + average);
  }
 
}

But what if we want to deal with fifty grades, or five hundred? We could put the summation in a loop, and just keep reading inputs into the same single variable, as in:

// Program: LoopGrades
// Author:  Joshua S. Hodas
// Date:    October 19, 1998
// Purpose: To demonstrate the need for arrays
 
import HMC.HMCSupport;
 
class LoopGrades {
 
  public static void main(String args[]) {
 
    int numGrades = 10;
    double grade, sum = 0, average;
 
    HMCSupport.out.println("Please enter " + numGrades + " grades:");
 
    for (int i = 0 ; i < numGrades ; i++) {
    
       grade = HMCSupport.in.nextDouble();
       sum += grade;    
    }
 
    average = sum / numGrades;
    HMCSupport.out.println("The average of the grades is " + average);
  }
 
}

But then we would lose the ability to echo back the numbers, or to do any other manipulations that would require access to all the individual grades at a later point.

The solution lies in using arrays. Arrays are a kind of aggregate variable. That is, in an array, several values are referred to by a single name. An array can be of any size you need, but once the array is created with a certain number of elements, you cannot change the number of elements it holds. Similarly, an array can hold elements of any type, but all the elements in a given array must be of the same type, which is also determined when the array is created.


The Mechanics of Arrays

 

Referring To The Elements Of An Array

The elements of the array are stored in a fixed order. To specify which element of the array you want to access, you write the name of the array followed by a numerical array index given between a pair of square brackets. So, for example, if a is a ten-element array of ints, then a[3] refers to a specific one of the int variables in the array.

Of course, the index does not need to be a literal number. Any integer expression can be used to specify the index. So, for example, if the int variable i has the value 2, then a[3] and a[i+1] refer to the same element of the array.

  For somewhat obscure reasons related to Java's relationship to C, and the way C handles arrays, the index of the first element is zero. Thus, if a is an array variable, a[0] is the first element of the array, a[1] is the second, and so on.

Once we have identified a particular element of the array, we can use it like any other variable of that type. In particular, it can appear on either side of the equal sign of an assignment statement. For example, the statement:

a[3] = 2 * y; 
says to put twice the value of the variable y into the fourth cell of the array a, while the statement:
x = 2 * a[3]; 
says to put twice the value of the fourth cell of the array a into the variable x.
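
The two assignments above can be tried out in a small sketch. The class name IndexDemo, the method name demo, and the values used are made up for illustration. The allocation line (new int[10]) is explained under Allocating An Array below, and the standard System.out is used in place of HMCSupport.out so that the sketch runs without the course library.

```java
// A minimal sketch of array indexing and assignment.
// IndexDemo and demo are hypothetical names chosen for this example.
class IndexDemo {

  // Returns the value left in x after the two assignments in the text.
  static int demo() {

    int a[] = new int[10];   // ten ints, indexed 0 through 9
    int i = 2, y = 7, x;

    a[3] = 2 * y;            // store 14 in the fourth cell
    x = 2 * a[i + 1];        // a[i+1] is the same cell as a[3]

    return x;
  }

  public static void main(String args[]) {

    System.out.println("x is " + demo());   // prints "x is 28"
  }

}
```

Because i holds 2, the expression a[i+1] picks out the same cell that a[3] does, so the value stored by the first assignment is the one read by the second.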

Declaring An Array

As we said above, an array can be made of elements of any type. In fact, to declare an array of elements of a given type you use the same declaration you would for any other variable of that type; you just put an empty pair of square brackets after the variable name. For example, we can declare a to be an array of integers with the declaration:
int a[]; 
This allows you to mix the declaration with other arrays of the same type, as well as with declarations of ordinary variables of that type, as in:
int a[], b, c[], d; 
which declares two arrays of ints, a and c, and two ordinary int variables, b and d.

Alternatively, the brackets can be put after the type. This is most useful if you are declaring several arrays of the same underlying type, as in:

int[] a, b, c, d; 
which declares four int arrays.

Note: the ONLY time you ever write a variable name followed by a pair of empty square brackets is in a type declaration (either of an ordinary variable, or of a formal parameter to a method). In actual code you will always either write just the variable name (in which case you are referring to the entire array), or the variable name followed by square brackets with an index expression inside (in which case you are referring to an individual element of the array).

Reading Types

It is useful to practice reading off array declarations in English, as this will help you understand when you use the brackets and when you don't. Once you convince yourself that both ways of declaring arrays make sense, you will probably never make a mistake with the brackets. To start, any time you see a type name with brackets after it, you read it as the underlying type preceded by "an array of". For example, you would read:
int a; 
as "a is an int". But you would read:
int[] a; 
as "a is an array of int". Now, suppose we write:
int a[]; 
Since the type is unadorned, we would read this as "a[] is an int". But what kind of variable do you put brackets after (albeit with an index expression inside the brackets) to get an int? An array of ints, of course! So it must be that "a is an array of int".

So, having used either declaration, the name a refers to the whole array, whereas when we put the brackets after it and put an index value in the brackets, we are referring to an individual int in the array.

Allocating An Array

Unlike the ordinary scalar variables that we have seen so far, declaring an array variable is not sufficient to allow you to use it. Notice that, while we have said we want a variable to store an array of elements of a specified type, we have not yet specified how many elements the array holds. This requires a separate step called allocating the array, which is done with the operator new, using the following syntax:

array_name = new type[number_of_elements]

So, for example, to allocate the previously declared int array a as holding ten integers, you would write:

a = new int[10]; 
Since it is done with an assignment statement, this allocation step can be thought of as a sort of initialization. As such it can be merged into the declaration of the array variable like any other variable's initialization, as in:
int a[] = new int[10]; 
Similarly, we could declare and allocate an array of twenty string values with the statement:
String s[] = new String[20]; 

  The value used to define the number of cells to allocate must be an integer, but, as with the expression used to access an element of the array, it need not be a literal. Any valid integer expression can be put between the brackets in the call to new. For example, the following two lines declare an int variable with the value 5 and declare and allocate two arrays of doubles, one with 5 elements and one with 13 elements:
int sizeBase = 5;
double[] a = new double[sizeBase], b = new double[2 * sizeBase + 3]; 
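
As a rough check on the declaration and allocation rules above, the following sketch declares a mix of arrays and ordinary variables and then allocates the arrays using integer expressions. The class name AllocDemo and the method name lengths are hypothetical. The sketch uses the length attribute (covered later in these notes) only to report how many cells each array received, and System.out stands in for HMCSupport.out.

```java
// A minimal sketch of declaring and allocating arrays.
// AllocDemo and lengths are hypothetical names chosen for this example.
class AllocDemo {

  // Returns the sizes of the three arrays allocated below.
  static int[] lengths() {

    int a[], b, c[];               // a and c are int arrays; b is a plain int
    int sizeBase = 5;

    b = 2 * sizeBase + 3;          // b is an ordinary int holding 13
    a = new int[sizeBase];         // size given by a variable
    c = new int[b];                // size given by another variable

    String[] s = new String[20];   // brackets after the type work too

    int[] result = new int[3];
    result[0] = a.length;
    result[1] = c.length;
    result[2] = s.length;
    return result;
  }

  public static void main(String args[]) {

    int[] sizes = lengths();
    System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]);   // prints "5 13 20"
  }

}
```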

Initializing The Contents Of An Array

By default, when the array is allocated, the elements of the array are all initialized to zero, or to the equivalent value for the element type (0.0 for doubles, false for booleans, and null for Strings and other objects). For small arrays for which you have other values you would like to initialize the elements to, you can replace the call to new with a comma-separated list of values between a pair of curly braces. The array will have as many cells allocated as there are values in the list, and the cells will be initialized with the values from the list. For example, the following statement declares, allocates, and initializes the contents of an array of twelve ints holding the number of days in each month of the year:
int monthDays[] = {31,28,31,30,31,30,31,31,30,31,30,31}; 
Similarly, the next statement declares and initializes an array of Strings holding the names of the months:
String monthNames[] = {"January","February","March","April",
                       "May","June","July","August",
                       "September","October","November","December"};
 
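
The default contents of a freshly allocated array can be inspected directly. The sketch below is only an illustration: the class name DefaultsDemo, the method name describeDefaults, and the array names are hypothetical, and System.out is used in place of HMCSupport.out.

```java
// A minimal sketch showing default element values and an initializer list.
// DefaultsDemo and describeDefaults are hypothetical names.
class DefaultsDemo {

  // Builds a one-line description of the first cell of four freshly
  // allocated arrays, one for each of four element types.
  static String describeDefaults() {

    int[] counts = new int[3];
    double[] totals = new double[3];
    boolean[] flags = new boolean[3];
    String[] names = new String[3];

    return counts[0] + " " + totals[0] + " " + flags[0] + " " + names[0];
  }

  public static void main(String args[]) {

    System.out.println(describeDefaults());   // prints "0 0.0 false null"

    // With an initializer list, the cells hold the listed values instead.
    int monthDays[] = {31,28,31,30,31,30,31,31,30,31,30,31};
    System.out.println(monthDays[1]);         // prints 28
  }

}
```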

Java Checks All Array References

It is not uncommon to make a programming mistake that causes the bounds of a loop to be misdefined and the loop index variable to stray beyond the range of the array. The most common cause of this is what is known as an obiwan error (or off-by-one error), which is an error you typically get by starting a loop at 0 when you should have started at 1 or vice-versa, or by writing < instead of <= or vice-versa. (These, in turn, are sometimes, but not always, instances of fence-post errors, which occur when you get confused between counting things and counting the spaces between things. For example, how many items are there between (and including) position m in an array and position n in an array? The answer is not (n - m), but, rather, (n - m + 1).)
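
The fence-post count can be confirmed by brute force. In the sketch below, the class name FencePost and the method name countInclusive are hypothetical; the method simply visits every position from m through n and counts them.

```java
// A minimal sketch of the fence-post count.
// FencePost and countInclusive are hypothetical names.
class FencePost {

  // Counts the positions from m through n, inclusive, by visiting each one.
  static int countInclusive(int m, int n) {

    int count = 0;

    for (int i = m ; i <= n ; i++) {

      count++;
    }
    return count;
  }

  public static void main(String args[]) {

    int m = 3, n = 7;

    // Positions 3, 4, 5, 6 and 7 make five items, not four.
    System.out.println("n - m is " + (n - m));                  // prints "n - m is 4"
    System.out.println("count is " + countInclusive(m, n));     // prints "count is 5"
  }

}
```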

In C, referencing outside the bounds of an array leads to a nasty situation in which the computer nevertheless reads a value from a spot in memory outside the array (or, worse, writes a value to a spot in memory outside the array) as though it were properly accessing inside the array (that is, the spot where the element of the array would have been if the array were that large). This can lead to unpredictable behavior and crashing. (In fact, an immediate crash is the best situation you can hope for, as it makes finding the bug easier. More likely the program will continue on for a while and start exhibiting strange behavior because the contents of some variable have been overwritten by the erroneous access.)

Fortunately, Java is more careful. Each access to an array is checked to make sure that the index given is valid for that array. So, for instance, if we tried to access the sixteenth element of the array monthNames the system would immediately halt the program, raising the exception

java.lang.ArrayIndexOutOfBoundsException: 15 
echoing the invalid index value that we tried to use. (The exact wording of the message varies between versions of Java.)

While this slows down array operations a bit, indexing errors are so common (and, in C, so catastrophic), that it is worth it.
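
The bounds check can be observed without halting the program by catching the exception. This sketch goes slightly beyond these notes, since the try/catch construct is not covered here; the class name BoundsDemo and the method name isOutOfBounds are hypothetical, and System.out stands in for HMCSupport.out.

```java
// A minimal sketch of Java's array bounds checking.
// BoundsDemo and isOutOfBounds are hypothetical names; try/catch is
// used here only so the program can report the error and keep going.
class BoundsDemo {

  // Returns true if reading arr[index] raises the bounds exception.
  static boolean isOutOfBounds(String[] arr, int index) {

    try {
      String unused = arr[index];     // the checked access
      return false;
    } catch (ArrayIndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String args[]) {

    String monthNames[] = {"January","February","March","April",
                           "May","June","July","August",
                           "September","October","November","December"};

    System.out.println(isOutOfBounds(monthNames, 11));   // prints "false"
    System.out.println(isOutOfBounds(monthNames, 15));   // prints "true"
  }

}
```

Index 11 is the last valid index of a twelve-element array, so the first call reports no error, while index 15 is rejected.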


Arrays and For Loops

 
  Arrays and for-loops go together like peanut butter and jelly! A large number of the things we want to do with arrays involve manipulating the entire array. Unfortunately, there are very few built-in operations that are defined to work on the contents of an entire array. Instead, we must define those processes in terms of what they require us to do to the individual elements of the array. This task is then repeated once for each element of the array. Since the size of the array is fixed once the array is allocated, it is a natural application of a for-loop.

For example, suppose we wish to print out the names and lengths of each of the months. First we must consider how to do this for any one given month. Assuming we had the definitions above, and recalling that the indexing of elements of an array starts at zero, to print out the name and length of the fourth month we would write:

HMCSupport.out.println(monthNames[3] + " is " +
                       monthDays[3] + " days long."); 

To do this for each of the twelve months, we place this statement in the body of a for loop that counts off all the months, and replace the value 3 with the loop index variable, as in:

for (int i = 0 ; i < 12 ; i++) {
   
  HMCSupport.out.println(monthNames[i] + " is " + 
                         monthDays[i] + " days long.");
} 

Notice that the loop index variable is set to start at 0 and continue as long as it is less than (but not equal to) 12. Thus it will range over all of the array's cells (which, you will recall, are numbered 0 to 11). The choice of writing i < 12 instead of i <= 11 is a stylistic one. The former style is favored by C programmers, and has been carried over into Java. In general, it is a good style since, looking at the program, the value 12 will have some meaning to the reader (since it is associated with a number of months) whereas the value 11 would be less helpful.

Example: An Array Based Grade Averager

Putting all these ideas together, we can rewrite the program that averages five grades as one that uses arrays, and therefore can handle any number of grades specified by the user.

The program begins by asking the user how many grades are to be averaged. It then uses that information to allocate the array. A loop is then used to get the grades from the user and to simultaneously compute their sum. After the average is computed, a second loop is used to echo the grades, and the average, back to the user.

// Program: ArrayGrades
// Author:  Joshua S. Hodas
// Date:    October 27, 1997 (Modified 10/19/98)
// Purpose: To demonstrate the use of arrays
 
import HMC.HMCSupport;
 
class ArrayGrades {
 
  public static void main(String args[]) {
 
    int numGrades;             
    double grades[];
    double sum = 0, average;
 
    // Prompt the user for the number of grades to be used.
 
    HMCSupport.out.print("How many grades do you want to average? ");
    numGrades = HMCSupport.in.nextInt();
	
    // Allocate the grade array, now that we know how many grades.
 
    grades = new double[numGrades];                            
 
    // Input the grades, and add them up.
 
    HMCSupport.out.println("Please enter " + numGrades + " grades:");
    sum = 0;
    for (int i = 0 ; i < numGrades ; i++) {
    
       grades[i] = HMCSupport.in.nextDouble();
       sum += grades[i];
    }
    average = sum / numGrades;
 
    // Echo back the grades, and the average.
 
    HMCSupport.out.print("The average of ");
    for (int i = 0 ; i < numGrades ; i++) {
    
       // Print a comma before all but first grade.
 
       if (i != 0)
          HMCSupport.out.print(", ");
	      
       HMCSupport.out.print(grades[i]);
    }
 
    HMCSupport.out.println(" is " + average);
  }
 
}


Finding the Length of an Array

In some circumstances, like the last few examples, we know how long the array is (and therefore, the range of our index variable) because the array was allocated in the same method. However, there are times (particularly when an array is passed as an argument to a method) that we do not know the size of a given array.

In these circumstances, we use the length attribute that is defined for arrays. Like the equals method defined for strings, this attribute is used by giving the name of the particular array we are interested in, followed by a period and then the attribute name. (Since it is an attribute and not a method, though, we do not follow it with parentheses.) So, for example, if we had the month name and days arrays declared as before, we could write:

int numMonths = monthNames.length; 
Note that the value returned is the number of cells, which is one more than the index of the last element. Thus another way of writing the loop that printed out the days of each month would be:
for (int i = 0 ; i < monthNames.length ; i++) {
   
  HMCSupport.out.println(monthNames[i] + " is " + 
                         monthDays[i] + " days long.");
} 
Note that we start the loop at 0 and continue the loop as long as the index variable is less than the value of the length attribute of the array. This is the most natural structure for accessing the entire array.


Passing Arrays to Methods

 
Methods For Higher-Level Actions

Many of the higher-level operations that we would like to implement for arrays are natural candidates for methods. These operations arise in many different settings. By building them as methods we can construct a library of useful array manipulation routines that can be reused many times in different settings.

When an array is to be received as a parameter to a method, we must declare its type among the formal parameters, just like any other parameter. As usual this looks just like any other array variable declaration. Thus if the method foo takes an array of ints and returns no value, it would be declared something like:

public static void foo(int a[]) 
or:
public static void foo(int[] a) 

Example: Computing The Sum Of The Elements Of An Array

As an example of a higher-level operation on an array, there are many settings in which it would be useful to know the sum of the elements in an array of double values. This is a generic operation that is really independent of the reason we want to know the sum, so it is a good candidate for a method.

To define a method that will compute this for us, we must first consider its header. Since the array will contain double values, the sum will be returned as that type. Thus the header indicates that the method takes an array of doubles as a parameter and returns a double (not an array of doubles) as a result:

public static double sumArray(double[] a) 

In order to compute the sum, we must first declare a variable that will be used to store it, and set that to zero:

double sum = 0; 

Now we want to add each cell of the array into that sum, one by one. So, we set up a loop over the array. Since we do not know the size of the array ahead of time, we will get it from the length attribute:

for (int i = 0 ; i < a.length ; i++) 

Each pass through the loop, we will select the cell in the array whose index is the value of the loop counter, and add that cell into the sum:

sum = sum + a[i]; 

Finally, when the loop is done, we return the sum as the result of the computation:

return sum; 

Putting it all together, we get:

// Method:  sumArray
// Author:  Joshua S. Hodas
// Date:    October 26, 1997
// Purpose: To demonstrate passing arrays to methods
 
public static double sumArray(double[] a) {
  
  double sum = 0;
  
  for (int i = 0 ; i < a.length ; i++) {
 
    sum = sum + a[i];  
  }  
  return sum;
}

Here is a simple program that includes and makes use of this method.

// Program: SumArray
// Author:  Joshua S. Hodas
// Date:    October 26, 1997
// Purpose: To demonstrate passing arrays to methods
 
import HMC.HMCSupport;
 
class SumArray {
 
  public static void main(String args[]) {
 
    double[] data = {5.3,2.4,8.1,4.0,1.6};
 
    HMCSupport.out.println("The sum of the elements in the array is: "
                           + sumArray(data));
  }
 
 
  public static double sumArray(double[] a) {
  
    double sum = 0;
  
    for (int i = 0 ; i < a.length ; i++) {
    
      sum = sum + a[i];
    }  
    return sum;
  }
 
}

In addition to running the program, you should try tracing its execution by hand to make sure you can reproduce its behavior.

Notice that when we call the method sumArray, we just pass it the variable data as a parameter. This makes sense since the method is expecting an array of doubles as a parameter, and data is such a value. If we were to try to make a call like sumArray(data[3]), it would not work, since data[3] is just a single double, not an array.

By making sumArray a method, it can be dropped in any program that needs it, without concern for what the actual names and sizes of the arrays it will manipulate are. It is worth noting that in naming the method, and the formal parameter, we chose generic names that make sense from the method's perspective. Even if we were writing this method in the context of a grading program, it would be better to name things as we do here, than to use more specific names like sumGrades (for the method) and grades (for the formal parameter). Of course, in that situation, in the main method, or any other method that depended on the nature of the data in the array, calling the array grades would make sense.
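
To illustrate the reuse point, here is a sketch in which the same sumArray method serves two unrelated arrays and is also used as a building block for a second generic method. The class name AverageArray, the method averageArray, and the arrays grades and prices are hypothetical additions, and System.out is used in place of HMCSupport.out.

```java
// A minimal sketch of reusing sumArray in different settings.
// AverageArray, averageArray, grades and prices are hypothetical names.
class AverageArray {

  // The sumArray method from the text, unchanged.
  public static double sumArray(double[] a) {

    double sum = 0;

    for (int i = 0 ; i < a.length ; i++) {

      sum = sum + a[i];
    }
    return sum;
  }

  // A second generic method built on top of sumArray.
  // It assumes the array has at least one element.
  public static double averageArray(double[] a) {

    return sumArray(a) / a.length;
  }

  public static void main(String args[]) {

    double[] grades = {90.0, 80.0, 70.0};
    double[] prices = {1.5, 2.5};

    System.out.println(averageArray(grades));   // prints "80.0"
    System.out.println(sumArray(prices));       // prints "4.0"
  }

}
```

Inside the methods the parameter is always called a, whatever the caller happens to call its array.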


Working With String args[] in main

By now you should have recognized that, since our first program, we have had a method that receives an array as a parameter. That is, part of our standard incantation up till now has said that the main method receives an array of Strings that it names args! Why is it there and who is sending it?

Well, consider all the times you've typed commands at the Unix prompt and included the names of some files to do things with. When you type, for example:

rm foo.java bar.java 
how does the rm program (which is just another program, not a command built into Unix) get the names of the files you told Unix you wanted it to remove? Unix tells it, of course! It does so by passing the main method of the executed program an array of Strings containing all the words that came after the program name on the command line.

So, suppose you had a program named foo.java which you had compiled to produce the file foo.class. Then, if you typed the command:

java foo testing hello world 
at the Unix prompt, then when the main method of the class foo was started, the parameter args would be a three element array, with the three elements being the words testing, hello, and world.

Here is a simple example program that echoes back the words typed after the command when it is launched. (Unfortunately, you won't be able to run this in the browser. You'll need to copy it to the Unix machine and compile and run it there.)

// Program: EchoArgs
// Author:  Joshua S. Hodas
// Date:    October 19, 1998
// Purpose: To demonstrate accessing the command line arguments
 
import HMC.HMCSupport;
 
class EchoArgs {
 
  public static void main(String args[]) {
 
    HMCSupport.out.println("The words on the command line were:");
 
    for (int i = 0 ; i < args.length ; i++) {
    
      HMCSupport.out.println(args[i]);  
    }
  }
 
}
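
Since args is an ordinary array of Strings, it can be handed to other methods like any other array. The following sketch is a variation on EchoArgs that uses only standard Java (System.out rather than HMCSupport.out); the class name ArgsDemo and the method joinArgs are hypothetical.

```java
// A minimal sketch treating args as an ordinary String array.
// ArgsDemo and joinArgs are hypothetical names.
class ArgsDemo {

  // Joins the words of the array into one line, separated by spaces.
  static String joinArgs(String words[]) {

    String result = "";

    for (int i = 0 ; i < words.length ; i++) {

      // Put a space before all but the first word.
      if (i != 0)
        result = result + " ";

      result = result + words[i];
    }
    return result;
  }

  public static void main(String args[]) {

    System.out.println(args.length + " words: " + joinArgs(args));
  }

}
```

Under these assumptions, running java ArgsDemo testing hello world would print "3 words: testing hello world".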


Case Study: Searching for Data in an Array

  Many computer applications center on the problem of manipulating large data sets (or databases) of information. A frequent task in such applications is to search through the data set for a particular piece of data. Sometimes we are just interested in knowing whether the given value is in the data set at all, and sometimes we are interested in the location of the data item (because there is other information connected to it that we wish to access). In either case the problem is the same, and it is an appropriate task for a method.

Linear Search

In the general case, when we know nothing about the arrangement of the values in the data set and the value being searched for, the only complete search procedure (i.e., one that is guaranteed to give the right answer), is to search the entire array, looking at every cell in the array. This can be accomplished with the following method:

// Method:  linearSearch1
// Author:  Joshua S. Hodas
// Date:    November 22, 1996
// Purpose: To Search an Array for a Value
 
  public static int linearSearch1(double arr[], double searchValue) {
 
    int positionFound = -1;        // If not found, return -1
 
    for (int i = 0 ; i < arr.length ; i++) {
 
       if (arr[i] == searchValue)  // Item found at position i 
          positionFound = i;
    }
    return positionFound; 
  }

Notice that the search procedure is designed to produce reasonable results even if the element we are looking for is not in the data set. In that case it returns the value -1. Otherwise it returns the index of the position in the array that holds the element it was told to look for.

This search technique is called linear search because there is a linear relationship between the number of elements in the data set and the length of time it takes to perform the search. If we double the size of the data set we are looking through, the search takes twice as long.

Of course, the design of this code is a little silly. Once we have found the desired element there is no reason to continue the search. (Notice that if the value we are looking for is in the array more than once, this method returns the location of the last occurrence.) We could accomplish this by replacing the assignment inside the if statement with:

return i; 
which would exit the method immediately and return the current index. But this would be somewhat poor style: the for loop would no longer be executing a fixed number of times dictated by its header, and we'd have return statements scattered about the method, rather than having just one at the bottom, which is preferred for readability.
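
For reference, the early-return variant just described might look like the following sketch. The class name EarlyReturnSearch and the method name linearSearchEarly are hypothetical; the body is linearSearch1 with the assignment replaced by a return, plus a final return for the not-found case.

```java
// A minimal sketch of the early-return variant of linear search.
// EarlyReturnSearch and linearSearchEarly are hypothetical names.
class EarlyReturnSearch {

  public static int linearSearchEarly(double arr[], double searchValue) {

    for (int i = 0 ; i < arr.length ; i++) {

       if (arr[i] == searchValue)  // Item found at position i
          return i;                // Leave the method right away
    }
    return -1;                     // Loop finished without a match
  }

  public static void main(String args[]) {

    double[] data = {4.0, 2.5, 4.0, 9.0};

    System.out.println(linearSearchEarly(data, 4.0));   // prints "0"
    System.out.println(linearSearchEarly(data, 7.0));   // prints "-1"
  }

}
```

Note that, unlike linearSearch1, this variant reports the first occurrence of a repeated value rather than the last.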

As a better solution, we can, as below, use a while loop rather than a for loop and structure the test so as to make the loop terminate when the value is found. This is most easily done using a boolean sentinel value, as in:

// Method:  linearSearch2
// Author:  Joshua S. Hodas
// Date:    November 22, 1996
// Purpose: To Search an Array for a Value More Efficiently
 
  public static int linearSearch2(double arr[], double searchValue) {
 
    int positionFound = -1, i = 0;
    boolean found = false;    // We haven't found the value yet
 
    while (i < arr.length && (!found)) {// read "!found" as "not found"
                                        // i.e.  "while not found"
       if (arr[i] == searchValue) {
 
          positionFound = i;  // Item found at position i
          found = true;       // Stop the loop at next round
       }
       i++;
    }
    return positionFound;
  }

Sometimes this version of linear search will find the desired element near the beginning of the array, while other times it will need to search almost to the end. When the value is present and equally likely to be at any position, it will, on average, need to examine about half the elements, and so will be about twice as fast as the previous version. (When the value is not in the array at all, it must still examine every cell.) Nevertheless, the search time still grows linearly with the size of the data set.
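
The cost of the search can be made concrete by counting how many cells the loop looks at. The sketch below reuses the loop structure of linearSearch2 but returns the number of cells examined instead of the position; the class name SearchCost and the method name cellsExamined are hypothetical.

```java
// A minimal sketch that counts the cells examined by the
// linearSearch2 loop. SearchCost and cellsExamined are hypothetical names.
class SearchCost {

  // Returns how many cells are examined before the loop stops.
  static int cellsExamined(double arr[], double searchValue) {

    int i = 0;
    boolean found = false;

    while (i < arr.length && (!found)) {

       if (arr[i] == searchValue)
          found = true;       // Stop the loop at next round
       i++;
    }
    return i;                 // i cells were looked at
  }

  public static void main(String args[]) {

    double[] data = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

    System.out.println(cellsExamined(data, 1.0));   // prints "1"
    System.out.println(cellsExamined(data, 4.0));   // prints "4"
    System.out.println(cellsExamined(data, 8.0));   // prints "8"
    System.out.println(cellsExamined(data, 9.5));   // prints "8"
  }

}
```

Averaged over the eight values that are present, the loop examines (1 + 2 + ... + 8) / 8 = 4.5 cells, a little over half the array, while a value that is absent costs a full pass.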

Binary Search

While linear search is the best we can do if we don't know anything about the data, if we know something about the data we can often do much better. In particular, if we know that the data is stored in the array in sorted order we can write a search procedure whose execution time grows only as fast as the logarithm of the size of the data set.

The idea is based on a simple intuition: instead of looking at each element of the data set, go straight to the middle element. If it is the element we are looking for, we're done. If, on the other hand, it is not, we can immediately rule out half of the remaining elements in the data set. If the middle element that we looked at is bigger than the element we are looking for, then so is everything in the upper half of the data set (since the array is sorted and they are all bigger than the middle element). If the middle element is smaller than the element we are looking for, we can, similarly, rule out the entire lower half of the data set. Now we just repeat the process, looking at the middle element of the half of the data set that was not ruled out.

Since each time we examine the middle element of a range we rule out half the remaining elements, doubling the size of the data set will only require examining one additional data point. If we make the data set one thousand times as large we will only need to actually look at ten more elements in the course of the search.

Because we always split the data in half, identifying the target as being either to the left or the right, this technique is called binary search. In total, we never examine more elements than the number of times the data set can be divided in half, which is roughly the logarithm (base 2) of the number of elements in the data set.
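
The growth rate can be checked with a little arithmetic. The sketch below counts how many times a range of n elements can be cut in half, using integer division, before nothing is left. The class name HalvingCount and the method name halvings are hypothetical, and the count is meant only as a rough stand-in for the number of elements a binary search might examine.

```java
// A minimal sketch counting repeated halvings.
// HalvingCount and halvings are hypothetical names.
class HalvingCount {

  // Counts how many times n can be halved (integer division)
  // before it reaches zero.
  static int halvings(int n) {

    int count = 0;

    while (n > 0) {

      n = n / 2;
      count++;
    }
    return count;
  }

  public static void main(String args[]) {

    System.out.println(halvings(1000));      // prints "10"
    System.out.println(halvings(2000));      // prints "11"
    System.out.println(halvings(1000000));   // prints "20"
  }

}
```

Doubling the size from 1000 to 2000 adds one halving, and multiplying it by one thousand adds ten.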

The following method implements this search technique. It uses three variables which keep track of the locations of the bottom, top, and middle elements of the area of the array still under consideration. Each time through the loop we reduce the area of focus by half. If the middle element of the range is the one we are looking for, the sentinel is used to stop the loop. Otherwise, the loop continues until the top and bottom markers cross (that is, when the area of focus shrinks to nothing). In that case, the sought-after element is not in the array. Here's the code:

// Method:  binarySearch
// Author:  Joshua S. Hodas
// Date:    November 22, 1996 (modified 10/19/98)
// Purpose: To Search an Array for a Value More Efficiently
 
  public static int binarySearch(double arr[], double searchValue) {
 
    int positionFound = -1;
    int bottom = 0, top = arr.length-1, middle;
    boolean found = false;     // We haven't found the value yet
 
    while (bottom <= top && (!found)) { // There is still stuff to search
                                        // and element still "not found"
 
       middle = (bottom + top)/2;       // Identify element to examine
       
       if (arr[middle] == searchValue) { // See if it is one we want
 
          positionFound = middle;       // If it is, then we found it here
          found = true;                 // Item found so exit loop
       }
 
       // If middle element was too small, next time look above it.
       else if (arr[middle] < searchValue) {
 
          bottom = middle + 1;
          
       }
 
       // Otherwise, next time look below it.
       else {
       
          top = middle - 1;
       }
    }
    return positionFound;
  }
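
Here is a sketch of how binarySearch might be called. The method body is the one given above with the blank lines removed; the class name BinarySearchDemo and the sample arrays are hypothetical, and System.out is used in place of HMCSupport.out.

```java
// A minimal usage sketch for binarySearch.
// BinarySearchDemo and the sample data are hypothetical.
class BinarySearchDemo {

  // The binarySearch method from the text, compacted.
  public static int binarySearch(double arr[], double searchValue) {

    int positionFound = -1;
    int bottom = 0, top = arr.length-1, middle;
    boolean found = false;

    while (bottom <= top && (!found)) {
       middle = (bottom + top)/2;          // Identify element to examine
       if (arr[middle] == searchValue) {
          positionFound = middle;          // Found it here
          found = true;
       }
       else if (arr[middle] < searchValue) {
          bottom = middle + 1;             // Too small: look above it
       }
       else {
          top = middle - 1;                // Too big: look below it
       }
    }
    return positionFound;
  }

  public static void main(String args[]) {

    double[] sorted = {1.6, 2.4, 4.0, 5.3, 8.1};

    System.out.println(binarySearch(sorted, 5.3));   // prints "3"
    System.out.println(binarySearch(sorted, 1.6));   // prints "0"
    System.out.println(binarySearch(sorted, 3.0));   // prints "-1"

    // The method relies on the array being sorted. On unsorted data
    // it can miss a value that is actually present.
    double[] unsorted = {8.1, 1.6, 5.3};
    System.out.println(binarySearch(unsorted, 8.1)); // prints "-1"
  }

}
```

The last call shows why the sorted-order assumption matters: the search rules out the lower half after seeing 1.6 in the middle, and so never looks at the cell holding 8.1.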

Last modified August 28 for Fall 99 cs5 by fleck@cs.hmc.edu


This page copyright ©1998 by Joshua S. Hodas. It was built with Frontier on a Macintosh. Last rebuilt on Mon, Oct 26, 1998 at 10:24:27 AM.
http://www.cs.hmc.edu/~hodas/courses/cs5/week_09/lecture/lecture.html