Introduction to Arrays

Arrays Store Groups Of Related Values
If we were limited to individual named variables, many programs would be inordinately difficult or annoying to write. For example, we can easily write a program that adds up and averages five grades by storing each grade in its own variable.

But what if we want to deal with fifty grades, or five hundred? We could put the summation in a loop and just keep reading inputs into the same single variable. But then we would lose the ability to echo back the numbers, or to do any other manipulations that would require access to all the individual grades at a later point.

The solution lies in using arrays. Arrays are a kind of aggregate variable. That is, in an array, several values are referred to by a single name. An array can be of any size you need, but once the array is created with a certain number of elements, you cannot change the number of elements it holds. Similarly, an array can hold elements of any type, but all the elements in a given array must be of the same type, which is also determined when the array is created.

The Mechanics of Arrays

Referring To The Elements Of An Array
The elements of the array are stored in a fixed order.
To specify which element of the array you want to access, you write
the name of the array followed by a numerical
array index given between a pair of square brackets.
So, for example, if a is a ten-element array of ints,
then a[3] refers to a specific one of the int variables in the array.
Of course, the index does not need to be a literal number. Any integer
expression can be used to specify the index. So, for example, if the
int variable i holds the value 2, then a[i + 1] refers to the same element as a[3].

For somewhat obscure reasons related to Java's relationship to C, and the way C handles arrays, the index of the first element is zero. Thus, if a is an array variable, a[0] is the first element of the array, a[1] is the second, and so on.

Once we have identified a particular element of the array, we can use it like any other variable of that type. In particular, it can appear on either side of the equal sign of an assignment statement. For example, the statement a[3] = 2 * y; says to put twice the value of the variable y into the fourth cell of the array a, while the statement x = 2 * a[3]; says to put twice the value of the fourth cell of the array a into the variable x.
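
The two directions of assignment described above can be sketched as follows. The array a and the variables x and y come from the text; the class name ElementAccess, the method demo, and the sample value are assumptions made for illustration. The array is allocated with new, a step explained later in these notes.

```java
class ElementAccess {
    // Stores twice y in the fourth cell of a ten-element array,
    // then reads that cell back and returns twice its value.
    static int demo(int y) {
        int[] a = new int[10];   // cells a[0] through a[9], all initially 0
        a[3] = 2 * y;            // element on the left: write into the fourth cell
        int x = 2 * a[3];        // element on the right: read from the fourth cell
        return x;
    }

    public static void main(String[] args) {
        System.out.println(demo(5));   // prints 20
    }
}
```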

Declaring An Array
As we said above, an array can be made of elements of any type. In fact, to declare an array of elements of a given type you use the same declaration you would for any other variable of that type; you just put an empty pair of square brackets after the variable name. For example, we can declare a to be an array of integers with the declaration int a[];

This allows you to mix the declaration with other arrays of the same type, as well as with declarations of ordinary variables of that type, as in int a[], b, c[], d; which declares two arrays of ints, a and c, and two ordinary int variables, b and d.

Alternatively, the brackets can be put after the type. This is most useful if you are declaring several arrays of the same underlying type, as in int[] a, b, c, d; which declares four int arrays.

Note: the ONLY time you ever write a variable name followed by a pair of empty square brackets is in a type declaration (either of an ordinary variable, or of a formal parameter to a method). In actual code you will always either write just the variable name (in which case you are referring to the entire array), or the variable name followed by square brackets with an index expression inside (in which case you are referring to an individual element of the array).

Reading Types
It is useful to practice reading off array declarations in English, as this will help you understand when you use the brackets and when you don't. Once you convince yourself that both ways of declaring arrays make sense, you will probably never make a mistake with the brackets.

To start, any time you see a type name with brackets after it, you read it as the underlying type preceded by "an array of". For example, you would read int a; as "a is an int". But you would read int[] a; as "a is an array of int".

Now, suppose we write int a[]; Since the type is unadorned, we would read this as "a[] is an int". But what kind of variable do you put brackets after (albeit with an index expression inside the brackets) to get an int? An array of ints, of course! So it must be that "a is an array of int".
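So, having used either declaration, the name a by itself refers to the entire array, while a followed by an index expression in square brackets refers to a single int.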

Allocating An Array
Unlike the ordinary scalar variables that we have seen so far, declaring an array variable is not sufficient to allow you to use it. Notice that, while we have said we want a variable to store an array of elements of a specified type, we have not yet specified how many elements the array holds. This requires a separate step called allocating the array, which is done with the operator new, using the following syntax: arrayName = new elementType[numberOfElements];

So, for example, to allocate the previously declared array a so that it holds ten ints, we would write a = new int[10]; Since it is done with an assignment statement, this allocation step can be thought of as a sort of initialization. As such it can be merged into the declaration of the array variable like any other variable's initialization, as in int a[] = new int[10]; Similarly, we could declare and allocate an array of twenty String values with a statement such as String names[] = new String[20];

The value used to define the number of cells to allocate must be an
integer, but, as with the expression used to access an element of the array,
it need not be a literal. Any valid integer expression can be put
between the brackets in the call to new. For example, the following
two lines declare an int variable with the value 5
and declare and allocate two arrays of doubles, one with 5 elements
and one with 13 elements:
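
A minimal sketch of such a pair of lines, wrapped in a small program so it can be run. The variable names n, small, and large, the class name Allocation, and the particular expression 2 * n + 3 are assumptions chosen to produce the sizes 5 and 13; the length attribute used to print the sizes is discussed later in these notes.

```java
class Allocation {
    public static void main(String[] args) {
        int n = 5;
        double[] small = new double[n], large = new double[2 * n + 3];

        // The size expressions are evaluated when new runs.
        System.out.println(small.length);   // prints 5
        System.out.println(large.length);   // prints 13
    }
}
```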

Initializing The Contents Of An Array
By default, when the array is allocated, the elements of the array
are all initialized to zero (or some variation thereof appropriate to
the type of the elements). For small arrays for which you have other values
you would like to initialize the elements to, you can replace the call to
new with a comma-separated list of values between a pair of
curly braces. The array will have as many cells allocated
as there are values in the list, and the cells will be initialized
with the values from the list. For example, the following statement declares,
allocates, and initializes the contents of an array of twelve ints
holding the number of days in each month of the year:
Similarly, the next statement declares and initializes an array of Strings
holding the names of the months:
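
A sketch of what these two initialized arrays might look like. The names daysInMonth and monthName and the class name MonthTables are assumptions, and February is given its non-leap-year length.

```java
class MonthTables {
    // Twelve cells are allocated because each list has twelve values.
    static int[] daysInMonth = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

    static String[] monthName = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };

    public static void main(String[] args) {
        // Index 0 is the first month.
        System.out.println(monthName[0] + " has " + daysInMonth[0] + " days");
    }
}
```

Running this prints "January has 31 days".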

Java Checks All Array References
It is not uncommon to make a programming mistake that causes the bounds of
a loop to be misdefined and the loop index variable to stray beyond
the range of the array. The most common cause of this is what is known as
an obiwan error (or off-by-one error), which is an error
you typically get by starting a loop at 0 when you should have started at 1 or
vice-versa, or by writing < instead of <= or vice-versa.
(These, in turn, are sometimes, but not always, instances of fence-post errors, which occur when you get confused between counting things and counting the spaces between things. For example, how many items are there between (and including) position m in an array and position n in an array? The answer is not (n - m), but, rather, (n - m + 1).)

In C, referencing outside the bounds of an array leads to a nasty situation in which the computer nevertheless reads a value from a spot in memory outside the array (or, worse, writes a value to a spot in memory outside the array) as though it were properly accessing inside the array (that is, the spot where the element of the array would have been if the array were that large). This can lead to unpredictable behavior and crashing. (In fact, an immediate crash is the best situation you can hope for, as it makes finding the bug easier. More likely the program will continue on for a while and start exhibiting strange behavior because the contents of some variable have been overwritten by the erroneous access.)

Fortunately, Java is more careful. Each access to an array is checked to make sure that the index given is valid for that array. So, for instance, if we tried to access the sixteenth element of an array that holds fewer than sixteen elements, the program would stop with an ArrayIndexOutOfBoundsException, echoing the invalid index value that we tried to use. While this slows down array operations a bit, indexing errors are so common (and, in C, so catastrophic) that it is worth it.
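
The check can be observed directly by catching the exception. In this sketch the class name BoundsCheck, the method tryAccess, and the twelve-element array are assumptions made for illustration; the exact wording of the exception's message differs between Java versions, so the sketch does not depend on it.

```java
class BoundsCheck {
    // Reports what happens when cell `index` of a twelve-element array is read.
    static String tryAccess(int index) {
        int[] a = new int[12];   // valid indices are 0 through 11
        try {
            return "a[" + index + "] = " + a[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // Java refused the access instead of reading stray memory.
            return "caught: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAccess(3));    // a valid index
        System.out.println(tryAccess(12));   // off by one: one past the last cell
        System.out.println(tryAccess(15));   // the "sixteenth element"
    }
}
```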

Arrays and For Loops
Arrays and for-loops go together like peanut butter and jelly!
A large number of the things we want to do with arrays involve manipulating
the entire array. Unfortunately, there are very few built-in operations
that are defined to work on the contents of an entire array. Instead,
we must define those processes in terms of what they require us to do to the
individual elements of the array. This task is then repeated once for
each element of the array. Since the size of the array is fixed once the array is allocated, this is a natural application of a for-loop.

For example, suppose we wish to print out the name and length of each of the months. First we must consider how to do this for any one given month. Assuming we had the definitions above, and recalling that the indexing of the elements of an array starts at zero, to print out the name and length of the fourth month we would print the element at index 3 of each array.

To do this for each of the twelve months, we place this statement in the body of a for-loop, using the loop index variable in place of the literal index. The loop index variable is set to start at 0 and to continue as long as it is less than (but not equal to) 12. Thus it will range over all of the array's cells (which, you will recall, are numbered 0 to 11).
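
A sketch of the single-month statement and the loop built around it. The array names daysInMonth and monthName, the helper describe, and the class name PrintMonths are assumptions; the arrays are defined again here so that the example stands on its own.

```java
class PrintMonths {
    static int[] daysInMonth = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    static String[] monthName = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };

    // Builds the line we want to print for the month stored at index i.
    static String describe(int i) {
        return monthName[i] + " has " + daysInMonth[i] + " days";
    }

    public static void main(String[] args) {
        // One given month: the fourth month lives at index 3.
        System.out.println(describe(3));   // prints "April has 30 days"

        // Every month: i takes the values 0, 1, ..., 11.
        for (int i = 0; i < 12; i++) {
            System.out.println(describe(i));
        }
    }
}
```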
The choice of writing

Example: An Array Based Grade Averager
Putting all these ideas together, we can rewrite the program that averages five grades
to one that uses arrays, and therefore can handle any number of grades specified
by the user. The program begins by asking the user how many grades are to be averaged. It then uses that information to allocate the array. A loop is then used to get the grades from the user and to simultaneously compute their sum. After the average is computed, a second loop is used to echo the grades, and the average, back to the user.
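
One way such a program might be written is sketched below. It assumes input is read with java.util.Scanner (the course may use a different input helper), and the class name GradeAverager, the method run, and the prompt wording are all assumptions. The work is placed in run so that it can be given any input source; main simply connects it to the keyboard and screen.

```java
import java.io.PrintStream;
import java.util.Scanner;

class GradeAverager {
    // Reads a count and that many grades from `in`, then echoes
    // the grades and their average to `out`.
    static void run(Scanner in, PrintStream out) {
        out.print("How many grades? ");
        if (!in.hasNextInt()) {
            out.println("No grade count was given.");
            return;
        }
        int count = in.nextInt();
        if (count <= 0) {
            out.println("Nothing to average.");
            return;
        }

        double[] grades = new double[count];   // size chosen at run time
        double sum = 0.0;

        // First loop: store each grade and add it to the running sum.
        for (int i = 0; i < count; i++) {
            out.print("Grade " + (i + 1) + ": ");
            grades[i] = in.nextDouble();
            sum += grades[i];
        }
        double average = sum / count;

        // Second loop: the grades are still in the array, so we can echo them.
        out.println("The grades were:");
        for (int i = 0; i < count; i++) {
            out.println("  " + grades[i]);
        }
        out.println("Average: " + average);
    }

    public static void main(String[] args) {
        run(new Scanner(System.in), System.out);
    }
}
```

Unlike the five-variable version, nothing here depends on the number of grades except the value the user types first.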

Finding the Length of an Array

In some circumstances, like the last few examples, we know how long the array
is (and therefore, the range of our index variable) because the array was declared in
the same method. However, there are times (particularly when an array is passed as an
argument to a method) that we do not know the size of a given array.
In these circumstances, we use the length attribute of the array, which is written as the name of the array followed by .length (for example, a.length).

Note that the value of the length attribute is the number of cells, which is one more than the index of the last element. Thus another way of writing the loop that printed out the days of each month would be to use the length attribute of the array in place of the literal 12 in the loop test. Note that we start the loop at 0 and continue the loop as long as the index variable is less than the value of the length attribute of the array. This is the most natural structure for accessing the entire array.
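
A sketch of the month-printing loop rewritten around the length attribute. The method printMonths, its parameter names, and the class name LengthLoop are assumptions. Because the loop bound comes from the array itself, the same method works for arrays of any size.

```java
class LengthLoop {
    // Prints one line per cell; the method never needs to be told
    // how many cells the arrays have.
    static void printMonths(String[] monthName, int[] daysInMonth) {
        for (int i = 0; i < monthName.length; i++) {
            System.out.println(monthName[i] + " has " + daysInMonth[i] + " days");
        }
    }

    public static void main(String[] args) {
        int[] daysInMonth = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
        String[] monthName = {
            "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December"
        };
        System.out.println(daysInMonth.length);   // prints 12; the last index is 11
        printMonths(monthName, daysInMonth);
    }
}
```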

Passing Arrays to Methods

Methods For Higher-Level Actions
Many of the higher-level operations that we would like to implement for arrays
are natural candidates for methods. These operations arise in many different
settings. By building them as methods we can construct a library of useful
array manipulation routines that can be reused many times in different settings.
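When an array is to be received as a parameter to a method, we must declare its type among
the formal parameters, just like any other parameter. As usual, this looks just
like any other array variable declaration. Thus, if a method is to receive an array of doubles that it will refer to as a, the parameter can be declared as double a[] or as double[] a.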

Example: Computing The Sum Of The Elements Of An Array
As an example of a higher-level operation on an array,
there are many settings in which it would be useful to know the sum
of the elements in an array of double values. This is a generic
operation that is really independent of the reason we want to know the
sum, so it is a good candidate for a method.
To define a method that will compute this for us, we must first consider its header. Since the array will contain double values, the sum will also be a double, so the method returns a double. It takes a single parameter, the array of doubles to be summed.

In order to compute the sum, we must first declare a variable that will be used to store it, and set that to zero.

Now we want to add each cell of the array into that sum, one by one. So, we set up a loop over the array. Since we do not know the size of the array ahead of time, we will get it from the length attribute of the array.

Each pass through the loop, we will select the cell in the array whose index is the value of the loop counter, and add that cell into the sum.

Finally, when the loop is done, we return the sum as the result of the computation.

Putting it all together gives the complete method sumArray. A simple program can then include and make use of this method. In addition to running the program, you should try tracing its execution by hand to make sure you can reproduce its behavior.
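
The steps above can be assembled into the following sketch. The method name sumArray and the variable data come from the text; the class name SumDemo, the parameter name a, and the sample values are assumptions.

```java
class SumDemo {
    // Returns the sum of all the elements of the array a.
    static double sumArray(double[] a) {
        double sum = 0.0;                      // the running total starts at zero
        for (int i = 0; i < a.length; i++) {   // the size comes from the array itself
            sum += a[i];                       // add the cell selected by the loop counter
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = { 1.5, 2.5, 3.0, 4.0 };
        // The whole array is passed, so we write data, not data[] or data[3].
        System.out.println("The sum is " + sumArray(data));   // prints "The sum is 11.0"
    }
}
```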
Notice that when we call the method sumArray, we just pass it
the variable data as a parameter. This makes sense since the method
is expecting an array of doubles as a parameter, and data is
such a value. If we were to try to make a call like sumArray(data[3]), it would
not work, since data[3] is just a single double, not an array.
By making

Working With args
By now you should have recognized that, since our first program, we have had a method that receives an array as a parameter. That is, part of our standard incantation up to now has said that the main method receives an array of Strings that it names args! Why is it there, and who is sending it?

Well, consider all the times you've typed commands at the Unix prompt and included the names of some files to do things with. When you type, for example, rm followed by the names of some files, how does the rm program (which is just another program, not a command built in to Unix) get the names of the files you told Unix you wanted it to remove? Unix tells it, of course! It does so by passing the main method of the executed program an array of Strings containing all the words that came after the program name on the command line.

So, suppose you had a program named foo and you typed java foo testing hello world at the Unix prompt. Then, when the main method of the class foo was started, the parameter args would be a three-element array, with the three elements being the words testing, hello, and world. A simple example program can echo back the words typed after the command when it is launched. (Because it depends on the command line, you will need to compile and run it at the Unix prompt.)
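
A minimal sketch of such an echo program. The class name EchoArgs and the output format are assumptions.

```java
class EchoArgs {
    public static void main(String[] args) {
        // args holds the words that followed the class name on the command line.
        for (int i = 0; i < args.length; i++) {
            System.out.println("args[" + i + "] = " + args[i]);
        }
    }
}
```

After compiling, typing java EchoArgs testing hello world at the Unix prompt prints three lines: args[0] = testing, args[1] = hello, and args[2] = world. With no words after the class name, args is an array of length zero and nothing is printed.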

Case Study: Searching for Data in an Array

Many computer applications center on the problem of manipulating
large data sets (or databases) of information.
A frequent task in such applications is to search through the
data set for a particular piece of data. Sometimes we are just interested
in knowing whether the given value is in the data set at all, and
sometimes we are interested
in the location of the data item (because there is other information
connected to it that we wish to access). In either case the problem
is the same, and it is an appropriate task for a method.

Linear Search
In the general case, when we know nothing about the arrangement
of the values in the data set and the value being searched for,
the only complete search procedure (i.e., one that
is guaranteed to give the right answer), is to search the entire
array, looking at every cell in the array. This can be accomplished
with the following method:
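
A sketch of a search method of this kind, assuming the data set is an array of ints. The names linearSearch, data, target, and location and the class name LinearSearch are assumptions.

```java
class LinearSearch {
    // Looks at every cell of data. Returns the index of a cell holding
    // target, or -1 if target does not appear in the array.
    static int linearSearch(int[] data, int target) {
        int location = -1;                  // "not found" until we learn otherwise
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                location = i;               // remember where we saw it, keep going
            }
        }
        return location;
    }

    public static void main(String[] args) {
        int[] data = { 4, 7, 1, 7, 9 };
        System.out.println(linearSearch(data, 1));   // prints 2
        System.out.println(linearSearch(data, 5));   // prints -1
    }
}
```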
Notice that the search procedure is designed to produce reasonable results even if the element we are looking for is not in the data set. In that case it returns the value -1. Otherwise it returns the index of the position in the array that holds the element it was told to look for. This search technique is called linear search because there is a linear relationship between the number of elements in the data set and the length of time it takes to perform the search. If we double the size of the data set we are looking through, the search takes twice as long.
Of course, the design of this code is a little silly. Once we have found the desired element there is no reason to continue the search. (Notice that if the value we are looking for is in the array more than once, this method returns the location of the last occurrence.) We could stop early by replacing the assignment inside the loop with a return statement, which would exit the method immediately and return the current index. But this would be somewhat poor style: the for loop would no longer be executing a fixed number of times dictated by its header, and we'd have return statements scattered about the method, rather than having just one at the bottom, which is preferred for readability. As a better solution, we can use a while loop controlled by a sentinel variable that stops the loop as soon as the element has been found.
Sometimes this version of linear search will find the desired element near the beginning of the array, while other times it will need to search almost to the end. On average, it will need to search half the elements and thus, on average, it will be twice as fast as the previous version. Nevertheless, the search time still grows linearly with the size of the data set.
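
The early-exit version of linear search can be sketched as follows, again assuming an array of ints. The boolean sentinel found and the other names are assumptions. Note that this version returns the location of the first occurrence rather than the last.

```java
class LinearSearchEarlyExit {
    // Stops as soon as target is seen. Returns the index of the first
    // cell holding target, or -1 if target does not appear in the array.
    static int linearSearch(int[] data, int target) {
        int location = -1;
        boolean found = false;   // sentinel: becomes true when the search succeeds
        int i = 0;
        while (!found && i < data.length) {
            if (data[i] == target) {
                location = i;
                found = true;    // the loop test will now fail
            }
            i++;
        }
        return location;         // the single return at the bottom
    }

    public static void main(String[] args) {
        int[] data = { 4, 7, 1, 7, 9 };
        System.out.println(linearSearch(data, 7));   // prints 1
        System.out.println(linearSearch(data, 5));   // prints -1
    }
}
```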

Binary Search
While linear search is the best we can do if we don't know anything about the
data, if we know something about the data we can often do much better.
In particular, if we know that the data is stored in the array in sorted order we can write
a search procedure whose execution time grows only as fast as the logarithm of the
size of the data set.

The idea is based on a simple intuition: instead of looking at each element of the data set, go straight to the middle element. If it is the element we are looking for, we're done. If, on the other hand, it is not, we can immediately rule out half of the remaining elements in the data set. If the middle element that we looked at is bigger than the element we are looking for, then so is everything in the upper half of the data set (since the array is sorted, those elements are all at least as big as the middle element). If the middle element is smaller than the element we are looking for, we can, similarly, rule out the entire lower half of the data set. Now we just repeat the process, looking at the middle element of the half of the data set that was not ruled out.

Since each time we examine the middle element of a range we rule out half the remaining elements, doubling the size of the data set will only require examining one additional data point. If we make the data set one thousand times as large we will only need to look at about ten more elements in the course of the search (since 2 to the tenth power is 1024). Because we always split the data in half, identifying the target as being either to the left or the right, this technique is called binary search. In total, we never examine more elements than the number of times the data set can be divided in half, which is roughly the logarithm (base 2) of the number of elements in the data set.

The following method implements this search technique. It uses three variables which keep track of the locations of the bottom, top, and middle elements of the area of the array still under consideration. Each time through the loop we reduce the area of focus by half. If the middle element of the range is the one we are looking for, the sentinel is used to stop the loop. Otherwise, the loop continues until the top and bottom markers cross (that is, when the area of focus shrinks to nothing). In that case, the sought-after element is not in the array.
Here's the code:
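
A sketch of binary search along the lines described, assuming a sorted array of ints. The variables bottom, top, and middle follow the description in the text; the sentinel found, the method and class names, and the sample data are assumptions.

```java
class BinarySearch {
    // data must be sorted in increasing order. Returns the index of a cell
    // holding target, or -1 if target does not appear in the array.
    static int binarySearch(int[] data, int target) {
        int bottom = 0;                  // lowest index still under consideration
        int top = data.length - 1;       // highest index still under consideration
        int location = -1;
        boolean found = false;           // sentinel that stops the loop on success

        while (!found && bottom <= top) {
            // Written this way, rather than (bottom + top) / 2, so the
            // arithmetic cannot overflow for very large arrays.
            int middle = bottom + (top - bottom) / 2;
            if (data[middle] == target) {
                location = middle;
                found = true;
            } else if (data[middle] > target) {
                top = middle - 1;        // rule out the upper half
            } else {
                bottom = middle + 1;     // rule out the lower half
            }
        }
        return location;
    }

    public static void main(String[] args) {
        int[] sorted = { 2, 3, 5, 7, 11, 13, 17 };
        System.out.println(binarySearch(sorted, 11));   // prints 4
        System.out.println(binarySearch(sorted, 4));    // prints -1
    }
}
```

With seven elements, the search for 11 examines only three cells (indices 3, 5, and 4) before it succeeds.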
Last modified August 28 for Fall 99 cs5 by fleck@cs.hmc.edu
This page copyright ©1998 by Joshua S. Hodas. It was built with Frontier on a Macintosh. Last rebuilt on Mon, Oct 26, 1998 at 10:24:27 AM.
http://www.cs.hmc.edu/~hodas/courses/cs5/week_09/lecture/lecture.html