| |||
The Problem Of Heterogeneous Data | |||
|
Arrays are useful when we want to store a collection of values, but they have a serious limitation: all of their elements have to be of the same type. Thus, while they can be declared to be of any size, they are homogeneous data structures. They are of no help if we wish to group together several values of different, or heterogeneous, types.

For example, consider the problem of maintaining a database of employee records. There are several things we might wish to record about each employee: first name, last name, department, salary, ID number, number of dependents, et cetera. How do we store the information about an employee? Some of these values are strings, some are floating point values, and some are integers. We could store them all as strings, and thereby get away with using an array of six strings to represent an employee record. But this would be awkward. At the least it would require converting back and forth between the string representation of the salary and its numeric value any time we wanted to do some calculations.

The only alternative at this point would be to use six separate variables. But that turns into a logistical headache. Whenever we want another employee, we need to declare six more variables. If we want to define a function that operates on an employee record, we would need it to take six parameters for the employee information. If the function did something that involved comparing the information for two employees it would need twelve parameters! This quickly becomes far too cumbersome to be practical.

And what if we want to store information about one hundred employees? For the same reason we couldn't create a six-cell one-dimensional array to hold a single employee, we can't create a six-hundred-cell two-dimensional array to hold the data about one hundred employees. Instead, we would need to create six separate arrays of one hundred elements each. One would hold all the first names, one all the salaries, and so on. Then, if we wanted to sort the records in order of the employees' names, we would need to make sure that each time the sorting algorithm directed us to exchange two names, we made the corresponding exchange in each of the other arrays. Otherwise the data for an employee would become erroneously attached to another employee's name.
| |||
Objects Store Heterogeneous Data | |||
|
The solution comes in the form of objects, which are compound variables
designed to hold heterogeneous data.
| |||
| Classes Describe Objects |
If we wish to define an object to represent something,
we begin by specifying the class of the object.
The class can be seen as a template describing what information an object
stores and how it behaves. For now we will ignore the "behavior" part
and concentrate instead on the "storage" part. We will get to behavior next week. We have already been defining classes since we wrote our first Java programs, but they have been weird degenerate classes just used to act as the shells for our programs. Now we will define degenerate classes of a different sort: ones that have no methods in them, only data. In the next lecture we will put everything together and discuss classes that have both data and behavior.
| ||
| Defining A Class |
For example, suppose we wanted to define a class
of objects corresponding to our employees above. We might define it as:
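A minimal sketch of such a class definition is given here. Only the class name Employee comes from these notes; the field names and types are assumptions chosen to match the list of employee attributes given earlier.

```java
// Sketch of a data-only class. The field names and types are assumptions
// made for illustration; only the class name Employee comes from the notes.
class Employee {
    public String firstName;
    public String lastName;
    public String department;
    public float salary;
    public int idNumber;
    public int numDependents;
}
```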
While this looks pretty much like a series of ordinary variable declarations, the key thing to notice is that the declarations are not inside a method, but are, instead, just inside the class header. Each object of this class (sometimes referred to as an instance of the class) will have its own copy of each of these variables. Therefore, they are sometimes referred to as instance variables. (Because an object of this degenerate form is a lot like a record in a database, or like a form to be filled in, the instance variables are also sometimes referred to as the fields of the object.)
| ||
| Only Public Fields Can Be Accessed |
The other big apparent difference between the declaration of each of the instance variables
of the Employee class and the declaration of a variable inside a method is that each
was preceded by the word public. This declaration makes it possible for
other classes making use of Employee objects to access those fields.
| ||
If the keyword public had been left out of some of the declarations,
those fields would be accessible only to classes in the same package as Employee.
If they had instead been declared private, they would only be accessible from within the Employee
class's own methods, as will be described in the next lecture.
Of course, in this example, the Employee class doesn't have any methods,
so any private fields
would be useless here. So, for now, we will declare all of an object's fields to be public.
| |||
| Assigning Default Values To Fields |
If the field declarations in a class definition include initializations, then the given values will
be automatically assigned to those fields every time a new object of the class is created. This
gives us a way
to supply default values for fields when that is appropriate. For example, suppose we knew
that most of the employees at our company were named "Smith", worked in the engineering
department, made $30,000, and had two children. Then we could assign these values as
defaults by changing the class definition to read:
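A sketch of a class definition with such defaults follows. The default values come from the description above; the field names are assumptions, and the number of children is stored in an assumed numDependents field.

```java
// Sketch only: field names are assumptions; the default values are the
// ones described in the text.
class Employee {
    public String firstName;                    // no sensible default
    public String lastName = "Smith";
    public String department = "Engineering";
    public float salary = 30000.0f;             // the f suffix makes this a float literal
    public int idNumber;                        // no sensible default
    public int numDependents = 2;
}
```

Fields without an initializer still start out with Java's standard initial values: null for strings and zero for numbers.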
| ||
| Declaring An Object |
Once we have created a class definition for Employee objects,
we can use the identifier Employee in the code of other classes as though it were
a type name. That is, we can declare variables of that type. For instance, if we
want to declare two variables which will be used to hold information about
two employees, we would write:
| ||
| Allocating An Object |
As with array variables, though, it is not enough just to declare an object variable, since all the variable itself holds is a reference to the actual object. Therefore, we must use the
new command to actually set aside space on the heap for the object itself and set
the object variable to refer to that space. Thus, for example, we would write emp1 = new Employee();
to make emp1 refer to an actual Employee object.
If the class definition included default values for any of the object's fields, those
values would be inserted into the new object on the heap at this point.
| ||
Note that, for reasons that will be explained next week,
when you call new to allocate space for an object, a pair of
(empty) parentheses is given after the class name. Don't confuse this with the
use of square brackets in allocating an array. The two uses are unrelated.
To reiterate,
each time we call new, a separate object is allocated on the heap.
| |||
| Accessing An Object |
Once you have declared and allocated an object, to access its fields you write the
name of the object, followed by a period, followed by the name of the field.
For example:
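The three steps described above (declaring, allocating, and accessing an object) can be sketched together as follows. The variable name emp2, the class name AccessDemo, and the field names are assumptions made for illustration.

```java
// Data-only class; the field names are assumptions.
class Employee {
    public String lastName;
    public float salary;
}

// Hypothetical program class that uses Employee objects.
class AccessDemo {
    public static void main(String[] args) {
        Employee emp1, emp2;        // declare two reference variables

        emp1 = new Employee();      // allocate the objects on the heap
        emp2 = new Employee();

        // Access fields with objectName.fieldName
        emp1.lastName = "Smith";
        emp1.salary = 30000.0f;
        emp2.lastName = "Jones";
        emp2.salary = emp1.salary + 5000.0f;

        System.out.println(emp2.lastName + " earns " + emp2.salary);
        // prints: Jones earns 35000.0
    }
}
```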
| ||
| A Simple Example |
The following lengthy but simple program puts these ideas together.
It declares two Employee objects, fills in their fields with data supplied by the
user, and compares their salaries. We have included the definition of the Employee
class within the same source file. As will be explained below, however, this is not
strictly necessary.
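A shortened sketch of such a program is shown here. It assumes only three fields per employee (with assumed names) and reads input with java.util.Scanner, which is a choice made for this sketch rather than something taken from the notes.

```java
import java.util.Scanner;

// Data-only class; the field names are assumptions.
class Employee {
    public String firstName;
    public String lastName;
    public float salary;
}

class EmployeeTest {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        // Declare and allocate the two objects.
        Employee emp1 = new Employee();
        Employee emp2 = new Employee();

        // Fill in the first employee's fields.
        System.out.print("First employee's first name: ");
        emp1.firstName = in.next();
        System.out.print("First employee's last name: ");
        emp1.lastName = in.next();
        System.out.print("First employee's salary: ");
        emp1.salary = in.nextFloat();

        // Fill in the second employee's fields.
        System.out.print("Second employee's first name: ");
        emp2.firstName = in.next();
        System.out.print("Second employee's last name: ");
        emp2.lastName = in.next();
        System.out.print("Second employee's salary: ");
        emp2.salary = in.nextFloat();

        // Compare the salaries.
        if (emp1.salary > emp2.salary) {
            System.out.println(emp1.firstName + " " + emp1.lastName + " earns more.");
        } else if (emp2.salary > emp1.salary) {
            System.out.println(emp2.firstName + " " + emp2.lastName + " earns more.");
        } else {
            System.out.println("Both employees earn the same salary.");
        }
    }
}
```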
| ||
Organizing Your Source Code Files | |||
We are now in a new situation. Up until now each program we wrote required a single
class definition (since we weren't really using classes the way they were intended).
Now, this last program involves two class definitions: one for Employee
which defines the class of objects being manipulated, and one for EmployeeTest
which is one of our program-only classes that is making use of the Employee
class as a data structure to be manipulated.
| |||
| Putting All The Class Definitions In One File |
How should we organize this code? Java gives us several options.
As above, you can put all of the
class definitions inside a single source code file. Compiling that file
produces two files, Employee.class and
EmployeeTest.class.
Whatever the source file is called, we run the program by naming EmployeeTest, since that is the .class file with the main method that we want to execute.

Note that the order of the class definitions in a single file does not matter. As with the methods within a program, the compiler will recognize the use of classes defined later in the same file since it makes multiple passes through the source file during the compilation.
| ||
| Putting Each Class Definition In Separate Files |
You can also put each of the class definitions in separate files, and compile them separately.
This will produce exactly the same two .class files.
The advantage of this technique is that if you want to use the Employee
class in a number of programs, you don't need to replicate its definition in each program. In addition, if each class definition goes in its own file, then each file can be named for the class whose source code is given in the file. This allows the Java compiler to perform an extra bit of magic:
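When the Java compiler encounters, for example, the class name Employee while compiling another class, it looks for a file named Employee.class and, if there is none, for a file named Employee.java, which it then compiles automatically.
In fact, even if the compiler finds a .class file, it checks whether the matching .java file is newer and, if so, recompiles it.
Thus, if the Employee source file has changed, recompiling the program that uses it is enough to bring both .class files up to date.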
| ||
|
In general it is considered best style to put each class definition into a separate
file. However, for the purposes of submitting the remaining homework assignments we will
ask you to include all the relevant classes in a single file.
| |||
Objects Are Passed By Reference | |||
|
As we said above, like arrays, object variables just hold references to the actual objects,
which lie elsewhere in memory on the heap.
Therefore, as with arrays, when an object variable is passed to a method,
any changes made to the object inside the method will affect the original object. (Strictly speaking, the method receives a copy of the reference, so making the parameter refer to a different object inside the method does not change the caller's variable.) This can come in handy. For instance, we can shorten the last program considerably by putting the code that gets an employee's information into a method:
Note that we have not included the definition of the
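The idea can be sketched as follows. The method name getEmployeeInfo, its Scanner parameter, and the field names are assumptions made for illustration, and the Employee class is repeated only so that the sketch compiles on its own.

```java
import java.util.Scanner;

// Data-only class; the field names are assumptions.
class Employee {
    public String firstName;
    public String lastName;
    public float salary;
}

class EmployeeTest {
    // Fills in the fields of the object that emp refers to. The parameter
    // holds a copy of the caller's reference, so both refer to the same
    // object and the caller sees every change made here.
    static void getEmployeeInfo(Employee emp, Scanner in) {
        System.out.print("First name: ");
        emp.firstName = in.next();
        System.out.print("Last name: ");
        emp.lastName = in.next();
        System.out.print("Salary: ");
        emp.salary = in.nextFloat();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        // The objects are still allocated here, before the calls.
        Employee emp1 = new Employee();
        Employee emp2 = new Employee();
        getEmployeeInfo(emp1, in);
        getEmployeeInfo(emp2, in);

        if (emp1.salary > emp2.salary) {
            System.out.println(emp1.lastName + " earns more.");
        } else if (emp2.salary > emp1.salary) {
            System.out.println(emp2.lastName + " earns more.");
        } else {
            System.out.println("Both employees earn the same salary.");
        }
    }
}
```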
| |||
Returning Objects From Methods | |||
|
As with arrays, there is no problem returning objects (or, more accurately, references to objects) from methods. The actual objects returned can be either ones that came into the method as parameters, or ones that were created (allocated) within the method.
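The next program is identical to the last one, except that the method that gets an employee's information now allocates the Employee object itself and returns a reference to it.
Is this a good idea? Perhaps. It does remove the need for the two allocation steps in the main method.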
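A sketch of a method that allocates an object and returns a reference to it is given here. As before, the method name getEmployeeInfo, the Scanner parameter, and the field names are assumptions made for illustration.

```java
import java.util.Scanner;

// Data-only class; the field names are assumptions.
class Employee {
    public String firstName;
    public String lastName;
    public float salary;
}

class EmployeeTest {
    // Allocates a new Employee, fills in its fields, and returns a
    // reference to it. The object outlives the method because it is
    // stored on the heap.
    static Employee getEmployeeInfo(Scanner in) {
        Employee emp = new Employee();
        System.out.print("First name: ");
        emp.firstName = in.next();
        System.out.print("Last name: ");
        emp.lastName = in.next();
        System.out.print("Salary: ");
        emp.salary = in.nextFloat();
        return emp;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        // No separate allocation step is needed here.
        Employee emp1 = getEmployeeInfo(in);
        Employee emp2 = getEmployeeInfo(in);

        if (emp1.salary > emp2.salary) {
            System.out.println(emp1.lastName + " earns more.");
        } else if (emp2.salary > emp1.salary) {
            System.out.println(emp2.lastName + " earns more.");
        } else {
            System.out.println("Both employees earn the same salary.");
        }
    }
}
```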
| |||
Arrays Of Objects | |||
| Declaring An Array Of Objects |
Now, because an employee record fits (conceptually, if not in reality) in a single variable,
it is possible to declare an
array of those records. For example, if we want to store an array of employee records
in the variable employeeDatabase, we would declare that variable as Employee[] employeeDatabase;
Just as int[] is the type of an array of integers, Employee[]
is the type of an array of employee records (or, more precisely, an array of references to
employee objects).
| ||
| Allocating An Array Of Objects |
Of course we must then set aside space for the actual array, by allocating it with a
call to new. So, if we want a database of one hundred records we would
allocate it with employeeDatabase = new Employee[100];
As usual, the declaration and allocation of the array can be merged into a single statement: Employee[] employeeDatabase = new Employee[100];
| ||
| Allocating The Objects In The Array |
It is important to understand, though, that we have not created space for one hundred employee
records. Rather we have allocated space for one hundred nameless Employee
variables each of which holds a reference to such a record. But at this point none of those references
is pointing to an actual object. Just
as when an individual object variable was used we needed to allocate space for the actual
object, in this case we still need to allocate the one hundred objects themselves.
There is no way to do this in a single command. It is most easily accomplished with
a loop such as:
Once this is done, each element of the array can be treated as an employee record. So,
for example, if we wish to specify the salary of the employee in the 37th cell of the
array, we would write:
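The steps in this section can be sketched together as follows. The class name DatabaseDemo, the field names, and the salary value are assumptions made for illustration.

```java
// Data-only class; the field names are assumptions.
class Employee {
    public String lastName;
    public float salary;
}

// Hypothetical program class.
class DatabaseDemo {
    public static void main(String[] args) {
        // Declare and allocate the array: one hundred references, all null.
        Employee[] employeeDatabase = new Employee[100];

        // Allocate an actual Employee object for each cell of the array.
        for (int i = 0; i < employeeDatabase.length; i++) {
            employeeDatabase[i] = new Employee();
        }

        // The 37th cell has index 36, since indices start at 0.
        employeeDatabase[36].salary = 30000.0f;
        System.out.println(employeeDatabase[36].salary);   // prints: 30000.0
    }
}
```

Without the loop, the assignment to employeeDatabase[36].salary would fail at run time with a NullPointerException, because that cell would not yet refer to any object.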
| ||
Arrays Within Objects | |||
The fields defined in a class definition can be of any valid Java type.
Thus a field need not be a simple value like a string or an integer. It
could be an array, for instance. In the case of the
employee records we could use this to store an employee's salary history for the last ten years,
for example. If we added the field declaration:
to the Employee class definition, then each employee record would contain
such an array. To set the salary from two years ago for the Employee in the
variable emp1 from the earlier example, we could write:
Carrying this to the next level, if we wish to set the salary from two years ago for the employee in the 37th cell of the employee database array, we would write:
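Both assignments can be sketched as follows. The form of the salaryHistory declaration (in particular, allocating the ten-element array in the declaration itself) and the class name HistoryDemo are assumptions; the index 1 for "two years ago" and the value 34567.89 follow the example used in these notes.

```java
class Employee {
    public String lastName;    // assumed field name
    public float salary;       // assumed field name
    // Assumed form of the declaration: every Employee object gets its own
    // ten-element array when it is created.
    public float[] salaryHistory = new float[10];
}

// Hypothetical program class.
class HistoryDemo {
    public static void main(String[] args) {
        // An array inside a single object.
        Employee emp1 = new Employee();
        emp1.salaryHistory[1] = 34567.89f;

        // An array inside an object that is itself stored in an array.
        Employee[] employeeDatabase = new Employee[100];
        for (int i = 0; i < employeeDatabase.length; i++) {
            employeeDatabase[i] = new Employee();
        }
        employeeDatabase[36].salaryHistory[1] = 34567.89f;

        System.out.println(employeeDatabase[36].salaryHistory[1]);
    }
}
```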
| |||
| Understanding The Types Of Complex Structures |
To see that this makes sense from the perspective of types, it is good practice to examine the
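types of progressively more specified parts of employeeDatabase[36].salaryHistory[1]:
employeeDatabase is an array of Employees. That means that
employeeDatabase[36] is an Employee. But then, given the modified definition of the Employee
class, this object has a field named salaryHistory that is an array of floats, so
employeeDatabase[36].salaryHistory is an array of floats. That means that
employeeDatabase[36].salaryHistory[1] is a float, so assigning the value 34567.89 to it is a reasonable action (in Java source the literal must be written 34567.89f, since an unsuffixed 34567.89 is a double).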
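One way to check this reasoning is to store each sub-expression in a variable of the type claimed for it; the program compiles only if every claim is right. This is a sketch in which the class name TypesDemo and the local variable names are hypothetical, and the form of the salaryHistory declaration is assumed.

```java
class Employee {
    // Assumed form of the declaration.
    public float[] salaryHistory = new float[10];
}

// Hypothetical program class.
class TypesDemo {
    public static void main(String[] args) {
        Employee[] employeeDatabase = new Employee[100];
        for (int i = 0; i < employeeDatabase.length; i++) {
            employeeDatabase[i] = new Employee();
        }
        employeeDatabase[36].salaryHistory[1] = 34567.89f;

        // Each declared type must match the type of the expression.
        Employee[] wholeArray = employeeDatabase;
        Employee oneRecord = employeeDatabase[36];
        float[] history = employeeDatabase[36].salaryHistory;
        float oneSalary = employeeDatabase[36].salaryHistory[1];

        // All four variables lead to the same stored value.
        System.out.println(oneSalary == wholeArray[36].salaryHistory[1]);  // prints: true
        System.out.println(oneRecord.salaryHistory == history);            // prints: true
    }
}
```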
| ||
Objects Within Objects | |||
|
Another important outgrowth of the fact that fields can be of any type is that
you can just as easily have objects whose fields are themselves objects. For example,
we might want to store information about when an employee was born and when he was hired.
Now a date consists of a month name, a day in the month, and a year. While we could
specify those three fields separately, it is natural to think of them as a unit.
So, it makes sense to define a new class for holding dates:
With this we can add declarations for the hireDate and birthDate fields to the Employee class. To bring you up to date, that gives us the following as the current definition of the Employee class:
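A sketch of the two class definitions follows. The names Date, month, hireDate, birthDate, and salaryHistory appear in these notes; the remaining field names, the field types, and the use of new in the declarations are assumptions made for illustration.

```java
// A date is a month name, a day in the month, and a year.
// The names day and year are assumptions.
class Date {
    public String month;
    public int day;
    public int year;
}

class Employee {
    // Assumed field names for the attributes listed at the start of the notes.
    public String firstName;
    public String lastName;
    public String department;
    public float salary;
    public int idNumber;
    public int numDependents;

    // Assumed form: the array and both Date objects are allocated each
    // time a new Employee is created.
    public float[] salaryHistory = new float[10];
    public Date hireDate = new Date();
    public Date birthDate = new Date();
}
```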
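Because we included the call to new in those declarations, each Employee object is created with its own Date objects already allocated.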
To access the individual fields of a date stored in an employee record, we use the period twice, as in emp1.birthDate.month.
We can also refer to the entire date object stored in one of these fields, for
instance by assigning emp1.birthDate to employeeDatabase[32].hireDate. Of course, since the hireDate and birthDate fields only hold references
to objects, not the actual objects themselves, this assignment will make those two fields of the given
objects refer to the same single Date object. If we change emp1.birthDate.month to February, then that will
become the value of employeeDatabase[32].hireDate.month as well. If we wish the two
date objects to be independent, then we must, instead, copy over the fields of emp1.birthDate individually, as in:
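The difference between sharing a Date object and copying its fields can be sketched as follows. The class name DateDemo, the field names day and year, the sample dates, and the use of cell 33 for the copy are assumptions made for illustration.

```java
class Date {
    public String month;
    public int day;     // assumed name
    public int year;    // assumed name
}

class Employee {
    // Assumed form: each Employee gets its own Date objects when created.
    public Date hireDate = new Date();
    public Date birthDate = new Date();
}

// Hypothetical program class.
class DateDemo {
    public static void main(String[] args) {
        Employee emp1 = new Employee();
        Employee[] employeeDatabase = new Employee[100];
        for (int i = 0; i < employeeDatabase.length; i++) {
            employeeDatabase[i] = new Employee();
        }

        emp1.birthDate.month = "January";
        emp1.birthDate.day = 15;
        emp1.birthDate.year = 1970;

        // Sharing: both fields now refer to the same Date object.
        employeeDatabase[32].hireDate = emp1.birthDate;

        // Copying: cell 33 keeps its own Date object and receives the values.
        employeeDatabase[33].hireDate.month = emp1.birthDate.month;
        employeeDatabase[33].hireDate.day = emp1.birthDate.day;
        employeeDatabase[33].hireDate.year = emp1.birthDate.year;

        // Changing emp1's birth date affects the shared object only.
        emp1.birthDate.month = "February";
        System.out.println(employeeDatabase[32].hireDate.month);  // prints: February
        System.out.println(employeeDatabase[33].hireDate.month);  // prints: January
    }
}
```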
| |||
Last modified August 28 for Fall 99 cs5 by fleck@cs.hmc.edu
This page copyright ©1998 by Joshua S. Hodas. It was built with Frontier on a Macintosh. Last rebuilt on Wed, Nov 11, 1998 at 12:49:37 AM.
http://www.cs.hmc.edu/~hodas/courses/cs5/week_12/lecture/lecture.html