Week 01

Introduction to CS 5, Computers, and Java Programming
Version 1.0


 


Welcome to CS 5

 
A Grand Experiment In 1996 we made two big changes to our introductory programming course, CS 5:
  • We began teaching Java rather than C++.
  • We eliminated lectures in favor of recitations and lab sessions.
We will discuss the decision to change languages below. The change to a recitation-and-lab-based course was made because of our sense that, for students with little or no background in programming, the existing course was often a real struggle. In the end, many students would simply drop out. Others would squeak by, but without the level of comfort and familiarity with the computer and programming that we felt was adequate to the demands they would face in the rest of their careers at HMC and beyond. We believe that much of the problem was in the format of the course. Some students with a natural gift for programming can learn about it in a lecture format. For the rest, the lectures went by in a blur. The goal of the new recitation/lab format is to shift toward a more natural learn-by-doing environment.

The main course page gives the meeting times for lectures and lab sections. The material that would ordinarily be covered in lectures is instead being posted in this series of web pages. The notes will contain many links to other parts of the notes and other documents on the web, as well as examples that can be downloaded to your account for experimentation. Therefore, they are best read on the computer. However, if you prefer, they can be printed from within the browser and read offline.

About These Notes There are still few good books about Java for beginning programmers. Almost all of them assume a prior programming background. In fact, even after all this time there are surprisingly few good introductory programming texts in C or C++. Most of the good intro texts are in Pascal.

These notes will be your complete source for learning both programming and Java, so we have tried to make them fairly comprehensive. We also want to keep the presentation of a given topic within a single week's notes. That means that sometimes a topic may be presented in a bit more depth than is strictly necessary at that point. Don't let it overwhelm you. Read through it and get what you can. Come to recitation and ask questions about the parts that caused you trouble. Later you can go back and reread it to absorb more.

  Sometimes we will present stuff that we just know is going to be hard to understand the first time around. So that you don't get too frustrated, we'll mark those passages with the little guy over to the left there.

  Important passages, that point out details you really need to take to heart and keep in mind, are marked with the exclamation point.

  Much more rarely we will mark a passage with the gavel. That's an indication that the passage lays out an administrative rule for the course. You won't see it too often.

If you find errors in these notes, please report them to the author (Josh Hodas) as well as to this year's instructors (Margaret Fleck and Mike Erlinger).

Electronic Resources

Most of your work for this course will be done on Orion, Computer and Information Service's Sun SparcServer. We expect you to learn how to use Pine to read your email, GNU Emacs to edit your files, the Java compiler and interpreter to run your programs, and Unix to manage your files. You have already received quick guides to Unix, Pine, and Emacs in your orientation packets. A brief guide to Unix is linked from the course home page. There is additional documentation on Pine and Unix in the CIS online documentation library at

http://www.hmc.edu/comp/doc/
We have also included several guides to Unix and Emacs among the recommended texts for the course. All of these are excellent books that will serve you well throughout your four years at HMC.

In order to access Orion, you will use either telnet or x-windows from a personal computer in the AC labs or in your dorm room. Telnet is easier to use, but x-windows offers more power for some uses. You may want to start with telnet and then switch to x-windows as you become more comfortable with things.

Finally, you will use Netscape, or a similar web browser, to read the lecture pages and to submit your assignments. Netscape is available on the AC lab machines and, like telnet, can be copied to your own computer if you have an Ethernet connection. If you are using x-windows to connect to Orion, you also have the option of running Netscape on Orion via x-windows. The advantage of that way of using Netscape is that it simplifies the process of copying sample programs from the lecture notes to your own directory for experimentation.

Getting Set Up In order to properly configure your Orion account for use in the course, you must run the CS 5 setup script. This is done by logging in to Orion (using telnet or x-windows) and typing:
/home/cs/cs5/bin/cs5setup 
at the Unix prompt. This script makes two changes to your account. First, it sets up your Unix environment so that the Java compiler will know where to locate certain files that you will need to write programs for the course. Second, it creates a directory named cs5 as a subdirectory of your home directory. This directory is configured so that the files in it are accessible only to you (and the homework submission software). It is important that you create all your course-related files inside that subdirectory. This will ensure that the submission software can retrieve your submissions, and also guarantee that no one else, in a moment of weakness, looks at your solutions.


Computers and Programming Languages

 
The First Computer What was the first computer? Most people will answer Eniac, built at the University of Pennsylvania in 1946 by J. Presper Eckert and John W. Mauchly. A few might think of work at Harvard or Iowa State University. Some with a slightly broader world view might mention work going on at the same time in England and Germany. However, it is reasonable to argue that the first computers were really the Difference Engine and the Analytical Engine designed by Charles Babbage in the 1830's.

Babbage's Analytical Engine was a purely mechanical device, based on many of the same ideas used in the automated fabric mills of the day. He had already built a mechanical calculator capable of computing logarithms to six digits of accuracy. The Analytical Engine was a much grander device capable of a variety of primitive operations. A program specifying a complex calculation using these primitives could be encoded on a set of punch cards that would then be used to run the computation on the engine. This is what separated it from the other mechanical calculators of the time. The calculation to be performed was not designed into the machine, nor was it determined by human intervention after each step. Instead the calculation was embodied in the program punched on the cards, which was executed without intervention. The result of a computation was even supposed to be printed directly onto another set of cards, rather than having to be read off a set of dials.

The Analytical Engine was never built. However, Augusta Ada King (Countess of Lovelace, daughter of the poet Byron), now considered the first programmer, did write several programs for it. More recently, mathematicians and computer scientists have proven that the ideas behind the design are sound. (William Gibson and Bruce Sterling have also written a great book, The Difference Engine, which imagines what would have happened to society if Babbage's machines had been built, and the information society introduced, in the early 1800's rather than in the 1950's.)

The First Electronic Computer Eniac was the first general purpose electronic digital computer. It consisted of several dozen large calculating units built from thousands of vacuum tubes. The calculating units were wired together so that the values calculated on one unit could be passed as input to the next unit. The calculations performed by each unit were determined by dial settings. What distinguished Eniac from its contemporaries were its speed (because it was built from vacuum tubes rather than electro-mechanical relays it could execute 5000 additions or 300 multiplications per second) and a feature added after it had already been built: It was possible for the computation to test the results of one operation and perform different subsequent operations depending on the value. This ability to choose among a set of options based on the state of a computation is a crucial aspect of programming, as we will discuss in a few lectures.

(Note: There is some contention that most of the important innovations were actually present in the less well-known ABC (Atanasoff-Berry Computer) constructed at Iowa State University between 1937 and 1942.)

Stored Program Computers On Eniac, the data being manipulated were stored in the computer's memory, but the program was determined by the physical configuration of the machine: that is, by the setting of dials on the front of the machine, and how different parts were plugged together. The key development of the late 1940's was the realization, generally credited to John Von Neumann, that you could just as well have the program steps fed to the computer as a separate data stream. At each step the computer would read a bit of program data in order to determine what to do next. This is called the stored program computer, and it is the first computer whose design really matched what we now think of as computers. The first computer to use a program store was the Small-Scale Experimental Machine (or Baby Mark I), which was switched on at the University of Manchester in June of 1948. An early commercial realization of this idea, the Univac, was also built by Eckert and Mauchly.

Bits and Bytes An electronic computer is just a big set of circuits and switches. The switches determine the level of current running through the circuits. A given circuit is either on or off depending on the voltage across it.

Being on or off is the smallest distinction you can make; it is the smallest amount of information there can be. Someone (if you know who, please tell me; I've lost the reference) once said:

"Information is any difference that makes a difference."
and on or off is the smallest difference that makes a difference. On or off, one or zero. It's called a bit, both because it is just that, a little bit, and because "bit" is a contraction of binary digit.

We do arithmetic in decimal, or base ten. The digits are 0 through 9. In binary, or base two, the digits are just 0 and 1. A bit represents one binary digit. But "represents" is the key word here. To the computer it's just a circuit with electricity flowing or not. We are the ones who think of it as representing some value.

Obviously a bit is not a lot of information. Therefore we usually look not at the state of a single circuit, but at a group of circuits taken together. Nowadays, the most common way is in groups of eight. A group of eight circuits represents an eight-digit binary number, which is referred to as a byte. How much information is that? We have to look at how binary works in relation to decimal to understand it.

  How many possible values can a three-digit decimal number have? One thousand, in the range 000 to 999. The rightmost digit indicates a number of ones, the second digit represents a number of tens, and the leftmost digit represents a number of hundreds. These are the values because they are the successive powers of ten starting from zero. Similarly, in a binary number the successive digits represent the powers of two. The digits of a three-digit binary number give the number of ones, the number of twos, and the number of fours. Since there can only be zero or one of each, there are a total of eight values, in the range 000 to 111. If you extend this logic you find that an eight-digit binary number can have any of 256 values; its leftmost digit represents the number of 128s.
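The place-value arithmetic above can be checked with a few lines of Java (the language this course teaches); the class name here is just for illustration:

```java
public class PlaceValues {
    public static void main(String[] args) {
        // The largest three-digit binary number, 111, is one 4, one 2, and one 1.
        int largestThreeDigit = 1 * 4 + 1 * 2 + 1 * 1;
        System.out.println(largestThreeDigit);   // prints 7

        // An eight-digit binary number can take on 2 to the 8th distinct values.
        int eightBitValues = 1 << 8;             // shifting 1 left by 8 places gives 2^8
        System.out.println(eightBitValues);      // prints 256
    }
}
```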

It quickly becomes cumbersome to write down and manipulate binary numbers. They just look too much alike, and that makes them error prone. Yet we need a way to describe the state of the computer. Writing down values in decimal is not very handy because the conversion between the two systems is not natural, since ten is not a power of two. A solution that gained popularity in the early days of computing was to use octal, or base eight. Since three binary digits represent a value from 0 to 7, and so does one octal digit, each digit of an octal number is shorthand for a group of three binary digits. (Harvard mathematician turned comedian Tom Lehrer even sang about octal in his song "New Math.")

Octal became popular at a time before everyone had standardized on eight bits as the most logical grouping. To represent a byte in octal you need to use a three-digit octal number in which the leftmost digit is only allowed to be in the range 0 to 3 (since that's all you can represent with the leftover two bits). This is a little awkward. The more common shorthand used now is hexadecimal, or base sixteen, in which each digit represents a group of four binary digits (the 1's, 2's, 4's, and 8's), so that you only need two hex digits to represent a byte. But wait, octal is easy: you just limit yourself to the digits 0 to 7. Where do we get the sixteen digits needed for hex? Nowadays we generally use the digits 0 through 9 and the letters A through F.

Putting it all together, consider the following example: the decimal value 46 is 00101110 in binary. If we group the digits by threes, 00 101 110, we get the octal value 056. If we group them by fours, 0010 1110, we get the hex value 2E.
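Java's built-in conversion helpers let you repeat the example above directly (note that they drop leading zeros, and print hex letters in lowercase):

```java
public class BaseDemo {
    public static void main(String[] args) {
        int n = 46;
        System.out.println(Integer.toBinaryString(n)); // prints 101110
        System.out.println(Integer.toOctalString(n));  // prints 56
        System.out.println(Integer.toHexString(n));    // prints 2e
        // And back again: parse a string of digits in any base.
        System.out.println(Integer.parseInt("2E", 16)); // prints 46
    }
}
```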

Now don't panic, you don't need to be too proficient at all this. In fact, you probably won't actually use binary, octal, or hexadecimal in the rest of this course. But it's important that you understand that they are out there and why they are important.

Machine Language Besides representing a bit of information, the current running through a circuit can be used to control a switch and thereby change the current running through another circuit. With enough circuits and switches wired together in the right way, you can imagine that it is possible to take the values on two groups of eight circuits and make it so that the values on a third group of eight circuits represent the sum of those two values. To the computer it is just a lot of switches turning on and off; to us it's addition.

Now imagine another set of switches and circuits that does what we think of as multiplication. Then, take it a step further by adding a control switch that determines if three banks of circuits (two input banks and one output bank) are connected to the addition circuits or the multiplication circuits. If we had two control wires, we could select between four sets of computing circuits performing four different mathematical operations. With eight control wires we could choose between 256 sets of computing circuits. Now, think of the bank of control wires as just being another value. You have one number controlling how two other numbers are treated. One number is a program instruction and the other two are data values being manipulated. But they're all just numbers in the computer. In fact they are all just voltages across circuits.
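The idea that one number selects which circuit operates on the other two can be sketched in Java. None of the opcode values here come from a real machine; they are made up for illustration:

```java
public class TinyMachine {
    // Hypothetical opcodes; a real processor assigns its own numbers.
    static final int ADD = 0;
    static final int MUL = 1;

    // One number (the opcode) controls how the other two (the data) are treated.
    static int execute(int opcode, int a, int b) {
        switch (opcode) {
            case ADD: return a + b;   // "wired to" the addition circuits
            case MUL: return a * b;   // "wired to" the multiplication circuits
            default:  throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(ADD, 3, 4)); // prints 7
        System.out.println(execute(MUL, 3, 4)); // prints 12
    }
}
```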

At the lowest level, that is how the computer sees all programs. It is the only kind of program the processor at the heart of the computer really knows how to run: a stream of numbers, some data and some instructions determining how to treat the data. This is machine language.

Of course, since this is all just a game of interpretation, there is no reason that the data circuits must be representing numbers. They could just as easily represent letters of the alphabet, and the computing circuits could be doing character manipulations. To the computer it is all the same. We just have to build the circuits so that they do the right thing given the representation.

In addition to the circuits in the processor that do the actual calculations, the computer has a huge number of circuits that just hold values until they are needed. These circuits are the memory. Some of the control instructions and circuitry are used to bring values in and out of memory, to and from the processor's circuits. Each position in memory is identified by a number (or memory address). Putting this number on the memory control lines makes the corresponding location the "current" one under consideration.

Assembly Language Just as it is hard for a person to manipulate a lot of binary numbers, it is hard to write programs of any size in machine language. The solution was to first write a program out in longhand, writing, for example, ADD instead of 2E, which might be the code for addition. These English-like instructions are known as mnemonics. The programmer could do the program design in this notation (called assembly language) and then use a table to convert it to machine language. Eventually, someone realized that converting the string of characters ADD to the number 2E is just another sort of calculation that the computer could do. So a program was written to take a string of numbers representing the characters of an assembly language program and produce as output the string of numbers of the corresponding machine language program. This program was called an assembler.
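The table-lookup step at the heart of an assembler can be sketched in a few lines of Java. The mnemonics and opcode numbers below are invented for illustration, not taken from any real instruction set:

```java
import java.util.Map;

public class MiniAssembler {
    // A made-up mnemonic-to-opcode table; real instruction sets differ.
    static final Map<String, Integer> OPCODES =
        Map.of("LOAD", 0x1A, "ADD", 0x2E, "STORE", 0x3C);

    public static void main(String[] args) {
        String[] program = { "LOAD", "ADD", "STORE" };
        for (String mnemonic : program) {
            // Translating text into numbers is itself just a computation.
            System.out.printf("%s -> %02X%n", mnemonic, OPCODES.get(mnemonic));
        }
    }
}
```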

Von Neumann's loop is now closed. The program is not only stored in the computer in the same way as data, but it is manipulated as data by another program!

High-Level Languages Assembly language is still very low-level. Its commands correspond one-for-one with the computations that the processor is designed to carry out in a single step. Commands must be given explicitly to move values in and out of memory, and only a small computation can be performed at each step. Consider a (typical) machine in which the two inputs to a computation must be in special memory positions called registers inside the processor (a typical modern processor might have about sixteen general purpose registers) and the output goes back into a register specified as the third argument. Then if we want to add the values in memory cells 1 through 3 and put the result in cell 4, we'd have to write something like:

LOAD  M1, R1     ; copy data from address 1 to register 1
LOAD  M2, R2     ; copy data from address 2 to register 2
ADD   R1, R2, R1 ; add values in R1 and R2, put result in R1
LOAD  M3, R2     ; get value from address 3 into R2
ADD   R1, R2, R1 ; add values in R1 and R2, put result in R1
STORE R1, M4     ; copy result from R1 to address 4

(The text on each line, following the semicolon, is a "comment" for a human reader.)

Writing a spreadsheet, a word processor, or a missile guidance system in such a language is a daunting task. Within a few years of Eniac's being switched on, computer scientists had already begun developing high-level programming languages which let you write programs in a much more natural notation. They let you use names to stand in for memory addresses, and you could write complex mathematical formulas. It was the job of the compiler to convert the higher-level program to assembly language, which would in turn be handled by the assembler.

The original program in the higher-level language is called the source program. The program produced by the compiler is the object program; the code in each is referred to, respectively, as source code and object code.

The first popular high-level language was Fortran, whose name is short for Formula Translation. In Fortran you could write that addition just as you'd like:

sum = x1 + x2 + x3 
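For comparison, here is the same computation in Java, the language this course uses. The variable names and sample values are just placeholders; the point is that one line of source code stands in for the whole sequence of LOADs and ADDs above:

```java
public class Sum {
    public static void main(String[] args) {
        int x1 = 10, x2 = 20, x3 = 30;   // sample values standing in for memory cells 1-3
        int sum = x1 + x2 + x3;          // the compiler generates the register shuffling
        System.out.println(sum);         // prints 60
    }
}
```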

Since the 1950's there has been an endless stream of high-level languages: Fortran, Lisp, Algol, RPG, PL/1, APL, C, Pascal, Smalltalk, Prolog, Modula, ML, C++, Dylan, Java... Each has added new features and its own way to think about programming. Each has its good points and its bad points, its adherents and its detractors. Language preference is a religious issue among programmers.


A Bit About Java

  Of the two changes made in CS 5 in 1996, you may be suspicious about the change in language in particular. You've almost certainly heard about Java. It has something to do with the world-wide-web, and you can't get hotter than that right now; even the popular press talks about Java. So, you may think we are just latching on to a new fad in computing. But rest assured that both changes in the course were made for only one reason: to improve the quality of the course and the skills of the students who complete it.

Java is really a variation of C++. Syntactically (that is, in what a program looks like on paper), the languages are nearly identical. However, C++ is a very free-wheeling language. Allen Holub wrote a book for advanced C and C++ programmers called "Enough Rope To Shoot Yourself In The Foot". The book opens with the following comment:

The title of this book describes what I consider to be the main problem with both C++ and C: the languages give you so much flexibility that, unless you're willing and able to discipline yourself, you can end up with a large body of unmaintainable gobbledygook masquerading as a computer program. You can do virtually anything with these languages, even when you don't want to.

What he means is that in C and C++ a lot of expressions that you type by accident, when you really mean to write something else, are nevertheless likely to be accepted by the compiler, because there might be some context where you really meant what you wrote. Java simply puts some reasonable limits on what you can write. Expressions that might make sense one time in one hundred, but are really mistakes the other ninety-nine times, are no longer acceptable. This makes it a much better choice for the beginning programmer.

Applications and Applets All that said, it is true that there is a lot of hype about Java, because it was designed for developing applications to be run on the world-wide-web. Java is a little different from most programming languages in that it can produce two different kinds of programs: applications and applets. An application is just the ordinary sort of program that you are used to. It's the kind of program you store on your computer's hard drive and run by either typing its name at the command line or by clicking on an icon with the mouse. An applet is a program that does not generally reside on your computer and is not launched explicitly. Instead, it is embedded as part of a page on the world wide web. When you load a page containing an applet, the program is downloaded into your browser along with the rest of the page, and it controls what you see on part of the page.

To see an example of an applet in action, take a look at the cool three-dimensional molecule renderer at:

http://www.mbi.ucla.edu/people/legrand/pdb.html

Unfortunately, as with any skill, in programming you must walk before you can run. And while Java makes it easier to write fancy graphical programs than most languages do, such programs still require a fair amount of background to develop. Therefore, in this course we will be talking about applications, as they are easier to understand and build.

Note that the line between applications and applets can be blurred. We have written a Java applet that can run Java applications. This will enable you to test the sample application programs in the notes without having to copy them over to Orion and compile them. Instead you will be able to try them out from within the notes.


The Art of Programming

  Programming is a skill. It can be taught. But it is also an art. There is a difference between a bad program and a good program, even if they both work. Like any skill or art, it takes practice. Most of the programming we will do in this course is the equivalent to playing scales. By the end of the course you will just be starting to come into your own as a programmer. You will have the tools to write simple programs as needed in your technical courses. But to really be a programmer you will have to do a lot more programming over the next four years. Our goal in this course is to make it interesting and enjoyable enough that you want to.

Computers Are Dumb One thing that you may find frustrating in the beginning is that computers are just unerringly, mind-numbingly stupid. They may seem smart, but it's mostly an illusion. In the beginning this is going to get in your way a lot.

The problem is that you've spent years learning how to abstract, and now you need to bring yourself down to the computer's level. Why? Because if you write a program at the level that you think is natural, you'll be leaving out millions of steps that the computer cannot infer. It will only do exactly what you tell it.

When I took my first programming course in high school, my teacher, Mrs. Staub, sent us home the first night with an assignment to write out a detailed set of instructions for making a peanut butter and jelly sandwich. When we returned the next day she collected the recipes and selected a few at random.

She took out a tray with jars of peanut butter and jelly, a loaf of bread, a plate, and a couple of knives. She picked up a recipe and read the first step aloud: "Take two pieces of bread out of the bag and put them on the plate." She plunged her hand through the unopened plastic wrapper and pulled a couple of slices of bread through the hole she'd torn. "Pick up a knife. Lower the knife into the peanut butter." She did as she was told, but the knife kept bouncing off the lid of the unopened peanut butter jar.

You get the idea. It was a simple but effective demonstration.

Programs Are Like Recipes

I've heard many variations of this story from other programmers over the years. Instructions for brushing your teeth is a popular choice. I think, though, that Mrs. Staub's choice of the peanut butter and jelly sandwich recipe was particularly apt. The reason is that, as an intellectual construct, a program is more like a recipe than anything else. Not only because it is a series of step-by-step instructions on how to do something, but because of the natural way in which it is constructed.

A cookbook author doesn't write out a recipe in full detail immediately. The process begins with imagining the result. Then the author sketches out the general outline. What are the major components of the dish and how are they prepared? For each major component the process is repeated as though that were the entire dish. Round-by-round the level of detail is increased until a true recipe emerges. The process begins away from the kitchen. It moves into the kitchen only when the author already has a pretty concrete view of the recipe developed. Once the overall structure of the dish is understood, one part of the dish (the sauce, for instance) may be focused on and refined before the others are even attempted.

Programming is much the same. Given a problem, the key is to break it down into a set of smaller problems. The process is repeated until the problems are so small as to have obvious solutions. Most of the programming does not happen at the computer. It happens at your desk, in the car, in the supermarket. You only begin typing when the problem is well understood. Finally, if the problem has been broken up properly, you can develop some pieces to completion while other modules are left unattempted.
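Even the sandwich story can be written in this top-down style. The sketch below is purely illustrative (the method names are invented, not part of any course library), but it shows the shape of the decomposition: the whole task first, then smaller pieces, each refined separately:

```java
import java.util.ArrayList;
import java.util.List;

public class Sandwich {
    // The whole problem, stated in terms of smaller sub-problems.
    static List<String> makeSandwich() {
        List<String> steps = new ArrayList<>();
        steps.addAll(getBread());
        steps.addAll(spread("peanut butter"));
        steps.addAll(spread("jelly"));
        steps.add("put the slices together");
        return steps;
    }

    // Each sub-problem is refined on its own, down to "obvious" steps.
    static List<String> getBread() {
        return List.of("open the bag", "take out two slices", "lay them on the plate");
    }

    static List<String> spread(String topping) {
        return List.of("open the " + topping + " jar",
                       "spread " + topping + " on one slice");
    }

    public static void main(String[] args) {
        makeSandwich().forEach(System.out::println);
    }
}
```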


Conclusion

  Well, that's it for background. Of course we haven't seen a single line of Java yet, but there is plenty of that to come. Perhaps this week's notes were too philosophical for some of you. But hey: this is HMC, you're supposed to take one-third humanities!

Last modified August 29 for Fall 99 cs5 by mike@cs.hmc.edu


This page copyright ©1998 by Joshua S. Hodas. It was built with Frontier on a Macintosh. Last rebuilt on Sat, Sep 5, 1998 at 1:37:22 PM.