
How to use MPI on the graphics machines

MPICH-1.2.0 has been installed in /usr/local/mpi, which is common to all the graphics workstations in Beckman 105 as well as turing. It has been configured to use the graphics machines as hosts for the distributed processes using the ch_p4 communication device with ssh.

The ssh requirement is causing some confusion. For mpirun to communicate with the various hosts without prompting you for a password on each connection, you _must_ configure your personal ssh2 settings to use port 3003, and either use a NULL passphrase for public key authentication or use a non-NULL passphrase and run an ssh-agent in the shell from which you will launch mpirun.

WARNING: Changing the default ssh2 settings to use port 3003 will cause ssh to give the following error on most machines:

FATAL: Connecting to machinename failed: Connection Refused

To work around this, run ssh machinename -p 22 to connect to port 22 explicitly.

First, your home directory needs to be world executable. If you need help checking this or changing the settings feel free to ask a CS consultant or staff member.
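As a rough sketch of the check described above, the function below looks at the permission string printed by ls -ld and adds the execute bit for "other" only if it is missing. The function name ensure_world_executable is made up for this example and is not part of any installed tool.

```shell
# Report whether a directory grants execute (search) permission to "other"
# and add that bit if it is missing. The function name is hypothetical.
ensure_world_executable() {
    dir=$1
    # ls -ld prints a mode string such as drwx--x--x; the 10th character
    # is the execute bit for "other" (x, or t when the sticky bit is set).
    mode=$(ls -ld "$dir" | cut -c1-10)
    case $mode in
        *x|*t) echo "$dir is already world executable ($mode)" ;;
        *)     chmod o+x "$dir" && echo "added o+x to $dir (was $mode)" ;;
    esac
}
```

To apply it to your home directory, call ensure_world_executable "$HOME".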

Now, you must set up a public / private key pair. To do this:

  • EITHER use Prof. Keller's script to set this up for you:

    source /cs/cs156/turing-mpi-setup

    It is recommended that you simply press enter for the passphrase. This sets it to NULL, so you will not need to run an ssh-agent in order to authenticate without typing your passphrase. Be careful: this script deletes your current .ssh2 directory. If that directory holds anything you wish to keep, such as keys or customized settings, do the setup yourself instead.

  • OR do it yourself:

    • Create a private/public key pair for use with ssh2. Run ssh-keygen with no arguments and enter a passphrase when prompted. It is recommended that you simply press enter for the passphrase. This sets it to NULL, so you will not need to run an ssh-agent in order to authenticate without typing your passphrase. ssh-keygen makes two files, ~/.ssh2/id_dsa_1024_a (your private key) and ~/.ssh2/id_dsa_1024_a.pub (your public key).

    • cp ~/.ssh2/id_dsa_1024_a.pub ~/.ssh2/authorized_keys

    • Make a file ~/.ssh2/authorization (which public keys are authorizable) containing the line:

      Key id_dsa_1024_a.pub

    • Make a file ~/.ssh2/identification (which private keys) containing the line:

      IdKey id_dsa_1024_a

    • Make a file ~/.ssh2/ssh2_config containing the line:

      Port 3003

      This tells ssh to use port 3003 by default. Public/private key authentication is only enabled on the sshd2 daemons servicing this port (and these daemons only allow access from hosts within the cs subnet).

    • If you did _not_ enter a NULL passphrase, then you will need to start an ssh-agent in the shell you will use to launch mpirun:

      ssh-agent $SHELL

      ssh-add

Note that if you are using an ssh-agent, it must be started in every shell before you launch mpirun from that shell. The other steps involved in configuring ssh2 need to be done only once.
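The file-creation steps of the do-it-yourself route can be collected into one small shell function. This is only a sketch: the function name write_ssh2_config is invented here, it assumes ssh-keygen has already produced the default id_dsa_1024_a key pair, and it overwrites any existing authorization, identification, ssh2_config and authorized_keys files in the target directory.

```shell
# Write the ssh2 configuration files described above into a directory.
# Hypothetical helper; assumes the default key name id_dsa_1024_a.
write_ssh2_config() {
    dir=$1
    key=id_dsa_1024_a
    mkdir -p "$dir" || return 1
    printf 'Key %s.pub\n' "$key" > "$dir/authorization"
    printf 'IdKey %s\n' "$key"   > "$dir/identification"
    printf 'Port 3003\n'         > "$dir/ssh2_config"
    # Copy the public key into place only if it has been generated.
    if [ -f "$dir/$key.pub" ]; then
        cp "$dir/$key.pub" "$dir/authorized_keys"
    else
        echo "warning: $dir/$key.pub not found; run ssh-keygen first" >&2
    fi
}
```

After running ssh-keygen, call write_ssh2_config "$HOME/.ssh2".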

At this point, test that you can ssh to the graphics machines without entering a password (try ssh StarScream). If you cannot, then something's not quite right & it'd be better to fix it before proceeding.

Now, to actually run this thing. First, you should know there is a file /usr/local/mpi/share/machines.solaris which lists the hosts in the order they will be used. This is the default file & lists 19 of the graphics machines. Of course, if you start mpirun from turing, turing will host the first process. This default selection can be overridden using the -machinefile option to mpirun. If you make your own file, note it is possible to include a host more than once.
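A machine file is just a list of host names, one per line. The helper below is a minimal sketch of writing one; the name make_machinefile, the file name my.machines and the choice of hosts are all made up for illustration.

```shell
# Write a machine file with one host name per line. A host may be given
# more than once, as noted above. Hypothetical helper, not part of MPICH.
make_machinefile() {
    out=$1
    shift
    : > "$out"                      # create or truncate the file
    for host in "$@"; do
        printf '%s\n' "$host" >> "$out"
    done
}
```

For example, make_machinefile my.machines turing StarScream StarScream writes a three-line file, which could then be passed to mpirun as /usr/local/mpi/bin/mpirun -machinefile my.machines -np 3 /usr/local/mpi/examples/cpi.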

Before you can begin testing you need to acquire the public keys from all hosts that you plan on using. You can do this by manually logging onto every machine which will be hosting your MPI session. Or you can use the script:

/usr/local/mpi/bin/mpigethostkeys

which grabs keys for most of the machines listed in /usr/local/mpi/share/machines.solaris. If you run this script you will simply need to type "yes" a number of times. Note that you may be prompted to enter a password for the box from which you run this script. This is ok. However, none of the other machines should ask you for passwords.

To test that you can actually access all the machines in the default machines.solaris list, from a turing login (if you are on a graphics machine simply ssh onto turing) try:

/usr/local/mpi/sbin/tstmachines -v solaris

If this seems to hang on a machine, try sshing into that machine and see what happens. If the machine asks you for a password, your key may not be properly configured. If it asks whether you wish to accept the host key, you still need to get that machine's host key. If the machine does not respond, it may be down.
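If you would rather try each host by hand in one pass, a loop along the following lines may help. It is a sketch, not a replacement for tstmachines: the function name check_hosts and the SSH_CMD override are invented here, and the handling of "#" comments and a ":N" suffix is a guess at what a machine file might contain.

```shell
# Run a trivial remote command on each host in a machine file and report
# which hosts fail. Hypothetical helper; SSH_CMD defaults to ssh and exists
# only so the loop can be dry-run with a stand-in command.
check_hosts() {
    machinefile=$1
    failed=0
    # Drop comments, strip an optional :N suffix, and visit each host once.
    for host in $(sed -e 's/#.*//' -e 's/:.*//' "$machinefile" | sort -u); do
        echo "trying $host" >&2     # shows which host a hang or prompt belongs to
        if ${SSH_CMD:-ssh} "$host" hostname; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return $failed
}
```

Called as check_hosts /usr/local/mpi/share/machines.solaris, it prints one line per host. Any password or host-key prompt that appears right after a "trying" line points at the machine that needs attention.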

To run a test program, try:

/usr/local/mpi/bin/mpirun -np 4 /usr/local/mpi/examples/cpi

Complete documentation can be found following the other link in the qrefs.

mpiJava

mpiJava-1.2-beta2 is installed in /usr/local/mpiJava. To use it, first you must set yourself up to use MPICH as detailed above, then do this:

  • Add to PATH /usr/local/mpi/bin:/usr/local/mpiJava/src/scripts

  • Add to CLASSPATH /usr/local/mpiJava/lib/classes

  • Add to LD_LIBRARY_PATH /usr/local/mpiJava/lib
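In a Bourne-style shell (sh, bash, ksh) the three settings above might look like the sketch below; csh and tcsh users would use setenv instead. Keeping "." on CLASSPATH when it was previously unset is an assumption made here so that classes in the current directory are still found.

```shell
# Bourne-style shell syntax. Append the MPICH and mpiJava locations named above.
PATH=$PATH:/usr/local/mpi/bin:/usr/local/mpiJava/src/scripts
# If CLASSPATH was unset or empty, start from "." (the Java default).
CLASSPATH=${CLASSPATH:-.}:/usr/local/mpiJava/lib/classes
# Avoid a leading ":" when LD_LIBRARY_PATH was previously empty.
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/mpiJava/lib
export PATH CLASSPATH LD_LIBRARY_PATH
```

Putting these lines in your shell startup file saves retyping them in every new shell.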

An example of how to compile and run a program:

javac Life.java

prunjava 4 Life

The ‘prunjava’ script is a wrapper for the MPICH ‘mpirun’ script. The first argument is the number of processors on which the program will be executed.


Copyright (c) HMC Computer Science Department. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License.''

HMC Computer Science Department
Last Modified Monday, 27-Jan-2003 15:27:50 PST