Beowulf Cluster: User Information
Carnegie Mellon University -- Department of Chemical Engineering
Beowulf Distributed Computer Cluster

General User Information


=======================================================================
HELP for ChemE Beowulf Cluster

Please send questions/comments/errors/omissions to
the contact address listed at:

  http://beowulf.cheme.cmu.edu/

=======================================================================

The 'help' command is not a standard UNIX command; what you are
reading now is local documentation displayed by an in-house script.

There is extensive Linux documentation available online at

  http://www.linuxdoc.org/

=======================================================================

If you are reading this from the command line, you can
move around in the documents with the following keystrokes

  SPACE        forward one page
  b            backward one page
  q            quit

The document is also accessible through our web server at

  http://beowulf.cheme.cmu.edu/cgi-bin/unixhelp.cgi


=======================================================================
Chapters in this document are:
[Aug 03 05] -- Chpt 01: Basic Information
[Sep 21 05] -- Chpt 02: Administrative Policies
[Jun 07 05] -- Chpt 03: Manual Pages
[Jun 07 05] -- Chpt 04: General Documentation
[Jun 07 05] -- Chpt 05: Local Software
[Jun 07 05] -- Chpt 06: Parallel Computing
[Jun 07 05] -- Chpt 07: File System Organization
[Aug 03 05] -- Chpt 08: Login Information
[Aug 03 05] -- Chpt 09: PBS Batch System
[Jun 07 05] -- Chpt 10: Getting Help
Each 'chapter' is available as a text file in the directory:

   /store/doc/unix-help

======================================================================
BASIC CLUSTER INFORMATION
======================================================================

a) To log on to the cluster nodes, use the command "ssh nodename".

   Please do not execute simulations on the front-end machine;
   debugging and short interactive runs should be done on your own
   desktop machine or on interactive nodes reserved for this purpose.

   Node names are generated systematically.  The machine named "cXnY"
   is node Y in cabinet X.  The cabinets contain different numbers
   of nodes, but nodes are always numbered from 1 to <max> within
   each cabinet.

   Note that it is (purposefully) made impossible to log on directly
   to any of the nodes without first logging on to the front-end
   machine named "beowulf.cheme.cmu.edu".
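
   A minimal sketch of the naming scheme as a shell function.  The
   function name is hypothetical, the two-digit zero padding is
   inferred from names such as "c1n01", and you must supply the node
   count yourself since the cabinets differ in size.

```shell
# Print the names of nodes 1..COUNT in cabinet CABINET ("cXnY" scheme).
# Assumption: node numbers are zero-padded to two digits, as in "c1n01".
node_names() {
    cabinet=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'c%dn%02d\n' "$cabinet" "$i"
        i=$((i + 1))
    done
}
```

   For example, running  for n in $(node_names 1 4); do ssh "$n" uptime; done
   on the front-end shows the uptime and load of nodes c1n01 to c1n04
   (assuming cabinet 1 has at least four nodes).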

b) This computer system is frequently subject to unauthorized
   login attempts from all over the world.  You must use ssh
   (protocol version 2) for all interactive connections.
   In addition, direct access is blocked from most sites
   (see Chpt 2 for details on access restrictions).

   Note: the concept of "shared accounts" is frowned upon.
         Do NOT give away your password to _anyone_ at all.
         This applies for ANY reason; just do not do it.

   Use the command "passwd" to change your password.

   When doing so, please:
     - use a non-dictionary password.
     - change it at least every 6 months or so.
     - read the output from the "passwd" program.  If it warns
       that your password is weak, the password can usually be
       cracked within a matter of hours: pick a different one.

   You do not need a password to connect to any of the nodes once
   you have logged on to the cluster.

======================================================================
ADMINISTRATIVE POLICIES
======================================================================

1) User Accounts

     - user accounts are available to anyone in or related to the
       research groups of Biegler, Hauan, Kitchin or Sholl.
     - all user accounts are individual, never shared.
     - do not give out your password to anyone; if they think
       they need or should have a user account, let them email us.

2) Remote Login Access

     The cluster is only accessible using ssh with protocol v2 or
     higher.  Free clients are available for all operating systems.

     Direct access is also filtered by an explicit list of
     IP addresses and domains.  All connections are permitted from
     CMU machines (.cmu.edu) and the Pittsburgh Supercomputing
     Center (.psc.edu).

     Selected DSL domains are also allowed if they:
       - provide a static IP address
       - are locally or regionally limited to Pittsburgh
         example: Verizon is enabled by ".pitt.east.verizon.net"

     The complete list is available as beowulf:/etc/hosts.allow

     Due to extensive port scanning and numerous unauthorized
     connection attempts, general access will NOT be granted to
     large/national service providers.  Typical examples include
     ".aol.com", ".att.com" and ".comcast.net"; these domains
     represent millions of computers and imply too much exposure.

     If you use one of these providers, you may be able to contact
     them and ask for details as to how they allocate IP addresses
     based on regional info.  If so, email us the info and we'll enable
     login from a subset of the relevant machines.

     The only other option is to log on to an andrew machine first
     and connect to the cluster using ssh from there.

     (you must use "ssh -2" from andrew to get the right protocol)

3) Disk Usage

     - The cluster should not be used for permanent storage of files.
       In general, all data and result files should be moved to your
       own computers once they are no longer being written to.

     - The cluster fileserver has a redundant (RAID-5) disk array
       which offers some protection against hardware failure. However,
       we do NOT make backups (complete or incremental) of user
       data. You should take the necessary precautions to ensure that
       your source code is backed up outside the cluster and that any
       data files generated are moved to your own computer.

     - At present we are not enforcing disk quotas.  This is convenient
       for performance reasons and also allows anyone to temporarily
       generate large amounts of data.  However, if the main /home disk
       fills up, it will ruin the simulations for everyone.
       Please make sure this does not happen because of you.

       To get a list of your 20 biggest files not accessed in the last
       7 days, execute the command:

          find ~/ -type f -atime +7 -ls | sort -n -k 7,7 | tail -n 20

       (you can of course change "+7" or "20" to suit your needs; keep
       "-k 7,7", which sorts on the file-size column of "find -ls")

     - You are strongly encouraged to write temporary data to
       the /scratch partition on execution nodes, in particular if
       data is written continuously.  This will both avoid disk storage
       problems and make your program(s) run faster.  This data is
       available directly from the login machine through the network
       file system in /beowulf/<nodename> (see Chpt 7 for details).
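
       The mapping between a node's /scratch partition and the path seen
       from the login machine is a simple prefix substitution.  A hedged
       sketch (the function name scratch_to_login is hypothetical):

```shell
# Map a path in NODE's /scratch partition to the same location as seen
# from the login machine: /scratch/... on NODE -> /beowulf/NODE/...
scratch_to_login() {
    node=$1
    spath=$2
    case $spath in
        /scratch | /scratch/*)
            printf '/beowulf/%s%s\n' "$node" "${spath#/scratch}" ;;
        *)
            echo "scratch_to_login: '$spath' is not under /scratch" >&2
            return 1 ;;
    esac
}
```

       On the login machine,  ls "$(scratch_to_login c1n01 /scratch/mydir)"
       lists the contents of /beowulf/c1n01/mydir.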

4) Fair CPU Usage

    - The cluster is a multi-user environment where everyone would
      like their calculations to be run as fast and as often as
      possible.  At present, there are no restrictions on the amount
      of resources that may be simultaneously occupied by any one
      individual.  Please help us continue this policy by:

      (a) using the batch system
      (b) not submitting an excessive number of jobs
      (c) always leaving your batch jobs "rerunnable"
      (d) carefully estimating the resources your job(s) will need;
          this helps the scheduler achieve maximum throughput.
      (e) not submitting your jobs to a specific (named) node or
          group of nodes.  While it is possible to continuously
          request the fastest nodes available, it is not nice
          ... and everything is logged, so we will know ...

      Also make sure you do not submit programs with a signal handler
      for the run-control signals (e.g. SIGCONT or SIGTSTP; SIGSTOP
      itself cannot be caught), as this will interfere with both the
      load-balancing and the temperature-monitoring systems.
      (If you don't know what this means, do not worry about it; you
      will not be doing this "by accident".)

======================================================================
MANUAL PAGES
======================================================================

Most Unix commands have a manual page, accessible from 
the command prompt by typing "man name-of-command".

  example: "man man" = overview of the "man" command

Related/useful commands include:

  apropos: "man apropos" = topical search in man pages

           Use this when you know what you want to do, but
           need the name(s) of the relevant command(s).

  locate : "man locate" = search for file(s) by name

           If you know the (partial) name of a file, use
           "locate name-of-file" to get its full path.
           (Please note that locate only indexes files on the
           local machine, and thus will not list files from users'
           home directories unless run while logged in to the file
           server.)

The GNU Texinfo pages often contain more detailed information.
These pages are accessible through the "info" command.

While this information is considered 'dense', it is -- by far --
the most accurate/detailed help source available.

======================================================================
PACKAGE DOCUMENTATION
======================================================================

Most software in Fedora Linux comes in packages, several of which
include substantial documentation and examples in addition to the
manual pages.  These files are usually located in a separate
subdirectory under one of the directories:

   /usr/doc
   /usr/share/doc

To list all documentation files for the package that provides a
specific command:

   1. "which name-of-command"      = get full path to relevant file
   2. "rpm -qf /full/path/to/file" = get name of file's package
   3. "rpm -qd name-of-package"    = list package documentation files
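
The three steps can be collapsed into one small shell function.  This
is a sketch: docs_for is a hypothetical name, "command -v" is used in
place of "which" (it is built into the shell), and "rpm -qdf" combines
steps 2 and 3 by listing the documentation files of the package that
owns the given file.

```shell
# List the documentation files of the package that provides a command.
docs_for() {
    # Step 1: full path of the command (fails for unknown commands).
    cmd_path=$(command -v "$1") || {
        echo "docs_for: command '$1' not found" >&2
        return 1
    }
    # Steps 2+3: -q query, -d documentation files, -f package owning the file.
    rpm -qdf "$cmd_path"
}
```

Usage: "docs_for gcc".  Shell builtins such as "cd" have no file of
their own, so rpm will report an error for them.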

Almost all software packages also have "info" and "manual" pages
(see "info info" and "man man" for how to get started).

======================================================================
LOCALLY INSTALLED/MAINTAINED SOFTWARE
======================================================================

Local software on the computer cluster is installed in a special
program distribution system residing under the /store directory.

Installation is flexible, but the general directory structure is:

   /store/bin               executables / runtime wrappers
   /store/man               manual pages
   /store/lib               library files
   /store/include           header files

   /store/name-of-package   the actual software package

   /store/doc               extra documentation
   /store/examples          examples of use

   /store/store             the software repository itself
                            (you should never need to go here)

If you have a software package used by several people, feel free
to contact us and ask to have it installed on all machines.  
This is particularly true if the same files are needed by
multiple members of a research group; shared installations are
strongly preferred over everyone keeping one (or more) private
copies in their individual home directories.

======================================================================
SPECIAL PARALLEL/CLUSTER SOFTWARE
======================================================================

A few special commands are required for parallel applications:

    execution   ->  read "man mpirun"
    MPI         ->  read manual pages for "mpi" + "type"

    where "type" is one of "cc", "CC", "f77" or "f90"
    (with no spaces, e.g. "man mpiCC" for the C++ compiler)

The MPI compilers use gcc by default, which you may find gives slow
code.  The Intel compilers (icc, ifort) are a better alternative,
but they will require some more work on your part.

======================================================================
CLUSTER DIRECTORY STRUCTURE
======================================================================

1. Home directories: found on the file server as:

     fs:/home/username

   and available on all nodes as "/home/username".  This setup has
   one important consequence: if you are running the same program
   on two or more nodes, make sure they are not writing to the
   same directory and filename; else they will interfere with each
   other and almost certainly ruin all your results ...

   The solution is simple -- use different directories or the
   special /scratch partition for data storage.
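
   One way to follow this advice is to give every run its own output
   directory named after the node and the process ID.  A minimal
   sketch, assuming a hypothetical base directory ~/runs and a
   hypothetical function name make_run_dir:

```shell
# Create and print a run directory that is unique per node and per process,
# e.g. ~/runs/c1n01.12345, so that copies of a program started on several
# nodes never write to the same files.  ~/runs is a hypothetical default.
make_run_dir() {
    root=${1:-$HOME/runs}
    host=$(uname -n)
    host=${host%%.*}              # short node name, e.g. "c1n01"
    dir="$root/$host.$$"          # $$ = process ID of the calling shell
    mkdir -p "$dir" && printf '%s\n' "$dir"
}
```

   Inside a job script you could then write  outdir=$(make_run_dir) || exit 1
   and send results to "$outdir" (for instance  ./my_program > "$outdir/out.dat",
   where my_program stands for your own executable).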

2. Temporary/local directories

   All nodes have a "/scratch" partition which is free
   to use for everyone.  The same directory is also available
   on all machines as "/beowulf/nodename"; i.e. "/scratch/mydir"
   on node "c1n01" would also be "/beowulf/c1n01/mydir" on
   the login server "beowulf" (or any other node).

   In fact, any program that semi-continuously reads and writes data to
   a file should ALWAYS use the /scratch partitions, both for its own
   sake and for everyone else's, since writing to the shared network
   drive is substantially slower than writing to local disk.

     --> see /store/examples/pbs for how to write the proper scripts.

   Also, do "man 2 chmod" and read about the "sticky bit" for
   technical details on how your files are (automatically) protected
   from being deleted or renamed by other users.

   Note: old files in the /scratch partitions will be deleted.
         See /scratch/README on any node for the exact policy.
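
   The scripts in /store/examples/pbs remain the local reference for
   this.  As a rough, hedged sketch of the same idea, the function
   below copies an input file into a private /scratch directory, runs
   a program there, copies the output back to the directory the job
   was submitted from, and cleans up.  The function name, the file
   names (output.dat, results/) and the SCRATCH_ROOT override are
   hypothetical; PBS_O_WORKDIR and PBS_JOBID are standard PBS
   environment variables.

```shell
# Run PROGRAM (found in the submission directory) on INPUT inside a private
# /scratch directory, copy output.dat back to <submit dir>/results/, and
# remove the scratch directory.  Names here are illustrative placeholders.
run_in_scratch() {
    prog=$1
    input=$2
    # PBS starts jobs in $HOME; PBS_O_WORKDIR is the directory qsub was run
    # from.  Fall back to the current directory when run outside PBS.
    submit_dir=${PBS_O_WORKDIR:-$PWD}
    workdir="${SCRATCH_ROOT:-/scratch}/${USER:-$(id -un)}/job.${PBS_JOBID:-$$}"

    mkdir -p "$workdir" || return 1
    cp "$submit_dir/$input" "$workdir/" || return 1

    # Do all continuous reading/writing on the local disk (in a subshell,
    # so the caller's working directory is unchanged).
    ( cd "$workdir" && "$submit_dir/$prog" "$input" > output.dat )
    rc=$?

    # Copy the results back to the shared file system, then clean up.
    mkdir -p "$submit_dir/results" || return 1
    cp "$workdir/output.dat" "$submit_dir/results/" || return 1
    rm -rf "$workdir"
    return "$rc"
}
```

   In a job script this would be called as, e.g.,
   run_in_scratch my_simulation input.dat, where my_simulation stands
   for your own program in the submission directory.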

======================================================================
LOGIN INFORMATION
======================================================================

1. When you log in to the front-end machine (beowulf), you will get
   a list of the 5 nodes with lowest current usage.  The format is:

      c1n12         up   2+01:08,     0 users,  load 0.00, 0.00, 0.00
      c1n01         up   8+05:45,     0 users,  load 0.98, 0.97, 1.00
      c1n04         up  10+07:23,     0 users,  load 1.65, 1.52, 2.18
      c1n11         up  15+05:47,     0 users,  load 1.98, 1.97, 1.91
      c1n05         up  15+05:47,     0 users,  load 2.00, 2.00, 2.00

   The above output is generated with the script "node-load N" where
   N is the number of lines in the output (default=5).  Only the
   login machine has the ability to "see" nodes in multiple cabinets;
   any computing node will only report statistics for other machines
   within the same logical network.

   The "load" on a unix machine is equal to the average number of
   processes in the "run queue"; i.e. executing or waiting for
   cputime.  Although each machine in principle could handle a load in
   excess of 200, they are most efficient if the load is equal to the
   number of processors in each machine.  For our cluster, this number
   is typically 2 or 4.  If you have more than N processes running on
   an N-cpu machine, all jobs will still execute, but the total
   throughput will go down.  The batch system takes care of this.
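
   To compare a load value with the number of processors yourself, a
   small hedged sketch (is_overloaded is a hypothetical name) that
   uses awk for the floating-point comparison:

```shell
# Succeed (exit status 0) if LOAD exceeds the number of CPUs, i.e. there
# are more runnable processes than processors and throughput drops.
is_overloaded() {
    load=$1
    ncpu=$2
    # awk does the floating-point comparison that [ ] cannot do.
    awk -v load="$load" -v ncpu="$ncpu" 'BEGIN { exit !(load + 0 > ncpu + 0) }'
}
```

   On a node,  is_overloaded "$(cut -d ' ' -f 1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
   compares the 1-minute load average with the number of online CPUs.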

2. When you log in to one of the nodes, you get a line of the form:

      mem=932.8/3464.9/4250.0 swap=0.0/2056.3 cpu=2.00 1.89 1.83 3/92 2676

   This is a (very) brief status monitor for the machine.

       mem=x/y/z    memory status : in-use/"free"/total [mb]
                                    ("free" = free+cached+buffered)

       swap=x/y     virtual memory: in-use/total [mb]

       cpu=...      3x cpu load (1, 5 and 15 minute averages)
                    number of processes (running/total)
                    last PID (process identification number)
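
   If you want to use this line in a script, the fields are easy to
   pull apart with awk.  A minimal sketch (parse_status is a
   hypothetical name) that extracts the "free" memory and the 1-minute
   load from a status line such as the example above:

```shell
# Print the "free" memory (mb) and the 1-minute load from a status line.
parse_status() {
    printf '%s\n' "$1" | awk '{
        split($1, mem, "[=/]")   # "mem=x/y/z" -> mem[2]=x, mem[3]=y, mem[4]=z
        load1 = $3
        sub(/^cpu=/, "", load1)  # "cpu=2.00" -> "2.00"
        print "free_mb=" mem[3], "load1=" load1
    }'
}
```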

General hints for maximum performance:
  - make sure the node has enough available memory for your job.
  - swapping is a (very) bad thing for speed.
  - if the load is high (or there are many processes running),
    you should consider logging in to a different node.

======================================================================
BATCH SYSTEM
======================================================================

The cluster has a general-purpose batch system installed called PBS,
the Portable Batch System.

To get started, type "info pbs", "man qsub" and "man qstat".

The current status of the batch system is also available on the cluster
web site (http://beowulf.cheme.cmu.edu) under "Cluster Status".  At
the command prompt, the alias 'qs' will show you the status of the
batch queues along with any currently running batch jobs owned /
submitted by you:

=====
server: beowulf

Queue            Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- --- --- --- --- --- --- --- --- --- --- ----------
reject             0   0 yes yes   0   0   0   0   0   0 Execution 
short              0   0 yes yes   0   0   0   0   0   0 Execution 
long               0   0 yes yes   0   0   0   0   0   0 Execution 
hog_s              0   0 yes yes   0   0   0   0   0   0 Execution 
hog_l              0   0 yes yes   0   0   0   0   0   0 Execution 
q_feed             0   0 yes yes   0   0   0   0   0   0 Route     
=====


(a) There are 6 queues
       routing queue: determines where to send jobs based on the
                      amount of cpu time & memory requested by the
                      user.

       q_feed: name of the routing queue.  Jobs typically live here
               less than 1 second.

       execution queues: the jobs are run here

         short : cpu time < 24 hr; memory < 500 mb
         long  : cpu time > 24 hr; memory < 500 mb
         hog_s : cpu time < 24 hr; memory > 500 mb
         hog_l : cpu time > 24 hr; memory > 500 mb

       special execution queue: rejects unspecified jobs

         reject: jobs submitted without resource constraints
                 will end up here & get killed within 2 minutes.
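
    The routing rules above can be summarised as a small decision
    function.  This is a planning sketch only, not the actual PBS
    routing code: predict_queue is hypothetical, it takes whole hours
    and megabytes, it applies the defaults listed under (b) when one
    value is missing, and it assumes that a request of exactly 24 hr
    or exactly 500 mb lands in the smaller queue, which the table does
    not specify.

```shell
# Predict the execution queue q_feed would route a job to, given the
# requested cpu time (whole hours) and memory (mb); either may be empty.
predict_queue() {
    if [ -z "$1" ] && [ -z "$2" ]; then
        echo reject              # no resource request at all
        return
    fi
    hours=${1:-8}                # default cputime: 8 hrs
    mem_mb=${2:-100}             # default memory: 100 mb
    # Assumption: exactly 24 hr / 500 mb counts as the smaller queue.
    if [ "$mem_mb" -le 500 ]; then
        if [ "$hours" -le 24 ]; then echo short; else echo long; fi
    else
        if [ "$hours" -le 24 ]; then echo hog_s; else echo hog_l; fi
    fi
}
```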

(b) Example batch submission command.

    Suppose I have a script 'batchjob' that I know will run for
    at most 4 hours and use no more than 200mb of memory:

      command: qsub -l cput=4:00:00,mem=200mb batchjob

    You must always specify either cputime or memory size (or both);
    any job submitted with no resource constraints will be rejected.

    Batch system defaults:
      - memory : 100 mb  (if you only specify cputime, mem limit = 100mb)
      - cputime: 8 hrs   (if you only specify memory, cpu limit = 8 hrs)

    Note #1: Any job exceeding its limits will get KILLED by the system.

             --> It is generally a good idea to "overestimate" the
                 system resources required by maybe 10-20%.  That
                 said, you don't want to add (way) too much since
                 there is a mild preference in the job scheduler to
                 pick short/small jobs first in preference to
                 long/large ones.

    Note #2: It is possible to change the resource requirements for
             running jobs through the 'qalter' command.  However, to
             _increase_ the amount of any resource requires system
             manager rights.

             --> If you need this, please email the administrator(s)
                 at the address listed at http://beowulf.cheme.cmu.edu
                  
    Note #3: All batch jobs start in your home directory.
             Read "man qsub" or see the example script(s) in:

               /store/examples/pbs

             for possible ways to deal with directory issues.

    Note #4: We have implemented a local set of restrictions to
             control job resources more tightly than the standard
             PBS system; this is an attempt to avoid inefficient
             use of system resources (particularly memory).

             (1) "-l" resource requests MUST be on the command line.
                 --> any PBS commands in your scripts are ignored.

             (2) All nodes have property fields that describe their
                 available "per-cpu" memory. All batch jobs are
                 automatically assigned only to the subset of nodes on
                 which they can run without the per-job memory
                 requirement exceeding the per-cpu amount available.
                 This guarantees that we avoid swapping.

                 Consequences:
                 - you do NOT need to supply node properties yourself.
                 - your jobs will WAIT until they can run without
                   possibly swapping.  This makes it more important
                   (for you!) not to overestimate memory requirements.

             (3) Large-memory jobs (currently > 1gb per job) require
                 the use of a new command-line argument "-vcpu=N".

                 To submit a job needing up to 50 hours of cpu time
                 and 1.4GB of memory, use:

                   qsub -l cput=50:00:00,mem=1400mb -vcpu=2 script

                 This will allocate (and wait for!) a node where you
                 can allocate 2 CPUs and a total of 1400mb memory.

                 You could also submit this job as:

                   qsub -l cput=50:00:00,mem=1400mb -vcpu=3 script
                 or
                   qsub -l cput=50:00:00,mem=1400mb -vcpu=4 script

                 In these cases pbs will wait for a node where it can
                 allocate 3 or 4 CPUs, respectively (and 1.4GB mem).

                 ===> in general, use "-vcpu=2" for jobs up to 2GB.

                 Your job will NOT gain any execution speed from
                 requesting more "virtual" CPUs than necessary; your
                 job will just prevent the CPUs from being used by
                 someone else.
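
    The examples above can be turned into a helper that prints the
    qsub options for a given cpu time and memory request.  This is a
    hedged sketch: pbs_args is a hypothetical name, and the -vcpu rule
    (one virtual CPU per started 1024 mb once a job exceeds 1024 mb)
    is inferred from the 1.4GB and 2GB examples above rather than
    taken from the local PBS configuration.

```shell
# Print qsub options for HOURS of cpu time and MEM_MB of memory.
# Assumption: jobs above 1024 mb get -vcpu=ceil(MEM_MB / 1024).
pbs_args() {
    hours=$1
    mem_mb=$2
    args="-l cput=$hours:00:00,mem=${mem_mb}mb"
    if [ "$mem_mb" -gt 1024 ]; then
        # round up: 1400 mb -> 2, 2048 mb -> 2, 2500 mb -> 3
        vcpu=$(( (mem_mb + 1023) / 1024 ))
        args="$args -vcpu=$vcpu"
    fi
    printf '%s\n' "$args"
}
```

    Usage:  qsub $(pbs_args 50 1400) script  (leave $(...) unquoted so
    the options become separate arguments).  Run pbs_args on its own
    first to check the result, and add the 10-20% margin from Note #1
    to your estimates before passing them in.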

(c) Upon submission, the batch system will reply with the request ID.
    When the job is completed, 2 files will be created in the
    directory from which you submitted the job:

      batchjob.oNNN : output from 'batchjob' with request ID NNN
      batchjob.eNNN : errors from 'batchjob' with request ID NNN

(d) The batch system is fairly robust with respect to error
    conditions.  If the file server goes down while your job is
    running, your job will temporarily be suspended unless running
    against the local /scratch partition.  Once the server gets back
    online, all jobs will continue from their previous state.

    If the execution node goes down, any lost job will be
    restarted automatically on the first available node, provided
    you have not used the "qalter" command to mark the job as
    "not rerunnable."  That said, automatic resubmission will not
    happen until the original execution host is back online.

    If your job is stuck on a dead node, you may rerun it with
    the command "qrerun -W force JobId".  You can only do this if your
    job is set to be "rerunnable"; otherwise you have to resubmit it.

======================================================================
GETTING HELP
======================================================================

The cluster is a research resource and has no support contracts; 
however, limited assistance is offered on a voluntary basis.

NOTE: before reporting any problems, please seek help within your own
      research group.  Then think about the problem and try to solve it
      yourself ... and only THEN email the cluster administrators at the
      contact address listed at http://beowulf.cheme.cmu.edu/

General guidelines for reporting problems:

  (a) Briefly describe the problem:
        - What are you trying to do.
        - What happens ... and why is this wrong.
        - Which machine(s) do(es) it happen on.
        - Do you think it is an error in the cluster configuration
          or simply something you don't understand how to make work.

  (b) Include a verbatim log of ALL error messages:
        - Cut and paste the exact commands + output.

  (c) Include a stepwise description on how to reproduce the problem:
        - If the problem involves any of your personal files, please
          send the name of the directory and make sure all the files
          in there are readable for a normal user.
        - If the problem involves files in multiple directories, 
          add them all to a "tar.gz" file ("man tar", "man gzip")
          and attach them to your email.
        - Include the sequence of commands needed to reproduce
          the problem and/or errors.  If you have any special
          environment variables set, state what they should be.
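
    For the "tar.gz" step, a minimal sketch that bundles several
    directories into one archive and checks that the archive can be
    read back (make_report_archive and the file names are hypothetical):

```shell
# Pack one or more directories into a gzip-compressed tar archive and
# verify that the archive can be listed again.
# Usage: make_report_archive ARCHIVE.tar.gz DIR...
make_report_archive() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_report_archive ARCHIVE.tar.gz DIR..." >&2
        return 1
    fi
    archive=$1
    shift
    # c = create, z = compress with gzip, f = write to the named file
    tar czf "$archive" "$@" && tar tzf "$archive" > /dev/null
}
```

    For example,  make_report_archive problem-report.tar.gz run1 run2
    creates problem-report.tar.gz, which you can then attach to your email.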

  If you do not follow these very simple rules to help us minimize
  time spent on support, you risk that your request is "delayed"
  for quite some time.  You are also likely to get a reply along the
  lines of: "please submit a proper problem report."


Thu Sep 22 09:10:12 2005