RUNNING LARGE JOBS

1. What kinds of jobs tend to be CPU intensive?

Phylogenetic analysis

Distance matrix methods (eg. Neighbor Joining, FITCH) usually require negligible time; time increases at least with the square of the number of sequences, because a distance must be calculated for every pair of sequences
Parsimony (eg. DNAPARS, PROTPARS) - moderately efficient; time increases exponentially with the number of sequences
Maximum likelihood (DNAML, PROTML, fastDNAML) - very slow; time increases according to a FACTORIAL function of the number of sequences.

Sequence database searches - time is proportional to the product of sequence length and database size; use high k values (eg. the ktup word size in FASTA) to speed up the search; protein searches are faster than DNA searches.
Multiple sequence alignments - time for cluster-type alignments increases roughly linearly with the number of sequences
Retrievals of large numbers of sequences - time is linearly related to the number of sequences
Any sorting operation with a large number of items - efficiency depends on the sort algorithm
SAS - some SAS jobs can take a long time
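
To illustrate why tree-building methods scale so badly, consider how many trees there are to choose from. The number of distinct unrooted bifurcating trees for n sequences is (2n-5)!!, that is 3 x 5 x ... x (2n-5). The sketch below is not part of BIRCH; the function name unrooted_trees is made up for this example. It counts the size of the search space only. Real programs use shortcuts rather than scoring every tree, so it is not a prediction of run time.

```shell
# Hypothetical helper: number of unrooted bifurcating trees for n sequences,
# computed as the product of the odd numbers 3, 5, ..., (2n-5).
# Plain shell arithmetic overflows 64-bit integers for 20 or more sequences.
unrooted_trees() {
    n=$1
    count=1
    k=3
    while [ "$k" -le $((2 * n - 5)) ]; do
        count=$((count * k))
        k=$((k + 2))
    done
    echo "$count"
}

for n in 5 10 15; do
    echo "$n sequences: $(unrooted_trees "$n") possible trees"
done
```

For 5, 10 and 15 sequences this prints 15, 2027025 and 7905853580625 trees respectively.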

2. What kinds of jobs should never be CPU intensive?

If the following applications are eating up significant percentages of CPU time, they are not functioning normally, and are probably runaway apps.

bioLegato - bioLegato by itself does almost nothing. One exception is when reading in enormous sequence files, eg. large numbers of sequences or very long sequences. This can take a few minutes. (Note that bioLegato will always appear as 'java' in the output from 'ps' and from 'top'. This is because bioLegato is a Java program, and runs in a Java Virtual Machine (JVM).)
Most user apps (eg. word processors, mailers, spread sheets, drawing programs).
Desktop tools (eg. GNOME desktop tools such as the gedit editor, the nautilus file manager, etc.)
Most Unix commands
Web browsers - For short bursts Firefox and Mozilla can be very CPU intensive, but this should not persist more than a minute or two.

3. Some bad habits to avoid

Clicking repeatedly when an application hangs - All this does is make things worse. Most often applications slow down because of network slowdowns, which you can't do anything about.
Logging out with a screenful of windows - Always close each window before logging out. Sometimes apps don't terminate, and continue running even after you log out.
Running numerous CPU-intensive jobs simultaneously - Running 5 fasta jobs at once will simply cause all jobs to take 5 times as long to run. If the jobs use a lot of memory, they will run much more slowly than that because they will be repeatedly swapping in and out of memory. Run big jobs sequentially.
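
As a rough sketch of what running big jobs sequentially can look like at the command line, the function below starts each job only after the previous one has finished. The name run_sequentially is made up for this example, and the echo commands are harmless stand-ins for real analysis commands.

```shell
# Hypothetical helper: run each command string in turn, never two at once.
run_sequentially() {
    for job in "$@"; do
        echo "starting: $job"
        status=0
        sh -c "$job" || status=$?      # wait here until this job is done
        echo "finished: $job (exit status $status)"
    done
}

# Stand-ins for real jobs; each would be a full command line in practice.
run_sequentially "echo first job" "echo second job"
```

By contrast, ending each command with '&' would start all of the jobs at once, which is the habit described above.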

3. Start small and work your way up.

If you are working with a very CPU-intensive program, or a large number of sequences (eg. greater than 20) or both, you should try to get an idea of how long your job will take. The BIRCH implementation of bioLegato records the time used by most of the CPU-intensive programs and appends it to the output (eg. .outfile, .fasta). For example, a run of DNAML4 with increasing numbers of genes for PAL (phenylalanine ammonia lyase) gave the following times:

5 sequences - Execution times on goad: 4.0u 0.0s 0:05 79% 0+0k 0+0io 0pf+0w
10 sequences - Execution times on goad: 47.0u 0.0s 0:49 95% 0+0k 0+0io 0pf+0w
15 sequences - Execution times on goad: 144.0u 0.0s 2:31 95% 0+0k 0+0io 0pf+0w

The times listed left to right are:

User time - CPU time in seconds used by the program eg. DNAML4
System time - CPU time in seconds used by the system to run the program, usually negligible
Elapsed time - real time elapsed between start and end of job.
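
One way to use these numbers is to pull out the user time from each run and see how fast it grows before committing to a bigger job. The sketch below assumes the 'Execution times' line has the layout shown above; the function names user_seconds and growth are made up for this example.

```shell
# Hypothetical helper: print the user-time field (the one ending in "u")
# from an "Execution times" line, with the trailing "u" removed.
user_seconds() {
    echo "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9.]+u$/) { sub(/u$/, "", $i); print $i; exit }
    }'
}

# Hypothetical helper: how many times longer the second run took.
growth() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

t5=$(user_seconds "Execution times on goad: 4.0u 0.0s 0:05 79% 0+0k 0+0io 0pf+0w")
t10=$(user_seconds "Execution times on goad: 47.0u 0.0s 0:49 95% 0+0k 0+0io 0pf+0w")
t15=$(user_seconds "Execution times on goad: 144.0u 0.0s 2:31 95% 0+0k 0+0io 0pf+0w")

echo "5 to 10 sequences: $(growth "$t5" "$t10") times longer"
echo "10 to 15 sequences: $(growth "$t10" "$t15") times longer"
```

In the runs above, going from 5 to 10 sequences cost roughly 12 times as much CPU time, and going from 10 to 15 roughly 3 times as much again.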

4. Monitoring your jobs

Which jobs are eating up the most CPU time on the machine I am currently logged into?

The top command gives you a real-time picture of the most CPU-intensive jobs currently running on the server you are logged into. Type 'top' at the command line:

load averages:  0.68,  0.39,  0.32                                     12:56:44
229 processes: 227 sleeping, 2 on cpu
CPU states: 71.2% idle, 26.2% user,  2.6% kernel,  0.0% iowait,  0.0% swap
Memory: 16G real, 12G free, 1578M swap in use, 21G swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 18950 umchan94    1   1    8   10M 9600K cpu/3    0:27 18.10% dnaml
 18870 umchan94   21   1    0  394M  142M sleep    0:23  0.71% java
 22575 kdc        3   1    0  158M  104M sleep   69:16  0.66% mozilla-bin
  9079 umchan94    1   1    0   93M   92M sleep  114:36  0.60% Xvnc
   786 root       2  59    0 4784K 3664K sleep  654:59  0.46% automountd
 28905 umchan94    3  59    0  182M   94M sleep   46:17  0.16% mozilla-bin
 18952 umchan94    1  59    0 5032K 2232K cpu/2    0:00  0.14% top
 13432 groff      1   1    0   68M   32M sleep  508:21  0.13% mixer_applet2
 14583 operac2    1  40    0   69M   33M sleep  467:07  0.13% mixer_applet2
  9170 umchan94    1   1    0   68M   33M sleep  209:16  0.13% mixer_applet2
 16400 umchan94    2  44    0   80M   27M sleep    0:02  0.12% gnome-terminal
  9129 umchan94    1  27    0   63M   31M sleep    6:57  0.05% metacity
  9133 umchan94    1   1    0   97M   62M sleep   50:29  0.05% gnome-panel
 20832 umchan94    2  41    0   89M   27M sleep    7:53  0.05% gnome-terminal
 13430 groff      1  57    0   68M   11M sleep  131:59  0.04% gnome-netstatus

This display is updated every few seconds in the terminal window.

To quit, type 'q'.

load average - the average number of jobs running or waiting to run, taken over the last 1, 5 and 15 minutes (the three numbers shown, left to right). Even with lots of users doing normal tasks, this is seldom greater than 1.0. CPU-intensive jobs like DNAML can push it much higher. Above a load average of 4, system performance noticeably degrades.

PID - process ID. This is the number you need to know to kill a job.

USERNAME - who owns the job

NICE - governs the priority, and so the share of CPU time, that a job gets when the machine is busy. Low NICE values are needed by user apps such as Web browsers or word processors because things like cursors and scrollbars need to work instantly. Number crunching programs should run at higher NICE values so that they don't impede the overall performance of the system. bioLegato launches most CPU-intensive programs with a suitably high nice value. (See man pages for 'nice' and 'renice' commands).

SIZE - total memory used by an application. RES is the part of it currently held in physical memory.

TIME - total CPU time the job has used since it started, not the time elapsed. Most apps use up far less than 1 minute. A Netscape session can use up several minutes.

CPU - percentage of CPU time being used. Note that DNAML eats up a lot of CPU time because it does a very exhaustive set of calculations in constructing a phylogeny. The 'java' job also shown above is actually bioLegato.

COMMAND - the command being run
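
If you launch a number-crunching program yourself from the command line, rather than through bioLegato, the 'nice' command mentioned under NICE can start it at a higher nice value. A minimal sketch follows; the function name run_low_priority and the value 10 are arbitrary choices for this example, and the echo is a stand-in for a real program.

```shell
# Hypothetical helper: run any command at a nice value raised by 10,
# so that interactive programs get the CPU first.
run_low_priority() {
    nice -n 10 "$@"
}

# Stand-in for a real analysis command and its arguments.
run_low_priority echo "analysis would run here"
```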

The top command has a lot of great options. For example, you can sort jobs by memory used, or list only jobs under a given userid. You can even kill jobs directly in top. Type 'man top' to read about them.
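
When a one-off listing is more convenient than the interactive display, much the same picture can be had by sorting 'ps' output on the CPU column. The sketch below assumes five columns in the order PID, user, nice, %CPU, command; the function name top_cpu is made up for this example, and the sample lines are taken from the display above.

```shell
# Hypothetical helper: drop the header line, sort on column 4 (%CPU),
# highest first, and keep the first N lines.
top_cpu() {
    sed 1d | sort -k4,4nr | head -n "$1"
}

# Sample data in the assumed column order, fed in by hand.
printf '%s\n' \
    "PID USER NI %CPU COMMAND" \
    "18870 umchan94 0 0.71 java" \
    "18950 umchan94 8 18.10 dnaml" \
    "22575 kdc 0 0.66 mozilla-bin" | top_cpu 2
```

On a live system the function could be fed with 'ps -eo pid,user,nice,pcpu,comm | top_cpu 5'. The column names given to -o are the POSIX ones and may not all be available on older systems.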

Which jobs are currently running under my userid on the machine I am currently logged into?

The ps command with no arguments tells which jobs are running in the current shell (the current window):

ps
   PID TTY      TIME CMD
 28018 pts/19   0:00 csh
 25266 pts/19   0:00 csh
 28022 pts/19   0:02 gde

ps -u userid tells which jobs are running under a given userid on the host you are logged into:

ps -u frist

   PID TTY      TIME CMD
 25225 pts/16   0:00 dsdm
 25252 ?        0:01 dtprinti
 27938 ?        0:01 xman
 25251 ?        0:01 clock
 24552 pts/11   0:15 Xvnc
 27919 pts/18   0:00 sh
 25250 ??       0:01 dtterm
 24555 pts/11   0:00 Xsession
 24643 ?        0:05 dtwm
 25241 pts/18   0:00 dtsessio
 28018 pts/19   0:00 csh
 25249 ?        0:02 dtmail
etc.......
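
With a long listing it can be easier to pick out the PID of one program than to read the whole list. The sketch below assumes the four-column layout shown above, with the command name in the last column; the function name pids_of is made up for this example, and the sample lines are taken from the listing above.

```shell
# Hypothetical helper: read 'ps' output on standard input and print the PID
# of every job whose command name (last column) matches exactly.
pids_of() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Sample data copied from the listing, fed in by hand.
printf '%s\n' \
    "   PID TTY      TIME CMD" \
    " 25252 ?        0:01 dtprinti" \
    " 28018 pts/19   0:00 csh" \
    " 25249 ?        0:02 dtmail" | pids_of dtmail
```

On a live system this would be used as 'ps -u userid | pids_of dtmail'. Note that ps may shorten long command names, as with 'dtprinti' above, so the name given must match what ps prints.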

5. Killing unwanted jobs

Notes:

You can only kill jobs belonging to you
You can only kill jobs on the host machine to which you are currently logged in.

To kill a single job

To kill a job, just type 'kill -9 PID'. For example, to kill the dtmail mailer:
kill -9 25249

To kill multiple jobs

A list of jobs can be included in a single kill command:
kill -9 25249 25252 27938
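
'kill -9' stops a job at once and gives it no chance to tidy up. A gentler variation, sketched below, is to send the default signal first, which asks the job to exit, and to fall back on -9 only if the job is still there a moment later. The function name stop_job and the one-second wait are arbitrary choices for this example, and 'sleep 60' is a stand-in for a real job.

```shell
# Hypothetical helper: ask a job to exit, then force it if it has not gone.
stop_job() {
    pid=$1
    kill "$pid" 2>/dev/null || return 0   # plain kill sends TERM; nothing to do if it fails
    sleep 1                               # give the job a moment to exit
    # kill -0 sends no signal; it only tests whether the PID still exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
    fi
    return 0
}

# Stand-in for a runaway job started from this shell.
sleep 60 &
demo_pid=$!
stop_job "$demo_pid"
echo "sent stop request to job $demo_pid"
```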

If your terminal screen is frozen, kill jobs remotely

If your screen locks up, you can log into the same host machine from another terminal and kill jobs from there. If you see one job eating up a lot of CPU time, kill that first. It is probably the one that caused the screen to freeze up in the first place, and killing that job will usually free up the screen.