RUNNING LARGE JOBS
1. What kinds of jobs tend to be CPU intensive?
-
Phylogenetic analysis
-
Distance matrix methods (eg. Neighbor Joining, FITCH) usually
require negligible time; time increases at least with the square of the
number of sequences, because every pair of sequences must be compared
-
Parsimony (eg. DNAPARS, PROTPARS) - moderately efficient;
time increases exponentially with the number of sequences
-
Maximum likelihood (DNAML, PROTML, fastDNAML) - very slow;
time increases according to a FACTORIAL function of the number of sequences.
-
Sequence database searches - time proportional to
product of sequence length and database size; with FASTA, a higher
k (ktup, word size) value speeds up the search at some cost in
sensitivity; protein searches are faster than DNA searches.
-
Multiple sequence alignments - time for cluster-type alignments
increases roughly linearly with the number of sequences
-
Retrievals of large numbers of sequences - linearly-related
to number of sequences
-
any sorting operation with a large number of items
- efficiency depends on sort algorithm
-
SAS - some SAS jobs can take a long time
2. What kinds of jobs should never be CPU intensive?
If the following applications are eating up significant
percentages of CPU time, they are not functioning normally, and are probably
runaway apps.
-
GDE - GDE by itself does almost nothing. If you see it eating
a lot of CPU time, it's probably a runaway GDE job. One exception is when
reading in enormous sequence files eg. large numbers of sequences or very
long sequences. This can take a few minutes.
-
Most user apps (eg. word processors, mailers, spread sheets,
drawing programs).
-
Desktop tools (eg. CDE tools such as the dtpad editor,
dtfile file manager etc.)
-
Most Unix commands
-
Web browsers - For short bursts Netscape can be very
CPU intensive, but this should not persist more than a minute or two.
3. Some bad habits to avoid
-
Clicking repeatedly when an application hangs - This
only makes things worse. Most often, applications slow down because
of network slowdowns, which you can't do anything about.
-
Logging out with a screenful of windows - Always close
each window before logging out. Sometimes apps fail to terminate and
keep running even after you log out.
-
Running numerous CPU-intensive jobs simultaneously
- Running 5 fasta jobs at once will simply cause all jobs to take 5 times
as long to run. If the jobs use a lot of memory, they will run much more
slowly than that because they will be repeatedly swapping in and out of
memory. Run big jobs sequentially.
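As a minimal sketch of running big jobs one after another, the shell
function below starts each command only after the previous one has
exited. The function name run_sequentially is hypothetical, and the echo
commands are placeholders standing in for real searches.

```shell
# Run each command line in turn; the next one starts only after the
# previous one has exited.
run_sequentially() {
    for job in "$@"; do
        echo "starting: $job"
        sh -c "$job"
        echo "finished: $job (exit status $?)"
    done
}

# Placeholder commands stand in for real CPU-intensive jobs.
run_sequentially 'echo search 1' 'echo search 2'
```

For two or three jobs typed by hand, separating the commands with
semicolons on one command line has the same effect.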
3. Start small and work your way up.
If you are working with a very CPU-intensive program,
or a large number of sequences (eg. greater than 20) or both, you should
try to get an idea of how long your job will take. The BIRCH implementation
of GDE records the time used by most of the CPU-intensive programs and
appends it to the output (eg. .outfile, .fasta). For example, a run
of DNAML4 with increasing numbers of genes for PAL (phenylalanine ammonia
lyase) gave the following times:
5 sequences - Execution times on goad: 4.0u 0.0s 0:05 79% 0+0k 0+0io 0pf+0w
10 sequences - Execution times on goad: 47.0u 0.0s 0:49 95% 0+0k 0+0io 0pf+0w
15 sequences - Execution times on goad: 144.0u 0.0s 2:31 95% 0+0k 0+0io 0pf+0w
The first three fields, left to right, are:
-
User time - CPU time in seconds used
by the program eg. DNAML4
-
System time - CPU time in seconds used
by the system to run the program, usually negligible
-
Elapsed time - real time elapsed between
start and end of job.
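To compare several runs, it can help to pull these three fields out of
the timing line. The sketch below assumes the line has the layout shown
in the examples above (user time ending in "u", followed by system time
ending in "s", followed by elapsed time). The function name parse_times
is hypothetical.

```shell
# Print the user, system and elapsed fields from a timing line.
# Assumes the layout "... 47.0u 0.0s 0:49 ..." shown in the examples.
parse_times() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9.]+u$/) {
                user = $i;      sub(/u$/, "", user)   # strip trailing "u"
                sys = $(i + 1); sub(/s$/, "", sys)    # strip trailing "s"
                print "user=" user "s system=" sys "s elapsed=" $(i + 2)
            }
        }
    }'
}

echo "Execution times on goad: 47.0u 0.0s 0:49 95% 0+0k 0+0io 0pf+0w" | parse_times
# prints: user=47.0s system=0.0s elapsed=0:49
```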
4. Monitoring your jobs
Which jobs are eating up the most CPU time on the machine I am currently logged into?
The top command gives you a real time picture of
the most CPU intensive jobs currently running on the server you are logged
into. Type 'top' at the command line:
last pid: 28426; load averages: 0.83, 0.43, 0.33
213 processes: 207 sleeping, 1 running, 3 zombie, 1 stopped, 1 on cpu
CPU states: 49.2% idle, 50.2% user, 0.6% kernel, 0.0% iowait, 0.0% swap
Memory: 512M real, 292M free, 116M swap in use, 1932M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
28401 frist 1 0 4 5120K 3784K run 1:15 47.81% fastDNAml
25154 frist 1 58 0 10M 9496K sleep 1:43 0.98% Xvnc
20679 umkell15 1 58 0 9368K 8648K sleep 15:36 0.55% Xvnc
27273 umkell15 1 58 0 21M 17M sleep 1:01 0.51% netscape
28423 frist 1 41 0 2688K 1416K cpu0 0:00 0.43% top
28422 frist 1 52 0 5808K 4432K sleep 0:00 0.12% dtpad
28022 frist 1 58 0 7592K 5832K sleep 0:01 0.11% gde
25250 frist 1 58 0 6424K 5896K sleep 0:00 0.09% dtterm
20808 umkell15 1 58 0 7728K 6984K sleep 0:09 0.06% dtfile
25247 frist 5 58 0 7464K 7000K sleep 0:08 0.05% dtwm
28118 umkell15 5 58 0 7464K 4520K sleep 1:27 0.02% dtwm
25495 frist 1 52 2 23M 18M sleep 1:00 0.02% netscape
24552 frist 1 58 0 8424K 7912K sleep 0:13 0.02% Xvnc
27920 frist 1 58 0 6144K 5832K sleep 0:00 0.02% dtpad
10294 umzhan05 5 58 0 7512K 6360K sleep 0:41 0.02% dtwm
This display is updated every few seconds in the
terminal window.
To quit, type 'q'.
load averages - the average number of jobs running or
waiting for the CPU over the last 1, 5 and 15 minutes. Even with lots of
users doing normal tasks, this is seldom greater than 1.0. CPU intensive
jobs like FASTA can push it much higher. Above a load average of 4, system
performance noticeably degrades.
PID - process ID. This is the number you
need to know to kill a job.
USERNAME - who owns the job
NICE - the job's nice value, which adjusts its scheduling
priority: the higher the value, the less CPU time the job gets when other
jobs are competing for it. Low NICE values are needed by user apps such as
Web browsers or word processors because things like cursors and scrollbars
need to work instantly. Number crunching programs should run at higher NICE
values so that they don't impede the overall performance of the system. GDE
launches most CPU-intensive programs with a suitably high nice value. (See
man pages for 'nice' and 'renice' commands).
SIZE - memory used by an application.
TIME - cumulative CPU time used by the job since it
started. Most apps use up far less than 1 minute. A Netscape session can
use up several minutes.
CPU - percentage of CPU time being used
COMMAND - the command being run
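For programs started by hand rather than through GDE, the NICE value
described above can be set with the nice command. This is a sketch only:
the function names run_low_priority and lower_priority are hypothetical,
the increment of 10 and the value 15 are arbitrary choices, and the echo
command is a placeholder for a real number-crunching program.

```shell
# Start a command with its nice value raised by 10, so that it yields
# to interactive programs when the CPU is busy.
run_low_priority() {
    nice -n 10 "$@"
}

# Raise the nice value of a job that is already running (here to 15).
# Ordinary users can raise the nice value of their own jobs, but
# usually cannot lower it again.
lower_priority() {
    renice 15 -p "$1"
}

# Placeholder command; substitute the real program and its arguments.
run_low_priority echo "job finished"
```

To use lower_priority, pass it the PID shown by top, for example
'lower_priority 28401'.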
The top command has a lot of great options. For
example, you can sort jobs by memory used, or list only jobs under
a given userid. You can even kill jobs directly in top. Type 'man top'
to read about them.
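When a one-off listing is more convenient than the interactive top
display, a similar picture can be built from ps output. The sketch below
assumes input lines of the form "PID USER %CPU COMMAND" with one header
line. The function name busiest is hypothetical, and the sample lines
are taken from the top display above.

```shell
# Read "PID USER %CPU COMMAND" lines (after one header line) and print
# the N jobs using the most CPU, busiest first. N defaults to 5.
busiest() {
    n=${1:-5}
    sed '1d' | sort -k3,3nr | head -n "$n"
}

# Sample lines taken from the top display above.
busiest 2 <<'EOF'
PID USER %CPU COMMAND
25154 frist 0.98 Xvnc
28401 frist 47.81 fastDNAml
27273 umkell15 0.51 netscape
EOF
```

On systems whose ps accepts the POSIX -o option, live data can be fed in
with 'ps -e -o pid,user,pcpu,comm | busiest 5'.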
Which jobs are currently running under my userid on the machine I am currently logged into?
The ps command with no arguments tells which
jobs are running in the current shell (the current window):
ps
PID TTY TIME CMD
28018 pts/19 0:00 csh
25266 pts/19 0:00 csh
28022 pts/19 0:02 gde
ps -u userid tells which jobs are running under a
given userid on the host you are logged into:
ps -u frist
PID TTY TIME CMD
25225 pts/16 0:00 dsdm
25252 ? 0:01 dtprinti
27938 ? 0:01 xman
25251 ? 0:01 clock
24552 pts/11 0:15 Xvnc
27919 pts/18 0:00 sh
25250 ?? 0:01 dtterm
24555 pts/11 0:00 Xsession
24643 ? 0:05 dtwm
25241 pts/18 0:00 dtsessio
28018 pts/19 0:00 csh
25249 ? 0:02 dtmail
etc.......
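A long listing like this can be narrowed down to the PIDs of one
program. The sketch below assumes the four-column layout shown above,
with the command name in the last column. The function name pids_of is
hypothetical.

```shell
# Print the PID of every line in a ps listing whose last column (CMD)
# is exactly the given program name. The header line is skipped.
pids_of() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Sample lines taken from the listing above.
pids_of dtmail <<'EOF'
PID TTY TIME CMD
28018 pts/19 0:00 csh
25249 ? 0:02 dtmail
EOF
# prints: 25249
```

With live data this might be used as 'ps -u frist | pids_of gde'.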
5. Killing unwanted jobs
Notes:
-
You can only kill jobs belonging to you
-
You can only kill jobs on the host machine to which you are
currently logged in.
To kill a single job
To kill a job, type 'kill -9 PID', where PID is the process ID
shown by top or ps.
For example, to kill the dtmail mailer:
kill -9 25249
To kill multiple jobs
A list of jobs can be included in a single
kill command:
kill -9 25249 25252 27938
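The commands above use 'kill -9', which stops a job immediately without
giving it a chance to clean up. A common refinement, offered here as a
sketch and not as part of the procedure above, is to send the default
signal first and use -9 only on jobs that are still there a moment
later. The function name stop_jobs and the 2 second wait are arbitrary
choices.

```shell
# Ask each job to exit with the default signal (TERM), wait briefly,
# then force-kill any job that is still present.
stop_jobs() {
    kill "$@" 2>/dev/null          # polite request to every PID given
    sleep 2                        # give the jobs time to clean up
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then   # -0 only tests for existence
            kill -9 "$pid"
        fi
    done
}
```

For example, 'stop_jobs 25249 25252 27938' would handle the three jobs
from the listing above.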
If your X-terminal screen is frozen, kill jobs remotely
If your screen locks up, you can log
into the same host machine from another terminal and kill jobs from there.
If you see one job eating up a lot of CPU time, kill that first. It is
probably the one that caused the screen to freeze up in the first place,
and killing that job will usually free up the screen.