TUTORIAL:

DISPLAYING AND MANIPULATING SEQUENCES - Command line


Oct. 21, 2014


This tutorial walks through simple sequence tasks using the command line program NUMSEQ. The next tutorial will show you how to run NUMSEQ from the BioLegato graphic interface.

NUMSEQ documentation: $doc/fsap/numseq.txt
BACHREST documentation: $doc/fsap/rest.txt


Mac OS X users: To open a terminal window, go to Applications --> Utilities --> Terminal

1. Copy sample sequences to your $HOME/tutorials/sequence directory


{brassica:/home/plants/frist}cd                        go to $HOME; do the next step only if $HOME/tutorials doesn't exist
{brassica:/home/plants/frist}mkdir tutorials           create the tutorials directory
{brassica:/home/plants/frist}mkdir tutorials/sequence  create the directory for this tutorial

{brassica:/home/plants/frist}cd $birch/tutorials/bioLegato/sequence
in this example, the location of birch ($birch) is /home/psgendb
{brassica:/home/psgendb/tutorials/bioLegato/sequence}cp *.gen $HOME/tutorials/sequence  
copying GenBank files to new directory

Return to your $HOME directory and verify that the new files and directories are present:
{brassica:/home/psgendb/tutorials/bioLegato/sequence}cd
{brassica:/home/plants/frist}ls -l
drwx------   1 frist    drr     512 Oct 31 10:11 tutorials/
{brassica:/home/plants/frist}cd tutorials
{brassica:/home/plants/frist/tutorials}ls -l
drwx------   3 frist    drr     512 Oct 31 10:11 sequence/
{brassica:/home/plants/frist/tutorials}cd sequence
{brassica:/home/plants/frist/tutorials/sequence}ls -l
-rw-------   1 frist  frist     5404 Oct 31 10:13 X52331.gen
-rw-------   1 frist  frist    10739 Oct 31 10:13 PBI101TD.gen
-rw-------   1 frist  frist     8278 Oct 31 10:13 pBSGUS.gen
-rw-------   1 frist  frist     3674 Oct 31 10:13 PEACAB15.gen

Files with the .gen extension are in GenBank format. Since these are ASCII text files, you can view them in any text editor. Double clicking on a file in the file manager will bring up the file in the default text editor for your bioLegato installation.
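
Because GenBank files are plain text, they are also easy to read from a script. The following is a minimal Python sketch, not part of NUMSEQ or BioLegato, that pulls the locus name and the sequence out of one GenBank entry. It assumes the usual flat-file layout: a LOCUS line, an ORIGIN line followed by numbered sequence lines, and a closing // line. The function name read_genbank_sequence and the tiny DEMO1 entry are made up for illustration.

```python
def read_genbank_sequence(lines):
    """Return (locus_name, sequence) from the lines of one GenBank entry."""
    name = ""
    seq_parts = []
    in_origin = False
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else ""
        elif line.startswith("ORIGIN"):
            in_origin = True          # sequence lines follow this keyword
        elif line.startswith("//"):
            break                     # end of the entry
        elif in_origin:
            # Drop the leading position number; keep the blocks of bases.
            seq_parts.extend(line.split()[1:])
    return name, "".join(seq_parts).upper()


# Made-up entry for illustration only (not one of the tutorial files).
example = """LOCUS       DEMO1                     24 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Made-up entry for illustration only.
ORIGIN
        1 atggcttcta cgaattcgga tcct
//
""".splitlines()

name, seq = read_genbank_sequence(example)
print(name, len(seq), seq)
```

To try it on a real file, pass open("PEACAB15.gen") in place of example.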
 

2. Read PEACAB15.gen in NUMSEQ

NUMSEQ is a program for printing out, translating, and subcloning sequences. It runs at the command line. The main menu handles file input and output. Output can either be to the screen or to a file. In the example, the output file has been called PEACAB15.numseq to indicate that the file contains output from numseq.

Example: Reading and printing PEACAB15.gen with NUMSEQ

3. Parameter menu

The parameter menu controls how the sequence is printed. Type '4' in the main menu to bring up the Parameters menu.

Name: PEACAB15     Topology: LINEAR     Length: 822 nt
________________________________________________________________________________
Parameter      Description/Response                  Value
________________________________________________________________________________
 1)START       first nucleotide printed              1
 2)FINISH      last nucleotide printed               822
 3)NUCCASE     U:(A,G,C,T...), l:(a,g,c,t...)        U
 4)STARTNO     number of starting nucleotide         1
 5)GROUP       number every GROUP nucleotides        10
 6)GPL         number of GROUPs printed per line     7
 7)WHICH       I: input strand O: opposite strand    I
 8)STRANDS     1: one strand, 2:both strands         1
 9)KIND        R:RNA D:DNA                           D
10)NUMBERS     Number the sequence (Y or N)          Y
11)NUCS        Print nucleotide seq. (Y or N)        Y
12)PEPTIDES    Print amino acid seq. (Y or N)        N
13)FRAMES      1 for this frame, 3 for 3 frames      1
14)FORM        L:3 letter amino acid, S: 1 letter    L
________________________________________________________________________________
Type number of parameter you wish to change (0 to continue)

By default, NUMSEQ will print out the entire sequence (from START to FINISH) as a single strand (STRANDS) in 7 groups (GPL) of 10 nucleotides (GROUP) per line. To change parameters, type the number of a parameter, and you will be prompted for a new value. When you're ready to view the sequence with the new parameters, type '0' at the prompt and '5' in the main menu to print the sequence to the screen.
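
To make the roles of START, FINISH, GROUP and GPL concrete, here is a rough Python sketch of the same layout idea. It is not NUMSEQ's code and does not reproduce its exact output: in particular, it labels only the first position of each line, which is simpler than numbering every group. The function name layout and the 100 nt demo sequence are made up for illustration.

```python
def layout(seq, start=1, finish=None, group=10, gpl=7):
    """Return printable lines: blocks of `group` bases, `gpl` blocks per line.

    start and finish are 1-based and inclusive, as in the parameter menu.
    """
    if finish is None:
        finish = len(seq)
    region = seq[start - 1:finish]
    per_line = group * gpl            # 10 * 7 = 70 bases per line by default
    lines = []
    for offset in range(0, len(region), per_line):
        chunk = region[offset:offset + per_line]
        blocks = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        # Label each line with the position of its first base.
        lines.append(f"{start + offset:>6} " + " ".join(blocks))
    return lines


demo = "ACGT" * 25                    # made-up 100 nt sequence
for line in layout(demo):
    print(line)
```

With the default values, the 100 nt demo prints as one line of 7 groups starting at position 1 and a second line of 3 groups starting at position 71.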
 

Examples:


To view both strands:

8) STRANDS: 2

To translate in 3 reading frames:

12) PEPTIDES: y
13) FRAMES: 3
5) GROUP: 15
6) GPL: 4
NUMSEQ breaks up the sequence into groups of nucleotides, numbering each group. For translation, GROUP must be divisible by 3, because translation is done in discrete codons of 3 bases each. GPL is set to 4 so that the output line will fit on a typical 80-character line.
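
The idea of reading three frames as discrete codons can be sketched in a few lines of Python. This is an illustration only, not NUMSEQ's implementation: it assumes the standard genetic code, prints one-letter amino acids (like FORM S rather than the default L), and uses a made-up 12 nt sequence. The names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


demo = "ATGGCTTCTTAA"                 # made-up sequence
for frame in range(3):
    print(frame + 1, translate(demo, frame))

# Bases per line for the settings above: GROUP * GPL
print(15 * 4)
```

Because each group of 15 bases holds exactly 5 codons, no codon is split across a group boundary, and 4 groups give 60 bases per line.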

To limit printing to only part of the sequence, e.g. bases 200 - 400:

1) START:  200
2) FINISH: 400

To view the opposite strand of the same region:

7) WHICH: o
1) START: 400
2) FINISH: 200
This example illustrates that printing the opposite strand requires two steps. First, we have to specify the strand as 'o' (opposite) rather than 'i' (input strand). This causes the bases to be complemented. However, if all we do is complement the input strand, the opposite strand would be printed 3' to 5', because we would be starting at 200 and ending at 400. Therefore, START must be set to 400 and FINISH to 200, so that the opposite strand is printed 5' to 3'.
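
The two steps, complementing and then reversing the direction of reading, can be sketched in Python as follows. This models the behaviour described above and is not taken from NUMSEQ itself; the function name region, its which argument, and the 10 nt demo sequence are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def region(seq, start, finish, which="I"):
    """Return bases start..finish (1-based, inclusive).

    which="O" complements the bases; start > finish reads the region backwards.
    """
    strand = seq.translate(COMPLEMENT) if which.upper() == "O" else seq
    if start <= finish:
        return strand[start - 1:finish]
    return strand[finish - 1:start][::-1]


demo = "AACCGGTTAC"                   # made-up sequence
print(region(demo, 2, 5))             # input strand, bases 2-5
print(region(demo, 2, 5, "O"))        # complemented only: reads 3' to 5'
print(region(demo, 5, 2, "O"))        # complemented and reversed: reads 5' to 3'
```

Only the last call gives the opposite strand in the conventional 5' to 3' orientation, which is why START and FINISH are swapped in the example above.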