#--------------------- Import Foreign Format ( 9/02/09) -------------------
item:Import Foreign Format
#itemmethod:cp $INPUTFILE OUTFILE.tmp; readseq -a -fGenBank -o=OUTPUTFILE OUTFILE.tmp; $RM_CMD -f OUTFILE.tmp
# FASTA files produced by NCBI have definition lines in the form
# >gi|169079|gb|M18250.1|PEADRRB Pea (P.sativum) disease resistance
# whereas most FASTA files have a definition line like
# >PEADRRB - Pea (P.sativum) disease resistance
# Many programs cannot read the NCBI style, or truncate the lines at
# unpredictable places. The command below uses sed to change the second
# pipe character to a blank so that the gi number is used as a name
# and the rest of the line is read as the definition.
# >gi|169079 |gb|M18250.1|PEADRRB Pea (P.sativum) disease resistance
# This way, at least we have the gi number, and know it is a gi number
# This should not affect any other files read by readseq, other than
# NCBI generated files.
itemmethod:cat $INPUTFILE | sed 's/^\(>gi|[0-9][0-9]*\)\(\|\)\(.*\)/\1 \3/' > OUTFILE.tmp; readseq -a -fGenBank -o=OUTPUTFILE OUTFILE.tmp; $RM_CMD -f OUTFILE.tmp
itemhelp:readseq.help
itemopen:gde_help_viewer.csh

arg:INPUTFILE
argtype:text
#@argtype:file_chooser
arglabel:Name of foreign file?

out:OUTPUTFILE
outformat:genbank