GDE - File formats
The GDE File --> Open menu is "hardwired"
into GDE, and can open the main file types that are intrinsic to
the GDE program: GDE (.gde), GenBank (.gen), and flat (.flat). Each GDE
interface also has
one or more Import menus, which have been programmed for data types
that are specific to each GDE interface.
The Open Menu
The File --> Open menu will only read files in GDE (.gde), GenBank
(.gen), or flatfile (.flat) format.
GenBank format - GDE can read any file in GenBank format, as defined in
the GenBank release notes from NCBI
[ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt]. GDE can also save in a
pseudo-GenBank format using File --> SaveAs. However, this format loses
some of the annotation information from the original GenBank file and,
in other cases, rearranges data, so GenBank files generated by GDE do
NOT comply with the official standard.
Examples:
thionin.gen - file containing two thionin sequences directly retrieved from NCBI.
thionin.gde.gen - thionin.gen was read into GDE and then saved using SaveAs.
Warning:
Although the differences between these two formats may look minor, they
may break programs that read GenBank files. Therefore, when retrieving
GenBank files using GDE or dGDE, it is best to save them directly to a
file, rather than sending them to GDE and later saving.
GDE format - This format is
similar to the ASN.1 format used in
databases. It preserves most of the annotation from the original
GenBank file.
Example:
thionin.gde - thionin.gen was read into GDE and then saved using SaveAs.
GDE flatfile format - This is a
minimal format, which only retains the name and sequence data. It is
used by GDE to create temporary files, and there is no way to directly
save to this format.
Example:
gde6162_1 - GDE flat file using thionin sequences from above.
The first character of the name of each "sequence" in a GDE flat file
indicates the type of data:
# - DNA/RNA
% - protein
@ - mask sequence
" - text
For example, a GDE flat file containing a list of GI numbers for DNA
sequences from Brassica insularis (TaxID 69183) might look like this:
"69183.dna.gi
75707982,57638973,57638879,44888690,9719214,2598264,2598263
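The type prefixes above can be read programmatically. The following is a minimal Python sketch, not GDE's own reader: it assumes that every line beginning with one of the four type characters starts a new record and that all following lines are that record's data. The function name parse_gde_flat is hypothetical.

```python
# Type characters as listed in this document.
TYPE_CODES = {"#": "DNA/RNA", "%": "protein", "@": "mask sequence", '"': "text"}


def parse_gde_flat(text):
    """Split GDE flat file text into records (assumed layout, see lead-in)."""
    records = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in TYPE_CODES:
            # A type character starts a new record; the rest of the line is the name.
            current = {"type": TYPE_CODES[line[0]], "name": line[1:].strip(), "data": ""}
            records.append(current)
        elif current is not None:
            # Any other line is data belonging to the current record.
            current["data"] += line.strip()
    return records


example = '"69183.dna.gi\n75707982,57638973,57638879,44888690,9719214,2598264,2598263\n'
records = parse_gde_flat(example)
gi_numbers = records[0]["data"].split(",")
print(records[0]["type"], records[0]["name"], len(gi_numbers))
```

Here the leading " marks the record as text, so the data line is treated as a comma-separated list rather than as sequence.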
GDE
The File --> Import Foreign Format menu calls readseq, which can
translate a number of sequence formats into a pseudo-GenBank format.
The output is then read into GDE. It is important to realize that no
other file format contains the complete information found in GenBank
files. Usually, only the name of the sequence and the sequence itself
are retained. Even if this function is used to read in a GenBank file,
the fact that it is processed by both readseq and GDE means that the
full annotation may not be retained.
Note
on circular sequences: The only common sequence file format that
can
specify whether a sequence is circular or linear is GenBank. Always use
this format when working with circular sequences.
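In a GenBank file, the topology is recorded on the LOCUS line as the keyword "linear" or "circular". The following minimal Python sketch reads it, assuming a whitespace-separated LOCUS line. The function name genbank_topology and the two sample LOCUS lines are hypothetical and are used only for illustration.

```python
def genbank_topology(locus_line):
    """Return 'circular' or 'linear' from a GenBank LOCUS line, or None if absent."""
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    # Skip the keyword and the locus name, then look for the topology keyword.
    for token in tokens[2:]:
        if token.lower() in ("circular", "linear"):
            return token.lower()
    return None  # no topology keyword on this line


# Hypothetical LOCUS lines, made up for this example.
plasmid = "LOCUS       pEXAMPLE     4361 bp    DNA     circular SYN 01-JAN-2000"
gene = "LOCUS       EXAMPLE1     1200 bp    mRNA    linear   PLN 01-JAN-2000"

print(genbank_topology(plasmid))
print(genbank_topology(gene))
```

A check like this can confirm that the circular flag is still present after a file has passed through a conversion step.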
The File --> Export Foreign Format menu also calls readseq, to create
files in numerous formats. Again, since GDE cannot necessarily export
all of the original annotation, some annotation may not appear in the
exported files.
Example:
thionin.fasta - Fasta-format file
using thionin sequences from above.
dGDE
Most of what dGDE does is send lists of data items to SeqHound; the
results are returned as a second list.
dGDE works with lists of data, such as GI numbers, TaxID numbers, etc.
The File --> Import Namefile menu translates a list file into a GDE
flat file, and imports the flat file. A list file is simply a file
containing names or numbers, one per line.
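The translation between a list file and a GDE flat file can be sketched in Python. This is an illustration only and not the code that dGDE runs: it assumes the flat file layout shown in the Brassica insularis example earlier in this document (a name line beginning with ", followed by the items joined by commas). Both function names are hypothetical.

```python
def namefile_to_flat(list_text, name):
    """Turn a one-item-per-line list into an assumed GDE flat text record."""
    items = [line.strip() for line in list_text.splitlines() if line.strip()]
    # The leading double quote marks the record as text data.
    return '"' + name + "\n" + ",".join(items) + "\n"


def flat_to_namefile(flat_text):
    """Turn a single text record back into a one-item-per-line list."""
    lines = flat_text.splitlines()
    data = "".join(line.strip() for line in lines[1:])  # skip the name line
    return "\n".join(item for item in data.split(",") if item) + "\n"


list_file = "75707982\n57638973\n57638879\n"
flat = namefile_to_flat(list_file, "69183.dna.gi")
print(flat, end="")
print(flat_to_namefile(flat), end="")
```

Converting to a flat file and back returns the original list, which is the round trip performed by the Import Namefile and Export Namefile menus.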
Example:
The
corresponding protein GIs could be retrieved using Retrieve --> protein GIs.
Similarly, the File --> Export
Namefile menu saves the new list of protein GIs as a flat
file, which is then translated into a raw list file.
Example:
mGDE
mGDE works with discrete state data, such as molecular markers. The
File --> Import Phylip Discrete Data menu translates a Phylip discrete
data file into a GDE flat file, and imports the flat file.
Example:
Similarly, the File --> Export
Phylip Discrete Data menu saves discrete state data as a flat
file, which is then translated into a Phylip discrete data file.
Importing Data from a Spreadsheet
Molecular marker data is often assembled in a spreadsheet, as
illustrated below. Marker data for each species, isolate, or strain
should be in a row. The first column holds the name of the species,
isolate, or strain, and the remaining columns hold the marker data, such
that each column is a different marker (i.e. locus).
To use this data in mGDE, it must be saved as a Comma-Separated Value
file (.csv).
If you are using StarOffice or OpenOffice Calc:
- choose File --> SaveAs
- set the File Type to "Text CSV (.csv)"
- make sure the field delimiter is a comma (,)
If you are using Microsoft Excel:
- choose File --> SaveAs
- choose Other formats
- set Save as type to "CSV (Comma delimited) (*.csv)"
For example, if your spreadsheet file was MarkerExample.xls, then save
your data to MarkerExample.csv.
At this point it is a good idea to check your .csv file by looking at
it in a text editor. It should look something like this:
"LR27
",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1
"LR29
",0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1
"LR210
",1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1,0,0,1,1,1,1
"LR211
",1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,1,1,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,1,1,1,1,0,1,0,1,1,1,1,0,1,1,1,1,1
"LR212
",1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
"LR213
",1,1,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,0,1,0,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
"LR214
",0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,1,0,0,1,1,1,0,0,0,0
"LR215
",0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1
"LR216
",1,0,1,1,0,1,0,0,1,1,0,0,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,1,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1
"LR217
",0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
"LR218
",0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,1,1,1,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1
"LR219
",1,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1
"LR220
",1,0,1,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,1,1,0,0,1,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Note: The example above shows that the name field is
enclosed in quotes. These are okay, as quotes will be automatically
stripped out when the file is read by mGDE.
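Checking the .csv file can also be done with a short script. The following Python sketch is one possible check, not part of mGDE: it reads the file with the standard csv module (which removes the quotes around the name field), trims stray whitespace or line breaks from each name, and confirms that every row has the same number of markers. The function name read_marker_csv and the shortened sample rows are hypothetical.

```python
import csv
import io


def read_marker_csv(handle):
    """Return a list of (name, markers) tuples from a marker CSV file handle."""
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip empty lines
        name = record[0].strip()  # drop padding or a line break inside the quotes
        markers = [field.strip() for field in record[1:]]
        rows.append((name, markers))
    widths = {len(markers) for _, markers in rows}
    if len(widths) > 1:
        raise ValueError("rows have differing numbers of markers: %s" % sorted(widths))
    return rows


# Shortened, made-up rows; the second name contains a line break inside the quotes.
sample = '"LR27 ",0,0,1,0\n"LR29\n",0,0,0,1\n'
rows = read_marker_csv(io.StringIO(sample))
for name, markers in rows:
    print(name, "".join(markers))
```

For a real file, open it with open("MarkerExample.csv", newline="") and pass the handle to the function.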
To import this file
into mGDE, choose File --> Import
Discrete Data from CSV file.
The .csv file is read into mGDE, which should appear as shown below.
Note: you could also convert a .csv file into a Phylip file at the Unix
command line using csv2phyl.sh.
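For readers who want to see what such a conversion involves, here is a minimal Python sketch. It is not the csv2phyl.sh script, whose exact behaviour is not described here. It assumes the standard Phylip layout: a header line giving the number of species and the number of characters, then one line per species with the name padded to 10 characters followed by the character states. The function name csv_to_phylip is hypothetical.

```python
import csv
import io


def csv_to_phylip(csv_text):
    """Convert marker CSV text to Phylip discrete data text (assumed layout)."""
    rows = [record for record in csv.reader(io.StringIO(csv_text)) if record]
    if not rows:
        raise ValueError("no data rows found")
    names = [record[0].strip() for record in rows]
    states = []
    for record in rows:
        fields = [field.strip() for field in record[1:]]
        if any(len(field) != 1 for field in fields):
            raise ValueError("each marker must be a single character state")
        states.append("".join(fields))
    if len({len(s) for s in states}) != 1:
        raise ValueError("rows have differing numbers of markers")
    lines = ["%d %d" % (len(rows), len(states[0]))]
    for name, state in zip(names, states):
        # Phylip names occupy exactly 10 characters: truncate or pad with spaces.
        lines.append(name[:10].ljust(10) + state)
    return "\n".join(lines) + "\n"


phylip_text = csv_to_phylip('"LR27",0,0,1\n"LR29",0,1,1\n')
print(phylip_text, end="")
```

Names longer than 10 characters are truncated in this sketch, so they should be checked for uniqueness before conversion.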
tGDE