This tutorial introduces some concepts that will help you leverage the power of
Unix to work more efficiently.
1. The Computer: What's under the hood, and when does it matter
2. The Home Directory: Do everything from the comfort of your $HOME
3. Organizing your files: A place for everything, and everything in its place
4. Text files: It's actually quite simple
5. Screen Real Estate: Why one window should not own the screen.
6. Network-centric Computing - Any user can do anything from anywhere
1. The Computer: What's under the hood, and when does it matter
1.1 What is Unix?
Unix is an operating system, that is, an environment that
provides commands for creating, manipulating, and examining files, and for
running programs. But behind the scenes, an operating system also
manages system resources, and orchestrates the running of anywhere from
dozens to hundreds of programs that may be running at the same time.
Some other operating systems with which you may be familiar include
MS-Windows and Macintosh OSX. Despite their
differences, all of these operating systems do essentially
the same thing, which is to act as the unifying framework within which
tasks are performed.
Unix is the system of choice for scientific and mathematical work, as well
as for enterprise-level systems and servers. This is because Unix was
designed as a multitasking, multiuser, networked system that had
to be reliable and responsive under heavy loads, have 24/7
availability, and be highly secure.
Windows was designed as a single-user desktop system, primarily for running one
program at a time. Higher-level capabilities such as networking,
multitasking, running several simultaneous users, and server functions
have all been retrofitted into Windows. Security has long been, and is
still, a serious problem on the Windows platform.
The Unix family of operating systems includes commercial Unix systems such as
Solaris, and the many different distributions of Linux, most of which
are free, as well as Apple's proprietary OSX.
standalone PC: The network is the computer
1.2.1 Every PC is a special case
1.2.2. The network is the computer
- Each computer is a bit different from every other
happens on your PC
- Your data tends to be spread out among a number of machines
programs on different machines
- No way to remotely log in to most Windows PCs.
PCs actually get backed up?
The standalone PC is only one of many ways of using computer resources. This figure
illustrates the three main functions of computers: File Services,
Processing, and Display. The figure is meant to be generic. On a PC,
all three functions occur in a single machine. For this reason, a PC is
sometimes referred to as a "fat client".
There is no reason that these functions have to be on the same machine. For
example, on distributed Unix systems, files reside on a file server,
processing is done on login hosts, and you can run a desktop session on
any login host, and the desktop will display on a "thin client". Since
the thin client does nothing but display the desktop, it
doesn't matter what kind of machine is doing the display. A thin client
can be a specialized machine, like a SunRay
terminal, or just a PC running thin client software.
A compromise between a thin client and a fat client is the "lean client".
Essentially, a lean client is a computer that carries out both the
Display and Processing functions, but remotely mounts filesystems from
the fileserver, which behave as if they were on the machine's own hard
drive. Many computer labs are configured in this way to save on
administration work, at the expense of extra network traffic.
One example of network-centric computing is Google Docs.
Google Docs lets you maintain documents and spreadsheets
online, using any web browser. Your documents stay on the server, and you
can work on them from any browser on any computer anywhere.
Advantages of network-centric computing:
- You can access your data from anywhere
- Full access to the resources of a datacenter from anywhere
- Protection from obsolescence, when you use thin clients
- High availability, because all components are redundant
- There is nothing to lose (eg. memory stick), nothing that can
be stolen (eg. laptop)
- Once software is installed, it works for everyone
- Automated backups
More and more resources reside on the network. This is now referred to
as "cloud computing":
- Databases - Remote server returns
data in response to queries from local clients.
- Application servers - Application
runs on remote host, but displays on local client
- Web services - Local client sends
data to web service; service returns the result
- Computing Grid - Analogous to the
electrical grid. A community of servers on a high-speed network shares
computing resources, including CPU time. Different parts of a job can
be done on different machines, transparently to the client.
Network-centric computing can be summarized in a single sentence: any user
can do any task from anywhere.
1.3 File systems - share or serve?
Unix systems typically include many machines, all of which remotely
mount files from a file server. From the user's point of view, it looks
as if the files are on their own hard drive. There are many advantages
to using a file server. First, all machines on a LAN will have the same
files and directories available, regardless of which desktop machine
you use. Secondly, a file server makes it possible to standardize
practices which contribute to data integrity, including security
protocols and scheduled automated backups. Finally, file servers
typically store data redundantly using RAID protocols, protecting
against loss of data due to disk failure.
Many LANs support peer to peer file sharing. In file sharing, each PC
on the LAN may have some files or directories that are permitted to be
shared with others. Again, from each user's perspective, it looks as if
the file is on their own hard drive. However, file sharing also raises
many potential security problems. As well, data integrity is only as
good as the hard drive a file is actually on, and whatever steps the
owner of that PC may or may not have taken to back up files.
1.4 The command line - Sometimes, typing is WAY easier than point and click
One of the great
strengths of Unix is the wealth of commands available. While typing
commands might seem like a stone-age way to use a computer, commands
are essential for automating tasks, as well as for working with large
sets of files, or extracting data from files. For example, when you use
a DNA sequence to search the
GenBank database for similar sequences, the best matching sequences are
summarized, as excerpted below:
gb|EU920048.1| Vicia faba clone 042 D02 defensin-like protein mR... 143 1e-32
gb|EU920047.1| Vicia faba clone 039 F05 defensin-like protein mR... 143 2e-32
gb|EU920044.1| Vicia faba clone 004 C04 defensin-like protein mR... 143 2e-32
gb|FJ174689.1| Pisum sativum pathogenesis-related protein mRNA, ... 139 3e-31
gb|L01579.1|PEADRR230B Pisum sativum disease resistance response... 132 4e-29
A search like this can return dozens of hits. If you wanted to retrieve all matching sequences
from NCBI, you would need the accession numbers, found between
the pipe characters "|". Rather than having to copy and paste each
accession number to create a list for retrieval, a file containing
the list could be created in a single Unix command:
grep 'gb|' AY313169.blast | cut -f2 -d '|' > AY313169.acc
This command would cut out the
accession numbers from AY313169.blast and write them to a
file called AY313169.acc. This list could now be used to retrieve all
sequences in a single step.
The grep command searches for the string 'gb|' in the file
AY313169.blast, and writes all lines matching that string to
output. The next pipe character sends that output to the cut
command. The cut command splits each line into several fields, using
'|' as a
delimiter between fields. Field 2 from each line is written
to the file AY313169.acc.
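To try the command without running a real search, the sketch below writes a few hit lines in the style of the excerpt into a scratch file and runs the same grep and cut pipeline on it. The scratch directory, the file names sample.blast and sample.acc, and the header line are stand-ins chosen for this illustration; a real BLAST report contains many more lines.

```shell
# Scratch directory so the example does not touch real files
# (an assumption made for this illustration).
workdir=$(mktemp -d)

# One header line that should be ignored, followed by three hit lines.
cat > "$workdir/sample.blast" <<'EOF'
Illustrative header line without an accession number
gb|EU920048.1| Vicia faba clone 042 D02 defensin-like protein mR... 143 1e-32
gb|EU920047.1| Vicia faba clone 039 F05 defensin-like protein mR... 143 2e-32
gb|L01579.1|PEADRR230B Pisum sativum disease resistance response... 132 4e-29
EOF

# grep keeps only lines containing 'gb|';
# cut takes field 2 of each line, using '|' as the field delimiter.
grep 'gb|' "$workdir/sample.blast" | cut -f2 -d '|' > "$workdir/sample.acc"

# Show the resulting list of accession numbers, one per line.
cat "$workdir/sample.acc"
```

The header line is dropped by grep because it does not contain 'gb|', so only the three accession numbers reach the output file.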
If you learn the commands listed below, you will be able to do the vast
majority of what you need to do on the computer, without having to learn the
literally thousands of other commands that are present on the system.
- chmod - set read, write, execute permissions for files
- cut - cut out one or more columns of text from a file
- grep - search a file for a string
- less - view a file one page at a time
- man - read Unix manual pages
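As a small illustration of how a few of these commands fit together, the following sketch sets permissions on a scratch file with chmod, uses cut to pull the permission string out of a long listing, and uses grep to find a line of text. The file name notes.txt and its contents are hypothetical, chosen only for this example.

```shell
# Scratch directory and file (hypothetical names for this illustration).
workdir=$(mktemp -d)
notes="$workdir/notes.txt"
printf 'first line\nsecond line\n' > "$notes"

# chmod: the owner may read and write; group and others get no access.
chmod 600 "$notes"

# ls -l prints the file type and permissions in the first 10 characters
# of the line; cut -c1-10 keeps just those characters.
perms=$(ls -l "$notes" | cut -c1-10)
echo "$perms"

# grep: print the lines of the file that contain the string 'second'.
match=$(grep 'second' "$notes")
echo "$match"
```

For interactive use, `man chmod` shows the manual page for chmod, and `less notes.txt` pages through a file one screen at a time.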
1.5 What do programs actually do?
The cell is a good analogy for how a computer works. An enzyme takes a
substrate and modifies it to produce a product. In turn, that product
might be used as a substrate by another enzyme, to produce another
product. From these simple principles, elaborate pathways
can be described.
Similarly, computer programs take input and produce output. For
example, program 1 might read a genomic DNA sequence and
write the mRNA
sequence to the output. Program 2 might translate the RNA to protein,
and Program 3 might predict secondary structural features of
the protein. Alternatively, program 4 might predict secondary
structures from the mRNA.
The process of chaining together several programs to
perform a complex
task is known as 'data pipelining'.
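The idea of chaining programs can be sketched with standard Unix tools, each one reading the output of the one before it. This is only a toy stand-in for the programs described above: the DNA sequence is made up, replacing T with U is a crude substitute for real transcription, and the variable names are hypothetical.

```shell
# A short made-up DNA sequence (hypothetical data for this illustration).
dna="ATGGCTTGGTAA"

# Step 1 (tr): stand-in for transcription, replacing every T with U.
# Step 2 (fold): break the RNA into three-letter codons, one per line.
codons=$(printf '%s\n' "$dna" | tr 'T' 'U' | fold -w 3)
echo "$codons"

# Step 3 (wc): a third program takes the codons as input and counts them.
codon_count=$(printf '%s\n' "$codons" | wc -l | tr -d ' ')
echo "$codon_count codons"
```

Each step knows nothing about the others; it simply reads text and writes text, which is what makes this kind of chaining possible.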
A subtlety that is sometimes missed about computers has to
do with the
roles of random access memory (RAM) and the hard drive. Programs don't
actually work directly on files that are on the hard drive. When
you open a file in a program, a copy of that file is read
from disk and
written into memory. All changes that you make to the file
occur on the
copy in memory.
The original copy of the file on disk is not changed until
you save the
file. At that time, the modified copy in memory is
copied back to
disk, overwriting the original copy.
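The open, edit, save cycle can be imitated at the command line. In this sketch a shell variable stands in for the program's memory, which is a simplification: real programs manage memory in their own ways. The file name seq.txt and its contents are hypothetical.

```shell
# Scratch file on disk (hypothetical name and contents).
workdir=$(mktemp -d)
file="$workdir/seq.txt"
printf 'ACGT\n' > "$file"

# "Open" the file: read a copy of its contents into a shell variable,
# which stands in for the program's memory.
in_memory=$(cat "$file")

# Edit the copy in memory. The file on disk is untouched at this point.
in_memory="${in_memory}TTTT"
before_save=$(cat "$file")
echo "on disk before save: $before_save"

# "Save": write the modified copy back to disk, overwriting the original.
printf '%s\n' "$in_memory" > "$file"
after_save=$(cat "$file")
echo "on disk after save:  $after_save"
```

Until the save step runs, anything that reads seq.txt still sees the original contents.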
2. The Home Directory*: Do everything from the comfort of your $HOME
One of the features of Unix that contributes to its
reliability and security, and to its ease of system administration, is
the compartmentalization of user and system data. The figure below shows
the highest-level directories of the directory tree. To cite a few
examples, /bin contains binary executables, /etc contains system
configuration files, and /usr contains most of the installed software.
One of the most important directories is /home, the directory in which
each user has their own home directory. Rather than having data for
each user scattered across the directory tree, all files belonging to
each user are found in their home directory. For example, all files
belonging to a user named 'homer' are found in /home/homer.
Subdirectories such as 'beer', 'doughnuts', and 'nuclear_waste'
organize his files into topics. Similarly, the home directory for bart
is /home/bart, and is organized according to bart's interests.
Most importantly, the only place that homer or bart can create, modify,
or delete files is in their home directories. They can neither read nor
write files anywhere else on the system, unless permissions are
specifically set to allow them to do this. Thus, the worst any user can
do is to damage their own files, and the files for each user are
protected from other users.
* In Unix, the term directory is synonymous with folder. The two terms
can be used interchangeably.
in home directory
- Your data is in your home directory, and nowhere else!
- Users can only read/write their own home directories
3. Organizing your files: A place for everything, and everything in its place
Most users already know about organizing their files into a tree-structured hierarchy of
folders. On Unix you can organize your files using a file manager. Some
good guidelines to follow:
- Organize your files by topic, not by type. It makes no sense
to put all presentations in one folder, all images in another folder,
and all documents in another folder. Any given task or project will
generate files of many kinds, so it makes sense to put all files
related to a particular task into a single folder or folder tree.
- Each time you start a new task or project or experiment,
create a new folder.
- Your home
directory should be mostly composed of subdirectories. Leave individual
files there only on a temporary basis.
- Directory organization is for your convenience. Whenever a set of files
all relate to the same thing, dedicate a directory to them.
- If a directory
gets too big (eg. more files than will fit on the screen when you type 'ls
-l'), it's time to split it into two or more subdirectories.
- On Unix/Linux, a new account will often have a Documents
directory, which is confusing and makes no sense, since your HOME
directory already serves the purpose of a Documents directory in
Windows. It is best to just delete the Documents directory and work
directly from your HOME directory.
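The guidelines above might look like this at the command line. This is a sketch only: a scratch directory stands in for your home directory so that nothing real is touched, and the project and file names are hypothetical.

```shell
# Scratch directory standing in for $HOME (an assumption for this sketch).
home_dir=$(mktemp -d)

# One folder per project or task, with subfolders as the project grows.
mkdir -p "$home_dir/defensin_project/sequences" \
         "$home_dir/defensin_project/figures" \
         "$home_dir/thesis"

# Files of different kinds that relate to the same task live together,
# rather than being sorted by type across the whole home directory.
touch "$home_dir/defensin_project/sequences/AY313169.blast" \
      "$home_dir/defensin_project/figures/alignment.png" \
      "$home_dir/defensin_project/notes.txt"

# List every file in the resulting tree.
find "$home_dir" -type f | sort
```

The top level contains only directories, and everything belonging to one project can be found, copied, or archived as a unit.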
4. Text files: It's actually quite simple
A text editor is a
program that lets you enter data into
files, and modify it, with a minimal amount of fuss. Text editors are
distinct from word processors in two crucial ways. First, the text
editor is a much simpler program, providing none of the formatting
features (eg. footnotes, special fonts, tables, graphics, etc.)
that word processors provide. This means that the text editor is
simpler to learn, and what it can do
is adequate for the task of entering a sequence, changing
a few lines of
text, or writing a quick note to send by electronic mail. For these
tasks, it is easier and faster to use a text editor.
Two of the most commonly used text editors with graphic
interfaces are Nedit
Both are available
on most Unix and Linux systems.
Example of a text editor editing a computer-readable file specifying the
alternative genetic code used in flatworm mitochondria.
The second important difference between word processors and
text editors is the way in which the data is stored. The price you pay for
having underlining, bold face, multiple columns, and other features in
word processors is the embedding of special computer codes within the
file. If you used a word processor to enter data, your datafile would
thus also contain these same codes. Consequently, only the word processor
can directly manipulate the data in that file.
Text editors offer a way out of this dilemma, because files
produced by a text editor contain only the characters that appear on
the screen, and nothing more. These files are sometimes referred to as ASCII files,
since they only contain standard ASCII characters.
Generally, files created by Unix or by other programs are
ASCII files. This seemingly innocuous fact is of great importance,
because it implies a certain universality of files. Thus, regardless of
which program or Unix command was used to create a file, it can be
viewed on the screen ('cat filename'),
sent to the printer ('lpr filename'), appended
to another file ('cat
filename1 >> filename2'),
or used as input by other programs. More importantly, all ASCII files
can be edited with any text editor.
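A short sketch of these operations on two scratch files follows. The file contents are made up for illustration, and the lpr step is left out because it needs a configured printer.

```shell
# Two small text files in a scratch directory (made-up contents).
workdir=$(mktemp -d)
printf 'line from file 1\n' > "$workdir/filename1"
printf 'line from file 2\n' > "$workdir/filename2"

# View a file on the screen.
cat "$workdir/filename1"

# Append the contents of filename1 to the end of filename2.
cat "$workdir/filename1" >> "$workdir/filename2"

# Use a text file as input to another program: here, wc counts its lines.
line_count=$(wc -l < "$workdir/filename2" | tr -d ' ')
echo "$line_count"
```

Note that '>>' appends to the target file, whereas a single '>' would overwrite it.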
If you plan to do a
lot of work at the command line, you will need a text editor that does
not require a graphic interface. Several common editors include:
- nano - a very simple editor
- vi - the universal screen editor, available with
all UNIX implementations
- emacs - an
editor with many advanced capabilities for programming; it
also has a
long learning curve
5. Screen Real Estate: Why one window should not own the screen.
One of the most
counter-productive legacies from the early PC era is that "one
window owns the screen". Many applications start up taking the
entire screen. This made sense when PC monitors were small with low
pixel resolution. It makes no sense today when the trend is toward
bigger monitors with high resolution. The image below shows a typical
Unix screen, in which each window takes just the space it needs, and no
more. Particularly in bioinformatics, you will be working on a number
of different datafiles, or using several different programs at the same
time. The idea is that by keeping your windows small, you can easily
move from one task to another by moving to a different window.
Most Unix desktops today give you a second way to add more real estate to your
screen. The toolbar at the lower right hand corner of the figure is
the Workspace Switcher. If the current screen gets too cluttered with
windows, the Workspace Switcher lets you move back and forth among
several virtual screens at the click of a button. This is a great
organizational tool when you have a number of independent jobs going on
at the same time.
6. Network-centric Computing - Any user can do anything from anywhere
Running remote Unix sessions at home or when traveling
Since all Unix and Linux systems are servers, you can always run a
remote Unix session from any computer, anywhere.
see Using Unix from
Uploading and downloading files across the network
Email is not the best way to move files across a network. There are better tools
for this purpose. On Unix and Linux systems, one of the best tools is
gFTP. gFTP gives you two panels, one for viewing files on the local
system, and the other for viewing files on the remote system. In the
example below, the left panel shows folders in the user's local home
directory. The right panel shows the user's files on the coe01 server
at the University of Calgary. Copying files, or entire directory trees,
from one system to the next is as easy as selecting them in one panel
and clicking on the appropriate green arrow button. For security, gFTP
uses ssh to encrypt all network traffic, so that no one can eavesdrop
on your upload or download.