This section introduces some concepts that will help you leverage the power of Unix to work more efficiently.
1. The Computer: What's under the hood, and when does it matter
2. The Home Directory: Do everything from the comfort of your $HOME
3. Organizing your files: A place for everything, and everything in its place
4. Text files: It's actually quite simple
5. Screen Real Estate: Why one window should not own the screen
6. Network-Centric Computing: Any user can do anything from anywhere
1. The Computer: What's under the hood, and when does it matter
1.1 What is Unix?

Unix is an operating system: an environment that provides commands for creating, manipulating, and examining datafiles, and for running programs. But behind the scenes, an operating system also manages system resources and orchestrates the running of anywhere from dozens to hundreds of programs that may be running at the same time. Some other operating systems with which you may be familiar are MS-Windows and Macintosh OSX. Despite their differences, all of these operating systems do essentially the same thing, which is to act as the unifying framework within which all tasks are carried out. Unix is usually the system of choice for scientific and mathematical work, as well as for enterprise-level systems and servers. This is because Unix was designed as a multitasking, multiuser, networked system that had to be reliable and responsive under heavy loads, have 24/7 availability, and be highly secure.
Windows, by contrast, was designed as a single-user desktop system, primarily for running one program at a time. Higher-level capabilities such as networking, multitasking, running several simultaneous users, and server functions have all been retrofitted into Windows. Security has long been, and still is, a serious problem on Windows. The Unix family of operating systems includes commercial Unix systems such as Sun's Solaris, and the many different distributions of Linux, most of which are free, as well as Apple's proprietary OSX.
Linux vs. Unix - Strictly speaking, Unix is a proprietary operating system owned by AT&T. Linux is a Unix-like operating system, written from scratch to function like Unix. Linux is Open Source software. There are numerous Linux distributions, most of them freely available, as well as value-added commercial distributions. Many people use the terms Unix and Linux interchangeably.
1.2 Beyond the standalone PC: The network is the computer
1.2.1 Every PC is a special case

- Each computer is a bit different from every other
- All of your work happens on your PC
- Your data tends to be spread out among a number of machines
- Different programs on different machines
- No way to remotely login to most Windows PCs
- How many PCs actually get backed up?

1.2.2 The network is the computer
The standalone PC is only one of many ways of using computer resources. This figure illustrates the three main functions of computers: File Services, Processing, and Display. The figure is meant to be generic. On a PC, all three functions occur in a single machine. For this reason, a PC is sometimes referred to as a "fat client".
However, there is no reason that these functions have to be on the same machine. For example, on distributed Unix systems, files reside on a file server, processing is done on login hosts, and you can run a desktop session on any login host, with the desktop displaying on a "thin client". Because the thin client does nothing but display the desktop, it doesn't matter what kind of machine is doing the display. A thin client can be a specialized machine or just a PC running thin client software.
A compromise between a thin client and a fat client is the "lean client". Essentially, a lean client is a computer that carries out both the Display and Processing functions, but remotely mounts filesystems from the fileserver, which behave as if they were on the machine's own hard drive. Many computer labs are configured in this way to save on system administration work, at the expense of extra network traffic.
Advantages of network-centric computing:

A familiar example of network-centric computing is Google Docs. Google Docs lets you maintain documents, spreadsheets, and presentations online, using any web browser. Your documents stay on the server, so you can work on them from any browser on any machine.

- You can access your data from anywhere
- Full access to the resources of a datacenter from anywhere
- Protection from obsolescence, when you use thin clients
- High availability, because all components are redundant
- There is nothing to lose (e.g. memory stick), nothing that can be stolen (e.g. laptop)
- Once software is installed, it works for everyone
- Automated backups
Increasingly, more resources reside on the network. This is now referred to as "cloud computing":

- Databases - Remote databases return data in response to queries from local clients
- Application servers - the application runs on a remote host, but displays on the local client
- Web services - the local client sends data to a web service; the service returns the result of the computation
- Computing Grid - Analogous to an electrical grid. A community of servers on a high-speed backbone shares computing resources, including CPU time. Different parts of a job may be done on different machines, transparently to the client.
All of network-centric computing can be summarized in a single sentence: any user can do any task from anywhere.
1.3 File systems - share or serve?
Unix systems typically include many machines, all of which
remotely mount files from a file server. From the user's point of
view, it looks as if the files are on their own hard drive. There
are many advantages to using a file server. First, all machines on
a LAN will have the same files and directories available,
regardless of which desktop machine you use. Secondly, a file
server makes it possible to standardize best practices which
contribute to data integrity, including security protocols and
scheduled automated backups. Finally, file servers typically store data redundantly using RAID protocols, protecting against loss of data due to disk failure.
Many LANs support peer-to-peer file sharing. In file sharing, each
PC on the LAN may have some files or directories that are
permitted to be shared with others. Again, from each user's
perspective, it looks as if the file is on their own hard drive.
However, file sharing also invites many potential security
problems. As well, data integrity is only as good as the hard
drive a file is actually on, and whatever steps the owner of that
PC may or may not have taken to back up files.
1.4 The Unix command line - Sometimes, typing is WAY easier than pointing and clicking
One of the
strengths of Unix is the wealth of commands available. While
typing commands might seem like a stone-age way to use a computer,
commands are essential for automating tasks, as well as for
working with large sets of files, or extracting data from files.
For example, when you use a DNA sequence to search the GenBank
database for similar sequences, the best matching sequences are
summarized, as excerpted below:
gb|EU920048.1| Vicia faba clone 042 D02 defensin-like protein mR... 143 1e-32
gb|EU920047.1| Vicia faba clone 039 F05 defensin-like protein mR... 143 2e-32
gb|EU920044.1| Vicia faba clone 004 C04 defensin-like protein mR... 143 2e-32
gb|FJ174689.1| Pisum sativum pathogenesis-related protein mRNA, ... 139 3e-31
gb|L01579.1|PEADRR230B Pisum sativum disease resistance response... 132 4e-29

The full output contains dozens of hits. If you wanted to retrieve all matching sequences from NCBI, you would need the accession numbers, found between the pipe characters "|". Rather than having to copy and paste each accession number to create a list for retrieval, a file containing the list could be created with a single Unix command:

grep 'gb|' AY313169.blast | cut -f2 -d '|' > AY313169.acc
This command cuts out the accession numbers from AY313169.blast and writes them to a file, AY313169.acc. This list could then be used to retrieve all of the matching sequences in a single step. The grep command searches for the string 'gb|' in the file AY313169.blast, and writes all lines matching that string to its output. The pipe character sends that output to the cut command. The cut command splits each line into several fields, using '|' as a delimiter between fields. Field 2 from each line is written to the file.
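You can try this pipeline on a miniature version of the report; the file name and contents below are illustrative stand-ins for real BLAST output:

```shell
# Build a two-line stand-in for a BLAST summary file.
printf '%s\n' \
  'gb|EU920048.1| Vicia faba clone 042 D02 defensin-like protein' \
  'gb|FJ174689.1| Pisum sativum pathogenesis-related protein' > sample.blast

# Keep the 'gb|' lines, then cut out field 2 (the accession number).
grep 'gb|' sample.blast | cut -f2 -d '|' > sample.acc

cat sample.acc
# EU920048.1
# FJ174689.1
```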
If you learn the commands listed below, you will be able to do the vast majority of what you need to do on the computer, without having to learn the literally thousands of other commands that are present on the system.

- chmod - set read, write, execute permissions for files
- cut - cut out one or more columns of text from a file
- grep - search a file for a string
- more - view a file one page at a time
- man - read Unix manual pages
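As a small example of setting file permissions with chmod (the file name here is arbitrary):

```shell
touch myscript.sh        # create an empty file (default: no execute bit)
chmod u+x myscript.sh    # grant the owner execute permission
ls -l myscript.sh        # the mode string now starts with -rwx
```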
1.5 What do programs actually do?

An enzyme in a cell is a good analogy for how a computer works. An enzyme takes a substrate and modifies it to produce a product. In turn, that product might be used as a substrate by another enzyme, to produce yet another product. From these simple principles, elaborate biochemical pathways can be described.

Similarly, computer programs take input and produce output. For example, program 1 might read a genomic DNA sequence and write the mRNA sequence to the output. Program 2 might translate the RNA into protein, and program 3 might predict secondary structural features of the protein. Alternatively, program 4 might predict secondary structures from the mRNA.
The process of chaining together several programs to perform a complex task is known as 'data pipelining'.
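The idea can be sketched with ordinary Unix tools, each program transforming the previous one's output:

```shell
# A three-stage pipeline: sort groups identical lines, uniq -c counts
# each group, and sort -rn ranks the counts from largest to smallest.
printf '%s\n' Vicia Pisum Vicia Vicia Pisum | sort | uniq -c | sort -rn
```

Here each stage plays the role of one "enzyme", consuming the product of the stage before it; the most frequent name, Vicia, ends up ranked first.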
A subtlety that is sometimes missed about computers has to do with the roles of random access memory (RAM) and the hard drive. Programs never actually work directly on files that are on the hard drive. When you open a file in a program, a copy of that file is read from disk and written into memory. All changes that you make to the file occur on the copy in memory. The original copy of the file on disk is not changed until you save the file. At that time, the modified copy in memory is copied back to disk, overwriting the original copy.
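The same save-to-overwrite behavior can be sketched at the shell, using a variable as a stand-in for the copy in memory (the file name is illustrative):

```shell
printf 'hello\n' > note.txt          # the original file on disk
content=$(cat note.txt)              # read a copy into "memory"
content="hello world"                # change only the in-memory copy
cat note.txt                         # disk copy still says: hello
printf '%s\n' "$content" > note.txt  # "save": overwrite the disk copy
cat note.txt                         # now says: hello world
```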
2. The Home Directory*: Do everything from the comfort of your $HOME

One of the features of Unix that contributes to its reliability and security, and to its ease of system administration, is the compartmentalization of user and system data. The figure below shows the highest-level directories of the directory tree. To cite a few examples, /bin contains binary executables, /etc contains system configuration files, and /usr contains most of the installed software.

One of the most important directories is /home, the directory in which each user has their own home directory. Rather than having data for each user scattered across the directory tree, all files belonging to each user are found in their home directory. For example, all files belonging to a user named 'homer' are found in /home/homer. Subdirectories such as 'beer', 'doughnuts', and 'nuclear_waste' organize his files into topics. Similarly, the home directory for bart is /home/bart, and is organized according to bart's interests.

Most importantly, the only place that homer or bart can create, modify, or delete files is in their home directories. They cannot write files anywhere else on the system, unless permissions are specifically set to allow them to do this. Thus, the worst any user can do is to damage their own files, and the files of each user are protected from every other user.
* In Unix, the term directory is synonymous with folder. The terms can be used interchangeably.
- Everything is in your home directory
- Your data is in your home dir. and nowhere else!
- Users can only read/write their own home directories
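At the command line, a few commands make the home directory concrete ('projects' is just an example name):

```shell
echo "$HOME"               # the absolute path of your home directory
cd                         # cd with no argument always returns to $HOME
pwd                        # confirm where you are
mkdir -p "$HOME/projects"  # users can freely create directories here
```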
3. Organizing your files: A place for everything, and everything in its place
Most computer users already think about organizing their files into a tree-structured hierarchy of folders. On Unix you can organize your files using a file manager or the command line. Here are some good guidelines to follow:

- Organize your files by topic, not by type. It makes no sense to put all presentations in one folder, all images in another folder, and all documents in another folder. Any given task or project will generate files of many kinds, so it makes sense to put all files related to a particular task into a single folder or folder hierarchy.
- Each time you start a new task or project or experiment, create a new folder.
- Your home directory should be mostly composed of subdirectories. Leave files there only on a temporary basis.
- Directory organization is for your convenience. Whenever a set of files relate to the same thing, dedicate a directory to them.
- If a directory gets too big (e.g. more files than will fit on the screen when you type 'ls -l'), it's time to split it into two or more subdirectories.
- On Unix/Linux, a new account will often have a Documents directory, which is confusing and makes no sense, since your home directory already serves the purpose that a Documents directory serves on Windows. It is best to just delete the Documents directory and work directly from your HOME directory.
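Following the organize-by-topic guideline, starting a new project might look like this (the directory names are illustrative):

```shell
# One directory per project; every file type for the project lives inside.
mkdir -p thesis/data thesis/figures thesis/drafts
ls thesis    # lists data, drafts, figures
```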
4. Text files: It's actually quite simple
A text editor is a program that lets you enter data into files, and modify it, with a minimal amount of fuss. Text editors are distinct from word processors in two crucial ways. First, a text editor is a much simpler program, providing none of the formatting features (e.g. footnotes, special fonts, tables, graphics) that word processors provide. This means that a text editor is simpler to learn, and what it can do is adequate for the task of entering a sequence, changing a few lines of text, or writing a quick note to send by electronic mail. For these tasks, it is easier and faster to use a text editor.
Two of the most commonly used text editors with graphic
interfaces are Nedit
Both are available
on most Unix and Linux systems.
Example of a text editor editing a computer-readable file: the alternative genetic code used in flatworm mitochondria.
The second important difference between word processors and text editors is the way in which the data is stored. The price you pay for having underlining, bold face, multiple columns, and other features of word processors is the embedding of special computer codes within the file. If you used a word processor to enter data, your datafile would thus also contain these same codes. Consequently, only the word processor can directly manipulate the data in that file.

Text editors offer a way out of this dilemma, because files produced by a text editor contain only the characters that appear on the screen, and nothing more. These files are sometimes referred to as ASCII files, since they only contain standard ASCII characters.
Most files created by Unix or by other programs are ASCII files. This seemingly innocuous fact is of great importance, because it implies a certain universality of files. Thus, no matter which program or Unix command was used to create a file, it can be viewed on the screen ('cat filename'), sent to the printer ('lpr filename'), appended to another file ('cat filename1 >> filename2'), or used as input by other programs. More importantly, all ASCII files can be edited with any text editor.
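For instance, the same ASCII file can be viewed, appended, and re-read with nothing but cat (the file names are illustrative):

```shell
printf 'ATG GCT TGA\n' > seq1.txt   # create a small ASCII file
cat seq1.txt                        # view it on the screen
printf 'TAA CCG\n' > seq2.txt
cat seq1.txt >> seq2.txt            # append seq1.txt onto seq2.txt
cat seq2.txt                        # both lines, TAA CCG first
```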
If you plan to do a lot of work at the command line, you will need a text editor that does not require a graphic interface. Several common editors include:

- nano - a very simple editor
- vi - the universal screen editor, available with all UNIX implementations
- emacs - an editor with many advanced capabilities for programming; it also has a long learning curve
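Even without opening an editor interactively, the shell can create a plain-text file with a here-document, which is handy in scripts (the content below is illustrative):

```shell
# Write a short note to disk without an interactive editor.
cat > editor_note.txt <<'EOF'
Flatworm mitochondria use an alternative genetic code.
EOF

cat editor_note.txt   # the file holds exactly the characters shown
```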
5. Screen Real Estate: Why one window should not own the screen
The so-called "desktop metaphor" of today's workstations is instead an
"airplane-seat" metaphor. Anyone who has shuffled a lap full of papers while
seated between two portly passengers will recognize the difference -- one can
see only a very few things at once.
- Fred Brooks, Jr.
One of the counter-productive legacies from the early PC era is that "one window owns the screen". Many applications start up taking over the entire screen. This made sense when PC monitors were small with low pixel resolution. It makes no sense today when the trend is toward bigger monitors with high resolution. The image below shows a typical Unix screen, in which each window takes just the space it needs, and no more. Particularly in bioinformatics, you will be working on a number of different datafiles, or using several different programs at the same time. The idea is that by keeping your windows small, you can quickly move from one task to another by moving to a different window.

Most desktops today give you a second way to add more real estate to the screen. The toolbar at the lower right hand corner of the figure shows the Workspace Switcher. If the current screen gets too cluttered with windows, the workspace switcher lets you move back and forth between several virtual screens at the click of a button. This is a great organizational tool when you have a number of independent jobs running at the same time.
6. Network-Centric Computing - Any user can do anything from anywhere
6.1 Running remote Unix sessions at home or when traveling

Since all Unix and Linux systems are servers, you can always run a session from any computer, anywhere. For details,
see Using Unix from
6.2 Uploading and downloading files across the network

Email is not the best way to move files across a network. There are better tools for this purpose. On Unix and Linux systems, one of the best tools is Filezilla. Filezilla gives you two panels, one for viewing files on the local system, and the other for viewing files on the remote system. In the example below, the left panel shows folders in the user's local home directory. The right panel shows the user's files on the remote server. Copying files, or entire directory trees, from one system to the next is as easy as selecting them in one panel and clicking on the appropriate green arrow button. For security, Filezilla uses ssh to encrypt all network traffic, so that no one can eavesdrop on your upload or download. Filezilla is freely available for download at https://filezilla-project.org.