SOM: Self Organizing Maps

(Tamayo et al., 1999; Kohonen, 1982)

Selecting this analysis displays a dialog in which the user sets the size, topology, and behavior of the SOM. Once the computations are complete, select the SOM node under Analysis to view the SOM results. The subnodes under this node are very similar in form and function to those found beneath the KMC node.


SOM: Expression Image

Basic Terminology

Node

An SOM structure to which expression elements are associated to form clusters. Each node contains an SOM Vector.

SOM Vector

A vector of size n that represents its node's location in the n-dimensional expression space. Distances from this vector to the expression vectors in the input data determine the node with which each expression vector is associated.

Training/Adaptation

The process of repositioning the SOM Nodes by altering their SOM Vectors. Adaptation is triggered when an expression element is associated with a node. The new position is determined by the distance between the expression element and the SOM Vector, the Alpha value, and the neighborhood convention (see below).

Topology

A two-dimensional arrangement of the nodes that defines how node-to-node distances are calculated.

Note that a cluster is a collection of expression elements associated with a Node.
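
The terms above can be illustrated with a minimal sketch. It assumes the standard Kohonen update, in which the winning node's SOM vector moves a fraction Alpha of the way toward the expression vector; the function names (euclidean, winning_node, adapt) are hypothetical and are not taken from the application.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def winning_node(som_vectors, x):
    """Index of the node whose SOM vector is closest to expression vector x."""
    return min(range(len(som_vectors)), key=lambda i: euclidean(som_vectors[i], x))

def adapt(som_vector, x, alpha):
    """Assumed update rule: move the SOM vector a fraction alpha toward x."""
    return [m + alpha * (xi - m) for m, xi in zip(som_vector, x)]

# Two nodes in a 2-dimensional expression space, one expression vector.
som_vectors = [[0.0, 0.0], [4.0, 4.0]]
x = [3.0, 5.0]

w = winning_node(som_vectors, x)              # node 1 is closer to x
som_vectors[w] = adapt(som_vectors[w], x, alpha=0.5)
print(w, som_vectors[w])                      # 1 [3.5, 4.5]
```

With Alpha set to 0.5 the winning vector lands halfway between its old position and the expression vector; smaller values give smaller steps.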

Parameters


SOM Initialization Dialog

Sample Selection

The sample selection option indicates whether to cluster genes or samples.

Dimension X

This positive integer value determines the X dimension of the resulting topology.

Dimension Y

This positive integer value determines the Y dimension of the resulting topology. Note that Dimension X times Dimension Y gives the number of clusters that will be produced.
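
As a small illustration of how the two dimensions fix the number of clusters, the sketch below enumerates the grid position of every node. The function name grid_coordinates and the row-by-row ordering are assumptions made for the example.

```python
def grid_coordinates(dim_x, dim_y):
    """List the (x, y) grid position of every node, row by row."""
    return [(x, y) for y in range(dim_y) for x in range(dim_x)]

# Dimension X = 3 and Dimension Y = 2 give 3 * 2 = 6 nodes, hence 6 clusters.
nodes = grid_coordinates(3, 2)
print(len(nodes))   # 6
print(nodes)        # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```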

Iterations

This positive integer value indicates the total number of times that the data set will be presented to the network (also referred to as the map or graph). Each expression element will be presented this number of times to train the Nodes.

Alpha

This value is used to scale the alteration of SOM vectors when a new expression vector is associated with a node.

Radius

When the bubble neighborhood option is used, this floating-point value defines the extent of the neighborhood. If a node lies within this distance of the winning node (the node to which an element has been assigned), that node is considered to be in the neighborhood and its SOM vector is adapted.

Initialization

Random Genes or Random Samples: Indicates that the initial SOM vectors will be selected at random from the actual elements in the data.
Random Vector: Indicates that the initial SOM vectors will be constructed as random vectors generated to reflect the magnitude of the data set. These initial vectors are not actual expression vectors in the data set.
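
The two initialization options might be sketched as follows. How the application generates random vectors "to reflect the magnitude of the data set" is not specified here, so the sketch assumes a uniform draw between the minimum and maximum observed in each dimension. The function names are hypothetical.

```python
import random

def init_from_elements(data, n_nodes, rng):
    """Random Genes / Random Samples: copy n_nodes actual expression vectors."""
    return [list(v) for v in rng.sample(data, n_nodes)]

def init_random_vectors(data, n_nodes, rng):
    """Random Vector: draw each component uniformly within the data's range
    for that dimension (an assumption, not the application's documented rule)."""
    dims = len(data[0])
    lows = [min(v[d] for v in data) for d in range(dims)]
    highs = [max(v[d] for v in data) for d in range(dims)]
    return [[rng.uniform(lows[d], highs[d]) for d in range(dims)]
            for _ in range(n_nodes)]

data = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0], [1.0, 1.0]]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

from_elements = init_from_elements(data, 2, rng)     # vectors present in data
from_random = init_random_vectors(data, 3, rng)      # vectors not tied to data
```

The first option guarantees that every initial SOM vector sits on a real expression vector; the second only guarantees that the initial vectors fall within the range spanned by the data.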

Neighborhood

The neighborhood options indicate the conventions (formulas) used to update (adapt) an SOM vector once an expression vector has been added into a Node's neighborhood.
Bubble: This option uses the provided radius (see above) to determine which surrounding SOM nodes are in the neighborhood and therefore are candidates for adaptation. When this option is selected the Alpha parameter for scaling the adaptation is used directly as provided from the user.
Gaussian: This option forces all SOM vectors in the network to be adapted regardless of proximity to the winning node. In this case the Alpha parameter is scaled based on the distance between the SOM vector to be adapted and the winning node's SOM vector.
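
The difference between the two conventions can be sketched as a scaling factor applied to each node's update. The exact formulas used by the application are not given here. The sketch assumes that distance is measured between nodes on the grid, that "within" the radius includes the boundary, and that the Gaussian option uses the common exponential falloff with a hypothetical width parameter sigma.

```python
import math

def bubble_factor(distance, alpha, radius):
    """Bubble: Alpha is used as provided inside the radius, and the node is
    left unchanged outside it."""
    return alpha if distance <= radius else 0.0

def gaussian_factor(distance, alpha, sigma):
    """Gaussian: every node is adapted, with Alpha shrinking smoothly as the
    distance to the winning node grows (sigma is an assumed parameter)."""
    return alpha * math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

alpha = 0.05
for d in (0.0, 1.0, 2.0, 4.0):
    print(d, bubble_factor(d, alpha, radius=3.0), gaussian_factor(d, alpha, sigma=1.0))
```

Under these assumptions the bubble factor is a step function, while the Gaussian factor never reaches zero, which matches the description that all SOM vectors are adapted.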

Topology

Indicates whether the topology should be rectangular or hexagonal. If rectangular topology is selected, the node-to-node distance is the Euclidean distance within the two-dimensional x-y grid. If hexagonal topology is selected, a formula appropriate to the hexagonal layout is used to determine the distance from the coordinates of the two nodes.
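
A sketch of the two distance conventions follows. The hexagonal formula used by the application is not stated, so the version below is only one common choice: odd rows are shifted by half a cell and rows are spaced sqrt(3)/2 apart, which places all six neighbors of a node at distance 1. The function names are hypothetical.

```python
import math

def rectangular_distance(a, b):
    """Euclidean distance between two nodes given as (x, y) grid coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def hex_position(node):
    """Assumed hexagonal layout: shift odd rows by 0.5 and compress the
    row spacing to sqrt(3)/2."""
    x, y = node
    shift = 0.5 if y % 2 == 1 else 0.0
    return (x + shift, y * math.sqrt(3) / 2)

def hexagonal_distance(a, b):
    """Euclidean distance between the nodes' positions in the hexagonal layout."""
    return rectangular_distance(hex_position(a), hex_position(b))

print(rectangular_distance((0, 0), (3, 4)))   # 5.0
print(hexagonal_distance((0, 0), (1, 0)))     # 1.0
print(hexagonal_distance((0, 0), (0, 1)))     # approximately 1.0
```

In the rectangular grid a node has four nearest neighbors at distance 1 and four diagonal neighbors at distance sqrt(2); in the assumed hexagonal layout it has six neighbors that are all equally close.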

Hierarchical Clustering

This check box selects whether to perform hierarchical clustering on the elements in each cluster created.

Default Distance Metric: Euclidean