Dynamic Heatmap Viewer

BRB-ArrayTools Development Team

2019-07-09

This package creates a Dynamic Heatmap based on the Hierarchical Clustering method for visualizing microarray expression data. You can easily zoom in and out, change the color preferences, and mark genes and sample classes of the heatmap.

Quick Start

This package provides test.dynamicHeatmap for quickly viewing a heatmap of one of the built-in sample datasets (“Brca”, “Perou”, or “Pomeroy”).

library(dynamicHeatmap)
test.dynamicHeatmap("Brca")

It launches a new window with a heatmap and a gene clustering dendrogram. If the samples are clustered, a sample clustering dendrogram is shown as well.

Main Functions

Dynamic Heatmap Viewer helps scientists interactively visualize microarray gene expression data. This guide summarizes the options in the graphical user interface.

Gene ID/Array Label
The drop-down list displays the gene ID (or array label) of the gene (or array) under the mouse pointer.
Value
This shows the expression value for the gene and array under the mouse pointer. If the centering and scaling option is checked in the main dialog, it is applied to this value.
Class
This option shows a color bar that marks the class of each array with a distinct color.
Color Pref
Several predefined color preferences representing the expression data are available for users to select. Gray color represents missing values.
Color Scheme
Two color schemes are provided. The Saturated method truncates gene expression at the color limits of the heatmap, while the Quantile method uses quantiles of the expression data as bins for the colors. Gray represents missing values.
Color Limit
If the Saturated method is selected (the default), the heatmap displays gene expression data by truncating the expression at lower and upper thresholds. Users can adjust the upper or lower threshold by clicking the +/- buttons. The color legend on the right-hand side of the heatmap shows the current threshold values. By default, if the minimum value of expression is less than 0, the limit is determined by the data; otherwise [max(-2, smallest data value), min(2, largest data value)] will be used.
Summary Statistics
These statistics provide basic information about the data, which can be useful for selecting thresholds if the ‘Saturated’ color scheme is used. The gene distance information is useful when cutting the gene tree.
Link to GeneCards
Right-click on the heatmap to open a browser window connected to GeneCards, which contains more information about the gene.
Zoom In
To zoom in to a subset of genes, hold the left mouse button and select an area on the heatmap. The heatmap is re-drawn to show the selected region after the mouse button is released. If the plot is zoomed in to show fewer than 50 genes, gene symbols or unique IDs are shown along the right-hand side of the plot.
Reset Zoom-In
The heatmap will be re-drawn using all genes. The keyboard shortcut is Alt+R.
Zoom Out
To go back to the heatmap before zooming in, click on the Zoom Out button.
Print to PDF
This function saves the plot as a high-quality Portable Document Format (PDF) file. Alternatively, users can press the Windows keyboard shortcut Alt + PrtSc to copy the current heatmap to the clipboard, and then paste it into an MS Word document or image processing software.
Highlight Genes
To highlight genes in the heatmap, users can choose one of three options (if option 1 is chosen, the selected gene identifier is used to match the gene labels).

After clicking the OK button, genes containing the desired labels are highlighted with purple dots on the right-hand side of the heatmap. To erase the dots, select either option 1 or 2, press the Clear button, and then press the OK button.

Cut Gene Tree
This cuts the gene dendrogram and saves the gene IDs in each cluster to a text file. For example, if users select 3 clusters, or a distance that results in 3 clusters, 3 tab-delimited text files (cluster1.txt, cluster2.txt, and cluster3.txt) are generated under the output folder, each containing the gene IDs of one cluster. This function is disabled while a zoom-in is applied.
Save Genes
This saves the gene IDs of the currently selected genes to a text file.
Options
Tips
  1. The heatmap displays array IDs at the bottom of the heatmap. The array IDs are obtained from the first column of the Experiment Descriptor (this can be changed from the Options dialog). They are displayed only if the total number of arrays is less than or equal to 200.

  2. The heatmap displays gene symbols on the right-hand side. If gene symbols are not available, the first column of the gene identifiers worksheet is used (this can be changed from the Options dialog). The gene symbols are displayed only if the number of genes is less than or equal to 100.

  3. If the number of classes is more than 10, the colors for the class color bar will be re-used.

  4. The executable program Qheatmap.exe can be used as a standalone application. When users run the program without specifying a working directory, a dialog asks for the working directory. The working directory should contain the input files expression.txt (gene expression), GeneIDs.txt (gene annotation), ExpDescWkSht.txt (experiment descriptor), options.txt (options such as linkage method and distance), and, for plotting the array and gene dendrograms, order.txt (orderGene.txt), merge.txt (mergeGene.txt), and height.txt (heightGene.txt).
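
The difference between the two color schemes described above can be illustrated with a few lines of base R. This is only a sketch on made-up values, not the viewer's actual implementation: the helper names saturate and quantile_bin, the choice of four bins, and the green/black/red palette are all assumptions made for illustration.

```r
# Toy log-ratio values, including one missing value (hypothetical data).
expr <- c(-3.1, -1.4, -0.2, 0.4, 0.9, 2.7, NA)

# Saturated: clip values to [lower, upper] before mapping them to colors.
# saturate() is a hypothetical helper, not part of dynamicHeatmap.
saturate <- function(v, lower = -2, upper = 2) {
  pmin(pmax(v, lower), upper)
}

# Quantile: assign each value to one of n bins bounded by data quantiles.
# quantile_bin() is a hypothetical helper, not part of dynamicHeatmap.
quantile_bin <- function(v, n = 4) {
  breaks <- quantile(v, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  cut(v, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# The fallback limits quoted in the Color Limit entry.
limits <- c(max(-2, min(expr, na.rm = TRUE)), min(2, max(expr, na.rm = TRUE)))

sat  <- saturate(expr, limits[1], limits[2])
bins <- quantile_bin(expr)

# Map bins to an assumed palette; missing values are drawn in gray.
pal  <- colorRampPalette(c("green", "black", "red"))(4)
cols <- ifelse(is.na(bins), "gray", pal[bins])
```

With the Saturated scheme, the values -3.1 and 2.7 are drawn with the same colors as -2 and 2, so outliers do not compress the color range of the remaining genes. With the Quantile scheme, each bin holds roughly the same number of values regardless of their spread.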

Clustering Methods

dynamicHeatmap is the main R function to perform the clustering calculations with the function hclust().
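
As a rough illustration of what a clustering step built on hclust() looks like, the sketch below clusters the genes of a small random matrix and then cuts the tree into clusters, in the spirit of the Cut Gene Tree option. The toy data, the object names, and the way the files are written are assumptions for illustration; the code inside dynamicHeatmap may differ.

```r
set.seed(1)
# Hypothetical toy matrix: 12 genes (rows) by 6 samples (columns).
expr <- matrix(rnorm(12 * 6), nrow = 12,
               dimnames = list(paste0("gene", 1:12), paste0("s", 1:6)))

# "1-correlation" distance between genes. cor() correlates columns,
# so the matrix is transposed to correlate genes with each other.
gene_dist <- as.dist(1 - cor(t(expr)))

# Average-linkage hierarchical clustering of the genes.
gene_tree <- hclust(gene_dist, method = "average")

# Components of the fitted tree:
gene_tree$order   # leaf order used when drawing the dendrogram
gene_tree$merge   # which clusters were joined at each step
gene_tree$height  # distance at which each join happened

# Cut the tree into 3 clusters and write one file of gene IDs per cluster.
clusters <- cutree(gene_tree, k = 3)
out_dir <- tempdir()
for (k in sort(unique(clusters))) {
  writeLines(names(clusters)[clusters == k],
             file.path(out_dir, paste0("cluster", k, ".txt")))
}
```

Passing h = <distance> to cutree() instead of k cuts the tree at a height rather than at a fixed number of clusters. The order, merge, and height components may correspond to the order.txt, merge.txt, and height.txt files listed in the tips above, but that correspondence is an assumption.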

Hierarchical clustering parameters:

Analysis options
Metric
Choose “1-correlation” or “Euclidean distance” as the distance metric.
Linkage
Choose “Average linkage”, “Complete linkage”, “Single linkage”, or “Ward linkage (ward.D2)” to determine how the distance between two clusters is computed. Average linkage takes the average of the distances between each possible pair of genes from the two clusters, whereas complete linkage takes the maximum and single linkage takes the minimum of those distances.
Select genes for analysis
The analysis may be based on all genes in the dataset or a subset of those genes. Subsets of genes may be selected for inclusion or exclusion from the analysis. Gene subsets can be specified in the ‘Genelists’ folder.
Select experiments for analysis
The analysis may be based on all experiments in the dataset or a subset of those experiments. A subset may be specified by choosing an experiment descriptor variable; any experiment for which that variable is blank is excluded from the analysis.
Plotting-order of experiments is based on
Experiments may be ordered by the dendrogram-order of a hierarchical cluster analysis performed on the entire filtered set of log-ratios. In this case all the cluster lineplots will have the same experiment order. Another option is to order the experiments based on an ordering variable selected from the experiment descriptors. If the ordering variable is categorical then the experiments will be sorted by alphabetical-order of the categorical values.
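
The linkage definitions above can be checked by hand on a tiny example. The sketch below uses four made-up genes split into two clusters and Euclidean distance; the gene names and values are hypothetical.

```r
# Four hypothetical genes measured on 3 samples (toy values).
# Cluster A = {g1, g2}, cluster B = {g3, g4}.
all_genes <- rbind(g1 = c(0, 0, 0), g2 = c(1, 0, 0),
                   g3 = c(4, 0, 0), g4 = c(6, 0, 0))

d <- dist(all_genes)  # Euclidean distance by default

# Distances for every pair with one gene from each cluster:
# g1-g3 = 4, g1-g4 = 6, g2-g3 = 3, g2-g4 = 5.
pair_dist <- as.matrix(d)[c("g1", "g2"), c("g3", "g4")]

average_link  <- mean(pair_dist)  # 4.5
complete_link <- max(pair_dist)   # 6
single_link   <- min(pair_dist)   # 3

# hclust() joins A and B last, so the top merge height is the
# distance between the two clusters under the chosen linkage.
top_average <- max(hclust(d, method = "average")$height)  # 4.5
```

Ward linkage (method = "ward.D2" in hclust()) follows a different, variance-based criterion and is not covered by this pairwise calculation.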

Data Input

dynamicHeatmap is the main R function to generate the heatmaps. This section describes how to prepare its inputs, again using the “Brca” sample data as an example. The package contains the following “Brca” sample files:

* Brca_LOGRAT.txt: a table of expression data with rows representing genes and columns representing samples;

* Brca_FILTER.TXT: a list of filtering information, where 1 means the corresponding gene passes the filters while 0 means it is excluded from analysis;

* Brca_GENEID.txt: a table of gene information corresponding to row information of Brca_LOGRAT.txt and Brca_FILTER.TXT;

* Brca_EXPDESIGN.txt: a table with class information AND/OR separate test set information.

There are a total of 22 samples in the dataset. We run the following code to obtain objects like exprData as inputs to dynamicHeatmap.


dataset<-"Brca"
# Gene IDs
geneId <- read.delim(system.file("extdata", paste0(dataset, "_GENEID.txt")
  , package = "dynamicHeatmap"), as.is = TRUE, colClasses = "character") 
# Expression data (log ratios).
x <- read.delim(system.file("extdata", paste0(dataset, "_LOGRAT.TXT")
  , package = "dynamicHeatmap"), header = FALSE)
# Gene filter information, 1 - pass the filter, 0 - filtered
geneFilter <- scan(system.file("extdata", paste0(dataset, "_FILTER.TXT")
  , package = "dynamicHeatmap"), quiet = TRUE)
# Class information
expDesign <- read.delim(system.file("extdata", paste0(dataset, "_EXPDESIGN.txt")
  , package = "dynamicHeatmap"), as.is = TRUE)
# Filter out genes.
geneId <- geneId[geneFilter == 1, ]
x <- x[geneFilter == 1, ]
# Pick the first column as the array IDs.
exprData <- x
colnames(exprData) <- expDesign[, 1]

exprData is a 2009-by-22 table of log ratios, with rows representing 2009 genes and columns representing 22 samples. Before filtering, the data had 3226 rows.

##         s1321      s1996      s1822      s1714      s1224      s1252       s1510      s1900
## 1 -1.39854932 -3.0817938 -2.7303929 -1.8744690 -2.2882450 -0.3453870 -1.42321134 -1.7776077
## 2  0.39940688  0.2781018 -0.2011399 -0.5334322 -0.5792937 -0.2874397 -0.88264304 -0.4150376
## 3 -0.02509096  0.4375801  0.1047962  0.9533499 -0.2205003  0.3532323 -0.67318958  0.5109619
## 4 -0.13006058 -0.8389376 -0.2356283  0.6195197  0.8122152 -0.4181434 -0.52509099  0.2630344
## 6 -0.46566358 -0.6667566 -0.6199690  0.4760281  0.1093474 -0.6036991  0.04809438 -0.6214885
## ......

geneFilter is a 0/1 vector of length 3226, with one entry per gene. Here 2009 entries are 1 and 1217 are 0, so 2009 genes remain after filtering.

##    [1] 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 1 1 1 0 0
##   [46] 1 1 0 0 0 0 0 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1
##   [91] 0 1 1 0 1 0 0 0 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1 1 1 1 1 1 1 0 1 1 1 0 0 0
##  [136] 1 0 0 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 0 1 1 1 1 1 0 1 0 0 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0
##  [181] 1 0 1 1 1 0 1 1 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1
## ......
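
Before applying the filter, it can be worth checking that it lines up with the expression and gene tables. The following sketch uses small made-up stand-ins for x, geneId, and geneFilter (the column names a, b, and UniqueID are hypothetical) so that it runs without the package installed.

```r
# Hypothetical toy inputs standing in for x, geneId, and geneFilter.
x <- data.frame(a = c(0.1, -0.5, 1.2, 0.3), b = c(0.4, 0.0, -1.1, 0.8))
geneId <- data.frame(UniqueID = c("g1", "g2", "g3", "g4"),
                     stringsAsFactors = FALSE)
geneFilter <- c(1, 0, 1, 1)

# The filter needs one entry per gene row and only the values 0 and 1.
stopifnot(length(geneFilter) == nrow(x),
          length(geneFilter) == nrow(geneId),
          all(geneFilter %in% c(0, 1)))

# Count genes that fail (0) and pass (1) the filter.
filter_counts <- table(factor(geneFilter, levels = c(0, 1)))

# drop = FALSE keeps a data frame even if there is only one column.
keep <- geneFilter == 1
x_kept <- x[keep, , drop = FALSE]
geneId_kept <- geneId[keep, , drop = FALSE]
```

Row names keep the original row indices after subsetting, which is why the exprData printout above jumps from row 4 to row 6: row 5 was removed by the filter.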

Then generate the heatmap:

projectPath <- tempdir()
outputName <- "dynamicHeatmapBrca"
dynamicHeatmap(exprData, expDesign, geneId,
                 analysisOptions = "CenterAndScaleGenes", 
                   # "CenterAndScaleGenes", "CenterGenes", "None"
                 metric = "1-Correlation", # "1-Correlation", "Euclidean"
                 linkage = "Average",  # "Average", "Complete", "Single", "Ward"
                 sortSamplesByClass = FALSE,
                 sortSamplesClassName = "BRCA1.v.notBRCA1",
                 useSamplesCenteredCorrelation = FALSE,
                 projectPath,
                 outputName)

A new window with a heatmap and a gene clustering dendrogram will pop up. For more details about dynamicHeatmap, type help("dynamicHeatmap") in the R console.
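
The analysisOptions values in the call above suggest per-gene centering and scaling. The sketch below shows one plausible reading in base R, assuming that centering subtracts each gene's mean across samples and that scaling divides by each gene's standard deviation. This document does not state whether the package uses the mean or another statistic such as the median, so treat this as an assumption rather than the package's definition.

```r
# Hypothetical toy matrix: 3 genes (rows) by 4 samples (columns).
expr <- rbind(g1 = c(1, 2, 3, 4),
              g2 = c(-2, 0, 2, 4),
              g3 = c(5, 5, 6, 8))

# Assumed meaning of "CenterGenes": subtract each gene's mean.
centered <- sweep(expr, 1, rowMeans(expr), "-")

# Assumed meaning of "CenterAndScaleGenes": also divide by each gene's SD.
scaled <- sweep(centered, 1, apply(expr, 1, sd), "/")

# The same result via scale(), which works on columns, hence the transposes.
scaled_alt <- t(scale(t(expr)))
```

After this transformation every gene has mean 0 and standard deviation 1 across samples, so the heatmap colors reflect each gene's relative pattern rather than its absolute expression level.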