FAQs

What are all of the features available in BRB-ArrayTools?

What are the system requirements?

What types of filtering options are available?

What about missing values?

What are the options for normalization?

Can I do clustering with BRB-ArrayTools?

What statistical methods are available for comparing gene expression across two or more groups?

How do I specify the groups I want to compare in BRB-ArrayTools?

How can BRB-ArrayTools help manage false discoveries?

Do I need biological replicates to analyze my data using BRB-ArrayTools?

Can I analyze data with two or more technical replicates?

Can I analyze flip-dye experiments using BRB-ArrayTools?

Can I use BRB-ArrayTools to analyze data from a paired data experiment, where I have sets of control and test samples from the same biological sample?

Can I analyze time series data?

Can I do Survival Analysis?

What is Class Prediction?

What is Multi-Dimensional Scaling?

How do I import data from the NIH mAdB database?

How do I import Affymetrix chip data?

What is the format for the gene identification data for cDNA arrays?

Can I import my own gene lists?

Can I annotate my data using the stored gene lists (lists derived from analysis, provided with the software, and/or imported by the user)?

Can I access information about selected interesting genes from public databases? 

Can I save my gene lists for export into other packages?

What if I have questions about how to use BRB-ArrayTools?

What if I have questions about the design and analysis of my experiment?

Do you provide workshops and hands-on training?

====

What are all of the features available in BRB-ArrayTools?

The features available in V3 are listed below.

Data Import

  • two-dye and single-dye cDNA array intensities, background subtracted intensities, or ratios in separate tab delimited array files or in a single tab delimited file
  • autoload function for NIH mAdB users
  • Affymetrix MAS 4.0 and 5.0 export files, including multi-chip data

Filtering

  • Exclude spots based on channel intensity, flag values, and/or spot size.
  • Exclude genes based on missing values.
  • Exclude genes with low variability across arrays (i.e., not differentially expressed) by filtering on the variance of log-ratios (or log intensities) across arrays, which reduces the number of genes included in the analysis.

Normalization

  • For dual-channel chips, normalize arrays by median-centering the log-ratios in each array, or subtracting out a lowess-smoother based on the red and green average log-intensities.
  • For single-channel chips, normalize arrays by centering the log intensity of each array to the median log intensity of the first array.
  • Normalize arrays using a set of housekeeping genes.

Scatterplots

  • Flexible array vs array scatterplot function: plot red, green, and ratio intensities or log-intensities for any pair of arrays, in addition to M-A plots (average log-intensity of the red and green channels vs the log ratio of red/green).
  • Scatterplots of average log-ratios or log-intensities by group for two groups.

Unsupervised Analysis

  • Hierarchical cluster analysis of genes
  • Hierarchical cluster analysis of samples
  • Cluster reproducibility and cluster significance statistics
  • Multi-Dimensional Scaling

Supervised Analysis

Class Comparison:
  • Univariate t and F-tests estimating both parametric and permuted p-values
  • Specification of the number of false discoveries or the proportion of false discoveries based on the actual false discovery rate

Class Prediction

  • compound covariate predictor (2 groups) 
  • diagonal linear discriminant analysis
  • k-nearest neighbors 
  • nearest centroid 
  • support vector machines (2 groups)
  • leave-one-out cross-validation error rate and permutation p value

Survival Analysis

What are the system requirements?

  • Windows 98/2000/NT or later
  • Excel 2000
  • 256 MB ram (recommended)

What types of filtering options are available?

Filtering criteria can be specified prior to data import and also prior to each analysis.

Spots can be filtered out using any of the following methods:

  • specification of a minimum intensity 
  • specification of a flag variable with values either above or below the cut-off value
  • specification of a minimum spot size
  • absent call for Affymetrix data

Genes can be filtered using any of the following methods:

  • using the variance of the gene's log-ratios across all the arrays, exclude genes based on the significance of the variation relative to the median of all variances, or exclude genes based on a percentile criterion
  • exclude genes based on the number of missing values across all the arrays
  • using a proportion of arrays more than 2-fold from the median of a gene
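
The gene-level filters above can be sketched in a few lines of Python. This is only an illustration, not BRB-ArrayTools code: the function names, the use of None for a missing value, and the default cut-offs (max_missing, min_proportion) are assumptions made for the example.

```python
import math
from statistics import median

def count_missing(log_values):
    """Number of missing (None) log-expression values for one gene."""
    return sum(1 for v in log_values if v is None)

def proportion_beyond_fold(log2_values, fold=2.0):
    """Proportion of arrays more than `fold`-fold from the gene's median.

    Values are assumed to be on a log2 scale; missing values are ignored.
    """
    present = [v for v in log2_values if v is not None]
    if not present:
        return 0.0
    center = median(present)
    cutoff = math.log2(fold)
    return sum(1 for v in present if abs(v - center) > cutoff) / len(present)

def keep_gene(log2_values, max_missing=2, min_proportion=0.2, fold=2.0):
    """Hypothetical combined filter: few missing values and enough variation."""
    if count_missing(log2_values) > max_missing:
        return False
    return proportion_beyond_fold(log2_values, fold) >= min_proportion
```

For example, keep_gene([0.0, 0.1, 2.0, -1.5, 0.05]) keeps the gene, because two of its five arrays lie more than 2-fold from the gene's median.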

What about missing values?

You have the option of filtering out genes with a specified number of missing values. Missing values are imputed when necessary.

What are the options for normalization?

For dual-channel experiments, you can use any of the 3 following methods:

  • linear - subtract the median log-ratio from all log-ratios on the array
  • intensity dependent - subtract a Lowess smoother from the log-ratios
  • custom - subtract the median log-ratio of user-defined housekeeping genes
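
As a rough sketch of the linear and custom options above (the Lowess option is not shown), median centering of one array might look like this in Python. The function name, the reference_indices parameter, and the use of None for missing values are assumptions for illustration, not part of BRB-ArrayTools.

```python
from statistics import median

def median_center(log_ratios, reference_indices=None):
    """Subtract a median log-ratio from every log-ratio on one array.

    With reference_indices=None the median is taken over all spots (the
    "linear" option). Otherwise it is taken over the listed spots only, for
    example a user-defined set of housekeeping genes (the "custom" option).
    Missing values (None) stay missing.
    """
    if reference_indices is None:
        basis = [v for v in log_ratios if v is not None]
    else:
        basis = [log_ratios[i] for i in reference_indices
                 if log_ratios[i] is not None]
    shift = median(basis)
    return [None if v is None else v - shift for v in log_ratios]
```

After centering, the median log-ratio of the array (or of the chosen housekeeping spots) is zero.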

For single-channel experiments:

  • centering the log intensity of each array to the median log intensity of the first array by subtracting a constant from each array so that the median over each array is the same as the median over the first array
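
A minimal Python sketch of this centering step, assuming each array is a list of log intensities with no missing values (the function name is hypothetical):

```python
from statistics import median

def center_to_first_array(arrays):
    """Shift each array by a constant so that its median log intensity
    equals the median log intensity of the first array."""
    target = median(arrays[0])
    normalized = []
    for values in arrays:
        shift = median(values) - target
        normalized.append([v - shift for v in values])
    return normalized
```

The first array is left unchanged, and every other array is moved up or down by a single constant.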

For Affymetrix chip experiments:

  • linear - median center the log-ratio of signals for each array using one array as a reference
  • housekeeping genes - using Affymetrix defined genes or user-defined genes
  • multi-chip sets - each chip type is separately normalized

Can I do clustering with BRB-ArrayTools?

You can do hierarchical clustering of genes and/or samples and create cluster dendrograms and color heat maps of all the genes. The output includes plots of the median gene expression within a cluster vs array. Lists representing genes within a cluster can be saved for further analysis.

If you do cluster analysis of samples, you can compute a statistical reproducibility index, which provides a measure of the possibility that samples clustered together by chance.

What statistical methods are available for comparing gene expression across two or more groups?

You have several options for carrying out this type of analysis.

  • univariate parametric t/F tests
  • randomized variance t/F tests, a method providing improved estimates of gene-specific variances without assuming that all genes have the same variance

You also have several ways to specify the criteria for inclusion of a gene in a gene list.

  • a p-value less than a specified threshold
  • specified limits on the number of false discoveries 
  • specified limits on the proportion of false discoveries
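
To make the univariate test concrete, here is a small Python sketch of a pooled-variance two-sample t statistic for one gene, together with an exact permutation p-value. It illustrates the general technique only. The function names are hypothetical, the randomized variance test is not shown, and the sketch assumes the expression values are not all identical within both groups.

```python
from itertools import combinations
from statistics import mean, variance

def t_statistic(group_a, group_b):
    """Two-sample t statistic using a pooled variance estimate."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value: the fraction of all relabelings
    whose |t| is at least as large as the observed |t|."""
    values = list(group_a) + list(group_b)
    observed = abs(t_statistic(group_a, group_b))
    n, na = len(values), len(group_a)
    hits = total = 0
    for chosen in combinations(range(n), na):
        chosen_set = set(chosen)
        a = [values[i] for i in chosen]
        b = [values[i] for i in range(n) if i not in chosen_set]
        total += 1
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(t_statistic(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / total
```

Enumerating every relabeling is only practical for small groups. With more arrays, a random sample of relabelings is normally used instead.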

How do I specify the groups I want to compare in BRB-ArrayTools?

It is quick and easy to assign arrays to groups for group or class comparison. Briefly, when data are imported, an Experimental Description Excel worksheet page is created with a column of the array labels. Additional columns can be added to define class (or group) variables. Each row represents an array, and an array can be excluded from a comparison by leaving its cell blank. You can create this worksheet in advance of data import or before an analysis.
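
The idea of one row per array, one column per class variable, and a blank cell to exclude an array can be illustrated with a short Python sketch. It assumes the worksheet has been saved as tab-delimited text. The column names Array and TumorType and the function name are made up for the example.

```python
import csv
import io

def arrays_by_class(descriptor_text, class_column):
    """Group array labels by the value in one class column.

    The first column is assumed to hold the array label. Arrays whose cell
    in `class_column` is blank are left out of the comparison.
    """
    reader = csv.DictReader(io.StringIO(descriptor_text), delimiter="\t")
    label_column = reader.fieldnames[0]
    groups = {}
    for row in reader:
        value = (row[class_column] or "").strip()
        if value:
            groups.setdefault(value, []).append(row[label_column])
    return groups

# Hypothetical descriptor sheet: array03 has a blank class cell.
example = (
    "Array\tTumorType\n"
    "array01\tA\n"
    "array02\tB\n"
    "array03\t\n"
    "array04\tA\n"
)
```

Calling arrays_by_class(example, "TumorType") groups array01 and array04 under A and array02 under B, and leaves out array03.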

How can BRB-ArrayTools help manage false discoveries?

You can specify limits for the number of false discoveries or the proportion of false discoveries. Multivariate permutation tests are used to find the genes meeting the specified criteria.

The multivariate permutation tests are based on permutations of the labels of which arrays are in which classes. A large number of random permutations are considered. For each random permutation, the parametric tests are re-computed to determine a p value for each gene that is a measure of the extent to which it appears differentially expressed between the random classes determined by the random permutation. The genes are ordered by their p values computed for the random permutation (genes with smallest p values at the top of the list). For each potential p value threshold, the program records the number of genes with p values below that threshold. This process is repeated for a large number of random permutations.

Consequently, for any p value threshold, we can compute the distribution of the number of genes that would have p values smaller than that threshold for random permutations. That is the distribution of the number of false discoveries, since genes that are significant for random permutations are false discoveries.

The algorithm selects a threshold p value so that the number of false discoveries is no greater than that specified by the user 95% of the time or 50% of the time. In a similar manner, we determine threshold p values so that the resulting gene list contains no more than a specified proportion of false discoveries (either 95% of the time or 50% of the time).
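
The procedure described above can be sketched roughly in Python. This is a simplified illustration, not the algorithm as implemented in BRB-ArrayTools. To stay within the standard library it thresholds the absolute t statistic rather than the parametric p value (for fixed group sizes a larger |t| corresponds to a smaller p value). The function names, the candidate cut-offs, and the two-class 0/1 labels are all assumptions.

```python
import random
from statistics import mean, variance

def abs_t(values, labels):
    """Absolute pooled-variance t statistic for one gene and 0/1 labels."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    pooled = ((len(a) - 1) * variance(a)
              + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    if pooled == 0:
        return float("inf") if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / (pooled * (1 / len(a) + 1 / len(b))) ** 0.5

def false_discovery_counts(data, labels, threshold, n_permutations=200, seed=0):
    """For each random relabeling of the arrays, count the genes whose |t|
    reaches the threshold. Every such gene is a false discovery."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        counts.append(sum(1 for gene in data if abs_t(gene, shuffled) >= threshold))
    return counts

def choose_threshold(data, labels, max_false=1, confidence=0.95,
                     candidates=(2.0, 3.0, 4.0, 5.0, 6.0),
                     n_permutations=200, seed=0):
    """Most lenient candidate |t| cut-off for which the number of false
    discoveries is at most max_false in `confidence` of the relabelings."""
    for threshold in sorted(candidates):
        counts = false_discovery_counts(data, labels, threshold,
                                        n_permutations, seed)
        within = sum(1 for c in counts if c <= max_false) / len(counts)
        if within >= confidence:
            return threshold
    return None
```

Here data is a list of genes, each a list of expression values in array order, and labels gives the class (0 or 1) of each array. Setting confidence to 0.95 or 0.50 corresponds to the "95% of the time" and "50% of the time" options described above.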

The class prediction tool produces a gene list ordered with the genes having the smallest parametric p values at the top. The length of the gene list is determined by the types of false discovery control selected. Generally we recommend using all of the options: univariate p value threshold (0.001), limiting number of false discoveries (10 default), and limiting proportion of false discoveries (0.10 default). The output tells you where to truncate the gene list in order to get each type of control.

Do I need biological replicates to analyze my data using BRB-ArrayTools?

Yes, you will want to consider using BRB-ArrayTools only when you have biological replicates. A biological replicate represents independent experimental samples, such as tissue samples taken from different patients.

Can I analyze data with two or more technical replicates?

Yes, you can analyze data including replicate arrays. A technical replicate represents extracting multiple RNA samples from the same biological specimen for independent processing or independently labeling and hybridizing aliquots of the same RNA sample. The replicate samples can be identified in BRB-ArrayTools on the Experimental Descriptor worksheet page. 

Can I analyze flip-dye experiments using BRB-ArrayTools?

Yes. You can identify the labeling on the Experimental Descriptor page. When the log-ratios are computed, the ratio of Green/Red will be calculated for the flipped arrays and Red/Green for the others.
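
In code, the rule amounts to swapping numerator and denominator for the flipped arrays. The Python sketch below is illustrative only. The function name, the flipped flag, and the choice of base-2 logarithms are assumptions.

```python
import math

def log_ratio(red, green, flipped=False):
    """log2 ratio for one spot: Red/Green normally, Green/Red when the
    array is a flip-dye (dye-swapped) array."""
    return math.log2(green / red) if flipped else math.log2(red / green)
```

With this convention, a gene that is 4-fold higher in the test sample gives a log-ratio of 2 on both the standard array and the flipped array.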

Can I use BRB-ArrayTools to analyze data from a paired data experiment, where I have sets of control and test samples from the same biological sample?

Yes. You can specify the paired data arrays using a column on the Experimental Descriptor page and selecting the Paired Data option on each analysis dialogue.

Can I analyze time series data?

BRB-ArrayTools is not particularly well suited for time series analysis. We hope to incorporate appropriate methods in future versions. Please email us with your suggestions!

Can I do Survival Analysis?

Yes, the survival analysis tool finds genes that are predictive of survival time for patients. Since some patients/animals may still be alive at the time of analysis, their survival times from entry on study are censored; that is, the true survival time is at least as long as the survival measured to date, but longer by an unknown amount. There are many statistical methods for analysis of censored survival data. The most popular method is Cox’s proportional hazards model. This is a regression model in which the hazard function for an individual is a function of predictor variables. In our case the predictor variables are log expression levels. The hazard function is the instantaneous force of mortality at any time conditional on having survived till that time. The proportional hazards model postulates that the logarithm of the hazard of death is a linear function of the predictor variables, linked by unknown regression coefficients. For more details see biostatistics texts or the original paper (DR Cox, Regression models and life tables, J. Royal Stat Soc B 34:187-202).
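
In symbols, for a single gene the model described above can be written as follows. The notation is chosen here for illustration: h(t | x) is the hazard at time t for an individual whose log expression level for the gene is x, h_0(t) is an unspecified baseline hazard, and β is the unknown regression coefficient.

```latex
\log h(t \mid x) = \log h_0(t) + \beta x
```

Under this model, a gene whose expression is unrelated to survival has β = 0.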

The survival analysis tool fits proportional hazards models relating survival to each gene, one gene at a time, and computes the p value for each gene for testing the hypothesis that survival time is independent of the expression level for that gene. Gene lists are created based on these p values in the same way as in the Class Comparison tool. The p values can be used to identify gene lists using multivariate permutation tests for controlling the number or proportion of false discoveries, or the gene list can simply consist of the genes with p values less than a specified threshold (0.001 is default).

What is Class Prediction?

Class prediction involves constructing a multivariate predictor based on the differential expression across 2 or more groups (or classes). The predictor can then be used to classify unknown samples by group based on gene expression data. For example, validated predictors can be used to develop sensitive, accurate, and specific diagnostic tests. Class prediction applied to gene expression experiments presents considerable challenges due to the number of genes analyzed relative to the number of biological samples. Ideally, predictors can be developed using a set of experimental data, and then validated by demonstrating accurate prediction on a new set of independent experimental data.

BRB-ArrayTools provides five methods for constructing multivariate predictors.

  • compound covariate predictor (limited to 2 groups)
  • k-nearest neighbors
  • nearest centroid
  • support vector machines (limited to 2 groups)
  • diagonal linear discriminant analysis

As part of development of the predictor, BRB-ArrayTools includes a leave-one-out cross validation procedure to estimate the misclassification rate of each type of predictor and a permutation step to estimate the significance of the cross-validated misclassification rate. In addition, you can specify a training set for development of the predictor and obtain class predictions for samples from an independent dataset (a dataset not used to develop the predictor).
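
To show what leave-one-out cross-validation does, here is a small Python sketch using a nearest centroid predictor with squared Euclidean distance. It is a simplified illustration with hypothetical function names, not the BRB-ArrayTools implementation. In particular it omits gene selection, which in a real analysis needs to be repeated inside every cross-validation loop so that the left-out sample plays no part in building the predictor.

```python
from statistics import mean

def centroid(samples):
    """Gene-by-gene mean expression profile of a list of samples."""
    return [mean(column) for column in zip(*samples)]

def nearest_centroid_predict(train_samples, train_labels, new_sample):
    """Assign new_sample to the class whose centroid is closest."""
    best_label, best_distance = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [s for s, g in zip(train_samples, train_labels) if g == label]
        center = centroid(members)
        distance = sum((x - y) ** 2 for x, y in zip(new_sample, center))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

def loocv_error_rate(samples, labels):
    """Leave each sample out in turn, train on the rest, and report the
    fraction of left-out samples that are misclassified."""
    errors = 0
    for i in range(len(samples)):
        train_samples = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_centroid_predict(train_samples, train_labels, samples[i])
        if predicted != labels[i]:
            errors += 1
    return errors / len(samples)
```

Each sample is a list of expression values, one per gene, and samples and labels are plain Python lists in the same order.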

What is Multi-Dimensional Scaling?

You can use multi-dimensional scaling (MDS) to represent high-dimensional data graphically in three dimensions. The pair-wise similarity is preserved, so points on the plot close to each other are more similar than points far apart. If you use Euclidean distances as the MDS distance metric, MDS is equivalent to Principal Component Analysis (PCA).

BRB-ArrayTools offers MDS of samples as a feature. You can create 3-D rotating plots, labeled by group identifier, to examine the similarity/dissimilarity of your samples. The rotating plots can be included in a PowerPoint presentation. 

How do I import data from the NIH mAdB database?

NIH mAdB users can extract data from the database for one-step automated loading into BRB-ArrayTools. The BRB-ArrayTools Users Guide provides step-by-step instructions. 

How do I import Affymetrix chip data?

Tab delimited CHP files exported from MAS 4.0 or 5.0 in Pivot or Metric file format can be easily imported using the GUI dialogue. You can also import multi-chip data using the GUI dialogue. The User Manual provides a more detailed explanation and examples. 

What is the format for the gene identification data for cDNA arrays?

Various identifiers can be associated with each spot, such as spot numbers, well numbers, clone names, clone identifiers, probe set identifiers, UniGene identifiers, GenBank accession numbers, etc. The gene identifiers may be located alongside the expression data in the same files, or may be contained in a separate file which is used as a look-up table for the genes on all the arrays.  If the gene identifiers are contained in a separate file, then there must be corresponding columns within the expression data file(s) and the gene identifier file, containing gene ids which can be used for matching the gene identifiers with the expression data.

The column which is designated within BRB-ArrayTools as clone id should contain an organization-prefixed clone id (e.g., a prefix such as "IMAGE:", "ATCC:", "TIGR:" etc.).  These clone ids can be used to link to clone reports in the NCBI database.  Note that clone reports in the NCI mAdb database are only available for clones in the NCI Advanced Technology Center inventory or for other expression array sets which are tracked by BIMAS/CIT/NIH.  All clone identifiers found within a clone id column which are numeric and have no prefix will be assumed to have a prefix of IMAGE by default.
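
The default-prefix rule in the last sentence can be illustrated with a few lines of Python. The function name and its default_prefix parameter are hypothetical and only mirror the behaviour described above.

```python
def normalize_clone_id(raw_id, default_prefix="IMAGE"):
    """Add the default prefix to a purely numeric clone id with no prefix.

    Ids that already carry a prefix, such as "ATCC:678", are returned
    unchanged.
    """
    text = str(raw_id).strip()
    if text.isdigit():
        return f"{default_prefix}:{text}"
    return text
```

For example, normalize_clone_id("12345") gives "IMAGE:12345", while "TIGR:999" is left as it is.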

Probe set ids are used to link to feature reports in the NCI mAdb database.  Currently feature reports are available for the Human Genome U133 A and B chips, and for the Mouse Genome U74 A-C chips.

UniGene cluster ids and gene symbols are used to search for the UniGene annotation mirrored in the NCBI database.  GenBank accession numbers are used to search for the GenBank annotation which is also mirrored in the NCBI database.

A minimum of one gene identifier is required for use in collating the dataset.  However, the user may wish to enter any or all of the above gene identifiers, if they are available, to enhance the usability of the output from the analyses.

Can I import my own gene lists?

Yes, you can incorporate your own gene lists into your analysis by providing a simple text file with a unique id column heading matching the id in your experimental project file. The User Guide provides more information. Currently there is no automated procedure for updating gene lists that were created from evolving (i.e., not static) databases.

Can I annotate my data using the stored gene lists (lists derived from analysis, provided with the software, and/or imported by the user)?

BRB-ArrayTools provides a utility to merge experimental data with stored gene lists. The HTML analysis output reports include a column showing which gene lists include each selected gene. You can also annotate your data by accessing the Stanford SOURCE database.

Can I access information about selected interesting genes from public databases? 

Yes, most typical identifiers associated with each spot imported with the expression data will be automatically hyperlinked for quick access to the appropriate public database in analysis output. Identifiers include clone identifiers, probe set identifiers, UniGene identifiers, GenBank accession numbers, etc. 

Can I save my gene lists for export into other packages?

Yes, gene lists are saved and stored as simple ASCII text files including the unique id identified during the data import function, in addition to Clone ID, Probe Set ID, GenBank accession number, UniGene cluster ID and/or gene symbol if present on the imported gene identifier file.

What if I have questions about how to use BRB-ArrayTools?

You can email Amy Lam or post to the BRB-ArrayTools Message Board.

What if I have questions about the design and analysis of my experiment?

In addition to BRB-ArrayTools questions, we can try to help you with your statistical questions. You can email Amy Lam or Richard Simon.

Do you provide workshops and hands-on training?

Yes, we offer a 7-hour seminar and hands-on training workshop at the NIH approximately every 2 months. Please see the Workshops section for more information.
