What are all of the features available in BRB-ArrayTools?
What are the system requirements?
What types of filtering options are available?
What are the options for normalization?
Can I do clustering with BRB-ArrayTools?
What statistical methods are available for comparing gene expression across two or more groups?
How do I specify the groups I want to compare in BRB-ArrayTools?
How can BRB-ArrayTools help manage false discoveries?
Do I need biological replicates to analyze my data using BRB-ArrayTools?
Can I analyze data with two or more technical replicates?
Can I analyze flip-dye experiments using BRB-ArrayTools?
Can I analyze time series data?
What is Multi-Dimensional Scaling?
How do I import data from the NIH mAdB database?
How do I import Affymetrix chip data?
What is the format for the gene identification data for cDNA arrays?
Can I import my own gene lists?
Can I access information about selected interesting genes from public databases?
Can I save my gene lists for export into other packages?
What if I have questions about how to use BRB-ArrayTools?
What if I have questions about the design and analysis of my experiment?
Do you provide workshops and hands-on training?

====

What are all of the features available in BRB-ArrayTools?

The features available in V3 are listed below.

Data Import
Filtering
Normalization
ScatterPlots
Unsupervised Analysis
Supervised Analysis
Class Comparison:
Class Prediction
Survival Analysis

What are the system requirements?

What types of filtering options are available?

Filtering criteria can be specified prior to data import and also prior to each analysis. Spots can be filtered out using any of the following 3 methods:
Genes can be filtered using either of the following 2 methods:
You have the option of filtering out genes with a specified number of missing values. Missing values are imputed when necessary.

What are the options for normalization?

For dual-channel experiments, you can use any of the following 3 methods:
For single-channel experiments:
Affymetrix chip experiments:

Can I do clustering with BRB-ArrayTools?

You can do hierarchical clustering of genes and/or samples and create cluster dendrograms and color heat maps of all the genes. The output includes plots of the median gene expression within a cluster versus array. Lists of the genes within a cluster can be saved for further analysis. If you cluster samples, you can compute a statistical reproducibility index, which indicates whether samples may have clustered together by chance.

What statistical methods are available for comparing gene expression across two or more groups?

You have several options for carrying out this type of analysis.
You also have several ways to specify the criteria for inclusion of a gene in a gene list.

How do I specify the groups I want to compare in BRB-ArrayTools?

It is quick and easy to assign arrays to groups for group or class comparison. Briefly, when data are imported, an Experimental Descriptor Excel worksheet page is created with a column of the array labels. Additional columns can be added to define class (or group) variables. Each row represents an array; an array can be excluded by leaving its cell blank. You can create this worksheet in advance of data import or before an analysis.

How can BRB-ArrayTools help manage false discoveries?

You can specify limits for the number of false discoveries or the proportion of false discoveries. Multivariate permutation tests are used to find the genes meeting the specified criteria. These tests are based on permutations of the labels that indicate which arrays are in which classes. A large number of random permutations are considered. For each random permutation, the parametric tests are re-computed to determine a p value for each gene, which measures the extent to which the gene appears differentially expressed between the random classes defined by that permutation. The genes are ordered by their p values computed for the random permutation (genes with the smallest p values at the top of the list). For each potential p value threshold, the program records the number of genes with p values below that threshold. This process is repeated for a large number of random permutations. Consequently, for any p value threshold, we can compute the distribution of the number of genes that would have p values smaller than that threshold for random permutations. That is the distribution of the number of false discoveries, since genes that are significant for random permutations are false discoveries. The algorithm selects a threshold p value so that the number of false discoveries is no greater than that specified by the user 95% of the time or 50% of the time.
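
The permutation procedure described above for limiting the number of false discoveries can be sketched roughly as follows. This is only an illustration under stated assumptions, not the BRB-ArrayTools implementation: it uses the absolute two-sample t statistic in place of parametric p values (a larger statistic plays the role of a smaller p value), and the function names, the toy data, and the default settings (`max_false=1`, 200 permutations) are all hypothetical.

```python
import random
from statistics import mean, variance

def abs_t(a, b):
    """Absolute two-sample t statistic (unequal-variance form) for one gene."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def gene_scores(data, labels):
    """Score every gene; data[g][i] is the log expression of gene g on array i."""
    scores = []
    for row in data:
        a = [x for x, c in zip(row, labels) if c == 0]
        b = [x for x, c in zip(row, labels) if c == 1]
        scores.append(abs_t(a, b))
    return scores

def select_genes(data, labels, max_false=1, confidence=0.95, n_perm=200, seed=0):
    """Return indices of genes passing the most lenient cutoff that keeps the
    number of false discoveries <= max_false in at least `confidence` of the
    random label permutations."""
    rng = random.Random(seed)
    observed = gene_scores(data, labels)
    permuted = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # random reassignment of arrays to classes
        permuted.append(gene_scores(data, shuffled))
    cutoff = float("inf")
    for candidate in sorted(observed, reverse=True):  # strict -> lenient
        # A permutation is "ok" if few enough genes look significant by chance.
        ok = sum(1 for scores in permuted
                 if sum(s >= candidate for s in scores) <= max_false)
        if ok / n_perm < confidence:
            break
        cutoff = candidate
    return [g for g, s in enumerate(observed) if s >= cutoff]

# Toy data (hypothetical): 40 genes on 12 arrays, 6 per class;
# genes 0-2 are shifted in class 1, the rest are pure noise.
rng = random.Random(1)
labels = [0] * 6 + [1] * 6
data = [[rng.gauss(0, 1) + (10 if g < 3 and c == 1 else 0) for c in labels]
        for g in range(40)]
selected = select_genes(data, labels)
print(selected)
```

Setting `confidence=0.5` corresponds to the "50% of the time" option mentioned above.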

In a similar manner, we determine threshold p values so that the resulting gene list contains no more than a specified proportion of false discoveries (either 95% of the time or 50% of the time). The class prediction tool produces a gene list ordered with the genes having the smallest parametric p values at the top. The length of the gene list is determined by the types of false discovery control selected. Generally we recommend using all of the options: a univariate p value threshold (0.001), a limit on the number of false discoveries (default 10), and a limit on the proportion of false discoveries (default 0.10). The output tells you where to truncate the gene list in order to get each type of control.

Do I need biological replicates to analyze my data using BRB-ArrayTools?

Yes, you will want to consider using BRB-ArrayTools only when you have biological replicates. Biological replicates are independent experimental samples, such as tissue samples taken from different patients.

Can I analyze data with two or more technical replicates?

Yes, you can analyze data including replicate arrays. Technical replicates come from extracting multiple RNA samples from the same biological specimen for independent processing, or from independently labeling and hybridizing aliquots of the same RNA sample. The replicate samples can be identified in BRB-ArrayTools on the Experimental Descriptor worksheet page.

Can I analyze flip-dye experiments using BRB-ArrayTools?

Yes. You can identify the labeling on the Experimental Descriptor page. When the log-ratios are computed, the ratio Green/Red is calculated for the flipped arrays and Red/Green for the others.

Can I use BRB-ArrayTools to analyze data from a paired data experiment, where I have sets of control and test samples from the same biological sample?

Yes. You can specify the paired data arrays using a column on the Experimental Descriptor page and selecting the Paired Data option on each analysis dialogue.
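
To make the flip-dye handling described above concrete, here is a minimal sketch of how a log-ratio might be oriented. The function name, the `flipped` flag, and the intensity values are hypothetical, and base 2 is an assumption; the only part taken from the text is the rule itself, Green/Red for flipped arrays and Red/Green otherwise.

```python
import math

def log_ratio(red, green, flipped=False):
    """Return the log2 ratio for one spot.

    Flipped arrays use Green/Red and all other arrays use Red/Green,
    so the same sample ends up in the numerator on every array.
    """
    ratio = green / red if flipped else red / green
    return math.log2(ratio)

# Forward array: the test sample is in the red channel.
forward = log_ratio(red=800.0, green=200.0)
# Flipped array: the same test sample is now in the green channel.
reverse = log_ratio(red=200.0, green=800.0, flipped=True)
print(forward, reverse)
```

With this orientation, both arrays report the same sign for the same biological change, so a flipped pair can be compared or averaged directly.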

Can I analyze time series data?

BRB-ArrayTools is not particularly well suited for time series analysis. We hope to incorporate appropriate methods in future versions. Please email us with your suggestions!

Yes, the survival analysis tool finds genes that are predictive of survival time for patients. Since some patients/animals may still be alive at the time of analysis, their survival time from entry on study is censored; that is, it is at least as long as the survival measured to date, but longer by an unknown amount. There are many statistical methods for analysis of censored survival data. The most popular method is Cox's proportional hazards model. This is a regression model in which the hazard function for an individual is a function of predictor variables. In our case the predictor variables are log expression levels. The hazard function is the instantaneous force of mortality at any time, conditional on having survived until that time. The proportional hazards model postulates that the logarithm of the hazard of death is a linear function of the predictor variables, linked by unknown regression coefficients. For more details see biostatistics texts or the original paper (D.R. Cox, Regression models and life-tables, J. Royal Stat. Soc. B 34:187-202, 1972).

The survival analysis tool fits proportional hazards models relating survival to each gene, one gene at a time, and computes a p value for each gene for testing the hypothesis that survival time is independent of the expression level of that gene. Gene lists are created based on these p values in the same way as in the Class Comparison tool. The p values can be used to identify gene lists using multivariate permutation tests for controlling the number or proportion of false discoveries. Alternatively, the gene list can simply consist of the genes with p values less than a specified threshold (0.001 is the default).
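
In symbols, the single-gene proportional hazards model described above can be written as follows. The notation is ours and is an assumption, not taken from the BRB-ArrayTools documentation: h_0(t) is an unspecified baseline hazard, x_g is the log expression level of gene g for an individual, and beta_g is that gene's regression coefficient.

```latex
\log h(t \mid x_g) \;=\; \log h_0(t) + \beta_g\, x_g
\quad\Longleftrightarrow\quad
h(t \mid x_g) \;=\; h_0(t)\, e^{\beta_g x_g},
\qquad
H_0 :\ \beta_g = 0 .
```

The per-gene p value tests H_0, that is, that the hazard (and so survival time) does not depend on the expression level of gene g.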
Class prediction involves constructing a multivariate predictor based on the differential expression across 2 or more groups (or classes). The predictor can then be used to classify unknown samples by group based on gene expression data. For example, validated predictors can be used to develop sensitive, accurate, and specific diagnostic tests. Class prediction applied to gene expression experiments presents considerable challenges due to the number of genes analyzed relative to the number of biological samples. Ideally, predictors can be developed using a set of experimental data, and then validated by demonstrating accurate prediction on a new set of independent experimental data. BRB-ArrayTools provides four methods for constructing multivariate predictors.
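
As a rough illustration of building a predictor and checking it on samples that were not used to build it, here is a minimal sketch. It is not one of the BRB-ArrayTools methods: the nearest-centroid rule, the function names, and the toy profiles are all hypothetical, and leave-one-out cross-validation is used here as one common way to estimate the misclassification rate when no independent dataset is at hand.

```python
from statistics import mean

def centroid(samples):
    """Per-gene mean over a list of expression profiles."""
    return [mean(column) for column in zip(*samples)]

def fit(profiles, classes):
    """Return {class label: centroid} computed from the training arrays."""
    groups = {}
    for profile, label in zip(profiles, classes):
        groups.setdefault(label, []).append(profile)
    return {label: centroid(members) for label, members in groups.items()}

def predict(model, profile):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(center):
        return sum((x - m) ** 2 for x, m in zip(profile, center))
    return min(model, key=lambda label: dist(model[label]))

def loocv_error(profiles, classes):
    """Leave each array out in turn, refit on the rest, and count mistakes."""
    errors = 0
    for i in range(len(profiles)):
        train_p = profiles[:i] + profiles[i + 1:]
        train_c = classes[:i] + classes[i + 1:]
        if predict(fit(train_p, train_c), profiles[i]) != classes[i]:
            errors += 1
    return errors / len(profiles)

# Toy log-expression profiles (3 genes per array), two classes.
profiles = [[2.0, 0.1, 1.0], [2.2, -0.1, 1.1], [1.9, 0.0, 0.9],
            [0.1, 2.0, 1.0], [-0.1, 2.1, 1.2], [0.0, 1.8, 0.8]]
classes = ["A", "A", "A", "B", "B", "B"]

cv_error = loocv_error(profiles, classes)
new_call = predict(fit(profiles, classes), [0.2, 1.9, 1.0])  # unseen sample
print(cv_error, new_call)
```

This sketch uses every gene. If genes are selected before fitting, the selection has to be repeated inside each leave-one-out round, otherwise the estimated error is optimistically biased.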
As part of development of the predictor, BRB-ArrayTools includes a leave-one-out cross-validation procedure to estimate the misclassification rate of each type of predictor and a permutation step to estimate the significance of the cross-validated misclassification rate. In addition, you can specify a training set for development of the predictor and obtain class predictions for samples from an independent dataset (a dataset not used to develop the predictor).

What is Multi-Dimensional Scaling?

You can use multi-dimensional scaling (MDS) to represent high-dimensional data graphically in three dimensions. Pairwise similarity is preserved as far as possible, so points close to each other on the plot are more similar than points far apart. If you use Euclidean distance as the MDS distance metric, MDS is equivalent to Principal Component Analysis (PCA). BRB-ArrayTools offers MDS of samples as a feature. You can create 3-D rotating plots, labeled by group identifier, to examine the similarity/dissimilarity of your samples. The rotating plots can be included in a PowerPoint presentation.

How do I import data from the NIH mAdB database?

NIH mAdB users can extract data from the database for one-step automated loading into BRB-ArrayTools. The BRB-ArrayTools Users Guide provides step-by-step instructions.

How do I import Affymetrix chip data?

Tab-delimited CHP files exported from MAS 4.0 or 5.0 in Pivot or Metric file format can be easily imported using the GUI dialogue. You can also import multi-chip data using the GUI dialogue. The User Manual provides a more detailed explanation and examples.

What is the format for the gene identification data for cDNA arrays?

Various identifiers can be associated with each spot, such as spot numbers, well numbers, clone names, clone identifiers, probe set identifiers, UniGene identifiers, GenBank accession numbers, etc.
The gene identifiers may be located alongside the expression data in the same files, or may be contained in a separate file which is used as a look-up table for the genes on all the arrays. If the gene identifiers are contained in a separate file, then there must be corresponding columns within the expression data file(s) and the gene identifier file, containing gene ids which can be used for matching the gene identifiers with the expression data.

The column which is designated within BRB-ArrayTools as clone id should contain an organization-prefixed clone id (e.g., a prefix such as "IMAGE:", "ATCC:", "TIGR:" etc.). These clone ids can be used to link to clone reports in the NCBI database. Note that clone reports in the NCI mAdb database are only available for clones in the NCI Advanced Technology Center inventory or for other expression array sets which are tracked by BIMAS/CIT/NIH. All clone identifiers found within a clone id column which are numeric and have no prefix will be assumed to have a prefix of IMAGE by default.

Probe set ids are used to link to feature reports in the NCI mAdb database. Currently feature reports are available for the Human Genome U133 A and B chips, and for the Mouse Genome U74 A-C chips. UniGene cluster ids and gene symbols are used to search for the UniGene annotation mirrored in the NCBI database. GenBank accession numbers are used to search for the GenBank annotation which is also mirrored in the NCBI database.

A minimum of one gene identifier is required for use in collating the dataset. However, the user may wish to enter any or all of the above gene identifiers, if they are available, to enhance the usability of the output from the analyses.

Can I import my own gene lists?

Yes, you can incorporate your own gene lists into your analysis by providing a simple text file with a unique id column heading matching the id in your experimental project file. The User Guide provides more information.
Currently there is no automated procedure for updating gene lists that were created from evolving (i.e., not static) databases.

Can I annotate my data using the stored gene lists (lists derived from analysis, provided with the software, and/or imported by the user)?

BRB-ArrayTools provides a utility to merge experimental data with stored gene lists. The HTML analysis output reports include a column listing the gene lists which include the selected genes. You can also annotate your data by accessing the Stanford SOURCE database.

Can I access information about selected interesting genes from public databases?

Yes, most typical identifiers associated with each spot and imported with the expression data will be automatically hyperlinked in the analysis output for quick access to the appropriate public database. Identifiers include clone identifiers, probe set identifiers, UniGene identifiers, GenBank accession numbers, etc.

The column which is designated within BRB-ArrayTools as clone id should contain an organization-prefixed clone id (e.g., a prefix such as "IMAGE:", "ATCC:", "TIGR:" etc.). These clone ids can be used to link to clone reports in the NCBI database. Note that clone reports in the NCI mAdb database are only available for clones in the NCI Advanced Technology Center inventory or for other expression array sets which are tracked by BIMAS/CIT/NIH. All clone identifiers found within a clone id column which are numeric and have no prefix will be assumed to have a prefix of IMAGE by default.

Probe set ids are used to link to feature reports in the NCI mAdb database. Currently feature reports are available for the Human Genome U133 A and B chips, and for the Mouse Genome U74 A-C chips. UniGene cluster ids and gene symbols are used to search for the UniGene annotation mirrored in the NCBI database. GenBank accession numbers are used to search for the GenBank annotation which is also mirrored in the NCBI database.
A minimum of one gene identifier is required for use in collating the dataset. However, the user may wish to enter any or all of the above gene identifiers, if they are available, to enhance the usability of the output from the analyses.

Can I save my gene lists for export into other packages?

Yes, gene lists are saved and stored as simple ASCII text files. Each file includes the unique id identified during data import, in addition to the Clone ID, Probe Set ID, GenBank accession number, UniGene cluster ID and/or gene symbol if present in the imported gene identifier file.

What if I have questions about how to use BRB-ArrayTools?

You can email Amy Lam or use the BRB ArrayTools Message Board.

What if I have questions about the design and analysis of my experiment?

In addition to BRB-ArrayTools questions, we can try to help you with your statistical questions. You can email Amy Lam or Richard Simon to contact us.

Do you provide workshops and hands-on training?

Yes, we offer a 7-hour seminar and hands-on training workshop at the NIH approximately every 2 months. Please see the Workshops section for more information.