Preparation for Somatic Mutation Annotator

The somatic mutation annotator requires a COSMIC VCF file, a dbSNP VCF file, and reference genome files. The sections below explain how to fetch each of them.

Download COSMIC VCF

Downloading COSMIC data requires registration on the COSMIC website. Please go to the log-in page to register, then follow the instructions on the download page to download COSMIC data. There are two ways to download COSMIC VCF files:

  • SFTP using command line
    To log in to the remote host, open a terminal and type:
    sftp username@sftp-cancer.sanger.ac.uk
    Log in with your username and password; the sftp> prompt is then displayed. For example, to download version 76 of the COSMIC VCF file for the GRCh37 genome build, type:
    sftp> cd /cosmic/grch37/cosmic/v76/VCF
    sftp> get CosmicCodingMuts.vcf.gz
    Likewise, for GRCh38, type:
    sftp> cd /cosmic/grch38/cosmic/v76/VCF
    sftp> get CosmicCodingMuts.vcf.gz
    To quit, type:
    sftp> quit
  • Download through a GUI Client
    We recommend installing FileZilla, a user-friendly GUI client available from the FileZilla website. Once FileZilla has been installed, open it and enter the credentials required to log in, as described on the download page. Then click “Quickconnect” to connect to the COSMIC database. COSMIC VCF files are provided separately for GRCh37 and GRCh38. For example, if your raw VCF file is associated with GRCh37 or hg19, you can download version 76 of the COSMIC file located at /cosmic/grch37/cosmic/v76/VCF/CosmicCodingMuts.vcf.gz. Note that, since we are interested in the coding mutations, you should download CosmicCodingMuts.vcf.gz rather than CosmicNonCodingVariants.vcf.gz.
Please visit the COSMIC ftp for the latest version.
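
After either download route, it can be worth confirming that the file arrived intact before handing it to the annotator. The following is a minimal sketch, not part of the annotator itself: check_vcf_gz is a hypothetical helper name, and the only assumptions are that gzip is installed and that a valid VCF starts with a ##fileformat=VCF header line.

```shell
#!/bin/sh
# check_vcf_gz FILE (hypothetical helper): succeed only if FILE is an intact
# gzip archive whose first line is a VCF "##fileformat=VCF..." header.
check_vcf_gz() {
    if [ ! -f "$1" ]; then
        echo "not found: $1" >&2
        return 1
    fi
    # gzip -t tests the compressed data without writing any output.
    if ! gzip -t "$1" 2>/dev/null; then
        echo "corrupt or truncated gzip file: $1" >&2
        return 1
    fi
    # Read only the first decompressed line and compare it with the VCF magic.
    first=$(gzip -dc "$1" 2>/dev/null | head -n 1)
    case "$first" in
        '##fileformat=VCF'*) return 0 ;;
        *) echo "no VCF header in: $1" >&2; return 1 ;;
    esac
}
```

For example, check_vcf_gz CosmicCodingMuts.vcf.gz returns a non-zero status and prints a message if the transfer was cut short.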

Download dbSNP VCF

One can download the latest dbSNP VCF file from the dbSNP website. For example, to download build 146 of the dbSNP VCF file for GRCh38, run:
$ wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/VCF/common_all_20151104.vcf.gz
For GRCh37, run:
$ wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b146_GRCh37p13/VCF/common_all_20151104.vcf.gz

Please visit the dbSNP website for the latest build.
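
When the download is scripted, the choice of genome build can be captured in a small helper. The sketch below assumes only the two build 146 URLs quoted above; dbsnp_url is a hypothetical function name, and the URLs would need updating for a later build.

```shell
#!/bin/sh
# dbsnp_url BUILD (hypothetical helper): print the dbSNP build 146
# "common_all" VCF URL for the given genome build.
dbsnp_url() {
    base="ftp://ftp.ncbi.nih.gov/snp/organisms"
    case "$1" in
        GRCh38|hg38)
            echo "$base/human_9606_b146_GRCh38p2/VCF/common_all_20151104.vcf.gz" ;;
        GRCh37|hg19)
            echo "$base/human_9606_b146_GRCh37p13/VCF/common_all_20151104.vcf.gz" ;;
        *)
            # Unknown builds are reported on stderr rather than guessed.
            echo "unknown genome build: $1" >&2
            return 1 ;;
    esac
}
```

A download for GRCh37 would then be wget "$(dbsnp_url GRCh37)".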

Download reference genome files

One can download the reference genome files by following the instructions in the tutorial section here.

About dbNSFP database files

The first time the somatic mutation annotator is run, both ANNOVAR and SnpEff automatically download the dbNSFP database files. For ANNOVAR, the files for the selected genome build are downloaded from the ANNOVAR website to humandb/ under the ANNOVAR directory. For SnpEff, the following files are downloaded to variantAnnoDatabase/dbNSFP/ under the user's home directory. Please note that the database and index files must be placed in the same folder.

  • GRCh37 / hg19 (dbNSFP version 3.2 Academic, for SnpEff):
    • Database. Save file as dbNSFP3.2a_hg19.txt.gz
    • Index. Save file as dbNSFP3.2a_hg19.txt.gz.tbi
  • GRCh38 / hg38 (dbNSFP version 3.2 Academic, for SnpEff):
    • Database. Save file as dbNSFP3.2a_hg38_sorted.txt.gz
    • Index. Save file as dbNSFP3.2a_hg38_sorted.txt.gz.tbi
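
Because the SnpEff database and its index must sit in the same folder, a quick check before running the annotator can catch a misplaced file. This is a minimal sketch using the file names listed above; check_dbnsfp is a hypothetical helper, not something shipped with SnpEff or the annotator.

```shell
#!/bin/sh
# check_dbnsfp DIR BUILD (hypothetical helper): succeed only if both the
# dbNSFP 3.2a database and its .tbi index for BUILD are present in DIR.
check_dbnsfp() {
    dir=$1
    case "$2" in
        hg19|GRCh37) db="dbNSFP3.2a_hg19.txt.gz" ;;
        hg38|GRCh38) db="dbNSFP3.2a_hg38_sorted.txt.gz" ;;
        *) echo "unknown genome build: $2" >&2; return 2 ;;
    esac
    status=0
    # Report every missing file instead of stopping at the first one.
    for f in "$dir/$db" "$dir/$db.tbi"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return $status
}
```

With the default location described above, the call would be check_dbnsfp "$HOME/variantAnnoDatabase/dbNSFP" hg19.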