3. Download Data Source (download)

To annotate variants, MutAnno needs data source files, which it can download with the download sub-command.

mutanno download \
        -source_path datasource_directory \
        -source all \
        -version latest \
        -refversion hg38 \
        -websource mutanno
  • -source_path : data source path (default: mutanno_source)

  • -source : source name (default: all)

  • -version : source version (default: latest)

  • -refversion : reference version [hg19, hg38] (default: hg38)

  • -websource : web source indication [``, mutanno] (default: ``). When -websource mutanno is used, only preprocessed files are downloaded from the MutAnno web source.

Note

If -source is all, all data sources can be downloaded and preprocessed.
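
The options and defaults listed above can be wrapped in a small shell helper. The following is a minimal sketch, not part of MutAnno: the function name mutanno_download_cmd and its positional-argument order are hypothetical, and it only prints the command line instead of running it.

```shell
# Hypothetical helper (not part of MutAnno): print a "mutanno download"
# command line, filling in the documented defaults for omitted arguments.
mutanno_download_cmd() {
    src="${1:-all}"                 # -source      (default: all)
    ver="${2:-latest}"              # -version     (default: latest)
    ref="${3:-hg38}"                # -refversion  (default: hg38)
    dir="${4:-mutanno_source}"      # -source_path (default: mutanno_source)
    web="${5:-}"                    # -websource   (default: empty)

    # Only hg19 and hg38 are listed as valid reference versions.
    case "$ref" in
        hg19|hg38) ;;
        *) echo "unsupported reference version: $ref" >&2; return 1 ;;
    esac

    cmd="mutanno download -source_path $dir -source $src -version $ver -refversion $ref"
    # Append -websource only when a web source was requested.
    if [ -n "$web" ]; then
        cmd="$cmd -websource $web"
    fi
    printf '%s\n' "$cmd"
}

# Example: preprocessed phyloP 20way files for hg38 from the MutAnno web source.
mutanno_download_cmd phylop 20way hg38 datasource_directory mutanno
```

Printing the command first makes it easy to review before running it, for example by pasting the printed line into the terminal.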

3.1. Datasource List

All preprocessed data are available in a shared Dropbox folder.

Variant annotation (hg38)

hg38 (GRCh38)

Source name          Category        Download
-------------------  --------------  --------
VEP *                Annotation      manual
dbSNP *              PopulationDB    auto
gnomAD *             PopulationDB    auto
UK10K *              PopulationDB    auto
TOPMED *             PopulationDB    auto
CLINVAR *            VariantDB       auto
COSMIC *             VariantDB       auto
SPLICEAI *           Pathogenicity   manual
PRIMATEAI *          Pathogenicity   manual
CADD *               Pathogenicity   auto
GERP *               Conservation    auto
PHASTCONS *          Conservation    auto
PHYLOP *             Conservation    auto
SIPHY *              Conservation    auto
SUPER_DUPLICATES *   Repetitive      auto
SIMPLE_REPEAT *      Repetitive      auto
RMSK *               Repetitive      auto
NESTED_REPEATS *     Repetitive      auto
MICROSATELLITE *     Repetitive      auto

  • auto: MutAnno can download and preprocess this source automatically.

  • star (*): this version is used in the CGAP project.
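
The auto/manual distinction in the hg38 list above can be captured in a lookup so that a script knows which sources it may fetch unattended. This is a sketch only: the function name download_mode is hypothetical, and the lowercase source names are an assumption based on the lowercase names used in the command examples of this section.

```shell
# Hypothetical lookup (not part of MutAnno): report whether an hg38
# variant-annotation source is listed as "auto" or "manual" above.
# Lowercase source names are an assumption.
download_mode() {
    case "$1" in
        vep|spliceai|primateai)
            echo manual ;;
        dbsnp|gnomad|uk10k|topmed|clinvar|cosmic|cadd|gerp|phastcons|phylop|siphy)
            echo auto ;;
        super_duplicates|simple_repeat|rmsk|nested_repeats|microsatellite)
            echo auto ;;
        *)
            echo "unknown source: $1" >&2
            return 1 ;;
    esac
}

# Example: print the mode for a few sources, one per line.
for s in vep cadd rmsk; do
    printf '%s\t%s\n' "$s" "$(download_mode "$s")"
done
```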

hg19 (GRCh37)

Source name          Category        Download
-------------------  --------------  --------
CADD                 Pathogenicity   auto

Gene annotation

Source name                 Download
--------------------------  --------
ENSEMBLgene *               auto
ENSEMBLgeneGRCh37 *         auto
CYTOBAND *                  auto
RefSeq *                    auto
HGNC *                      auto
ClinGen *                   auto
ClinGenDisease *            auto
ENSEMBLIDxrefTrscriptID *   auto
ENSEMBLIDxref *             auto
dbNSFP *                    auto
gnomADmetrics *             auto
Marrvel *                   auto
CassaNatGenet2017 *         manual
GTEx *                      auto
BrainSpan *                 auto
BrainAtlas *                auto
GenCode *                   auto

3.2. Download methods

  1. download and preprocess automatically.

    mutanno download \
            -source_path datasource_directory \
            -source phylop \
            -version 20way \
            -refversion hg38
    
  2. download the preprocessed file from the MutAnno Dropbox.

    The -websource mutanno option does not run the preprocessing module.

    mutanno download \
            -source_path datasource_directory \
            -source phylop \
            -version 20way \
            -refversion hg38 \
            -websource mutanno
    
  3. download manually (using wget), and then run the preprocess module.

    wget -P datasource_directory ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/hg38.phyloP20way.bw
    
    mutanno preprocess \
            -infile datasource_directory/hg38.phyloP20way.bw \
            -ds phylop.datastructure.json \
            -out datasource_directory/hg38.phyloP20way.mti.gz
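
For the manual route, the local file name and the .mti.gz output name both follow from the download URL. The sketch below derives them and prints the two commands. The helper name manual_download_cmds is hypothetical; `wget -P` is used so the file lands in the directory that the preprocess step reads from.

```shell
# Hypothetical helper (not part of MutAnno): given a download URL, a target
# directory and a datastructure file, print the wget command and the
# matching "mutanno preprocess" command.
manual_download_cmds() {
    url="$1"
    dir="$2"
    ds="$3"
    file="$(basename "$url")"        # e.g. hg38.phyloP20way.bw
    out="${file%.*}.mti.gz"          # swap the last extension for .mti.gz
    printf 'wget -P %s %s\n' "$dir" "$url"
    printf 'mutanno preprocess -infile %s/%s -ds %s -out %s/%s\n' \
        "$dir" "$file" "$ds" "$dir" "$out"
}

manual_download_cmds \
    ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/hg38.phyloP20way.bw \
    datasource_directory \
    phylop.datastructure.json
```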
    

3.3. Variant annotation

3.3.1. VEP

  • MutAnno does not support downloading VEP raw data automatically, but it does support downloading preprocessed files from the MutAnno Dropbox.

3.3.1.1. Download preprocessed files from MutAnno dropbox

mutanno download \
        -source_path datasource_directory \
        -source vep \
        -version latest \
        -refversion hg38 \
        -websource mutanno

3.3.1.2. Make VEP result files and then run preprocess

  1. make mock vcf files

    mutanno vcfmaker \
            -out test.vcf
    
  2. run VEP

    ./bin/ensembl-vep-release-99/vep \
          -i chr1_100001_200000.vcf \
          -o chr1_100001_200000.vcf.vep.txt \
          --hgvs \
          --fasta GRCh38_full_analysis_set_plus_decoy_hla.fa \
          --assembly GRCh38 \
          --use_given_ref \
          --offline \
          --cache_version 99 \
          --dir_cache ./bin/nonindexed_vep_cache/homo_sapiens_vep \
          --everything \
          --force_overwrite \
          --vcf \
          --plugin MaxEntScan,./bin/VEP_plugins-release-99/fordownload \
          --plugin TSSDistance \
          --dir_plugins ./bin/VEP_plugins-release-99 \
          --plugin SpliceRegion,Extended
    
  3. preprocess the VEP result (convert to .mti)

    mutanno preprocess \
            -infile datasource_directory/chr1_100001_200000.vcf.vep.txt \
            -ds vep.datastructure.json \
            -out datasource_directory/chr1_100001_200000.vcf.vep.mti.gz
    

    The chopped .mti files can then be merged into a single file using vcf-merger.
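
When VEP has been run on many chunks, the preprocess step has to be repeated for each result file. The loop below is a sketch of that repetition: the function name preprocess_chunk_cmds is hypothetical, the *.vcf.vep.txt naming is taken from the example above, and the commands are only printed, not executed.

```shell
# Hypothetical helper (not part of MutAnno): print one "mutanno preprocess"
# command per VEP result chunk (*.vcf.vep.txt) found in a directory.
preprocess_chunk_cmds() {
    dir="$1"
    ds="$2"
    for infile in "$dir"/*.vcf.vep.txt; do
        # Skip the unexpanded pattern when no chunk files are present.
        [ -e "$infile" ] || continue
        out="${infile%.txt}.mti.gz"   # x.vcf.vep.txt -> x.vcf.vep.mti.gz
        printf 'mutanno preprocess -infile %s -ds %s -out %s\n' \
            "$infile" "$ds" "$out"
    done
}

preprocess_chunk_cmds datasource_directory vep.datastructure.json
```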

3.4. Population data

3.4.1. dbSNP

  • web resource: NCBI refseq

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.4.2. gnomAD

  • web resource: gnomAD browser

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

  • For hg19, v2.1.1 is available; for hg38, v3.0 is available.
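
Because the available gnomAD release depends on the reference version, a script can pick it from the reference build. This is a sketch for the two reference versions MutAnno accepts (hg19 and hg38). The function name gnomad_version is hypothetical, and passing the release string as the -version value is an assumption, not something this section states.

```shell
# Hypothetical lookup (not part of MutAnno): gnomAD release listed for each
# reference version.
gnomad_version() {
    case "$1" in
        hg19) echo v2.1.1 ;;
        hg38) echo v3.0 ;;
        *) echo "no gnomAD version listed for: $1" >&2; return 1 ;;
    esac
}

# Example: print a download command for hg38.
# Assumption: -version accepts the release string in this form.
ref=hg38
printf 'mutanno download -source_path datasource_directory -source gnomad -version %s -refversion %s\n' \
    "$(gnomad_version "$ref")" "$ref"
```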

3.4.3. UK10K

  • web resource: UK10K from the Sanger Institute

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

  • Only the hg19 version of UK10K is available. For the hg38 version, MutAnno performs a liftover from hg19 during preprocessing.

3.5. Conservation

3.5.1. GERP

  • Download the data file (.bw) from the Ensembl FTP site

  • Convert the .bw file to .wig using bigWigToWig

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.
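
The .bw to .wig conversion step can be sketched as follows. bigWigToWig takes the input bigWig file followed by the output wig file. The helper name bw_to_wig_cmd and the file name in the example call are placeholders, since the actual GERP file name is not given here.

```shell
# Hypothetical helper: print the bigWigToWig command for a .bw file, writing
# the .wig file next to the input.
bw_to_wig_cmd() {
    bw="$1"
    wig="${bw%.bw}.wig"              # same path, .bw replaced by .wig
    printf 'bigWigToWig %s %s\n' "$bw" "$wig"
}

# Placeholder file name; substitute the file downloaded from Ensembl.
bw_to_wig_cmd datasource_directory/gerp_example.bw
```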

Note

The current GERP version is 111_mammals (version date: 7/18/20). This part needs to be updated.

3.5.2. PHASTCONS

  • Download the data file from the UCSC database

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.5.3. PHYLOP

3.5.4. SIPHY

  • web resource: gnomAD browser

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.6. Pathogenicity

  • web resource: gnomAD browser

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.6.1. CADD

  • web resource: CADD web source

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.6.2. SpliceAI

  • web resource: gnomAD browser

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.7. Variant database

3.7.1. CLINVAR

  • web resource: NCBI ClinVar browser

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

3.7.2. COSMIC

  • web resource: gnomAD browser

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

https://www.ncbi.nlm.nih.gov/variation/docs/ClinVar_vcf_files/

3.8. Gene annotation

For gene annotation, MutAnno requires several annotation data files from public resources. You can download and preprocess each data source individually, or download the preprocessed files from MutAnno storage using the following command.

 mutanno download \
         -source_path datasource_directory \
         -source geneannot \
         -version latest \
         -refversion hg38 \
         -websource mutanno
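
To fetch the gene annotation sources one at a time instead, the per-source command can be repeated in a loop. This is a sketch under assumptions: the source names are the lowercase forms of the subsection titles in 3.8, CassaNatGenet2017 is left out because it is only available as a preprocessed file, and the function name gene_download_cmds is hypothetical. The commands are printed, not executed.

```shell
# Gene annotation sources that are marked "auto" in the datasource list.
# Assumption: the -source value is the lowercase form of each source name.
GENE_SOURCES="ensemblgene ensemblgenegrch37 cytoband refseq hgnc clingen
clingendisease ensemblidxreftrscriptid ensemblidxref dbnsfp gnomadmetrics
marrvel gtex brainspan brainatlas gencode"

# Hypothetical helper (not part of MutAnno): print one download command per
# gene annotation source.
gene_download_cmds() {
    dir="${1:-datasource_directory}"
    for src in $GENE_SOURCES; do
        printf 'mutanno download -source_path %s -source %s -version latest -refversion hg38\n' \
            "$dir" "$src"
    done
}

gene_download_cmds
```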

3.8.1. ENSEMBLgene

  • web resource: ENSEMBL ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source ensemblgene \
             -version latest \
             -refversion hg38
    

3.8.2. ENSEMBLgeneGRCh37

  • web resource: ENSEMBL ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source ensemblgenegrch37 \
             -version latest \
             -refversion hg38
    

3.8.3. CYTOBAND

  • web resource: UCSC ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source cytoband \
             -version latest \
             -refversion hg38
    

3.8.4. RefSeq

  • web resource: RefSeqGene data from NCBI ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source refseq \
             -version latest \
             -refversion hg38
    

3.8.5. HGNC

  • web resource: HGNC data from EBI ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source hgnc \
             -version latest \
             -refversion hg38
    

3.8.6. ClinGen

  • web resource: ClinGen curation status site

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source clingen \
             -version latest \
             -refversion hg38
    

3.8.7. ClinGenDisease

  • web resource: ClinGen gene-validity site

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source clingendisease \
             -version latest \
             -refversion hg38
    

3.8.8. ENSEMBLIDxrefTrscriptID

  • web resource: Uniprot ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source ensemblidxreftrscriptid \
             -version latest \
             -refversion hg38
    

3.8.9. ENSEMBLIDxref

  • web resource: ENSEMBL ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source ensemblidxref \
             -version latest \
             -refversion hg38
    

3.8.10. dbNSFP

  • web resource: dbNSFP ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source dbnsfp \
             -version latest \
             -refversion hg38
    

3.8.11. gnomADmetrics

  • web resource: gnomAD

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source gnomadmetrics \
             -version latest \
             -refversion hg38
    

3.8.12. Marrvel

  • web resource: Marrvel

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source marrvel \
             -version latest \
             -refversion hg38
    

3.8.13. CassaNatGenet2017

  • web resource: s_het score from Cassa et al. Nat. Genet. 2017

  • MutAnno does not support downloading the raw source file and preprocessing it automatically, but it does support downloading the preprocessed file from MutAnno storage.

     mutanno download \
             -source_path datasource_directory \
             -source cassanatgenet2017 \
             -version latest \
             -refversion hg38 \
             -websource mutanno
    

3.8.14. GTEx

  • web resource: GTEx dataset

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source gtex \
             -version latest \
             -refversion hg38
    

3.8.15. BrainSpan

  • web resource: BrainSpan ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source brainspan \
             -version latest \
             -refversion hg38
    

3.8.16. BrainAtlas

  • web resource: BrainAtlas ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source brainatlas \
             -version latest \
             -refversion hg38
    

3.8.17. GenCode

  • web resource: GenCode ftp

  • MutAnno supports three options: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, 3) download manually and then run the preprocess module.

     mutanno download \
             -source_path datasource_directory \
             -source gencode \
             -version latest \
             -refversion hg38