3. Download Data Source (download)
To annotate variants, MutAnno needs data source files, which can be downloaded with the download sub-command.
mutanno download \
-source_path datasource_directory \
-source all \
-version latest \
-refversion hg38 \
-websource mutanno
-source_path: data source path (default: mutanno_source)
-source: source name (default: all)
-version: source version (default: latest)
-refversion: reference version [hg19, hg38] (default: hg38)
-websource: web source indication [empty, mutanno] (default: empty). With -websource mutanno, only preprocessed files are downloaded from the MutAnno web source.
Note
If -source is all, all data sources can be downloaded and preprocessed.
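To show how the defaults above combine, the following POSIX shell sketch wraps mutanno download and fills in the documented default for any argument left empty. The function name mutanno_download and the DRY_RUN switch (print the command instead of running it) are hypothetical helpers, not part of MutAnno.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of MutAnno): builds a `mutanno download`
# call from the documented options, using the documented default for
# every argument that is left empty.
# Set DRY_RUN=1 to print the command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: mutanno_download [source_path] [source] [version] [refversion] [websource]
mutanno_download() (
  source_path=${1:-mutanno_source}
  src=${2:-all}
  version=${3:-latest}
  refversion=${4:-hg38}
  websource=${5:-}

  # -websource defaults to empty, so it is only passed when a value is given.
  if [ -n "$websource" ]; then
    run mutanno download -source_path "$source_path" -source "$src" \
      -version "$version" -refversion "$refversion" -websource "$websource"
  else
    run mutanno download -source_path "$source_path" -source "$src" \
      -version "$version" -refversion "$refversion"
  fi
)
```

For example, DRY_RUN=1 mutanno_download datasource_directory phylop 20way hg38 mutanno prints the command that fetches the preprocessed phyloP 20way track.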
3.1. Datasource List
All preprocessed data is available in a shared Dropbox folder.
Variant annotation (hg38)

hg38 (GRCh38)
Source name        | Category      | Download
-------------------+---------------+---------
VEP *              | Annotation    | manual
dbSNP *            | PopulationDB  | auto
gnomAD *           | PopulationDB  | auto
UK10K *            | PopulationDB  | auto
TOPMED *           | PopulationDB  | auto
CLINVAR *          | VariantDB     | auto
COSMIC *           | VariantDB     | auto
SPLICEAI *         | Pathogenicity | manual
PRIMATEAI *        | Pathogenicity | manual
CADD *             | Pathogenicity | auto
GERP *             | Conservation  | auto
PHASTCONS *        | Conservation  | auto
PHYLOP *           | Conservation  | auto
SIPHY *            | Conservation  | auto
SUPER_DUPLICATES * | Repetitive    | auto
SIMPLE_REPEAT *    | Repetitive    | auto
RMSK *             | Repetitive    | auto
NESTED_REPEATS *   | Repetitive    | auto
MICROSATELLITE *   | Repetitive    | auto

auto: MutAnno can download and preprocess the source automatically.
manual: the raw data must be obtained by hand; the preprocessed files can be downloaded with -websource mutanno.
*: this version is used in the CGAP project.
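The Download column determines which route a source needs. A minimal sketch, assuming that source identifiers are the lowercased table names (only vep and phylop are confirmed by commands elsewhere on this page): manual sources are fetched preprocessed with -websource mutanno, while auto sources are downloaded and preprocessed by MutAnno itself. The names download_variant_source and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch: choose the download route for an hg38 variant-annotation source
# from the "Download" column of the table.
# ASSUMPTION: source identifiers are the lowercased table names; only
# `vep` and `phylop` appear verbatim in commands on this page.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: download_variant_source <source> [source_path]
download_variant_source() (
  src=$1
  dir=${2:-datasource_directory}
  case "$src" in
    vep|spliceai|primateai)
      # "manual" sources: fetch the preprocessed files from the MutAnno
      # web source instead of the raw data.
      run mutanno download -source_path "$dir" -source "$src" \
        -version latest -refversion hg38 -websource mutanno
      ;;
    *)
      # "auto" sources: MutAnno downloads and preprocesses the raw data.
      run mutanno download -source_path "$dir" -source "$src" \
        -version latest -refversion hg38
      ;;
  esac
)
```

A loop such as for s in vep gnomad clinvar; do download_variant_source "$s"; done would then fetch several sources in one pass.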
hg19 (GRCh37)
Source name        | Category      | Download
-------------------+---------------+---------
CADD               | Pathogenicity | auto
Gene annotation
Source name               | Download
--------------------------+---------
ENSEMBLgene *             | auto
ENSEMBLgeneGRCh37 *       | auto
CYTOBAND *                | auto
RefSeq *                  | auto
HGNC *                    | auto
ClinGen *                 | auto
ClinGenDisease *          | auto
ENSEMBLIDxrefTrscriptID * | auto
ENSEMBLIDxref *           | auto
dbNSFP *                  | auto
gnomADmetrics *           | auto
Marrvel *                 | auto
CassaNatGenet2017 *       | manual
GTEx *                    | auto
BrainSpan *               | auto
BrainAtlas *              | auto
GenCode *                 | auto
3.2. Download methods
1) Download and preprocess automatically:
mutanno download \
-source_path datasource_directory \
-source phylop \
-version 20way \
-refversion hg38
2) Download the preprocessed file from the MutAnno Dropbox. The -websource mutanno option does not run the preprocessing module:
mutanno download \
-source_path datasource_directory \
-source phylop \
-version 20way \
-refversion hg38 \
-websource mutanno
3) Download manually (using wget), then run the preprocess module:
wget -P datasource_directory ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP20way/hg38.phyloP20way.bw
mutanno preprocess \
-infile datasource_directory/hg38.phyloP20way.bw \
-ds phylop.datastructure.json \
-out datasource_directory/hg38.phyloP20way.mti.gz
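The manual route can be wrapped in a small function for reuse with other sources. This sketch skips the wget step when the file is already present and names the output after the input with its last extension replaced by .mti.gz, as in the phyloP example. The names manual_download_preprocess and DRY_RUN are hypothetical; wget -P sets the directory the file is saved into.

```shell
#!/bin/sh
# Sketch of the manual route: download a raw file (once) into the data-source
# directory, then convert it with `mutanno preprocess`.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: manual_download_preprocess <url> <datastructure.json> [source_path]
manual_download_preprocess() (
  url=$1
  ds=$2
  dir=${3:-datasource_directory}
  file=$dir/$(basename "$url")

  # Only download when the raw file is not already present.
  if [ ! -e "$file" ]; then
    run wget -P "$dir" "$url"
  fi

  # e.g. hg38.phyloP20way.bw -> hg38.phyloP20way.mti.gz
  run mutanno preprocess -infile "$file" -ds "$ds" -out "${file%.*}.mti.gz"
)
```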
3.3. Variant annotation
3.3.1. VEP
MutAnno does not support downloading raw VEP data automatically, but it supports downloading the preprocessed files from the MutAnno Dropbox.
3.3.1.1. Download preprocessed files from MutAnno dropbox
mutanno download \
-source_path datasource_directory \
-source vep \
-version latest \
-refversion hg38 \
-websource mutanno
3.3.1.2. Make VEP result files and then run preprocess
Make mock VCF files:
mutanno vcfmaker \
-out test.vcf
Run VEP:
./bin/ensembl-vep-release-99/vep \
-i chr1_100001_200000.vcf \
-o chr1_100001_200000.vcf.vep.txt \
--hgvs \
--fasta GRCh38_full_analysis_set_plus_decoy_hla.fa \
--assembly GRCh38 \
--use_given_ref \
--offline \
--cache_version 99 \
--dir_cache ./bin/nonindexed_vep_cache/homo_sapiens_vep \
--everything \
--force_overwrite \
--vcf \
--plugin MaxEntScan,./bin/VEP_plugins-release-99/fordownload \
--plugin TSSDistance \
--dir_plugins ./bin/VEP_plugins-release-99 \
--plugin SpliceRegion,Extended
Note
Download the VEP file (v99) from ENSEMBL.
Untar and gunzip the downloaded file.
Install the plugins (https://uswest.ensembl.org/info/docs/tools/vep/script/vep_plugins.html).
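When the input has been split into region chunks such as chr1_100001_200000.vcf, the VEP command above has to be repeated for every chunk. This sketch loops over the chunk files given as arguments and reuses exactly the options shown above. The function name run_vep_chunks and the DRY_RUN switch are hypothetical, and the ./bin paths and FASTA file must match your own installation.

```shell
#!/bin/sh
# Sketch: run VEP on several chopped VCF files with the same options as the
# single-file command. Each <chunk>.vcf produces <chunk>.vcf.vep.txt.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: run_vep_chunks chunk1.vcf chunk2.vcf ...
run_vep_chunks() {
  for vcf in "$@"; do
    run ./bin/ensembl-vep-release-99/vep \
      -i "$vcf" \
      -o "$vcf.vep.txt" \
      --hgvs \
      --fasta GRCh38_full_analysis_set_plus_decoy_hla.fa \
      --assembly GRCh38 \
      --use_given_ref \
      --offline \
      --cache_version 99 \
      --dir_cache ./bin/nonindexed_vep_cache/homo_sapiens_vep \
      --everything \
      --force_overwrite \
      --vcf \
      --plugin MaxEntScan,./bin/VEP_plugins-release-99/fordownload \
      --plugin TSSDistance \
      --dir_plugins ./bin/VEP_plugins-release-99 \
      --plugin SpliceRegion,Extended
  done
}
```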
Preprocess the VEP result (convert to .mti):
mutanno preprocess \
-infile datasource_directory/chr1_100001_200000.vcf.vep.txt \
-ds vep.datastructure.json \
-out datasource_directory/chr1_100001_200000.vcf.vep.mti.gz
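Every VEP chunk needs its own preprocess run. This sketch applies the command above to each *.vcf.vep.txt file in a directory and derives the .vcf.vep.mti.gz output name the same way as the single-chunk example. The names preprocess_vep_chunks and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch: convert every VEP result chunk in a directory into an .mti.gz file
# with `mutanno preprocess`.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: preprocess_vep_chunks [directory]
preprocess_vep_chunks() (
  dir=${1:-datasource_directory}
  for vep_txt in "$dir"/*.vcf.vep.txt; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$vep_txt" ] || continue
    # chunk.vcf.vep.txt -> chunk.vcf.vep.mti.gz
    run mutanno preprocess \
      -infile "$vep_txt" \
      -ds vep.datastructure.json \
      -out "${vep_txt%.txt}.mti.gz"
  done
)
```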
The chopped .mti files can be merged into a single file using vcf-merger.
3.4. Population data
3.4.1. dbSNP
web resource: NCBI refseq
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.4.2. gnomAD
web resource: gnomAD browser
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
For hg19, gnomAD v2.1.1 is available; for hg38, v3.0 is available.
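Because the available gnomAD release depends on the reference build, a script can pick the version from the reference. A minimal sketch, assuming the source name is gnomad and that -version accepts the release numbers as written here (2.1.1, 3.0); neither is confirmed by this page, and gnomad_version, download_gnomad and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: map the reference build to the documented gnomAD release
# (hg19 -> v2.1.1, hg38 -> v3.0) and pass it to `mutanno download`.
# ASSUMPTIONS: source name `gnomad`; -version takes "2.1.1" / "3.0".
# Set DRY_RUN=1 to print the command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

gnomad_version() {
  case "$1" in
    hg19) echo 2.1.1 ;;
    hg38) echo 3.0 ;;
    *) echo "unsupported reference: $1" >&2; return 1 ;;
  esac
}

# Usage: download_gnomad [refversion] [source_path]
download_gnomad() (
  refversion=${1:-hg38}
  dir=${2:-datasource_directory}
  version=$(gnomad_version "$refversion") || exit 1
  run mutanno download -source_path "$dir" -source gnomad \
    -version "$version" -refversion "$refversion"
)
```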
3.4.3. UK10K
web resource: UK10K at the Sanger Institute
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
Only the hg19 version of UK10K is available. For hg38, MutAnno lifts the hg19 data over during preprocessing.
3.5. Conservation
3.5.1. GERP
Download the data file (.bw) from the Ensembl FTP site.
Convert the .bw file to .wig using bigWigToWig.
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
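The bigWig-to-wig conversion in the manual GERP route can be scripted as below. bigWigToWig is the UCSC utility named above; it takes an input bigWig and an output wig path, and its -chrom= option limits the output to one chromosome. The file names and the function bw_to_wig are placeholders, and the Ensembl FTP path of the GERP file is not reproduced here.

```shell
#!/bin/sh
# Sketch: convert a downloaded bigWig file to wiggle text with UCSC
# bigWigToWig (must be on PATH), optionally for a single chromosome.
# Set DRY_RUN=1 to print the command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: bw_to_wig <file.bw> [chrom]
bw_to_wig() (
  in_bw=$1
  chrom=${2:-}
  # gerp.bw -> gerp.wig, or gerp.chr1.wig when a chromosome is given
  out_wig=${in_bw%.bw}${chrom:+.$chrom}.wig
  if [ -n "$chrom" ]; then
    run bigWigToWig -chrom="$chrom" "$in_bw" "$out_wig"
  else
    run bigWigToWig "$in_bw" "$out_wig"
  fi
)
```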
Note
The current GERP version is 111_mammals (version date 7/18/20). This part needs to be updated.
3.5.2. PHASTCONS
Download the data file from the UCSC database.
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.5.3. PHYLOP
web resource: UCSC database phyloP100way, phyloP30way, phyloP20way
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
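Since three phyloP tracks are listed, a loop can fetch all of them. This sketch reuses the -version 20way form from the download-methods example; that 100way and 30way are accepted the same way is an assumption. The names download_phylop_all and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch: download the phyloP 100way, 30way and 20way tracks for hg38.
# ASSUMPTION: -version accepts "100way" and "30way" like "20way".
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: download_phylop_all [source_path]
download_phylop_all() (
  dir=${1:-datasource_directory}
  for version in 100way 30way 20way; do
    run mutanno download -source_path "$dir" -source phylop \
      -version "$version" -refversion hg38
  done
)
```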
3.5.4. SIPHY
web resource: gnomAD browser
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.6. Pathogenicity
web resource: gnomAD browser
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.6.1. CADD
web resource: CADD web source
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.6.2. SpliceAI
web resource: gnomAD browser
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.7. Variant database
3.7.1. CLINVAR
web resource: NCBI ClinVar browser (ClinVar VCF files: https://www.ncbi.nlm.nih.gov/variation/docs/ClinVar_vcf_files/)
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.7.2. COSMIC
web resource: gnomAD browser
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
3.8. Gene annotation
For gene annotation, MutAnno requires several annotation data files from public resources. You can download and preprocess each data source individually, or download all preprocessed files from MutAnno storage with the following command:
mutanno download \
-source_path datasource_directory \
-source geneannot \
-version latest \
-refversion hg38 \
-websource mutanno
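Instead of the geneannot bundle, each gene annotation source can be fetched on its own. The sketch below loops over the source names used in the per-source commands of this section (with ensemblidxref and cassanatgenet2017 spelled to match their section titles) and adds -websource mutanno only for CassaNatGenet2017, which is available only as a preprocessed file. The names download_gene_sources, GENE_SOURCES and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch: download the gene-annotation sources individually.
# Source names follow the per-source commands in section 3.8.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

GENE_SOURCES="ensemblgene ensemblgenegrch37 cytoband refseq hgnc clingen
clingendisease ensemblidxreftrscriptid ensemblidxref dbnsfp gnomadmetrics
marrvel cassanatgenet2017 gtex brainspan brainatlas gencode"

# Usage: download_gene_sources [source_path]
download_gene_sources() (
  dir=${1:-datasource_directory}
  # $GENE_SOURCES is left unquoted on purpose so it splits on whitespace.
  for src in $GENE_SOURCES; do
    if [ "$src" = cassanatgenet2017 ]; then
      # No automatic raw download: fetch the preprocessed file instead.
      run mutanno download -source_path "$dir" -source "$src" \
        -version latest -refversion hg38 -websource mutanno
    else
      run mutanno download -source_path "$dir" -source "$src" \
        -version latest -refversion hg38
    fi
  done
)
```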
3.8.1. ENSEMBLgene
web resource: ENSEMBL FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source ensemblgene \
-version latest \
-refversion hg38
3.8.2. ENSEMBLgeneGRCh37
web resource: ENSEMBL FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source ensemblgenegrch37 \
-version latest \
-refversion hg38
3.8.3. CYTOBAND
web resource: UCSC FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source cytoband \
-version latest \
-refversion hg38
3.8.4. RefSeq
web resource: RefSeqGene data from NCBI FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source refseq \
-version latest \
-refversion hg38
3.8.5. HGNC
web resource: HGNC data from EBI FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source hgnc \
-version latest \
-refversion hg38
3.8.6. ClinGen
web resource: ClinGen curation status site
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source clingen \
-version latest \
-refversion hg38
3.8.7. ClinGenDisease
web resource: ClinGen gene-validity site
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source clingendisease \
-version latest \
-refversion hg38
3.8.8. ENSEMBLIDxrefTrscriptID
web resource: UniProt FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source ensemblidxreftrscriptid \
-version latest \
-refversion hg38
3.8.9. ENSEMBLIDxref
web resource: ENSEMBL FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source ensemblidxref \
-version latest \
-refversion hg38
3.8.10. dbNSFP
web resource: dbNSFP FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source dbnsfp \
-version latest \
-refversion hg38
3.8.11. gnomADmetrics
web resource: gnomAD
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source gnomadmetrics \
-version latest \
-refversion hg38
3.8.12. Marrvel
web resource: Marrvel
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source marrvel \
-version latest \
-refversion hg38
3.8.13. CassaNatGenet2017
web resource: s_het score from Cassa et al. Nat. Genet. 2017
MutAnno does not support downloading the raw source file and preprocessing it automatically, but it supports downloading the preprocessed file from MutAnno storage.
mutanno download \
-source_path datasource_directory \
-source cassanatgenet2017 \
-version latest \
-refversion hg38 \
-websource mutanno
3.8.14. GTEx
web resource: GTEx dataset
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source gtex \
-version latest \
-refversion hg38
3.8.15. BrainSpan
web resource: BrainSpan FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source brainspan \
-version latest \
-refversion hg38
3.8.16. BrainAtlas
web resource: BrainAtlas FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source brainatlas \
-version latest \
-refversion hg38
3.8.17. GenCode
web resource: GenCode FTP
MutAnno supports three routes: 1) download and preprocess automatically, 2) download preprocessed files from the MutAnno Dropbox, and 3) download manually and then run the preprocess module.
mutanno download \
-source_path datasource_directory \
-source gencode \
-version latest \
-refversion hg38