6. Annotate Variants (annot)

MutAnno reads all variants from a VCF file (-vcf), annotates them using an annotation source file (-sourcefile) and a data structure file (-ds), and writes an annotated VCF file (-out). A simple command example for variant annotation is:

mutanno annot \
        -vcf input.vcf.gz \
        -out out_annotated.vcf \
        -ds datastructure.json \
        -sourcefile annotation_sourcefile.tsi.gz

6.1. Input file

6.1.1. VCF file (-vcf)

MutAnno accepts a plain-text VCF file (.vcf) or a gzipped/bgzipped VCF file (.vcf.gz) as input. It takes a single VCF file per run; multiple VCF files are not supported.

mutanno annot -vcf input.vcf
mutanno annot -vcf input.vcf.gz

6.1.2. Data source file (-sourcefile)

If you have a single source file generated by mutanno makedata, you can specify it with this option. See the mutanno makedata documentation for details on generating this single source file.

mutanno annot -sourcefile annotation_sourcefile.tsi.gz

Note

If you do not have a single source file and only have multiple downloaded source files, you can add their file paths to the data structure (-ds) file. In this case, you do not need the -sourcefile option.

6.1.3. Data structure file (-ds)

The data structure file lists the annotation features to be extracted from the source file(s). You can specify it with the -ds option; see the data structure file documentation for more information.

mutanno annot -ds data_structure.json

6.2. Output file

6.2.1. Annotated output file (-out)

The output file name is set with the -out option.

mutanno annot -out annotated.vcf
mutanno annot -out annotated.vcf.gz

6.2.2. Output file type (-outtype)

You can set the output file type (vcf, json, or vcf json) with the -outtype option. The default output file type is vcf.

mutanno annot -outtype vcf
mutanno annot -outtype json
mutanno annot -outtype vcf json

6.3. Split multi-allelic variant into single allelic variant (-split_multi_allelic_variant)

A VCF file may contain multi-allelic variants, which have two or more alternative alleles, as follows:

#CHROM   POS       ID     REF     ALT
chr1     2376453   .      A       T,G

For a multi-allelic variant, MutAnno adds the annotations of all its alternative alleles to the corresponding INFO field.

However, if you want to keep the annotation of each alternative allele on its own line, you can use the -split_multi_allelic_variant option. With this option, MutAnno splits a multi-allelic variant into two or more single-allelic variants, one per line, and adds the annotation of each single variant to its own line.

#CHROM   POS       ID     REF     ALT
chr1     2376453   .      A       T
chr1     2376453   .      A       G

mutanno annot -split_multi_allelic_variant
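
The splitting shown above can be illustrated with a minimal Python sketch. This is not MutAnno's implementation; the function name split_multiallelic is hypothetical, and the sketch only rewrites the ALT column. It does not adjust per-allele INFO or genotype fields, which a real splitter has to handle.

```python
def split_multiallelic(line):
    """Split one tab-separated VCF data line into one line per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # ALT is the fifth VCF column
    # Copy every other column unchanged and substitute a single ALT allele.
    return ["\t".join(fields[:4] + [alt] + fields[5:]) for alt in alts]


# The multi-allelic example record from this section (first five columns).
record = "chr1\t2376453\t.\tA\tT,G"
for out_line in split_multiallelic(record):
    print(out_line)
```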

6.4. Add additional information

6.4.1. Add genotype (-genoinfo)

If you want to add the genotype information of each sample to the INFO field, you can use the -genoinfo option.

mutanno annot -genoinfo

This option adds the following information for each sample, separated by | (pipe):

  • genotype(number format)

  • genotype(base format)

  • reference read depth/total depth

  • sample ID

The entries for different samples are separated by , (comma).

#INFO
SAMPLEGENO=1/1|C/C|0/1778|NA12877_sample,0/0|T/T|2890/0|NA12878_sample;
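
For downstream processing, a SAMPLEGENO value in the layout described above can be parsed with a few lines of Python. This is a sketch under assumptions: the names GenoInfo and parse_samplegeno are hypothetical, the depth field is kept as a raw string, and sample IDs are assumed not to contain commas.

```python
from typing import NamedTuple


class GenoInfo(NamedTuple):
    genotype_num: str   # genotype in number format, e.g. "1/1"
    genotype_base: str  # genotype in base format, e.g. "C/C"
    depth: str          # depth field, kept as the raw string
    sample_id: str


def parse_samplegeno(value):
    """Parse the value of a SAMPLEGENO tag into one GenoInfo per sample."""
    value = value.rstrip(";")  # tolerate a trailing INFO separator
    # Samples are comma-separated; the four fields are pipe-separated.
    return [GenoInfo(*entry.split("|", 3)) for entry in value.split(",")]


value = "1/1|C/C|0/1778|NA12877_sample,0/0|T/T|2890/0|NA12878_sample"
for geno in parse_samplegeno(value):
    print(geno.sample_id, geno.genotype_num, geno.genotype_base, geno.depth)
```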

6.4.2. Add hg19 coordinates (-hg19)

If the reference version is hg38 (or GRCh38) and you want to add hg19 coordinates for all variants, use the -hg19 option. With this option, MutAnno appends hg19 coordinates using the pyliftover library.

mutanno annot -hg19

Note

This option needs a chain file, which can be downloaded automatically from UCSC (http://hgdownload.cse.ucsc.edu/goldenpath/hg38/liftOver/). If you want to use a different chain file, you can specify it with the -chain option.

mutanno annot -hg19 -chain hg38ToHg19.over.chain.gz
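
To check a lifted coordinate by hand, the same library can be called directly. The sketch below is not MutAnno's own code; the helper name lift_vcf_position is hypothetical. It relies on the fact that pyliftover uses 0-based positions, whereas VCF positions are 1-based, so the position is shifted before and after the conversion.

```python
def lift_vcf_position(lifter, chrom, pos):
    """Convert a 1-based VCF position with a pyliftover-style converter.

    Returns (chrom, 1-based position) for the first hit, or None when the
    locus cannot be converted.
    """
    hits = lifter.convert_coordinate(chrom, pos - 1)  # pyliftover is 0-based
    if not hits:  # None (unknown chromosome) or [] (no match in the target)
        return None
    new_chrom, new_pos0 = hits[0][0], hits[0][1]
    return new_chrom, new_pos0 + 1  # back to 1-based for VCF


# Usage with the real library (requires pyliftover and a local chain file):
#   from pyliftover import LiftOver
#   lifter = LiftOver("hg38ToHg19.over.chain.gz")
#   print(lift_vcf_position(lifter, "chr1", 2376453))
```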

6.5. Remove annotation tag

MutAnno can remove annotations (or tags) that have already been written to the INFO field of the VCF file. To remove specific annotations, use the -clean_tag option followed by the tag names.

mutanno annot -clean_tag SpliceAI CLINVAR gnomADgenome
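
The effect of removing tags on a single INFO string can be sketched in Python as follows. This illustrates the idea only, not MutAnno's implementation; the function name clean_info and the INFO values in the example are made up for illustration.

```python
def clean_info(info, tags_to_remove):
    """Return an INFO string without the entries whose key is in tags_to_remove."""
    drop = set(tags_to_remove)
    kept = [
        entry for entry in info.split(";")
        # The key is the text before "="; flag entries have no "=" at all.
        if entry and entry.split("=", 1)[0] not in drop
    ]
    return ";".join(kept) if kept else "."  # "." marks an empty INFO column


# Hypothetical INFO string; tags that are absent are simply ignored.
info = "DP=100;SpliceAI=0.01;CLINVAR=benign;AF=0.5"
print(clean_info(info, ["SpliceAI", "CLINVAR", "gnomADgenome"]))
# prints: DP=100;AF=0.5
```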