# Annotation


This documentation is a work in progress and is incomplete.

Please contact developers for more details.

This page describes the default annotation sources included with the service and how to add additional annotation sources.

# Annotation tool: VEP

ELLA anno uses Ensembl's Variant Effect Predictor (VEP) for annotation. Settings are managed in /src/annotation/annotate.sh.

# Data sources

# Datasets used by anno

| Type | Name | Source | Note |
| --- | --- | --- | --- |
| Genome | Human genome GRCh37 | GATK Resource Bundle | ELLA currently requires the GRCh37 build of the human genome and does not support GRCh38. |
| Genes | HGNC | HGNC | Gene ID is fetched from either NCBI gene ID, Ensembl gene ID, or gene symbol. |
| Genes | RefGene | UCSC | Used for slicing of gnomAD data. |
| Transcripts | RefSeq (GFF) | NCBI | |
| Transcripts | RefSeq (interim) | NCBI | See Golden Helix blog. |
| Transcripts | Universal Transcript Archive (UTA) | Biocommons | |
| Transcripts | SeqRepo | Biocommons | |
| Population frequencies | Genome Aggregation Database (gnomAD) | Broad Institute | Updating gnomAD might break a number of things in the ELLA application and should not be done without thorough testing. From v3.0, gnomAD is only available for genome build GRCh38, which is currently incompatible with ELLA. |
| Population frequencies | In-house database | | No in-house database is included with the service; this must be added in your own setup. |
| Mutation database | ClinVar | NCBI | |
| Mutation database | Human Gene Mutation Database | Qiagen (Pro version) | The Pro version requires a paid subscription, and the dataset is not included in /ops/datasets.json in the source code. An alternative is to use the free version (3 years outdated). |

# Dataset versions

Current versions of datasets used in the annotation service and scripts/commands for downloading or updating are specified in /ops/datasets.json.
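As a rough illustration of how the versions could be inspected, the sketch below parses a datasets file and lists a version per dataset. The structure shown (a top-level object keyed by dataset name, each entry carrying a "version" field), the sample entries, and the helper name list_versions are all assumptions made for this example; check the actual /ops/datasets.json for the real layout.

```python
import json

# Hypothetical excerpt: the real /ops/datasets.json may be structured differently.
SAMPLE = """
{
  "clinvar": {"version": "EXAMPLE_VERSION_1"},
  "gnomad": {"version": "EXAMPLE_VERSION_2"}
}
"""


def list_versions(datasets: dict) -> dict:
    """Map each dataset name to its "version" value (None if the field is absent)."""
    return {
        name: entry.get("version")
        for name, entry in datasets.items()
        if isinstance(entry, dict)  # skip any non-object top-level values
    }


datasets = json.loads(SAMPLE)
for name, version in sorted(list_versions(datasets).items()):
    print(f"{name}: {version}")
```

To run this against a real checkout, the inline sample could be replaced by reading the file from disk, provided the layout matches the assumption above.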

# Custom annotation

To annotate with additional data sources, extend the reference data by adding your custom sources to /ops/datasets.json. See the vcfanno documentation for details on the vcfanno section of datasets.json.

See the (currently non-existent) /examples repo dir for examples on how to extend ELLA anno with your own data.
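Until such examples exist, the following is a minimal sketch of what adding a custom source might look like. The keys file, fields, names, and ops are taken from vcfanno's annotation configuration; how they are nested inside datasets.json, the "version" key, the dataset name "inhouse", and the helper add_custom_dataset are assumptions for illustration only, not the confirmed ELLA anno schema.

```python
import copy
import json


def add_custom_dataset(datasets, name, version, vcf_path, fields):
    """Return a copy of `datasets` with one extra, hypothetical entry.

    The entry layout is an assumption; only the vcfanno keys
    (file, fields, names, ops) come from vcfanno's own config format.
    """
    if name in datasets:
        raise ValueError(f"dataset {name!r} already exists")
    fields = list(fields)
    updated = copy.deepcopy(datasets)  # leave the caller's data untouched
    updated[name] = {
        "version": version,
        "vcfanno": [
            {
                "file": vcf_path,
                "fields": fields,
                # Prefix output names so they do not clash with other sources.
                "names": [f"{name}_{field}" for field in fields],
                # "self" copies each value from the annotation file unchanged.
                "ops": ["self"] * len(fields),
            }
        ],
    }
    return updated


# Hypothetical starting point and custom source.
base = {"clinvar": {"version": "EXAMPLE_VERSION"}}
extended = add_custom_dataset(base, "inhouse", "1", "inhouse.vcf.gz", ["AF", "AC"])
print(json.dumps(extended, indent=2))
```

Returning a modified copy rather than editing in place makes it easier to compare the result with the original before writing anything back to disk.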

Last Updated: 1/22/2021, 3:18:06 PM