# Annotation
NOTE
This documentation is a work in progress and is incomplete.
Please contact developers for more details.
This page describes the default annotation sources included with the service and how to add additional annotation sources.
# Annotation tool: VEP
ELLA anno uses Ensembl's Variant Effect Predictor (VEP) for annotation. Settings are managed in /src/annotation/annotate.sh.
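To give a feel for the kind of settings such a script controls, here is a minimal sketch that assembles a VEP command line for offline GRCh37 annotation. The flags are standard VEP options, but which ones were chosen, the file paths, and the helper name `build_vep_command` are assumptions made for illustration. They are not copied from annotate.sh, which remains the authoritative source.

```python
import shlex


def build_vep_command(input_vcf, output_vcf, fasta, cache_dir):
    """Return a VEP argument list for offline GRCh37 annotation (illustrative only)."""
    return [
        "vep",
        "--input_file", input_vcf,
        "--output_file", output_vcf,
        "--vcf",                  # write VCF output
        "--offline",              # do not contact Ensembl databases
        "--cache",
        "--dir_cache", cache_dir,
        "--assembly", "GRCh37",   # ELLA requires GRCh37
        "--fasta", fasta,
        "--refseq",               # assumed: annotate against RefSeq transcripts
        "--hgvs",
        "--force_overwrite",
    ]


# Hypothetical paths, used only to show the resulting command line.
cmd = build_vep_command(
    "sample.vcf",
    "sample.vep.vcf",
    "/data/human_g1k_v37.fasta",
    "/data/vep_cache",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps quoting safe if it is later passed to `subprocess.run`.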
# Data sources
# Datasets used by anno
Type | Name | Source | Note |
---|---|---|---|
Genome | Human genome GRCh37 | GATK Resource Bundle | ELLA currently requires the GRCh37 build of the human genome, and does not support GRCh38. |
Genes | HGNC | HGNC | Gene ID is fetched from either NCBI gene ID, Ensembl gene ID or gene symbol. |
Genes | RefGene | UCSC | Used for slicing of gnomAD data. |
Transcripts | RefSeq (GFF) | NCBI | |
Transcripts | RefSeq (interim) | NCBI | See Golden Helix blog. |
Transcripts | Universal Transcript Archive (UTA) | Biocommons | |
Transcripts | SeqRepo | Biocommons | |
Population frequencies | Genome Aggregation Database (gnomAD) | Broad Institute | Updating gnomAD might break a number of things in the ELLA application, and should not be done without thorough testing. From v3.0, gnomAD is only available for genome build GRCh38, which is currently incompatible with ELLA. |
Population frequencies | In-house database | No in-house database is included with the service; this must be added in your own setup. | |
Mutation database | ClinVar | NCBI | |
Mutation database | Human Gene Mutation Database | Qiagen (Pro version) | The Pro version requires a paid subscription, and the dataset is not included in /ops/datasets.json in the source code. An alternative is to use the free version, which is 3 years out of date. |
# Dataset versions
The current versions of the datasets used in the annotation service, together with the scripts and commands for downloading or updating them, are specified in /ops/datasets.json.
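As a rough illustration of reading versions out of such a file, the sketch below lists a version per dataset. It assumes a layout in which each top-level key is a dataset name whose value holds a `version` field. That layout, the dataset names, and the placeholder version strings are all hypothetical, so check them against the real /ops/datasets.json before relying on this.

```python
import json

# Hypothetical excerpt; the real /ops/datasets.json may be structured differently.
EXAMPLE = """
{
  "clinvar": {"version": "EXAMPLE-VERSION-1"},
  "gnomad": {"version": "EXAMPLE-VERSION-2"}
}
"""


def dataset_versions(datasets):
    """Map each dataset name to its 'version' value, or None if the key is absent."""
    return {name: entry.get("version") for name, entry in datasets.items()}


versions = dataset_versions(json.loads(EXAMPLE))
for name, version in sorted(versions.items()):
    print(f"{name}: {version}")
```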
# Custom annotation
If you have other data sources you wish to annotate with, you can extend the reference data sources by adding them to /ops/datasets.json. See also the vcfanno documentation for the vcfanno section of datasets.json.
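A minimal sketch of what adding a custom source could look like follows. The keys inside the annotation block (`file`, `fields`, `ops`, `names`) follow vcfanno's annotation configuration. The surrounding entry layout (a `version` field and a `vcfanno` list per dataset), the helper `add_custom_dataset`, the `INHOUSE__` naming prefix, and the file path are assumptions made for illustration, not the confirmed schema of datasets.json.

```python
import json


def add_custom_dataset(datasets, name, version, vcf_path, info_fields):
    """Return a copy of `datasets` with a custom entry carrying a vcfanno-style annotation."""
    if name in datasets:
        raise ValueError(f"dataset {name!r} already exists")
    updated = dict(datasets)  # shallow copy; the input mapping is left unchanged
    updated[name] = {
        "version": version,  # assumed key, see lead-in
        "vcfanno": [
            {
                "file": vcf_path,
                "fields": list(info_fields),          # INFO fields to copy
                "ops": ["self"] * len(info_fields),   # vcfanno "self": take the value as-is
                "names": [f"{name.upper()}__{field}" for field in info_fields],
            }
        ],
    }
    return updated


# Hypothetical existing content and custom source.
datasets = {"clinvar": {"version": "EXAMPLE-VERSION"}}
updated = add_custom_dataset(
    datasets, "inhouse", "EXAMPLE-VERSION", "inhouse/variants.vcf.gz", ["AF", "AC"]
)
print(json.dumps(updated["inhouse"], indent=2))
```

Returning a new mapping instead of editing in place makes it easy to compare the result with the original file before writing anything back.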
See the /examples directory in the repository (which does not exist yet) for examples of how to extend ELLA anno with your own data.