# Annotation
NOTE
This documentation is a work in progress and is incomplete.
Please contact developers for more details.
This page describes the default annotation sources included with the service and how to add additional annotation sources.
# Annotation tool: VEP
ELLA anno uses Ensembl's Variant Effect Predictor (VEP) for annotation. Settings are managed in /src/annotation/annotate.sh.
# Data sources
# Datasets used by anno
| Type | Name | Source | Note |
|---|---|---|---|
| Genome | Human genome GRCh37 | GATK Resource Bundle | ELLA currently requires the GRCh37 build of the human genome, and does not support GRCh38. |
| Genes | HGNC | HGNC | Gene ID is fetched from either NCBI gene ID, Ensembl gene ID or gene symbol. |
| Genes | RefGene | UCSC | Used for slicing of gnomAD data. |
| Transcripts | RefSeq (GFF) | NCBI | |
| Transcripts | RefSeq (interim) | NCBI | See the Golden Helix blog. |
| Transcripts | Universal Transcript Archive (UTA) | Biocommons | |
| Transcripts | SeqRepo | Biocommons | |
| Population frequencies | Genome Aggregation Database (gnomAD) | Broad Institute | Updating gnomAD might break a number of things in the ELLA application, and should not be done without thorough testing. From v3.0, gnomAD is only available for genome build GRCh38, which is currently incompatible with ELLA. |
| Population frequencies | In-house database | No in-house database is included with the service; this must be added in your own setup. | |
| Mutation database | ClinVar | NCBI | |
| Mutation database | Human Gene Mutation Database | Qiagen (Pro version) | The Pro version requires a paid subscription, and the dataset is not included in /ops/datasets.json in the source code. An alternative is to use the free version (3 years outdated). |
# Dataset versions
Current versions of datasets used in the annotation service and scripts/commands for downloading or updating are specified in /ops/datasets.json.
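As a rough illustration of how these versions could be inspected, the sketch below lists dataset names and versions from a datasets.json-like structure. The layout shown here (a top-level object keyed by dataset name, each with a `version` field) and the sample values are assumptions made for illustration only; check the actual /ops/datasets.json before relying on it.

```python
import json

# Hypothetical excerpt; the real /ops/datasets.json may use different keys.
SAMPLE = """
{
  "clinvar": {"version": "EXAMPLE-1", "description": "ClinVar"},
  "gnomad": {"version": "EXAMPLE-2", "description": "gnomAD"}
}
"""


def dataset_versions(raw_json):
    """Return {dataset name: version} for entries that declare a version."""
    datasets = json.loads(raw_json)
    return {
        name: entry["version"]
        for name, entry in datasets.items()
        if isinstance(entry, dict) and "version" in entry
    }


if __name__ == "__main__":
    # Print one tab-separated line per dataset, sorted by name.
    for name, version in sorted(dataset_versions(SAMPLE).items()):
        print(f"{name}\t{version}")
```

To run this against a real checkout, read the file contents (for example with `open(path).read()`) and pass them to `dataset_versions` in place of `SAMPLE`.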
# Custom annotation
If you have other data sources you wish to annotate with, you can extend the reference data sources by adding them to /ops/datasets.json. See also the vcfanno documentation for the vcfanno section of datasets.json.
See the /examples directory in the repository (not yet available) for examples of how to extend ELLA anno with your own data.
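In the meantime, the following is a minimal sketch of what adding a custom source might look like. It assumes that each dataset entry is a JSON object with a `version` key and a `vcfanno` list whose items mirror the `file`, `fields`, `names` and `ops` keys of a vcfanno `[[annotation]]` block. The function name, the dataset name `inhouse` and the file path are hypothetical, and the real schema of /ops/datasets.json may differ.

```python
import copy
import json


def add_custom_dataset(datasets, name, vcf_path, fields, version):
    """Return a copy of `datasets` with a new entry carrying a vcfanno section."""
    if name in datasets:
        raise ValueError(f"dataset {name!r} already exists")
    updated = copy.deepcopy(datasets)  # leave the caller's dict untouched
    updated[name] = {
        "version": version,  # assumed key name
        "vcfanno": [  # assumed to mirror a vcfanno [[annotation]] block
            {
                "file": vcf_path,
                "fields": list(fields),
                # Prefix INFO names with the dataset name to avoid clashes.
                "names": [f"{name}_{field}" for field in fields],
                # "self" copies the value from the matching record.
                "ops": ["self"] * len(fields),
            }
        ],
    }
    return updated


if __name__ == "__main__":
    existing = {"clinvar": {"version": "EXAMPLE-1"}}
    extended = add_custom_dataset(
        existing, "inhouse", "inhouse/inhouse.vcf.gz", ["AF", "AC"], "EXAMPLE-1"
    )
    print(json.dumps(extended, indent=2))
```

Returning a modified copy rather than editing in place makes it easier to diff the result against the original before writing it back to disk.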