# Annotation
Configuration of both the deposit and the view of annotation is defined in a YAML file (see annotation-config.yml for an example), which is deposited in the database using the ella-cli. This file should contain two keys:
- deposit: Defines how the VCF data in the INFO column should be imported into the database.
- view: Defines how the resulting annotation JSON (the annotations column in the annotation table) should be shown in the user interface.
Depositing this file with the ella-cli inserts a row into the annotationconfig table. This table is created and populated by the migration script, using an import config that reflects the legacy annotation import (see src/vardb/datamodel/migration/alembic/data/annotation-config-legacy.yml).
Subsequently, for each variant in the VCF, the most recently inserted row in the annotationconfig table determines how its INFO field is deconstructed and how the result is displayed in the front end.
In addition to the config provided in this table, some configuration relevant for annotation (frequency groups) is defined in the application configuration (ella-config.yml).
# Annotation deposit
- File: annotation-config.yml
- Key: deposit

All current annotation converters read their input from the VCF INFO field. The generic converters should be able to handle most annotation, but specific converters are necessary in some cases.
The list under this key follows this structure:
Subkey | Explanation | Values |
---|---|---|
name | Name of the annotation converter to use. | [string] (See available options below) |
converter_config | Config for the annotation converter specified | [object] |
The converter_config.elements
list describes where to get annotation from, and where in the annotation JSON the results should go:
Subkey | Explanation | Values |
---|---|---|
source | VCF INFO field to fetch raw annotation from | [string] |
target | JSON path describing where the converted annotation should end up | [string] Paths are split on . |
additional_sources | Additional VCF INFO fields used | List of string (optional, only used in specific converters) |
[converter specific] | Various converter specific keys | See description for each of the converters below |
NOTE
Target paths beginning with frequencies, transcripts and references hold special meaning.
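
To illustrate how a dot-separated target path maps onto the annotation JSON, here is a minimal Python sketch. The helper name set_at_path is hypothetical and not part of ELLA; it only shows the idea of splitting the path on . and creating nested objects as needed, roughly what a plain insert would do.

```python
from typing import Any


def set_at_path(annotation: dict, target: str, value: Any) -> dict:
    """Place `value` in `annotation` at the dot-separated `target` path."""
    *parents, leaf = target.split(".")
    node = annotation
    for key in parents:
        # Create intermediate objects on demand
        node = node.setdefault(key, {})
    node[leaf] = value
    return annotation


annotation: dict = {}
set_at_path(annotation, "PATH.TO.TARGET.foo", 1)
set_at_path(annotation, "PATH.TO.TARGET.bar", 2)
print(annotation)  # {'PATH': {'TO': {'TARGET': {'foo': 1, 'bar': 2}}}}
```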
# Generic annotation converters
There are currently four available generic annotation converters:
- keyvalue: Use key/value pairs.
- json: Use base16/32/64-encoded JSON data.
- mapping: Use character separated key/value structures.
- meta: Use VCF ##INFO header to create JSON structures.
Of these, the json converter gives the most flexibility, and keyvalue the most transparency.
All of the examples below generate the same output structure in the annotations column of the annotation table:
{ 'PATH': { 'TO': { 'TARGET': { 'foo': 1, 'bar': 2 } } } }
# keyvalue (keyvalueconverter.py)
Read key/value pairs from annotation.
Specific configuration:
Subkey | Explanation | Values |
---|---|---|
target_mode | How the processed data should be inserted into the JSON | insert (default), extend , append , merge |
target_type | Type to convert the value to | identity (default), float , int , bool , string |
target_type_throw * | Whether to throw an error if casting fails | [bool] Default: true |
split ** | String to split raw annotation value on | [string] (optional) |
*: If target_type_throw is set to false, a failed cast is treated as if no annotation was found.
**: If split is defined, the returned value will be a list of target_type elements.
Example config using annotation values FOO=1;BAR=2:

    - name: keyvalue
      converter_config:
        elements:
          - source: FOO
            target: PATH.TO.TARGET.foo
            target_type: int
          - source: BAR
            target: PATH.TO.TARGET.bar
            target_type: int
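
The effect of target_type, target_type_throw and split can be sketched in Python as below. This is not the code in keyvalueconverter.py; the function convert_value and the CASTS table are hypothetical, and the bool type is left out because its parsing rules are not described here.

```python
# Hypothetical cast table covering a subset of the documented target_type values
CASTS = {
    "identity": lambda v: v,
    "float": float,
    "int": int,
    "string": str,
}


def convert_value(raw, target_type="identity", target_type_throw=True, split=None):
    """Cast a raw INFO value; return None when casting fails and throwing is off."""
    cast = CASTS[target_type]
    try:
        if split is not None:
            # With split defined, the result is a list of cast elements
            return [cast(item) for item in raw.split(split)]
        return cast(raw)
    except ValueError:
        if target_type_throw:
            raise
        return None  # treated as "no annotation found"


# Parse the INFO string from the example above into source -> raw value
info = dict(field.split("=", 1) for field in "FOO=1;BAR=2".split(";"))
result = {
    "foo": convert_value(info["FOO"], target_type="int"),
    "bar": convert_value(info["BAR"], target_type="int"),
}
print(result)  # {'foo': 1, 'bar': 2}
```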
# json (jsonconverter.py)
Reads base16/32/64 encoded JSON data and parses it.
Specific configuration:
Subkey | Explanation | Values |
---|---|---|
encoding | Which encoding the raw data is encoded in | base16 (default), base32 , base64 |
subpath | Subpath to extract data from | [string] Path split on . |
Example config using annotation value MYJSON=7B22666F6F223A20312C2022626172223A20327D:

    - name: json
      converter_config:
        elements:
          - source: MYJSON
            target: PATH.TO.TARGET
            encoding: base16
Note: base64.b16encode(json.dumps({"foo": 1, "bar": 2}).encode()).decode() == '7B22666F6F223A20312C2022626172223A20327D'
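
Going the other way, decoding such a value could look like the following sketch. The function decode_json_annotation is a hypothetical name, and the handling of subpath (walking the decoded object key by key) is an assumption based on the table above rather than a copy of jsonconverter.py.

```python
import base64
import json

# Standard library decoders for the three documented encodings
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": base64.b64decode,
}


def decode_json_annotation(raw, encoding="base16", subpath=None):
    """Decode an encoded JSON string and optionally descend into a dot-separated subpath."""
    data = json.loads(DECODERS[encoding](raw).decode())
    if subpath:
        for key in subpath.split("."):
            data = data[key]
    return data


raw = "7B22666F6F223A20312C2022626172223A20327D"
decoded = decode_json_annotation(raw)
print(decoded)  # {'foo': 1, 'bar': 2}
```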
# mapping (mappingconverter.py)
Reads lists of key/value pairs, where items are separated by one character (e.g. ,) and each key is separated from its value by another (e.g. :).
Specific configuration:
Subkey | Explanation | Values |
---|---|---|
item_separator | Separator for the key/value pairs | [string] Default: , |
keyvalue_separator | Separator between key and value | [string] Default: : |
target_type | Type to convert the value to | identity (default), float , int , bool , string |
target_type_throw * | Whether to throw an error if casting fails | [bool] Default: true |
*: If target_type_throw is set to false, a failed cast is treated as if no annotation was found.
Example config using annotation value DABLA=foo:1,bar:2:

    - name: mapping
      converter_config:
        elements:
          - source: DABLA
            target: PATH.TO.TARGET
            item_separator: ',' # Default value
            keyvalue_separator: ':' # Default value
            value_target_type: int
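
The parsing this converter performs can be approximated with a few lines of Python. The function parse_mapping and its cast parameter are hypothetical; the sketch only mirrors the two separators described above and is not the actual mappingconverter.py logic.

```python
def parse_mapping(raw, item_separator=",", keyvalue_separator=":", cast=lambda v: v):
    """Turn 'foo:1,bar:2' style strings into a dict, casting each value."""
    result = {}
    for item in raw.split(item_separator):
        # Split only on the first separator so values may contain it
        key, value = item.split(keyvalue_separator, 1)
        result[key] = cast(value)
    return result


mapped = parse_mapping("foo:1,bar:2", cast=int)
print(mapped)  # {'foo': 1, 'bar': 2}
```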
# meta (metaconverter.py)
Use meta information (##INFO header) to create JSON structures, where keys are fetched from the header, and values from the annotation. Requires the meta information line to match a given regex pattern for extracting keys.
Specific configuration:
Subkey | Explanation | Values |
---|---|---|
list_separator | Split into lists | [string] (optional) |
value_separator | String that values are split on | [string] Default \| |
meta_pattern | Regex used to fetch keys from the meta description field | Regex string, default: r"(?i)[a-z_]+\|[a-z_\|]+" |
subelements | List of configs | Valid keyvalue configs |
Example config using header line (meta information) ##INFO=<ID=DABLA,Number=.,Type=String,Description="Format: foo|bar"> and annotation value DABLA=1|2:

    - name: meta
      converter_config:
        elements:
          - source: DABLA
            target: PATH.TO.TARGET
            meta_pattern: (?i)[a-z_]+\|[a-z_\|]+
            element_separator: '|'
            subelements:
              - source: foo
                target_type: int
              - source: bar
                target_type: int
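
As a rough illustration of how keys from the header description are paired with values from the annotation, consider this Python sketch. The function parse_meta and its casts argument are hypothetical stand-ins for the subelements handling, and splitting the matched keys on | is an assumption derived from the default pattern.

```python
import re


def parse_meta(description, raw, meta_pattern=r"(?i)[a-z_]+\|[a-z_\|]+",
               value_separator="|", casts=None):
    """Pair keys found in the header description with the values in the annotation."""
    match = re.search(meta_pattern, description)
    if match is None:
        raise ValueError("Description does not match meta_pattern")
    keys = match.group(0).split("|")
    values = raw.split(value_separator)
    casts = casts or {}
    # Apply a per-key cast where one is given; keep the raw string otherwise
    return {key: casts.get(key, str)(value) for key, value in zip(keys, values)}


parsed = parse_meta("Format: foo|bar", "1|2", casts={"foo": int, "bar": int})
print(parsed)  # {'foo': 1, 'bar': 2}
```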
# Specific annotation converters
In addition to the generic annotation converters, the following specific converters are available:
- clinvarjson: Convert current ClinVar data to the form expected by the database.
- clinvarreferences: Read data from ClinVar JSON structure.
- hgmd: Read HGMD specific fields.
- hgmdextrarefs: Read data from HGMD__EXTRAREFS.
- vep: Read VEP CSQ-field.
# Annotation view
- File: annotation-config.yml
- Key: view

With the annotation JSON structure built by the deposit key, the view key determines what annotation should be displayed, and how, in the front end's Classification page.
The list under this key uses the following subkeys to define how annotation is shown:
Subkey | Explanation | Values |
---|---|---|
section | Define which section of the classification page the annotation should be shown in | analysis , classification , frequency , external , prediction or references |
template | HTML template to use | frequencyDetails , itemList , keyValue or clinvarDetails . See html-files in frontend-legacy/src/js/widgets/annotation/ . |
source | Which part of the annotation JSON that should be passed to the template | [string] Paths are split on . |
title | Title of the sectionbox used to display the annotation | [string] Default: Last part of the source . |
url * | Link target of the title | [string] (optional) |
url_empty * | Link target of the title when no annotation is available | [string] (optional) |
config | Configuration of the view, specific to each template | [object] |
*: URLs can be written as template strings, using the allele object and attrs.linkText. See annotation-config.yml for examples.
# Templates
# frequencyDetails
Shows frequency details in table form. Expects the data for the table to be a nested map:
    {
      "column1_key": {
        "row1_key": <value>,
        "row2_key": <value>,
        ...
      },
      "column2_key": {
        "row1_key": <value>,
        "row2_key": <value>,
        ...
      }
    }
Subkey | Explanation | Values |
---|---|---|
key_column | Row column name | [string] |
columns | Columns to show | Map of annotation JSON key -> column header display |
rows | Rows to show | Map of annotation JSON key -> row title display |
filter | JSON key to fetch filter values | [string] (optional) Filter values not equal to PASS will show as a warning |
precision | Float precision (for strings). | [integer] (Default: 6) |
scientific_threshold | Convert to scientific notation for frequencies below 10^-[x]. | [integer] (Default: 4) |
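
To make the relation between the nested map and the columns, rows and key_column settings more concrete, here is a small Python sketch that pivots such a map into table rows. The real rendering happens in the front end template; the function build_table and all keys and display names in the example data are hypothetical.

```python
def build_table(data, key_column, columns, rows):
    """Pivot a {column_key: {row_key: value}} map into a list of table rows."""
    table = [[key_column] + list(columns.values())]
    for row_key, row_title in rows.items():
        # Missing cells become None instead of raising
        cells = [data.get(column_key, {}).get(row_key) for column_key in columns]
        table.append([row_title] + cells)
    return table


# Hypothetical annotation data and view config
data = {
    "freq": {"G": 0.001, "NFE": 0.002},
    "count": {"G": 12, "NFE": 5},
}
table = build_table(
    data,
    key_column="Population",
    columns={"freq": "Frequency", "count": "Count"},
    rows={"G": "Total", "NFE": "European"},
)
for line in table:
    print(line)
```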
# itemList
Creates a simple list of items, optionally with a URL on each item.
On the config object, under the key items, the following config structure is expected:
Subkey | Explanation | Values |
---|---|---|
subsource | Path on the source object | [string] (optional) |
url | URL to link to from the list item | [string] (optional) |
type | Type of data found at subsource | [string] object / primitives (default) |
key | Key to fetch data from | [string] (Only applicable if type is object ) |
# keyValue
Fetches key/value pairs and displays them.
On the config object, under the key names, an object of this structure is expected:
Subkey | Explanation | Values |
---|---|---|
names | Mapping of keys to display names | Key is used to fetch data from source , display name is what is shown in the UI |
# clinvarDetails
Shows data from ClinVar in a format defined by ella-anno and the ClinVar converter. This does not require a special config (should be set to {}).
# Application configuration
# Included transcripts
Configure types of transcripts to include from the annotation using regex, e.g. NM_.* for RefSeq transcripts.
- File: ella_config.yml (set by ELLA_CONFIG env variable)
- Key: transcripts.inclusion_regex
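
The effect of such a regex can be tried out in Python as below. The transcript names are illustrative only, and anchoring the pattern at the start of the name with re.match is an assumption; how ELLA applies the regex internally is not specified here.

```python
import re

# Pattern as it might be given under transcripts.inclusion_regex
inclusion_regex = re.compile(r"NM_.*")

# Illustrative transcript names
transcripts = ["NM_000059.3", "ENST00000380152", "NM_007294.4", "XM_011521872.1"]

# Keep only the transcripts whose name matches the pattern from the start
included = [name for name in transcripts if inclusion_regex.match(name)]
print(included)  # ['NM_000059.3', 'NM_007294.4']
```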
# Frequency groups
Defines which data should be in the external and internal frequency groups in the frequency filter and ACMG frequency configuration.
- File: ella_config.yml (set by ELLA_CONFIG env variable)
- Key: frequencies.groups

Note that this config should match groups defined in annotation-config.yml.
# Region
Settings related to the REGION section on the CLASSIFICATION page.
- File: ella_config.yml (set by ELLA_CONFIG env variable)
- Key: similar_alleles

Subkey | Explanation | Values |
---|---|---|
max_variants | Max variants to display in each of the section's cards. | [integer] |
max_genomic_distance | Distance in base pairs from a variant (in either direction) within which other, finalized assessments should be searched for. | [integer] (bp) |