dae.annotation package
Subpackages
- dae.annotation.tests package
- Submodules
- dae.annotation.tests.conftest module
- dae.annotation.tests.test_allele_score_annotator module
- dae.annotation.tests.test_annotatable module
- dae.annotation.tests.test_annotate_columns module
- dae.annotation.tests.test_annotate_columns_and_vcf module
- dae.annotation.tests.test_annotate_columns_cnv_pipeline module
- dae.annotation.tests.test_annotate_columns_cshl module
- dae.annotation.tests.test_annotate_doc module
- dae.annotation.tests.test_annotate_schema2_parquet module
- dae.annotation.tests.test_annotation_pipeline_config module
- dae.annotation.tests.test_basiscs module
- dae.annotation.tests.test_basiscs_with_debug module
- dae.annotation.tests.test_cli_annotation_context module
- dae.annotation.tests.test_cnv_collection_annotator module
- dae.annotation.tests.test_coordinates module
- dae.annotation.tests.test_effect_annotator module
- dae.annotation.tests.test_gene_score_annotator module
- dae.annotation.tests.test_gene_set_annotator module
- dae.annotation.tests.test_liftover_allele module
- dae.annotation.tests.test_liftover_annotator module
- dae.annotation.tests.test_normalize_variant module
- dae.annotation.tests.test_np_score_annotator module
- dae.annotation.tests.test_parse_ivan module
- dae.annotation.tests.test_pipeline module
- dae.annotation.tests.test_pipeline_error module
- dae.annotation.tests.test_position_score_annotator module
- dae.annotation.tests.test_reannotation module
- dae.annotation.tests.test_regions_annotation module
- dae.annotation.tests.test_regions_effect_annotation module
- dae.annotation.tests.test_regions_liftover module
- dae.annotation.tests.test_regions_normalize_allele_annotator module
- dae.annotation.tests.test_schema module
- dae.annotation.tests.test_simple_effect_annotator module
- dae.annotation.tests.test_vcf_info_score1_annotator module
- Module contents
Submodules
dae.annotation.annotatable module
- class dae.annotation.annotatable.Annotatable(chrom: str, pos: int, pos_end: int, annotatable_type: Type)[source]
Bases:
object
Base class for annotatables used in annotation pipeline.
- class Type(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Defines annotatable types.
- COMPLEX = 5
- LARGE_DELETION = 7
- LARGE_DUPLICATION = 6
- POSITION = 0
- REGION = 1
- SMALL_DELETION = 4
- SMALL_INSERTION = 3
- SUBSTITUTION = 2
- property chrom: str
- property chromosome: str
- property end_position: int
- static from_string(value: str) Annotatable [source]
Deserialize an Annotatable instance from a string value.
- property pos: int
- property pos_end: int
- property position: int
- class dae.annotation.annotatable.CNVAllele(chrom: str, pos_begin: int, pos_end: int, cnv_type: Type)[source]
Bases:
Annotatable
Defines copy number variants annotatable.
- class dae.annotation.annotatable.Position(chrom: str, pos: int)[source]
Bases:
Annotatable
Annotatable class representing a single position in a chromosome.
- class dae.annotation.annotatable.Region(chrom: str, pos_begin: int, pos_end: int)[source]
Bases:
Annotatable
Annotatable class representing a region in a chromosome.
- class dae.annotation.annotatable.VCFAllele(chrom: str, pos: int, ref: str, alt: str)[source]
Bases:
Annotatable
Defines small variants annotatable.
- property alt: str
- property alternative: str
- static from_string(value: str) VCFAllele [source]
Deserialize an Annotatable instance from a string value.
- property ref: str
- property reference: str
dae.annotation.annotate_columns module
- class dae.annotation.annotate_columns.AnnotateColumnsTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]
Bases:
AnnotationTool
Annotation tool for TSV-style text files.
- dae.annotation.annotate_columns.combine(args: Any, pipeline_config: list[dict[str, Any]] | RawFullConfig, pipeline_config_old: str | None, grr_definition: dict | None, partfile_paths: list[str], out_file_path: str, ref_genome_id: str) None [source]
Combine annotated region part files into a single output file.
dae.annotation.annotate_doc module
dae.annotation.annotate_schema2_parquet module
- class dae.annotation.annotate_schema2_parquet.AnnotateSchema2ParquetTool(raw_args=None, gpf_instance=None)[source]
Bases:
AnnotationTool
Annotation tool for the Parquet file format.
- dae.annotation.annotate_schema2_parquet.cli(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None) None [source]
Entry method for AnnotateSchema2ParquetTool.
dae.annotation.annotate_utils module
- class dae.annotation.annotate_utils.AnnotationTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]
Bases:
object
Base class for annotation tools. Format-agnostic.
- static annotate(handler: AbstractFormat, batch_mode: bool) None [source]
Run annotation.
- static produce_annotation_pipeline(pipeline_config: list[dict[str, Any]], pipeline_config_old: str | None, grr_definition: dict | None, *, allow_repeated_attributes: bool, work_dir: Path | None = None, full_reannotation: bool = False) AnnotationPipeline [source]
Produce an annotation or reannotation pipeline.
- dae.annotation.annotate_utils.produce_partfile_paths(input_file_path: str, regions: list[Region], work_dir: str) list[str] [source]
Produce a list of file paths for output region part files.
dae.annotation.annotate_vcf module
- class dae.annotation.annotate_vcf.AnnotateVCFTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]
Bases:
AnnotationTool
Annotation tool for the VCF file format.
dae.annotation.annotation_config module
- class dae.annotation.annotation_config.AnnotationConfigParser[source]
Bases:
object
Parser for annotation configuration.
- static has_wildcard(string: str) bool [source]
Ascertain whether a string contains a valid wildcard.
- static match_labels_query(query: dict[str, str], resource_labels: dict[str, str]) bool [source]
Check if the labels query for a wildcard matches.
- static parse_complete(raw: dict[str, Any], idx: int, grr: GenomicResourceRepo | None = None) list[AnnotatorInfo] [source]
Parse a full-form annotation config.
- static parse_minimal(raw: str, idx: int) AnnotatorInfo [source]
Parse a minimal-form annotation config.
- static parse_raw(pipeline_raw_config: list[dict[str, Any]] | RawFullConfig | None, grr: GenomicResourceRepo | None = None) tuple[AnnotationPreamble | None, list[AnnotatorInfo]] [source]
Parse raw dictionary annotation pipeline configuration.
- static parse_raw_attribute_config(raw_attribute_config: dict[str, Any]) AttributeInfo [source]
Parse annotation attribute raw configuration.
- static parse_raw_attributes(raw_attributes_config: Any) list[AttributeInfo] [source]
Parse annotator pipeline attribute configuration.
- static parse_short(raw: dict[str, Any], idx: int, grr: GenomicResourceRepo | None = None) list[AnnotatorInfo] [source]
Parse a short-form annotation config.
- static parse_str(content: str, source_file_name: str | None = None, grr: GenomicResourceRepo | None = None) tuple[AnnotationPreamble | None, list[AnnotatorInfo]] [source]
Parse annotation pipeline configuration string.
- static query_resources(annotator_type: str, wildcard: str, grr: GenomicResourceRepo) list[str] [source]
Collect resources matching a given query.
- class dae.annotation.annotation_config.AnnotationPreamble(summary: str, description: str, input_reference_genome: str, input_reference_genome_res: dae.genomic_resources.repository.GenomicResource | None, metadata: dict[str, Any])[source]
Bases:
object
- description: str
- input_reference_genome: str
- input_reference_genome_res: GenomicResource | None
- metadata: dict[str, Any]
- summary: str
- class dae.annotation.annotation_config.AnnotatorInfo(_type: str, attributes: list[AttributeInfo], parameters: ParamsUsageMonitor | dict[str, Any], documentation: str = '', resources: list[GenomicResource] | None = None, annotator_id: str = 'N/A')[source]
Bases:
object
Defines annotator configuration.
- annotator_id: str
- attributes: list[AttributeInfo]
- documentation: str = ''
- parameters: ParamsUsageMonitor
- resources: list[GenomicResource]
- type: str
- class dae.annotation.annotation_config.AttributeInfo(name: str, source: str, *, internal: bool, parameters: ParamsUsageMonitor | dict[str, Any], _type: str = 'str', description: str = '', documentation: str | None = None)[source]
Bases:
object
Defines annotation attribute configuration.
- description: str = ''
- property documentation: str
- internal: bool
- name: str
- parameters: ParamsUsageMonitor
- source: str
- type: str = 'str'
- class dae.annotation.annotation_config.ParamsUsageMonitor(data: dict[str, Any])[source]
Bases:
Mapping
Class to monitor usage of annotator parameters.
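The idea of a parameter mapping that monitors its own usage can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the class name `UsageTrackingParams` and the method `unused_keys` are hypothetical, and the real ParamsUsageMonitor may expose a different interface.

```python
from collections.abc import Mapping
from typing import Any, Iterator


class UsageTrackingParams(Mapping):
    """Hypothetical stand-in: a read-only mapping that records accessed keys."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)
        self._used: set[str] = set()

    def __getitem__(self, key: str) -> Any:
        value = self._data[key]  # KeyError propagates for unknown keys
        self._used.add(key)      # remember that this parameter was read
        return value

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def unused_keys(self) -> set[str]:
        """Return the parameters that were supplied but never read."""
        return set(self._data) - self._used


# Illustrative parameter names; only "resource_id" is read by the caller.
params = UsageTrackingParams({"resource_id": "scores/example", "typo_param": 1})
_ = params["resource_id"]
unused = params.unused_keys()
```

Tracking reads this way lets a caller report misspelled or unsupported parameters after an annotator has been constructed, which is presumably what a check such as check_for_unused_parameters relies on.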
- class dae.annotation.annotation_config.RawFullConfig[source]
Bases:
TypedDict
- annotators: list[dict[str, Any]]
- preamble: RawPreamble
dae.annotation.annotation_factory module
Factory for creation of annotation pipeline.
- dae.annotation.annotation_factory.build_annotation_pipeline(config: list[dict[str, Any]] | RawFullConfig, grr: GenomicResourceRepo, *, allow_repeated_attributes: bool = False, work_dir: Path | None = None, config_old_raw: str | None = None, full_reannotation: bool = False) AnnotationPipeline [source]
Build an annotation pipeline.
- dae.annotation.annotation_factory.check_for_repeated_attributes_in_annotator(annotator_config: AnnotatorInfo) None [source]
Check for repeated attributes in annotator configuration.
- dae.annotation.annotation_factory.check_for_repeated_attributes_in_pipeline(pipeline: AnnotationPipeline, *, allow_repeated_attributes: bool = False) None [source]
Check for repeated attributes in pipeline configuration.
- dae.annotation.annotation_factory.check_for_unused_parameters(info: AnnotatorInfo) None [source]
Check annotator configuration for unused parameters.
- dae.annotation.annotation_factory.copy_annotation_pipeline(pipeline: AnnotationPipeline) AnnotationPipeline [source]
Copy an annotation pipeline instance.
- dae.annotation.annotation_factory.copy_reannotation_pipeline(pipeline: ReannotationPipeline) ReannotationPipeline [source]
Copy a reannotation pipeline instance.
- dae.annotation.annotation_factory.get_annotator_factory(annotator_type: str) Callable[[AnnotationPipeline, AnnotatorInfo], Annotator] [source]
Find and return a factory function for creation of an annotator type.
If the specified annotator type is not found, this function raises a ValueError.
- Returns:
the annotator factory for the specified annotator type.
- Raises:
ValueError – when no annotator factory is found for the specified annotator type.
- dae.annotation.annotation_factory.get_available_annotator_types() list[str] [source]
Return the list of all registered annotator factory types.
- dae.annotation.annotation_factory.load_pipeline_from_file(raw_path: str, grr: GenomicResourceRepo, *, allow_repeated_attributes: bool = False, work_dir: Path | None = None) AnnotationPipeline [source]
Load an annotation pipeline from a configuration file.
- dae.annotation.annotation_factory.load_pipeline_from_yaml(raw: str, grr: GenomicResourceRepo, *, allow_repeated_attributes: bool = False, work_dir: Path | None = None) AnnotationPipeline [source]
Load an annotation pipeline from a YAML-formatted string.
- dae.annotation.annotation_factory.register_annotator_factory(annotator_type: str, factory: Callable[[AnnotationPipeline, AnnotatorInfo], Annotator]) None [source]
Register additional annotator factory.
By default, all genotype storage factories should be registered at the [dae.genotype_storage.factories] extension point. All registered factories are loaded automatically. Use this function if you want to bypass the extension point mechanism and register an additional genotype storage factory programmatically.
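The register/lookup pattern described by register_annotator_factory, get_annotator_factory and get_available_annotator_types can be sketched with a plain dictionary. The sketch below is an assumption-laden illustration: `_REGISTRY`, `register_factory`, `get_factory` and `available_types` are hypothetical names, and the real module additionally loads factories from an extension point, which is not modelled here.

```python
from typing import Any, Callable

# Hypothetical module-level registry; all names here are illustrative.
AnnotatorFactory = Callable[[Any, Any], Any]
_REGISTRY: dict[str, AnnotatorFactory] = {}


def register_factory(annotator_type: str, factory: AnnotatorFactory) -> None:
    """Register (or overwrite) a factory under the given type name."""
    _REGISTRY[annotator_type] = factory


def get_factory(annotator_type: str) -> AnnotatorFactory:
    """Return the registered factory or raise ValueError if none exists."""
    try:
        return _REGISTRY[annotator_type]
    except KeyError:
        raise ValueError(
            f"unsupported annotator type: {annotator_type}") from None


def available_types() -> list[str]:
    """List all registered type names in sorted order."""
    return sorted(_REGISTRY)


# A factory takes (pipeline, info) and returns an annotator-like object.
register_factory("hello_world", lambda pipeline, info: ("hello", info))
built = get_factory("hello_world")(None, {"id": "A0"})
```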
- dae.annotation.annotation_factory.resolve_repeated_attributes(pipeline: AnnotationPipeline, repeated_attributes: set[str]) None [source]
Resolve repeated attributes in pipeline configuration via renaming.
dae.annotation.annotation_pipeline module
Provides annotation pipeline class.
- class dae.annotation.annotation_pipeline.AnnotationPipeline(repository: GenomicResourceRepo)[source]
Bases:
object
Provides annotation pipeline abstraction.
- annotate(annotatable: Annotatable, context: dict | None = None) dict [source]
Apply all annotators to an annotatable.
- batch_annotate(annotatables: list[Annotatable | None], contexts: list[dict] | None = None, batch_work_dir: str | None = None) list[dict] [source]
Apply all annotators to a list of annotatables.
- build_pipeline_genomic_context() GenomicContext [source]
Create a genomic context from the pipeline parameters.
- get_annotator_by_attribute_info(attribute_info: AttributeInfo) Annotator | None [source]
- get_attribute_info(attribute_name: str) AttributeInfo | None [source]
- get_attributes() list[AttributeInfo] [source]
- get_info() list[AnnotatorInfo] [source]
- open() AnnotationPipeline [source]
Open all annotators in the pipeline and mark it as open.
- class dae.annotation.annotation_pipeline.Annotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo)[source]
Bases:
ABC
An Annotator provides a set of attributes for a given Annotatable.
- abstract annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- property attributes: list[AttributeInfo]
- batch_annotate(annotatables: list[Annotatable | None], contexts: list[dict[str, Any]], batch_work_dir: str | None = None) Iterable[dict[str, Any]] [source]
- get_info() AnnotatorInfo [source]
- property resource_ids: set[str]
- property resources: list[GenomicResource]
- property used_context_attributes: tuple[str, ...]
- class dae.annotation.annotation_pipeline.AnnotatorDecorator(child: Annotator)[source]
Bases:
Annotator
Defines annotator decorator base class.
- class dae.annotation.annotation_pipeline.FullReannotationPipeline(pipeline_new: AnnotationPipeline, pipeline_old: AnnotationPipeline)[source]
Bases:
ReannotationPipeline
Special-case ReannotationPipeline.
Completely removes all old attributes and runs every new annotator, without reusing anything.
- class dae.annotation.annotation_pipeline.InputAnnotableAnnotatorDecorator(child: Annotator)[source]
Bases:
AnnotatorDecorator
Defines annotator decorator to use input annotatable if defined.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- property used_context_attributes: tuple[str, ...]
- class dae.annotation.annotation_pipeline.ReannotationPipeline(pipeline_new: AnnotationPipeline, pipeline_old: AnnotationPipeline)[source]
Bases:
AnnotationPipeline
Special pipeline that handles reannotation of a previous pipeline.
- AnnotationDependencyGraph
alias of dict[AnnotatorInfo, list[tuple[AnnotatorInfo, AttributeInfo]]]
- annotate(annotatable: Annotatable, record: dict | None) dict [source]
Apply all annotators to an annotatable.
- static build_dependency_graph(pipeline: AnnotationPipeline) AnnotationDependencyGraph [source]
Make dependency graph for an annotation pipeline.
- get_attributes() list[AttributeInfo] [source]
- get_dependencies_for(info: AnnotatorInfo) set[AnnotatorInfo] [source]
Get all dependencies for a given annotator.
- get_dependents_for(info: AnnotatorInfo) set[AnnotatorInfo] [source]
Get all dependents for a given annotator.
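A dependency graph between annotators, and the two queries over it, can be sketched as below. The sketch uses plain strings instead of AnnotatorInfo objects, the graph contents are made up, and it assumes that "all dependencies" means the transitive closure; the actual ReannotationPipeline may define these differently.

```python
from collections import deque

# Hypothetical graph: annotator id -> ids of annotators whose output it uses.
DEPENDS_ON: dict[str, list[str]] = {
    "liftover": [],
    "normalize": ["liftover"],
    "score": ["normalize"],
    "effect": [],
}


def dependencies_for(graph: dict[str, list[str]], node: str) -> set[str]:
    """Collect direct and indirect dependencies with a breadth-first walk."""
    seen: set[str] = set()
    queue = deque(graph[node])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph[current])
    return seen


def dependents_for(graph: dict[str, list[str]], node: str) -> set[str]:
    """Collect every annotator that directly or indirectly depends on node."""
    return {other for other in graph
            if node in dependencies_for(graph, other)}


score_deps = dependencies_for(DEPENDS_ON, "score")
liftover_dependents = dependents_for(DEPENDS_ON, "liftover")
```

In a reannotation setting, queries like these indicate which annotators must be rerun when an upstream annotator changes and which results can be reused.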
- class dae.annotation.annotation_pipeline.ValueTransformAnnotatorDecorator(child: Annotator, value_transformers: dict[str, Callable[[Any], Any]])[source]
Bases:
AnnotatorDecorator
Define value transformer annotator decorator.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
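The decorator pattern behind a value-transforming annotator can be sketched without the library. In this illustration `FixedAnnotator` and `ValueTransformSketch` are hypothetical classes, and keying the transformers by attribute name is an assumption drawn from the `value_transformers: dict[str, Callable]` signature, not a confirmed detail.

```python
from typing import Any, Callable


class FixedAnnotator:
    """Hypothetical child annotator that always returns the same attributes."""

    def annotate(self, annotatable: Any,
                 context: dict[str, Any]) -> dict[str, Any]:
        return {"score": 0.12345, "gene": "chd8"}


class ValueTransformSketch:
    """Wrap a child annotator and post-process selected attribute values."""

    def __init__(self, child: Any,
                 value_transformers: dict[str, Callable[[Any], Any]]) -> None:
        self.child = child
        self.value_transformers = value_transformers

    def annotate(self, annotatable: Any,
                 context: dict[str, Any]) -> dict[str, Any]:
        result = self.child.annotate(annotatable, context)
        transformed = {}
        for name, value in result.items():
            transformer = self.value_transformers.get(name)
            # Attributes without a transformer pass through unchanged.
            transformed[name] = transformer(value) if transformer else value
        return transformed


decorated = ValueTransformSketch(
    FixedAnnotator(),
    {"score": lambda v: round(v, 2), "gene": str.upper},
)
output = decorated.annotate(None, {})
```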
dae.annotation.annotator_base module
Provides base class for annotators.
- class dae.annotation.annotator_base.AnnotatorBase(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, source_type_desc: dict[str, tuple[str, str]])[source]
Bases:
Annotator
Base implementation of the Annotator class.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- batch_annotate(annotatables: list[Annotatable | None], contexts: list[dict[str, Any]], batch_work_dir: str | None = None) list[dict[str, Any]] [source]
dae.annotation.cnv_collection_annotator module
- class dae.annotation.cnv_collection_annotator.CnvCollectionAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
Annotator
CNV collection annotator class.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- dae.annotation.cnv_collection_annotator.build_cnv_collection_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
dae.annotation.context module
- class dae.annotation.context.CLIAnnotationContext(context_objects: dict[str, Any], source: tuple[str, ...])[source]
Bases:
CLIGenomicContext
Defines annotation pipeline genomic context.
- static add_context_arguments(parser: ArgumentParser) None [source]
Add command line arguments to the argument parser.
- static context_builder(args: Namespace) CLIAnnotationContext [source]
Build a CLI genomic context.
- static get_pipeline(context: GenomicContext) AnnotationPipeline [source]
Construct an annotation pipeline.
dae.annotation.debug_annotator module
- class dae.annotation.debug_annotator.HelloWorldAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
Annotator
Defines example annotator.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- dae.annotation.debug_annotator.build_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
Create an example hello world annotator.
dae.annotation.docker_annotator module
- class dae.annotation.docker_annotator.DockerAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo)[source]
Bases:
AnnotatorBase
Base class for annotators that use docker containers.
dae.annotation.effect_annotator module
- class dae.annotation.effect_annotator.EffectAnnotatorAdapter(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
AnnotatorBase
Adapts effect annotator to be used in annotation infrastructure.
- dae.annotation.effect_annotator.build_effect_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
dae.annotation.format_handlers module
- class dae.annotation.format_handlers.AbstractFormat(pipeline_config: list[dict[str, Any]] | RawFullConfig, pipeline_config_old: str | None, cli_args: dict, grr_definition: dict | None, region: Region | None)[source]
Bases:
object
Abstract class of input/output handlers for various formats.
This class and its children are responsible for correctly reading from and writing to formats that can be annotated by our system.
They convert the raw input data to types that can be passed to the annotation pipeline and then convert it back to its native format, as well as handling the reading, updating and writing of metadata the format may possess.
Each child class handles the specific differences of a single format.
- class dae.annotation.format_handlers.ColumnsFormat(pipeline_config: list[dict[str, Any]] | RawFullConfig, pipeline_config_old: str | None, cli_args: dict, grr_definition: dict | None, region: Region | None, input_path: str, output_path: str, ref_genome_id: str | None)[source]
Bases:
AbstractFormat
Handler for delimiter-separated values text files.
- class dae.annotation.format_handlers.ParquetFormat(pipeline_config: list[dict[str, Any]] | RawFullConfig, pipeline_config_old: str | None, cli_args: dict, grr_definition: dict | None, region: Region | None, input_layout: Schema2DatasetLayout, output_dir: str, bucket_idx: int)[source]
Bases:
AbstractFormat
Handler for Schema2 Parquet datasets.
- class dae.annotation.format_handlers.VCFFormat(pipeline_config: list[dict[str, Any]] | RawFullConfig, pipeline_config_old: str | None, cli_args: dict, grr_definition: dict | None, region: Region | None, input_path: str, output_path: str)[source]
Bases:
AbstractFormat
Handler for VCF format files.
dae.annotation.gene_score_annotator module
Module containing the gene score annotator.
- class dae.annotation.gene_score_annotator.GeneScoreAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, gene_score_resource: GenomicResource, input_gene_list: str)[source]
Bases:
Annotator
Gene score annotator class.
- DEFAULT_AGGREGATOR_TYPE = 'dict'
- aggregate_gene_values(score_id: str, gene_symbols: list[str], aggregator_type: str) Any [source]
Aggregate gene score values.
- annotate(_: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- property used_context_attributes: tuple[str, ...]
- dae.annotation.gene_score_annotator.build_gene_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
Create a gene score annotator.
dae.annotation.gene_set_annotator module
- class dae.annotation.gene_set_annotator.GeneSetAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, gene_set_resource: GenomicResource, input_gene_list: str)[source]
Bases:
AnnotatorBase
Gene set annotator class.
- property used_context_attributes: tuple[str, ...]
- dae.annotation.gene_set_annotator.build_gene_set_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
Create a gene set annotator.
dae.annotation.liftover_annotator module
Provides a lift over annotator and helpers.
- class dae.annotation.liftover_annotator.AbstractLiftoverAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome)[source]
Bases:
AnnotatorBase
Liftover annotator class.
- liftover_cnv(cnv_allele: Annotatable) Annotatable | None [source]
Liftover CNV allele annotatable.
- liftover_position(position: Annotatable) Annotatable | None [source]
Liftover position annotatable.
- liftover_region(region: Annotatable) Annotatable | None [source]
Liftover region annotatable.
- class dae.annotation.liftover_annotator.BasicLiftoverAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome)[source]
Bases:
AbstractLiftoverAnnotator
Basic liftover annotator class.
- class dae.annotation.liftover_annotator.BcfLiftoverAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome)[source]
Bases:
AbstractLiftoverAnnotator
BCF tools liftover re-implementation annotator class.
- dae.annotation.liftover_annotator.basic_liftover_allele(chrom: str, pos: int, ref: str, alt: str, liftover_chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome) tuple[str, int, str, str] | None [source]
Basic liftover of an allele.
- dae.annotation.liftover_annotator.basic_liftover_variant(chrom: str, pos: int, ref: str, alts: list[str], liftover_chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome) tuple[str, int, str, list[str]] | None [source]
Basic liftover variant utility function.
- dae.annotation.liftover_annotator.bcf_liftover_allele(chrom: str, pos: int, ref: str, alt: str, liftover_chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome) tuple[str, int, str, str] | None [source]
Liftover a variant.
- dae.annotation.liftover_annotator.bcf_liftover_variant(chrom: str, pos: int, ref: str, alts: list[str], liftover_chain: LiftoverChain, source_genome: ReferenceGenome, target_genome: ReferenceGenome) tuple[str, int, str, list[str]] | None [source]
BCF liftover variant utility function.
- dae.annotation.liftover_annotator.build_liftover_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
Create a liftover annotator.
dae.annotation.normalize_allele_annotator module
Provides normalize allele annotator and helpers.
- class dae.annotation.normalize_allele_annotator.NormalizeAlleleAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
AnnotatorBase
Annotator to normalize VCF alleles.
- dae.annotation.normalize_allele_annotator.build_normalize_allele_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.normalize_allele_annotator.normalize_allele(allele: VCFAllele, genome: ReferenceGenome) VCFAllele [source]
Normalize an allele.
Uses the algorithm described at https://genome.sph.umich.edu/wiki/Variant_Normalization
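The normalization procedure from the linked page (trim shared trailing bases, left-extend when an allele becomes empty, then trim shared leading bases) can be sketched on a plain string reference. This is an illustrative sketch, not the library's normalize_allele: the function name `normalize`, the string-based reference and the error handling at the contig start are all assumptions.

```python
def normalize(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim a biallelic variant; pos is 1-based in ref_seq."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    changed = True
    while changed:
        changed = False
        # Step 1: drop the last base while both alleles end with the same one.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Step 2: if an allele became empty, prepend the preceding ref base.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend left of the contig start")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Step 3: drop shared leading bases while both alleles keep >= 1 base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Made-up reference with a CA repeat at positions 4-11.
REFERENCE = "GGGCACACACAGGG"
deletion = normalize(REFERENCE, 6, "CAC", "C")    # CA deletion inside the repeat
padded_snv = normalize(REFERENCE, 4, "CAC", "CAT")  # SNV with redundant padding
```

With this reference, the deletion shifts to the leftmost equivalent position and the padded substitution collapses to a single-base change.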
dae.annotation.parquet module
- dae.annotation.parquet.annotate_parquet(input_layout: Schema2DatasetLayout, output_dir: str, pipeline_config: list[dict[str, Any]], region: str, grr_definition: dict, bucket_idx: int, allow_repeated_attributes: bool, full_reannotation: bool) None [source]
Run annotation over a given directory of Parquet files.
- dae.annotation.parquet.backup_schema2_study(directory: str) Schema2DatasetLayout [source]
Backup current meta and summary data for a parquet study.
Renames the meta Parquet file and summary variants directory by attaching a suffix with the current date, then returns a corrected layout using the newly renamed paths. This clears the way for the new ‘meta’ and ‘summary’ data that will be produced when reannotating a Parquet study in place.
- dae.annotation.parquet.merge_partitioned(summary_dir: str, partition_dir: str, partition_descriptor: PartitionDescriptor) None [source]
Helper method to merge Parquet files in partitioned studies.
- dae.annotation.parquet.produce_regions(target_region: str | None, region_size: int, contig_lens: dict[str, int]) list[str] [source]
Produce regions to annotate by.
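Splitting contigs into fixed-size regions, as the `region_size` and `contig_lens` parameters suggest, can be sketched as follows. The helper `split_contigs` is hypothetical, the `chrom:start-end` string format with 1-based inclusive coordinates is an assumption, and the handling of `target_region` is not modelled.

```python
def split_contigs(contig_lens: dict[str, int], region_size: int) -> list[str]:
    """Cover each contig with consecutive windows of at most region_size."""
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    regions = []
    for chrom, length in contig_lens.items():
        for start in range(1, length + 1, region_size):
            # The last window of a contig is clipped to the contig length.
            end = min(start + region_size - 1, length)
            regions.append(f"{chrom}:{start}-{end}")
    return regions


# Illustrative contig lengths.
regions = split_contigs({"chr1": 2500, "chr2": 800}, 1000)
```

Each resulting region can then be annotated as an independent task and the partial outputs merged afterwards.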
- dae.annotation.parquet.produce_schema2_annotation_tasks(task_graph: TaskGraph, loader: ParquetLoader, output_dir: str, raw_pipeline: list[dict[str, Any]] | RawFullConfig, grr: GenomicResourceRepo, region_size: int, allow_repeated_attributes: bool, target_region: str | None = None, *, full_reannotation: bool = False) list[Task] [source]
Produce TaskGraph tasks for Parquet file annotation.
- dae.annotation.parquet.produce_schema2_merging_tasks(task_graph: TaskGraph, annotation_tasks: list[Task], loader: ParquetLoader, output_layout: Schema2DatasetLayout) list[Task] [source]
Produce TaskGraph tasks for Parquet file merging.
- dae.annotation.parquet.symlink_pedigree_and_family_variants(src_layout: Schema2DatasetLayout, dest_layout: Schema2DatasetLayout) None [source]
Mirror pedigree and family variants data using symlinks.
- dae.annotation.parquet.write_new_meta(loader: ParquetLoader, pipeline: AnnotationPipeline, output_layout: Schema2DatasetLayout) None [source]
Produce and write new metadata to the output Parquet dataset.
dae.annotation.reannotate_instance module
- class dae.annotation.reannotate_instance.ReannotateInstanceTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]
Bases:
AnnotationTool
Annotation tool to reannotate the configured GPF instance.
- dae.annotation.reannotate_instance.cli(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None) None [source]
Entry point method for instance reannotation tool.
dae.annotation.record_to_annotatable module
- class dae.annotation.record_to_annotatable.CSHLAlleleRecordToAnnotatable(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a CSHL variant record into a VCF allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.DaeAlleleRecordToAnnotatable(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a DAE variant record into a VCF allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToAnnotable(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
ABC
Base class for record-to-annotatable transformations.
- abstract build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToCNVAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a columns record into a CNV allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToPosition(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToRegion(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToVcfAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.VcfLikeRecordToVcfAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a columns record into VCF allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- dae.annotation.record_to_annotatable.add_record_to_annotable_arguments(parser: ArgumentParser) None [source]
- dae.annotation.record_to_annotatable.build_record_to_annotatable(parameters: dict[str, str], available_columns: set[str], ref_genome: ReferenceGenome | None = None) RecordToAnnotable [source]
Transform a variant record into an annotatable.
dae.annotation.score_annotator module
This module contains the implementation of the three score annotators.
The genomic score annotators defined are position_score, np_score, and allele_score.
- class dae.annotation.score_annotator.AlleleScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
GenomicScoreAnnotatorBase
This class implements the allele_score annotator.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Collect score aggregator documentation.
- class dae.annotation.score_annotator.GenomicScoreAnnotatorBase(pipeline: AnnotationPipeline, info: AnnotatorInfo, score: GenomicScore)[source]
Bases:
Annotator
Genomic score base annotator.
- add_score_aggregator_documentation(attribute_info: AttributeInfo, aggregator: str, attribute_conf_agg: str | None) None [source]
Collect score aggregator documentation.
- abstract build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Construct score aggregator documentation.
- class dae.annotation.score_annotator.PositionScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
GenomicScoreAnnotatorBase
This class implements the position_score annotator.
The position_score annotator requires the resource_id parameter, whose value must be the id of a genomic resource of type position_score.
The position_score resource provides a set of scores (see …) that the position_score annotator uses as attributes to assign to the annotatable.
The position_score annotator recognizes one attribute-level parameter called position_aggregator, which controls how the position scores are aggregated for annotatables that refer to a region of the reference genome.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Collect score aggregator documentation.
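How a position aggregator reduces per-position scores over a region can be sketched as below. The aggregator names `mean`, `max` and `min`, the helper `aggregate_region` and the score values are illustrative assumptions; the set of aggregators actually supported by position_aggregator is not listed in this reference.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical aggregator table keyed by name.
AGGREGATORS: dict[str, Callable[[list[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
}


def aggregate_region(scores: dict[int, float], start: int, end: int,
                     aggregator: str) -> Optional[float]:
    """Aggregate the scores of all scored positions in [start, end]."""
    values = [scores[pos] for pos in range(start, end + 1) if pos in scores]
    if not values:
        return None  # no scored position falls inside the region
    return AGGREGATORS[aggregator](values)


# Made-up per-position scores; position 102 has no score.
position_scores = {100: 0.25, 101: 0.5, 103: 0.75}
region_max = aggregate_region(position_scores, 100, 103, "max")
region_mean = aggregate_region(position_scores, 100, 103, "mean")
```

For a single-position annotatable no aggregation is needed; the aggregator only matters once the annotatable spans several positions.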
- dae.annotation.score_annotator.build_allele_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.score_annotator.build_np_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.score_annotator.build_position_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.score_annotator.get_genomic_resource(pipeline: AnnotationPipeline, info: AnnotatorInfo, resource_types: set[str]) GenomicResource [source]
Return genomic score resource used for given genomic score annotator.
dae.annotation.simple_effect_annotator module
- class dae.annotation.simple_effect_annotator.SimpleEffectAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
AnnotatorBase
Simple effect annotator class.
- call_region(chrom: str, beg: int, end: int, transcripts: list[TranscriptModel], func_name: str, classification: str) tuple[str, set[str]] | None [source]
Call a region with a specific classification.
- cds_intron_regions(transcript: TranscriptModel) list[Region] [source]
Return the CDS intron regions of the transcript.
- noncoding_regions(transcript: TranscriptModel) list[Region] [source]
Return the noncoding regions of the transcript.
- peripheral_regions(transcript: TranscriptModel) list[Region] [source]
Return the peripheral regions of the transcript.
- run_annotate(chrom: str, beg: int, end: int) tuple[str, set[str]] [source]
Return classification with a set of affected genes.
- utr_regions(transcript: TranscriptModel) list[Region] [source]
Return the UTR regions of the transcript.
- dae.annotation.simple_effect_annotator.build_simple_effect_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]