dae.genomic_resources package
Subpackages
- dae.genomic_resources.gene_models package
- Submodules
- dae.genomic_resources.gene_models.gene_models module
Exon, GeneModels, GeneModels.add_transcript_model(), GeneModels.chrom_gene_models(), GeneModels.close(), GeneModels.gene_models_by_gene_name(), GeneModels.gene_models_by_location(), GeneModels.gene_names(), GeneModels.get_schema(), GeneModels.has_chromosome(), GeneModels.is_loaded(), GeneModels.load(), GeneModels.relabel_chromosomes(), GeneModels.reset(), GeneModels.resource_id, GeneModels.update_indexes()
TranscriptModel, TranscriptModel.all_regions(), TranscriptModel.calc_frames(), TranscriptModel.cds_len(), TranscriptModel.cds_regions(), TranscriptModel.get_exon_number_for(), TranscriptModel.is_coding(), TranscriptModel.test_frames(), TranscriptModel.total_len(), TranscriptModel.update_frames(), TranscriptModel.utr3_len(), TranscriptModel.utr3_regions(), TranscriptModel.utr5_len(), TranscriptModel.utr5_regions()
create_regions_from_genes(), get_parser(), infer_gene_model_parser(), join_gene_models(), load_gene_mapping(), load_gene_models(), parse_ccds_gene_models_format(), parse_default_gene_models_format(), parse_gtf_gene_models_format(), parse_known_gene_models_format(), parse_raw(), parse_ref_flat_gene_models_format(), parse_ref_seq_gene_models_format(), parse_ucscgenepred_models_format(), probe_columns(), probe_header()
- dae.genomic_resources.gene_models.gene_models_factory module
- dae.genomic_resources.gene_models.serialization module
build_gtf_record(), calc_frame_for_gtf_cds_feature(), collect_cds_regions(), collect_gtf_cds_regions(), collect_gtf_start_codon_regions(), collect_gtf_stop_codon_regions(), find_exon_cds_region_for_gtf_cds_feature(), gene_models_to_gtf(), gtf_canonical_index(), save_as_default_gene_models(), transcript_to_gtf()
- Module contents
- dae.genomic_resources.genomic_position_table package
- Submodules
- dae.genomic_resources.genomic_position_table.line module
- dae.genomic_resources.genomic_position_table.table module
GenomicPositionTable, GenomicPositionTable.ALT, GenomicPositionTable.CHROM, GenomicPositionTable.POS_BEGIN, GenomicPositionTable.POS_END, GenomicPositionTable.REF, GenomicPositionTable.close(), GenomicPositionTable.get_all_records(), GenomicPositionTable.get_chromosome_length(), GenomicPositionTable.get_chromosomes(), GenomicPositionTable.get_column_key(), GenomicPositionTable.get_file_chromosomes(), GenomicPositionTable.get_records_in_region(), GenomicPositionTable.map_chromosome(), GenomicPositionTable.open(), GenomicPositionTable.unmap_chromosome()
adjust_zero_based_line(), get_idx(), zero_based_adjust()
- dae.genomic_resources.genomic_position_table.table_bigwig module
- dae.genomic_resources.genomic_position_table.table_inmemory module
InmemoryGenomicPositionTable, InmemoryGenomicPositionTable.FORMAT_DEF, InmemoryGenomicPositionTable.close(), InmemoryGenomicPositionTable.get_all_records(), InmemoryGenomicPositionTable.get_chromosome_length(), InmemoryGenomicPositionTable.get_file_chromosomes(), InmemoryGenomicPositionTable.get_records_in_region(), InmemoryGenomicPositionTable.open()
- dae.genomic_resources.genomic_position_table.table_tabix module
TabixGenomicPositionTable, TabixGenomicPositionTable.BUFFER_MAXSIZE, TabixGenomicPositionTable.close(), TabixGenomicPositionTable.get_all_records(), TabixGenomicPositionTable.get_chromosome_length(), TabixGenomicPositionTable.get_chromosomes(), TabixGenomicPositionTable.get_file_chromosomes(), TabixGenomicPositionTable.get_line_iterator(), TabixGenomicPositionTable.get_records_in_region(), TabixGenomicPositionTable.open()
- dae.genomic_resources.genomic_position_table.table_vcf module
- dae.genomic_resources.genomic_position_table.utils module
- Module contents
BigWigLine, BigWigTable, Line, LineBuffer, TabixGenomicPositionTable, TabixGenomicPositionTable.BUFFER_MAXSIZE, TabixGenomicPositionTable.alt_key, TabixGenomicPositionTable.chrom_key, TabixGenomicPositionTable.chrom_map, TabixGenomicPositionTable.chrom_order, TabixGenomicPositionTable.close(), TabixGenomicPositionTable.get_all_records(), TabixGenomicPositionTable.get_chromosome_length(), TabixGenomicPositionTable.get_chromosomes(), TabixGenomicPositionTable.get_file_chromosomes(), TabixGenomicPositionTable.get_line_iterator(), TabixGenomicPositionTable.get_records_in_region(), TabixGenomicPositionTable.header, TabixGenomicPositionTable.jump_threshold, TabixGenomicPositionTable.line_iterator, TabixGenomicPositionTable.open(), TabixGenomicPositionTable.pos_begin_key, TabixGenomicPositionTable.pos_end_key, TabixGenomicPositionTable.pysam_file, TabixGenomicPositionTable.ref_key, TabixGenomicPositionTable.rev_chrom_map, TabixGenomicPositionTable.stats
VCFGenomicPositionTable, VCFGenomicPositionTable.CHROM, VCFGenomicPositionTable.POS_BEGIN, VCFGenomicPositionTable.POS_END, VCFGenomicPositionTable.alt_key, VCFGenomicPositionTable.chrom_key, VCFGenomicPositionTable.chrom_map, VCFGenomicPositionTable.chrom_order, VCFGenomicPositionTable.get_file_chromosomes(), VCFGenomicPositionTable.get_line_iterator(), VCFGenomicPositionTable.header, VCFGenomicPositionTable.jump_threshold, VCFGenomicPositionTable.line_iterator, VCFGenomicPositionTable.open(), VCFGenomicPositionTable.pos_begin_key, VCFGenomicPositionTable.pos_end_key, VCFGenomicPositionTable.pysam_file, VCFGenomicPositionTable.ref_key, VCFGenomicPositionTable.rev_chrom_map, VCFGenomicPositionTable.stats
VCFLine, build_genomic_position_table()
- dae.genomic_resources.implementations package
- Submodules
- dae.genomic_resources.implementations.annotation_pipeline_impl module
AnnotationPipelineImplementation, AnnotationPipelineImplementation.add_statistics_build_tasks(), AnnotationPipelineImplementation.calc_info_hash(), AnnotationPipelineImplementation.calc_statistics_hash(), AnnotationPipelineImplementation.files, AnnotationPipelineImplementation.get_info(), AnnotationPipelineImplementation.get_statistics_info(), AnnotationPipelineImplementation.get_template()
- dae.genomic_resources.implementations.gene_models_impl module
- dae.genomic_resources.implementations.genomic_scores_impl module
CnvCollectionImplementation, GenomicScoreImplementation, GenomicScoreImplementation.add_statistics_build_tasks(), GenomicScoreImplementation.calc_info_hash(), GenomicScoreImplementation.calc_statistics_hash(), GenomicScoreImplementation.files, GenomicScoreImplementation.get_config_histograms(), GenomicScoreImplementation.get_info(), GenomicScoreImplementation.get_statistics_info(), GenomicScoreImplementation.get_template(), GenomicScoreImplementation.resource_id
build_score_implementation_from_resource()
- dae.genomic_resources.implementations.liftover_chain_impl module
- dae.genomic_resources.implementations.reference_genome_impl module
ChromosomeStatistic, GenomeStatistic, GenomeStatisticsMixin, ReferenceGenomeImplementation, ReferenceGenomeImplementation.add_statistics_build_tasks(), ReferenceGenomeImplementation.calc_info_hash(), ReferenceGenomeImplementation.calc_statistics_hash(), ReferenceGenomeImplementation.files, ReferenceGenomeImplementation.get_info(), ReferenceGenomeImplementation.get_statistics(), ReferenceGenomeImplementation.get_statistics_info(), ReferenceGenomeImplementation.get_template()
ReferenceGenomeStatistics
- Module contents
- dae.genomic_resources.statistics package
Submodules
dae.genomic_resources.aggregators module
- class dae.genomic_resources.aggregators.Aggregator
Bases: ABC
Base class for score aggregators.
- class dae.genomic_resources.aggregators.ConcatAggregator
Bases: Aggregator
Aggregator that concatenates all passed values.
- class dae.genomic_resources.aggregators.CountAggregator
Bases: Aggregator
Aggregator that counts values.
- class dae.genomic_resources.aggregators.DictAggregator
Bases: Aggregator
Aggregator that builds a dictionary of all passed values.
- class dae.genomic_resources.aggregators.JoinAggregator(separator: str)
Bases: Aggregator
Aggregator that joins all passed values using a separator.
- class dae.genomic_resources.aggregators.ListAggregator
Bases: Aggregator
Aggregator that builds a list of all passed values.
- class dae.genomic_resources.aggregators.MaxAggregator
Bases: Aggregator
Maximum value aggregator for genomic scores.
- class dae.genomic_resources.aggregators.MeanAggregator
Bases: Aggregator
Aggregator for genomic scores that calculates the mean value.
- class dae.genomic_resources.aggregators.MedianAggregator
Bases: Aggregator
Aggregator for genomic scores that calculates the median value.
- class dae.genomic_resources.aggregators.MinAggregator
Bases: Aggregator
Minimum value aggregator for genomic scores.
- class dae.genomic_resources.aggregators.ModeAggregator
Bases: Aggregator
Aggregator for genomic scores that calculates the mode value.
- dae.genomic_resources.aggregators.build_aggregator(aggregator_type: str) → Aggregator
- dae.genomic_resources.aggregators.create_aggregator(aggregator_def: dict[str, Any]) → Aggregator
Create an aggregator from an aggregator definition.
- dae.genomic_resources.aggregators.create_aggregator_definition(aggregator_type: str) → dict[str, Any]
Parse an aggregator definition string.
- dae.genomic_resources.aggregators.get_aggregator_class(aggregator: str) → Callable[[], Aggregator]
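The functions above map a textual aggregator type onto an aggregator class. The following is a minimal, self-contained sketch of that idea, not the GPF implementation: the `Simple*Aggregator` classes, the `add()`/`result()` methods, the `REGISTRY` dictionary, and the `"name(arg)"` definition syntax are all hypothetical choices made for illustration.

```python
import re
import statistics
from typing import Any, Callable


class SimpleAggregator:
    """Hypothetical minimal aggregator: collect values, then summarize."""

    def __init__(self) -> None:
        self.values: list[Any] = []

    def add(self, value: Any) -> None:
        # Skip missing scores so they do not affect the summary.
        if value is not None:
            self.values.append(value)

    def result(self) -> Any:
        raise NotImplementedError


class SimpleMaxAggregator(SimpleAggregator):
    def result(self) -> Any:
        return max(self.values) if self.values else None


class SimpleMeanAggregator(SimpleAggregator):
    def result(self) -> Any:
        return statistics.mean(self.values) if self.values else None


class SimpleJoinAggregator(SimpleAggregator):
    def __init__(self, separator: str) -> None:
        super().__init__()
        self.separator = separator

    def result(self) -> Any:
        return self.separator.join(str(v) for v in self.values)


# Hypothetical registry from aggregator type name to aggregator factory.
REGISTRY: dict[str, Callable[..., SimpleAggregator]] = {
    "max": SimpleMaxAggregator,
    "mean": SimpleMeanAggregator,
    "join": SimpleJoinAggregator,
}


def parse_definition(aggregator_type: str) -> dict[str, Any]:
    """Split "name" or "name(arg)" into a definition dict (assumed syntax)."""
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", aggregator_type)
    if match is None:
        raise ValueError(f"bad aggregator definition: {aggregator_type!r}")
    name, arg = match.groups()
    return {"name": name, "args": [] if arg is None else [arg]}


def build(aggregator_type: str) -> SimpleAggregator:
    """Parse the definition and instantiate the registered class."""
    definition = parse_definition(aggregator_type)
    return REGISTRY[definition["name"]](*definition["args"])
```

For example, `build("join(;)")` fed the values `1`, `None`, `3` yields `"1;3"`. Keeping parsing and instantiation in separate functions mirrors the split between `create_aggregator_definition()` and `create_aggregator()` listed above.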
dae.genomic_resources.cached_repository module
Provides caching genomic resources.
- class dae.genomic_resources.cached_repository.CacheResource(resource: GenomicResource, protocol: CachingProtocol)
Bases: GenomicResource
Represents resources stored in cache.
- class dae.genomic_resources.cached_repository.CachingProtocol(remote_protocol: ReadOnlyRepositoryProtocol, local_protocol: FsspecReadWriteProtocol)
Bases: ReadOnlyRepositoryProtocol
Defines caching GRR repository protocol.
- file_exists(resource: GenomicResource, filename: str) → bool
Check if the given file exists in the given resource.
- get_all_resources() → Generator[GenomicResource, None, None]
Return a generator over all resources in the repository.
- get_resource_file_url(resource: GenomicResource, filename: str) → str
Return the URL of a file in the resource.
- get_resource_url(resource: GenomicResource) → str
Return the URL of the specified resource.
- load_manifest(resource: GenomicResource) → Manifest
Load the resource manifest.
- open_bigwig_file(resource: GenomicResource, filename: str) → Any
Open a bigWig file in a resource and return it.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) → IO
Open a file in a resource and return a file-like object.
- open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) → TabixFile
Open a tabix file in a resource and return a pysam TabixFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) → VariantFile
Open a VCF file in a resource and return a pysam VariantFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- refresh_cached_resource(resource: GenomicResource) → None
Refresh all resource files in the cache if necessary.
- refresh_cached_resource_file(resource: GenomicResource, filename: str) → tuple[str, str]
Refresh a resource file in the cache if necessary.
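The caching protocol keeps a local copy of remote resource files and refreshes a copy only when needed. A minimal sketch of that check, assuming that "needed" means the cached file is missing or its MD5 checksum differs from the remote one. The function `refresh_cached_file` and the plain-directory layout are hypothetical; the GPF protocol works through fsspec filesystems and resource manifests, and its `refresh_cached_resource_file()` returns a `tuple[str, str]` rather than the boolean used here.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as infile:
        for chunk in iter(lambda: infile.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_cached_file(remote_dir: Path, cache_dir: Path, filename: str) -> bool:
    """Copy filename from remote_dir into cache_dir if the cached copy is stale.

    Returns True when the cached copy was (re)written.
    """
    remote = remote_dir / filename
    cached = cache_dir / filename
    if cached.exists() and md5sum(cached) == md5sum(remote):
        return False  # cache is up to date; nothing to do
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(remote, cached)
    return True
```

Comparing checksums rather than timestamps keeps the decision independent of clock differences between the remote and local filesystems, at the cost of reading both files.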
- class dae.genomic_resources.cached_repository.GenomicResourceCachedRepo(child: GenomicResourceRepo, cache_url: str, **kwargs: str | None)
Bases: GenomicResourceRepo
Defines caching genomic resources repository.
- find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) → GenomicResource | None
Return the requested resource, or None if it is not found.
- get_all_resources() → Generator[GenomicResource, None, None]
Return a generator over all resources in the repository.
- get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) → GenomicResource
Return the resource whose ID equals resource_id.
If the resource is not found, an exception is raised.
- dae.genomic_resources.cached_repository.cache_resources(repository: GenomicResourceRepo, resource_ids: Iterable[str] | None, workers: int | None = None) → None
Cache resources from a list of remote resource IDs.
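The `workers` parameter suggests that resources are cached concurrently. A sketch of that pattern with a thread pool, assuming a hypothetical `cache_one(resource_id)` callable that caches a single resource, and assuming (not confirmed by this page) that `resource_ids=None` means "cache every resource". The helper name `cache_many` is also hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def cache_many(
    resource_ids: Iterable[str] | None,
    all_ids: Callable[[], Iterable[str]],
    cache_one: Callable[[str], None],
    workers: int | None = None,
) -> list[str]:
    """Cache resources in parallel and return the IDs that were processed.

    Assumption: resource_ids=None means "cache every resource".
    """
    ids = list(all_ids() if resource_ids is None else resource_ids)
    # max_workers=None lets ThreadPoolExecutor choose its default pool size.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # list() drains the results so any worker exception is re-raised here.
        list(executor.map(cache_one, ids))
    return ids
```

Threads fit this workload because caching is dominated by network and disk I/O rather than CPU work.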
dae.genomic_resources.cli module
Provides CLI for management of genomic resources repositories.
- dae.genomic_resources.cli.cli_browse(cli_args: list[str] | None = None) → None
Provide a CLI for repository browsing.
- dae.genomic_resources.cli.cli_manage(cli_args: list[str] | None = None) → None
Provide a CLI for repository management.
- dae.genomic_resources.cli.collect_dvc_entries(proto: ReadWriteRepositoryProtocol, res: GenomicResource) → dict[str, ManifestEntry]
Collect manifest entries defined by .dvc files.
dae.genomic_resources.draw_score_histograms module
dae.genomic_resources.fsspec_protocol module
Provides GRR protocols based on fsspec library.
- class dae.genomic_resources.fsspec_protocol.FsspecReadOnlyProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)
Bases: ReadOnlyRepositoryProtocol
Provides a read-only, fsspec-based genomic resources repository protocol.
- file_exists(resource: GenomicResource, filename: str) → bool
Check if the given file exists in the given resource.
- get_all_resources() → Generator[GenomicResource, None, None]
Return a generator over all resources in the repository.
- load_manifest(resource: GenomicResource) → Manifest
Load the resource manifest.
- open_bigwig_file(resource: GenomicResource, filename: str) → Any
Open a bigWig file in a resource and return it.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) → IO
Open a file in a resource and return a file-like object.
- open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) → TabixFile
Open a tabix file in a resource and return a pysam TabixFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) → VariantFile
Open a VCF file in a resource and return a pysam VariantFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- class dae.genomic_resources.fsspec_protocol.FsspecReadWriteProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)
Bases: FsspecReadOnlyProtocol, ReadWriteRepositoryProtocol
Provides a read-write, fsspec-based genomic resources repository protocol.
- build_content_file() → list[dict[str, Any]]
Build the content of the repository (i.e. the ‘.CONTENTS.json’ file).
- collect_all_resources() → Generator[GenomicResource, None, None]
Return a generator over all resources managed by this protocol.
- collect_resource_entries(resource: GenomicResource) → Manifest
Scan the resource and return a manifest.
- copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) → ResourceFileState | None
Copy a resource file into the repository.
- delete_resource_file(resource: GenomicResource, filename: str) → None
Delete a resource file and its internal state.
- get_all_resources() → Generator[GenomicResource, None, None]
Return a generator over all resources in the repository.
- get_resource_file_size(resource: GenomicResource, filename: str) → int
Return the size of a resource file.
- get_resource_file_timestamp(resource: GenomicResource, filename: str) → float
Return the timestamp (ISO formatted) of a resource file.
- load_resource_file_state(resource: GenomicResource, filename: str) → ResourceFileState | None
Load the resource file state from the internal GRR state.
If the specified resource file has no internal state, returns None.
- obtain_resource_file_lock(resource: GenomicResource, filename: str, timeout: float = -1) → AbstractContextManager
Lock a resource’s file.
- save_resource_file_state(resource: GenomicResource, state: ResourceFileState) → None
Save the resource file state into the internal GRR state.
- update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) → ResourceFileState | None
Update a resource file in the repository if needed.
- dae.genomic_resources.fsspec_protocol.build_fsspec_protocol(proto_id: str, root_url: str, **kwargs: str | None) → FsspecReadOnlyProtocol | FsspecReadWriteProtocol
Create an fsspec GRR protocol based on the root URL.
- dae.genomic_resources.fsspec_protocol.build_inmemory_protocol(proto_id: str, root_path: str, content: dict[str, Any]) → FsspecReadWriteProtocol
Build and return an embedded fsspec protocol for testing.
- dae.genomic_resources.fsspec_protocol.build_local_resource(dirname: str, config: dict[str, Any]) → GenomicResource
Build a resource from a local filesystem directory.
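`build_fsspec_protocol()` returns either a read-only or a read-write protocol depending on the root URL. A sketch of one plausible dispatch rule, assuming that HTTP(S) repositories are read-only while local paths, `s3://`, and `memory://` URLs are writable. These scheme sets and the helper name `choose_protocol_kind` are assumptions for illustration; the actual mapping may differ.

```python
from urllib.parse import urlparse

# Assumed scheme sets; the real mapping in build_fsspec_protocol may differ.
READ_ONLY_SCHEMES = {"http", "https"}
READ_WRITE_SCHEMES = {"file", "s3", "memory"}


def choose_protocol_kind(root_url: str) -> str:
    """Return "read-only" or "read-write" for a repository root URL."""
    scheme = urlparse(root_url).scheme or "file"  # bare paths are local files
    if scheme in READ_ONLY_SCHEMES:
        return "read-only"
    if scheme in READ_WRITE_SCHEMES:
        return "read-write"
    raise NotImplementedError(f"unsupported GRR URL scheme: {scheme!r}")
```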
dae.genomic_resources.genomic_context module
Genomic context provides a way to collect various genomic resources from various sources and make them available through a single interface.
The module follows a registry-based approach. Providers register themselves
and are later consulted (in priority order) to build individual
GenomicContext
instances. Every created context is combined into a
PriorityGenomicContext, offering a single access point for
resources such as genomic resource repositories, reference genomes,
gene models, annotation pipelines, etc. Providers can be registered
programmatically via register_context_provider() or discovered
automatically through entry points.
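A stripped-down, self-contained sketch of that registry idea. In this sketch, providers are sorted so that higher priorities come first (ties broken by type name), each provider runs at most once, a provider may abstain by returning None, and lookups return the first non-None value. `Provider` and `Registry` are hypothetical stand-ins that use plain dictionaries as contexts; they are not the GPF classes, and the ordering rules are simplified.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Provider:
    """Hypothetical provider: a type name, a priority and an init callback."""

    provider_type: str
    priority: int
    init_fn: Callable[..., dict[str, Any] | None]


@dataclass
class Registry:
    providers: list[Provider] = field(default_factory=list)
    contexts: list[dict[str, Any]] = field(default_factory=list)
    initialized: bool = False

    def register(self, provider: Provider) -> None:
        self.providers.append(provider)

    def init_all(self, **kwargs: Any) -> None:
        if self.initialized:  # providers run at most once per registry
            return
        ordered = sorted(
            self.providers, key=lambda p: (-p.priority, p.provider_type))
        for provider in ordered:
            context = provider.init_fn(**kwargs)
            if context is not None:  # a provider may abstain
                self.contexts.append(context)
        self.initialized = True

    def lookup(self, key: str) -> Any | None:
        # Return the first non-None value among the collected contexts.
        for context in self.contexts:
            value = context.get(key)
            if value is not None:
                return value
        return None
```

With a CLI-style provider at priority 10 and a defaults provider at priority 1, a value supplied on the command line shadows the default, while keys only the defaults provider knows about remain reachable.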
Example usage of genomic context in a tool with command line interface:
import argparse
import sys
from dae.genomic_resources.genomic_context import (
    context_providers_add_argparser_arguments,
    context_providers_init,
    get_genomic_context,
)
parser = argparse.ArgumentParser()
context_providers_add_argparser_arguments(parser)
args = parser.parse_args(sys.argv[1:])
context_providers_init(**vars(args))
genomic_context = get_genomic_context()
If you don’t need command-line arguments, you can do:
context_providers_init()
genomic_context = get_genomic_context()
When you need a CLI with all defaults and without modifying the argument parser, you can do:
from dae.genomic_resources.genomic_context import context_providers_init_with_argparser
context_providers_init_with_argparser("GenomicTool")
genomic_context = get_genomic_context()
- class dae.genomic_resources.genomic_context.DefaultRepositoryContextProvider
Bases: GenomicContextProvider
Provide access to the default genomic resources repository.
The default repository is resolved via build_genomic_resource_repository() using the environment configuration. The resulting context exposes a single key, "genomic_resources_repository", which can be consumed by other code participating in the context chain.
- add_argparser_arguments(parser: ArgumentParser) → None
Declare command-line arguments for this provider.
The default repository provider is fully configuration-driven and has nothing to expose on the CLI, so the method intentionally leaves the parser untouched. The override exists to make the behaviour explicit in the generated documentation.
- init(**kwargs: Any) → GenomicContext
Instantiate a context backed by the default GRR.
Parameters
- **kwargs
Accepted for interface compatibility; the provider ignores runtime keyword arguments because everything is derived from the global configuration.
Returns
- GenomicContext
A context exposing a single genomic_resources_repository entry pointing at the default repository instance.
- dae.genomic_resources.genomic_context.clear_registered_contexts() → None
Forget all contexts created by context_providers_init().
This function exists primarily for testing scenarios where the global registry should be reset between test cases.
- dae.genomic_resources.genomic_context.context_providers_add_argparser_arguments(parser: ArgumentParser) → None
Delegate command-line argument registration to each provider.
Parameters
- parser
The parser that should receive additional arguments from every registered provider.
- dae.genomic_resources.genomic_context.context_providers_init(**kwargs: Any) → None
Materialize contexts from every registered provider.
The function walks all registered providers in priority order and asks each of them to initialise a GenomicContext. The resulting contexts are stored for later retrieval via get_genomic_context().
Notes
Providers are invoked at most once per process. Subsequent calls are ignored until clear_registered_contexts() is executed, which is especially helpful in unit tests.
Parameters
- **kwargs
Keyword arguments forwarded to every provider’s init method.
- dae.genomic_resources.genomic_context.context_providers_init_with_argparser(toolname: str = 'GenomicTool') → None
Initialise providers using arguments parsed from sys.argv.
Parameters
- toolname
The program name presented to argparse.ArgumentParser.
Notes
This helper is useful for simple tools that do not customise their argument parser but still want to expose the command-line options defined by registered context providers.
- dae.genomic_resources.genomic_context.get_genomic_context() → GenomicContext
Return a priority context that merges every registered context.
The returned PriorityGenomicContext respects the registration order, giving precedence to contexts added most recently when multiple contexts expose the same key.
- dae.genomic_resources.genomic_context.register_context(context: GenomicContext) → None
Record context so it participates in future lookups.
Parameters
- context
The context instance to be considered when get_genomic_context() is invoked.
- dae.genomic_resources.genomic_context.register_context_provider(context_provider: GenomicContextProvider) → None
Register context_provider so it participates in initialization.
Parameters
- context_provider
The provider implementation that should be considered when contexts are assembled. Providers are stored in registration order and later sorted by their priority before initialization.
dae.genomic_resources.genomic_context_base module
Base classes and interfaces for genomic context management.
This module defines the foundational abstractions for organizing and
accessing genomic resources through a unified context system. The central
concept is GenomicContext, which acts as a key-value store
exposing resources like genomic repositories, reference genomes, gene
models, and annotation pipelines. Providers implementing
GenomicContextProvider are responsible for building concrete
context instances, often by consulting configuration files or command-line
arguments.
The module also provides two concrete context implementations:
SimpleGenomicContext for straightforward dictionary-backed contexts
and PriorityGenomicContext for merging multiple contexts with
fallback semantics.
Key Constants
- GC_GRR_KEY : str
Standard key for the genomic resources repository object.
- GC_REFERENCE_GENOME_KEY : str
Standard key for the reference genome object.
- GC_GENE_MODELS_KEY : str
Standard key for the gene models object.
- GC_ANNOTATION_PIPELINE_KEY : str
Standard key for the annotation pipeline object.
See Also
- dae.genomic_resources.genomic_context
High-level orchestration and provider registration functions.
- class dae.genomic_resources.genomic_context_base.GenomicContext
Bases: ABC
Abstract base class for genomic context implementations.
A genomic context serves as a registry of genomic resources, exposing them via string keys. Typical resources include genomic resource repositories, reference genomes, gene models, and annotation pipelines. Subclasses must implement the key-value retrieval logic and report which keys are available.
Notes
The class provides three typed convenience accessors (get_reference_genome(), get_gene_models(), get_genomic_resources_repository()) that validate the underlying object types before returning them. These accessors raise ValueError if the stored object does not match the expected type.
- abstract get_context_keys() → set[str]
Report all keys exposed by this context.
Returns
- set[str]
The complete collection of keys under which objects can be retrieved. May be empty if the context holds no resources.
- abstract get_context_object(key: str) → Any | None
Retrieve a context object by its key.
Parameters
- key
The string identifier for the desired resource.
Returns
- Any | None
The stored object if the key is present, otherwise None.
Notes
Implementations must return None when the key is absent rather than raising KeyError. This convention allows callers to safely query for optional resources.
- get_gene_models() → GeneModels | None
Retrieve and validate the gene models from the context.
Returns
- GeneModels | None
The gene models instance if present and correctly typed, or None when the key is absent.
Raises
- ValueError
If the context entry for GC_GENE_MODELS_KEY is present but does not contain a GeneModels instance.
- get_genomic_resources_repository() → GenomicResourceRepo | None
Retrieve and validate the genomic resources repository.
Returns
- GenomicResourceRepo | None
The repository instance if present and correctly typed, or None when the key is absent.
Raises
- ValueError
If the context entry for GC_GRR_KEY is present but does not contain a GenomicResourceRepo instance.
- get_reference_genome() → ReferenceGenome | None
Retrieve and validate the reference genome from the context.
Returns
- ReferenceGenome | None
The reference genome instance if present and correctly typed, or None when the key is absent.
Raises
- ValueError
If the context entry for GC_REFERENCE_GENOME_KEY is present but does not contain a ReferenceGenome instance.
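The typed accessors follow a validate-before-return pattern: a missing key yields None, while a present key holding the wrong type raises ValueError. A minimal sketch of that pattern, using a hypothetical `FakeGenome` class in place of ReferenceGenome and a hypothetical `DictContext` in place of a real context. The key string "reference_genome" is also an assumption, not necessarily the actual value of GC_REFERENCE_GENOME_KEY.

```python
from __future__ import annotations

from typing import Any


class FakeGenome:
    """Hypothetical stand-in for ReferenceGenome."""


class DictContext:
    """Hypothetical dictionary-backed context with one typed accessor."""

    REFERENCE_GENOME_KEY = "reference_genome"  # assumed key value

    def __init__(self, objects: dict[str, Any]) -> None:
        self._objects = dict(objects)

    def get_context_object(self, key: str) -> Any | None:
        # Missing keys yield None instead of raising KeyError.
        return self._objects.get(key)

    def get_reference_genome(self) -> FakeGenome | None:
        obj = self.get_context_object(self.REFERENCE_GENOME_KEY)
        if obj is None:
            return None
        if not isinstance(obj, FakeGenome):
            raise ValueError(
                f"invalid reference genome in context: {type(obj).__name__}")
        return obj
```

Separating "absent" (None) from "present but wrong" (ValueError) lets callers treat a resource as optional while still surfacing misconfigured contexts loudly.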
- class dae.genomic_resources.genomic_context_base.GenomicContextProvider(provider_type: str, provider_priority: int)
Bases: ABC
Abstract base class for genomic context providers.
Providers are responsible for building GenomicContext instances by consulting external configuration sources, command-line arguments, or environment settings. Each provider is identified by a unique type name and assigned a priority that determines the order in which providers are invoked during context initialization.
Providers typically register themselves at module import time by calling dae.genomic_resources.genomic_context.register_context_provider(). The registration system later sorts providers by priority (descending) and type name, then invokes their init() method to produce contexts.
Attributes
- _provider_type : str
A unique identifier describing this provider.
- _provider_priority : int
The numeric priority; higher values are consulted first.
- abstract add_argparser_arguments(parser: ArgumentParser) → None
Register command-line arguments that configure the provider.
Parameters
- parser
The argparse.ArgumentParser instance that should receive additional arguments.
Notes
Providers may add optional or required arguments. When invoked, the parsed argument namespace will be passed to init() as keyword arguments. If a provider does not require CLI arguments, it should leave the parser untouched.
- get_context_provider_priority() → int
Return the provider’s numeric priority.
Returns
- int
The priority assigned at construction time.
- get_context_provider_type() → str
Return the provider’s type identifier.
Returns
- str
The unique type name assigned at construction time.
- abstract init(**kwargs: Any) → GenomicContext | None
Build a genomic context using the provided configuration.
Parameters
- **kwargs
Keyword arguments typically derived from command-line parsing, environment variables, or configuration files. The exact keys depend on what the provider declared in add_argparser_arguments().
Returns
- GenomicContext | None
A new context instance if the provider successfully assembled the required resources, or None if the provider chooses to abstain (for example when optional arguments are omitted).
Notes
Returning None allows a provider to conditionally participate. Other providers may then supply default or fallback contexts.
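A sketch of a provider that participates conditionally: it declares one optional CLI flag and abstains (returns None) when the flag is omitted. `ProviderBase` is a hypothetical mirror of the interface above, `ReferenceOverrideProvider` and its `--reference-genome` flag are invented for illustration, and a plain dict stands in for a GenomicContext.

```python
from __future__ import annotations

import abc
import argparse
from typing import Any


class ProviderBase(abc.ABC):
    """Hypothetical mirror of the GenomicContextProvider interface."""

    def __init__(self, provider_type: str, provider_priority: int) -> None:
        self._provider_type = provider_type
        self._provider_priority = provider_priority

    @abc.abstractmethod
    def add_argparser_arguments(self, parser: argparse.ArgumentParser) -> None:
        ...

    @abc.abstractmethod
    def init(self, **kwargs: Any) -> dict[str, Any] | None:
        ...


class ReferenceOverrideProvider(ProviderBase):
    """Contributes a reference genome ID only when one is given on the CLI."""

    def __init__(self) -> None:
        super().__init__("reference_override", 500)

    def add_argparser_arguments(self, parser: argparse.ArgumentParser) -> None:
        # The flag name is hypothetical; argparse stores it as reference_genome.
        parser.add_argument("--reference-genome", default=None)

    def init(self, **kwargs: Any) -> dict[str, Any] | None:
        genome_id = kwargs.get("reference_genome")
        if genome_id is None:
            return None  # abstain; lower-priority providers supply defaults
        return {"reference_genome": genome_id}
```

The parsed namespace is handed over as keyword arguments, e.g. `provider.init(**vars(parser.parse_args()))`, matching the note that the namespace is passed to `init()`.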
- class dae.genomic_resources.genomic_context_base.PriorityGenomicContext(contexts: Iterable[GenomicContext])
Bases: GenomicContext
Composite context implementing priority-based fallback lookup.
This context merges multiple underlying contexts, consulting them in order when a resource is requested. The first context that provides a non-None value for a given key wins. This strategy allows CLI or user-supplied contexts to override defaults from configuration-driven providers.
Parameters
- contexts
An iterable of GenomicContext instances, ordered by descending precedence. When a resource is requested, the priority context walks the sequence and returns the first non-None result.
Attributes
- contexts : Iterable[GenomicContext]
The ordered collection of underlying contexts.
Notes
At construction time the context logs the sources of all constituent contexts to aid debugging. If no contexts are provided, a warning is logged to indicate that no resources will be available.
- get_context_keys() → set[str]
Compute the union of all keys from underlying contexts.
Returns
- set[str]
The merged set of keys available across all constituent contexts. If multiple contexts expose the same key, the set contains it only once.
- get_context_object(key: str) → Any | None
Retrieve a resource using priority-based fallback.
Parameters
- key
The string identifier of the desired resource.
Returns
- Any | None
The first non-None object found among the underlying contexts, or None if every context returns None (or if no contexts are available).
Notes
Each context is queried in order. When a context returns a non-None value, the search stops and that value is returned. A log entry is generated to identify which context supplied the object.
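The fallback lookup and key union described above can be sketched with plain dictionaries standing in for the underlying contexts. `PrioritySketch` is a hypothetical name, and the logging is reduced to standard-library logger calls.

```python
from __future__ import annotations

import logging
from collections.abc import Iterable, Mapping
from typing import Any

logger = logging.getLogger(__name__)


class PrioritySketch:
    """Hypothetical composite context: the first non-None value wins."""

    def __init__(self, contexts: Iterable[Mapping[str, Any]]) -> None:
        # Materialize once so the iterable can be walked on every lookup.
        self.contexts = list(contexts)
        if not self.contexts:
            logger.warning("no contexts provided; no resources available")

    def get_context_keys(self) -> set[str]:
        keys: set[str] = set()
        for context in self.contexts:
            keys |= set(context.keys())
        return keys

    def get_context_object(self, key: str) -> Any | None:
        for index, context in enumerate(self.contexts):
            value = context.get(key)
            if value is not None:
                logger.debug("key %r found in context #%d", key, index)
                return value
        return None
```

Note that a context explicitly holding None for a key does not block lower-priority contexts: the lookup simply moves on, which is what makes the "first non-None value" rule a fallback rather than an override.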
- class dae.genomic_resources.genomic_context_base.SimpleGenomicContext(context_objects: dict[str, Any], source: str)
Bases: GenomicContext
Dictionary-backed implementation of GenomicContext.
This concrete context stores resource objects in a simple dictionary and returns them on demand. It is commonly used by providers that assemble a fixed set of resources at initialization time.
Parameters
- context_objects
A mapping from string keys to resource objects. Typical keys include GC_GRR_KEY, GC_REFERENCE_GENOME_KEY, GC_GENE_MODELS_KEY, and GC_ANNOTATION_PIPELINE_KEY.
- source
A human-readable label identifying the origin of this context, such as a provider name or file path.
Attributes
- _context : dict[str, Any]
The internal dictionary holding the resource objects.
- _source : str
The stored source label.
- get_context_keys() → set[str]
Report all available keys.
Returns
- set[str]
The set of keys under which resources are stored.
dae.genomic_resources.genomic_context_cli module
Command-line helpers for configuring genomic resource contexts.
This module exposes CLIGenomicContextProvider, a concrete
implementation of
GenomicContextProvider
that resolves genomic resources based on command-line arguments. Tools can
register the provider to let their users supply a genomic resources
repository, reference genome, and gene models at runtime.
- class dae.genomic_resources.genomic_context_cli.CLIGenomicContextProvider[source]
Bases:
GenomicContextProviderResolve genomic resources from command-line arguments.
The provider allows CLI tools to override the default genomic resources repository, reference genome, and gene models. When invoked without any overrides, it falls back to the previously initialised genomic context so that defaults from
gpf_instanceor other providers remain available.- add_argparser_arguments(parser: ArgumentParser) None[source]
Expose CLI options that control genomic resource resolution.
Parameters
- parser
The argument parser that should receive the provider specific options.
- init(**kwargs: Any) GenomicContext | None[source]
Create a SimpleGenomicContext based on CLI arguments.
Parameters
- **kwargs
Arguments produced from the command-line parser. The provider recognises
grr_filename, grr_directory, reference_genome_resource_id, and gene_models_resource_id.
Returns
- GenomicContext | None
A context containing the resolved objects, or None if the genomic resources repository could not be determined.
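The sketch below shows how CLI overrides could be layered on top of a previously initialised context using only argparse. The flag spellings, add_genomic_context_arguments() and resolve_context() are hypothetical; only the destination names come from the keyword arguments listed for init(). Contexts are modelled here as plain dicts of identifiers rather than resolved resource objects.

```python
from __future__ import annotations

import argparse
from typing import Any

# Destination names mirror the keyword arguments documented for init();
# the flag spellings derived from them are hypothetical.
OVERRIDE_KEYS = (
    "grr_filename",
    "grr_directory",
    "reference_genome_resource_id",
    "gene_models_resource_id",
)


def add_genomic_context_arguments(parser: argparse.ArgumentParser) -> None:
    for key in OVERRIDE_KEYS:
        # e.g. "grr_filename" becomes "--grr-filename"
        parser.add_argument("--" + key.replace("_", "-"), dest=key, default=None)


def resolve_context(previous: dict[str, str] | None, **kwargs: Any) -> dict[str, str] | None:
    # Start from the previously initialised context so its defaults survive.
    resolved = dict(previous or {})
    for key in OVERRIDE_KEYS:
        value = kwargs.get(key)
        if value is not None:
            resolved[key] = value  # a CLI value overrides the default
    # Without any repository source the context cannot be built.
    if "grr_filename" not in resolved and "grr_directory" not in resolved:
        return None
    return resolved
```

Passing vars(parser.parse_args()) as the keyword arguments reproduces the way the provider receives parsed options.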
dae.genomic_resources.genomic_scores module
- class dae.genomic_resources.genomic_scores.AlleleScore(resource: GenomicResource)[source]
Bases:
GenomicScore
Defines allele genomic scores.
- class Mode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Allele score mode.
- ALLELES = 2
- SUBSTITUTIONS = 1
- fetch_region(chrom: str | None, pos_begin: int | None, pos_end: int | None, scores: list[str] | None = None) Generator[tuple[int, str | None, str | None, list[str | int | float | bool | None] | None], None, None][source]
Return position score values in a region.
- fetch_scores(chrom: str, position: int, reference: str, alternative: str, scores: list[str] | None = None) list[str | int | float | bool | None] | None[source]
Fetch score values at the specified genomic position and nucleotide.
- fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[AlleleScoreQuery] | None = None) list[Aggregator][source]
Fetch score values in a region and aggregate them.
- open() AlleleScore[source]
Open the genomic score resource and return it.
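To make the fetch_scores() contract concrete (one value per requested score, or None when nothing is recorded for the allele), here is a toy in-memory allele score. ToyAlleleScore, its row layout and the score IDs used with it are hypothetical, and the sketch ignores the SUBSTITUTIONS/ALLELES modes.

```python
from __future__ import annotations

# Hypothetical row key: (chrom, position, reference, alternative).
AlleleKey = tuple[str, int, str, str]


class ToyAlleleScore:
    """Illustrative in-memory allele score; not the GPF implementation."""

    def __init__(self, rows: dict[AlleleKey, list[float]], score_ids: list[str]) -> None:
        self._rows = rows
        self._score_ids = score_ids

    def fetch_scores(
        self, chrom: str, position: int, reference: str, alternative: str,
        scores: list[str] | None = None,
    ) -> list[float] | None:
        values = self._rows.get((chrom, position, reference, alternative))
        if values is None:
            return None  # no record for this exact allele
        if scores is None:
            return list(values)  # all scores, in configured order
        # Only the requested scores, in the requested order.
        return [values[self._score_ids.index(s)] for s in scores]
```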
- class dae.genomic_resources.genomic_scores.AlleleScoreAggr(score: 'str', position_aggregator: 'Aggregator', allele_aggregator: 'Aggregator')[source]
Bases:
object
- allele_aggregator: Aggregator
- position_aggregator: Aggregator
- score: str
- class dae.genomic_resources.genomic_scores.AlleleScoreQuery(score: 'str', position_aggregator: 'str | None' = None, allele_aggregator: 'str | None' = None)[source]
Bases:
object
- allele_aggregator: str | None = None
- position_aggregator: str | None = None
- score: str
- class dae.genomic_resources.genomic_scores.CNV(chrom: str, pos_begin: int, pos_end: int, attributes: dict[str, Any])[source]
Bases:
object
Copy number object from a cnv_collection.
- attributes: dict[str, Any]
- chrom: str
- pos_begin: int
- pos_end: int
- property size: int
- class dae.genomic_resources.genomic_scores.CnvCollection(resource: GenomicResource)[source]
Bases:
GenomicScore
A collection of CNVs.
- fetch_cnvs(chrom: str, start: int, stop: int, scores: list[str] | None = None) list[CNV][source]
Return list of CNVs that overlap with the provided region.
- open() CnvCollection[source]
Open the genomic score resource and return it.
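fetch_cnvs() returns the CNVs that overlap the requested region. A minimal sketch of the overlap test, assuming closed 1-based intervals for both the CNVs and the query region; ToyCNV and fetch_overlapping() are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyCNV:
    chrom: str
    pos_begin: int
    pos_end: int
    attributes: dict[str, Any] = field(default_factory=dict)


def fetch_overlapping(cnvs: list[ToyCNV], chrom: str, start: int, stop: int) -> list[ToyCNV]:
    # Closed-interval overlap: a CNV must begin no later than the region ends
    # and end no earlier than the region begins.
    return [
        cnv for cnv in cnvs
        if cnv.chrom == chrom and cnv.pos_begin <= stop and cnv.pos_end >= start
    ]
```

A CNV that only touches the region boundary (for example ending exactly at start) counts as overlapping under this convention.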
- class dae.genomic_resources.genomic_scores.GenomicScore(resource: GenomicResource)[source]
Bases:
ResourceConfigValidationMixin
Genomic scores base class.
PositionScore, NPScore, and AlleleScore inherit from this class. The statistics builder uses only the GenomicScore interface to build all defined statistics.
- get_default_annotation_attribute(score_id: str) str | None[source]
Return default annotation attribute for a score.
Returns None if the score is not included in the default annotation. Otherwise, returns the configured attribute name, or the score ID if no attribute name is configured.
- get_histogram_filename(score_id: str) str[source]
Return the histogram filename for a genomic score.
- get_number_range(score_id: str) tuple[float, float] | None[source]
Return the value range for a number score.
- get_score_histogram(score_id: str) NullHistogram | CategoricalHistogram | NumberHistogram[source]
Return defined histogram for a score.
- open() GenomicScore[source]
Open the genomic score resource and return it.
- class dae.genomic_resources.genomic_scores.PositionScore(resource: GenomicResource)[source]
Bases:
GenomicScore
Defines a position genomic score.
- fetch_region(chrom: str, pos_begin: int | None, pos_end: int | None, scores: list[str] | None = None) Generator[tuple[int, int, list[str | int | float | bool | None] | None], None, None][source]
Return position score values in a region.
- fetch_scores(chrom: str, position: int, scores: list[str] | None = None) list[str | int | float | bool | None] | None[source]
Fetch score values at a specific genomic position.
- fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[str] | list[PositionScoreQuery] | None = None) list[Aggregator][source]
Fetch score values in a region and aggregate them.
- Case 1:
- res.fetch_scores_agg("1", 10, 20) returns all scores, each aggregated with its default aggregator.
- Case 2:
- res.fetch_scores_agg("1", 10, 20, non_default_aggregators={"bla": "max"}) returns all scores with their default aggregators, except that 'bla' uses 'max'.
- get_region_scores(chrom: str, pos_beg: int, pos_end: int, score_id: str) list[str | int | float | bool | None][source]
Return score values in a region.
- open() PositionScore[source]
Open the genomic score resource and return it.
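The two fetch_scores_agg() cases amount to choosing an aggregator per score, with optional overrides. The standalone sketch below illustrates that idea only; the aggregator names, aggregate_region() and the rule of skipping missing (None) positions are assumptions, not the GPF Aggregator API.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical aggregator registry; names are illustrative only.
AGGREGATORS = {
    "mean": mean,
    "max": max,
    "min": min,
}


def aggregate_region(
    values_by_score: dict[str, list[float | None]],
    default_aggregators: dict[str, str],
    overrides: dict[str, str] | None = None,
) -> dict[str, float | None]:
    overrides = overrides or {}
    result: dict[str, float | None] = {}
    for score_id, values in values_by_score.items():
        present = [v for v in values if v is not None]  # skip positions without a value
        # An override wins over the score's default aggregator.
        name = overrides.get(score_id, default_aggregators[score_id])
        result[score_id] = AGGREGATORS[name](present) if present else None
    return result
```

Calling it without overrides corresponds to Case 1; passing {"score_b": "max"} corresponds to Case 2.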
- class dae.genomic_resources.genomic_scores.PositionScoreAggr(score: 'str', position_aggregator: 'Aggregator')[source]
Bases:
object
- position_aggregator: Aggregator
- score: str
- class dae.genomic_resources.genomic_scores.PositionScoreQuery(score: 'str', position_aggregator: 'str | None' = None)[source]
Bases:
object
- position_aggregator: str | None = None
- score: str
- class dae.genomic_resources.genomic_scores.ScoreDef(score_id: str, desc: str, value_type: str, pos_aggregator: str | None, allele_aggregator: str | None, small_values_desc: str | None, large_values_desc: str | None, hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None)[source]
Bases:
object
Score configuration definition.
- allele_aggregator: str | None
- desc: str
- hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None
- large_values_desc: str | None
- pos_aggregator: str | None
- score_id: str
- small_values_desc: str | None
- value_type: str
- class dae.genomic_resources.genomic_scores.ScoreLine(line: LineBase, score_defs: dict[str, _ScoreDef])[source]
Bases:
object
Abstraction for a genomic score line. Wraps the line adapter.
- property alt: str | None
- property chrom: str
- get_score(score_id: str) str | int | float | bool | None[source]
Get and parse configured score from line.
- property pos_begin: int
- property pos_end: int
- property ref: str | None
- dae.genomic_resources.genomic_scores.build_score_from_resource(resource: GenomicResource) GenomicScore[source]
Build the genomic score corresponding to a resource and return it.
- dae.genomic_resources.genomic_scores.build_score_from_resource_id(resource_id: str, grr: GenomicResourceRepo | None = None) GenomicScore[source]
dae.genomic_resources.group_repository module
Provides group genomic resources repository.
- class dae.genomic_resources.group_repository.GenomicResourceGroupRepo(children: list[GenomicResourceRepo], repo_id: str | None = None)[source]
Bases:
GenomicResourceRepo
Defines a group genomic resources repository.
- find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]
Return the resource whose ID equals resource_id.
If resource is not found, None is returned.
- get_all_resources() Generator[GenomicResource, None, None][source]
Return a generator over all resources in the repository.
- get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]
Return the resource whose ID equals resource_id.
If resource is not found, exception is raised.
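A group repository delegates lookups to its children. The sketch below assumes the children are consulted in order and the first match wins; ToyRepo and ToyGroupRepo are hypothetical stand-ins, and raising FileNotFoundError is borrowed from the protocol-level get_resource() documentation rather than confirmed for the group repository.

```python
from __future__ import annotations

from typing import Iterator


class ToyRepo:
    def __init__(self, repo_id: str, resources: dict[str, str]) -> None:
        self.repo_id = repo_id
        self._resources = resources  # resource_id -> payload

    def find_resource(self, resource_id: str) -> str | None:
        return self._resources.get(resource_id)

    def get_all_resources(self) -> Iterator[str]:
        yield from self._resources.values()


class ToyGroupRepo:
    def __init__(self, children: list[ToyRepo]) -> None:
        self.children = children

    def find_resource(self, resource_id: str) -> str | None:
        # Ask children in order; the first hit wins (an assumption).
        for child in self.children:
            found = child.find_resource(resource_id)
            if found is not None:
                return found
        return None

    def get_resource(self, resource_id: str) -> str:
        found = self.find_resource(resource_id)
        if found is None:
            raise FileNotFoundError(f"resource {resource_id} not found")
        return found

    def get_all_resources(self) -> Iterator[str]:
        for child in self.children:
            yield from child.get_all_resources()
```

Putting a local repository before a remote one in such a group lets local copies shadow remote resources with the same ID.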
dae.genomic_resources.histogram module
Handling of genomic scores statistics.
Currently, only genomic score histograms are supported.
- class dae.genomic_resources.histogram.CategoricalHistogram(config: CategoricalHistogramConfig, counter: dict[str | int, int] | None = None)[source]
Bases:
Statistic
Class for categorical data histograms.
- UNIQUE_VALUES_LIMIT = 100
- add_value(value: str | int | None, count: int = 1) None[source]
Add a value to the categorical histogram.
Returns true if successfully added and false if failed. Will fail if too many values are accumulated.
- static deserialize(content: str) CategoricalHistogram[source]
Create a statistic from serialized data.
- property display_values: dict[str | int, int]
Return categorical histogram display values in order.
- static from_dict(data: dict[str, Any]) CategoricalHistogram[source]
- plot(outfile: IO, score_id: str, y_axis_label: str | None = None, small_values_description: str | None = None, large_values_description: str | None = None) None[source]
Plot histogram and save it into outfile.
- property raw_values: dict[str | int, int]
- type = 'categorical_histogram'
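UNIQUE_VALUES_LIMIT caps how many distinct values a categorical histogram accumulates. A sketch of such a guard, assuming an error is raised once a new distinct value would exceed the limit and that None values are skipped; ToyCategoricalHistogram and ToyHistogramError are hypothetical:

```python
from __future__ import annotations

from collections import Counter


class ToyHistogramError(Exception):
    """Hypothetical stand-in for HistogramError."""


class ToyCategoricalHistogram:
    UNIQUE_VALUES_LIMIT = 100

    def __init__(self) -> None:
        self.counter: Counter[str | int] = Counter()

    def add_value(self, value: str | int | None, count: int = 1) -> None:
        if value is None:
            return  # missing values are not counted (assumption)
        # Only a *new* distinct value can push the histogram over the limit.
        if value not in self.counter and len(self.counter) >= self.UNIQUE_VALUES_LIMIT:
            raise ToyHistogramError("too many unique values")
        self.counter[value] += count
```

Following the note on HistogramError, a caller would typically replace the histogram with a null histogram when this error occurs.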
- class dae.genomic_resources.histogram.CategoricalHistogramConfig(displayed_values_count: int | None = 20, displayed_values_percent: float | None = None, value_order: list[str | int] | None = None, y_log_scale: bool = False, label_rotation: int = 0, plot_function: str | None = None, enforce_type: bool = True, natural_order: bool = False, allow_only_whole_values_y: bool = False)[source]
Bases:
object
Configuration class for categorical histograms.
- allow_only_whole_values_y: bool = False
- static default_config() CategoricalHistogramConfig[source]
- displayed_values_count: int | None = 20
- displayed_values_percent: float | None = None
- enforce_type: bool = True
- static from_dict(parsed: dict[str, Any]) CategoricalHistogramConfig[source]
Create a categorical histogram config from a configuration dict.
- label_rotation: int = 0
- natural_order: bool = False
- plot_function: str | None = None
- value_order: list[str | int] | None = None
- y_log_scale: bool = False
- exception dae.genomic_resources.histogram.HistogramError[source]
Bases:
BaseException
Class used for histogram-specific errors.
Histograms should be nullified when a HistogramError occurs.
- class dae.genomic_resources.histogram.HistogramStatisticMixin[source]
Bases:
object
Mixin for creating statistics classes with histograms.
- class dae.genomic_resources.histogram.NullHistogram(config: NullHistogramConfig | None)[source]
Bases:
Statistic
Class for annulled (null) histograms.
- static deserialize(content: str) NullHistogram[source]
Create a statistic from serialized data.
- static from_dict(data: dict[str, Any]) NullHistogram[source]
Build a null histogram from a dict.
- type = 'null_histogram'
- class dae.genomic_resources.histogram.NullHistogramConfig(reason: str)[source]
Bases:
object
Configuration class for null histograms.
- static default_config() NullHistogramConfig[source]
- static from_dict(parsed: dict[str, Any]) NullHistogramConfig[source]
Create a null histogram config from a configuration dict.
- reason: str
- class dae.genomic_resources.histogram.NumberHistogram(config: NumberHistogramConfig, bins: ndarray | None = None, bars: ndarray | None = None)[source]
Bases:
Statistic
Class representing a number histogram.
- static deserialize(content: str) NumberHistogram[source]
Create a statistic from serialized data.
- static from_dict(data: dict[str, Any]) NumberHistogram[source]
Build a number histogram from a dict.
- plot(outfile: IO, score_id: str, y_axis_label: str | None = None, small_values_description: str | None = None, large_values_description: str | None = None) None[source]
Plot histogram and save it into outfile.
- type = 'number_histogram'
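A number histogram pairs bin edges (bins) with per-bin counts (bars). The sketch below shows linear binning over a view_range and number_of_bins in the style of NumberHistogramConfig; it follows the numpy.histogram convention of a closed last bin and, as an assumption, drops values outside the view range. Logarithmic scales are not handled, and bin_values() is hypothetical.

```python
from __future__ import annotations


def bin_values(
    values: list[float], view_range: tuple[float, float], number_of_bins: int = 100,
) -> tuple[list[float], list[int]]:
    lo, hi = view_range
    width = (hi - lo) / number_of_bins
    # number_of_bins + 1 edges delimit number_of_bins bins.
    bins = [lo + i * width for i in range(number_of_bins + 1)]
    bars = [0] * number_of_bins
    for v in values:
        if v < lo or v > hi:
            continue  # outside the view range (assumption: dropped)
        # Half-open bins, except that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), number_of_bins - 1)
        bars[idx] += 1
    return bins, bars
```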
- class dae.genomic_resources.histogram.NumberHistogramConfig(view_range: tuple[float | None, float | None], number_of_bins: int = 100, x_log_scale: bool = False, y_log_scale: bool = False, x_min_log: float | None = None, plot_function: str | None = None)[source]
Bases:
object
Configuration class for number histograms.
- static default_config(min_max: MinMaxValue | None) NumberHistogramConfig[source]
Build a default number histogram config from the minimum and maximum values of a score.
- static from_dict(parsed: dict[str, Any]) NumberHistogramConfig[source]
Build a number histogram config from a parsed yaml file.
- number_of_bins: int = 100
- plot_function: str | None = None
- view_range: tuple[float | None, float | None]
- x_log_scale: bool = False
- x_min_log: float | None = None
- y_log_scale: bool = False
- dae.genomic_resources.histogram.build_default_histogram_conf(value_type: str, **kwargs: Any) NumberHistogramConfig | CategoricalHistogramConfig | NullHistogramConfig[source]
Build default histogram config for given value type.
- dae.genomic_resources.histogram.build_empty_histogram(config: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig) NumberHistogram | CategoricalHistogram | NullHistogram[source]
Create an empty histogram from a histogram config.
- dae.genomic_resources.histogram.build_histogram_config(config: dict[str, Any] | None) NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None[source]
Create a histogram config from a configuration dict.
- dae.genomic_resources.histogram.load_histogram(resource: GenomicResource, filename: str) NullHistogram | CategoricalHistogram | NumberHistogram[source]
Load and return a histogram in a resource.
On an error or missing histogram, an appropriate NullHistogram is returned.
- dae.genomic_resources.histogram.plot_histogram(res: GenomicResource, image_filename: str, hist: NullHistogram | CategoricalHistogram | NumberHistogram, score_id: str, small_values_desc: str | None = None, large_values_desc: str | None = None) None[source]
Plot histogram and save it into the resource.
- dae.genomic_resources.histogram.save_histogram(resource: GenomicResource, filename: str, histogram: NullHistogram | CategoricalHistogram | NumberHistogram) None[source]
Save histogram into a resource.
dae.genomic_resources.liftover_chain module
Provides LiftOver chain resource.
- class dae.genomic_resources.liftover_chain.LiftoverChain(resource: GenomicResource)[source]
Bases:
ResourceConfigValidationMixin
Defines a Lift Over chain wrapper around pyliftover objects.
- convert_coordinate(chrom: str, pos: int) tuple[str, int, str, int] | None[source]
Lift over a genomic coordinate.
- property files: set[str]
- static map_chromosome(chrom: str, mapping: dict[str, str] | None) str[source]
Map a chromosome (contig) name according to configuration.
- open() LiftoverChain[source]
- dae.genomic_resources.liftover_chain.build_liftover_chain_from_resource(resource: GenomicResource) LiftoverChain[source]
Load a Lift Over chain from GRR resource.
- dae.genomic_resources.liftover_chain.build_liftover_chain_from_resource_id(resource_id: str, grr: GenomicResourceRepo | None = None) LiftoverChain[source]
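map_chromosome() renames contigs according to an optional mapping, which matters when, for example, a chain file uses "chr1" while the data uses "1". A minimal sketch, assuming that a missing mapping or an unlisted contig leaves the name unchanged; the real method may treat unlisted contigs differently.

```python
from __future__ import annotations


def map_chromosome(chrom: str, mapping: dict[str, str] | None) -> str:
    # No mapping configured: keep the contig name unchanged.
    if mapping is None:
        return chrom
    # Unlisted contigs pass through as-is (an assumption of this sketch).
    return mapping.get(chrom, chrom)
```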
dae.genomic_resources.reference_genome module
- class dae.genomic_resources.reference_genome.ReferenceGenome(resource: GenomicResource)[source]
Bases:
ResourceConfigValidationMixin
Provides an interface for querying a reference genome.
- property chrom_prefix: str
Return a prefix of all chromosomes of the reference genome.
- property chromosomes: list[str]
Return a list of all chromosomes of the reference genome.
- fetch(chrom: str, start: int, stop: int | None, buffer_size: int = 512) Generator[str, None, None][source]
Yield the nucleotides in a specific region.
Line-feed accounting can be inaccurate because a fetch does not necessarily start at the beginning of a line. Line feeds add extra characters to read, but the output is limited to the number of nucleotides expected in the region.
- get_sequence(chrom: str, start: int, stop: int) str[source]
Return sequence of nucleotides from specified chromosome region.
- is_pseudoautosomal(chrom: str, pos: int) bool[source]
Return True if the specified position is pseudoautosomal.
- open() ReferenceGenome[source]
Open reference genome resources.
- property resource_id: str
- dae.genomic_resources.reference_genome.build_reference_genome_from_file(filename: str) ReferenceGenome[source]
Open a reference genome from a file.
- dae.genomic_resources.reference_genome.build_reference_genome_from_resource(resource: GenomicResource) ReferenceGenome[source]
Open a reference genome from resource.
- dae.genomic_resources.reference_genome.build_reference_genome_from_resource_id(resource_id: str, grr: GenomicResourceRepo | None = None) ReferenceGenome[source]
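The line-feed note on fetch() concerns FASTA files whose sequence is wrapped at a fixed line length. Under that layout (the same assumption a samtools .fai index makes), the byte offset of a position can be computed directly and the newlines stripped afterwards. fasta_offset() and get_sequence() are hypothetical helpers that take 1-based closed coordinates; this is not the ReferenceGenome implementation.

```python
from __future__ import annotations


def fasta_offset(header_len: int, line_len: int, pos: int) -> int:
    """Byte offset of the 0-based sequence position ``pos``.

    Assumes a header line of ``header_len`` bytes (newline included) followed by
    sequence lines of exactly ``line_len`` nucleotides, each ending in one newline.
    """
    full_lines, column = divmod(pos, line_len)
    # Every complete line before ``pos`` contributes its nucleotides plus a newline.
    return header_len + full_lines * (line_len + 1) + column


def get_sequence(fasta: str, line_len: int, start: int, stop: int) -> str:
    """Return the nucleotides in the 1-based, closed interval [start, stop]."""
    header_len = fasta.index("\n") + 1
    begin = fasta_offset(header_len, line_len, start - 1)
    end = fasta_offset(header_len, line_len, stop - 1) + 1
    # The raw slice can span line breaks; drop them from the result.
    return fasta[begin:end].replace("\n", "")
```

For example, with 5 nucleotides per line, positions 4 to 7 straddle the first line break, so the raw slice is one character longer than the returned sequence.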
dae.genomic_resources.repository module
Provides basic classes for genomic resources and repositories.
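      +---------------------+                    +-----------------+
+-----| GenomicResourceRepo |--------------------| GenomicResource |
|     +---------------------+                    +-----------------+
|                ^                                        ^
|                |                                        |
|                |                                        |
|  +-----------------------------+          +----------------------------+
|  | GenomicResourceProtocolRepo |----------| ReadOnlyRepositoryProtocol |
|  +-----------------------------+          +----------------------------+
|                                                         ^
|                                                         |
|                                                         |
|  +--------------------------+            +-----------------------------+
+--| GenomicResourceGroupRepo |            | ReadWriteRepositoryProtocol |
   +--------------------------+            +-----------------------------+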
- class dae.genomic_resources.repository.GenomicResource(resource_id: str, version: tuple[int, ...], protocol: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol, config: dict[str, Any] | None = None, manifest: Manifest | None = None)[source]
Bases:
object
Base class for genomic resources.
- get_file_content(filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]
Return the content of file in a resource.
- get_genomic_resource_id_version() str[source]
Return a string combining the resource ID and version.
Returns a string of the form aa/bb/cc[3.2] for a genomic resource with id aa/bb/cc and version 3.2. If the version is 0 the string will be aa/bb/cc.
- open_raw_file(filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]
Open a file in the resource and return a file-like object.
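The ID/version string format described for get_genomic_resource_id_version() can be sketched as follows. resource_id_version() is a hypothetical helper, and it treats only the tuple (0,) as "version 0".

```python
from __future__ import annotations


def resource_id_version(resource_id: str, version: tuple[int, ...]) -> str:
    # Version 0 means "unversioned", so the bare ID is returned.
    if version == (0,):
        return resource_id
    # Otherwise append the dotted version in square brackets, e.g. aa/bb/cc[3.2].
    return f"{resource_id}[{'.'.join(str(part) for part in version)}]"
```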
- class dae.genomic_resources.repository.GenomicResourceProtocolRepo(proto: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol)[source]
Bases:
GenomicResourceRepo
Base class for real genomic resources repositories.
- find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]
Return the resource whose ID equals resource_id.
If resource is not found, None is returned.
- get_all_resources() Generator[GenomicResource, None, None][source]
Return a generator over all resources in the repository.
- get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]
Return the resource whose ID equals resource_id.
If resource is not found, exception is raised.
- class dae.genomic_resources.repository.GenomicResourceRepo(repo_id: str)[source]
Bases:
ABC
Base class for genomic resources repositories.
- property definition: dict[str, Any] | None
- abstract find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]
Return the resource whose ID equals resource_id.
If resource is not found, None is returned.
- abstract get_all_resources() Generator[GenomicResource, None, None][source]
Return a generator over all resources in the repository.
- abstract get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]
Return the resource whose ID equals resource_id.
If resource is not found, exception is raised.
- property repo_id: str
- class dae.genomic_resources.repository.Manifest[source]
Bases:
object
Provides the genomic resource manifest object.
- add(entry: ManifestEntry) None[source]
Add a manifest entry to the manifest.
- static from_file_content(file_content: str) Manifest[source]
Produce a manifest from manifest file content.
- static from_manifest_entries(manifest_entries: list[dict[str, Any]]) Manifest[source]
Produce a manifest from parsed manifest file content.
- to_manifest_entries() list[dict[str, Any]][source]
Transform manifest to list of dictionaries.
Helpful when storing the manifest.
- update(entries: dict[str, ManifestEntry]) None[source]
- class dae.genomic_resources.repository.ManifestEntry(name: str, size: int, md5: str | None)[source]
Bases:
object
Provides an entry in a manifest object.
- md5: str | None
- name: str
- size: int
- class dae.genomic_resources.repository.ManifestUpdate(manifest: Manifest, entries_to_delete: set[str], entries_to_update: set[str])[source]
Bases:
object
Provides a manifest update object.
- entries_to_delete: set[str]
- entries_to_update: set[str]
- class dae.genomic_resources.repository.Mode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Protocol mode.
- READONLY = 1
- READWRITE = 2
- class dae.genomic_resources.repository.ReadOnlyRepositoryProtocol(proto_id: str, url: str)[source]
Bases:
ABC
Defines a read-only genomic resources repository protocol.
- CHUNK_SIZE = 32768
- build_genomic_resource(resource_id: str, version: tuple[int, ...], config: dict | None = None, manifest: Manifest | None = None) GenomicResource[source]
Build a genomic resource based on this protocol.
- compute_md5_sum(resource: GenomicResource, filename: str) str[source]
Compute an MD5 hash for a file in the resource.
- abstract file_exists(resource: GenomicResource, filename: str) bool[source]
Check whether the given file exists in the given resource.
- find_resource(resource_id: str, version_constraint: str | None = None) GenomicResource | None[source]
Return requested resource or None if not found.
- abstract get_all_resources() Generator[GenomicResource, None, None][source]
Return generator for all resources in the repository.
- get_file_content(resource: GenomicResource, filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]
Return content of a file in given resource.
- get_manifest(resource: GenomicResource) Manifest[source]
Load and return a resource manifest.
- get_resource(resource_id: str, version_constraint: str | None = None) GenomicResource[source]
Return requested resource or raises exception if not found.
If the resource is not found, a FileNotFoundError is raised.
- get_resource_file_url(resource: GenomicResource, filename: str) str[source]
Return the URL of a file in the resource.
- get_resource_url(resource: GenomicResource) str[source]
Return the URL of the specified resource.
- abstract load_manifest(resource: GenomicResource) Manifest[source]
Load resource manifest.
- load_yaml(resource: GenomicResource, filename: str) Any[source]
Return parsed YAML file.
- abstract open_bigwig_file(resource: GenomicResource, filename: str) Any[source]
Open a bigwig file in a resource and return it.
Not all repositories support this method. Repositories that do not support it raise an exception.
- abstract open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]
Open a file in a resource and return a file-like object.
- abstract open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]
Open a tabix file in a resource and return a pysam tabix file.
Not all repositories support this method. Repositories that do not support it raise an exception.
- abstract open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]
Open a vcf file in a resource and return a pysam VariantFile.
Not all repositories support this method. Repositories that do not support it raise an exception.
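compute_md5_sum() together with CHUNK_SIZE suggests chunked hashing. The sketch below shows that pattern with hashlib, reading from any binary stream; md5_of_stream() is hypothetical, and the real method presumably opens the file through the protocol instead of taking a stream.

```python
from __future__ import annotations

import hashlib
import io
from typing import BinaryIO

# Same value as ReadOnlyRepositoryProtocol.CHUNK_SIZE.
CHUNK_SIZE = 32768


def md5_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    md5 = hashlib.md5()
    # Read fixed-size chunks until read() returns b"" (end of stream),
    # so a large resource file is never held in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


if __name__ == "__main__":
    # 100,000 bytes spans several 32 KiB chunks.
    print(md5_of_stream(io.BytesIO(b"x" * 100_000)))
```

The chunk size only affects memory use; the digest is identical to hashing the whole content at once.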
- class dae.genomic_resources.repository.ReadWriteRepositoryProtocol(proto_id: str, url: str)[source]
Bases:
ReadOnlyRepositoryProtocol
Defines a read-write genomic resources repository protocol.
- abstract build_content_file() list[dict[str, Any]][source]
Build the content of the repository (i.e., the ‘.CONTENTS.json’ file).
- build_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) Manifest[source]
Build full manifest for the resource.
- build_resource_file_state(resource: GenomicResource, filename: str, **kwargs: str | float | int | None) ResourceFileState[source]
Build resource file state.
- check_update_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) ManifestUpdate[source]
Check if the resource manifest needs update.
- abstract collect_all_resources() Generator[GenomicResource, None, None][source]
Return generator for all resources managed by this protocol.
- abstract collect_resource_entries(resource: GenomicResource) Manifest[source]
Scan the resource and return a manifest with all files.
- copy_resource(remote_resource: GenomicResource) GenomicResource[source]
Copy a remote resource into repository.
- abstract copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]
Copy a remote resource file into local repository.
- abstract delete_resource_file(resource: GenomicResource, filename: str) None[source]
Delete a resource file and its internal state.
- get_manifest(resource: GenomicResource) Manifest[source]
Load or build a resource manifest.
- get_or_create_resource(resource_id: str, version: tuple[int, ...]) GenomicResource[source]
Return a resource with specified ID and version.
If the resource is not found create an empty resource.
- abstract get_resource_file_size(resource: GenomicResource, filename: str) int[source]
Return the size of a resource file.
- abstract get_resource_file_timestamp(resource: GenomicResource, filename: str) float[source]
Return the timestamp (ISO formatted) of a resource file.
- abstract load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None[source]
Load resource file state from internal GRR state.
If the specified resource file has no internal state returns None.
- save_index(resource: GenomicResource, contents: str) None[source]
Save an index HTML file into the genomic resource’s directory.
- save_manifest(resource: GenomicResource, manifest: Manifest) None[source]
Save manifest into genomic resource’s directory.
- abstract save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None[source]
Save resource file state into internal GRR state.
- update_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) Manifest[source]
Update or create full manifest for the resource.
- update_resource(remote_resource: GenomicResource, files_to_copy: set[str] | None = None) GenomicResource[source]
Copy a remote resource into repository.
Allows copying of a subset of files from the resource via files_to_copy. If files_to_copy is None, copies all files.
- abstract update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]
Update a resource file into repository if needed.
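check_update_manifest() returns a ManifestUpdate holding entries_to_delete and entries_to_update. One plausible way to derive those sets from a stored manifest and a freshly scanned one is shown below; Entry and diff_manifests() are hypothetical, and the real comparison may weigh timestamps or MD5 sums differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    size: int
    md5: str | None


def diff_manifests(
    stored: dict[str, Entry], current: dict[str, Entry],
) -> tuple[set[str], set[str]]:
    # Recorded in the stored manifest but no longer present: delete.
    to_delete = set(stored) - set(current)
    # New files, or files whose size or checksum changed: (re)record.
    to_update = {
        name for name, entry in current.items()
        if name not in stored or stored[name] != entry
    }
    return to_delete, to_update
```

When both sets are empty, the stored manifest is already up to date and no rewrite is needed.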
- class dae.genomic_resources.repository.ResourceFileState(filename: str, size: int, timestamp: float, md5: str)[source]
Bases:
object
Defines resource file state saved into internal GRR state.
- filename: str
- md5: str
- size: int
- timestamp: float
- dae.genomic_resources.repository.is_gr_id_token(token: str) bool[source]
Check if token can be used as a genomic resource ID.
Genomic Resource Id Token is a string of one or more letters, numbers, ‘.’, ‘_’, or ‘-’. The function checks whether the parameter token is a Genomic Resource Id Token.
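The token rule above translates directly into a regular expression. This sketch assumes "letters" means ASCII letters; looks_like_gr_id_token() is a hypothetical helper, not the library function.

```python
import re

# One or more ASCII letters, digits, '.', '_' or '-' (ASCII is an assumption).
_GR_ID_TOKEN = re.compile(r"[A-Za-z0-9._-]+")


def looks_like_gr_id_token(token: str) -> bool:
    # fullmatch() requires the whole string to consist of allowed characters.
    return _GR_ID_TOKEN.fullmatch(token) is not None
```

Note that "/" is not an allowed character, so a full resource ID such as aa/bb/cc is a sequence of tokens rather than a single token.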
- dae.genomic_resources.repository.is_version_constraint_satisfied(version_constraint: str | None, version: tuple[int, ...]) bool[source]
Check if a version matches a version constraint.
- dae.genomic_resources.repository.parse_gr_id_version_token(token: str) tuple[str, tuple[int, ...]][source]
Parse genomic resource ID with version.
Genomic Resource Id Version Token is a Genomic Resource Id Token with an optional version appended. If present, the version suffix has the form “(3.3.2)”. The default version is (0,). Returns None if token is not a Genomic Resource Id Version Token; otherwise returns a (token, version) tuple.
- dae.genomic_resources.repository.parse_resource_id_version(resource_path: str) tuple[str, tuple[int, ...]][source]
Parse genomic resource id and version path into Id, Version tuple.
If the path carries no version suffix, the version defaults to (0,). If present, the version suffix has the form “(3.3.2)”. Returns the tuple (None, None) if the path does not match the resource_id/version requirements; otherwise returns the tuple (resource_id, version).
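Combining the two parsing rules above, the sketch below parses a version suffix such as "(3.3.2)" and applies it to a slash-separated resource path. It assumes every path component must be an ID token and that only the last component may carry a version; parse_id_version_token() and parse_resource_path() are hypothetical helpers.

```python
from __future__ import annotations

import re

# An ID token: one or more ASCII letters, digits, '.', '_' or '-'.
_ID_TOKEN = re.compile(r"[A-Za-z0-9._-]+")
# An ID token optionally followed by a version suffix such as "(3.3.2)".
_ID_VERSION_TOKEN = re.compile(r"(?P<id>[A-Za-z0-9._-]+)(?:\((?P<version>\d+(?:\.\d+)*)\))?")


def parse_id_version_token(token: str) -> tuple[str, tuple[int, ...]] | None:
    match = _ID_VERSION_TOKEN.fullmatch(token)
    if match is None:
        return None  # not an ID token with an optional version
    version = match.group("version")
    # Without a suffix the default version (0,) applies.
    parsed = tuple(int(part) for part in version.split(".")) if version else (0,)
    return match.group("id"), parsed


def parse_resource_path(resource_path: str) -> tuple[str, tuple[int, ...]] | tuple[None, None]:
    *parents, last = resource_path.split("/")
    parsed = parse_id_version_token(last)
    # Parent components must be plain ID tokens; only the last may carry a version.
    if parsed is None or not all(_ID_TOKEN.fullmatch(p) for p in parents):
        return None, None
    name, version = parsed
    return "/".join([*parents, name]), version
```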
dae.genomic_resources.repository_factory module
Provides a factory for building genomic resources repositories.
- dae.genomic_resources.repository_factory.build_genomic_resource_group_repository(repo_id: str, children: list[GenomicResourceRepo]) GenomicResourceRepo[source]
- dae.genomic_resources.repository_factory.build_genomic_resource_repository(definition: dict | None = None, file_name: str | None = None) GenomicResourceRepo[source]
Build a GRR using a definition dict or yaml file.
- dae.genomic_resources.repository_factory.build_resource_implementation(res: GenomicResource) GenomicResourceImplementation[source]
Build a resource implementation from a resource.
- dae.genomic_resources.repository_factory.get_default_grr_definition() dict[str, Any][source]
Return default genomic resources repository definition.
dae.genomic_resources.resource_implementation module
- class dae.genomic_resources.resource_implementation.GenomicResourceImplementation(genomic_resource: GenomicResource)[source]
Bases:
ABC
Base class used by resource implementations.
Resources are just folders in a repository. Resource implementations are classes that know how to use the contents of a resource.
- abstract add_statistics_build_tasks(task_graph: TaskGraph, **kwargs: Any) list[Task][source]
Add tasks for calculating resource statistics to a task graph.
- abstract calc_statistics_hash() bytes[source]
Compute the statistics hash.
This hash is used to decide whether the resource statistics should be recomputed.
- property files: set[str]
Return the set of resource files the implementation uses.
- abstract get_info(**kwargs: Any) str[source]
Construct the contents of the implementation’s HTML info page.
- get_statistics() ResourceStatistics | None[source]
Try to load resource statistics.
- abstract get_statistics_info(**kwargs: Any) str[source]
Construct the contents of the implementation’s HTML statistics info page.
- reload_statistics() ResourceStatistics | None[source]
- property resource_id: str
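calc_statistics_hash() lets callers skip recomputing statistics when nothing relevant has changed. A sketch of that idea, assuming the hash covers the resource configuration plus the MD5 sums of its files; the function names and inputs here are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def calc_statistics_hash(config: dict[str, Any], file_md5s: dict[str, str]) -> bytes:
    # Canonical JSON (sorted keys) so equal inputs always hash identically.
    payload = json.dumps({"config": config, "files": file_md5s}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).digest()


def needs_recompute(stored_hash: bytes | None, current_hash: bytes) -> bool:
    # A missing or different stored hash means the statistics are stale.
    return stored_hash != current_hash
```

Sorting keys matters: two configs that differ only in key order should not trigger a rebuild.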
- class dae.genomic_resources.resource_implementation.InfoImplementationMixin[source]
Bases:
object
Mixin that provides a generic, template-based info page generation interface.
- class FileEntry(name: str, size: str, md5: str | None)[source]
Bases:
object
Provides an entry in a manifest object.
- md5: str | None
- name: str
- size: str
- get_statistics_template_data() dict[source]
Return a data dictionary to be used by the statistics template.
Will transform the description in the meta section using markdown.
- get_template_data() dict[source]
Return a data dictionary to be used by the template.
Will transform the description in the meta section using markdown.
- resource: GenomicResource
- class dae.genomic_resources.resource_implementation.ResourceConfigValidationMixin[source]
Bases:
object
Mixin that provides validation of resource configuration.
- classmethod validate_and_normalize_schema(config: dict, resource: GenomicResource) dict[source]
Validate the resource schema and return the normalized version.
dae.genomic_resources.testing module
Provides tools useful for testing.
- dae.genomic_resources.testing.build_filesystem_test_protocol(root_path: Path, *, repair: bool = True) FsspecReadWriteProtocol[source]
Build and return a filesystem fsspec protocol for testing.
The root_path is expected to point to a directory structure with all the resources.
- dae.genomic_resources.testing.build_filesystem_test_repository(root_path: Path) GenomicResourceProtocolRepo[source]
Build and return a filesystem fsspec repository for testing.
The root_path is expected to point to a directory structure with all the resources.
- dae.genomic_resources.testing.build_filesystem_test_resource(root_path: Path) → GenomicResource
- dae.genomic_resources.testing.build_http_test_protocol(root_path: Path, *, repair: bool = True) → Generator[FsspecReadOnlyProtocol, None, None]
Populate the Apache2 directory and construct an HTTP genomic resource protocol.
An Apache2 server is used to serve the GRR. The root_path directory should be a valid filesystem genomic resource repository.
- dae.genomic_resources.testing.build_inmemory_test_protocol(content: dict[str, Any]) → FsspecReadWriteProtocol
Build and return an embedded fsspec protocol for testing.
- dae.genomic_resources.testing.build_inmemory_test_repository(content: dict[str, Any]) → GenomicResourceProtocolRepo
Create an embedded GRR repository using the passed content.
- dae.genomic_resources.testing.build_inmemory_test_resource(content: dict[str, Any]) → GenomicResource
Create a test resource based on the passed content.
The passed content should be appropriate for a single resource. Example content:
    {
        "genomic_resource.yaml": textwrap.dedent('''
            type: position_score
            table:
              filename: data.txt
            scores:
              - id: aaaa
                type: float
                desc: ""
                name: sc
        '''),
        "data.txt": convert_to_tab_separated('''
            #chrom start end sc
            1      10    12  1.1
            2      13    14  1.2
        ''')
    }
- dae.genomic_resources.testing.build_s3_test_bucket(s3filesystem: S3FileSystem | None = None) → str
Create an S3 test bucket.
- dae.genomic_resources.testing.build_s3_test_filesystem(endpoint_url: str | None = None) → S3FileSystem
Create an S3 fsspec filesystem connected to the S3 server.
- dae.genomic_resources.testing.build_s3_test_protocol(root_path: Path) → Generator[FsspecReadWriteProtocol, None, None]
Construct an S3 fsspec genomic resource protocol.
The S3 bucket is populated with the resources from the filesystem GRR pointed to by root_path.
- dae.genomic_resources.testing.convert_to_tab_separated(content: str) → str
Convert a string into tab-separated file content.
Useful for testing purposes. If you need a space in the file content, use '||'.
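Here is a minimal sketch of this conversion. It assumes that blank lines are dropped, that each remaining line is split on runs of whitespace, that the fields are joined with tabs, and that '||' is replaced by a single space afterwards. The name `to_tab_separated` is hypothetical, and the real function may treat some lines (for example, comment lines) differently.

```python
def to_tab_separated(content: str) -> str:
    """Hypothetical re-implementation of the documented behavior."""
    lines = []
    for line in content.splitlines():
        if not line.strip():
            continue  # drop blank lines, e.g. from triple-quoted strings
        # Collapse any run of spaces/tabs into a single tab separator.
        lines.append("\t".join(line.split()))
    # '||' is a placeholder for a literal space inside a field.
    return "\n".join(lines).replace("||", " ")
```

Because fields are split on any whitespace, the column alignment in the example content for build_inmemory_test_resource has no effect on the output.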
- dae.genomic_resources.testing.copy_proto_genomic_resources(dest_proto: FsspecReadWriteProtocol, src_proto: FsspecReadOnlyProtocol) → None
- dae.genomic_resources.testing.proto_builder(scheme: str, content: dict) → Generator[FsspecReadOnlyProtocol | FsspecReadWriteProtocol, None, None]
Build a test genomic resource protocol with specified content.
- dae.genomic_resources.testing.resource_builder(scheme: str, content: dict) → Generator[GenomicResource, None, None]
- dae.genomic_resources.testing.s3_test_protocol() → FsspecReadWriteProtocol
Build an S3 fsspec testing protocol on top of existing S3 server.
- dae.genomic_resources.testing.setup_bigwig(out_path: Path, content: str, chrom_lens: dict[str, int]) → Path
Set up a bigWig file using bedGraph-style content.
Example content:
    chr1  0    100  0.0
    chr1  100  120  1.0
    chr1  125  126  200.0
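bedGraph-style content is a sequence of `chrom start end value` records over 0-based, half-open intervals. The sketch below shows one way to parse such content and check it against `chrom_lens` before writing it out. `parse_bedgraph` is a hypothetical helper. The step that actually writes the bigWig file (for example, with a library such as pyBigWig) is omitted.

```python
from __future__ import annotations


def parse_bedgraph(
    content: str, chrom_lens: dict[str, int]
) -> list[tuple[str, int, int, float]]:
    """Parse whitespace-separated `chrom start end value` records."""
    tokens = content.split()  # works for one record per line or all on one line
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of four fields: chrom start end value")
    records = []
    for i in range(0, len(tokens), 4):
        chrom, start, end, value = tokens[i:i + 4]
        begin, stop = int(start), int(end)
        if chrom not in chrom_lens:
            raise ValueError(f"unknown chromosome {chrom!r}")
        # bedGraph intervals are 0-based and half-open: [begin, stop).
        if not 0 <= begin < stop <= chrom_lens[chrom]:
            raise ValueError(f"invalid interval {chrom}:{begin}-{stop}")
        records.append((chrom, begin, stop, float(value)))
    return records
```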
- dae.genomic_resources.testing.setup_dae_transmitted(root_path: Path, summary_content: str, toomany_content: str) → tuple[Path, Path]
Set up a DAE transmitted variants file using the passed content.
- dae.genomic_resources.testing.setup_directories(root_dir: Path, content: str | dict[str, Any]) → None
Set up directory and subdirectory structures using the content.
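The following is a minimal sketch of how nested content can be mapped onto a directory tree. It assumes that `dict` values become subdirectories and `str` values become file contents. `build_tree` is a hypothetical name. The handling of a plain `str` passed at the top level is also an assumption and may differ from the real function.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def build_tree(root_dir: Path, content: str | dict[str, Any]) -> None:
    """Create files and subdirectories under root_dir from nested content."""
    if isinstance(content, str):
        # Assumption: a plain string becomes the content of the file at root_dir.
        root_dir.parent.mkdir(parents=True, exist_ok=True)
        root_dir.write_text(content)
        return
    root_dir.mkdir(parents=True, exist_ok=True)
    for name, value in content.items():
        # Recurse: dict values become subdirectories, str values become files.
        build_tree(root_dir / name, value)
```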
- dae.genomic_resources.testing.setup_empty_gene_models(out_path: Path) → GeneModels
Set up empty gene models.
- dae.genomic_resources.testing.setup_gene_models(out_path: Path, content: str, fileformat: str | None = None, config: str | None = None) → GeneModels
Set up gene models in refFlat format using the passed content.
- dae.genomic_resources.testing.setup_genome(out_path: Path, content: str) → ReferenceGenome
Set up reference genome using the content.
- dae.genomic_resources.testing.setup_gzip(gzip_path: Path, gzip_content: str) → Path
Set up a gzipped TSV file.
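Here is a minimal sketch of writing a gzip-compressed text file with the standard-library `gzip` module. `write_gzip_tsv` is a hypothetical name. This sketch writes the content unchanged. Whether the real helper first normalizes the content (for example, converting whitespace to tabs) is not stated above.

```python
import gzip
from pathlib import Path


def write_gzip_tsv(gzip_path: Path, gzip_content: str) -> Path:
    """Write gzip_content, unchanged, to a gzip-compressed text file."""
    gzip_path.parent.mkdir(parents=True, exist_ok=True)
    # "wt" opens the compressed stream in text mode for writing.
    with gzip.open(gzip_path, "wt", encoding="utf-8") as outfile:
        outfile.write(gzip_content)
    return gzip_path
```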
dae.genomic_resources.variant_utils module
- dae.genomic_resources.variant_utils.maximally_extend_variant(chrom: str, pos: int, ref: str, alts: list[str], genome: ReferenceGenome) → tuple[str, int, str, list[str]]
Maximally extend a variant.
- dae.genomic_resources.variant_utils.normalize_variant(chrom: str, pos: int, ref: str, alts: list[str], genome: ReferenceGenome) → tuple[str, int, str, list[str]]
Normalize a variant.
Uses the algorithm described at https://genome.sph.umich.edu/wiki/Variant_Normalization
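The linked page describes normalization as left-alignment plus trimming. While all alleles end with the same base, that base is dropped. If an allele becomes empty, every allele is extended one base to the left. Once the alleles end differently, a shared leading base is trimmed for as long as every allele has at least two bases. The sketch below follows that description. It assumes 1-based positions and uses a plain `dict` of chromosome sequences in place of `ReferenceGenome`. `normalize` is a hypothetical name, and GPF's implementation details may differ.

```python
from __future__ import annotations


def normalize(
    chrom: str, pos: int, ref: str, alts: list[str], genome: dict[str, str]
) -> tuple[str, int, str, list[str]]:
    """Left-align and trim a variant; pos is 1-based on genome[chrom]."""
    seq = genome[chrom]
    alleles = [ref, *alts]
    changed = True
    while changed:
        changed = False
        # Step 1: drop a rightmost base shared by all alleles.
        last = alleles[0][-1:]
        if last and all(a[-1:] == last for a in alleles):
            alleles = [a[:-1] for a in alleles]
            changed = True
        # Step 2: if an allele became empty, extend all alleles one base left.
        if any(not a for a in alleles):
            if pos <= 1:
                raise ValueError("cannot extend past the chromosome start")
            pos -= 1
            alleles = [seq[pos - 1] + a for a in alleles]
            changed = True
    # Step 3: drop leftmost bases shared by all alleles while each keeps 2+.
    while (
        all(len(a) >= 2 for a in alleles)
        and len({a[0] for a in alleles}) == 1
    ):
        alleles = [a[1:] for a in alleles]
        pos += 1
    return chrom, pos, alleles[0], alleles[1:]
```

For example, with chr1 = "GGGCACACAG", the CA-repeat deletion pos=7, ref="ACA", alt="A" is shifted left to pos=3, ref="GCA", alt="G".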
Module contents
- dae.genomic_resources.get_resource_implementation_builder(resource_type: str) → Callable[[GenomicResource], GenomicResourceImplementation] | None
Return an implementation builder for a certain resource type.
If no builder is registered for the type, the function searches the list of discovered implementation entry points. If a matching entry point is found, it is loaded, registered, and returned.