dae.genomic_resources package

Submodules

dae.genomic_resources.aggregators module

class dae.genomic_resources.aggregators.Aggregator[source]

Bases: ABC

Base class for score aggregators.

add(value: Any, count: int = 1, **kwargs: Any) None[source]
aggregate(values: list[Any]) Any[source]
clear() None[source]
get_final() Any[source]
get_total_count() int[source]
get_used_count() int[source]
class dae.genomic_resources.aggregators.ConcatAggregator[source]

Bases: Aggregator

Aggregator that concatenates all passed values.

get_final() Any[source]
class dae.genomic_resources.aggregators.CountAggregator[source]

Bases: Aggregator

Aggregator that counts values.

get_final() Any[source]
class dae.genomic_resources.aggregators.DictAggregator[source]

Bases: Aggregator

Aggregator that builds a dictionary of all passed values.

get_final() Any[source]
class dae.genomic_resources.aggregators.JoinAggregator(separator: str)[source]

Bases: Aggregator

Aggregator that joins all passed values using a separator.

get_final() Any[source]
class dae.genomic_resources.aggregators.ListAggregator[source]

Bases: Aggregator

Aggregator that builds a list of all passed values.

get_final() Any[source]
class dae.genomic_resources.aggregators.MaxAggregator[source]

Bases: Aggregator

Maximum value aggregator for genomic scores.

get_final() Any[source]
class dae.genomic_resources.aggregators.MeanAggregator[source]

Bases: Aggregator

Aggregator for genomic scores that calculates mean value.

get_final() Any[source]
class dae.genomic_resources.aggregators.MedianAggregator[source]

Bases: Aggregator

Aggregator for genomic scores that calculates median value.

get_final() Any[source]
class dae.genomic_resources.aggregators.MinAggregator[source]

Bases: Aggregator

Minimum value aggregator for genomic scores.

get_final() Any[source]
class dae.genomic_resources.aggregators.ModeAggregator[source]

Bases: Aggregator

Aggregator for genomic scores that calculates mode value.

get_final() Any[source]
dae.genomic_resources.aggregators.build_aggregator(aggregator_type: str) Aggregator[source]
dae.genomic_resources.aggregators.create_aggregator(aggregator_def: dict[str, Any]) Aggregator[source]

Create an aggregator by aggregator definition.

dae.genomic_resources.aggregators.create_aggregator_definition(aggregator_type: str) dict[str, Any][source]

Parse an aggregator definition string.

dae.genomic_resources.aggregators.get_aggregator_class(aggregator: str) Callable[[], Aggregator][source]
dae.genomic_resources.aggregators.validate_aggregator(aggregator_type: str) None[source]
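
The aggregator interface listed above (feed values with add, read the result with get_final) can be illustrated with a small standalone sketch. The classes, the registry, and the builder below are hypothetical stand-ins written for this example and are not the library's implementation; in particular, skipping None values while still counting them in the total is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchAggregator(ABC):
    """Hypothetical stand-in mirroring the documented Aggregator methods."""

    def __init__(self) -> None:
        self.total_count = 0
        self.used_count = 0

    def add(self, value: Any, count: int = 1) -> None:
        # Every value is counted; None is assumed to mean "no score here".
        self.total_count += count
        if value is None:
            return
        self.used_count += count
        self._add_internal(value, count)

    def get_total_count(self) -> int:
        return self.total_count

    def get_used_count(self) -> int:
        return self.used_count

    @abstractmethod
    def _add_internal(self, value: Any, count: int) -> None:
        """Fold one value into the running state."""

    @abstractmethod
    def get_final(self) -> Any:
        """Return the aggregated result."""


class SketchMaxAggregator(SketchAggregator):
    def __init__(self) -> None:
        super().__init__()
        self.current: Any = None

    def _add_internal(self, value: Any, count: int) -> None:
        if self.current is None or value > self.current:
            self.current = value

    def get_final(self) -> Any:
        return self.current


class SketchMeanAggregator(SketchAggregator):
    def __init__(self) -> None:
        super().__init__()
        self.sum = 0.0

    def _add_internal(self, value: Any, count: int) -> None:
        self.sum += value * count

    def get_final(self) -> Any:
        return self.sum / self.used_count if self.used_count else None


# Hypothetical name-to-class registry, similar in spirit to build_aggregator.
SKETCH_REGISTRY = {"max": SketchMaxAggregator, "mean": SketchMeanAggregator}


def sketch_build_aggregator(aggregator_type: str) -> SketchAggregator:
    if aggregator_type not in SKETCH_REGISTRY:
        raise ValueError(f"unsupported aggregator type: {aggregator_type}")
    return SKETCH_REGISTRY[aggregator_type]()


mean_agg = sketch_build_aggregator("mean")
max_agg = sketch_build_aggregator("max")
for value in [1.0, None, 3.0]:
    mean_agg.add(value)
    max_agg.add(value)
```

Keeping the total and used counts separate lets a caller tell how many positions were visited and how many actually carried a value.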

dae.genomic_resources.cached_repository module

Provides caching genomic resources.

class dae.genomic_resources.cached_repository.CacheResource(resource: GenomicResource, protocol: CachingProtocol)[source]

Bases: GenomicResource

Represents resources stored in cache.

class dae.genomic_resources.cached_repository.CachingProtocol(remote_protocol: ReadOnlyRepositoryProtocol, local_protocol: FsspecReadWriteProtocol)[source]

Bases: ReadOnlyRepositoryProtocol

Defines caching GRR repository protocol.

file_exists(resource: GenomicResource, filename: str) bool[source]

Check if the given file exists in the given resource.

get_all_resources() Generator[GenomicResource, None, None][source]

Return generator for all resources in the repository.

get_resource_file_url(resource: GenomicResource, filename: str) str[source]

Return the URL of a file in the resource.

get_resource_url(resource: GenomicResource) str[source]

Return the URL of the specified resource.

get_url() str[source]

Return the repository URL.

invalidate() None[source]

Invalidate internal cache of repository protocol.

load_manifest(resource: GenomicResource) Manifest[source]

Load resource manifest.

open_bigwig_file(resource: GenomicResource, filename: str) Any[source]

Open a bigwig file in a resource and return it.

Not all repositories support this method. Repositories that do not support this method raise an exception.

open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in a resource and return a file-like object.

open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file in a resource and return a pysam tabix file.

Not all repositories support this method. Repositories that do not support this method raise an exception.

open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]

Open a vcf file in a resource and return a pysam VariantFile.

Not all repositories support this method. Repositories that do not support this method raise an exception.

refresh_cached_resource(resource: GenomicResource) None[source]

Refresh all resource files in cache if necessary.

refresh_cached_resource_file(resource: GenomicResource, filename: str) tuple[str, str][source]

Refresh a resource file in cache if necessary.
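
Refreshing a file only when needed implies comparing the cached copy against the remote one. The following is a minimal sketch of that idea, assuming staleness is detected by comparing MD5 checksums; the dictionaries standing in for the remote and local protocols and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def sketch_refresh_cached_file(
    remote: dict[str, bytes],
    cache: dict[str, bytes],
    filename: str,
) -> bool:
    """Copy filename into the cache only if it is missing or stale.

    Returns True when a copy was made and False when the cache was current.
    """
    remote_data = remote[filename]
    cached_data = cache.get(filename)
    if cached_data is not None and md5_of(cached_data) == md5_of(remote_data):
        return False
    cache[filename] = remote_data
    return True


remote_files = {"scores.txt": b"chrom\tpos\tscore\n"}
cache_files: dict[str, bytes] = {}

first_call = sketch_refresh_cached_file(remote_files, cache_files, "scores.txt")
second_call = sketch_refresh_cached_file(remote_files, cache_files, "scores.txt")
```

The first call populates the cache; the second finds matching checksums and copies nothing.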

class dae.genomic_resources.cached_repository.GenomicResourceCachedRepo(child: GenomicResourceRepo, cache_url: str, **kwargs: str | None)[source]

Bases: GenomicResourceRepo

Defines caching genomic resources repository.

find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return requested resource or None if not found.

get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with id equal to resource_id.

If the resource is not found, an exception is raised.
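
The difference between find_resource (returns None when nothing matches) and get_resource (raises) can be sketched as follows. The in-memory registry, the resource ids, and the use of ValueError are assumptions made for illustration; the documentation only states that an exception is raised.

```python
from __future__ import annotations

# Hypothetical registry standing in for a repository's resources.
SKETCH_RESOURCES = {"hg38/scores/example_score": "resource object"}


def sketch_find_resource(resource_id: str) -> str | None:
    # Lenient lookup: a missing resource is reported as None.
    return SKETCH_RESOURCES.get(resource_id)


def sketch_get_resource(resource_id: str) -> str:
    # Strict lookup: a missing resource is an error.
    resource = sketch_find_resource(resource_id)
    if resource is None:
        raise ValueError(f"resource <{resource_id}> not found")
    return resource


found = sketch_find_resource("hg38/scores/example_score")
missing = sketch_find_resource("hg38/scores/unknown")
```

Use the lenient form when a resource is optional and the strict form when the caller cannot continue without it.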

get_resource_cached_files(resource_id: str) set[str][source]

Get a set of filenames of cached files for a given resource.

invalidate() None[source]

Invalidate internal state of the repository.

dae.genomic_resources.cached_repository.cache_resources(repository: GenomicResourceRepo, resource_ids: Iterable[str] | None, workers: int | None = None) None[source]

Cache resources from a list of remote resource IDs.

dae.genomic_resources.cli module

Provides CLI for management of genomic resources repositories.

dae.genomic_resources.cli.cli_browse(cli_args: list[str] | None = None) None[source]

Provide CLI for repository browsing.

dae.genomic_resources.cli.cli_manage(cli_args: list[str] | None = None) None[source]

Provide CLI for repository management.

dae.genomic_resources.cli.collect_dvc_entries(proto: ReadWriteRepositoryProtocol, res: GenomicResource) dict[str, ManifestEntry][source]

Collect manifest entries defined by .dvc files.

dae.genomic_resources.draw_score_histograms module

dae.genomic_resources.draw_score_histograms.main(argv: list[str] | None = None) None[source]

Main function of the score histogram drawing tool.

dae.genomic_resources.draw_score_histograms.parse_cli_arguments() ArgumentParser[source]

Create CLI parser.

dae.genomic_resources.fsspec_protocol module

Provides GRR protocols based on fsspec library.

class dae.genomic_resources.fsspec_protocol.FsspecReadOnlyProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)[source]

Bases: ReadOnlyRepositoryProtocol

Provides fsspec genomic resources repository protocol.

close() None[source]

Close the genomic resource.

file_exists(resource: GenomicResource, filename: str) bool[source]

Check if the given file exists in the given resource.

get_all_resources() Generator[GenomicResource, None, None][source]

Return generator over all resources in the repository.

get_url() str[source]

Return the repository URL.

invalidate() None[source]

Invalidate internal cache of repository protocol.

load_manifest(resource: GenomicResource) Manifest[source]

Load resource manifest.

open_bigwig_file(resource: GenomicResource, filename: str) Any[source]

Open a bigwig file in a resource and return it.

Not all repositories support this method. Repositories that do not support this method raise an exception.

open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in a resource and return a file-like object.

open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file in a resource and return a pysam tabix file.

Not all repositories support this method. Repositories that do not support this method raise an exception.

open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]

Open a vcf file in a resource and return a pysam VariantFile.

Not all repositories support this method. Repositories that do not support this method raise an exception.

class dae.genomic_resources.fsspec_protocol.FsspecReadWriteProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)[source]

Bases: FsspecReadOnlyProtocol, ReadWriteRepositoryProtocol

Provides fsspec genomic resources repository protocol.

build_content_file() list[dict[str, Any]][source]

Build the content of the repository (i.e. the ‘.CONTENTS.json’ file).

build_index_info(repository_template: Template) dict[source]

Build info dict for the repository.

collect_all_resources() Generator[GenomicResource, None, None][source]

Return generator over all resources managed by this protocol.

collect_resource_entries(resource: GenomicResource) Manifest[source]

Scan the resource and return a manifest.

copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Copy a resource file into the repository.

delete_resource_file(resource: GenomicResource, filename: str) None[source]

Delete a resource file and its internal state.

get_all_resources() Generator[GenomicResource, None, None][source]

Return generator over all resources in the repository.

get_resource_file_size(resource: GenomicResource, filename: str) int[source]

Return the size of a resource file.

get_resource_file_timestamp(resource: GenomicResource, filename: str) float[source]

Return the timestamp (ISO formatted) of a resource file.

load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None[source]

Load resource file state from internal GRR state.

If the specified resource file has no internal state returns None.

obtain_resource_file_lock(resource: GenomicResource, filename: str, timeout: float = -1) AbstractContextManager[source]

Lock a resource’s file.

save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None[source]

Save resource file state into internal GRR state.

update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Update a resource file in the repository if needed.

dae.genomic_resources.fsspec_protocol.build_fsspec_protocol(proto_id: str, root_url: str, **kwargs: str | None) FsspecReadOnlyProtocol | FsspecReadWriteProtocol[source]

Create fsspec GRR protocol based on the root url.

dae.genomic_resources.fsspec_protocol.build_inmemory_protocol(proto_id: str, root_path: str, content: dict[str, Any]) FsspecReadWriteProtocol[source]

Build and return an embedded fsspec protocol for testing.
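
A sketch of how a content dictionary might map onto files, assuming that nested dictionaries represent directories and that string values are file contents. The helper name, the example resource, and the file names are hypothetical; the actual layout expected by build_inmemory_protocol is not specified here.

```python
from __future__ import annotations

from typing import Any


def sketch_flatten_content(
    content: dict[str, Any], prefix: str = "",
) -> dict[str, str]:
    """Turn a nested dict into a flat mapping of path -> file content."""
    files: dict[str, str] = {}
    for name, value in content.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            # A nested dict is treated as a directory.
            files.update(sketch_flatten_content(value, path))
        else:
            files[path] = value
    return files


example_content = {
    "one": {
        "genomic_resource.yaml": "type: position_score",
        "data.txt": "chrom\tpos\tscore",
    },
}
flat_files = sketch_flatten_content(example_content)
```

Describing a repository as a plain dictionary keeps test fixtures readable and avoids touching the real filesystem.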

dae.genomic_resources.fsspec_protocol.build_local_resource(dirname: str, config: dict[str, Any]) GenomicResource[source]

Build a resource from a local filesystem directory.

dae.genomic_resources.genomic_context module

Genomic context provides a way to collect various genomic resources from various sources and make them available through a single interface.

The module follows a registry-based approach. Providers register themselves and are later consulted (in priority order) to build individual GenomicContext instances. Every created context is combined into a PriorityGenomicContext, offering a single access point for resources such as genomic resource repositories, reference genomes, gene models, annotation pipelines, etc. Providers can be registered programmatically via register_context_provider() or discovered automatically through entry points.

Example usage of genomic context in a tool with command line interface:

import argparse
import sys

from dae.genomic_resources.genomic_context import (
    context_providers_add_argparser_arguments,
    context_providers_init,
    get_genomic_context,
)


parser = argparse.ArgumentParser()
context_providers_add_argparser_arguments(parser)

args = parser.parse_args(sys.argv[1:])
context_providers_init(**vars(args))
genomic_context = get_genomic_context()

If you don’t need command line arguments you can do:

context_providers_init()
genomic_context = get_genomic_context()

When you need a CLI with all defaults and without modifying the argument parser you can do:

context_providers_init_with_argparser("GenomicTool")
genomic_context = get_genomic_context()
class dae.genomic_resources.genomic_context.DefaultRepositoryContextProvider[source]

Bases: GenomicContextProvider

Provide access to the default genomic resources repository.

The default repository is resolved via build_genomic_resource_repository() using the environment configuration. The resulting context exposes a single key, "genomic_resources_repository", which can be consumed by other code participating in the context chain.

add_argparser_arguments(parser: ArgumentParser) None[source]

Declare command line arguments for this provider.

The default repository provider is fully configuration driven and has nothing to expose on the CLI, so the method intentionally leaves the parser untouched. The override exists to make the behaviour explicit in the generated documentation.

init(**kwargs: Any) GenomicContext[source]

Instantiate a context backed by the default GRR.

Parameters

**kwargs

Accepted for interface compatibility; the provider ignores runtime keyword arguments because everything is derived from the global configuration.

Returns

GenomicContext

A context exposing a single genomic_resources_repository entry pointing at the default repository instance.

dae.genomic_resources.genomic_context.clear_registered_contexts() None[source]

Forget all contexts created by context_providers_init().

This function exists primarily for testing scenarios where the global registry should be reset between test cases.

dae.genomic_resources.genomic_context.context_providers_add_argparser_arguments(parser: ArgumentParser) None[source]

Delegate command line argument registration to each provider.

Parameters

parser

The parser that should receive additional arguments from every registered provider.

dae.genomic_resources.genomic_context.context_providers_init(**kwargs: Any) None[source]

Materialize contexts from every registered provider.

The function walks all registered providers in priority order and asks each of them to initialise a GenomicContext. The resulting contexts are stored for later retrieval via get_genomic_context().

Notes

Providers are invoked at most once per process. Subsequent calls are ignored until clear_registered_contexts() is executed, which is especially helpful in unit tests.

Parameters

**kwargs

Keyword arguments forwarded to every provider’s init method.

dae.genomic_resources.genomic_context.context_providers_init_with_argparser(toolname: str = 'GenomicTool') None[source]

Initialise providers using arguments parsed from sys.argv.

Parameters

toolname

The program name presented to argparse.ArgumentParser.

Notes

This helper is useful for simple tools that do not customise their argument parser but still want to expose the command line options defined by registered context providers.

dae.genomic_resources.genomic_context.get_genomic_context() GenomicContext[source]

Return a priority context that merges every registered context.

The returned PriorityGenomicContext respects the registration order, giving precedence to contexts added most recently when multiple contexts expose the same key.

dae.genomic_resources.genomic_context.register_context(context: GenomicContext) None[source]

Record context so it participates in future lookups.

Parameters

context

The context instance to be considered when get_genomic_context() is invoked.

dae.genomic_resources.genomic_context.register_context_provider(context_provider: GenomicContextProvider) None[source]

Register context_provider so it participates in initialization.

Parameters

context_provider

The provider implementation that should be considered when contexts are assembled. Providers are stored in registration order and later sorted by their priority before initialization.

dae.genomic_resources.genomic_context_base module

Base classes and interfaces for genomic context management.

This module defines the foundational abstractions for organizing and accessing genomic resources through a unified context system. The central concept is GenomicContext, which acts as a key-value store exposing resources like genomic repositories, reference genomes, gene models, and annotation pipelines. Providers implementing GenomicContextProvider are responsible for building concrete context instances, often by consulting configuration files or command-line arguments.

The module also provides two concrete context implementations: SimpleGenomicContext for straightforward dictionary-backed contexts and PriorityGenomicContext for merging multiple contexts with fallback semantics.

Key Constants

GC_GRR_KEY : str

Standard key for the genomic resources repository object.

GC_REFERENCE_GENOME_KEY : str

Standard key for the reference genome object.

GC_GENE_MODELS_KEY : str

Standard key for the gene models object.

GC_ANNOTATION_PIPELINE_KEY : str

Standard key for the annotation pipeline object.

See Also

dae.genomic_resources.genomic_context

High-level orchestration and provider registration functions.

class dae.genomic_resources.genomic_context_base.GenomicContext[source]

Bases: ABC

Abstract base class for genomic context implementations.

A genomic context serves as a registry of genomic resources, exposing them via string keys. Typical resources include genomic resource repositories, reference genomes, gene models, and annotation pipelines. Subclasses must implement the key-value retrieval logic and report which keys are available.

Notes

The class provides three typed convenience accessors (get_reference_genome(), get_gene_models(), get_genomic_resources_repository()) that validate the underlying object types before returning them. These accessors raise ValueError if the stored object does not match the expected type.

abstract get_context_keys() set[str][source]

Report all keys exposed by this context.

Returns

set[str]

The complete collection of keys under which objects can be retrieved. May be empty if the context holds no resources.

abstract get_context_object(key: str) Any | None[source]

Retrieve a context object by its key.

Parameters

key

The string identifier for the desired resource.

Returns

Any | None

The stored object if the key is present, otherwise None.

Notes

Implementations must return None when the key is absent rather than raising KeyError. This convention allows callers to safely query for optional resources.

get_gene_models() GeneModels | None[source]

Retrieve and validate the gene models from the context.

Returns

GeneModels | None

The gene models instance if present and correctly typed, or None when the key is absent.

Raises

ValueError

If the context entry for GC_GENE_MODELS_KEY is present but does not contain a GeneModels instance.

get_genomic_resources_repository() GenomicResourceRepo | None[source]

Retrieve and validate the genomic resources repository.

Returns

GenomicResourceRepo | None

The repository instance if present and correctly typed, or None when the key is absent.

Raises

ValueError

If the context entry for GC_GRR_KEY is present but does not contain a GenomicResourceRepo instance.

get_reference_genome() ReferenceGenome | None[source]

Retrieve and validate the reference genome from the context.

Returns

ReferenceGenome | None

The reference genome instance if present and correctly typed, or None when the key is absent.

Raises

ValueError

If the context entry for GC_REFERENCE_GENOME_KEY is present but does not contain a ReferenceGenome instance.
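
The typed accessors share one pattern: return None when the key is absent, return the object when it has the expected type, and raise ValueError otherwise. A minimal sketch of that pattern follows; the stub class, the helper name, and the key value "reference_genome" are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any


class ReferenceGenomeStub:
    """Hypothetical stand-in for the real ReferenceGenome class."""


SKETCH_REFERENCE_GENOME_KEY = "reference_genome"  # assumed key value


def sketch_get_typed_object(
    objects: dict[str, Any], key: str, expected_type: type,
) -> Any | None:
    obj = objects.get(key)
    if obj is None:
        # Absent key: not an error, the resource is simply unavailable.
        return None
    if not isinstance(obj, expected_type):
        raise ValueError(
            f"context object for key <{key}> is not a "
            f"{expected_type.__name__}: {type(obj).__name__}"
        )
    return obj


genome = ReferenceGenomeStub()
good_context = {SKETCH_REFERENCE_GENOME_KEY: genome}
bad_context = {SKETCH_REFERENCE_GENOME_KEY: "not a genome"}
```

Validating the type at the accessor keeps misconfigured contexts from surfacing later as confusing attribute errors.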

abstract get_source() str[source]

Identify the origin of this context.

Returns

str

A human-readable label describing the source, such as a provider name or a file path. Useful for debugging and logging when multiple contexts are combined.

class dae.genomic_resources.genomic_context_base.GenomicContextProvider(provider_type: str, provider_priority: int)[source]

Bases: ABC

Abstract base class for genomic context providers.

Providers are responsible for building GenomicContext instances by consulting external configuration sources, command-line arguments, or environment settings. Each provider is identified by a unique type name and assigned a priority that determines the order in which providers are invoked during context initialization.

Providers typically register themselves at module import time by calling dae.genomic_resources.genomic_context.register_context_provider(). The registration system later sorts providers by priority (descending) and type name, then invokes their init() method to produce contexts.

Attributes

_provider_type : str

A unique identifier describing this provider.

_provider_priority : int

The numeric priority; higher values are consulted first.

abstract add_argparser_arguments(parser: ArgumentParser) None[source]

Register command-line arguments that configure the provider.

Parameters

parser

The argparse.ArgumentParser instance that should receive additional arguments.

Notes

Providers may add optional or required arguments. When invoked, the parsed argument namespace will be passed to init() as keyword arguments. If a provider does not require CLI arguments it should leave the parser untouched.

get_context_provider_priority() int[source]

Return the provider’s numeric priority.

Returns

int

The priority assigned at construction time.

get_context_provider_type() str[source]

Return the provider’s type identifier.

Returns

str

The unique type name assigned at construction time.

abstract init(**kwargs: Any) GenomicContext | None[source]

Build a genomic context using the provided configuration.

Parameters

**kwargs

Keyword arguments typically derived from command-line parsing, environment variables, or configuration files. The exact keys depend on what the provider declared in add_argparser_arguments().

Returns

GenomicContext | None

A new context instance if the provider successfully assembled the required resources, or None if the provider chooses to abstain (for example when optional arguments are omitted).

Notes

Returning None allows a provider to conditionally participate. Other providers may then supply default or fallback contexts.
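
A provider that abstains when its optional argument is omitted might look like the sketch below. The base class is a simplified, hypothetical copy of the interface documented above, and the provider name, priority, command-line flag, and context key are all invented for this example.

```python
from __future__ import annotations

import argparse
from abc import ABC, abstractmethod
from typing import Any


class SketchContextProvider(ABC):
    """Simplified, hypothetical version of the provider interface."""

    def __init__(self, provider_type: str, provider_priority: int) -> None:
        self._provider_type = provider_type
        self._provider_priority = provider_priority

    @abstractmethod
    def add_argparser_arguments(self, parser: argparse.ArgumentParser) -> None:
        """Register command-line arguments."""

    @abstractmethod
    def init(self, **kwargs: Any) -> dict[str, Any] | None:
        """Build a context, or return None to abstain."""


class SketchPipelineFileProvider(SketchContextProvider):
    def __init__(self) -> None:
        super().__init__("SketchPipelineFileProvider", 500)

    def add_argparser_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--pipeline-file", default=None)

    def init(self, **kwargs: Any) -> dict[str, Any] | None:
        pipeline_file = kwargs.get("pipeline_file")
        if pipeline_file is None:
            # Optional argument omitted: let other providers supply defaults.
            return None
        return {"pipeline_file": pipeline_file}


provider = SketchPipelineFileProvider()
parser = argparse.ArgumentParser(prog="GenomicTool")
provider.add_argparser_arguments(parser)

without_option = provider.init(**vars(parser.parse_args([])))
with_option = provider.init(
    **vars(parser.parse_args(["--pipeline-file", "pipeline.yaml"]))
)
```

Passing the parsed namespace through vars() matches the keyword-argument convention described for init().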

class dae.genomic_resources.genomic_context_base.PriorityGenomicContext(contexts: Iterable[GenomicContext])[source]

Bases: GenomicContext

Composite context implementing priority-based fallback lookup.

This context merges multiple underlying contexts, consulting them in order when a resource is requested. The first context that provides a non-None value for a given key wins. This strategy allows CLI or user-supplied contexts to override defaults from configuration-driven providers.

Parameters

contexts

An iterable of GenomicContext instances, ordered by descending precedence. When a resource is requested, the priority context walks the sequence and returns the first non-None result.

Attributes

contexts : Iterable[GenomicContext]

The ordered collection of underlying contexts.

Notes

At construction time the context logs the sources of all constituent contexts to aid debugging. If no contexts are provided a warning is logged to indicate that no resources will be available.

get_context_keys() set[str][source]

Compute the union of all keys from underlying contexts.

Returns

set[str]

The merged set of keys available across all constituent contexts. If multiple contexts expose the same key the set contains it only once.

get_context_object(key: str) Any | None[source]

Retrieve a resource using priority-based fallback.

Parameters

key

The string identifier of the desired resource.

Returns

Any | None

The first non-None object found among the underlying contexts, or None if every context returns None (or if no contexts are available).

Notes

Each context is queried in order. When a context returns a non-None value the search stops and that value is returned. A log entry is generated to identify which context supplied the object.

get_source() str[source]

Generate a composite source identifier.

Returns

str

A string of the form "PriorityGenomicContext(source1|source2|...)" listing the sources of all underlying contexts in priority order.
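
The three methods documented for the priority context (key union, first non-None lookup, composite source string) can be sketched with two small classes. Both classes are hypothetical simplifications that omit logging and the abstract base class; the keys and values are illustrative.

```python
from __future__ import annotations

from typing import Any, Iterable


class SketchSimpleContext:
    """Dictionary-backed context (hypothetical simplification)."""

    def __init__(self, context_objects: dict[str, Any], source: str) -> None:
        self._context = context_objects
        self._source = source

    def get_context_keys(self) -> set[str]:
        return set(self._context.keys())

    def get_context_object(self, key: str) -> Any | None:
        # dict.get returns None for absent keys instead of raising KeyError.
        return self._context.get(key)

    def get_source(self) -> str:
        return self._source


class SketchPriorityContext:
    """Consults contexts in order; the first non-None value wins."""

    def __init__(self, contexts: Iterable[SketchSimpleContext]) -> None:
        self.contexts = list(contexts)

    def get_context_keys(self) -> set[str]:
        keys: set[str] = set()
        for context in self.contexts:
            keys |= context.get_context_keys()
        return keys

    def get_context_object(self, key: str) -> Any | None:
        for context in self.contexts:
            obj = context.get_context_object(key)
            if obj is not None:
                return obj
        return None

    def get_source(self) -> str:
        sources = "|".join(c.get_source() for c in self.contexts)
        return f"PriorityGenomicContext({sources})"


cli_context = SketchSimpleContext({"reference_genome": "genome from CLI"}, "cli")
default_context = SketchSimpleContext(
    {"reference_genome": "default genome", "gene_models": "default models"},
    "defaults",
)
combined = SketchPriorityContext([cli_context, default_context])
```

Here the CLI context overrides the reference genome, while gene models fall through to the defaults because the CLI context does not provide them.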

class dae.genomic_resources.genomic_context_base.SimpleGenomicContext(context_objects: dict[str, Any], source: str)[source]

Bases: GenomicContext

Dictionary-backed implementation of GenomicContext.

This concrete context stores resource objects in a simple dictionary and returns them on demand. It is commonly used by providers that assemble a fixed set of resources at initialization time.

Parameters

context_objects

A mapping from string keys to resource objects. Typical keys include GC_GRR_KEY, GC_REFERENCE_GENOME_KEY, GC_GENE_MODELS_KEY, and GC_ANNOTATION_PIPELINE_KEY.

source

A human-readable label identifying the origin of this context, such as a provider name or file path.

Attributes

_context : dict[str, Any]

The internal dictionary holding the resource objects.

_source : str

The stored source label.

get_context_keys() set[str][source]

Report all available keys.

Returns

set[str]

The set of keys under which resources are stored.

get_context_object(key: str) Any | None[source]

Retrieve a resource by key.

Parameters

key

The string identifier of the desired resource.

Returns

Any | None

The stored object if the key exists, otherwise None.

get_source() str[source]

Return the source label.

Returns

str

The human-readable identifier assigned at construction time.

dae.genomic_resources.genomic_context_cli module

Command-line helpers for configuring genomic resource contexts.

This module exposes CLIGenomicContextProvider, a concrete implementation of GenomicContextProvider that resolves genomic resources based on command-line arguments. Tools can register the provider to let their users supply a genomic resources repository, reference genome, and gene models at runtime.

class dae.genomic_resources.genomic_context_cli.CLIGenomicContextProvider[source]

Bases: GenomicContextProvider

Resolve genomic resources from command-line arguments.

The provider allows CLI tools to override the default genomic resources repository, reference genome, and gene models. When invoked without any overrides, it falls back to the previously initialised genomic context so that defaults from gpf_instance or other providers remain available.

add_argparser_arguments(parser: ArgumentParser) None[source]

Expose CLI options that control genomic resource resolution.

Parameters

parser

The argument parser that should receive the provider specific options.

init(**kwargs: Any) GenomicContext | None[source]

Create a SimpleGenomicContext based on CLI arguments.

Parameters

**kwargs

Arguments produced from the command-line parser. The provider recognises grr_filename, grr_directory, reference_genome_resource_id, and gene_models_resource_id.

Returns

GenomicContext | None

A context containing the resolved objects, or None if the genomic resources repository could not be determined.
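
A rough sketch of how the recognised keyword arguments might flow from an argument parser into a context. The flag spellings are assumptions chosen so that argparse produces the documented keyword names; the function is hypothetical, stores plain strings instead of loaded resources, and ignores the fallback to a previously initialised context.

```python
from __future__ import annotations

import argparse
from typing import Any

parser = argparse.ArgumentParser(prog="GenomicTool")
# Assumed flag spellings; argparse derives the dest names from them.
parser.add_argument("--grr-filename", default=None)
parser.add_argument("--grr-directory", default=None)
parser.add_argument("--reference-genome-resource-id", default=None)
parser.add_argument("--gene-models-resource-id", default=None)


def sketch_cli_init(**kwargs: Any) -> dict[str, Any] | None:
    grr = kwargs.get("grr_filename") or kwargs.get("grr_directory")
    if grr is None:
        # No repository could be determined, so no context is produced.
        return None
    context: dict[str, Any] = {"genomic_resources_repository": grr}
    for key in ("reference_genome_resource_id", "gene_models_resource_id"):
        if kwargs.get(key) is not None:
            context[key] = kwargs[key]
    return context


args = parser.parse_args(
    [
        "--grr-directory", "/tmp/grr",
        "--reference-genome-resource-id", "hg38/genomes/example",
    ]
)
cli_context = sketch_cli_init(**vars(args))
empty_context = sketch_cli_init(**vars(parser.parse_args([])))
```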

dae.genomic_resources.genomic_scores module

class dae.genomic_resources.genomic_scores.AlleleScore(resource: GenomicResource)[source]

Bases: GenomicScore

Defines allele genomic scores.

class Mode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Allele score mode.

ALLELES = 2
SUBSTITUTIONS = 1
static from_name(name: str) Mode[source]
alleles_mode() bool[source]

Return True if the score is in alleles mode.

fetch_region(chrom: str | None, pos_begin: int | None, pos_end: int | None, scores: list[str] | None = None) Generator[tuple[int, str | None, str | None, list[str | int | float | bool | None] | None], None, None][source]

Return position score values in a region.

fetch_scores(chrom: str, position: int, reference: str, alternative: str, scores: list[str] | None = None) list[str | int | float | bool | None] | None[source]

Fetch score values at specified genomic position and nucleotide.

fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[AlleleScoreQuery] | None = None) list[Aggregator][source]

Fetch score values in a region and aggregate them.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

open() AlleleScore[source]

Open the genomic score resource and return it.

substitutions_mode() bool[source]

Return True if the score is in substitutions mode.

class dae.genomic_resources.genomic_scores.AlleleScoreAggr(score: 'str', position_aggregator: 'Aggregator', allele_aggregator: 'Aggregator')[source]

Bases: object

allele_aggregator: Aggregator
position_aggregator: Aggregator
score: str
class dae.genomic_resources.genomic_scores.AlleleScoreQuery(score: 'str', position_aggregator: 'str | None' = None, allele_aggregator: 'str | None' = None)[source]

Bases: object

allele_aggregator: str | None = None
position_aggregator: str | None = None
score: str
class dae.genomic_resources.genomic_scores.CNV(chrom: str, pos_begin: int, pos_end: int, attributes: dict[str, Any])[source]

Bases: object

Copy number object from a cnv_collection.

attributes: dict[str, Any]
chrom: str
pos_begin: int
pos_end: int
property size: int
class dae.genomic_resources.genomic_scores.CnvCollection(resource: GenomicResource)[source]

Bases: GenomicScore

A collection of CNVs.

fetch_cnvs(chrom: str, start: int, stop: int, scores: list[str] | None = None) list[CNV][source]

Return list of CNVs that overlap with the provided region.
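
Selecting the CNVs that overlap a region reduces to an interval intersection test. The sketch below assumes closed intervals (both endpoints included); the dataclass and helper are hypothetical stand-ins for CNV and fetch_cnvs, and do a linear scan rather than an indexed lookup.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchCNV:
    chrom: str
    pos_begin: int
    pos_end: int
    attributes: dict[str, Any] = field(default_factory=dict)


def sketch_fetch_cnvs(
    cnvs: list[SketchCNV], chrom: str, start: int, stop: int,
) -> list[SketchCNV]:
    # Two closed intervals overlap when each one starts
    # no later than the other one ends.
    return [
        cnv for cnv in cnvs
        if cnv.chrom == chrom and cnv.pos_begin <= stop and cnv.pos_end >= start
    ]


example_cnvs = [
    SketchCNV("chr1", 100, 200),
    SketchCNV("chr1", 300, 400),
    SketchCNV("chr2", 100, 200),
]
hits = sketch_fetch_cnvs(example_cnvs, "chr1", 150, 350)
```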

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

open() CnvCollection[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.GenomicScore(resource: GenomicResource)[source]

Bases: ResourceConfigValidationMixin

Genomic scores base class.

PositionScore, NPScore and AlleleScore inherit from this class. The statistics builder implementation uses only the GenomicScore interface to build all defined statistics.

close() None[source]
get_all_chromosomes() list[str][source]
get_all_scores() list[str][source]
get_config() dict[str, Any][source]
get_default_annotation_attribute(score_id: str) str | None[source]

Return default annotation attribute for a score.

Returns None if the score is not included in the default annotation. Returns the name of the attribute if present or the score if not.
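
The three outcomes described above (None, a configured attribute name, or the score id itself) can be sketched as a lookup over a configuration list. The configuration layout, with a default_annotation list whose entries carry a source and an optional name, is an assumption made for this example, as is the helper name.

```python
from __future__ import annotations

from typing import Any


def sketch_default_annotation_attribute(
    config: dict[str, Any], score_id: str,
) -> str | None:
    for entry in config.get("default_annotation", []):
        if entry["source"] == score_id:
            # Use the configured attribute name, else fall back to the score id.
            return entry.get("name", score_id)
    # The score is not part of the default annotation.
    return None


example_config = {
    "default_annotation": [
        {"source": "score_a", "name": "renamed_score_a"},
        {"source": "score_b"},
    ],
}
renamed = sketch_default_annotation_attribute(example_config, "score_a")
unchanged = sketch_default_annotation_attribute(example_config, "score_b")
excluded = sketch_default_annotation_attribute(example_config, "score_c")
```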

get_default_annotation_attributes() list[Any][source]

Collect default annotation attributes.

get_histogram_filename(score_id: str) str[source]

Return the histogram filename for a genomic score.

get_histogram_image_filename(score_id: str) str[source]
get_histogram_image_url(score_id: str) str | None[source]
get_number_range(score_id: str) tuple[float, float] | None[source]

Return the value range for a number score.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

get_score_definition(score_id: str) _ScoreDef | None[source]
get_score_histogram(score_id: str) NullHistogram | CategoricalHistogram | NumberHistogram[source]

Return defined histogram for a score.

is_open() bool[source]
open() GenomicScore[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.PositionScore(resource: GenomicResource)[source]

Bases: GenomicScore

Defines position genomic score.

fetch_region(chrom: str, pos_begin: int | None, pos_end: int | None, scores: list[str] | None = None) Generator[tuple[int, int, list[str | int | float | bool | None] | None], None, None][source]

Return position score values in a region.

fetch_scores(chrom: str, position: int, scores: list[str] | None = None) list[str | int | float | bool | None] | None[source]

Fetch score values at specific genomic position.

fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[str] | list[PositionScoreQuery] | None = None) list[Aggregator][source]

Fetch score values in a region and aggregate them.

Case 1: res.fetch_scores_agg("1", 10, 20) --> all scores with default aggregators

Case 2: res.fetch_scores_agg("1", 10, 20, non_default_aggregators={"bla": "max"}) --> all scores with default aggregators, but 'bla' should use 'max'
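The idea of per-score aggregation over a region can be sketched without the library. The snippet below is only an illustration under stated assumptions: ScoreQuery is a hypothetical stand-in for PositionScoreQuery, the rows are plain tuples rather than score lines, positions are assumed 1-based and inclusive, and each value is weighted by the number of positions it covers inside the region (mirroring the count argument of Aggregator.add). The real method returns Aggregator objects and may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreQuery:
    """Hypothetical stand-in for PositionScoreQuery."""
    score: str
    position_aggregator: Optional[str] = None


def weighted_mean(pairs):
    # pairs is a list of (value, count) tuples
    total = sum(count for _, count in pairs)
    return sum(value * count for value, count in pairs) / total


def weighted_max(pairs):
    return max(value for value, _ in pairs)


# Assumed aggregator names; the library supports more than these two.
AGGREGATORS = {"mean": weighted_mean, "max": weighted_max}


def aggregate_region(rows, pos_begin, pos_end, queries, defaults):
    """Aggregate each queried score over [pos_begin, pos_end].

    rows: iterable of (row_begin, row_end, {score_id: value}).
    defaults: {score_id: aggregator name} used when a query names none.
    """
    result = {}
    for query in queries:
        agg_name = query.position_aggregator or defaults[query.score]
        pairs = []
        for row_begin, row_end, values in rows:
            # Number of positions the row shares with the region.
            overlap = min(row_end, pos_end) - max(row_begin, pos_begin) + 1
            value = values.get(query.score)
            if overlap > 0 and value is not None:
                pairs.append((value, overlap))
        result[query.score] = AGGREGATORS[agg_name](pairs) if pairs else None
    return result


rows = [(5, 12, {"bla": 1.0}), (13, 20, {"bla": 3.0}), (21, 30, {"bla": 9.0})]
defaults = {"bla": "mean"}
default_result = aggregate_region(rows, 10, 20, [ScoreQuery("bla")], defaults)
max_result = aggregate_region(rows, 10, 20, [ScoreQuery("bla", "max")], defaults)
```

Here the first call falls back to the default aggregator for 'bla', while the second overrides it with 'max', which corresponds to the two cases described above.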

get_region_scores(chrom: str, pos_beg: int, pos_end: int, score_id: str) list[str | int | float | bool | None][source]

Return score values in a region.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

open() PositionScore[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.PositionScoreAggr(score: 'str', position_aggregator: 'Aggregator')[source]

Bases: object

position_aggregator: Aggregator
score: str
class dae.genomic_resources.genomic_scores.PositionScoreQuery(score: 'str', position_aggregator: 'str | None' = None)[source]

Bases: object

position_aggregator: str | None = None
score: str
class dae.genomic_resources.genomic_scores.ScoreDef(score_id: str, desc: str, value_type: str, pos_aggregator: str | None, allele_aggregator: str | None, small_values_desc: str | None, large_values_desc: str | None, hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None)[source]

Bases: object

Score configuration definition.

allele_aggregator: str | None
desc: str
hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None
large_values_desc: str | None
pos_aggregator: str | None
score_id: str
small_values_desc: str | None
value_type: str
class dae.genomic_resources.genomic_scores.ScoreLine(line: LineBase, score_defs: dict[str, _ScoreDef])[source]

Bases: object

Abstraction for a genomic score line. Wraps the line adapter.

property alt: str | None
property chrom: str
get_available_scores() tuple[Any, ...][source]
get_score(score_id: str) str | int | float | bool | None[source]

Get and parse configured score from line.

property pos_begin: int
property pos_end: int
property ref: str | None
dae.genomic_resources.genomic_scores.build_score_from_resource(resource: GenomicResource) GenomicScore[source]

Build a genomic score resource and return the corresponding score.

dae.genomic_resources.genomic_scores.build_score_from_resource_id(resource_id: str, grr: GenomicResourceRepo | None = None) GenomicScore[source]

dae.genomic_resources.group_repository module

Provides group genomic resources repository.

class dae.genomic_resources.group_repository.GenomicResourceGroupRepo(children: list[GenomicResourceRepo], repo_id: str | None = None)[source]

Bases: GenomicResourceRepo

Defines group genomic resources repository.

find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return one resource with ID equal to resource_id.

If resource is not found, None is returned.

get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with ID equal to resource_id.

If the resource is not found, an exception is raised.

invalidate() None[source]

Invalidate internal state of the repository.
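A group repository delegates lookups to its children. The following is a minimal sketch of that delegation, not the library's implementation: ToyRepo and ToyGroupRepo are hypothetical classes, the children are assumed to be searched in order with the first match winning, version constraints are left out, and FileNotFoundError is an assumed exception type for the failing get_resource case.

```python
class ToyRepo:
    """Hypothetical leaf repository that keeps resources in a dict."""

    def __init__(self, repo_id, resources):
        self.repo_id = repo_id
        self._resources = dict(resources)

    def find_resource(self, resource_id, repository_id=None):
        # A repository_id restricts the search to the matching repository.
        if repository_id is not None and repository_id != self.repo_id:
            return None
        return self._resources.get(resource_id)


class ToyGroupRepo:
    """Hypothetical group repository that searches its children in order."""

    def __init__(self, children):
        self.children = list(children)

    def find_resource(self, resource_id, repository_id=None):
        for child in self.children:
            found = child.find_resource(resource_id, repository_id)
            if found is not None:
                return found
        return None

    def get_resource(self, resource_id, repository_id=None):
        found = self.find_resource(resource_id, repository_id)
        if found is None:
            # Assumed exception type, chosen for illustration only.
            raise FileNotFoundError(resource_id)
        return found


group = ToyGroupRepo([
    ToyRepo("a", {"x": "x@a"}),
    ToyRepo("b", {"x": "x@b", "y": "y@b"}),
])
```

The difference between the two lookup methods is only in how a miss is reported: find_resource returns None, while get_resource raises.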

dae.genomic_resources.histogram module

Handling of genomic scores statistics.

Currently we support only genomic scores histograms.

class dae.genomic_resources.histogram.CategoricalHistogram(config: CategoricalHistogramConfig, counter: dict[str | int, int] | None = None)[source]

Bases: Statistic

Class for categorical data histograms.

UNIQUE_VALUES_LIMIT = 100
add_value(value: str | int | None, count: int = 1) None[source]

Add a value to the categorical histogram.

Returns true if successfully added and false if failed. Will fail if too many values are accumulated.

static deserialize(content: str) CategoricalHistogram[source]

Create a statistic from serialized data.

property display_values: dict[str | int, int]

Return categorical histogram display values in order.

static from_dict(data: dict[str, Any]) CategoricalHistogram[source]
merge(other: Statistic) None[source]

Merge with other histogram.

plot(outfile: IO, score_id: str, y_axis_label: str | None = None, small_values_description: str | None = None, large_values_description: str | None = None) None[source]

Plot histogram and save it into outfile.

property raw_values: dict[str | int, int]
serialize() str[source]

Return a serialized version of this statistic.

to_dict() dict[str, Any][source]
type = 'categorical_histogram'
values_domain() str[source]
class dae.genomic_resources.histogram.CategoricalHistogramConfig(displayed_values_count: int | None = 20, displayed_values_percent: float | None = None, value_order: list[str | int] | None = None, y_log_scale: bool = False, label_rotation: int = 0, plot_function: str | None = None, enforce_type: bool = True, natural_order: bool = False, allow_only_whole_values_y: bool = False)[source]

Bases: object

Configuration class for categorical histograms.

allow_only_whole_values_y: bool = False
static default_config() CategoricalHistogramConfig[source]
displayed_values_count: int | None = 20
displayed_values_percent: float | None = None
enforce_type: bool = True
static from_dict(parsed: dict[str, Any]) CategoricalHistogramConfig[source]

Create categorical histogram config from a configuration dict.

label_rotation: int = 0
natural_order: bool = False
plot_function: str | None = None
to_dict() dict[str, Any][source]

Transform categorical histogram config to dict.

value_order: list[str | int] | None = None
y_log_scale: bool = False
exception dae.genomic_resources.histogram.HistogramError[source]

Bases: BaseException

Class used for histogram specific errors.

Histograms should be nullified when a HistogramError occurs.

class dae.genomic_resources.histogram.HistogramStatisticMixin[source]

Bases: object

Mixin for creating statistics classes with histograms.

static get_histogram_file(score_id: str) str[source]
static get_histogram_image_file(score_id: str) str[source]
class dae.genomic_resources.histogram.NullHistogram(config: NullHistogramConfig | None)[source]

Bases: Statistic

Class for annulled histograms.

add_value(value: Any, count: int = 1) None[source]

Add a value to the statistic.

static deserialize(content: str) NullHistogram[source]

Create a statistic from serialized data.

static from_dict(data: dict[str, Any]) NullHistogram[source]

Build a null histogram from a dict.

merge(other: Any) None[source]

Merge the values from another statistic in place.

plot(_outfile: IO, _score_id: str) None[source]
serialize() str[source]

Return a serialized version of this statistic.

to_dict() dict[str, Any][source]
type = 'null_histogram'
values_domain() str[source]
class dae.genomic_resources.histogram.NullHistogramConfig(reason: str)[source]

Bases: object

Configuration class for null histograms.

static default_config() NullHistogramConfig[source]
static from_dict(parsed: dict[str, Any]) NullHistogramConfig[source]

Create a null histogram config from a configuration dict.

reason: str
to_dict() dict[str, Any][source]
class dae.genomic_resources.histogram.NumberHistogram(config: NumberHistogramConfig, bins: ndarray | None = None, bars: ndarray | None = None)[source]

Bases: Statistic

Class to represent a histogram.

add_value(value: float | None, count: int = 1) None[source]

Add value to the histogram.

choose_bin_lin(value: float) int[source]

Compute bin index for a passed value for linear x-scale.

choose_bin_log(value: float) int[source]

Compute bin index for a passed value for log x-scale.
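The two bin-selection methods can be illustrated with plain functions. This is a sketch under assumptions rather than the class's actual code: the functions take the view range and bin count as explicit arguments, the upper edge is assigned to the last bin, and for the log scale the first bin is assumed to collect every value below x_min_log while the remaining bins are spaced evenly in log10 between x_min_log and the view maximum.

```python
import math


def choose_bin_lin(value, view_min, view_max, number_of_bins):
    """Return the index of the equal-width bin that holds value."""
    if not view_min <= value <= view_max:
        raise ValueError(f"{value} is outside [{view_min}, {view_max}]")
    width = (view_max - view_min) / number_of_bins
    # The upper edge belongs to the last bin, so clamp the index.
    return min(int((value - view_min) / width), number_of_bins - 1)


def choose_bin_log(value, x_min_log, view_max, number_of_bins):
    """Return the bin index on a log10 x-scale (assumed layout, see above)."""
    if value < x_min_log:
        # Assumption: bin 0 collects everything below x_min_log.
        return 0
    log_min = math.log10(x_min_log)
    step = (math.log10(view_max) - log_min) / (number_of_bins - 1)
    index = 1 + int((math.log10(value) - log_min) / step)
    return min(index, number_of_bins - 1)


lin_index = choose_bin_lin(25, 0, 100, 10)
log_index = choose_bin_log(500, 1, 1000, 4)
```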

static deserialize(content: str) NumberHistogram[source]

Create a statistic from serialized data.

static from_dict(data: dict[str, Any]) NumberHistogram[source]

Build a number histogram from a dict.

merge(other: Statistic) None[source]

Merge two histograms.
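Merging is what makes it possible to build a histogram per region and combine the parts afterwards. A minimal sketch, assuming that two histograms can only be merged when their bin edges are identical and that merging adds the bar counts element-wise; plain lists stand in for the ndarray attributes, and the function name is hypothetical.

```python
def merge_bars(bins_a, bars_a, bins_b, bars_b):
    """Return merged bar counts for two histograms with identical bins."""
    if list(bins_a) != list(bins_b):
        raise ValueError("cannot merge histograms with different bins")
    return [a + b for a, b in zip(bars_a, bars_b)]


merged = merge_bars([0, 10, 20, 30], [1, 0, 4], [0, 10, 20, 30], [2, 5, 0])
```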

plot(outfile: IO, score_id: str, y_axis_label: str | None = None, small_values_description: str | None = None, large_values_description: str | None = None) None[source]

Plot histogram and save it into outfile.

serialize() str[source]

Return a serialized version of this statistic.

to_dict() dict[str, Any][source]
type = 'number_histogram'
values_domain() str[source]
view_max() float[source]
view_min() float[source]
class dae.genomic_resources.histogram.NumberHistogramConfig(view_range: tuple[float | None, float | None], number_of_bins: int = 100, x_log_scale: bool = False, y_log_scale: bool = False, x_min_log: float | None = None, plot_function: str | None = None)[source]

Bases: object

Configuration class for number histograms.

static default_config(min_max: MinMaxValue | None) NumberHistogramConfig[source]

Build a default number histogram config, optionally using the given min/max value.

static from_dict(parsed: dict[str, Any]) NumberHistogramConfig[source]

Build a number histogram config from a parsed yaml file.

has_view_range() bool[source]
number_of_bins: int = 100
plot_function: str | None = None
to_dict() dict[str, Any][source]

Transform number histogram config to dict.

view_range: tuple[float | None, float | None]
x_log_scale: bool = False
x_min_log: float | None = None
y_log_scale: bool = False
dae.genomic_resources.histogram.build_default_histogram_conf(value_type: str, **kwargs: Any) NumberHistogramConfig | CategoricalHistogramConfig | NullHistogramConfig[source]

Build default histogram config for given value type.

dae.genomic_resources.histogram.build_empty_histogram(config: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig) NumberHistogram | CategoricalHistogram | NullHistogram[source]

Create an empty histogram from a histogram configuration.

dae.genomic_resources.histogram.build_histogram_config(config: dict[str, Any] | None) NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None[source]

Create histogram config from a configuration dict.

dae.genomic_resources.histogram.load_histogram(resource: GenomicResource, filename: str) NullHistogram | CategoricalHistogram | NumberHistogram[source]

Load and return a histogram in a resource.

On an error or missing histogram, an appropriate NullHistogram is returned.

dae.genomic_resources.histogram.plot_histogram(res: GenomicResource, image_filename: str, hist: NullHistogram | CategoricalHistogram | NumberHistogram, score_id: str, small_values_desc: str | None = None, large_values_desc: str | None = None) None[source]

Plot histogram and save it into the resource.

dae.genomic_resources.histogram.save_histogram(resource: GenomicResource, filename: str, histogram: NullHistogram | CategoricalHistogram | NumberHistogram) None[source]

Save histogram into a resource.

dae.genomic_resources.liftover_chain module

Provides LiftOver chain resource.

class dae.genomic_resources.liftover_chain.LiftoverChain(resource: GenomicResource)[source]

Bases: ResourceConfigValidationMixin

Defines Lift Over chain wrapper around pyliftover objects.

close() None[source]
convert_coordinate(chrom: str, pos: int) tuple[str, int, str, int] | None[source]

Lift over a genomic coordinate.

property files: set[str]
static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

is_open() bool[source]
static map_chromosome(chrom: str, mapping: dict[str, str] | None) str[source]

Map a chromosome (contig) name according to configuration.

open() LiftoverChain[source]
dae.genomic_resources.liftover_chain.build_liftover_chain_from_resource(resource: GenomicResource) LiftoverChain[source]

Load a Lift Over chain from GRR resource.

dae.genomic_resources.liftover_chain.build_liftover_chain_from_resource_id(resource_id: str, grr: GenomicResourceRepo | None = None) LiftoverChain[source]

dae.genomic_resources.reference_genome module

class dae.genomic_resources.reference_genome.ReferenceGenome(resource: GenomicResource)[source]

Bases: ResourceConfigValidationMixin

Provides an interface for querying a reference genome.

property chrom_prefix: str

Return a prefix of all chromosomes of the reference genome.

property chromosomes: list[str]

Return a list of all chromosomes of the reference genome.

close() None[source]

Close reference genome sequence file-like objects.

fetch(chrom: str, start: int, stop: int | None, buffer_size: int = 512) Generator[str, None, None][source]

Yield the nucleotides in a specific region.

The line feed calculation can be inaccurate because not every fetch starts at the beginning of a line. This is harmless: line feeds only add extra characters to read, and the output is limited to the number of nucleotides expected to be read.
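The role of line feeds can be seen in a small stand-alone sketch. It assumes a FASTA file with fixed-width lines described by faidx-style index fields (seq_start, line_bases and line_bytes are assumed parameter names for the byte offset of the sequence, bases per line and bytes per line), uses 1-based inclusive coordinates, and returns a string instead of yielding buffered chunks as the real method does.

```python
import io


def fasta_offset(pos, seq_start, line_bases, line_bytes):
    """Byte offset of the 1-based position pos within the file."""
    zero_based = pos - 1
    full_lines = zero_based // line_bases
    return seq_start + full_lines * line_bytes + zero_based % line_bases


def fetch(handle, start, stop, seq_start, line_bases, line_bytes):
    """Read the nucleotides in [start, stop] from a binary FASTA handle."""
    expected = stop - start + 1
    handle.seek(fasta_offset(start, seq_start, line_bases, line_bytes))
    # Over-estimate the line feeds inside the region; the surplus is cut below.
    extra = (expected // line_bases + 1) * (line_bytes - line_bases)
    raw = handle.read(expected + extra)
    sequence = raw.replace(b"\n", b"").replace(b"\r", b"")
    # The output is limited to the number of nucleotides expected.
    return sequence[:expected].decode("ascii")


fasta = io.BytesIO(b">chr1\nACGTA\nCGTAC\nGT\n")
middle = fetch(fasta, 4, 8, seq_start=6, line_bases=5, line_bytes=6)
whole = fetch(fasta, 1, 12, seq_start=6, line_bases=5, line_bytes=6)
```

Reading a few bytes too many costs little, while truncating to the expected length keeps the result exact even when the line feed estimate is off.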

get_all_chrom_lengths() dict[str, int][source]

Return a dict of all chromosome lengths.

get_chrom_length(chrom: str) int[source]

Return the length of a specified chromosome.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

get_sequence(chrom: str, start: int, stop: int) str[source]

Return sequence of nucleotides from specified chromosome region.

is_open() bool[source]
is_pseudoautosomal(chrom: str, pos: int) bool[source]

Return true if specified position is pseudoautosomal.

open() ReferenceGenome[source]

Open reference genome resources.

property resource_id: str
split_into_regions(region_size: int, chromosome: str | None = None) Generator[Region, None, None][source]

Split the reference genome into regions and yield them.

Can specify a specific chromosome to limit the regions to be in that chromosome only.
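Splitting by region size can be sketched as follows. This is an illustration only: it takes a dict of chromosome lengths (as returned by get_all_chrom_lengths) as an explicit argument, yields plain tuples instead of Region objects, and assumes 1-based inclusive coordinates with a shorter final region per chromosome.

```python
def split_into_regions(chrom_lengths, region_size, chromosome=None):
    """Yield (chrom, start, stop) tuples that cover each chromosome."""
    chroms = [chromosome] if chromosome is not None else list(chrom_lengths)
    for chrom in chroms:
        length = chrom_lengths[chrom]
        for start in range(1, length + 1, region_size):
            # The last region of a chromosome may be shorter than region_size.
            yield (chrom, start, min(start + region_size - 1, length))


lengths = {"chr1": 25, "chr2": 10}
all_regions = list(split_into_regions(lengths, 10))
chr2_regions = list(split_into_regions(lengths, 10, chromosome="chr2"))
```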

dae.genomic_resources.reference_genome.build_reference_genome_from_file(filename: str) ReferenceGenome[source]

Open a reference genome from a file.

dae.genomic_resources.reference_genome.build_reference_genome_from_resource(resource: GenomicResource) ReferenceGenome[source]

Open a reference genome from resource.

dae.genomic_resources.reference_genome.build_reference_genome_from_resource_id(resource_id: str, grr: GenomicResourceRepo | None = None) ReferenceGenome[source]

dae.genomic_resources.repository module

Provides basic classes for genomic resources and repositories.

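      +---------------------+                    +-----------------+
+-----| GenomicResourceRepo |--------------------| GenomicResource |
|     +---------------------+                    +-----------------+
|                ^                                        ^
|                |                                        |
|                |                                        |
| +-----------------------------+          +----------------------------+
| | GenomicResourceProtocolRepo |      ----| ReadOnlyRepositoryProtocol |
| +-----------------------------+          +----------------------------+
|                                                         ^
|                                                         |
|                                                         |
|   +--------------------------+           +-----------------------------+
+---| GenomicResourceGroupRepo |           | ReadWriteRepositoryProtocol |
    +--------------------------+           +-----------------------------+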

class dae.genomic_resources.repository.GenomicResource(resource_id: str, version: tuple[int, ...], protocol: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol, config: dict[str, Any] | None = None, manifest: Manifest | None = None)[source]

Bases: object

Base class for genomic resources.

file_exists(filename: str) bool[source]

Check if filename exists in this resource.

get_config() dict[str, Any][source]

Return the resource configuration.

get_description() str[source]

Return resource description.

get_file_content(filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]

Return the content of file in a resource.

get_file_url(filename: str) str[source]
get_full_id() str[source]

Return genomic resource ID with version.

get_genomic_resource_id_version() str[source]

Return a string combining resource ID and version.

Returns a string of the form aa/bb/cc[3.2] for a genomic resource with id aa/bb/cc and version 3.2. If the version is 0 the string will be aa/bb/cc.

get_id() str[source]

Return genomic resource ID.

get_labels() dict[str, Any][source]

Return resource labels.

get_manifest() Manifest[source]

Load resource manifest if it exists. Otherwise builds it.

get_repo_url() str[source]

Return repository’s URL.

get_summary() str | None[source]

Return resource summary.

get_type() str[source]

Return resource type as defined in ‘genomic_resource.yaml’.

get_url() str[source]
get_version_str() str[source]

Return version string of the form ‘3.1’.

invalidate() None[source]

Clean up cached attributes like manifest, etc.

open_bigwig_file(filename: str) Any[source]

Open a bigwig file and return it.

open_raw_file(filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in the resource and return a file-like object.

open_tabix_file(filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file and return a pysam.TabixFile.

open_vcf_file(filename: str, index_filename: str | None = None) VariantFile[source]

Open a VCF file and return a pysam.VariantFile.

class dae.genomic_resources.repository.GenomicResourceProtocolRepo(proto: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol)[source]

Bases: GenomicResourceRepo

Base class for real genomic resources repositories.

close() None[source]
find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return one resource with ID equal to resource_id.

If resource is not found, None is returned.

get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with ID equal to resource_id.

If the resource is not found, an exception is raised.

invalidate() None[source]

Invalidate internal state of the repository.

class dae.genomic_resources.repository.GenomicResourceRepo(repo_id: str)[source]

Bases: ABC

Base class for genomic resources repositories.

close() None[source]
property definition: dict[str, Any] | None
abstract find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return one resource with ID equal to resource_id.

If resource is not found, None is returned.

abstract get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

abstract get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with ID equal to resource_id.

If the resource is not found, an exception is raised.

abstract invalidate() None[source]

Invalidate internal state of the repository.

property repo_id: str
class dae.genomic_resources.repository.Manifest[source]

Bases: object

Provides genomic resource manifest object.

add(entry: ManifestEntry) None[source]

Add a manifest entry to the manifest.

static from_file_content(file_content: str) Manifest[source]

Produce a manifest from manifest file content.

static from_manifest_entries(manifest_entries: list[dict[str, Any]]) Manifest[source]

Produce a manifest from parsed manifest file content.

get_files() list[tuple[str, int]][source]
names() set[str][source]

Return set of all file names from the manifest.

to_manifest_entries() list[dict[str, Any]][source]

Transform manifest to list of dictionaries.

Helpful when storing the manifest.

update(entries: dict[str, ManifestEntry]) None[source]
class dae.genomic_resources.repository.ManifestEntry(name: str, size: int, md5: str | None)[source]

Bases: object

Provides an entry into manifest object.

md5: str | None
name: str
size: int
class dae.genomic_resources.repository.ManifestUpdate(manifest: Manifest, entries_to_delete: set[str], entries_to_update: set[str])[source]

Bases: object

Provides a manifest update object.

entries_to_delete: set[str]
entries_to_update: set[str]
manifest: Manifest
class dae.genomic_resources.repository.Mode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Protocol mode.

READONLY = 1
READWRITE = 2
class dae.genomic_resources.repository.ReadOnlyRepositoryProtocol(proto_id: str, url: str)[source]

Bases: ABC

Defines read only genomic resources repository protocol.

CHUNK_SIZE = 32768
build_genomic_resource(resource_id: str, version: tuple[int, ...], config: dict | None = None, manifest: Manifest | None = None) GenomicResource[source]

Build a genomic resource based on this protocol.

compute_md5_sum(resource: GenomicResource, filename: str) str[source]

Compute an MD5 hash for a file in the resource.
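Hashing in fixed-size chunks keeps memory use constant regardless of file size. The sketch below shows the general technique with the standard hashlib module; it works on any binary file-like object rather than on a resource and filename, and the default chunk size simply reuses the CHUNK_SIZE value listed above. It is not the protocol's own implementation.

```python
import hashlib
import io

CHUNK_SIZE = 32768


def compute_md5_sum(handle, chunk_size=CHUNK_SIZE):
    """Return the hex MD5 digest of a binary file-like object."""
    digest = hashlib.md5()
    # iter() with a sentinel stops when read() returns an empty bytes object.
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


data = b"ACGT" * 25000  # 100000 bytes, spans several chunks
chunked_md5 = compute_md5_sum(io.BytesIO(data))
```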

abstract file_exists(resource: GenomicResource, filename: str) bool[source]

Check if the given file exists in the given resource.

find_resource(resource_id: str, version_constraint: str | None = None) GenomicResource | None[source]

Return requested resource or None if not found.

abstract get_all_resources() Generator[GenomicResource, None, None][source]

Return generator for all resources in the repository.

get_file_content(resource: GenomicResource, filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]

Return content of a file in given resource.

get_id() str[source]

Return the repository ID.

get_manifest(resource: GenomicResource) Manifest[source]

Load and return a resource manifest.

get_resource(resource_id: str, version_constraint: str | None = None) GenomicResource[source]

Return the requested resource or raise an exception if not found.

In case resource is not found a FileNotFoundError exception is raised.

get_resource_file_url(resource: GenomicResource, filename: str) str[source]

Return url of a file in the resource.

get_resource_url(resource: GenomicResource) str[source]

Return the URL of the specified resource.

abstract get_url() str[source]

Return the repository URL.

abstract invalidate() None[source]

Invalidate internal cache of repository protocol.

abstract load_manifest(resource: GenomicResource) Manifest[source]

Load resource manifest.

load_yaml(resource: GenomicResource, filename: str) Any[source]

Return parsed YAML file.

mode() Mode[source]

Return repository protocol mode - READONLY or READWRITE.

abstract open_bigwig_file(resource: GenomicResource, filename: str) Any[source]

Open a bigwig file in a resource and return it.

Not all repositories support this method. Repositories that do not support this method raise an exception.

abstract open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in a resource and return a file-like object.

abstract open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file in a resource and return a pysam tabix file.

Not all repositories support this method. Repositories that do not support this method raise an exception.

abstract open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]

Open a vcf file in a resource and return a pysam VariantFile.

Not all repositories support this method. Repositories that do not support this method raise an exception.

class dae.genomic_resources.repository.ReadWriteRepositoryProtocol(proto_id: str, url: str)[source]

Bases: ReadOnlyRepositoryProtocol

Defines read write genomic resources repository protocol.

abstract build_content_file() list[dict[str, Any]][source]

Build the content of the repository (i.e. the ‘.CONTENTS.json’ file).

build_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) Manifest[source]

Build full manifest for the resource.

build_resource_file_state(resource: GenomicResource, filename: str, **kwargs: str | float | int | None) ResourceFileState[source]

Build resource file state.

check_update_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) ManifestUpdate[source]

Check if the resource manifest needs update.
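The result of such a check is described by ManifestUpdate as two sets of file names, entries to delete and entries to update. One way to picture how these sets could be derived is to compare the stored manifest with a fresh scan of the resource. The sketch below does only that and rests on assumptions: Entry is a hypothetical stand-in for ManifestEntry, manifests are plain dicts keyed by file name, and any difference in size or MD5 marks a file for update. The real protocol also consults the saved file state (size and timestamp), which is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for ManifestEntry."""
    name: str
    size: int
    md5: Optional[str]


def diff_manifest(stored, scanned):
    """Return (entries_to_delete, entries_to_update) as sets of names."""
    # Files listed in the stored manifest that no longer exist.
    entries_to_delete = set(stored) - set(scanned)
    # Files that are new or whose recorded size or MD5 has changed.
    entries_to_update = {
        name for name, entry in scanned.items() if stored.get(name) != entry
    }
    return entries_to_delete, entries_to_update


stored = {
    "genomic_resource.yaml": Entry("genomic_resource.yaml", 120, "aaa"),
    "old.txt": Entry("old.txt", 5, "bbb"),
    "scores.tsv.gz": Entry("scores.tsv.gz", 900, "ccc"),
}
scanned = {
    "genomic_resource.yaml": Entry("genomic_resource.yaml", 120, "aaa"),
    "scores.tsv.gz": Entry("scores.tsv.gz", 950, "ddd"),
    "new.txt": Entry("new.txt", 7, "eee"),
}
to_delete, to_update = diff_manifest(stored, scanned)
```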

abstract collect_all_resources() Generator[GenomicResource, None, None][source]

Return generator for all resources managed by this protocol.

abstract collect_resource_entries(resource: GenomicResource) Manifest[source]

Scan the resource and return a manifest with all files.

copy_resource(remote_resource: GenomicResource) GenomicResource[source]

Copy a remote resource into repository.

abstract copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Copy a remote resource file into local repository.

abstract delete_resource_file(resource: GenomicResource, filename: str) None[source]

Delete a resource file and its internal state.

get_manifest(resource: GenomicResource) Manifest[source]

Load or build a resource manifest.

get_or_create_resource(resource_id: str, version: tuple[int, ...]) GenomicResource[source]

Return a resource with specified ID and version.

If the resource is not found create an empty resource.

abstract get_resource_file_size(resource: GenomicResource, filename: str) int[source]

Return the size of a resource file.

abstract get_resource_file_timestamp(resource: GenomicResource, filename: str) float[source]

Return the timestamp (ISO formatted) of a resource file.

abstract load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None[source]

Load resource file state from internal GRR state.

If the specified resource file has no internal state returns None.

mode() Mode[source]

Return repository protocol mode - READONLY or READWRITE.

save_index(resource: GenomicResource, contents: str) None[source]

Save an index HTML file into the genomic resource’s directory.

save_manifest(resource: GenomicResource, manifest: Manifest) None[source]

Save manifest into genomic resource’s directory.

abstract save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None[source]

Save resource file state into internal GRR state.

update_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) Manifest[source]

Update or create full manifest for the resource.

update_resource(remote_resource: GenomicResource, files_to_copy: set[str] | None = None) GenomicResource[source]

Copy a remote resource into repository.

Allows copying of a subset of files from the resource via files_to_copy. If files_to_copy is None, copies all files.

abstract update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Update a resource file into repository if needed.

class dae.genomic_resources.repository.ResourceFileState(filename: str, size: int, timestamp: float, md5: str)[source]

Bases: object

Defines resource file state saved into internal GRR state.

filename: str
md5: str
size: int
timestamp: float
dae.genomic_resources.repository.is_gr_id_token(token: str) bool[source]

Check if token can be used as a genomic resource ID.

Genomic Resource Id Token is a string with one or more letters, numbers, ‘.’, ‘_’, or ‘-’. The function checks if the parameter token is a Genomic Resource Id Token.

dae.genomic_resources.repository.is_version_constraint_satisfied(version_constraint: str | None, version: tuple[int, ...]) bool[source]

Check if a version matches a version constraint.

dae.genomic_resources.repository.parse_gr_id_version_token(token: str) tuple[str, tuple[int, ...]][source]

Parse genomic resource ID with version.

Genomic Resource Id Version Token is a Genomic Resource Id Token with an optional version appended. If present, the version suffix has the form “(3.3.2)”. The default version is (0). Returns None if token is not a Genomic Resource Id Version Token. Otherwise returns a (token, version) tuple.

dae.genomic_resources.repository.parse_resource_id_version(resource_path: str) tuple[str, tuple[int, ...]][source]

Parse genomic resource id and version path into Id, Version tuple.

A default version (0,) is appended if needed. If present, the version suffix has the form “(3.3.2)”. The default version is (0,). Returns tuple (None, None) if the path does not match the resource_id/version requirements. Otherwise returns tuple (resource_id, version).
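The parsing rules described above can be approximated with a regular expression. This is a hedged sketch, not the library's code: the token pattern follows the description given for is_gr_id_token (letters, digits, '.', '_' and '-'), resource IDs are assumed to be one or more such tokens separated by '/', and the function name merely mirrors parse_resource_id_version for readability.

```python
import re

# One or more letters, digits, '.', '_' or '-' (per the token description).
TOKEN = r"[a-zA-Z0-9._-]+"
# Tokens joined by '/', optionally followed by a "(3.3.2)" style suffix.
PATH_RE = re.compile(rf"({TOKEN}(?:/{TOKEN})*)(?:\((\d+(?:\.\d+)*)\))?")


def parse_resource_id_version(resource_path):
    """Return (resource_id, version) or (None, None) if nothing matches."""
    match = PATH_RE.fullmatch(resource_path)
    if match is None:
        return None, None
    resource_id, version = match.group(1), match.group(2)
    if version is None:
        # No suffix present: fall back to the default version.
        return resource_id, (0,)
    return resource_id, tuple(int(part) for part in version.split("."))


with_version = parse_resource_id_version("aa/bb/cc(3.3.2)")
without_version = parse_resource_id_version("aa/bb/cc")
```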

dae.genomic_resources.repository.version_string_to_suffix(version: str) str[source]

Transform version string into resource ID version suffix.

dae.genomic_resources.repository.version_tuple_to_string(version: tuple[int, ...]) str[source]
dae.genomic_resources.repository.version_tuple_to_suffix(version: tuple[int, ...]) str[source]

Transform version tuple into resource ID version suffix.

dae.genomic_resources.repository_factory module

Provides a factory for building genomic resources repositories.

dae.genomic_resources.repository_factory.build_genomic_resource_group_repository(repo_id: str, children: list[GenomicResourceRepo]) GenomicResourceRepo[source]
dae.genomic_resources.repository_factory.build_genomic_resource_repository(definition: dict | None = None, file_name: str | None = None) GenomicResourceRepo[source]

Build a GRR using a definition dict or yaml file.

dae.genomic_resources.repository_factory.build_resource_implementation(res: GenomicResource) GenomicResourceImplementation[source]

Build a resource implementation from a resource.

dae.genomic_resources.repository_factory.get_default_grr_definition() dict[str, Any][source]

Return default genomic resources repository definition.

dae.genomic_resources.repository_factory.get_default_grr_definition_path() str | None[source]

Return a path to default genomic resources repository definition.

dae.genomic_resources.repository_factory.load_definition_file(filename: str) Any[source]

Load GRR definition from a YAML file.

dae.genomic_resources.resource_implementation module

class dae.genomic_resources.resource_implementation.GenomicResourceImplementation(genomic_resource: GenomicResource)[source]

Bases: ABC

Base class used by resource implementations.

Resources are just a folder on a repository. Resource implementations are classes that know how to use the contents of the resource.

abstract add_statistics_build_tasks(task_graph: TaskGraph, **kwargs: Any) list[Task][source]

Add tasks for calculating resource statistics to a task graph.

abstract calc_info_hash() bytes[source]

Compute and return the info hash.

abstract calc_statistics_hash() bytes[source]

Compute the statistics hash.

This hash is used to decide whether the resource statistics should be recomputed.

property files: set[str]

Return the set of resource files the implementation utilises.

get_config() dict[source]
abstract get_info(**kwargs: Any) str[source]

Construct the contents of the implementation’s HTML info page.

get_statistics() ResourceStatistics | None[source]

Try and load resource statistics.

abstract get_statistics_info(**kwargs: Any) str[source]

Construct the contents of the implementation’s HTML statistics info page.

reload_statistics() ResourceStatistics | None[source]
property resource_id: str
class dae.genomic_resources.resource_implementation.InfoImplementationMixin[source]

Bases: object

Mixin that provides generic template info page generation interface.

class FileEntry(name: str, size: str, md5: str | None)[source]

Bases: object

Provides an entry into manifest object.

md5: str | None
name: str
size: str
get_info() str[source]

Construct the contents of the implementation’s HTML info page.

get_statistics_info() str[source]

Construct the contents of the implementation’s HTML statistics info page.

get_statistics_template_data() dict[source]

Return a data dictionary to be used by the statistics template.

Will transform the description in the meta section using markdown.

get_template() Template[source]
get_template_data() dict[source]

Return a data dictionary to be used by the template.

Will transform the description in the meta section using markdown.

resource: GenomicResource
class dae.genomic_resources.resource_implementation.ResourceConfigValidationMixin[source]

Bases: object

Mixin that provides validation of resource configuration.

abstract static get_schema() dict[source]

Return schema to be used for config validation.

classmethod validate_and_normalize_schema(config: dict, resource: GenomicResource) dict[source]

Validate the resource schema and return the normalized version.

class dae.genomic_resources.resource_implementation.ResourceStatistics(resource_id: str)[source]

Bases: object

Base class for statistics.

Subclasses should be created using mixins defined for each statistic type that the resource contains.

static get_statistics_folder() str[source]
dae.genomic_resources.resource_implementation.get_base_resource_schema() dict[str, Any][source]

dae.genomic_resources.testing module

Provides tools useful for testing.

dae.genomic_resources.testing.build_filesystem_test_protocol(root_path: Path, *, repair: bool = True) FsspecReadWriteProtocol[source]

Build and return a filesystem fsspec protocol for testing.

The root_path is expected to point to a directory structure with all the resources.

dae.genomic_resources.testing.build_filesystem_test_repository(root_path: Path) GenomicResourceProtocolRepo[source]

Build and return a filesystem fsspec repository for testing.

The root_path is expected to point to a directory structure with all the resources.

dae.genomic_resources.testing.build_filesystem_test_resource(root_path: Path) GenomicResource[source]
dae.genomic_resources.testing.build_http_test_protocol(root_path: Path, *, repair: bool = True) Generator[FsspecReadOnlyProtocol, None, None][source]

Populate an Apache2 directory and construct an HTTP genomic resource protocol.

Apache2 is used to serve the GRR. The root_path directory should be a valid filesystem genomic resource repository.

dae.genomic_resources.testing.build_inmemory_test_protocol(content: dict[str, Any]) FsspecReadWriteProtocol[source]

Build and return an embedded fsspec protocol for testing.

dae.genomic_resources.testing.build_inmemory_test_repository(content: dict[str, Any]) GenomicResourceProtocolRepo[source]

Create an embedded GRR repository using passed content.

dae.genomic_resources.testing.build_inmemory_test_resource(content: dict[str, Any]) GenomicResource[source]

Create a test resource based on content passed.

The passed content should be appropriate for a single resource. Example content:

    {
        "genomic_resource.yaml": textwrap.dedent('''
            type: position_score
            table:
                filename: data.txt
            scores:
            - id: aaaa
              type: float
              desc: ""
              name: sc
        '''),
        "data.txt": convert_to_tab_separated('''
            #chrom start end sc
            1      10    12  1.1
            2      13    14  1.2
        ''')
    }

dae.genomic_resources.testing.build_s3_test_bucket(s3filesystem: S3FileSystem | None = None) str[source]

Create an S3 test bucket.

dae.genomic_resources.testing.build_s3_test_filesystem(endpoint_url: str | None = None) S3FileSystem[source]

Create an S3 fsspec filesystem connected to the S3 server.

dae.genomic_resources.testing.build_s3_test_protocol(root_path: Path) Generator[FsspecReadWriteProtocol, None, None][source]

Construct fsspec genomic resource protocol.

The S3 bucket is populated with resources from the filesystem GRR pointed to by root_path.

dae.genomic_resources.testing.convert_to_tab_separated(content: str) str[source]

Convert a string into tab separated file content.

Useful for testing purposes. If you need to have a space in the file content use ‘||’.
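The conversion described above can be sketched as follows. This is a minimal stand-in written for illustration, not the library's code: the name `to_tab_separated_sketch` is hypothetical, and skipping blank lines is an assumption about convenient behaviour for triple-quoted test strings.

```python
def to_tab_separated_sketch(content: str) -> str:
    """Turn whitespace-aligned text into tab-separated lines (hypothetical)."""
    rows = []
    for line in content.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines left over from triple-quoted strings
        rows.append("\t".join(fields))
    # Only after splitting, turn the '||' placeholder into a literal space.
    return "\n".join(rows).replace("||", " ")


text = to_tab_separated_sketch('''
    #chrom  start  end  desc
    1       10     12   first||region
''')
```

Because the placeholder is replaced after the fields are split, `first||region` ends up as a single tab-delimited field containing a space.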

dae.genomic_resources.testing.copy_proto_genomic_resources(dest_proto: FsspecReadWriteProtocol, src_proto: FsspecReadOnlyProtocol) None[source]
dae.genomic_resources.testing.proto_builder(scheme: str, content: dict) Generator[FsspecReadOnlyProtocol | FsspecReadWriteProtocol, None, None][source]

Build a test genomic resource protocol with specified content.

dae.genomic_resources.testing.resource_builder(scheme: str, content: dict) Generator[GenomicResource, None, None][source]
dae.genomic_resources.testing.s3_test_protocol() FsspecReadWriteProtocol[source]

Build an S3 fsspec testing protocol on top of existing S3 server.

dae.genomic_resources.testing.s3_test_server_endpoint() str[source]
dae.genomic_resources.testing.setup_bigwig(out_path: Path, content: str, chrom_lens: dict[str, int]) Path[source]

Set up a bigwig format variants file using bedGraph-style content.

Example:

    chr1  0    100  0.0
    chr1  100  120  1.0
    chr1  125  126  200.0

dae.genomic_resources.testing.setup_dae_transmitted(root_path: Path, summary_content: str, toomany_content: str) tuple[Path, Path][source]

Set up a DAE transmitted variants file using passed content.

dae.genomic_resources.testing.setup_denovo(denovo_path: Path, content: str) Path[source]
dae.genomic_resources.testing.setup_directories(root_dir: Path, content: str | dict[str, Any]) None[source]

Set up directory and subdirectory structures using the content.
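One plausible way to turn nested content into files on disk is sketched below. The helper `write_tree_sketch` is hypothetical, and its rules (dictionaries become directories, strings become file contents) are assumptions chosen for the example rather than a description of the actual function.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any


def write_tree_sketch(root_dir: Path, content: str | dict[str, Any]) -> None:
    """Materialise nested content under root_dir (hypothetical helper)."""
    if isinstance(content, str):
        # A plain string is treated as the content of a single file.
        root_dir.parent.mkdir(parents=True, exist_ok=True)
        root_dir.write_text(content, encoding="utf-8")
        return
    root_dir.mkdir(parents=True, exist_ok=True)
    for name, value in content.items():
        # Dictionaries become subdirectories; strings become files.
        write_tree_sketch(root_dir / name, value)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "grr"
    write_tree_sketch(root, {
        "one": {"genomic_resource.yaml": "type: position_score\n"},
        "notes.txt": "hello",
    })
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

After the call, `created` lists the relative paths of the files written under the temporary root.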

dae.genomic_resources.testing.setup_empty_gene_models(out_path: Path) GeneModels[source]

Set up empty gene models.

dae.genomic_resources.testing.setup_gene_models(out_path: Path, content: str, fileformat: str | None = None, config: str | None = None) GeneModels[source]

Set up gene models in refflat format using the passed content.

dae.genomic_resources.testing.setup_genome(out_path: Path, content: str) ReferenceGenome[source]

Set up reference genome using the content.

dae.genomic_resources.testing.setup_gzip(gzip_path: Path, gzip_content: str) Path[source]

Set up a gzipped TSV file.

dae.genomic_resources.testing.setup_pedigree(ped_path: Path, content: str) Path[source]
dae.genomic_resources.testing.setup_tabix(tabix_path: Path, tabix_content: str, **kwargs: bool | str | int) tuple[str, str][source]

Set up a tabix file.

dae.genomic_resources.testing.setup_vcf(out_path: Path, content: str, *, csi: bool = False) Path[source]

Set up a VCF file using the content.

dae.genomic_resources.variant_utils module

dae.genomic_resources.variant_utils.maximally_extend_variant(chrom: str, pos: int, ref: str, alts: list[str], genome: ReferenceGenome) tuple[str, int, str, list[str]][source]

Maximally extend a variant.

dae.genomic_resources.variant_utils.normalize_variant(chrom: str, pos: int, ref: str, alts: list[str], genome: ReferenceGenome) tuple[str, int, str, list[str]][source]

Normalize a variant.

Uses the algorithm defined at https://genome.sph.umich.edu/wiki/Variant_Normalization
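The normalization procedure (trim shared trailing bases, extend to the left whenever an allele becomes empty, then trim shared leading bases) can be sketched on a plain string. The function `normalize_sketch` is a hypothetical stand-in: it takes the chromosome sequence as a `str` instead of a `ReferenceGenome`, omits the `chrom` argument, and assumes a 1-based `pos` and a reference allele that differs from the alternatives.

```python
from __future__ import annotations


def normalize_sketch(
    pos: int, ref: str, alts: list[str], sequence: str,
) -> tuple[int, str, list[str]]:
    """Left-align and trim a variant; pos is 1-based (hypothetical helper)."""
    alleles = [ref, *alts]
    changed = True
    while changed:
        changed = False
        # Drop the last base when every allele is non-empty and all end alike.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # An empty allele is repaired by extending all alleles to the left.
        if not all(alleles) and pos > 1:
            pos -= 1
            alleles = [sequence[pos - 1] + a for a in alleles]
            changed = True
    # Finally drop shared leading bases, keeping at least one base per allele.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]


reference = "GGGCACACACAGGG"
deletion = normalize_sketch(6, "CAC", ["C"], reference)
padded_snv = normalize_sketch(4, "CAC", ["CAT"], reference)
```

In this example the deletion inside the `CA` repeat is shifted to the leftmost equivalent position, and the padded substitution is reduced to a single-base change.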

Module contents

dae.genomic_resources.get_resource_implementation_builder(resource_type: str) Callable[[GenomicResource], GenomicResourceImplementation] | None[source]

Return an implementation builder for a certain resource type.

If no builder is registered for the resource type, the function searches the list of found implementations for a matching entry point. If an entry point is found, it is loaded, registered, and returned.
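The lookup order described here resembles a lazily populated registry. The sketch below shows that pattern with plain dictionaries; the names `_registered`, `_found`, and `get_builder_sketch` are hypothetical, builders take and return strings purely to keep the example self-contained, and a zero-argument callable stands in for a real entry point object.

```python
from __future__ import annotations

from typing import Callable

# Builders that are already registered, keyed by resource type.
_registered: dict[str, Callable[[str], str]] = {}
# Discovered but not yet loaded entry points: each value is a zero-argument
# loader returning the builder (a stand-in for a real entry point object).
_found: dict[str, Callable[[], Callable[[str], str]]] = {
    "position_score": lambda: (lambda resource_id: f"score:{resource_id}"),
}


def get_builder_sketch(resource_type: str) -> Callable[[str], str] | None:
    """Return a registered builder, loading it on first use if needed."""
    builder = _registered.get(resource_type)
    if builder is not None:
        return builder
    loader = _found.get(resource_type)
    if loader is None:
        return None  # nothing registered and no entry point to load
    builder = loader()
    _registered[resource_type] = builder  # cache for subsequent lookups
    return builder


first = get_builder_sketch("position_score")
second = get_builder_sketch("position_score")
missing = get_builder_sketch("unknown_type")
```

The second lookup returns the same object as the first because the loaded builder is cached in the registry, and an unknown type yields `None`.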