dae.genomic_resources package
Subpackages
- dae.genomic_resources.gene_models package
- Subpackages
- dae.genomic_resources.gene_models.tests package
- Submodules
- dae.genomic_resources.gene_models.tests.test_gene_models module
- dae.genomic_resources.gene_models.tests.test_gene_models_gtf_serialization module
- dae.genomic_resources.gene_models.tests.test_gene_models_impl module
- dae.genomic_resources.gene_models.tests.test_gene_models_resource module
- Module contents
- Submodules
- dae.genomic_resources.gene_models.gene_models module
Exon
GeneModels
GeneModels.add_transcript_model()
GeneModels.gene_models_by_gene_name()
GeneModels.gene_models_by_location()
GeneModels.gene_names()
GeneModels.get_schema()
GeneModels.is_loaded()
GeneModels.load()
GeneModels.relabel_chromosomes()
GeneModels.reset()
GeneModels.resource_id
GeneModels.update_indexes()
TranscriptModel
TranscriptModel.all_regions()
TranscriptModel.calc_frames()
TranscriptModel.cds_len()
TranscriptModel.cds_regions()
TranscriptModel.get_exon_number_for()
TranscriptModel.is_coding()
TranscriptModel.test_frames()
TranscriptModel.total_len()
TranscriptModel.update_frames()
TranscriptModel.utr3_len()
TranscriptModel.utr3_regions()
TranscriptModel.utr5_len()
TranscriptModel.utr5_regions()
build_gene_models_from_file()
build_gene_models_from_resource()
create_regions_from_genes()
join_gene_models()
- dae.genomic_resources.gene_models.parsing module
get_parser()
infer_gene_model_parser()
load_gene_mapping()
load_gene_models()
parse_ccds_gene_models_format()
parse_default_gene_models_format()
parse_gtf_gene_models_format()
parse_known_gene_models_format()
parse_raw()
parse_ref_flat_gene_models_format()
parse_ref_seq_gene_models_format()
parse_ucscgenepred_models_format()
probe_columns()
probe_file_format()
probe_header()
- dae.genomic_resources.gene_models.serialization module
build_gtf_record()
calc_frame_for_gtf_cds_feature()
collect_cds_regions()
collect_gtf_cds_regions()
collect_gtf_start_codon_regions()
collect_gtf_stop_codon_regions()
find_exon_cds_region_for_gtf_cds_feature()
gene_models_to_gtf()
gtf_canonical_index()
save_as_default_gene_models()
transcript_to_gtf()
- Module contents
Exon
GeneModels
GeneModels.add_transcript_model()
GeneModels.gene_models_by_gene_name()
GeneModels.gene_models_by_location()
GeneModels.gene_names()
GeneModels.get_schema()
GeneModels.is_loaded()
GeneModels.load()
GeneModels.relabel_chromosomes()
GeneModels.reset()
GeneModels.resource_id
GeneModels.update_indexes()
TranscriptModel
TranscriptModel.all_regions()
TranscriptModel.calc_frames()
TranscriptModel.cds_len()
TranscriptModel.cds_regions()
TranscriptModel.get_exon_number_for()
TranscriptModel.is_coding()
TranscriptModel.test_frames()
TranscriptModel.total_len()
TranscriptModel.update_frames()
TranscriptModel.utr3_len()
TranscriptModel.utr3_regions()
TranscriptModel.utr5_len()
TranscriptModel.utr5_regions()
build_gene_models_from_file()
build_gene_models_from_resource()
create_regions_from_genes()
gene_models_to_gtf()
join_gene_models()
save_as_default_gene_models()
- dae.genomic_resources.genomic_position_table package
- Subpackages
- dae.genomic_resources.genomic_position_table.tests package
- Submodules
- dae.genomic_resources.genomic_position_table.tests.test_bigwig module
- dae.genomic_resources.genomic_position_table.tests.test_genomic_position_table module
- dae.genomic_resources.genomic_position_table.tests.test_inmemory_genomic_position_table module
- dae.genomic_resources.genomic_position_table.tests.test_line_buffer module
- Module contents
- Submodules
- dae.genomic_resources.genomic_position_table.line module
- dae.genomic_resources.genomic_position_table.table module
GenomicPositionTable
GenomicPositionTable.ALT
GenomicPositionTable.CHROM
GenomicPositionTable.POS_BEGIN
GenomicPositionTable.POS_END
GenomicPositionTable.REF
GenomicPositionTable.close()
GenomicPositionTable.get_all_records()
GenomicPositionTable.get_chromosome_length()
GenomicPositionTable.get_chromosomes()
GenomicPositionTable.get_column_key()
GenomicPositionTable.get_file_chromosomes()
GenomicPositionTable.get_records_in_region()
GenomicPositionTable.map_chromosome()
GenomicPositionTable.open()
GenomicPositionTable.unmap_chromosome()
adjust_zero_based_line()
get_idx()
zero_based_adjust()
- dae.genomic_resources.genomic_position_table.table_bigwig module
- dae.genomic_resources.genomic_position_table.table_inmemory module
InmemoryGenomicPositionTable
InmemoryGenomicPositionTable.FORMAT_DEF
InmemoryGenomicPositionTable.close()
InmemoryGenomicPositionTable.get_all_records()
InmemoryGenomicPositionTable.get_chromosome_length()
InmemoryGenomicPositionTable.get_file_chromosomes()
InmemoryGenomicPositionTable.get_records_in_region()
InmemoryGenomicPositionTable.open()
- dae.genomic_resources.genomic_position_table.table_tabix module
TabixGenomicPositionTable
TabixGenomicPositionTable.BUFFER_MAXSIZE
TabixGenomicPositionTable.close()
TabixGenomicPositionTable.get_all_records()
TabixGenomicPositionTable.get_chromosome_length()
TabixGenomicPositionTable.get_chromosomes()
TabixGenomicPositionTable.get_file_chromosomes()
TabixGenomicPositionTable.get_line_iterator()
TabixGenomicPositionTable.get_records_in_region()
TabixGenomicPositionTable.open()
- dae.genomic_resources.genomic_position_table.table_vcf module
- dae.genomic_resources.genomic_position_table.utils module
- Module contents
BigWigLine
BigWigTable
Line
LineBuffer
TabixGenomicPositionTable
TabixGenomicPositionTable.BUFFER_MAXSIZE
TabixGenomicPositionTable.alt_key
TabixGenomicPositionTable.chrom_key
TabixGenomicPositionTable.chrom_map
TabixGenomicPositionTable.chrom_order
TabixGenomicPositionTable.close()
TabixGenomicPositionTable.get_all_records()
TabixGenomicPositionTable.get_chromosome_length()
TabixGenomicPositionTable.get_chromosomes()
TabixGenomicPositionTable.get_file_chromosomes()
TabixGenomicPositionTable.get_line_iterator()
TabixGenomicPositionTable.get_records_in_region()
TabixGenomicPositionTable.header
TabixGenomicPositionTable.jump_threshold
TabixGenomicPositionTable.line_iterator
TabixGenomicPositionTable.open()
TabixGenomicPositionTable.pos_begin_key
TabixGenomicPositionTable.pos_end_key
TabixGenomicPositionTable.pysam_file
TabixGenomicPositionTable.ref_key
TabixGenomicPositionTable.rev_chrom_map
TabixGenomicPositionTable.stats
VCFGenomicPositionTable
VCFGenomicPositionTable.CHROM
VCFGenomicPositionTable.POS_BEGIN
VCFGenomicPositionTable.POS_END
VCFGenomicPositionTable.alt_key
VCFGenomicPositionTable.chrom_key
VCFGenomicPositionTable.chrom_map
VCFGenomicPositionTable.chrom_order
VCFGenomicPositionTable.get_file_chromosomes()
VCFGenomicPositionTable.get_line_iterator()
VCFGenomicPositionTable.header
VCFGenomicPositionTable.jump_threshold
VCFGenomicPositionTable.line_iterator
VCFGenomicPositionTable.open()
VCFGenomicPositionTable.pos_begin_key
VCFGenomicPositionTable.pos_end_key
VCFGenomicPositionTable.pysam_file
VCFGenomicPositionTable.ref_key
VCFGenomicPositionTable.rev_chrom_map
VCFGenomicPositionTable.stats
VCFLine
build_genomic_position_table()
- dae.genomic_resources.implementations package
- Submodules
- dae.genomic_resources.implementations.annotation_pipeline_impl module
- dae.genomic_resources.implementations.gene_models_impl module
- dae.genomic_resources.implementations.genomic_scores_impl module
GenomicScoreImplementation
GenomicScoreImplementation.add_statistics_build_tasks()
GenomicScoreImplementation.calc_info_hash()
GenomicScoreImplementation.calc_statistics_hash()
GenomicScoreImplementation.files
GenomicScoreImplementation.get_config_histograms()
GenomicScoreImplementation.get_info()
GenomicScoreImplementation.get_template()
GenomicScoreImplementation.resource_id
build_score_implementation_from_resource()
- dae.genomic_resources.implementations.liftover_chain_impl module
- dae.genomic_resources.implementations.reference_genome_impl module
ChromosomeStatistic
GenomeStatistic
GenomeStatisticsMixin
ReferenceGenomeImplementation
ReferenceGenomeImplementation.add_statistics_build_tasks()
ReferenceGenomeImplementation.calc_info_hash()
ReferenceGenomeImplementation.calc_statistics_hash()
ReferenceGenomeImplementation.files
ReferenceGenomeImplementation.get_info()
ReferenceGenomeImplementation.get_statistics()
ReferenceGenomeImplementation.get_template()
ReferenceGenomeStatistics
- Module contents
- dae.genomic_resources.statistics package
- dae.genomic_resources.tests package
- Submodules
- dae.genomic_resources.tests.conftest module
- dae.genomic_resources.tests.test_aggregators module
- dae.genomic_resources.tests.test_allele_score module
- dae.genomic_resources.tests.test_annotation_pipeline_impl module
- dae.genomic_resources.tests.test_cached_repo module
- dae.genomic_resources.tests.test_caching_protocol module
- dae.genomic_resources.tests.test_cli module
- dae.genomic_resources.tests.test_cli_browse module
- dae.genomic_resources.tests.test_cli_info module
- dae.genomic_resources.tests.test_cli_manifest module
- dae.genomic_resources.tests.test_cli_repair module
- dae.genomic_resources.tests.test_cli_stats module
- dae.genomic_resources.tests.test_cnv_collection module
- dae.genomic_resources.tests.test_core_with_inmemory_repo module
- dae.genomic_resources.tests.test_draw_score_histograms module
- dae.genomic_resources.tests.test_fsspec_protocol module
- dae.genomic_resources.tests.test_fsspec_protocol_open_raw_file module
- dae.genomic_resources.tests.test_fsspec_protocol_reads module
- dae.genomic_resources.tests.test_fsspec_protocol_update_resource_file module
- dae.genomic_resources.tests.test_genomic_context module
- dae.genomic_resources.tests.test_genomic_scores module
- dae.genomic_resources.tests.test_group_repository module
- dae.genomic_resources.tests.test_histogram_categorical module
- dae.genomic_resources.tests.test_histogram_number module
- dae.genomic_resources.tests.test_implementation_plugins module
- dae.genomic_resources.tests.test_inmemory_protocol module
- dae.genomic_resources.tests.test_liftover module
- dae.genomic_resources.tests.test_liftover_chain_resource module
- dae.genomic_resources.tests.test_np_score module
- dae.genomic_resources.tests.test_position_score module
- dae.genomic_resources.tests.test_reference_genome_resource module
- dae.genomic_resources.tests.test_repository_factory module
- dae.genomic_resources.tests.test_repository_helpers module
- dae.genomic_resources.tests.test_resource_state_and_manifest module
- dae.genomic_resources.tests.test_score_statistics_state module
- dae.genomic_resources.tests.test_testing module
- dae.genomic_resources.tests.test_the_fixture_repo module
- dae.genomic_resources.tests.test_variant_utils module
- dae.genomic_resources.tests.test_vcf_info_score module
- Module contents
Submodules
dae.genomic_resources.aggregators module
- class dae.genomic_resources.aggregators.Aggregator[source]
Bases:
ABC
Base class for score aggregators.
- class dae.genomic_resources.aggregators.ConcatAggregator[source]
Bases:
Aggregator
Aggregator that concatenates all passed values.
- class dae.genomic_resources.aggregators.DictAggregator[source]
Bases:
Aggregator
Aggregator that builds a dictionary of all passed values.
- class dae.genomic_resources.aggregators.JoinAggregator(separator: str)[source]
Bases:
Aggregator
Aggregator that joins all passed values using a separator.
- class dae.genomic_resources.aggregators.ListAggregator[source]
Bases:
Aggregator
Aggregator that builds a list of all passed values.
- class dae.genomic_resources.aggregators.MaxAggregator[source]
Bases:
Aggregator
Maximum value aggregator for genomic scores.
- class dae.genomic_resources.aggregators.MeanAggregator[source]
Bases:
Aggregator
Aggregator for genomic scores that calculates mean value.
- class dae.genomic_resources.aggregators.MedianAggregator[source]
Bases:
Aggregator
Aggregator for genomic scores that calculates median value.
- class dae.genomic_resources.aggregators.MinAggregator[source]
Bases:
Aggregator
Minimum value aggregator for genomic scores.
- class dae.genomic_resources.aggregators.ModeAggregator[source]
Bases:
Aggregator
Aggregator for genomic scores that calculates mode value.
- dae.genomic_resources.aggregators.build_aggregator(aggregator_type: str) Aggregator [source]
- dae.genomic_resources.aggregators.create_aggregator(aggregator_def: dict[str, Any]) Aggregator [source]
Create an aggregator by aggregator definition.
- dae.genomic_resources.aggregators.create_aggregator_definition(aggregator_type: str) dict[str, Any] [source]
Parse an aggregator definition string.
- dae.genomic_resources.aggregators.get_aggregator_class(aggregator: str) Callable[[], Aggregator] [source]
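The aggregator classes above share one pattern: values are fed in one at a time and a single result is read out at the end. The following is a minimal standalone sketch of that pattern, not the `dae.genomic_resources.aggregators` code. The method names `add` and `get_final`, the choice to skip `None` values, the `"join(;)"` definition syntax and the helper `build_aggregator_sketch` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from statistics import mean, median
from typing import Any, Callable


class Aggregator(ABC):
    """Collects values one at a time and reports a single result."""

    def __init__(self) -> None:
        self.values: list[Any] = []

    def add(self, value: Any) -> None:
        # Assumption: missing scores (None) are skipped.
        if value is not None:
            self.values.append(value)

    @abstractmethod
    def get_final(self) -> Any:
        ...


class MaxAggregator(Aggregator):
    def get_final(self) -> Any:
        return max(self.values) if self.values else None


class MeanAggregator(Aggregator):
    def get_final(self) -> Any:
        return mean(self.values) if self.values else None


class MedianAggregator(Aggregator):
    def get_final(self) -> Any:
        return median(self.values) if self.values else None


class JoinAggregator(Aggregator):
    def __init__(self, separator: str) -> None:
        super().__init__()
        self.separator = separator

    def get_final(self) -> str:
        return self.separator.join(str(v) for v in self.values)


# Hypothetical registry mapping type strings to zero-argument factories.
_AGGREGATORS: dict[str, Callable[[], Aggregator]] = {
    "max": MaxAggregator,
    "mean": MeanAggregator,
    "median": MedianAggregator,
}


def build_aggregator_sketch(aggregator_type: str) -> Aggregator:
    """Build an aggregator from a type string such as "max" or "join(;)"."""
    if aggregator_type.startswith("join(") and aggregator_type.endswith(")"):
        return JoinAggregator(aggregator_type[len("join("):-1])
    try:
        return _AGGREGATORS[aggregator_type]()
    except KeyError:
        raise ValueError(f"unknown aggregator type: {aggregator_type}") from None
```

Usage: create one aggregator per score with `build_aggregator_sketch("mean")`, call `add` for every value in the region, then call `get_final` once.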
dae.genomic_resources.cached_repository module
Provides caching genomic resources.
- class dae.genomic_resources.cached_repository.CacheResource(resource: GenomicResource, protocol: CachingProtocol)[source]
Bases:
GenomicResource
Represents resources stored in cache.
- class dae.genomic_resources.cached_repository.CachingProtocol(remote_protocol: ReadOnlyRepositoryProtocol, local_protocol: FsspecReadWriteProtocol)[source]
Bases:
ReadOnlyRepositoryProtocol
Defines caching GRR repository protocol.
- file_exists(resource: GenomicResource, filename: str) bool [source]
Check if the given file exists in the given resource.
- get_all_resources() Generator[GenomicResource, None, None] [source]
Return generator for all resources in the repository.
- get_resource_file_url(resource: GenomicResource, filename: str) str [source]
Return url of a file in the resource.
- get_resource_url(resource: GenomicResource) str [source]
Return the URL of the specified resource.
- load_manifest(resource: GenomicResource) Manifest [source]
Load resource manifest.
- open_bigwig_file(resource: GenomicResource, filename: str) Any [source]
Open a bigwig file in a resource and return it.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO [source]
Open a file in a resource and return a file-like object.
- open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile [source]
Open a tabix file in a resource and return a pysam tabix file.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile [source]
Open a vcf file in a resource and return a pysam VariantFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- refresh_cached_resource(resource: GenomicResource) None [source]
Refresh all resource files in the cache if necessary.
- refresh_cached_resource_file(resource: GenomicResource, filename: str) tuple[str, str] [source]
Refresh a resource file in the cache if necessary.
- class dae.genomic_resources.cached_repository.GenomicResourceCachedRepo(child: GenomicResourceRepo, cache_url: str, **kwargs: str | None)[source]
Bases:
GenomicResourceRepo
Defines caching genomic resources repository.
- find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None [source]
Return requested resource or None if not found.
- get_all_resources() Generator[GenomicResource, None, None] [source]
Return a generator over all resources in the repository.
- get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource [source]
Return one resource with id equal to resource_id.
If the resource is not found, an exception is raised.
- dae.genomic_resources.cached_repository.cache_resources(repository: GenomicResourceRepo, resource_ids: Iterable[str] | None, workers: int | None = None) None [source]
Cache resources from a list of remote resource IDs.
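`refresh_cached_resource_file` refreshes a cached file only when necessary. Below is a minimal sketch of one way to make that decision, assuming the remote manifest records an MD5 checksum per file and the cache lives on the local filesystem. The helpers `md5_of` and `refresh_cached_file_sketch` and their arguments are hypothetical; this does not reproduce the `CachingProtocol` internals.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as infile:
        for chunk in iter(lambda: infile.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_cached_file_sketch(
    remote_manifest: dict[str, str],
    remote_root: Path,
    cache_root: Path,
    filename: str,
) -> bool:
    """Copy `filename` into the cache unless an up-to-date copy exists.

    `remote_manifest` maps file names to expected MD5 digests (assumption).
    Returns True if the file was (re)copied, False on a cache hit.
    """
    expected_md5 = remote_manifest[filename]
    cached = cache_root / filename
    if cached.exists() and md5_of(cached) == expected_md5:
        return False  # cached copy matches the manifest
    data = (remote_root / filename).read_bytes()
    # Refuse to cache a remote file that does not match its manifest entry.
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError(f"checksum mismatch for {filename}")
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return True
```

Comparing checksums rather than timestamps keeps the decision independent of clock differences between the remote and local filesystems, at the cost of hashing the cached file on each check.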
dae.genomic_resources.cli module
Provides CLI for management of genomic resources repositories.
- dae.genomic_resources.cli.cli_browse(cli_args: list[str] | None = None) None [source]
Provide CLI for repository browsing.
- dae.genomic_resources.cli.cli_manage(cli_args: list[str] | None = None) None [source]
Provide CLI for repository management.
- dae.genomic_resources.cli.collect_dvc_entries(proto: ReadWriteRepositoryProtocol, res: GenomicResource) dict[str, ManifestEntry] [source]
Collect manifest entries defined by .dvc files.
dae.genomic_resources.cnv_collection module
- class dae.genomic_resources.cnv_collection.CNV(chrom: str, pos_begin: int, pos_end: int, attributes: dict[str, Any])[source]
Bases:
object
Copy number object from a cnv_collection.
- attributes: dict[str, Any]
- chrom: str
- pos_begin: int
- pos_end: int
- property size: int
- class dae.genomic_resources.cnv_collection.CnvCollection(resource: GenomicResource)[source]
Bases:
object
A collection of CNVs.
- fetch_cnvs(chrom: str, start: int, stop: int) list[CNV] [source]
Return list of CNVs that overlap with the provided region.
- open() CnvCollection [source]
Open the CNV collection resource and return it.
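`CNV.size` and `CnvCollection.fetch_cnvs` can be illustrated with a small standalone sketch. It assumes 1-based closed intervals (so `size` is `pos_end - pos_begin + 1`) and treats a CNV as overlapping a region when the two intervals share at least one position; both conventions are assumptions. The plain list passed to the hypothetical `fetch_cnvs_sketch` stands in for the resource's data file.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CNV:
    chrom: str
    pos_begin: int
    pos_end: int
    attributes: dict[str, Any] = field(default_factory=dict)

    @property
    def size(self) -> int:
        # Closed interval [pos_begin, pos_end] (assumed convention).
        return self.pos_end - self.pos_begin + 1


def fetch_cnvs_sketch(
    cnvs: list[CNV], chrom: str, start: int, stop: int
) -> list[CNV]:
    """Return CNVs on `chrom` that overlap the closed region [start, stop]."""
    return [
        cnv for cnv in cnvs
        if cnv.chrom == chrom and cnv.pos_begin <= stop and cnv.pos_end >= start
    ]
```

Two closed intervals overlap exactly when each one begins no later than the other ends, which is the single comparison pair used in the filter.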
- class dae.genomic_resources.cnv_collection.CnvCollectionImplementation(genomic_resource: GenomicResource)[source]
Bases:
GenomicResourceImplementation, InfoImplementationMixin
Assists in the management of resource of type cnv_collection.
- add_statistics_build_tasks(task_graph: TaskGraph, **kwargs: str) list[Task] [source]
Add tasks for calculating resource statistics to a task graph.
- calc_statistics_hash() bytes [source]
Compute the statistics hash.
This hash is used to decide whether the resource statistics should be recomputed.
- property files: set[str]
Return the set of resource files the implementation uses.
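`calc_statistics_hash` lets an implementation skip recomputing statistics when nothing relevant has changed. Below is a minimal sketch of the idea, assuming the hash covers the statistics-related part of the resource configuration plus the MD5 checksums of the files the statistics are built from. `statistics_hash_sketch` and `needs_recompute` are hypothetical helpers, not the `CnvCollectionImplementation` code.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def statistics_hash_sketch(
    config: dict[str, Any], file_md5s: dict[str, str]
) -> bytes:
    """Hash the inputs that determine the resource statistics."""
    payload = {
        "config": config,
        # Sort so that the hash does not depend on dict insertion order.
        "files": sorted(file_md5s.items()),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).digest()


def needs_recompute(stored_hash: bytes | None, current_hash: bytes) -> bool:
    """Rebuild statistics when no hash is stored or the inputs changed."""
    return stored_hash is None or stored_hash != current_hash
```

Serializing with `sort_keys=True` makes the hash a function of the content only, so reordering keys in the configuration does not trigger a spurious rebuild.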
dae.genomic_resources.draw_score_histograms module
dae.genomic_resources.fsspec_protocol module
Provides GRR protocols based on fsspec library.
- class dae.genomic_resources.fsspec_protocol.FsspecReadOnlyProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)[source]
Bases:
ReadOnlyRepositoryProtocol
Provides fsspec genomic resources repository protocol.
- file_exists(resource: GenomicResource, filename: str) bool [source]
Check if the given file exists in the given resource.
- get_all_resources() Generator[GenomicResource, None, None] [source]
Return generator over all resources in the repository.
- load_manifest(resource: GenomicResource) Manifest [source]
Load resource manifest.
- open_bigwig_file(resource: GenomicResource, filename: str) Any [source]
Open a bigwig file in a resource and return it.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO [source]
Open a file in a resource and return a file-like object.
- open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile [source]
Open a tabix file in a resource and return a pysam tabix file.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile [source]
Open a vcf file in a resource and return a pysam VariantFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- class dae.genomic_resources.fsspec_protocol.FsspecReadWriteProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)[source]
Bases:
FsspecReadOnlyProtocol, ReadWriteRepositoryProtocol
Provides fsspec genomic resources repository protocol.
- build_content_file() list[dict[str, Any]] [source]
Build the content of the repository (i.e. the ‘.CONTENTS’ file).
- collect_all_resources() Generator[GenomicResource, None, None] [source]
Return generator over all resources managed by this protocol.
- collect_resource_entries(resource: GenomicResource) Manifest [source]
Scan the resource and return a manifest.
- copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None [source]
Copy a resource file into the repository.
- delete_resource_file(resource: GenomicResource, filename: str) None [source]
Delete a resource file and its internal state.
- get_all_resources() Generator[GenomicResource, None, None] [source]
Return generator over all resources in the repository.
- get_resource_file_size(resource: GenomicResource, filename: str) int [source]
Return the size of a resource file.
- get_resource_file_timestamp(resource: GenomicResource, filename: str) float [source]
Return the timestamp (ISO formatted) of a resource file.
- load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None [source]
Load resource file state from internal GRR state.
If the specified resource file has no internal state, returns None.
- obtain_resource_file_lock(resource: GenomicResource, filename: str) AbstractContextManager [source]
Lock a resource’s file.
- save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None [source]
Save resource file state into internal GRR state.
- update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None [source]
Update a resource file in the repository if needed.
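`load_resource_file_state`, `save_resource_file_state` and `update_resource_file` suggest that the read-write protocol keeps per-file state so unchanged files need not be re-hashed or re-copied. The sketch below illustrates that idea with a hypothetical `FileState` record of size, modification time and MD5. The field names and the rule "reuse the stored MD5 while size and timestamp are unchanged" are assumptions, not the `ResourceFileState` definition.

```python
from __future__ import annotations

import hashlib
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileState:
    filename: str
    size: int
    timestamp: float
    md5: str


def current_state(root: Path, filename: str, saved: FileState | None) -> FileState:
    """Return an up-to-date state, re-hashing only when the file changed."""
    stat = os.stat(root / filename)
    unchanged = (
        saved is not None
        and saved.size == stat.st_size
        and saved.timestamp == stat.st_mtime
    )
    if unchanged:
        return saved  # size and mtime match: trust the stored MD5
    md5 = hashlib.md5((root / filename).read_bytes()).hexdigest()
    return FileState(filename, stat.st_size, stat.st_mtime, md5)


def needs_update(remote: FileState, local: FileState | None) -> bool:
    """A destination file is refreshed when it is missing or its content differs."""
    return local is None or local.md5 != remote.md5
```

The trade-off of this shortcut is that an edit preserving both size and modification time goes unnoticed until the state is discarded and the file is hashed again.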
- dae.genomic_resources.fsspec_protocol.build_fsspec_protocol(proto_id: str, root_url: str, **kwargs: str | None) FsspecReadOnlyProtocol | FsspecReadWriteProtocol [source]
Create fsspec GRR protocol based on the root url.
- dae.genomic_resources.fsspec_protocol.build_inmemory_protocol(proto_id: str, root_path: str, content: dict[str, Any]) FsspecReadWriteProtocol [source]
Build and return an embedded fsspec protocol for testing.
- dae.genomic_resources.fsspec_protocol.build_local_resource(dirname: str, config: dict[str, Any]) GenomicResource [source]
Build a resource from a local filesystem directory.
dae.genomic_resources.genomic_context module
- class dae.genomic_resources.genomic_context.CLIGenomicContext(context_objects: dict[str, Any], source: tuple[str, ...])[source]
Bases:
SimpleGenomicContext
Defines CLI genomics context.
- static add_context_arguments(parser: ArgumentParser) None [source]
Add command line arguments to the argument parser.
- static context_builder(args: Namespace) CLIGenomicContext [source]
Build a CLI genomic context.
- class dae.genomic_resources.genomic_context.DefaultRepositoryContextProvider[source]
Bases:
SimpleGenomicContextProvider
Genomic context provider for default GRR.
- static context_builder() GenomicContext [source]
- class dae.genomic_resources.genomic_context.GenomicContext[source]
Bases:
ABC
Abstract base class for genomic context.
- abstract get_context_keys() set[str] [source]
Return set of all keys that could be found in the context.
- abstract get_context_object(key: str) Any | None [source]
Return a genomic context object corresponding to the passed key.
If there is no such object, returns None.
- get_gene_models() GeneModels | None [source]
Return gene models from context.
- get_genomic_resources_repository() GenomicResourceRepo | None [source]
Return genomic resources repository from context.
- get_reference_genome() ReferenceGenome | None [source]
Return reference genome from context.
- class dae.genomic_resources.genomic_context.GenomicContextProvider[source]
Bases:
ABC
Abstract base class for genomic contexts provider.
- abstract get_contexts() Iterable[GenomicContext] [source]
- class dae.genomic_resources.genomic_context.PriorityGenomicContext(contexts: Iterable[GenomicContext])[source]
Bases:
GenomicContext
Defines a priority genomic context.
- class dae.genomic_resources.genomic_context.SimpleGenomicContext(context_objects: dict[str, Any], source: tuple[str, ...])[source]
Bases:
GenomicContext
Simple implementation of genomic context.
- class dae.genomic_resources.genomic_context.SimpleGenomicContextProvider(context_builder: Callable[[], GenomicContext | None], provider_type: str, priority: int)[source]
Bases:
GenomicContextProvider
Simple implementation of genomic contexts provider.
- get_contexts() Iterable[GenomicContext] [source]
- dae.genomic_resources.genomic_context.get_genomic_context() GenomicContext [source]
- dae.genomic_resources.genomic_context.register_context(context: GenomicContext) None [source]
- dae.genomic_resources.genomic_context.register_context_provider(context_provider: GenomicContextProvider) None [source]
Register genomic context provider.
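`PriorityGenomicContext` combines several contexts into one. Below is a minimal sketch of the lookup, assuming contexts are consulted in priority order so the first non-`None` object wins, and that `get_context_keys` is the union of the children's keys. The class names mirror the documented ones, but the bodies are illustrative only: `SimpleGenomicContext` here omits the documented `source` argument, and the key strings in the usage note are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Iterable


class GenomicContext(ABC):
    @abstractmethod
    def get_context_keys(self) -> set[str]: ...

    @abstractmethod
    def get_context_object(self, key: str) -> Any | None: ...


class SimpleGenomicContext(GenomicContext):
    def __init__(self, context_objects: dict[str, Any]) -> None:
        self._objects = context_objects

    def get_context_keys(self) -> set[str]:
        return set(self._objects)

    def get_context_object(self, key: str) -> Any | None:
        return self._objects.get(key)


class PriorityGenomicContext(GenomicContext):
    def __init__(self, contexts: Iterable[GenomicContext]) -> None:
        # Earlier contexts take precedence over later ones.
        self.contexts = list(contexts)

    def get_context_keys(self) -> set[str]:
        keys: set[str] = set()
        for context in self.contexts:
            keys |= context.get_context_keys()
        return keys

    def get_context_object(self, key: str) -> Any | None:
        for context in self.contexts:
            obj = context.get_context_object(key)
            if obj is not None:
                return obj
        return None
```

For example, placing a context built from command-line arguments before one built from the default repository lets a user override the reference genome while still falling back to the defaults for everything else.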
dae.genomic_resources.genomic_scores module
- class dae.genomic_resources.genomic_scores.AlleleScore(resource: GenomicResource)[source]
Bases:
GenomicScore
Defines allele genomic scores.
- fetch_scores(chrom: str, position: int, reference: str, alternative: str, scores: list[str] | None = None) list[Any] | None [source]
Fetch score values for a specific allele.
- fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[AlleleScoreQuery] | None = None) list[Aggregator] [source]
Fetch score values in a region and aggregate them.
- open() AlleleScore [source]
Open the genomic score resource and return it.
- class dae.genomic_resources.genomic_scores.AlleleScoreAggr(score: 'str', position_aggregator: 'Aggregator', allele_aggregator: 'Aggregator')[source]
Bases:
object
- allele_aggregator: Aggregator
- position_aggregator: Aggregator
- score: str
- class dae.genomic_resources.genomic_scores.AlleleScoreQuery(score: 'str', position_aggregator: 'str | None' = None, allele_aggregator: 'str | None' = None)[source]
Bases:
object
- allele_aggregator: str | None = None
- position_aggregator: str | None = None
- score: str
- class dae.genomic_resources.genomic_scores.GenomicScore(resource: GenomicResource)[source]
Bases:
ResourceConfigValidationMixin
Genomic scores base class.
PositionScore, NPScore and AlleleScore inherit from this class. Statistics builder implementation uses only GenomicScore interface to build all defined statistics.
- fetch_region(chrom: str, pos_begin: int | None, pos_end: int | None, scores: Iterable[str]) Iterator[dict[str, str | int | float | bool | None]] [source]
Return score values in a region.
- get_default_annotation_attribute(score_id: str) str | None [source]
Return default annotation attribute for a score.
Returns None if the score is not included in the default annotation. Otherwise, returns the configured attribute name, or the score ID if no attribute name is configured.
- get_histogram_filename(score_id: str) str [source]
Return the histogram filename for a genomic score.
- get_number_range(score_id: str) tuple[float, float] | None [source]
Return the value range for a number score.
- get_score_histogram(score_id: str) NullHistogram | CategoricalHistogram | NumberHistogram [source]
Return defined histogram for a score.
- open() GenomicScore [source]
Open the genomic score resource and return it.
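The rule described for `get_default_annotation_attribute` can be written out as a small function. This sketch assumes a simplified configuration in which the default annotation is a list of entries, each with a `source` score ID and an optional `name`. Those key names and the helper `default_annotation_attribute_sketch` are assumptions for illustration, not the documented resource schema.

```python
from __future__ import annotations

from typing import Any


def default_annotation_attribute_sketch(
    default_annotation: list[dict[str, Any]], score_id: str
) -> str | None:
    """Return the attribute name used for `score_id` in the default annotation.

    - None if the score is not part of the default annotation;
    - the configured attribute name if one is given;
    - otherwise the score ID itself.
    """
    for entry in default_annotation:
        if entry.get("source") == score_id:
            return entry.get("name") or score_id
    return None
```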
- class dae.genomic_resources.genomic_scores.NPScore(resource: GenomicResource)[source]
Bases:
GenomicScore
Defines nucleotide-position genomic score.
- fetch_scores(chrom: str, position: int, reference: str, alternative: str, scores: list[str] | None = None) list[Any] | None [source]
Fetch score values at specified genomic position and nucleotide.
- fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[NPScoreQuery] | None = None) list[Aggregator] [source]
Fetch score values in a region and aggregate them.
- class dae.genomic_resources.genomic_scores.NPScoreAggr(score: 'str', position_aggregator: 'Aggregator', nucleotide_aggregator: 'Aggregator')[source]
Bases:
object
- nucleotide_aggregator: Aggregator
- position_aggregator: Aggregator
- score: str
- class dae.genomic_resources.genomic_scores.NPScoreQuery(score: 'str', position_aggregator: 'str | None' = None, nucleotide_aggregator: 'str | None' = None)[source]
Bases:
object
- nucleotide_aggregator: str | None = None
- position_aggregator: str | None = None
- score: str
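`NPScore.fetch_scores_agg` aggregates at two levels, as the `nucleotide_aggregator` and `position_aggregator` fields of `NPScoreQuery` indicate. Below is a standalone sketch, assuming values are first aggregated across the alternative nucleotides at each position and the per-position results are then aggregated across the region. Plain callables (`max`, `statistics.mean`) stand in for the `Aggregator` classes. The input layout, the helper name `np_score_agg_sketch` and the default aggregator choices are all hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Sequence

# Hypothetical input layout: position -> {alternative nucleotide: score value}
NPValues = dict[int, dict[str, float]]


def np_score_agg_sketch(
    values: NPValues,
    pos_begin: int,
    pos_end: int,
    nucleotide_aggregator: Callable[[Sequence[float]], float] = max,
    position_aggregator: Callable[[Sequence[float]], float] = mean,
) -> float | None:
    """Aggregate over nucleotides at each position, then over positions."""
    per_position = []
    for pos in range(pos_begin, pos_end + 1):
        by_alt = values.get(pos)
        if by_alt:  # skip positions that have no scores
            per_position.append(nucleotide_aggregator(list(by_alt.values())))
    if not per_position:
        return None
    return position_aggregator(per_position)
```

The same two-level shape applies to `AlleleScoreQuery`, with an allele aggregator in place of the nucleotide aggregator.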
- class dae.genomic_resources.genomic_scores.PositionScore(resource: GenomicResource)[source]
Bases:
GenomicScore
Defines position genomic score.
- fetch_scores(chrom: str, position: int, scores: list[str] | None = None) list[Any] | None [source]
Fetch score values at specific genomic position.
- fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[PositionScoreQuery] | None = None) list[Aggregator] [source]
Fetch score values in a region and aggregate them.
- Case 1: res.fetch_scores_agg("1", 10, 20) -> all scores with their default aggregators.
- Case 2: res.fetch_scores_agg("1", 10, 20, non_default_aggregators={"bla": "max"}) -> all scores with their default aggregators, except that 'bla' uses 'max'.
- open() PositionScore [source]
Open the genomic score resource and return it.
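The two cases in the `PositionScore.fetch_scores_agg` docstring amount to "use each score's configured default aggregator unless the caller overrides it". Below is a standalone sketch of that resolution. The `defaults` mapping of score IDs to aggregator names, the `overrides` mapping, the `AGGREGATORS` table and the helper `position_score_agg_sketch` are all hypothetical. Each row stands for the score values at one position in the region.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Iterable, Sequence

AGGREGATORS: dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,
    "min": min,
    "mean": mean,
}


def position_score_agg_sketch(
    rows: Iterable[dict[str, float | None]],
    defaults: dict[str, str],
    overrides: dict[str, str] | None = None,
) -> dict[str, float | None]:
    """Aggregate every score over `rows`, honouring per-score overrides."""
    overrides = overrides or {}
    rows = list(rows)
    result: dict[str, float | None] = {}
    for score_id, default_agg in defaults.items():
        # An override replaces the default aggregator for that score only.
        agg = AGGREGATORS[overrides.get(score_id, default_agg)]
        values = [row[score_id] for row in rows if row.get(score_id) is not None]
        result[score_id] = agg(values) if values else None
    return result
```

Usage mirroring the docstring: `position_score_agg_sketch(rows, defaults)` corresponds to Case 1, and passing `overrides={"bla": "max"}` corresponds to Case 2.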
- class dae.genomic_resources.genomic_scores.PositionScoreAggr(score: 'str', position_aggregator: 'Aggregator')[source]
Bases:
object
- position_aggregator: Aggregator
- score: str
- class dae.genomic_resources.genomic_scores.PositionScoreQuery(score: 'str', position_aggregator: 'str | None' = None)[source]
Bases:
object
- position_aggregator: str | None = None
- score: str
- class dae.genomic_resources.genomic_scores.ScoreDef(score_id: str, desc: str, value_type: str, pos_aggregator: str | None, nuc_aggregator: str | None, allele_aggregator: str | None, small_values_desc: str | None, large_values_desc: str | None, hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None)[source]
Bases:
object
Score configuration definition.
- allele_aggregator: str | None
- desc: str
- hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None
- large_values_desc: str | None
- nuc_aggregator: str | None
- pos_aggregator: str | None
- score_id: str
- small_values_desc: str | None
- value_type: str
- class dae.genomic_resources.genomic_scores.ScoreLine(line: LineBase, score_defs: dict[str, _ScoreDef])[source]
Bases:
object
Abstraction for a genomic score line. Wraps the line adapter.
- property alt: str | None
- property chrom: str
- property pos_begin: int
- property pos_end: int
- property ref: str | None
- dae.genomic_resources.genomic_scores.build_score_from_resource(resource: GenomicResource) GenomicScore [source]
Build the genomic score corresponding to the given resource and return it.
dae.genomic_resources.group_repository module
Provides group genomic resources repository.
- class dae.genomic_resources.group_repository.GenomicResourceGroupRepo(children: list[GenomicResourceRepo], repo_id: str | None = None)[source]
Bases:
GenomicResourceRepo
Defines group genomic resources repository.
- find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None [source]
Return one resource with id equal to resource_id.
If the resource is not found, None is returned.
- get_all_resources() Generator[GenomicResource, None, None] [source]
Return a generator over all resources in the repository.
- get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource [source]
Return one resource with id equal to resource_id.
If the resource is not found, an exception is raised.
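`GenomicResourceGroupRepo` searches its child repositories. Below is a minimal sketch, assuming children are searched in the order given and the first match wins, with each child modelled as a plain dict from resource ID to resource. Version constraints and `repository_id` filtering are left out, `get_all_resources` does not deduplicate resources that appear in several children, and both the class name `GroupRepoSketch` and the `KeyError` raised on a miss are assumptions.

```python
from __future__ import annotations

from typing import Any, Iterator


class GroupRepoSketch:
    def __init__(self, children: list[dict[str, Any]]) -> None:
        self.children = children

    def find_resource(self, resource_id: str) -> Any | None:
        """Return the resource from the first child that has it, else None."""
        for child in self.children:
            if resource_id in child:
                return child[resource_id]
        return None

    def get_resource(self, resource_id: str) -> Any:
        """Like find_resource, but raise if the resource is missing."""
        resource = self.find_resource(resource_id)
        if resource is None:
            raise KeyError(f"resource {resource_id!r} not found")
        return resource

    def get_all_resources(self) -> Iterator[Any]:
        """Yield resources from every child repository in turn."""
        for child in self.children:
            yield from child.values()
```

Putting a local repository ahead of a remote one in `children` makes local copies shadow remote resources with the same ID.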
dae.genomic_resources.histogram module
Handling of genomic score statistics.
Currently, only genomic score histograms are supported.
- class dae.genomic_resources.histogram.CategoricalHistogram(config: CategoricalHistogramConfig, counter: dict[str | int, int] | None = None)[source]
Bases:
Statistic
Class for categorical data histograms.
- UNIQUE_VALUES_LIMIT = 100
- add_value(value: str | int | None) None [source]
Add a value to the categorical histogram.
Returns true if successfully added and false if failed. Will fail if too many values are accumulated.
- static deserialize(content: str) CategoricalHistogram [source]
Create a statistic from serialized data.
- property display_values: dict[str | int, int]
Return categorical histogram display values in order.
- static from_dict(data: dict[str, Any]) CategoricalHistogram [source]
- plot(outfile: IO, score_id: str, small_values_description: str | None = None, large_values_description: str | None = None) None [source]
Plot histogram and save it into outfile.
- property raw_values: dict[str | int, int]
- type = 'categorical_histogram'
- class dae.genomic_resources.histogram.CategoricalHistogramConfig(displayed_values_count: int | None = 20, displayed_values_percent: float | None = None, value_order: list[str | int] | None = None, y_log_scale: bool = False, plot_function: str | None = None, enforce_type: bool = True)[source]
Bases:
object
Configuration class for categorical histograms.
- static default_config() CategoricalHistogramConfig [source]
- displayed_values_count: int | None = 20
- displayed_values_percent: float | None = None
- enforce_type: bool = True
- static from_dict(parsed: dict[str, Any]) CategoricalHistogramConfig [source]
Create a categorical histogram config from a configuration dict.
- plot_function: str | None = None
- value_order: list[str | int] | None = None
- y_log_scale: bool = False
- exception dae.genomic_resources.histogram.HistogramError[source]
Bases:
BaseException
Class used for histogram specific errors.
Histograms should be nullified when a HistogramError occurs.
- class dae.genomic_resources.histogram.HistogramStatisticMixin[source]
Bases:
object
Mixin for creating statistics classes with histograms.
- class dae.genomic_resources.histogram.NullHistogram(config: NullHistogramConfig | None)[source]
Bases:
Statistic
Class for annulled histograms.
- static deserialize(content: str) NullHistogram [source]
Create a statistic from serialized data.
- static from_dict(data: dict[str, Any]) NullHistogram [source]
Build a null histogram from a dict.
- type = 'null_histogram'
- class dae.genomic_resources.histogram.NullHistogramConfig(reason: str)[source]
Bases:
object
Configuration class for null histograms.
- static default_config() NullHistogramConfig [source]
- static from_dict(parsed: dict[str, Any]) NullHistogramConfig [source]
Create a null histogram config from a configuration dict.
- reason: str
- class dae.genomic_resources.histogram.NumberHistogram(config: NumberHistogramConfig, bins: ndarray | None = None, bars: ndarray | None = None)[source]
Bases:
Statistic
Class to represent a histogram.
- static deserialize(content: str) NumberHistogram [source]
Create a statistic from serialized data.
- static from_dict(data: dict[str, Any]) NumberHistogram [source]
Build a number histogram from a dict.
- plot(outfile: IO, score_id: str, small_values_description: str | None = None, large_values_description: str | None = None) None [source]
Plot histogram and save it into outfile.
- type = 'number_histogram'
- class dae.genomic_resources.histogram.NumberHistogramConfig(view_range: tuple[float | None, float | None], number_of_bins: int = 100, x_log_scale: bool = False, y_log_scale: bool = False, x_min_log: float | None = None, plot_function: str | None = None)[source]
Bases:
object
Configuration class for number histograms.
- static default_config(min_max: MinMaxValue | None) NumberHistogramConfig [source]
Build a default number histogram config based on the optional min_max value.
- static from_dict(parsed: dict[str, Any]) NumberHistogramConfig [source]
Build a number histogram config from a parsed YAML configuration dict.
- number_of_bins: int = 100
- plot_function: str | None = None
- view_range: tuple[float | None, float | None]
- x_log_scale: bool = False
- x_min_log: float | None = None
- y_log_scale: bool = False
- dae.genomic_resources.histogram.build_default_histogram_conf(value_type: str, **kwargs: Any) NumberHistogramConfig | CategoricalHistogramConfig | NullHistogramConfig [source]
Build default histogram config for given value type.
- dae.genomic_resources.histogram.build_empty_histogram(config: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig) NumberHistogram | CategoricalHistogram | NullHistogram [source]
Create an empty histogram from a histogram config.
- dae.genomic_resources.histogram.build_histogram_config(config: dict[str, Any] | None) NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None [source]
Create a histogram config from a configuration dict.
- dae.genomic_resources.histogram.load_histogram(resource: GenomicResource, filename: str) NullHistogram | CategoricalHistogram | NumberHistogram [source]
Load and return a histogram from a resource.
On an error or missing histogram, an appropriate NullHistogram is returned.
- dae.genomic_resources.histogram.plot_histogram(res: GenomicResource, image_filename: str, hist: NullHistogram | CategoricalHistogram | NumberHistogram, score_id: str, small_values_desc: str | None = None, large_values_desc: str | None = None) None [source]
Plot histogram and save it into the resource.
- dae.genomic_resources.histogram.save_histogram(resource: GenomicResource, filename: str, histogram: NullHistogram | CategoricalHistogram | NumberHistogram) None [source]
Save histogram into a resource.
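A minimal sketch of the counting behaviour described for CategoricalHistogram. It assumes two things: adding fails once more than UNIQUE_VALUES_LIMIT distinct values would be stored, and the display values are the displayed_values_count most frequent values. ToyCategoricalHistogram and TooManyValuesError are hypothetical names, and the real class may differ, for example in how it reports failure or orders values.

```python
from __future__ import annotations

from collections import Counter


class TooManyValuesError(Exception):
    """Hypothetical error type used only in this sketch."""


class ToyCategoricalHistogram:
    """Counts categorical values and exposes the most frequent ones."""

    def __init__(self, limit: int = 100, displayed_values_count: int = 20) -> None:
        self.limit = limit  # plays the role of UNIQUE_VALUES_LIMIT
        self.displayed_values_count = displayed_values_count
        self.counter: Counter = Counter()

    def add_value(self, value: str | int | None) -> None:
        if value is None:
            return  # assumption: missing values are ignored
        # Refuse a new distinct value once the limit is reached.
        if value not in self.counter and len(self.counter) >= self.limit:
            raise TooManyValuesError(f"more than {self.limit} unique values")
        self.counter[value] += 1

    @property
    def display_values(self) -> dict[str | int, int]:
        # most_common() orders by count; ties keep first-seen order.
        return dict(self.counter.most_common(self.displayed_values_count))
```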
dae.genomic_resources.liftover_chain module
Provides LiftOver chain resource.
- class dae.genomic_resources.liftover_chain.LiftoverChain(resource: GenomicResource)[source]
Bases:
ResourceConfigValidationMixin
Defines a liftover chain wrapper around pyliftover objects.
- convert_coordinate(chrom: str, pos: int) tuple[str, int, str, int] | None [source]
Lift over a genomic coordinate.
- property files: set[str]
- static map_chromosome(chrom: str, mapping: dict[str, str] | None) str [source]
Map a chromosome (contig) name according to configuration.
- open() LiftoverChain [source]
- dae.genomic_resources.liftover_chain.build_liftover_chain_from_resource(resource: GenomicResource) LiftoverChain [source]
Load a liftover chain from a GRR resource.
dae.genomic_resources.reference_genome module
- class dae.genomic_resources.reference_genome.ReferenceGenome(resource: GenomicResource)[source]
Bases:
ResourceConfigValidationMixin
Provides an interface for querying a reference genome.
- property chrom_prefix: str
Return a prefix of all chromosomes of the reference genome.
- property chromosomes: list[str]
Return a list of all chromosomes of the reference genome.
- fetch(chrom: str, start: int, stop: int | None, buffer_size: int = 512) Generator[str, None, None] [source]
Yield the nucleotides in a specific region.
Line feed accounting can be inaccurate, because a fetch does not necessarily start at the beginning of a line. Line feeds add extra characters to read, but the output is limited to the number of nucleotides expected to be read.
- get_sequence(chrom: str, start: int, stop: int) str [source]
Return sequence of nucleotides from specified chromosome region.
- is_pseudoautosomal(chrom: str, pos: int) bool [source]
Return true if specified position is pseudoautosomal.
- open() ReferenceGenome [source]
Open reference genome resources.
- property resource_id: str
- dae.genomic_resources.reference_genome.build_reference_genome_from_file(filename: str) ReferenceGenome [source]
Open a reference genome from a file.
- dae.genomic_resources.reference_genome.build_reference_genome_from_resource(resource: GenomicResource) ReferenceGenome [source]
Open a reference genome from resource.
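The line feed accounting mentioned for fetch() can be sketched for a single line-wrapped FASTA sequence. This is an illustration under assumptions, not the ReferenceGenome implementation. fetch_wrapped is a hypothetical helper and coordinates are taken as 1-based and inclusive. seq_offset, line_bases, and line_width play the roles of the OFFSET, LINEBASES, and LINEWIDTH columns of a samtools .fai index.

```python
from __future__ import annotations

import io
from collections.abc import Iterator
from typing import TextIO


def fetch_wrapped(handle: TextIO, seq_offset: int, line_bases: int,
                  line_width: int, start: int, stop: int,
                  buffer_size: int = 512) -> Iterator[str]:
    """Yield bases start..stop (1-based, inclusive) of one wrapped sequence."""
    first = start - 1  # 0-based index of the first requested base
    # Every full line before `first` occupies line_width characters
    # (bases plus line feed) but holds only line_bases bases.
    handle.seek(seq_offset + (first // line_bases) * line_width
                + first % line_bases)
    remaining = stop - start + 1
    while remaining > 0:
        chunk = handle.read(buffer_size)
        if not chunk:
            break  # end of file reached before `stop`
        # Line feeds are read as well, but only bases count toward the limit.
        bases = chunk.replace("\n", "")[:remaining]
        remaining -= len(bases)
        if bases:
            yield bases


fasta = io.StringIO(">chr1\nACGTA\nCCGGT\nTT\n")
# Sequence data starts after ">chr1\n" (offset 6); 5 bases per 6-char line.
print("".join(fetch_wrapped(fasta, 6, 5, 6, start=4, stop=8)))  # TACCG
```

The sketch assumes the requested range lies within the sequence. Otherwise it would keep reading into the next record's header.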
dae.genomic_resources.repository module
Provides basic classes for genomic resources and repositories.
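             +---------------------+                    +-----------------+
       +-----| GenomicResourceRepo |--------------------| GenomicResource |
       |     +---------------------+                    +-----------------+
       |                ^                                        ^
       |                |                                        |
       |  +-----------------------------+         +----------------------------+
       |  | GenomicResourceProtocolRepo |---------| ReadOnlyRepositoryProtocol |
       |  +-----------------------------+         +----------------------------+
       |                                                         ^
       |                                                         |
       |  +--------------------------+            +-----------------------------+
       +--| GenomicResourceGroupRepo |            | ReadWriteRepositoryProtocol |
          +--------------------------+            +-----------------------------+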
- class dae.genomic_resources.repository.GenomicResource(resource_id: str, version: tuple[int, ...], protocol: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol, config: dict[str, Any] | None = None, manifest: Manifest | None = None)[source]
Bases:
object
Base class for genomic resources.
- get_file_content(filename: str, *, uncompress: bool = True, mode: str = 't') Any [source]
Return the content of file in a resource.
- get_genomic_resource_id_version() str [source]
Return a string combining resource ID and version.
Returns a string of the form aa/bb/cc[3.2] for a genomic resource with id aa/bb/cc and version 3.2. If the version is 0, the string is aa/bb/cc.
- open_raw_file(filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO [source]
Open a file in the resource and return a file-like object.
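A minimal sketch of the string format documented for get_genomic_resource_id_version(). format_id_version is a hypothetical helper, and it assumes that version 0 is represented by the tuple (0,).

```python
def format_id_version(resource_id: str, version: tuple[int, ...]) -> str:
    """Hypothetical helper mirroring the documented "aa/bb/cc[3.2]" format."""
    if version == (0,):
        return resource_id  # version 0 is omitted from the string
    return f"{resource_id}[{'.'.join(str(part) for part in version)}]"


print(format_id_version("aa/bb/cc", (3, 2)))  # aa/bb/cc[3.2]
```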
- class dae.genomic_resources.repository.GenomicResourceProtocolRepo(proto: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol)[source]
Bases:
GenomicResourceRepo
Base class for real genomic resources repositories.
- find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None [source]
Return one resource with id equal to resource_id.
If the resource is not found, None is returned.
- get_all_resources() Generator[GenomicResource, None, None] [source]
Return a generator over all resources in the repository.
- get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource [source]
Return one resource with id equal to resource_id.
If the resource is not found, an exception is raised.
- class dae.genomic_resources.repository.GenomicResourceRepo(repo_id: str)[source]
Bases:
ABC
Base class for genomic resources repositories.
- property definition: dict[str, Any] | None
- abstract find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None [source]
Return one resource with id equal to resource_id.
If the resource is not found, None is returned.
- abstract get_all_resources() Generator[GenomicResource, None, None] [source]
Return a generator over all resources in the repository.
- abstract get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource [source]
Return one resource with id equal to resource_id.
If the resource is not found, an exception is raised.
- property repo_id: str
- class dae.genomic_resources.repository.Manifest[source]
Bases:
object
Provides genomic resource manifest object.
- add(entry: ManifestEntry) None [source]
Add a manifest entry to the manifest.
- static from_file_content(file_content: str) Manifest [source]
Produce a manifest from manifest file content.
- static from_manifest_entries(manifest_entries: list[dict[str, Any]]) Manifest [source]
Produce a manifest from parsed manifest file content.
- to_manifest_entries() list[dict[str, Any]] [source]
Transform the manifest into a list of dictionaries.
Helpful when storing the manifest.
- update(entries: dict[str, ManifestEntry]) None [source]
- class dae.genomic_resources.repository.ManifestEntry(name: str, size: int, md5: str | None)[source]
Bases:
object
Provides an entry into manifest object.
- md5: str | None
- name: str
- size: int
- class dae.genomic_resources.repository.ManifestUpdate(manifest: Manifest, entries_to_delete: set[str], entries_to_update: set[str])[source]
Bases:
object
Provides a manifest update object.
- entries_to_delete: set[str]
- entries_to_update: set[str]
- class dae.genomic_resources.repository.Mode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Protocol mode.
- READONLY = 1
- READWRITE = 2
- class dae.genomic_resources.repository.ReadOnlyRepositoryProtocol(proto_id: str, url: str)[source]
Bases:
ABC
Defines the read-only genomic resources repository protocol.
- CHUNK_SIZE = 32768
- build_genomic_resource(resource_id: str, version: tuple[int, ...], config: dict | None = None, manifest: Manifest | None = None) GenomicResource [source]
Build a genomic resource based on this protocol.
- compute_md5_sum(resource: GenomicResource, filename: str) str [source]
Compute an MD5 hash for a file in the resource.
- abstract file_exists(resource: GenomicResource, filename: str) bool [source]
Check if the given file exists in the given resource.
- find_resource(resource_id: str, version_constraint: str | None = None) GenomicResource | None [source]
Return requested resource or None if not found.
- abstract get_all_resources() Generator[GenomicResource, None, None] [source]
Return generator for all resources in the repository.
- get_file_content(resource: GenomicResource, filename: str, *, uncompress: bool = True, mode: str = 't') Any [source]
Return the content of a file in the given resource.
- get_manifest(resource: GenomicResource) Manifest [source]
Load and return a resource manifest.
- get_resource(resource_id: str, version_constraint: str | None = None) GenomicResource [source]
Return requested resource or raises exception if not found.
In case resource is not found a FileNotFoundError exception is raised.
- get_resource_file_url(resource: GenomicResource, filename: str) str [source]
Return the URL of a file in the resource.
- get_resource_url(resource: GenomicResource) str [source]
Return the URL of the specified resource.
- abstract load_manifest(resource: GenomicResource) Manifest [source]
Load resource manifest.
- load_yaml(resource: GenomicResource, filename: str) Any [source]
Return parsed YAML file.
- abstract open_bigwig_file(resource: GenomicResource, filename: str) Any [source]
Open a bigwig file in a resource and return it.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- abstract open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO [source]
Open a file in a resource and return a file-like object.
- abstract open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile [source]
Open a tabix file in a resource and return a pysam tabix file.
Not all repositories support this method. Repositories that do not support this method raise an exception.
- abstract open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile [source]
Open a vcf file in a resource and return a pysam VariantFile.
Not all repositories support this method. Repositories that do not support this method raise an exception.
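compute_md5_sum() and CHUNK_SIZE suggest that resource files are hashed incrementally. The sketch below shows that pattern with hashlib on any binary stream. md5_of_stream is a hypothetical helper, and this page does not confirm that the protocol reads files in CHUNK_SIZE pieces.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 32768  # same value as ReadOnlyRepositoryProtocol.CHUNK_SIZE


def md5_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a binary stream chunk by chunk, without loading it all into memory."""
    md5 = hashlib.md5()
    while chunk := stream.read(chunk_size):
        md5.update(chunk)
    return md5.hexdigest()


data = b"ACGT" * 20000  # 80,000 bytes, i.e. three chunks of at most 32768
assert md5_of_stream(io.BytesIO(data)) == hashlib.md5(data).hexdigest()
```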
- class dae.genomic_resources.repository.ReadWriteRepositoryProtocol(proto_id: str, url: str)[source]
Bases:
ReadOnlyRepositoryProtocol
Defines the read-write genomic resources repository protocol.
- abstract build_content_file() list[dict[str, Any]] [source]
Build the content of the repository (i.e., the ‘.CONTENTS’ file).
- build_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) Manifest [source]
Build full manifest for the resource.
- build_resource_file_state(resource: GenomicResource, filename: str, **kwargs: str | float | int | None) ResourceFileState [source]
Build resource file state.
- check_update_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) ManifestUpdate [source]
Check if the resource manifest needs update.
- abstract collect_all_resources() Generator[GenomicResource, None, None] [source]
Return generator for all resources managed by this protocol.
- abstract collect_resource_entries(resource: GenomicResource) Manifest [source]
Scan the resource and return a manifest with all files.
- copy_resource(remote_resource: GenomicResource) GenomicResource [source]
Copy a remote resource into repository.
- abstract copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None [source]
Copy a remote resource file into local repository.
- abstract delete_resource_file(resource: GenomicResource, filename: str) None [source]
Delete a resource file and its internal state.
- get_manifest(resource: GenomicResource) Manifest [source]
Load or build a resource manifest.
- get_or_create_resource(resource_id: str, version: tuple[int, ...]) GenomicResource [source]
Return a resource with specified ID and version.
If the resource is not found, create an empty resource.
- abstract get_resource_file_size(resource: GenomicResource, filename: str) int [source]
Return the size of a resource file.
- abstract get_resource_file_timestamp(resource: GenomicResource, filename: str) float [source]
Return the timestamp (ISO formatted) of a resource file.
- abstract load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None [source]
Load resource file state from internal GRR state.
If the specified resource file has no internal state returns None.
- save_index(resource: GenomicResource, contents: str) None [source]
Save an index HTML file into the genomic resource’s directory.
- save_manifest(resource: GenomicResource, manifest: Manifest) None [source]
Save the manifest into the genomic resource’s directory.
- abstract save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None [source]
Save resource file state into internal GRR state.
- update_manifest(resource: GenomicResource, prebuild_entries: dict[str, ManifestEntry] | None = None) Manifest [source]
Update or create full manifest for the resource.
- update_resource(remote_resource: GenomicResource, files_to_copy: set[str] | None = None) GenomicResource [source]
Copy a remote resource into repository.
Allows copying of a subset of files from the resource via files_to_copy. If files_to_copy is None, copies all files.
- abstract update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None [source]
Update a resource file into repository if needed.
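check_update_manifest() reports which manifest entries must be deleted or updated (see ManifestUpdate). The following minimal sketch of such a comparison assumes that entries are compared by name, size, and md5 only. ToyEntry and diff_manifests are hypothetical, and the real check may also consult the stored ResourceFileState.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEntry:
    """Same fields as ManifestEntry: name, size and md5."""

    name: str
    size: int
    md5: str | None


def diff_manifests(
    stored: dict[str, ToyEntry], current: dict[str, ToyEntry],
) -> tuple[set[str], set[str]]:
    """Return (entries_to_delete, entries_to_update) for two manifests."""
    # Files listed in the stored manifest but no longer present.
    to_delete = set(stored) - set(current)
    # New files, or files whose size or checksum changed.
    to_update = {
        name for name, entry in current.items() if stored.get(name) != entry
    }
    return to_delete, to_update
```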
- class dae.genomic_resources.repository.ResourceFileState(filename: str, size: int, timestamp: float, md5: str)[source]
Bases:
object
Defines resource file state saved into internal GRR state.
- filename: str
- md5: str
- size: int
- timestamp: float
- dae.genomic_resources.repository.is_gr_id_token(token: str) bool [source]
Check if token can be used as a genomic resource ID.
A Genomic Resource Id Token is a string of one or more letters, digits, ‘.’, ‘_’, or ‘-’ characters. The function checks whether the parameter token is a Genomic Resource Id Token.
- dae.genomic_resources.repository.is_version_constraint_satisfied(version_constraint: str | None, version: tuple[int, ...]) bool [source]
Check if a version matches a version constraint.
- dae.genomic_resources.repository.parse_gr_id_version_token(token: str) tuple[str, tuple[int, ...]] [source]
Parse genomic resource ID with version.
A Genomic Resource Id Version Token is a Genomic Resource Id Token with an optional version appended. If present, the version suffix has the form “(3.3.2)”. The default version is (0,). Returns None if token is not a Genomic Resource Id Version Token; otherwise returns a (token, version) tuple.
- dae.genomic_resources.repository.parse_resource_id_version(resource_path: str) tuple[str, tuple[int, ...]] [source]
Parse genomic resource id and version path into Id, Version tuple.
If no version is present, the default version (0,) is used. If present, the version suffix has the form “(3.3.2)”. Returns the tuple (None, None) if the path does not match the resource_id/version requirements; otherwise returns the tuple (resource_id, version).
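The token rules above can be sketched with regular expressions. is_id_token and parse_id_version are hypothetical re-implementations for illustration only. They assume that each ‘/’-separated part of a resource id must be a Genomic Resource Id Token and that the version suffix appears only at the end of the path.

```python
from __future__ import annotations

import re

# Letters, digits, '.', '_' and '-', at least one character.
_ID_TOKEN = re.compile(r"[A-Za-z0-9._-]+")
# Assumption: an optional "(3.3.2)"-style suffix at the very end of the path.
_VERSION_SUFFIX = re.compile(r"(?P<rid>.*?)\((?P<ver>\d+(?:\.\d+)*)\)")


def is_id_token(token: str) -> bool:
    return _ID_TOKEN.fullmatch(token) is not None


def parse_id_version(path: str) -> tuple[str | None, tuple[int, ...] | None]:
    match = _VERSION_SUFFIX.fullmatch(path)
    if match:
        rid = match["rid"]
        version = tuple(int(part) for part in match["ver"].split("."))
    else:
        rid, version = path, (0,)  # default version
    # Every path component must be a valid id token.
    if not all(is_id_token(part) for part in rid.split("/")):
        return None, None
    return rid, version


print(parse_id_version("hg38/genomes/GRCh38(1.2)"))  # ('hg38/genomes/GRCh38', (1, 2))
```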
dae.genomic_resources.repository_factory module
Provides a factory for building genomic resources repositories.
- dae.genomic_resources.repository_factory.build_genomic_resource_group_repository(repo_id: str, children: list[GenomicResourceRepo]) GenomicResourceRepo [source]
- dae.genomic_resources.repository_factory.build_genomic_resource_repository(definition: dict | None = None, file_name: str | None = None) GenomicResourceRepo [source]
Build a GRR using a definition dict or yaml file.
- dae.genomic_resources.repository_factory.build_resource_implementation(res: GenomicResource) GenomicResourceImplementation [source]
Build a resource implementation from a resource.
- dae.genomic_resources.repository_factory.get_default_grr_definition() dict[str, Any] [source]
Return default genomic resources repository definition.
dae.genomic_resources.resource_implementation module
- class dae.genomic_resources.resource_implementation.GenomicResourceImplementation(genomic_resource: GenomicResource)[source]
Bases:
ABC
Base class used by resource implementations.
Resources are just a folder on a repository. Resource implementations are classes that know how to use the contents of the resource.
- abstract add_statistics_build_tasks(task_graph: TaskGraph, **kwargs: Any) list[Task] [source]
Add tasks for calculating resource statistics to a task graph.
- abstract calc_statistics_hash() bytes [source]
Compute the statistics hash.
This hash is used to decide whether the resource statistics should be recomputed.
- property files: set[str]
Return the set of resource files the implementation utilises.
- abstract get_info(**kwargs: Any) str [source]
Construct the contents of the implementation’s HTML info page.
- get_statistics() ResourceStatistics | None [source]
Try to load resource statistics.
- reload_statistics() ResourceStatistics | None [source]
- property resource_id: str
- class dae.genomic_resources.resource_implementation.InfoImplementationMixin[source]
Bases:
object
Mixin that provides generic template info page generation interface.
- get_template_data() dict [source]
Return a data dictionary to be used by the template.
Will transform the description in the meta section using markdown.
- resource: GenomicResource
- class dae.genomic_resources.resource_implementation.ResourceConfigValidationMixin[source]
Bases:
object
Mixin that provides validation of resource configuration.
- classmethod validate_and_normalize_schema(config: dict, resource: GenomicResource) dict [source]
Validate the resource schema and return the normalized version.
dae.genomic_resources.testing module
Provides tools useful for testing.
- dae.genomic_resources.testing.build_filesystem_test_protocol(root_path: Path, *, repair: bool = True) FsspecReadWriteProtocol [source]
Build and return a filesystem fsspec protocol for testing.
The root_path is expected to point to a directory structure with all the resources.
- dae.genomic_resources.testing.build_filesystem_test_repository(root_path: Path) GenomicResourceProtocolRepo [source]
Build and return a filesystem fsspec repository for testing.
The root_path is expected to point to a directory structure with all the resources.
- dae.genomic_resources.testing.build_filesystem_test_resource(root_path: Path) GenomicResource [source]
- dae.genomic_resources.testing.build_http_test_protocol(root_path: Path, *, repair: bool = True) Generator[FsspecReadOnlyProtocol, None, None] [source]
Run an HTTP range server and construct genomic resource protocol.
The HTTP range server serves the directory pointed to by root_path. This directory should be a valid filesystem genomic resource repository.
- dae.genomic_resources.testing.build_inmemory_test_protocol(content: dict[str, Any]) FsspecReadWriteProtocol [source]
Build and return an embedded fsspec protocol for testing.
- dae.genomic_resources.testing.build_inmemory_test_repository(content: dict[str, Any]) GenomicResourceProtocolRepo [source]
Create an embedded GRR repository using passed content.
- dae.genomic_resources.testing.build_inmemory_test_resource(content: dict[str, Any]) GenomicResource [source]
Create a test resource based on content passed.
The passed content should be appropriate for a single resource. Example content:
    {
        "genomic_resource.yaml": textwrap.dedent('''
            type: position_score
            table:
              filename: data.txt
            scores:
              - id: aaaa
                type: float
                desc: ""
                name: sc
        '''),
        "data.txt": convert_to_tab_separated('''
            #chrom start end sc
            1      10    12  1.1
            2      13    14  1.2
        '''),
    }
- dae.genomic_resources.testing.build_s3_test_bucket(s3filesystem: S3FileSystem | None = None) str [source]
Create an S3 test bucket.
- dae.genomic_resources.testing.build_s3_test_filesystem(endpoint_url: str | None = None) S3FileSystem [source]
Create an S3 fsspec filesystem connected to the S3 server.
- dae.genomic_resources.testing.build_s3_test_protocol(root_path: Path) Generator[FsspecReadWriteProtocol, None, None] [source]
Run an S3 moto server and construct fsspec genomic resource protocol.
The S3 moto server is populated with resources from the filesystem GRR pointed to by root_path.
- dae.genomic_resources.testing.convert_to_tab_separated(content: str) str [source]
Convert a string into tab separated file content.
Useful for testing purposes. If you need a space in the file content, use ‘||’.
- dae.genomic_resources.testing.copy_proto_genomic_resources(dest_proto: FsspecReadWriteProtocol, src_proto: FsspecReadOnlyProtocol) None [source]
- dae.genomic_resources.testing.http_process_test_server(path: Path) Generator[str, None, None] [source]
- dae.genomic_resources.testing.http_threaded_test_server(path: Path) Generator[str, None, None] [source]
Run a range HTTP threaded server.
The HTTP range server serves the directory pointed to by path.
- dae.genomic_resources.testing.proto_builder(scheme: str, content: dict) Generator[FsspecReadOnlyProtocol | FsspecReadWriteProtocol, None, None] [source]
Build a test genomic resource protocol with specified content.
- dae.genomic_resources.testing.resource_builder(scheme: str, content: dict) Generator[GenomicResource, None, None] [source]
- dae.genomic_resources.testing.s3_test_protocol() FsspecReadWriteProtocol [source]
Build an S3 fsspec testing protocol on top of existing S3 server.
- dae.genomic_resources.testing.setup_bigwig(out_path: Path, content: str, chrom_lens: dict[str, int]) Path [source]
Set up a bigwig format file using bedGraph-style content.
Example:
    chr1  0    100  0.0
    chr1  100  120  1.0
    chr1  125  126  200.0
- dae.genomic_resources.testing.setup_dae_transmitted(root_path: Path, summary_content: str, toomany_content: str) tuple[Path, Path] [source]
Set up a DAE transmitted variants file using passed content.
- dae.genomic_resources.testing.setup_directories(root_dir: Path, content: str | dict[str, Any]) None [source]
Set up directory and subdirectory structures using the content.
- dae.genomic_resources.testing.setup_empty_gene_models(out_path: Path) GeneModels [source]
Set up empty gene models.
- dae.genomic_resources.testing.setup_gene_models(out_path: Path, content: str, fileformat: str | None = None) GeneModels [source]
Set up gene models in refflat format using the passed content.
- dae.genomic_resources.testing.setup_genome(out_path: Path, content: str) ReferenceGenome [source]
Set up reference genome using the content.
- dae.genomic_resources.testing.setup_gzip(gzip_path: Path, gzip_content: str) Path [source]
Set up a gzipped TSV file.
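convert_to_tab_separated() lets test fixtures be written with spaces and then converted to tab-separated content. The sketch below is not the dae function; to_tab_separated is a hypothetical name. It assumes that columns are separated by any run of whitespace, that blank lines are dropped, and that ‘||’ becomes a literal space inside a field.

```python
def to_tab_separated(content: str) -> str:
    """Hypothetical sketch: whitespace-separated text to TSV content."""
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are dropped
        fields = line.split()  # any run of whitespace separates columns
        rows.append("\t".join(field.replace("||", " ") for field in fields))
    return "\n".join(rows) + "\n"


print(repr(to_tab_separated("#chrom start end sc\n1 10 12 1.1")))
# '#chrom\tstart\tend\tsc\n1\t10\t12\t1.1\n'
```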
dae.genomic_resources.variant_utils module
- dae.genomic_resources.variant_utils.maximally_extend_variant(chrom: str, pos: int, ref: str, alts: list[str], genome: ReferenceGenome) tuple[str, int, str, list[str]] [source]
Maximally extend a variant.
- dae.genomic_resources.variant_utils.normalize_variant(chrom: str, pos: int, ref: str, alts: list[str], genome: ReferenceGenome) tuple[str, int, str, list[str]] [source]
Normalize a variant.
Uses the algorithm described at https://genome.sph.umich.edu/wiki/Variant_Normalization
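The normalization algorithm at the linked page right-trims shared bases, extends alleles to the left when one becomes empty, and then left-trims shared leading bases. A minimal sketch of that algorithm on a plain chromosome string follows. normalize_alleles is hypothetical: it takes the chromosome sequence instead of a ReferenceGenome, uses 1-based positions, and omits the chrom element that normalize_variant() returns. The guard that stops trimming at position 1 is an addition of this sketch.

```python
from __future__ import annotations


def normalize_alleles(seq: str, pos: int, ref: str,
                      alts: list[str]) -> tuple[int, str, list[str]]:
    """Left-align and trim a variant on `seq` (pos is 1-based)."""
    alleles = [ref, *alts]
    if not all(alleles) or len(set(alleles)) < 2:
        raise ValueError("alleles must be non-empty and not all identical")
    changed = True
    while changed:
        changed = False
        # 1. Drop a shared last nucleotide; at position 1, only if no
        #    allele would become empty (there is no base to extend with).
        if (all(alleles) and len({a[-1] for a in alleles}) == 1
                and (pos > 1 or min(map(len, alleles)) > 1)):
            alleles = [a[:-1] for a in alleles]
            changed = True
        # 2. If an allele became empty, extend all alleles one base left.
        if not all(alleles):
            pos -= 1
            alleles = [seq[pos - 1] + a for a in alleles]
            changed = True
    # 3. Drop shared leading nucleotides while all alleles keep 2+ bases.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]


genome = "TTGCACACAGT"
# A "CA" deletion written at position 7 moves to the left end of the repeat.
print(normalize_alleles(genome, 7, "ACA", ["A"]))  # (3, 'GCA', ['G'])
```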
Module contents
- class dae.genomic_resources.GenomicResource(resource_id: str, version: tuple[int, ...], protocol: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol, config: dict[str, Any] | None = None, manifest: Manifest | None = None)[source]
Bases:
object
Base class for genomic resources.
- get_file_content(filename: str, *, uncompress: bool = True, mode: str = 't') Any [source]
Return the content of file in a resource.
- get_genomic_resource_id_version() str [source]
Return a string combining resource ID and version.
Returns a string of the form aa/bb/cc[3.2] for a genomic resource with id aa/bb/cc and version 3.2. If the version is 0, the string is aa/bb/cc.
- open_raw_file(filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO [source]
Open a file in the resource and return a file-like object.
- dae.genomic_resources.build_genomic_resource_repository(definition: dict | None = None, file_name: str | None = None) GenomicResourceRepo [source]
Build a GRR using a definition dict or yaml file.
- dae.genomic_resources.get_resource_implementation_builder(resource_type: str) Callable[[GenomicResource], GenomicResourceImplementation] | None [source]
Return an implementation builder for a certain resource type.
If the builder is not registered, the function searches for an entry point in the list of discovered implementations. If an entry point is found, it is loaded, registered, and returned.