gain.utils package
Submodules
gain.utils.cnv_utils module
gain.utils.dae_utils module
- gain.utils.dae_utils.cshl2vcf_variant(location: str, variant: str, genome: ReferenceGenome | None) → tuple[str, int, str, str]
- gain.utils.dae_utils.dae2vcf_variant(chrom: str, position: int, variant: str, genome: ReferenceGenome | None) → tuple[int, str, str]
Convert a given CSHL-style variant to the VCF format.
gain.utils.debug_closing module
- class gain.utils.debug_closing.HasClose(*args, **kwargs)
Bases: Protocol
Protocol for objects that have a close method.
- class gain.utils.debug_closing.closing(thing: T)
Bases: AbstractContextManager, Generic[T]
Context to automatically close something at the end of a block.
Code like this:
    with closing(<module>.open(<arguments>)) as f:
        <block>
is equivalent to this:
    f = <module>.open(<arguments>)
    try:
        <block>
    finally:
        f.close()
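The equivalence above can be checked with a small sketch. It uses the standard library's contextlib.closing, which follows the same pattern, rather than gain.utils.debug_closing.closing itself; the Resource class is hypothetical and only exists to provide a close method.

```python
from contextlib import closing


class Resource:
    """Hypothetical object that satisfies a HasClose-style protocol."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


res = Resource()
with closing(res) as f:
    # The context manager hands back the very object it wraps.
    same_object = f is res
    closed_inside = f.closed

# close() has been called by the time the block is left.
closed_after = res.closed
```

Any object with a close() method can be wrapped this way, including objects that do not implement the context manager protocol themselves.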
gain.utils.dict_utils module
gain.utils.fs_utils module
- gain.utils.fs_utils.containing_path(path: str | PathLike) → str
Return the URL of the resource that contains path.
For file paths this is the containing directory. For URLs this is the containing resource.
- gain.utils.fs_utils.find_directory_with_a_file(filename: str, cwd: str | Path | None = None) → Path | None
Find a directory containing a file.
Starts from current working directory or from a directory passed.
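A minimal sketch of one way such a lookup could work. It assumes the search walks upward from the starting directory through its parents, which the description above does not state; find_up and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def find_up(filename: str, cwd: Union[str, Path, None] = None) -> Optional[Path]:
    """Hypothetical helper: look in cwd, then in each parent, for filename."""
    start = Path(cwd).resolve() if cwd is not None else Path.cwd().resolve()
    for directory in (start, *start.parents):
        if (directory / filename).is_file():
            return directory
    return None


# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    (root / "marker_example.cfg").write_text("")
    found_is_root = find_up("marker_example.cfg", nested) == root
    missing = find_up("file_that_should_not_exist_anywhere.cfg", nested)
```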
- gain.utils.fs_utils.find_subdirectories_with_a_file(filename: str, cwd: str | Path | None = None) → Sequence[Path]
Find a list of subdirectories containing a file.
Starts from current working directory or from a directory passed.
- gain.utils.fs_utils.is_compressed_filename(filename: str) → bool
Check if a file is compressed by its extension.
- gain.utils.fs_utils.modified(filename: str) → datetime
Return the modified timestamp of a file.
gain.utils.helpers module
- gain.utils.helpers.convert_size(size_bytes: int) → str
Convert an integer representing size in bytes to a human-readable string.
Copied from: https://stackoverflow.com/questions/5194057/better-way-to-convert-file-sizes-in-python
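As an illustration of this kind of conversion, here is a minimal sketch. It is not the library's implementation: human_size is a hypothetical name, and the choice of 1024-based units and the exact output format are assumptions.

```python
def human_size(size_bytes: int) -> str:
    """Hypothetical helper: format a byte count using 1024-based units."""
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    size = float(size_bytes)
    idx = 0
    # Divide by 1024 until the value fits the current unit.
    while size >= 1024 and idx < len(units) - 1:
        size /= 1024
        idx += 1
    return f"{round(size, 2)} {units[idx]}"


small = human_size(512)        # "512.0 B"
medium = human_size(1536)      # "1.5 KB"
large = human_size(1024 ** 2)  # "1.0 MB"
```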
gain.utils.processing_pipeline module
- class gain.utils.processing_pipeline.Filter
Bases: AbstractContextManager
Base class for all processing pipeline filters.
- class gain.utils.processing_pipeline.PipelineProcessor(source: Source, filters: Sequence[Filter])
Bases: AbstractContextManager
A processor that can be used to process variants in a pipeline.
gain.utils.regions module
- class gain.utils.regions.BedRegion(chrom: str, start: int, stop: int)
Bases: Region
Represents proper bed regions.
- property begin: int
- property end: int
- property start: int
- property stop: int
- class gain.utils.regions.Region(chrom: str, start: int | None = None, stop: int | None = None)
Bases: object
Class representing a genomic region.
- property begin: int | None
- property end: int | None
- intersection(other: Region) → Region | None
Return intersection of the region with other region.
- property start: int | None
- property stop: int | None
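The intersection of two regions can be sketched as follows. This is an illustration, not the Region class itself: SimpleRegion and intersect are hypothetical, coordinates are assumed to be closed intervals, and a bound of None is assumed to mean "unbounded on that side".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimpleRegion:
    """Hypothetical stand-in for a genomic region."""

    chrom: str
    start: Optional[int] = None
    stop: Optional[int] = None


def intersect(a: SimpleRegion, b: SimpleRegion) -> Optional[SimpleRegion]:
    """Return the overlap of a and b, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    # A None bound is treated as unbounded, so only real bounds compete.
    starts = [s for s in (a.start, b.start) if s is not None]
    stops = [s for s in (a.stop, b.stop) if s is not None]
    start = max(starts) if starts else None
    stop = min(stops) if stops else None
    if start is not None and stop is not None and start > stop:
        return None
    return SimpleRegion(a.chrom, start, stop)


overlap = intersect(SimpleRegion("chr1", 10, 20), SimpleRegion("chr1", 15, 30))
disjoint = intersect(SimpleRegion("chr1", 10, 20), SimpleRegion("chr1", 21, 30))
```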
- gain.utils.regions.all_regions_from_chrom(regions: list[Region], chrom: str) → list[Region]
Return the subset of regions that are on chromosome chrom.
- gain.utils.regions.bedfile2regions(bed_filename: str) → list[BedRegion]
Transform BED file into list of regions.
- gain.utils.regions.calc_bin_begin(bin_len: int, bin_idx: int) → int
Calculates the 1-based start position of the <bin_idx>-th bin of length <bin_len>.
    n       2n      3n      4n
    |_______|_______|_______|
    bin_len bin_begin
- gain.utils.regions.calc_bin_end(bin_len: int, bin_idx: int) → int
Calculates the 1-based end position of the <bin_idx>-th bin of length <bin_len>.
    n       2n      3n      4n
    |_______|_______|_______|
    bin_len bin_end
- gain.utils.regions.calc_bin_index(bin_len: int, pos: int) → int
Calculates the index of the <bin_len>-long bin the given 1-based position <pos> falls into.
    n       2n      3n      4n
    |_______|_______|_______|
     (bin 0) (bin 1) (bin 2)
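The three bin helpers are related by simple integer arithmetic. The sketch below shows one consistent set of formulas, assuming 0-based bin indices and 1-based inclusive positions, with bin 0 covering positions 1 through bin_len; the function names are hypothetical and the library's exact conventions may differ.

```python
def bin_index(bin_len: int, pos: int) -> int:
    """Index of the bin that the 1-based position pos falls into."""
    return (pos - 1) // bin_len


def bin_begin(bin_len: int, bin_idx: int) -> int:
    """1-based first position of bin number bin_idx."""
    return bin_len * bin_idx + 1


def bin_end(bin_len: int, bin_idx: int) -> int:
    """1-based last position of bin number bin_idx."""
    return bin_len * (bin_idx + 1)


idx = bin_index(1000, 2500)  # position 2500 falls into bin 2
first = bin_begin(1000, idx)  # 2001
last = bin_end(1000, idx)     # 3000
```

Under these assumptions every position pos satisfies bin_begin(n, bin_index(n, pos)) <= pos <= bin_end(n, bin_index(n, pos)).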
- gain.utils.regions.collapse(source: Sequence[Region], *, is_sorted: bool = False) → list[Region]
Collapse list of regions.
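Collapsing is commonly understood as merging overlapping intervals. A minimal sketch of that idea, using plain (chrom, start, stop) tuples instead of Region objects; merge_overlapping is a hypothetical name, and merging only regions that actually overlap (not merely touch) is an assumption.

```python
def merge_overlapping(
    regions: list[tuple[str, int, int]],
) -> list[tuple[str, int, int]]:
    """Merge overlapping (chrom, start, stop) intervals per chromosome."""
    merged: list[tuple[str, int, int]] = []
    # Sorting puts regions of one chromosome together, ordered by start.
    for chrom, start, stop in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_stop, stop))
        else:
            merged.append((chrom, start, stop))
    return merged


collapsed = merge_overlapping(
    [("chr1", 1, 10), ("chr1", 5, 20), ("chr2", 1, 5), ("chr1", 30, 40)]
)
```

Here the first two chr1 regions overlap and become one, while the remaining regions are kept as they are.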
- gain.utils.regions.collapse_no_chrom(source: list[BedRegion], *, is_sorted: bool = False) → list[BedRegion]
Collapse by ignoring the chromosome.
Useful when the caller knows that all the regions are from the same chromosome.
- gain.utils.regions.connected_component(regions: list[BedRegion]) → Any
Return connected component of regions.
This might be the same as collapse.
- gain.utils.regions.difference(regions1: list[Region], regions2: list[Region], *, symmetric: bool = False) → list[Region]
Compute the difference between two lists of regions.
- gain.utils.regions.get_chromosome_length_tabix(tabix_file: TabixFile | VariantFile, chrom: str, step: int = 50000000, precision: int = 500000) → int | None
Return the length of a chromosome (or contig).
The returned value is guaranteed to be larger than the actual contig length.
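The step and precision parameters suggest a probe-then-bisect search. The sketch below shows how such an upper estimate could be computed; it is an assumption about the approach, not the library's implementation. estimate_length is hypothetical, and has_data_from stands in for a query that reports whether the file holds any record at or after a position.

```python
from typing import Callable, Optional


def estimate_length(
    has_data_from: Callable[[int], bool],
    step: int = 50_000_000,
    precision: int = 500_000,
) -> Optional[int]:
    """Return an upper estimate of the last position that holds data."""
    if not has_data_from(0):
        return None
    left, right = 0, step
    # Probe forward in large steps until a position with no data after it.
    while has_data_from(right):
        left, right = right, right + step
    # Invariant: data exists at or after left, none at or after right.
    while right - left > precision:
        mid = (left + right) // 2
        if has_data_from(mid):
            left = mid
        else:
            right = mid
    return right


last_record = 123_456_789  # made-up position of the last record
estimate = estimate_length(lambda pos: pos <= last_record)
```

Because the search always returns the bound that has no data after it, the estimate stays above the last record while overshooting it by at most precision.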
- gain.utils.regions.intersection(regions1: list[Region], regions2: list[Region]) → list[Region]
Compute the intersection of two lists of regions.
First collapses each of the two lists of regions, then finds their intersection.
- gain.utils.regions.regions2bedfile(regions: list[BedRegion], bed_filename: str) → None
Save list of regions into a BED file.
gain.utils.sql_utils module
gain.utils.stats_collection module
- class gain.utils.stats_collection.StatsCollection
Bases: MutableMapping[tuple[str, …], Any]
Helper class for collecting various statistics.
It is intended for places in the project where collecting statistics about how components of the system work seems appropriate.
It provides a dict-like interface.
The keys are tuples of strings. The values could be anything, but usually they are numbers.
    >>> stats = StatsCollection()
    >>> stats[("a",)] = 1
    >>> stats[("a",)]
    1
    >>> stats.get(("a", 1))
The keys are treated as a hierarchy. You can get all values whose keys start with the passed key. For example, if you add the following:
    >>> stats[("b", "1")] = 42
    >>> stats[("b", "2")] = 43
you can get all values whose keys start with ("b", ...) using:
    >>> stats[("b",)]
    {('b', '1'): 42, ('b', '2'): 43}
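The hierarchical lookup amounts to prefix matching on tuple keys. A minimal sketch of that idea on a plain dict; values_with_prefix is a hypothetical helper and says nothing about how StatsCollection is implemented internally.

```python
from typing import Any


def values_with_prefix(
    stats: dict[tuple[str, ...], Any], prefix: tuple[str, ...]
) -> dict[tuple[str, ...], Any]:
    """Return all entries whose key starts with the given prefix."""
    n = len(prefix)
    return {key: value for key, value in stats.items() if key[:n] == prefix}


stats = {("a",): 1, ("b", "1"): 42, ("b", "2"): 43}
b_stats = values_with_prefix(stats, ("b",))
```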
gain.utils.variant_utils module
Pure string utilities for variant manipulation.
- gain.utils.variant_utils.trim_parsimonious(pos: int, ref: str, alt: str) → tuple[int, str, str]
Trim identical nucleotides on both ends and adjust position.
- gain.utils.variant_utils.trim_str_left(pos: int, ref: str, alt: str) → tuple[int, str, str]
Trim identical nucleotide prefixes and adjust the position accordingly.
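Prefix trimming can be sketched as follows. trim_common_prefix is a hypothetical function; in particular, allowing the trimmed ref or alt to become an empty string is an assumption, and the library may handle that case differently.

```python
def trim_common_prefix(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Drop the shared leading nucleotides of ref and alt and shift pos."""
    n = 0
    while n < len(ref) and n < len(alt) and ref[n] == alt[n]:
        n += 1
    # Each trimmed nucleotide moves the variant one position to the right.
    return pos + n, ref[n:], alt[n:]


substitution = trim_common_prefix(100, "ACGT", "ACTT")  # (102, "GT", "TT")
insertion = trim_common_prefix(100, "A", "AT")          # (101, "", "T")
```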
- gain.utils.variant_utils.trim_str_left_right(pos: int, ref: str, alt: str) → tuple[int, str, str]
gain.utils.verbosity_configuration module
Provides common configuration for logger verbosity.
- class gain.utils.verbosity_configuration.VerbosityConfiguration
Bases: object
Defines common configuration of verbosity for loggers.
- static adjust_verbosity(loglevel: int) → None
Set logging level according to the verbosity specified.
- static set(args: Namespace | dict[str, str]) → None
Read verbosity settings from parsed arguments and configure the loggers accordingly.
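A common way to wire verbosity into logging is to count -v flags and map the count to a level. The sketch below illustrates that pattern with the standard argparse and logging modules; the -v/--verbose flag, the verbosity_to_loglevel helper, and the count-to-level mapping are all assumptions, not the behaviour of VerbosityConfiguration.

```python
import argparse
import logging


def verbosity_to_loglevel(verbose: int) -> int:
    """Hypothetical mapping from a -v count to a logging level."""
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING


parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="count", default=0)

# Parse an explicit argument list so the example does not read sys.argv.
args = parser.parse_args(["-vv"])
level = verbosity_to_loglevel(args.verbose)
logging.getLogger("example").setLevel(level)
```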