dae.utils package

Submodules

dae.utils.cnv_utils module

dae.utils.cnv_utils.cnv_variant_type(variant)
dae.utils.cnv_utils.cshl2cnv_variant(location, variant, *args)

Parse location and variant into CNV variant.

dae.utils.dae_utils module

dae.utils.dae_utils.cshl2vcf_variant(location: str, variant: str, genome: ReferenceGenome) -> tuple[str, int, str, str]
dae.utils.dae_utils.dae2vcf_variant(chrom: str, position: int, variant: str, genome: ReferenceGenome) -> tuple[int, str, str]

Convert a given CSHL-style variant to the VCF format.

dae.utils.dae_utils.join_line(line: list[Any | list[Any]], sep: str = '\t') -> str

Join an iterable representing a line into a string.

dae.utils.dae_utils.split_iterable(iterable: Iterable, max_chunk_length: int = 50) -> Generator[list, None, None]

Split an iterable into chunks of a list type.
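
As an illustration of chunking an iterable into lists, here is a minimal sketch. The helper name chunked is hypothetical and the code is not taken from the dae source; it only assumes that each chunk holds at most max_chunk_length items and that the last chunk may be shorter.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Any


def chunked(iterable: Iterable[Any], max_chunk_length: int = 50) -> Iterator[list[Any]]:
    """Yield successive lists of at most max_chunk_length items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        # islice consumes up to max_chunk_length items from the shared iterator.
        chunk = list(islice(iterator, max_chunk_length))
        if not chunk:
            return
        yield chunk


chunks = list(chunked(range(5), max_chunk_length=2))
print(chunks)  # [[0, 1], [2, 3], [4]]
```

Because the helper works on an iterator, it also accepts generators and other inputs that cannot be indexed or measured up front.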

dae.utils.debug_closing module

class dae.utils.debug_closing.closing(thing)

Bases: AbstractContextManager

Context to automatically close something at the end of a block.

Code like this:

    with closing(<module>.open(<arguments>)) as f:
        <block>

is equivalent to this:

    f = <module>.open(<arguments>)
    try:
        <block>
    finally:
        f.close()

close()
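
A runnable version of the pattern described above might look like the sketch below. It uses the standard library contextlib.closing, whose documented behaviour matches this docstring; that dae.utils.debug_closing.closing behaves identically is an assumption. The Resource class is a hypothetical stand-in for any object with a close() method.

```python
from contextlib import closing


class Resource:
    """Hypothetical stand-in for anything that exposes close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


resource = Resource()
with closing(resource) as res:
    # Inside the block the resource is still open.
    still_open = not res.closed

# After the block, close() has been called, even if the block had raised.
print(still_open, resource.closed)  # True True
```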

dae.utils.dict_utils module

dae.utils.dict_utils.recursive_dict_update(input_dict: dict, updater_dict: dict) -> dict

Recursively update a dictionary with another dictionary.

dae.utils.dict_utils.recursive_dict_update_inplace(input_dict: dict, updater_dict: dict) -> None

Recursively update a dictionary with another dictionary, in place.
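
The difference between a copying and an in-place recursive update can be sketched as below. Both helper names are hypothetical, and the merge rule (nested dicts are merged key by key, any other value is overwritten) is an assumption about what "recursively update" means here, not a statement about the dae implementation.

```python
from copy import deepcopy
from typing import Any


def merge_nested_inplace(target: dict[str, Any], updates: dict[str, Any]) -> None:
    """Merge updates into target, descending into nested dicts (hypothetical helper)."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_nested_inplace(target[key], value)
        else:
            target[key] = value


def merge_nested(base: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Return a merged copy and leave base untouched (hypothetical helper)."""
    result = deepcopy(base)
    merge_nested_inplace(result, updates)
    return result


base = {"a": {"x": 1, "y": 2}, "b": 3}
merged = merge_nested(base, {"a": {"y": 20}, "c": 4})
print(merged)  # {'a': {'x': 1, 'y': 20}, 'b': 3, 'c': 4}
```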

dae.utils.filehash module

dae.utils.filehash.hashsum(filename, hasher, blocksize=65536)
dae.utils.filehash.md5sum(filename, blocksize=65536)
dae.utils.filehash.sha256sum(filename, blocksize=65536)
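
These functions carry no docstrings, so the following is only a sketch of what block-wise file hashing with a blocksize parameter typically looks like. The helper name hash_file is hypothetical, and treating hasher as a hashlib-style object with update() and hexdigest() is an assumption.

```python
import hashlib
import os
import tempfile


def hash_file(filename: str, hasher, blocksize: int = 65536) -> str:
    """Feed a file to a hashlib-style object block by block (hypothetical helper)."""
    with open(filename, "rb") as infile:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: infile.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


payload = b"ACGT" * 100_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
path = tmp.name
try:
    digest = hash_file(path, hashlib.sha256(), blocksize=4096)
finally:
    os.remove(path)
```

Reading in blocks keeps memory use bounded regardless of file size, which matters for large genomic files.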

dae.utils.fixtures module

dae.utils.fixtures.change_environment(env_props)

Change os.environ variables according to given dictionary.

Can be used with try/finally to restore the previous environment afterwards in the finally block.
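
The try/finally restore pattern mentioned above can be sketched as a context manager. The name temporary_environment and the variable DAE_DOC_DEMO_SETTING are hypothetical, and this is not the dae implementation, only one way to change os.environ and put it back afterwards.

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def temporary_environment(env_props: dict[str, str]) -> Iterator[None]:
    """Set environment variables for a block, then restore them (hypothetical helper)."""
    previous = {key: os.environ.get(key) for key in env_props}
    os.environ.update(env_props)
    try:
        yield
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                # The variable did not exist before, so remove it again.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value


with temporary_environment({"DAE_DOC_DEMO_SETTING": "1"}):
    inside = os.environ["DAE_DOC_DEMO_SETTING"]
```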

dae.utils.fixtures.path_to_fixtures(module, *args, **kwargs)

dae.utils.fs_utils module

dae.utils.fs_utils.abspath(filename: str) -> str
dae.utils.fs_utils.containing_path(path: str | PathLike) -> str

Return url to the resource that contains path.

For file paths this is equivalent to the containing directory. For urls this is equivalent to the containing resource.

dae.utils.fs_utils.copy(dest: str, src: str) -> None

Copy a file or directory.

dae.utils.fs_utils.exists(filename: str) -> bool
dae.utils.fs_utils.find_directory_with_a_file(filename: str, cwd: str | Path | None = None) -> Path | None

Find a directory containing a file.

Starts from current working directory or from a directory passed.

dae.utils.fs_utils.glob(path: str) -> list[str]

Find files by glob-matching.

dae.utils.fs_utils.is_s3url(path: str) -> bool
dae.utils.fs_utils.join(path: str, *paths: str) -> str
dae.utils.fs_utils.modified(filename: str) -> datetime

Return the modified timestamp of a file.

dae.utils.fs_utils.rm_file(path: str) -> None

Remove a file.

dae.utils.fs_utils.sign(filename: str) -> str

Create a signed URL representing the given path.

If the corresponding filesystem doesn’t support signing, the filename is returned as is.

dae.utils.fs_utils.tabix_index_filename(tabix_filename: str) -> str | None

Given a Tabix/VCF filename, return the tabix index filename if it exists.

dae.utils.helpers module

dae.utils.helpers.camelize_string(data: str) -> str
dae.utils.helpers.convert_size(size_bytes: int) -> str

Convert an integer representing size in bytes to a human-readable string.

Copied from: https://stackoverflow.com/questions/5194057/better-way-to-convert-file-sizes-in-python
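
For orientation, a byte count can be turned into a human-readable string along the lines of the sketch below. The helper name human_size, the 1024-based units, and the two-decimal output format are all assumptions; the string that convert_size actually returns may be formatted differently.

```python
def human_size(size_bytes: int) -> str:
    """Format a byte count using 1024-based units (hypothetical helper)."""
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    size = float(size_bytes)
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    # Anything that is still large after the loop is reported in the last unit.
    return f"{size:.2f} {units[-1]}"


print(human_size(1536))  # 1.50 KB
```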

dae.utils.helpers.isnan(val)
dae.utils.helpers.str2bool(value)
dae.utils.helpers.study_id_from_path(filepath)
dae.utils.helpers.to_response_json(data) -> dict

Convert a dict or Box to an acceptable response JSON.

dae.utils.progress module

dae.utils.progress.progress(text='.', verbose=1)
dae.utils.progress.progress_nl(verbose=1)
dae.utils.progress.red_print(message)

dae.utils.regions module

class dae.utils.regions.BedRegion(chrom: str, start: int, stop: int)

Bases: Region

Represents proper bed regions.

property begin: int
property end: int
static from_str(region: str) -> BedRegion

Parse string representation of a region.

property start: int
property stop: int
class dae.utils.regions.Region(chrom: str, start: int | None = None, stop: int | None = None)

Bases: object

Class representing a genomic region.

property begin: int | None
contains(other: Region) -> bool

Check if the region contains other region.

property end: int | None
static from_str(region: str) -> Region

Parse string representation of a region.

intersection(other: Region) -> Region | None

Return intersection of the region with other region.

intersects(other: Region) -> bool

Check if the region intersects another.

isin(chrom: str, pos: int) -> bool

Check if a genomic position is inside the region.

property start: int | None
property stop: int | None
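
The containment and intersection checks listed for Region can be illustrated with a small stand-alone class. Interval is a hypothetical type, not the dae class: it assumes closed, 1-based coordinates and ignores the open-ended case in which start or stop is None.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical closed, 1-based interval on one chromosome."""

    chrom: str
    start: int
    stop: int

    def isin(self, chrom: str, pos: int) -> bool:
        return chrom == self.chrom and self.start <= pos <= self.stop

    def contains(self, other: "Interval") -> bool:
        return (
            self.chrom == other.chrom
            and self.start <= other.start
            and other.stop <= self.stop
        )

    def intersects(self, other: "Interval") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return (
            self.chrom == other.chrom
            and self.start <= other.stop
            and other.start <= self.stop
        )

    def intersection(self, other: "Interval") -> "Interval | None":
        if not self.intersects(other):
            return None
        return Interval(
            self.chrom, max(self.start, other.start), min(self.stop, other.stop)
        )


a = Interval("chr1", 100, 200)
b = Interval("chr1", 150, 300)
print(a.intersection(b))  # Interval(chrom='chr1', start=150, stop=200)
```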
dae.utils.regions.all_regions_from_chrom(regions: list[Region], chrom: str) -> list[Region]

Return the subset of regions that lie on the chromosome chrom.

dae.utils.regions.bedfile2regions(bed_filename: str) -> list[BedRegion]

Transform BED file into list of regions.

dae.utils.regions.calc_bin_begin(bin_len: int, bin_idx: int) -> int

Calculates the 1-based start position of the <bin_idx>-th bin of length <bin_len>.

    n       2n      3n      4n
    |_______|_______|_______|
     bin_len bin_begin

dae.utils.regions.calc_bin_end(bin_len: int, bin_idx: int) -> int

Calculates the 1-based end position of the <bin_idx>-th bin of length <bin_len>.

    n       2n      3n      4n
    |_______|_______|_______|
     bin_len bin_end

dae.utils.regions.calc_bin_index(bin_len: int, pos: int) -> int

Calculates the index of the <bin_len>-long bin the given 1-based position <pos> falls into.

    n       2n      3n      4n
    |_______|_______|_______|
     (bin 0) (bin 1) (bin 2)
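
The arithmetic behind fixed-length bins over 1-based positions can be sketched as follows. The three helper names are hypothetical, and the convention used (bins numbered from 0, with bin 0 covering positions 1 through bin_len) is an assumption rather than something taken from the dae source; the real functions may place the boundaries differently.

```python
def bin_index(bin_len: int, pos: int) -> int:
    """0-based index of the bin that holds the 1-based position pos (assumed convention)."""
    return (pos - 1) // bin_len


def bin_begin(bin_len: int, bin_idx: int) -> int:
    """1-based first position of bin bin_idx (assumed convention)."""
    return bin_idx * bin_len + 1


def bin_end(bin_len: int, bin_idx: int) -> int:
    """1-based last position of bin bin_idx, inclusive (assumed convention)."""
    return (bin_idx + 1) * bin_len


idx = bin_index(1000, 2500)
print(idx, bin_begin(1000, idx), bin_end(1000, idx))  # 2 2001 3000
```

Under this convention the three functions are mutually consistent: every position between bin_begin and bin_end maps back to the same bin index.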

dae.utils.regions.coalesce(v1: int | None, v2: int) -> int

Return first non-None value.

dae.utils.regions.collapse(source: Sequence[Region], *, is_sorted: bool = False) -> list[Region]

Collapse list of regions.

dae.utils.regions.collapse_no_chrom(source: list[BedRegion], *, is_sorted: bool = False) -> list[BedRegion]

Collapse by ignoring the chromosome.

Useful when the caller knows that all the regions are from the same chromosome.
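
Collapsing a list of regions is commonly done by sorting and then merging neighbours that overlap. The sketch below shows that technique on plain (chrom, start, stop) tuples; the helper name merge_overlapping is hypothetical, and whether the dae functions also merge regions that merely touch is not stated here, so this version merges only regions that share at least one position.

```python
def merge_overlapping(
    intervals: list[tuple[str, int, int]],
) -> list[tuple[str, int, int]]:
    """Merge overlapping closed intervals per chromosome (hypothetical helper)."""
    merged: list[tuple[str, int, int]] = []
    # Sorting by (chrom, start, stop) puts candidates for merging next to each other.
    for chrom, start, stop in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_stop = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_stop, stop))
        else:
            merged.append((chrom, start, stop))
    return merged


collapsed = merge_overlapping(
    [("chr1", 5, 10), ("chr1", 1, 6), ("chr2", 1, 3), ("chr1", 20, 30)]
)
print(collapsed)  # [('chr1', 1, 10), ('chr1', 20, 30), ('chr2', 1, 3)]
```

When the input is already sorted, the sort step can be skipped, which is presumably what an is_sorted flag is for.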

dae.utils.regions.connected_component(regions: list[BedRegion]) -> Any

Return connected component of regions.

This might be the same as collapse.

dae.utils.regions.difference(regions1: list[Region], regions2: list[Region], *, symmetric: bool = False) -> list[Region]

Compute the difference between two lists of regions.

dae.utils.regions.get_chromosome_length_tabix(tabix_file: TabixFile | VariantFile, chrom: str, step: int = 100000000, precision: int = 5000000) -> int | None

Return the length of a chromosome (or contig).

The returned value is guaranteed to be larger than the actual contig length.

dae.utils.regions.intersection(regions1: list[Region], regions2: list[Region]) -> list[Region]

Compute the intersection of two lists of regions.

First collapses each of the two lists of regions, regions1 and regions2, and then finds their intersection.

dae.utils.regions.regions2bedfile(regions: list[BedRegion], bed_filename: str) -> None

Save list of regions into a BED file.

dae.utils.regions.split_into_regions(chrom: str, chrom_length: int, region_size: int, start: int = 1) -> list[Region]

Return a list of regions for a chrom with a given length.

dae.utils.regions.total_length(regions: list[BedRegion]) -> int
dae.utils.regions.union(*r: list[Region]) -> list[Region]

Collapse many lists of regions.

dae.utils.regions.unique_regions(regions: list[Region]) -> list[Region]

Remove duplicated regions.

dae.utils.sql_utils module

dae.utils.sql_utils.fill_query_parameters(query: Any, params: list[Any]) -> None

Fill query parameters.

dae.utils.sql_utils.glot_and(left_expr: Any, right_expr: Any) -> Any
dae.utils.sql_utils.to_duckdb_transpile(query: Any) -> str

dae.utils.stats_collection module

class dae.utils.stats_collection.StatsCollection

Bases: MutableMapping

Helper class for collecting various statistics.

This class is intended for places in the project where it seems appropriate to collect statistics about how components of the system work.

It provides a dict-like interface.

The keys are tuples of strings. The values could be anything, but usually they are numbers.

>>> stats = StatsCollection()
>>> stats[("a",)] = 1
>>> stats[("a",)]
1
>>> stats.get(("a", 1))

The keys are treated as a hierarchy. You can get all values whose keys start with the passed key. For example, if you add the following:

>>> stats[("b", "1")] = 42
>>> stats[("b", "2")] = 43

you can get all values whose keys start with ("b", ...) using:

>>> stats[("b",)]
{('b', '1'): 42, ('b', '2'): 43}

inc(key: tuple[str, ...]) -> None

Increment stats value for the specified key.

save(filename: str) -> None

Save stats to a file.
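
The hierarchical key lookup described for this class can be sketched with a small MutableMapping subclass. PrefixStats is a hypothetical class written for illustration; the rule it applies (an exact key returns its value, otherwise every entry whose key starts with the given tuple is returned) is an assumption modelled on the docstring examples, not the dae implementation.

```python
from collections.abc import Iterator, MutableMapping
from typing import Any


class PrefixStats(MutableMapping):
    """Hypothetical dict-like store keyed by tuples, with prefix lookup."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, ...], Any] = {}

    def __getitem__(self, key: tuple[str, ...]) -> Any:
        if key in self._data:
            return self._data[key]
        # No exact match: collect every entry whose key starts with `key`.
        matches = {k: v for k, v in self._data.items() if k[: len(key)] == key}
        if not matches:
            raise KeyError(key)
        return matches

    def __setitem__(self, key: tuple[str, ...], value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: tuple[str, ...]) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[tuple[str, ...]]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def inc(self, key: tuple[str, ...]) -> None:
        """Add one to the counter stored under key, starting from zero."""
        self._data[key] = self._data.get(key, 0) + 1


stats = PrefixStats()
stats[("b", "1")] = 42
stats[("b", "2")] = 43
stats.inc(("hits",))
stats.inc(("hits",))
print(stats[("b",)])  # {('b', '1'): 42, ('b', '2'): 43}
```

Because get() is inherited from MutableMapping, a key with no exact or prefix match yields None instead of raising.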

dae.utils.variant_utils module

class dae.utils.variant_utils.BitmaskEnumTranslator(*, main_enum_type: type[Enum], partition_by_enum_type: type[Enum])

Bases: object

Encoder and decoder of two enums into a single value.

It has two enum types: the main and the partition by enum. For every enum value in the partition_by enum, a tuple of bits corresponding to the main enum will be added to the value. The amount of bits in the tuple will depend on how many bitwise values the main enum holds. The amount of tuples depends on the amount of bitwise values the partition by holds.

Enums provided to this class must have bitwise values, behavior with enums without bitwise values is undefined.

apply_mask(mask: int, main_enum_value: int, partition_by_enum: Enum) -> int

Apply a mask filter over an existing mask and return the new mask.
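
One way to read the bit layout described above is that each value of the partition-by enum owns a group of bits, and that group stores a mask of main enum values. The sketch below follows that reading. The enums Effect and Role, the helper names, and the choice to OR new bits into the existing mask are all assumptions made for illustration; the dae class may order or combine the bits differently.

```python
from enum import Enum


class Effect(Enum):  # hypothetical "main" enum with bitwise values
    MISSENSE = 1
    SYNONYMOUS = 2


class Role(Enum):  # hypothetical "partition by" enum with bitwise values
    PROBAND = 1
    SIBLING = 2
    PARENT = 4


MAIN_WIDTH = len(Effect)  # one bit per main enum member


def slot_shift(partition: Role) -> int:
    """Bit offset of the group reserved for one partition value."""
    slot = partition.value.bit_length() - 1  # 1 -> 0, 2 -> 1, 4 -> 2
    return slot * MAIN_WIDTH


def apply_mask(mask: int, main_value: int, partition: Role) -> int:
    """Return mask with main_value OR-ed into the partition's bit group."""
    return mask | (main_value << slot_shift(partition))


def read_group(mask: int, partition: Role) -> int:
    """Extract the main enum bits stored for one partition value."""
    return (mask >> slot_shift(partition)) & ((1 << MAIN_WIDTH) - 1)


mask = apply_mask(0, Effect.MISSENSE.value, Role.PROBAND)
mask = apply_mask(
    mask, Effect.MISSENSE.value | Effect.SYNONYMOUS.value, Role.SIBLING
)
print(bin(mask))  # 0b1101
```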

dae.utils.variant_utils.best2gt(best_state: ndarray, dtype: numpy.typing.DTypeLike = <class 'numpy.int8'>) -> ndarray

Convert a best state array to a genotype array.

dae.utils.variant_utils.complement(nucleotides: str) -> str
dae.utils.variant_utils.fgt2str(family_genotypes: ndarray, sep: str = ';') -> str

Convert a family genotype array to a string.

dae.utils.variant_utils.get_interval_locus_ploidy(chrom: str, pos_start: int, pos_end: int, sex: Sex, genome: ReferenceGenome) -> int
dae.utils.variant_utils.get_locus_ploidy(chrom: str, pos: int, sex: Sex, genome: ReferenceGenome) -> int

Return the ploidy at a given position in a chromosome.

dae.utils.variant_utils.gt2str(gt: ndarray) -> str

Convert a genotype array to a string.

dae.utils.variant_utils.is_all_reference_genotype(gt: ndarray) -> bool
dae.utils.variant_utils.is_all_unknown_genotype(gt: ndarray) -> bool
dae.utils.variant_utils.is_reference_genotype(gt: ndarray) -> bool
dae.utils.variant_utils.is_unknown_genotype(gt: ndarray) -> bool
dae.utils.variant_utils.mat2str(mat: ndarray | list[list[int]], col_sep: str = '', row_sep: str = '/') -> str

Construct a string representation of a matrix.

dae.utils.variant_utils.reference_genotype(size: int) -> ndarray
dae.utils.variant_utils.reverse_complement(nucleotides: str) -> str
dae.utils.variant_utils.str2fgt(fgt: str) -> ndarray

Convert a string to a family genotype array.

dae.utils.variant_utils.str2gt(genotypes: str, split: str = ', ', dtype: numpy.typing.DTypeLike = <class 'numpy.int8'>) -> ndarray

Convert a string to a genotype array.

dae.utils.variant_utils.str2lists(mat: str, col_sep: str = '', row_sep: str = '/') -> list[list[int]]

Convert a string into a list of lists of integers.

dae.utils.variant_utils.str2mat(mat: str, col_sep: str = '', row_sep: str = '/', dtype: numpy.typing.DTypeLike = <class 'numpy.int8'>) -> ndarray

Convert a string into a numpy matrix.

dae.utils.variant_utils.str2mat_adjust_colsep(mat: str) -> ndarray

Convert a string into a numpy matrix.

dae.utils.variant_utils.trim_parsimonious(pos: int, ref: str, alt: str) -> tuple[int, str, str]

Trim identical nucleotides on both ends and adjust position.

dae.utils.variant_utils.trim_str_left(pos: int, ref: str, alt: str) -> tuple[int, str, str]

Trim identical nucleotide prefixes and adjust position accordingly.

dae.utils.variant_utils.trim_str_left_right(pos: int, ref: str, alt: str) -> tuple[int, str, str]
dae.utils.variant_utils.trim_str_right(pos: int, ref: str, alt: str) -> tuple[int, str, str]

Trim identical nucleotide suffixes and adjust position accordingly.

dae.utils.variant_utils.trim_str_right_left(pos: int, ref: str, alt: str) -> tuple[int, str, str]
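
The idea behind the trimming functions above, removing bases that ref and alt share and shifting the position when bases are removed from the left, can be sketched as follows. Both helper names are hypothetical. The sketch assumes that suffix trimming leaves the position unchanged and that alleles may be trimmed down to empty strings; the dae functions may handle either point differently.

```python
def trim_common_prefix(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Drop the shared leading bases and move pos right by the number dropped."""
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    return pos + shared, ref[shared:], alt[shared:]


def trim_common_suffix(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Drop the shared trailing bases; pos stays put because the start is unchanged."""
    shared = 0
    while shared < min(len(ref), len(alt)) and ref[-1 - shared] == alt[-1 - shared]:
        shared += 1
    return pos, ref[: len(ref) - shared], alt[: len(alt) - shared]


left_trimmed = trim_common_prefix(100, "ACGT", "ACTT")
both_trimmed = trim_common_suffix(*left_trimmed)
print(left_trimmed, both_trimmed)  # (102, 'GT', 'TT') (102, 'G', 'T')
```

The order of the two steps matters for repetitive sequence, which is presumably why both left-then-right and right-then-left variants are provided.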

dae.utils.verbosity_configuration module

Provides common configuration for loggers verbosity.

class dae.utils.verbosity_configuration.VerbosityConfiguration

Bases: object

Defines common configuration of verbosity for loggers.

static set(args: Namespace) -> None

Read verbosity settings from parsed arguments and configure the logger accordingly.

static set_arguments(parser: ArgumentParser) -> None

Add verbosity arguments to argument parser.

static set_verbosity(verbose: int) -> None

Set logging level according to the verbosity specified.
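
The general pattern of adding a verbosity flag to an argument parser and mapping the count to a logging level can be sketched as below. The flag names -v/--verbose, the helper names, and the mapping from count to level are assumptions made for illustration and are not taken from VerbosityConfiguration.

```python
import argparse
import logging


def add_verbosity_argument(parser: argparse.ArgumentParser) -> None:
    """Register a repeatable -v flag (hypothetical flag name)."""
    parser.add_argument("-v", "--verbose", action="count", default=0)


def level_for(verbose: int) -> int:
    """Map a verbosity count to a logging level (assumed mapping)."""
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING


parser = argparse.ArgumentParser()
add_verbosity_argument(parser)
args = parser.parse_args(["-vv"])
logging.basicConfig(level=level_for(args.verbose))
```

Keeping the mapping in one function lets every command-line tool in a project share the same meaning for -v and -vv.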
