dae.variants_loaders.parquet package

Subpackages

Submodules

dae.variants_loaders.parquet.loader module

class dae.variants_loaders.parquet.loader.MultiReader(dirs: Iterable[str], columns: Iterable[str])

Bases: object

Incrementally fetch variants from multiple files.

This class assumes variants are ordered by their bucket and summary index!
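
The ordering assumption is what makes an incremental merge across files possible. The following is a minimal sketch of that idea using only the standard library, not the actual MultiReader implementation: the row shape and the name merge_sorted_sources are hypothetical, and plain lists stand in for Parquet files.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical row shape: (bucket_index, summary_index, payload).
Row = tuple[int, int, str]


def _index(row: Row) -> tuple[int, int]:
    """Return the (bucket_index, summary_index) ordering key of a row."""
    return (row[0], row[1])


def merge_sorted_sources(
    sources: Iterable[Iterable[Row]],
) -> Iterator[tuple[tuple[int, int], list[str]]]:
    """Yield (index, payloads) groups from several index-sorted sources.

    Each source must already be sorted by (bucket_index, summary_index);
    heapq.merge then only needs to hold one pending row per source.
    """
    merged = heapq.merge(*sources, key=_index)
    for idx, rows in groupby(merged, key=_index):
        yield idx, [payload for _, _, payload in rows]


# Two stand-in "files", each sorted by (bucket_index, summary_index).
file_a = [(0, 0, "a0"), (0, 2, "a2")]
file_b = [(0, 0, "b0"), (0, 1, "b1"), (1, 0, "b3")]
merged = list(merge_sorted_sources([file_a, file_b]))
```

If any source were not sorted by its index, rows with the same index could end up in separate groups, which is why the ordering requirement matters.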

close() → None

property current_idx: tuple[int, int]

class dae.variants_loaders.parquet.loader.ParquetLoader(layout: Schema2DatasetLayout)

Bases: object

Variants loader implementation for the Parquet format.

FAMILY_COLUMNS: ClassVar[list[str]] = ['bucket_index', 'summary_index', 'family_id', 'family_variant_data']

SUMMARY_COLUMNS: ClassVar[list[str]] = ['bucket_index', 'summary_index', 'allele_index', 'summary_variant_data', 'chromosome', 'position', 'end_position']

fetch_family_variants(region: Region | None = None) → Generator[FamilyVariant, None, None]

Iterate over family variants.

fetch_summary_variants(region: Region | None = None) → Generator[SummaryVariant, None, None]

Iterate over summary variants.

fetch_variants(region: Region | None = None) → Generator[tuple[SummaryVariant, list[FamilyVariant]], None, None]

Iterate over summary and family variants.
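
The return type pairs each summary variant with a list of its family variants. The sketch below shows one way such a pairing could be produced from two streams that share a (bucket_index, summary_index) key and are both sorted by it. It is an illustration under those assumptions, not the loader's own code; the function name and the string payloads are hypothetical stand-ins for SummaryVariant and FamilyVariant objects.

```python
from typing import Iterable, Iterator

Index = tuple[int, int]  # (bucket_index, summary_index)


def pair_summary_with_families(
    summaries: Iterable[tuple[Index, str]],
    families: Iterable[tuple[Index, str]],
) -> Iterator[tuple[str, list[str]]]:
    """Yield (summary, [family, ...]) pairs from two index-sorted streams."""
    family_iter = iter(families)
    pending = next(family_iter, None)
    for idx, summary in summaries:
        # Drop family rows that belong to an index we have already passed.
        while pending is not None and pending[0] < idx:
            pending = next(family_iter, None)
        # Collect every family row that shares the current index.
        matched: list[str] = []
        while pending is not None and pending[0] == idx:
            matched.append(pending[1])
            pending = next(family_iter, None)
        yield summary, matched


summaries = [((0, 0), "sv0"), ((0, 1), "sv1"), ((1, 0), "sv2")]
families = [((0, 0), "fv0-f1"), ((0, 0), "fv0-f2"), ((1, 0), "fv2-f1")]
pairs = list(pair_summary_with_families(summaries, families))
```

A single forward pass over both streams is enough here because neither stream ever needs to be rewound.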

get_family_pq_filepaths(summary_path: str) → list[str]

Get all family parquet files for given summary parquet file.

get_summary_pq_filepaths(region: Region | None = None) → Generator[list[str], None, None]

Produce paths to available Parquet files grouped by region.

Can filter by region if region bins are configured.
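
As a rough illustration of grouping and filtering by region bin, the sketch below assumes that each file path carries a "region_bin=<name>" directory component. That path layout, the function name group_by_region_bin, and the wanted_bins parameter are assumptions made for this example and are not taken from the loader.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional

# Assumed partition naming: a path component such as "region_bin=chr1_0".
REGION_BIN_RE = re.compile(r"region_bin=([^/]+)")


def group_by_region_bin(
    paths: Iterable[str],
    wanted_bins: Optional[set[str]] = None,
) -> list[list[str]]:
    """Group paths by region bin, optionally keeping only the wanted bins.

    Paths without a region bin component are never filtered out, which
    mirrors the idea that filtering needs region bins to be configured.
    """
    groups: dict[Optional[str], list[str]] = defaultdict(list)
    for path in paths:
        match = REGION_BIN_RE.search(path)
        bin_name = match.group(1) if match else None
        if (
            bin_name is not None
            and wanted_bins is not None
            and bin_name not in wanted_bins
        ):
            continue
        groups[bin_name].append(path)
    return list(groups.values())


paths = [
    "summary/region_bin=chr1_0/a.parquet",
    "summary/region_bin=chr1_0/b.parquet",
    "summary/region_bin=chr2_0/c.parquet",
]
all_groups = group_by_region_bin(paths)
chr2_only = group_by_region_bin(paths, wanted_bins={"chr2_0"})
```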

static load_from_dir(input_dir: str) → ParquetLoader

exception dae.variants_loaders.parquet.loader.ParquetLoaderException

Bases: Exception

class dae.variants_loaders.parquet.loader.Reader(path: str, columns: Iterable[str])

Bases: object

Helper class to incrementally fetch variants.

This class assumes variants are ordered by their bucket and summary index!
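
One consequence of reading in fixed-size batches is that rows sharing an index may straddle a batch boundary. The sketch below shows how sorted input could still be grouped correctly in that case. It is a standard-library illustration and not the Reader implementation; in_batches and iter_index_groups are hypothetical names, and a small batch size replaces the real BATCH_SIZE to keep the example short.

```python
from typing import Iterable, Iterator

# Hypothetical row shape: (bucket_index, summary_index, payload).
Row = tuple[int, int, str]


def in_batches(rows: list[Row], batch_size: int) -> Iterator[list[Row]]:
    """Split rows into consecutive batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def iter_index_groups(batches: Iterable[list[Row]]) -> Iterator[list[Row]]:
    """Yield lists of rows sharing one (bucket_index, summary_index).

    Relies on the rows being sorted by that index, so a group is complete
    as soon as a row with a different index appears, even when the group
    started in an earlier batch.
    """
    current: list[Row] = []
    for batch in batches:
        for row in batch:
            if current and row[:2] != current[0][:2]:
                yield current
                current = []
            current.append(row)
    if current:
        yield current


rows = [
    (0, 0, "a"), (0, 0, "b"),
    (0, 1, "c"), (0, 1, "d"), (0, 1, "e"),
    (1, 0, "f"),
]
groups = list(iter_index_groups(in_batches(rows, batch_size=2)))
```

Here the group for index (0, 1) spans the second and third batches and is still returned as one list.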

BATCH_SIZE = 5000

close() → None

property current_idx: tuple[int, int]

Module contents