dae.parquet_storage package

Submodules

dae.parquet_storage.storage module

class dae.parquet_storage.storage.ParquetGenotypeStorage(storage_config: dict[str, Any])

Bases: GenotypeStorage

Genotype storage for raw parquet files.

VALIDATION_SCHEMA: ClassVar[dict] = {'dir': {'check_with': <function validate_path>, 'type': 'string'}, 'id': {'type': 'string'}, 'storage_type': {'allowed': ['parquet'], 'type': 'string'}}
build_backend(study_config: dict[str, Any], genome: ReferenceGenome, gene_models: GeneModels | None) → ParquetLoaderVariants

Construct a query backend for this genotype storage.

classmethod get_storage_types() → set[str]

Return the genotype storage type.

import_dataset(study_id: str, layout: Schema2DatasetLayout) → Schema2DatasetLayout

Copy study parquet dataset into Schema2 genotype storage.

shutdown() → GenotypeStorage

No resources to close.

start() → GenotypeStorage

Allocate all resources needed for the genotype storage to work.

classmethod validate_and_normalize_config(config: dict) → dict

Normalize and validate the genotype storage configuration.

When validation passes, returns the normalized and validated genotype storage configuration dict.

When validation fails, raises ValueError.

All genotype storage configurations are required to have:

  • “storage_type” - which storage type this configuration is used for;

  • “id” - the ID of the genotype storage instance that will be created.
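
The constraints visible in VALIDATION_SCHEMA, together with the two required keys listed above, can be illustrated with a minimal sketch. The function check_parquet_storage_config below is a hypothetical stand-in written in plain Python, not the library's validator: it mirrors only the string types and the allowed storage type shown in the schema, and it omits the validate_path check on “dir”. The example ID and directory are invented.

```python
from typing import Any


def check_parquet_storage_config(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in mirroring the constraints in VALIDATION_SCHEMA."""
    # Both keys are documented as required for every genotype storage.
    for key in ("storage_type", "id"):
        if key not in config:
            raise ValueError(f"missing required key: {key!r}")
    # Every key shown in the schema is typed as a string.
    for key in ("storage_type", "id", "dir"):
        if key in config and not isinstance(config[key], str):
            raise ValueError(f"{key!r} must be a string")
    # The schema lists 'parquet' as the only allowed storage type.
    if config["storage_type"] != "parquet":
        raise ValueError("storage_type must be 'parquet'")
    # Return a copy so the caller's dict is left untouched.
    return dict(config)


# Illustrative values only; the id and directory are invented.
example_config = {
    "id": "my_parquet_storage",
    "storage_type": "parquet",
    "dir": "/tmp/parquet_storage",
}
normalized = check_parquet_storage_config(example_config)
```

A dict of this shape is what the ParquetGenotypeStorage constructor takes as storage_config, according to the signature above. Whether “dir” is mandatory is not stated in the schema as shown, so the sketch treats it as optional.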

class dae.parquet_storage.storage.ParquetImportStorage

Bases: Schema2ImportStorage

Import storage for Parquet files.

generate_import_task_graph(project: ImportProject) → TaskGraph

Generate the task graph for importing the project into this storage.

class dae.parquet_storage.storage.ParquetLoaderVariants(data_dir: str, reference_genome: ReferenceGenome | None = None, gene_models: GeneModels | None = None)

Bases: object

Variants class that utilizes ParquetLoader to fetch variants.

build_family_variants_query_runner(*, regions: list[Region] | None = None, genes: list[str] | None = None, effect_types: list[str] | None = None, family_ids: list[str] | None = None, person_ids: list[str] | None = None, inheritance: list[str] | None = None, roles: str | None = None, sexes: str | None = None, variant_type: str | None = None, real_attr_filter: list[tuple[str, tuple[float | None, float | None]]] | None = None, ultra_rare: bool | None = None, frequency_filter: list[tuple[str, tuple[float | None, float | None]]] | None = None, return_reference: bool | None = None, return_unknown: bool | None = None, **_kwargs: Any) → RawVariantsQueryRunner

Return a query runner for the family variants.

build_summary_variants_query_runner(*, regions: list[Region] | None = None, genes: list[str] | None = None, effect_types: list[str] | None = None, variant_type: str | None = None, real_attr_filter: list[tuple[str, tuple[float | None, float | None]]] | None = None, ultra_rare: bool | None = None, frequency_filter: list[tuple[str, tuple[float | None, float | None]]] | None = None, return_reference: bool | None = None, return_unknown: bool | None = None, **kwargs: Any) → RawVariantsQueryRunner

Return a query runner for the summary variants.
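
Both query-runner builders take real_attr_filter and frequency_filter as lists of (attribute, (low, high)) tuples. The sketch below shows only that shape in plain Python and does not call the library. The attribute names are invented. The helper within_bounds encodes two assumptions the source does not state: that None stands for an open bound, and that both bounds are inclusive.

```python
from typing import Optional

# Type aliases matching the annotations in the signatures above.
Bounds = tuple[Optional[float], Optional[float]]
AttrFilter = list[tuple[str, Bounds]]

# Hypothetical attribute names; real names depend on the study's annotation.
frequency_filter: AttrFilter = [("example_allele_freq", (None, 1.0))]
real_attr_filter: AttrFilter = [("example_score", (0.5, None))]


def within_bounds(value: float, bounds: Bounds) -> bool:
    """Assumed semantics: None is an open bound, both ends inclusive."""
    low, high = bounds
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Keyword arguments shaped for either builder; all names used here
# appear in both signatures above.
query_kwargs = {
    "genes": ["EXAMPLE_GENE"],  # invented gene symbol
    "frequency_filter": frequency_filter,
    "real_attr_filter": real_attr_filter,
    "ultra_rare": False,
}
```

Because both builders declare their parameters as keyword-only (the leading `*`), a dict like query_kwargs would be unpacked with `**query_kwargs` rather than passed positionally.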

property families: FamiliesData

Module contents