dae.pheno package

Subpackages

Submodules

dae.pheno.browser module

class dae.pheno.browser.PhenoBrowser(dbfile: str, *, read_only: bool = True)[source]

Bases: object

Class for handling saving and loading of phenotype browser data.

PAGE_SIZE = 1001
count_measures(instrument_name: str | None = None, keyword: str | None = None, page: int | None = None) int[source]

Count measures matching a keyword search.

static create_browser_tables(conn: DuckDBPyConnection) None[source]

Create tables for the browser DB.

property has_descriptions: bool

Check if the database has description data.

property regression_display_names: dict[str, str]

Return regression display names.

property regression_display_names_with_ids: dict[str, Any]

Return regression display names with measure IDs.

property regression_ids: list[str]
save(v: dict[str, Any]) None[source]

Save measure values into the database.

save_regression(reg: dict[str, str]) None[source]

Save regressions into the database.

save_regression_values(reg: dict[str, str]) None[source]

Save regression values into the database.

search_measures(instrument_name: str | None = None, keyword: str | None = None, page: int | None = None, sort_by: str | None = None, order_by: str | None = None) Iterator[dict[str, Any]][source]

Find measures by keyword search.
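
The page argument of search_measures and count_measures, together with PAGE_SIZE, suggests offset-based paging. The following is a minimal sketch of that idea over an in-memory list, not the PhenoBrowser implementation. It assumes that pages are 1-based and that page=None disables paging; page_bounds and search_sketch are hypothetical names.

```python
from __future__ import annotations

from itertools import islice
from typing import Any, Iterable, Iterator

PAGE_SIZE = 1001  # same value as PhenoBrowser.PAGE_SIZE


def page_bounds(
    page: int | None, page_size: int = PAGE_SIZE,
) -> tuple[int, int | None]:
    """Translate a page number into (offset, limit).

    Assumption: pages are 1-based and page=None disables paging.
    """
    if page is None:
        return 0, None
    if page < 1:
        raise ValueError("page numbers are assumed to start at 1")
    return (page - 1) * page_size, page_size


def search_sketch(
    measures: Iterable[dict[str, Any]],
    keyword: str | None = None,
    page: int | None = None,
) -> Iterator[dict[str, Any]]:
    """Yield measures whose name contains the keyword, one page at a time."""
    needle = keyword.lower() if keyword else None
    matching = (
        m for m in measures
        if needle is None or needle in m["measure_name"].lower()
    )
    offset, limit = page_bounds(page)
    stop = None if limit is None else offset + limit
    yield from islice(matching, offset, stop)
```
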

dae.pheno.build_pheno_browser module

dae.pheno.build_pheno_browser.build_pheno_browser(pheno_db_dir: Path, storage_registry: PhenotypeStorageRegistry, pheno_data: PhenotypeData, cache_dir: Path, images_dir: Path, pheno_regressions: Box | None = None, **kwargs: dict[str, Any]) None[source]

Calculate and save pheno browser values to db.

dae.pheno.build_pheno_browser.main(argv: list[str] | None = None) int[source]

Run phenotype import tool.

dae.pheno.build_pheno_browser.pheno_cli_parser() ArgumentParser[source]

Construct argument parser for phenotype import tool.

dae.pheno.common module

class dae.pheno.common.DataDictionaryConfig(*, path: str, instrument: str | None = None, delimiter: str = '\t', instrument_column: str = 'instrumentName', measure_column: str = 'measureName', description_column: str = 'description')[source]

Bases: BaseModel

Pydantic model for data dictionary config entries.

delimiter: str
description_column: str
instrument: str | None
instrument_column: str
measure_column: str
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='\t'), 'description_column': FieldInfo(annotation=str, required=False, default='description'), 'instrument': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'instrument_column': FieldInfo(annotation=str, required=False, default='instrumentName'), 'measure_column': FieldInfo(annotation=str, required=False, default='measureName'), 'path': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str
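
To illustrate what the DataDictionaryConfig defaults describe, here is a sketch that reads a tab-delimited data dictionary with the standard csv module. It is an assumption-laden illustration, not the importer's code: read_data_dictionary is a hypothetical helper, and it assumes that a configured instrument overrides the instrument column and that descriptions are keyed by instrument_name.measure_name.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def read_data_dictionary(
    handle: TextIO,
    *,
    instrument: str | None = None,
    delimiter: str = "\t",
    instrument_column: str = "instrumentName",
    measure_column: str = "measureName",
    description_column: str = "description",
) -> dict[str, str]:
    """Map measure IDs to descriptions (hypothetical helper)."""
    descriptions: dict[str, str] = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        # Assumption: an explicit instrument takes precedence over the column.
        name = instrument if instrument is not None else row[instrument_column]
        descriptions[f"{name}.{row[measure_column]}"] = row[description_column]
    return descriptions


SAMPLE = (
    "instrumentName\tmeasureName\tdescription\n"
    "i1\tage\tAge at assessment\n"
    "i1\tiq\tFull-scale IQ\n"
)
parsed = read_data_dictionary(io.StringIO(SAMPLE))
```
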
class dae.pheno.common.DestinationConfig(*, storage_id: str | None = None, storage_dir: str | None = None)[source]

Bases: BaseModel

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'storage_dir': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'storage_id': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

storage_dir: str | None
storage_id: str | None
class dae.pheno.common.GPFInstanceConfig(*, path: str)[source]

Bases: BaseModel

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'path': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str
class dae.pheno.common.ImportManifest(*, unix_timestamp: float, import_config: PhenoImportConfig)[source]

Bases: BaseModel

Import manifest for checking cache validity.

static create_table(connection: DuckDBPyConnection, table: Table)[source]

Create table for recording import manifests.

static from_row(row: tuple[str, Any, str]) ImportManifest[source]
static from_table(connection: DuckDBPyConnection, table: Table) list[ImportManifest][source]

Read manifests from given table.

import_config: PhenoImportConfig
is_older_than(other: ImportManifest) bool[source]
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'import_config': FieldInfo(annotation=PhenoImportConfig, required=True), 'unix_timestamp': FieldInfo(annotation=float, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

unix_timestamp: float
static write_to_db(connection: DuckDBPyConnection, table: Table, import_config: PhenoImportConfig)[source]

Write manifest into DB on given table.
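
The manifest carries a unix_timestamp, so a cache validity check can plausibly be reduced to a timestamp comparison. The sketch below assumes exactly that; ManifestSketch and needs_rebuild are hypothetical stand-ins, and the real ImportManifest may also compare import_config.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSketch:
    """Stand-in for ImportManifest that keeps only the timestamp."""

    unix_timestamp: float

    def is_older_than(self, other: ManifestSketch) -> bool:
        # Assumption: "older" means a strictly smaller timestamp.
        return self.unix_timestamp < other.unix_timestamp


def needs_rebuild(
    cached: ManifestSketch | None, current: ManifestSketch,
) -> bool:
    """A missing or older cached manifest is treated as outdated."""
    if cached is None:
        return True
    return cached.is_older_than(current)
```
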

class dae.pheno.common.InferenceConfig(*, min_individuals: int = 1, non_numeric_cutoff: float = 0.06, value_max_len: int = 32, continuous: RankRange = RankRange(min_rank=10, max_rank=None), ordinal: RankRange = RankRange(min_rank=1, max_rank=None), categorical: RankRange = RankRange(min_rank=1, max_rank=15), skip: bool = False, value_type: str | None = None, histogram_type: str | None = None)[source]

Bases: BaseModel

Classification inference configuration class.

categorical: RankRange
continuous: RankRange
histogram_type: str | None
min_individuals: int
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'categorical': FieldInfo(annotation=RankRange, required=False, default=RankRange(min_rank=1, max_rank=15)), 'continuous': FieldInfo(annotation=RankRange, required=False, default=RankRange(min_rank=10, max_rank=None)), 'histogram_type': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'min_individuals': FieldInfo(annotation=int, required=False, default=1), 'non_numeric_cutoff': FieldInfo(annotation=float, required=False, default=0.06), 'ordinal': FieldInfo(annotation=RankRange, required=False, default=RankRange(min_rank=1, max_rank=None)), 'skip': FieldInfo(annotation=bool, required=False, default=False), 'value_max_len': FieldInfo(annotation=int, required=False, default=32), 'value_type': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

non_numeric_cutoff: float
ordinal: RankRange
skip: bool
value_max_len: int
value_type: str | None
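
The defaults above hint at how a measure might be classified from its values. The sketch below is one possible reading, not the library's inference algorithm: it assumes that "rank" means the number of distinct values, that non_numeric_cutoff is the tolerated fraction of non-numeric values, and that continuous is tried before ordinal. RankRangeSketch and classify_sketch are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RankRangeSketch:
    """Stand-in for RankRange; None means the bound is open."""

    min_rank: int | None = None
    max_rank: int | None = None

    def contains(self, rank: int) -> bool:
        if self.min_rank is not None and rank < self.min_rank:
            return False
        return self.max_rank is None or rank <= self.max_rank


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def classify_sketch(
    values: list[str],
    *,
    min_individuals: int = 1,
    non_numeric_cutoff: float = 0.06,
    continuous: RankRangeSketch = RankRangeSketch(10, None),
    ordinal: RankRangeSketch = RankRangeSketch(1, None),
    categorical: RankRangeSketch = RankRangeSketch(1, 15),
) -> str:
    """Return a MeasureType member name for a column of raw values."""
    if len(values) < min_individuals or not values:
        return "raw"
    rank = len(set(values))  # assumption: rank = number of distinct values
    non_numeric = sum(1 for v in values if not _is_number(v))
    if non_numeric / len(values) <= non_numeric_cutoff:
        if continuous.contains(rank):
            return "continuous"
        if ordinal.contains(rank):
            return "ordinal"
        return "raw"
    return "categorical" if categorical.contains(rank) else "raw"
```
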
class dae.pheno.common.InstrumentConfig(*, path: str, instrument: str | None = None, delimiter: str | None = None, person_column: str | None = None)[source]

Bases: BaseModel

delimiter: str | None
instrument: str | None
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'instrument': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'path': FieldInfo(annotation=str, required=True), 'person_column': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str
person_column: str | None
class dae.pheno.common.MeasureDescriptionsConfig(*, files: list[DataDictionaryConfig] | None = None, dictionary: dict[str, dict[str, str]] | None = None)[source]

Bases: BaseModel

dictionary: dict[str, dict[str, str]] | None
files: list[DataDictionaryConfig] | None
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'dictionary': FieldInfo(annotation=Union[dict[str, dict[str, str]], NoneType], required=False, default=None), 'files': FieldInfo(annotation=Union[list[DataDictionaryConfig], NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class dae.pheno.common.MeasureHistogramConfigs(*, number_config: dict = {}, categorical_config: dict = {})[source]

Bases: BaseModel

Classification histogram configuration class.

categorical_config: dict
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'categorical_config': FieldInfo(annotation=dict, required=False, default={}), 'number_config': FieldInfo(annotation=dict, required=False, default={})}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

number_config: dict
class dae.pheno.common.MeasureType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Definition of measure types.

categorical = 3
continuous = 1
static from_str(measure_type: str) MeasureType[source]
static is_numeric(measure_type: MeasureType) bool[source]
static is_text(measure_type: MeasureType) bool[source]
ordinal = 2
other = 100
raw = 5
skipped = 1000
text = 4
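
As a rough illustration of how such an enum can be used, the sketch below rebuilds the members listed above with the standard enum module. The bodies of from_str, is_numeric and is_text are assumptions (numeric is taken to mean continuous or ordinal, and text to mean everything else); only the member names and values come from this reference.

```python
from __future__ import annotations

from enum import Enum


class MeasureTypeSketch(Enum):
    """Stand-in for MeasureType with the documented member values."""

    continuous = 1
    ordinal = 2
    categorical = 3
    text = 4
    raw = 5
    other = 100
    skipped = 1000

    @staticmethod
    def from_str(measure_type: str) -> MeasureTypeSketch:
        # Assumption: lookup is by exact member name.
        try:
            return MeasureTypeSketch[measure_type]
        except KeyError:
            raise ValueError(f"unexpected measure type: {measure_type}") from None

    @staticmethod
    def is_numeric(measure_type: MeasureTypeSketch) -> bool:
        # Assumption: only continuous and ordinal measures are numeric.
        return measure_type in (
            MeasureTypeSketch.continuous, MeasureTypeSketch.ordinal)

    @staticmethod
    def is_text(measure_type: MeasureTypeSketch) -> bool:
        # Assumption: every non-numeric type is treated as text.
        return not MeasureTypeSketch.is_numeric(measure_type)
```
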
class dae.pheno.common.PhenoImportConfig(*, id: str, input_dir: str, work_dir: str, instrument_files: list[str | InstrumentConfig], pedigree: str, person_column: str, delimiter: str = ',', destination: DestinationConfig | None = None, gpf_instance: GPFInstanceConfig | None = None, skip_pedigree_measures: bool = False, inference_config: str | dict[str, InferenceConfig] | None = None, histogram_configs: MeasureHistogramConfigs | None = None, data_dictionary: MeasureDescriptionsConfig | None = None, study_config: StudyConfig | None = None)[source]

Bases: BaseModel

Pheno import config.

data_dictionary: MeasureDescriptionsConfig | None
delimiter: str
destination: DestinationConfig | None
gpf_instance: GPFInstanceConfig | None
histogram_configs: MeasureHistogramConfigs | None
id: str
inference_config: str | dict[str, InferenceConfig] | None
input_dir: str
instrument_files: list[str | InstrumentConfig]
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'data_dictionary': FieldInfo(annotation=Union[MeasureDescriptionsConfig, NoneType], required=False, default=None), 'delimiter': FieldInfo(annotation=str, required=False, default=','), 'destination': FieldInfo(annotation=Union[DestinationConfig, NoneType], required=False, default=None), 'gpf_instance': FieldInfo(annotation=Union[GPFInstanceConfig, NoneType], required=False, default=None), 'histogram_configs': FieldInfo(annotation=Union[MeasureHistogramConfigs, NoneType], required=False, default=None), 'id': FieldInfo(annotation=str, required=True), 'inference_config': FieldInfo(annotation=Union[str, dict[str, InferenceConfig], NoneType], required=False, default=None), 'input_dir': FieldInfo(annotation=str, required=True), 'instrument_files': FieldInfo(annotation=list[Union[str, InstrumentConfig]], required=True), 'pedigree': FieldInfo(annotation=str, required=True), 'person_column': FieldInfo(annotation=str, required=True), 'skip_pedigree_measures': FieldInfo(annotation=bool, required=False, default=False), 'study_config': FieldInfo(annotation=Union[StudyConfig, NoneType], required=False, default=None), 'work_dir': FieldInfo(annotation=str, required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

pedigree: str
person_column: str
skip_pedigree_measures: bool
study_config: StudyConfig | None
work_dir: str
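
The following sketch shows what a minimal import configuration might look like as a plain dictionary, and mimics the effect of the 'extra': 'forbid' setting with a key check. The field names and which of them are required come from the signature above; all paths and values are made up for illustration, and check_keys is a hypothetical helper, not part of dae.pheno.

```python
from __future__ import annotations

from typing import Any

REQUIRED = {
    "id", "input_dir", "work_dir",
    "instrument_files", "pedigree", "person_column",
}
OPTIONAL = {
    "delimiter", "destination", "gpf_instance", "skip_pedigree_measures",
    "inference_config", "histogram_configs", "data_dictionary", "study_config",
}

# Hypothetical values; only the key names are taken from PhenoImportConfig.
EXAMPLE_CONFIG: dict[str, Any] = {
    "id": "example_pheno",
    "input_dir": "/data/example/input",
    "work_dir": "/data/example/work",
    "instrument_files": [
        "instruments/basic.csv",
        {"path": "instruments/iq.tsv", "instrument": "iq", "delimiter": "\t"},
    ],
    "pedigree": "pedigree.ped",
    "person_column": "person_id",
    "skip_pedigree_measures": False,
}


def check_keys(config: dict[str, Any]) -> list[str]:
    """Report missing required fields and fields that would be rejected."""
    keys = set(config)
    problems = [f"missing required field: {k}" for k in sorted(REQUIRED - keys)]
    problems += [
        f"unexpected field: {k}" for k in sorted(keys - REQUIRED - OPTIONAL)
    ]
    return problems
```
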
class dae.pheno.common.RankRange(*, min_rank: int | None = None, max_rank: int | None = None)[source]

Bases: BaseModel

max_rank: int | None
min_rank: int | None
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'max_rank': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'min_rank': FieldInfo(annotation=Union[int, NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class dae.pheno.common.RegressionMeasure(*, instrument_name: str, measure_names: list[str], jitter: float, display_name: str)[source]

Bases: BaseModel

display_name: str
instrument_name: str
jitter: float
measure_names: list[str]
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'display_name': FieldInfo(annotation=str, required=True), 'instrument_name': FieldInfo(annotation=str, required=True), 'jitter': FieldInfo(annotation=float, required=True), 'measure_names': FieldInfo(annotation=list[str], required=True)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class dae.pheno.common.StudyConfig(*, regressions: str | dict[str, RegressionMeasure] | None = None, common_report: dict[str, Any] | None = None, person_set_collections: dict[str, Any] | None = None)[source]

Bases: BaseModel

common_report: dict[str, Any] | None
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'common_report': FieldInfo(annotation=Union[dict[str, Any], NoneType], required=False, default=None), 'person_set_collections': FieldInfo(annotation=Union[dict[str, Any], NoneType], required=False, default=None), 'regressions': FieldInfo(annotation=Union[str, dict[str, RegressionMeasure], NoneType], required=False, default=None)}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

person_set_collections: dict[str, Any] | None
regressions: str | dict[str, RegressionMeasure] | None

dae.pheno.db module

class dae.pheno.db.PhenoDb(dbfile: str, *, read_only: bool = True)[source]

Bases: object

Class that manages access to phenotype databases.

find_instrument_values_tables() dict[str, Table][source]

Create instrument values tables.

Each row lists every measure value in the instrument for one person.

get_measures_df(instrument: str | None = None, measure_type: MeasureType | None = None) DataFrame[source]

Return data frame containing measures information.

instrument – name of the instrument whose measures should be returned. If not specified, measures from all instruments are returned.

measure_type – the type (‘continuous’, ‘ordinal’ or ‘categorical’) of measures that should be returned. If not specified, measures of all types are returned.

Each row in the returned data frame represents a single measure.

Columns in the returned data frame are: measure_id, measure_name, instrument_name, description, stats, min_value, max_value, value_domain, has_probands, has_siblings, has_parents, default_filter.

get_pedigree_df() DataFrame[source]

Return individuals data from phenotype database as a dataframe.

get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None][source]

Yield lines from measure values tables.

get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame[source]

Return dataframe from measure values tables.

get_persons_df() DataFrame[source]

Return individuals data from phenotype database as a dataframe.

dae.pheno.db.generate_instrument_table_name(instrument_name: str) str[source]
dae.pheno.db.safe_db_name(name: str) str[source]

Convert a string to a db-friendly string.
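
The exact rules used by safe_db_name are not given here. As an illustration of what a "db-friendly" conversion usually involves, the sketch below replaces non-word characters with underscores, lowercases the result, and prefixes names that start with a digit. These rules and the name safe_name_sketch are assumptions, not the library's behaviour.

```python
import re


def safe_name_sketch(name: str) -> str:
    """Turn an arbitrary label into an identifier-like string (hypothetical)."""
    if not name:
        raise ValueError("name must not be empty")
    # Replace anything that is not a letter, digit or underscore.
    cleaned = re.sub(r"\W", "_", name).lower()
    # SQL identifiers conventionally do not start with a digit.
    if cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    return cleaned
```
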

dae.pheno.graphs module

class dae.pheno.graphs.GraphColumn(name, roles, status, df)[source]

Bases: object

Build a column to produce a graph from it.

all_count()[source]
static build(df, role_name, role_subroles, status)[source]

Construct a graph column object.

females_count()[source]
property label
males_count()[source]
dae.pheno.graphs.column_counts(column)[source]

Collect counts for a graph column.

dae.pheno.graphs.draw_categorical_violin_distribution(df, measure_id, *, roles_definition=None, ax=None, numerical_categories=False, max_categories=12)[source]

Draw violin distribution for categorical measures.

dae.pheno.graphs.draw_linregres(df, col1, col2, jitter: int | None = None, ax=None)[source]

Draw a graph displaying the linear regression between two columns.

dae.pheno.graphs.draw_measure_violinplot(df, measure_id, roles_definition=None, ax=None)[source]

Draw a violin plot for a measure.

dae.pheno.graphs.draw_ordinal_violin_distribution(df, measure_id, ax=None)[source]
dae.pheno.graphs.gender_palette()[source]
dae.pheno.graphs.gender_palette_light()[source]
dae.pheno.graphs.get_columns_to_draw(roles, df)[source]

Collect columns needed for graphs.

dae.pheno.graphs.male_female_legend(color_male, color_female, ax=None)[source]

Construct a male/female legend for a graph.

dae.pheno.graphs.role_labels(ordered_columns)[source]
dae.pheno.graphs.set_figure_size(figure, x_count)[source]

dae.pheno.import_tools module

dae.pheno.import_tools.main(argv: list[str] | None = None) int[source]

Run phenotype import tool.

dae.pheno.import_tools.pheno_cli_parser() ArgumentParser[source]

Construct argument parser for phenotype import tool.

dae.pheno.pheno_data module

class dae.pheno.pheno_data.Instrument(name: str)[source]

Bases: object

Instrument object represents phenotype instruments.

Common fields are:

  • instrument_name

  • measures – dictionary of all measures in the instrument

class dae.pheno.pheno_data.Measure(measure_id: str, name: str)[source]

Bases: object

Measure objects represent phenotype measures.

Common fields are:

  • instrument_name

  • measure_name

  • measure_id - formed as instrument_name.measure_name

  • measure_type - one of ‘continuous’, ‘ordinal’, ‘categorical’

  • value_type - one of ‘float’, ‘str’, ‘int’

  • histogram_type - one of ‘number’, ‘categorical’

  • histogram_config - one of HistogramConfig or None

  • description

  • min_value - for ‘continuous’ and ‘ordinal’ measures

  • max_value - for ‘continuous’ and ‘ordinal’ measures

  • values_domain - string that represents the values
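
Since a measure_id combines the instrument name and the measure name, it can be built and taken apart with plain string operations. The helpers below are hypothetical and assume that instrument names contain no dot, so the first dot separates the two parts.

```python
def make_measure_id(instrument_name: str, measure_name: str) -> str:
    """Join the two names with a dot."""
    return f"{instrument_name}.{measure_name}"


def split_measure_id(measure_id: str) -> tuple[str, str]:
    """Split at the first dot (assumes dot-free instrument names)."""
    instrument_name, sep, measure_name = measure_id.partition(".")
    if not sep or not instrument_name or not measure_name:
        raise ValueError(f"not a valid measure id: {measure_id!r}")
    return instrument_name, measure_name
```
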

property domain: Sequence[str | float]

Return measure values domain.

classmethod from_record(row: dict[str, Any]) Measure[source]

Create Measure object from pandas data frame row.

to_json() dict[str, Any][source]

Return measure description in JSON-friendly format.

class dae.pheno.pheno_data.PhenotypeData(pheno_id: str, config: dict | None = None, cache_path: Path | None = None)[source]

Bases: ABC, CommonStudyMixin

Base class for all phenotype data studies and datasets.

property browser: PhenoBrowser | None

Get or create pheno browser for phenotype data.

build_and_save(*, force: bool = False) CommonReport | None[source]

Build a common report for a study, save it, and return the report.

If the common reports are disabled for the study, the function skips building the report and returns None.

If the report already exists the default behavior is to skip building the report. You can force building the report by passing force=True to the function.
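
The skip and force behaviour described above follows a common build-or-reuse pattern, sketched here with a JSON file standing in for the stored report. This is not the library's code: build_and_save_sketch, its enabled flag, and the choice to return the existing report when building is skipped are all assumptions.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def build_and_save_sketch(
    report_path: Path,
    build_report: Callable[[], dict[str, Any]],
    *,
    enabled: bool = True,
    force: bool = False,
) -> dict[str, Any] | None:
    """Build and store a report unless it is disabled or already present."""
    if not enabled:
        return None  # reports disabled: nothing is built
    if report_path.exists() and not force:
        # Assumption: the existing report is loaded and returned as-is.
        return json.loads(report_path.read_text())
    report = build_report()
    report_path.write_text(json.dumps(report))
    return report
```

Passing force=True rebuilds and overwrites the stored report even when one already exists.
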

build_report() CommonReport[source]

Generate common report JSON from genotype data study.

close() None[source]

Close the connection to the database.

abstract count_measures(instrument: str | None, search_term: str | None, page: int | None = None) int[source]

Count measures in the DB according to filters.

static create_browser(pheno_data: PhenotypeData, *, read_only: bool = True) PhenoBrowser[source]

Load pheno browser from pheno configuration.

property families: FamiliesData
abstract generate_import_manifests() list[ImportManifest][source]

Collect all manifests in a phenotype data instance.

abstract get_children_ids(*, leaves: bool = True) list[str][source]

Return all phenotype studies’ ids in the group.

get_common_report() CommonReport | None[source]

Return a study’s common report.

get_image(image_path: str) tuple[bytes, str][source]

Return binary image data with mimetype.

get_instrument_measures(instrument_name: str) list[str][source]

Return measures for given instrument.

get_instruments() list[str][source]
get_measure(measure_id: str) Measure[source]

Return a measure by measure_id.

get_measure_description(measure_id: str) dict[str, Any][source]

Construct and return a measure description.

get_measures(instrument_name: str | None = None, measure_type: MeasureType | None = None) dict[str, Measure][source]

Return a dictionary of measures objects.

instrument_name – name of the instrument whose measures should be returned. If not specified, measures from all instruments are returned.

measure_type – the type (‘continuous’, ‘ordinal’ or ‘categorical’) of measures that should be returned. If not specified, measures of all types are returned.

abstract get_measures_info() dict[str, Any][source]
abstract get_pedigree_df() DataFrame[source]
abstract get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None][source]

Collect and format the values of the given measures in dict format.

Yields a dict representing every row.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter the result. Only data for individuals whose person_id is in person_ids is returned.

family_ids – list of family IDs to filter the result. Only data for individuals who are members of any of the specified families is returned.

roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
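
The filter semantics described for get_people_measure_values can be sketched over in-memory rows as follows. This only illustrates the documented behaviour of the parameters; the row layout, the use of plain strings for roles (the real API takes Role values), and the name people_measure_values_sketch are assumptions.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def people_measure_values_sketch(
    rows: Iterable[dict[str, Any]],
    measure_ids: list[str],
    person_ids: list[str] | None = None,
    family_ids: list[str] | None = None,
    roles: list[str] | None = None,
) -> Iterator[dict[str, Any]]:
    """Yield one dict per person, restricted to the requested measures."""
    for row in rows:
        # A filter that is None is not applied at all.
        if person_ids is not None and row["person_id"] not in person_ids:
            continue
        if family_ids is not None and row["family_id"] not in family_ids:
            continue
        if roles is not None and row["role"] not in roles:
            continue
        result = {key: row[key] for key in ("person_id", "family_id", "role")}
        for measure_id in measure_ids:
            result[measure_id] = row.get(measure_id)
        yield result


ROWS = [
    {"person_id": "p1", "family_id": "f1", "role": "prb", "i1.age": 10, "i1.iq": 98},
    {"person_id": "s1", "family_id": "f1", "role": "sib", "i1.age": 12, "i1.iq": 105},
    {"person_id": "p2", "family_id": "f2", "role": "prb", "i1.age": 9, "i1.iq": 110},
]
```
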

get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame[source]

Collect and format the values of the given measures in a dataframe.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter the result. Only data for individuals whose person_id is in person_ids is returned.

family_ids – list of family IDs to filter the result. Only data for individuals who are members of any of the specified families is returned.

roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.

get_person_roles() list[str][source]

Return individuals distinct role data from phenotype database.

get_person_set_collection(person_set_collection_id: str | None) PersonSetCollection | None[source]
get_persons() dict[str, Person][source]

Return individuals data from phenotype database.

abstract get_persons_df() DataFrame[source]
abstract get_regressions() dict[str, Any][source]
has_measure(measure_id: str) bool[source]

Check if phenotype DB contains a measure by ID.

property instruments: dict[str, Instrument]
is_browser_outdated(browser: PhenoBrowser) bool[source]

Check if a rebuild is required according to manifests.

property is_group: bool
property measures: dict[str, Measure]
property person_set_collections: dict[str, PersonSetCollection]
property pheno_id: str
search_measures(instrument: str | None, search_term: str | None, page: int | None = None, sort_by: str | None = None, order_by: str | None = None) Generator[dict[str, Any], None, None][source]

Yield measures in the DB according to filters.

class dae.pheno.pheno_data.PhenotypeGroup(pheno_id: str, config: dict | None, children: list[PhenotypeData], cache_path: Path | None = None)[source]

Bases: PhenotypeData

Represents a group of phenotype data studies or groups.

count_measures(instrument: str | None, search_term: str | None, page: int | None = None) int[source]

Count measures in the DB according to filters.

property families: FamiliesData
generate_import_manifests() list[ImportManifest][source]

Collect all manifests in a phenotype data instance.

get_children_ids(*, leaves: bool = True) list[str][source]

Return all phenotype studies’ ids in the group.

get_leaves() list[PhenotypeStudy][source]

Return all phenotype studies in the group.
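
Because a group can contain both studies and other groups, collecting its leaves is a recursive walk. The sketch below shows one way to do it with stand-in classes. StudySketch and GroupSketch are hypothetical, and the reading of leaves=False as "direct children only" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StudySketch:
    pheno_id: str


@dataclass
class GroupSketch:
    pheno_id: str
    children: list[StudySketch | GroupSketch] = field(default_factory=list)

    def get_leaves(self) -> list[StudySketch]:
        """Collect studies depth-first, descending into nested groups."""
        leaves: list[StudySketch] = []
        for child in self.children:
            if isinstance(child, GroupSketch):
                leaves.extend(child.get_leaves())
            else:
                leaves.append(child)
        return leaves

    def get_children_ids(self, *, leaves: bool = True) -> list[str]:
        if leaves:
            return [study.pheno_id for study in self.get_leaves()]
        # Assumption: leaves=False returns only the direct children.
        return [child.pheno_id for child in self.children]


inner = GroupSketch("inner", [StudySketch("s2"), StudySketch("s3")])
outer = GroupSketch("outer", [StudySketch("s1"), inner])
```
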

get_measures_info() dict[str, Any][source]
get_pedigree_df() DataFrame[source]
get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None][source]

Collect and format the values of the given measures in dict format.

Yields a dict representing every row.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter the result. Only data for individuals whose person_id is in person_ids is returned.

family_ids – list of family IDs to filter the result. Only data for individuals who are members of any of the specified families is returned.

roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.

get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame[source]

Collect and format the values of the given measures in a dataframe.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter the result. Only data for individuals whose person_id is in person_ids is returned.

family_ids – list of family IDs to filter the result. Only data for individuals who are members of any of the specified families is returned.

roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.

get_person_roles() list[str][source]

Return individuals distinct role data from phenotype database.

get_persons_df() DataFrame[source]
get_regressions() dict[str, Any][source]
property is_group: bool
property person_set_collections: dict[str, PersonSetCollection]
class dae.pheno.pheno_data.PhenotypeStudy(pheno_id: str, dbfile: str, config: dict | None = None, *, read_only: bool = True, cache_path: Path | None = None)[source]

Bases: PhenotypeData

Main class for accessing phenotype database in DAE.

To access the phenotype database, create an instance of this class and call the load() method.

Common fields of this class are:

  • persons – list of all individuals in the database

  • instruments – dictionary of all instruments

  • measures – dictionary of all measures

count_measures(instrument: str | None, search_term: str | None, page: int | None = None) int[source]

Count measures in the DB according to filters.

property families: FamiliesData
generate_import_manifests() list[ImportManifest][source]

Collect all manifests in a phenotype data instance.

get_children_ids(*, leaves: bool = True) list[str][source]

Return all phenotype studies’ ids in the group.

get_measures_info() dict[str, Any][source]
get_pedigree_df() DataFrame[source]
get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None][source]

Collect and format the values of the given measures in dict format.

Yields a dict representing every row.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter the result. Only data for individuals whose person_id is in person_ids is returned.

family_ids – list of family IDs to filter the result. Only data for individuals who are members of any of the specified families is returned.

roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.

get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame[source]

Collect and format the values of the given measures in a dataframe.

measure_ids – list of measure ids which values should be returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.

get_persons_df() DataFrame[source]
get_regressions() dict[str, Any][source]
property person_set_collections: dict[str, PersonSetCollection]
dae.pheno.pheno_data.get_pheno_browser_images_dir(dae_config: dict | None = None) Path[source]

Get images directory for pheno DB.

dae.pheno.pheno_data.get_pheno_db_dir(dae_config: dict | None) str[source]

Return the directory where phenotype data configurations are located.

dae.pheno.pheno_import module

class dae.pheno.pheno_import.ImportInstrument(files: list[pathlib.Path], name: str, delimiter: str, person_column: str)[source]

Bases: object

delimiter: str
files: list[Path]
name: str
person_column: str
class dae.pheno.pheno_import.MeasureReport(*, measure_name: str, instrument_name: str, db_name: str, measure_type: MeasureType, inference_report: InferenceReport)[source]

Bases: BaseModel

db_name: str
inference_report: InferenceReport
instrument_name: str
measure_name: str
measure_type: MeasureType
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'db_name': FieldInfo(annotation=str, required=True), 'inference_report': FieldInfo(annotation=InferenceReport, required=True), 'instrument_name': FieldInfo(annotation=str, required=True), 'measure_name': FieldInfo(annotation=str, required=True), 'measure_type': FieldInfo(annotation=MeasureType, required=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

dae.pheno.pheno_import.add_pheno_common_inference(config: dict[str, Any]) None[source]

Add pedigree columns as skipped columns to the inference config.

dae.pheno.pheno_import.collect_instruments(import_config: PhenoImportConfig) list[ImportInstrument][source]

Collect all instrument files for a given import config.

dae.pheno.pheno_import.create_import_tasks(task_graph: TaskGraph, instruments: list[ImportInstrument], instrument_measure_names: dict[str, list[str]], inference_configs: dict[str, Any], histogram_configs: MeasureHistogramConfigs | None, import_config: PhenoImportConfig, descriptions: dict[str, str]) None[source]

Add measure tasks for importing pheno data.

dae.pheno.pheno_import.create_tables(connection: DuckDBPyConnection) None[source]

Create phenotype data tables in DuckDB.

dae.pheno.pheno_import.determine_destination(gpf_instance: GPFInstance | None, config: PhenoImportConfig) tuple[str | None, Path | None, Path | None][source]

Determine where output should be placed based on configuration.

dae.pheno.pheno_import.generate_phenotype_data_config(pheno_name: str, storage_id: str | None, overrides: StudyConfig | None) str[source]

Construct phenotype data configuration from command line arguments.

dae.pheno.pheno_import.get_gpf_instance(config: PhenoImportConfig) GPFInstance | None[source]

Return a GPF instance for an import config if it can be found.

dae.pheno.pheno_import.get_output_parquet_files_dir(import_config: PhenoImportConfig) Path[source]
dae.pheno.pheno_import.handle_measure_inference_tasks(task_graph: TaskGraph, task_cache: TaskCache, task_graph_args: Namespace) dict[str, tuple[Path, Path]][source]

Read the output of the measure inference tasks into dictionaries.

dae.pheno.pheno_import.import_pheno_data(config: PhenoImportConfig, gpf_instance: GPFInstance | None = None, task_graph_args: Namespace | None = None) None[source]

Import pheno data into DuckDB.

dae.pheno.pheno_import.infer_measures(instrument: ImportInstrument, person_ids: list[str], measure_names: list[str], db_names: list[str], inf_configs: list[InferenceConfig], measure_person_values: dict[str, dict[str, Any]]) tuple[dict[str, list[Any]], dict[str, MeasureReport]][source]

Perform inference for measure values of an instrument.

dae.pheno.pheno_import.load_description_file(input_dir: str, config: DataDictionaryConfig) dict[str, str][source]

Load measure descriptions for single data dictionary.

dae.pheno.pheno_import.load_descriptions(input_dir: str, config: MeasureDescriptionsConfig | None) dict[str, str][source]

Load measure descriptions from given configuration.

dae.pheno.pheno_import.load_histogram_configs(input_dir: str, histogram_config_filepath: str | None) MeasureHistogramConfigs | None[source]

Load import histogram configuration file.

dae.pheno.pheno_import.load_inference_configs(input_dir: str, inference_config_filepath: str | None) dict[str, Any][source]

Load import inference configuration file.

dae.pheno.pheno_import.main(argv: list[str] | None = None) int[source]

Run phenotype import tool.

dae.pheno.pheno_import.merge_histogram_configs(histogram_configs: MeasureHistogramConfigs | None, measure_report: MeasureReport) NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None[source]

Merge configs by order of specificity.

dae.pheno.pheno_import.merge_inference_configs(inference_configs: dict[str, Any], instrument_name: str, measure_name: str) InferenceConfig[source]

Merge configs by order of specificity.
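Merging by order of specificity can be illustrated with plain dictionaries. The sketch below is not the library's code: the scope keys ("*.*", "instrument.*", "*.measure", "instrument.measure"), the option names and the merge_by_specificity helper are assumptions made for illustration, and the result is a plain dict rather than an InferenceConfig. The idea shown is that more specific scopes override less specific ones.

```python
from typing import Any


def merge_by_specificity(
    configs: dict[str, dict[str, Any]],
    instrument_name: str,
    measure_name: str,
) -> dict[str, Any]:
    """Overlay the matching scopes from least to most specific."""
    # Assumed scope order; later entries override earlier ones.
    scopes = [
        "*.*",
        f"{instrument_name}.*",
        f"*.{measure_name}",
        f"{instrument_name}.{measure_name}",
    ]
    merged: dict[str, Any] = {}
    for scope in scopes:
        merged.update(configs.get(scope, {}))
    return merged


# Hypothetical configuration with made-up option names.
CONFIGS = {
    "*.*": {"min_individuals": 1, "skip": False},
    "i1.*": {"min_individuals": 5},
    "i1.notes": {"skip": True},
}

merged_notes = merge_by_specificity(CONFIGS, "i1", "notes")
merged_age = merge_by_specificity(CONFIGS, "i1", "age")
```

Here merged_notes picks up min_individuals from the instrument scope and skip from the measure scope, while merged_age falls back to the global skip setting.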

dae.pheno.pheno_import.open_file(filepath: Path) TextIO[source]
dae.pheno.pheno_import.pheno_cli_parser() ArgumentParser[source]

Construct argument parser for phenotype import tool.

dae.pheno.pheno_import.read_and_classify_measure(instrument: ImportInstrument, measure_names: list[str], descriptions: dict[str, str], import_config: PhenoImportConfig, db_names: list[str], inf_configs: list[InferenceConfig], hist_configs: MeasureHistogramConfigs | None) tuple[str, Path, Path][source]

Read a measure’s values from an instrument file and classify them.

dae.pheno.pheno_import.read_instrument_measure_names(instruments: list[ImportInstrument]) dict[str, list[str]][source]

Read the headers of all the instrument files.
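Reading the headers of delimited instrument files can be sketched with the standard library alone. This is a hedged illustration rather than the function's actual implementation: InstrumentFiles is a hypothetical stand-in that mirrors the fields of ImportInstrument, read_header_names is a made-up helper, and dropping the person column from the measure names is an assumption.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class InstrumentFiles:
    """Hypothetical stand-in mirroring the fields of ImportInstrument."""

    files: list[Path]
    name: str
    delimiter: str
    person_column: str


def read_header_names(instruments: list[InstrumentFiles]) -> dict[str, list[str]]:
    """Map each instrument name to the column names found in its files."""
    result: dict[str, list[str]] = {}
    for instrument in instruments:
        names: list[str] = []
        for path in instrument.files:
            with path.open(newline="") as infile:
                reader = csv.reader(infile, delimiter=instrument.delimiter)
                # Only the first row is read; an empty file yields no names.
                header = next(reader, [])
            for column in header:
                # Assumption: the person ID column is not a measure.
                if column != instrument.person_column and column not in names:
                    names.append(column)
        result[instrument.name] = names
    return result


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "i1.csv"
    sample.write_text("person_id,age,iq\np1,10,101\n")
    header_names = read_header_names(
        [InstrumentFiles([sample], "i1", ",", "person_id")]
    )
```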

dae.pheno.pheno_import.read_pedigree(connection: DuckDBPyConnection, input_dir: str, pedigree_filepath: str) DataFrame[source]

Read a pedigree file into a pandas DataFrame.

Also imports the pedigree data into the database.

dae.pheno.pheno_import.transform_cli_args(args: Namespace) PhenoImportConfig[source]

Create a pheno import config instance from CLI arguments.

dae.pheno.pheno_import.write_reports_to_parquet(output_file: Path, reports: dict[str, MeasureReport], descriptions: dict[str, str], hist_configs: MeasureHistogramConfigs | None) Path[source]

Write instrument measure reports to parquet file.

dae.pheno.pheno_import.write_results(connection: DuckDBPyConnection, instrument_pq_files: dict[str, tuple[Path, Path]], ped_df: DataFrame) None[source]

Write imported data into DuckDB as measure value tables.

dae.pheno.pheno_import.write_to_parquet(instrument_name: str, filepath: Path, reports: dict[str, MeasureReport], values_table: dict[str, list[Any]]) Path[source]

Write inferred instrument measure values to parquet file.

dae.pheno.prepare_data module

class dae.pheno.prepare_data.PreparePhenoBrowserBase(pheno_db_dir: Path, storage_registry: PhenotypeStorageRegistry, phenotype_data: PhenotypeData, browser: PhenoBrowser, cache_dir: Path | None = None, images_dir: Path | None = None, pheno_regressions: Box | None = None)[source]

Bases: object

Prepares phenotype data for the phenotype browser.

LARGE_DPI = 150
SMALL_DPI = 16
add_measure_task(graph: TaskGraph, measure: Measure, pheno_dir: str, storage_registry: PhenotypeStorageRegistry, cache_dir: str) None[source]

Add task for building browser data to the task graph.

classmethod browsable_figure_path(pheno_id: str, measure: Measure, suffix: str) str[source]

Construct file path for storing a measure’s figures.

classmethod build_regression(phenotype_data: PhenotypeData, images_dir: str, dependent_measure: Measure, independent_measure: Measure, jitter: float) dict[str, str | float][source]

Build measure regressions.

classmethod build_values_categorical_distribution(pheno_id: str, images_dir: str, df: DataFrame, measure: Measure) dict[str, Any][source]

Build a categorical value distribution figure.

classmethod build_values_ordinal_distribution(pheno_id: str, images_dir: str, df: DataFrame, measure: Measure) dict[str, Any][source]

Build an ordinal value distribution figure.

classmethod build_values_violinplot(pheno_id: str, images_dir: str, df: DataFrame, measure: Measure) dict[str, Any][source]

Build a violin plot figure for the measure.

collect_child_configs(study: PhenotypeGroup) dict[str, dict][source]

Collect child configurations.

classmethod do_measure_build(pheno_id: str, measure: Measure, storage_registry: PhenotypeStorageRegistry, images_dir: str, regression_measures: dict[str, tuple[Box, Measure]], pheno_dir: str, cache_dir: str) tuple[dict[str, Any], list[dict[str, Any]] | None][source]

Create images and regressions for a given measure.

classmethod figure_filepath(pheno_id: str, images_dir: str, measure: Measure, suffix: str) str[source]

Construct file path for storing a measure’s figures.

get_regression_measures(measure: Measure) dict[str, tuple[Box, Measure]][source]

Collect all regressions for a given measure.

run(**kwargs: Any) None[source]

Run browser preparations for all measures in a phenotype data.

classmethod save_fig(pheno_id: str, images_dir: str, measure: Measure, suffix: str) tuple[str | None, str | None][source]

Save measure figures.

dae.pheno.registry module

class dae.pheno.registry.PhenoRegistry(storage_registry: PhenotypeStorageRegistry, configurations: list[dict] | None = None, browser_cache_path: Path | None = None)[source]

Bases: object

Class for managing runtime instances of phenotype data.

Requires a PhenotypeStorageRegistry to function.

The registry has two main operations: register and get. Registering requires a study configuration and makes the registry aware of a phenotype study’s existence, making it loadable.

Getting phenotype data requires its ID and performs a load if necessary.

Both operations are synchronized and use a mutex to prevent faulty reads or duplicate loads of a phenotype data.
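The register/get pattern with a mutex and lazy loading can be sketched as follows. This is a generic illustration under stated assumptions, not the PhenoRegistry source: LazyRegistry, fake_loader and the "id" configuration key are hypothetical names, and the sketch raises ValueError for unknown IDs purely as an illustrative choice.

```python
import threading
from collections.abc import Callable
from typing import Any


class LazyRegistry:
    """Hypothetical registry: configs are registered up front, data loads lazily."""

    def __init__(self, loader: Callable[[dict[str, Any]], Any]) -> None:
        self._lock = threading.Lock()
        self._loader = loader
        self._configs: dict[str, dict[str, Any]] = {}
        self._cache: dict[str, Any] = {}

    def register(self, config: dict[str, Any]) -> None:
        """Make a configuration known so its data becomes loadable."""
        with self._lock:
            self._configs[config["id"]] = config

    def get(self, data_id: str) -> Any:
        """Return cached data, loading it on first access."""
        with self._lock:
            if data_id not in self._configs:
                raise ValueError(f"unknown phenotype data: {data_id}")
            if data_id not in self._cache:
                # Loading under the lock prevents duplicate loads.
                self._cache[data_id] = self._loader(self._configs[data_id])
            return self._cache[data_id]


load_calls: list[str] = []


def fake_loader(config: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for an expensive load; records every invocation."""
    load_calls.append(config["id"])
    return {"loaded": config["id"]}


registry = LazyRegistry(fake_loader)
registry.register({"id": "study_a"})
first = registry.get("study_a")
second = registry.get("study_a")
```

Holding the lock for the whole of get() trades some concurrency for simplicity: a second caller waits for the first load to finish and then receives the cached object instead of starting its own load.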

CACHE_LOCK = <unlocked _thread.lock object>
get_all_phenotype_data(*, lock: bool = True) list[PhenotypeData][source]

Return all registered phenotype data.

get_phenotype_data(data_id: str, *, lock: bool = True) PhenotypeData[source]

Return an instance of phenotype data from the registry.

If the phenotype data hasn’t been loaded yet, load and cache it.

get_phenotype_data_config(data_id: str) dict | None[source]
get_phenotype_data_ids(*, lock: bool = True) list[str][source]
has_phenotype_data(data_id: str, *, lock: bool = True) bool[source]
static load_configurations(pheno_data_dir: str) list[dict][source]
register_study_config(study_config: dict, *, lock: bool = True) None[source]

Register a configuration as a loadable phenotype data.

shutdown() None[source]

Shutdown the registry and all loaded phenotype data.

dae.pheno.storage module

class dae.pheno.storage.PhenotypeStorage(storage_config: dict[str, Any])[source]

Bases: object

Class that manages phenotype data storage directories.

build_phenotype_study(study_config: dict, browser_cache_path: Path | None) PhenotypeStudy[source]

Create a phenotype study object from a configuration.

static from_config(storage_config: dict[str, Any]) PhenotypeStorage[source]
shutdown() None[source]
class dae.pheno.storage.PhenotypeStorageRegistry[source]

Bases: object

Class that manages phenotype storages.

get_all_phenotype_storage_ids() list[str][source]

Return list of all registered phenotype storage IDs.

get_all_phenotype_storages() list[PhenotypeStorage][source]

Return list of registered phenotype storages.

get_default_phenotype_storage() PhenotypeStorage[source]

Return the default phenotype storage if one is defined.

Otherwise, return None.

get_phenotype_storage(storage_id: str) PhenotypeStorage[source]

Return phenotype storage with specified storage_id.

If no storage with the specified ID is found, the method raises a ValueError.
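The lookup behaviour documented for this registry (an optional default storage, and ValueError for an unknown ID) can be mirrored in a small stand-in. The class below is a hypothetical sketch, not the PhenotypeStorageRegistry implementation; its method names, the dictionary used as a storage object and the error message are assumptions.

```python
from typing import Any


class StorageLookupSketch:
    """Hypothetical registry keyed by storage ID, with an optional default."""

    def __init__(self) -> None:
        self._storages: dict[str, Any] = {}
        self._default_id: str | None = None

    def register(self, storage_id: str, storage: Any, *, default: bool = False) -> None:
        """Store the object under its ID and optionally mark it as default."""
        self._storages[storage_id] = storage
        if default:
            self._default_id = storage_id

    def get(self, storage_id: str) -> Any:
        """Return the storage or raise ValueError when the ID is unknown."""
        if storage_id not in self._storages:
            raise ValueError(f"unknown phenotype storage: {storage_id}")
        return self._storages[storage_id]

    def get_default(self) -> Any:
        """Return the default storage, or None when none was registered."""
        if self._default_id is None:
            return None
        return self._storages[self._default_id]


storages = StorageLookupSketch()
default_before = storages.get_default()
storages.register("local", {"base_dir": "/tmp/pheno"}, default=True)

try:
    storages.get("missing")
    lookup_error = None
except ValueError as err:
    lookup_error = str(err)
```

Callers that cannot guarantee an ID exists would wrap the lookup in try/except as shown, while the default lookup needs a None check instead.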

register_default_storage(phenotype_storage: PhenotypeStorage) None[source]

Register a phenotype storage and make it the default storage.

register_phenotype_storage(storage: PhenotypeStorage) PhenotypeStorage[source]

Register a phenotype storage instance.

register_storage_config(storage_config: dict[str, Any]) PhenotypeStorage[source]

Create a phenotype storage using storage config and register it.

register_storages_configs(phenotype_storages_config: dict[str, Any]) None[source]

Create and register all phenotype storages defined in config.

When defining a GPF instance, we specify a phenotype_storage section in the configuration. If you pass this whole configuration section to this method, it will create and register all phenotype storages defined in that configuration section.

shutdown() None[source]

Module contents