dae.pheno package
Subpackages
- dae.pheno.prepare package
- Submodules
- dae.pheno.prepare.measure_classifier module
Convertible
InferenceReport
InferenceReport.count_total
InferenceReport.count_unique_values
InferenceReport.count_with_values
InferenceReport.count_without_values
InferenceReport.histogram_type
InferenceReport.max_value
InferenceReport.min_individuals
InferenceReport.min_value
InferenceReport.model_computed_fields
InferenceReport.model_config
InferenceReport.model_fields
InferenceReport.value_type
InferenceReport.values_domain
convert_to_float()
convert_to_int()
convert_to_numeric()
convert_to_string()
determine_histogram_type()
force_type_inference()
inference_reference_impl()
is_convertible_to_numeric()
is_nan()
- Module contents
- dae.pheno.tests package
- Submodules
- dae.pheno.tests.conftest module
- dae.pheno.tests.test_browser module
- dae.pheno.tests.test_classifier module
- dae.pheno.tests.test_db module
- dae.pheno.tests.test_graphs module
- dae.pheno.tests.test_import_tools module
- dae.pheno.tests.test_lin_regress module
- dae.pheno.tests.test_pheno_data module
- dae.pheno.tests.test_pheno_factory module
- dae.pheno.tests.test_pheno_group module
- dae.pheno.tests.test_pheno_import module
- dae.pheno.tests.test_pheno_regression module
- dae.pheno.tests.test_prepare_data module
- dae.pheno.tests.test_registry module
- dae.pheno.tests.test_storage module
- dae.pheno.tests.test_type_inference module
- Module contents
- dae.pheno.utils package
Submodules
dae.pheno.browser module
- class dae.pheno.browser.PhenoBrowser(dbfile: str, *, read_only: bool = True)[source]
Bases:
object
Class for handling saving and loading of phenotype browser data.
- PAGE_SIZE = 1001
- count_measures(instrument_name: str | None = None, keyword: str | None = None, page: int | None = None) int [source]
Count measures matching a keyword search.
- static create_browser_tables(conn: DuckDBPyConnection) None [source]
Create tables for the browser DB.
- property has_descriptions: bool
Check if the database has description data.
- property regression_display_names: dict[str, str]
Return regression display names.
- property regression_display_names_with_ids: dict[str, Any]
Return regression display names with measure IDs.
- property regression_ids: list[str]
dae.pheno.build_pheno_browser module
- dae.pheno.build_pheno_browser.build_pheno_browser(pheno_db_dir: Path, storage_registry: PhenotypeStorageRegistry, pheno_data: PhenotypeData, cache_dir: Path, images_dir: Path, pheno_regressions: Box | None = None, **kwargs: dict[str, Any]) None [source]
Calculate and save pheno browser values to db.
dae.pheno.common module
- class dae.pheno.common.DataDictionaryConfig(*, path: str, instrument: str | None = None, delimiter: str = '\t', instrument_column: str = 'instrumentName', measure_column: str = 'measureName', description_column: str = 'description')[source]
Bases:
BaseModel
Pydantic model for data dictionary config entries.
- delimiter: str
- description_column: str
- instrument: str | None
- instrument_column: str
- measure_column: str
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='\t'), 'description_column': FieldInfo(annotation=str, required=False, default='description'), 'instrument': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'instrument_column': FieldInfo(annotation=str, required=False, default='instrumentName'), 'measure_column': FieldInfo(annotation=str, required=False, default='measureName'), 'path': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- path: str
- class dae.pheno.common.DestinationConfig(*, storage_id: str | None = None, storage_dir: str | None = None)[source]
Bases:
BaseModel
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'storage_dir': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'storage_id': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- storage_dir: str | None
- storage_id: str | None
- class dae.pheno.common.GPFInstanceConfig(*, path: str)[source]
Bases:
BaseModel
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'path': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- path: str
- class dae.pheno.common.ImportManifest(*, unix_timestamp: float, import_config: PhenoImportConfig)[source]
Bases:
BaseModel
Import manifest for checking cache validity.
- static create_table(connection: DuckDBPyConnection, table: Table)[source]
Create table for recording import manifests.
- static from_row(row: tuple[str, Any, str]) ImportManifest [source]
- static from_table(connection: DuckDBPyConnection, table: Table) list[ImportManifest] [source]
Read manifests from given table.
- import_config: PhenoImportConfig
- is_older_than(other: ImportManifest) bool [source]
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'import_config': FieldInfo(annotation=PhenoImportConfig, required=True), 'unix_timestamp': FieldInfo(annotation=float, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- unix_timestamp: float
- static write_to_db(connection: DuckDBPyConnection, table: Table, import_config: PhenoImportConfig)[source]
Write manifest into DB on given table.
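The manifest pairs an import timestamp with the import configuration so that cached results can be checked for staleness. The following is a minimal sketch of that idea using only the standard library. `ManifestSketch` and `cache_is_stale` are hypothetical names, and the assumption that "older" means a strictly smaller `unix_timestamp` is not confirmed by this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSketch:
    """Hypothetical stand-in for ImportManifest; only the timestamp is modeled."""

    unix_timestamp: float

    def is_older_than(self, other: ManifestSketch) -> bool:
        # Assumption: "older" means a strictly smaller import timestamp.
        return self.unix_timestamp < other.unix_timestamp


def cache_is_stale(
    cached: list[ManifestSketch],
    current: list[ManifestSketch],
) -> bool:
    """Report whether any cached manifest predates the newest current import."""
    if not cached:
        return True  # nothing was cached, so a rebuild is needed
    if not current:
        return False  # nothing to compare against
    newest = max(current, key=lambda m: m.unix_timestamp)
    return any(m.is_older_than(newest) for m in cached)
```

A cache built from an import at time 1.0 would be reported stale once a manifest with time 2.0 exists, and fresh when both timestamps match.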
- class dae.pheno.common.InferenceConfig(*, min_individuals: int = 1, non_numeric_cutoff: float = 0.06, value_max_len: int = 32, continuous: RankRange = RankRange(min_rank=10, max_rank=None), ordinal: RankRange = RankRange(min_rank=1, max_rank=None), categorical: RankRange = RankRange(min_rank=1, max_rank=15), skip: bool = False, value_type: str | None = None, histogram_type: str | None = None)[source]
Bases:
BaseModel
Classification inference configuration class.
- histogram_type: str | None
- min_individuals: int
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'categorical': FieldInfo(annotation=RankRange, required=False, default=RankRange(min_rank=1, max_rank=15)), 'continuous': FieldInfo(annotation=RankRange, required=False, default=RankRange(min_rank=10, max_rank=None)), 'histogram_type': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'min_individuals': FieldInfo(annotation=int, required=False, default=1), 'non_numeric_cutoff': FieldInfo(annotation=float, required=False, default=0.06), 'ordinal': FieldInfo(annotation=RankRange, required=False, default=RankRange(min_rank=1, max_rank=None)), 'skip': FieldInfo(annotation=bool, required=False, default=False), 'value_max_len': FieldInfo(annotation=int, required=False, default=32), 'value_type': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- non_numeric_cutoff: float
- skip: bool
- value_max_len: int
- value_type: str | None
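To make the defaults above easier to read, here is a rough sketch of how rank ranges and a non-numeric cutoff could drive measure classification. It is not the classifier used by dae.pheno: the decision order, the use of the number of unique values as the "rank", and the names `RankRangeSketch`, `InferenceSketch` and `classify` are all assumptions made for illustration. Only the default numbers are taken from the signature above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RankRangeSketch:
    """Hypothetical stand-in for RankRange; None means unbounded."""

    min_rank: int | None = None
    max_rank: int | None = None

    def contains(self, rank: int) -> bool:
        if self.min_rank is not None and rank < self.min_rank:
            return False
        return self.max_rank is None or rank <= self.max_rank


@dataclass(frozen=True)
class InferenceSketch:
    """Defaults copied from the InferenceConfig signature."""

    min_individuals: int = 1
    non_numeric_cutoff: float = 0.06
    continuous: RankRangeSketch = RankRangeSketch(min_rank=10)
    ordinal: RankRangeSketch = RankRangeSketch(min_rank=1)
    categorical: RankRangeSketch = RankRangeSketch(min_rank=1, max_rank=15)


def _is_number(text: str) -> bool:
    try:
        float(text)
    except ValueError:
        return False
    return True


def classify(values: list[str], cfg: InferenceSketch = InferenceSketch()) -> str:
    """Assumed decision order: numeric checks first, then categorical."""
    present = [v for v in values if v.strip()]
    if not present or len(present) < cfg.min_individuals:
        return "raw"
    rank = len(set(present))  # assumption: rank = number of unique values
    non_numeric = sum(not _is_number(v) for v in present)
    if non_numeric / len(present) <= cfg.non_numeric_cutoff:
        if cfg.continuous.contains(rank):
            return "continuous"
        if cfg.ordinal.contains(rank):
            return "ordinal"
    elif cfg.categorical.contains(rank):
        return "categorical"
    return "raw"
```

Under these assumptions, twenty distinct numeric strings classify as continuous, a handful of repeated integers as ordinal, and a short list of labels as categorical.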
- class dae.pheno.common.InstrumentConfig(*, path: str, instrument: str | None = None, delimiter: str | None = None, person_column: str | None = None)[source]
Bases:
BaseModel
- delimiter: str | None
- instrument: str | None
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'instrument': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'path': FieldInfo(annotation=str, required=True), 'person_column': FieldInfo(annotation=Union[str, NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- path: str
- person_column: str | None
- class dae.pheno.common.MeasureDescriptionsConfig(*, files: list[DataDictionaryConfig] | None = None, dictionary: dict[str, dict[str, str]] | None = None)[source]
Bases:
BaseModel
- dictionary: dict[str, dict[str, str]] | None
- files: list[DataDictionaryConfig] | None
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'dictionary': FieldInfo(annotation=Union[dict[str, dict[str, str]], NoneType], required=False, default=None), 'files': FieldInfo(annotation=Union[list[DataDictionaryConfig], NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- class dae.pheno.common.MeasureHistogramConfigs(*, number_config: dict = {}, categorical_config: dict = {})[source]
Bases:
BaseModel
Classification histogram configuration class.
- categorical_config: dict
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'categorical_config': FieldInfo(annotation=dict, required=False, default={}), 'number_config': FieldInfo(annotation=dict, required=False, default={})}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- number_config: dict
- class dae.pheno.common.MeasureType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Definition of measure types.
- categorical = 3
- continuous = 1
- static from_str(measure_type: str) MeasureType [source]
- static is_numeric(measure_type: MeasureType) bool [source]
- static is_text(measure_type: MeasureType) bool [source]
- ordinal = 2
- other = 100
- raw = 5
- skipped = 1000
- text = 4
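The enum members and static helpers listed above can be mirrored with the standard library for experimentation. This is a sketch only: the member values come from this page, but the behavior of `from_str`, `is_numeric` and `is_text` shown here is assumed (exact name lookup, continuous and ordinal treated as numeric, everything else as text), and `MeasureTypeSketch` is a hypothetical name.

```python
from enum import Enum


class MeasureTypeSketch(Enum):
    """Stand-in for MeasureType with the member values listed above."""

    continuous = 1
    ordinal = 2
    categorical = 3
    text = 4
    raw = 5
    other = 100
    skipped = 1000

    @staticmethod
    def from_str(measure_type: str) -> "MeasureTypeSketch":
        # Assumption: lookup is by exact member name.
        try:
            return MeasureTypeSketch[measure_type]
        except KeyError:
            raise ValueError(f"unexpected measure type: {measure_type}") from None

    @staticmethod
    def is_numeric(measure_type: "MeasureTypeSketch") -> bool:
        # Assumption: only continuous and ordinal measures count as numeric.
        return measure_type in (
            MeasureTypeSketch.continuous,
            MeasureTypeSketch.ordinal,
        )

    @staticmethod
    def is_text(measure_type: "MeasureTypeSketch") -> bool:
        # Assumption: every non-numeric type is treated as text.
        return not MeasureTypeSketch.is_numeric(measure_type)
```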
- class dae.pheno.common.PhenoImportConfig(*, id: str, input_dir: str, work_dir: str, instrument_files: list[str | InstrumentConfig], pedigree: str, person_column: str, delimiter: str = ',', destination: DestinationConfig | None = None, gpf_instance: GPFInstanceConfig | None = None, skip_pedigree_measures: bool = False, inference_config: str | dict[str, InferenceConfig] | None = None, histogram_configs: MeasureHistogramConfigs | None = None, data_dictionary: MeasureDescriptionsConfig | None = None, study_config: StudyConfig | None = None)[source]
Bases:
BaseModel
Pheno import config.
- data_dictionary: MeasureDescriptionsConfig | None
- delimiter: str
- destination: DestinationConfig | None
- gpf_instance: GPFInstanceConfig | None
- histogram_configs: MeasureHistogramConfigs | None
- id: str
- inference_config: str | dict[str, InferenceConfig] | None
- input_dir: str
- instrument_files: list[str | InstrumentConfig]
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'data_dictionary': FieldInfo(annotation=Union[MeasureDescriptionsConfig, NoneType], required=False, default=None), 'delimiter': FieldInfo(annotation=str, required=False, default=','), 'destination': FieldInfo(annotation=Union[DestinationConfig, NoneType], required=False, default=None), 'gpf_instance': FieldInfo(annotation=Union[GPFInstanceConfig, NoneType], required=False, default=None), 'histogram_configs': FieldInfo(annotation=Union[MeasureHistogramConfigs, NoneType], required=False, default=None), 'id': FieldInfo(annotation=str, required=True), 'inference_config': FieldInfo(annotation=Union[str, dict[str, InferenceConfig], NoneType], required=False, default=None), 'input_dir': FieldInfo(annotation=str, required=True), 'instrument_files': FieldInfo(annotation=list[Union[str, InstrumentConfig]], required=True), 'pedigree': FieldInfo(annotation=str, required=True), 'person_column': FieldInfo(annotation=str, required=True), 'skip_pedigree_measures': FieldInfo(annotation=bool, required=False, default=False), 'study_config': FieldInfo(annotation=Union[StudyConfig, NoneType], required=False, default=None), 'work_dir': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- pedigree: str
- person_column: str
- skip_pedigree_measures: bool
- study_config: StudyConfig | None
- work_dir: str
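Because the model is declared with `'extra': 'forbid'`, unknown keys are rejected in addition to missing required fields. The sketch below imitates that contract with plain dictionaries so it runs without pydantic. The field names and the two defaults come from the signature above, while `check_import_config` and all example values (paths, IDs, file names) are hypothetical.

```python
# Field names taken from the PhenoImportConfig signature.
REQUIRED = {
    "id", "input_dir", "work_dir",
    "instrument_files", "pedigree", "person_column",
}
OPTIONAL = {
    "delimiter", "destination", "gpf_instance", "skip_pedigree_measures",
    "inference_config", "histogram_configs", "data_dictionary", "study_config",
}


def check_import_config(raw: dict) -> dict:
    """Reject missing or unknown keys, then fill in two simple defaults."""
    missing = REQUIRED - set(raw)
    unknown = set(raw) - REQUIRED - OPTIONAL
    if missing or unknown:
        raise ValueError(
            f"missing={sorted(missing)} unknown={sorted(unknown)}"
        )
    # Defaults for delimiter and skip_pedigree_measures as documented above.
    return {"delimiter": ",", "skip_pedigree_measures": False, **raw}


# Hypothetical example values, for illustration only.
example = {
    "id": "demo_pheno",
    "input_dir": "/data/demo/input",
    "work_dir": "/data/demo/work",
    "instrument_files": ["instruments/"],
    "pedigree": "pedigree.ped",
    "person_column": "subject_id",
}
```

Calling `check_import_config(example)` returns the dictionary with the default delimiter filled in; adding a misspelled key raises `ValueError`.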
- class dae.pheno.common.RankRange(*, min_rank: int | None = None, max_rank: int | None = None)[source]
Bases:
BaseModel
- max_rank: int | None
- min_rank: int | None
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'max_rank': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'min_rank': FieldInfo(annotation=Union[int, NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- class dae.pheno.common.RegressionMeasure(*, instrument_name: str, measure_names: list[str], jitter: float, display_name: str)[source]
Bases:
BaseModel
- display_name: str
- instrument_name: str
- jitter: float
- measure_names: list[str]
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'display_name': FieldInfo(annotation=str, required=True), 'instrument_name': FieldInfo(annotation=str, required=True), 'jitter': FieldInfo(annotation=float, required=True), 'measure_names': FieldInfo(annotation=list[str], required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- class dae.pheno.common.StudyConfig(*, regressions: str | dict[str, RegressionMeasure] | None = None, common_report: dict[str, Any] | None = None, person_set_collections: dict[str, Any] | None = None)[source]
Bases:
BaseModel
- common_report: dict[str, Any] | None
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'common_report': FieldInfo(annotation=Union[dict[str, Any], NoneType], required=False, default=None), 'person_set_collections': FieldInfo(annotation=Union[dict[str, Any], NoneType], required=False, default=None), 'regressions': FieldInfo(annotation=Union[str, dict[str, RegressionMeasure], NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- person_set_collections: dict[str, Any] | None
- regressions: str | dict[str, RegressionMeasure] | None
dae.pheno.db module
- class dae.pheno.db.PhenoDb(dbfile: str, *, read_only: bool = True)[source]
Bases:
object
Class that manages access to phenotype databases.
- find_instrument_values_tables() dict[str, Table] [source]
Create instrument values tables.
Each row is basically a list of every measure value in the instrument for a certain person.
- get_measures_df(instrument: str | None = None, measure_type: MeasureType | None = None) DataFrame [source]
Return data frame containing measures information.
instrument – name of the instrument whose measures should be returned. If not specified, measures from all instruments are returned.
measure_type – the type (‘continuous’, ‘ordinal’ or ‘categorical’) of measures that should be returned. If not specified, measures of all types are returned.
Each row in the returned data frame represents a single measure.
Columns in the returned data frame are: measure_id, measure_name, instrument_name, description, stats, min_value, max_value, value_domain, has_probands, has_siblings, has_parents, default_filter.
- get_pedigree_df() DataFrame [source]
Return individuals data from phenotype database as a dataframe.
- get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None] [source]
Yield lines from measure values tables.
dae.pheno.graphs module
- class dae.pheno.graphs.GraphColumn(name, roles, status, df)[source]
Bases:
object
Build a column to produce a graph from it.
- property label
- dae.pheno.graphs.draw_categorical_violin_distribution(df, measure_id, *, roles_definition=None, ax=None, numerical_categories=False, max_categories=12)[source]
Draw violin distribution for categorical measures.
- dae.pheno.graphs.draw_linregres(df, col1, col2, jitter: int | None = None, ax=None)[source]
Draw a graph displaying the linear regression between two columns.
- dae.pheno.graphs.draw_measure_violinplot(df, measure_id, roles_definition=None, ax=None)[source]
Draw a violin plot for a measure.
dae.pheno.import_tools module
dae.pheno.pheno_data module
- class dae.pheno.pheno_data.Instrument(name: str)[source]
Bases:
object
Instrument object represents phenotype instruments.
Common fields are:
instrument_name
measures – dictionary of all measures in the instrument
- class dae.pheno.pheno_data.Measure(measure_id: str, name: str)[source]
Bases:
object
Measure objects represent phenotype measures.
Common fields are:
instrument_name
measure_name
measure_id - formed as instrument_name.measure_name
measure_type - one of ‘continuous’, ‘ordinal’, ‘categorical’
value_type - one of ‘float’, ‘str’, ‘int’
histogram_type - one of ‘number’, ‘categorical’
histogram_config - one of HistogramConfig or None
description
min_value - for ‘continuous’ and ‘ordinal’ measures
max_value - for ‘continuous’ and ‘ordinal’ measures
values_domain - string that represents the values
- property domain: Sequence[str | float]
Return measure values domain.
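The measure_id convention described above (instrument name, a dot, then the measure name) can be sketched with two small helpers. Both function names are hypothetical, and splitting on the first dot assumes instrument names contain no dots, which this page does not state.

```python
def make_measure_id(instrument_name: str, measure_name: str) -> str:
    """Join the two parts with a dot, as described for measure_id."""
    return f"{instrument_name}.{measure_name}"


def split_measure_id(measure_id: str) -> tuple:
    """Split a measure ID back into (instrument_name, measure_name)."""
    # Assumption: the first dot separates the instrument from the measure.
    instrument_name, _, measure_name = measure_id.partition(".")
    if not instrument_name or not measure_name:
        raise ValueError(f"not a valid measure id: {measure_id!r}")
    return instrument_name, measure_name
```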
- class dae.pheno.pheno_data.PhenotypeData(pheno_id: str, config: dict | None = None, cache_path: Path | None = None)[source]
Bases:
ABC, CommonStudyMixin
Base class for all phenotype data studies and datasets.
- property browser: PhenoBrowser | None
Get or create pheno browser for phenotype data.
- build_and_save(*, force: bool = False) CommonReport | None [source]
Build a common report for a study, save it, and return the report.
If the common reports are disabled for the study, the function skips building the report and returns None.
If the report already exists the default behavior is to skip building the report. You can force building the report by passing force=True to the function.
- build_report() CommonReport [source]
Generate common report JSON from genotype data study.
- abstract count_measures(instrument: str | None, search_term: str | None, page: int | None = None) int [source]
Count measures in the DB according to filters.
- static create_browser(pheno_data: PhenotypeData, *, read_only: bool = True) PhenoBrowser [source]
Load pheno browser from pheno configuration.
- property families: FamiliesData
- abstract generate_import_manifests() list[ImportManifest] [source]
Collect all manifests in a phenotype data instance.
- abstract get_children_ids(*, leaves: bool = True) list[str] [source]
Return all phenotype studies’ ids in the group.
- get_common_report() CommonReport | None [source]
Return a study’s common report.
- get_instrument_measures(instrument_name: str) list[str] [source]
Return measures for given instrument.
- get_measure_description(measure_id: str) dict[str, Any] [source]
Construct and return a measure description.
- get_measures(instrument_name: str | None = None, measure_type: MeasureType | None = None) dict[str, Measure] [source]
Return a dictionary of measures objects.
instrument_name – name of the instrument whose measures should be returned. If not specified, measures from all instruments are returned.
measure_type – the type (‘continuous’, ‘ordinal’ or ‘categorical’) of measures that should be returned. If not specified, measures of all types are returned.
- abstract get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None] [source]
Collect and format the values of the given measures in dict format.
Yields a dict representing every row.
measure_ids – list of IDs of the measures whose values should be returned.
person_ids – list of person IDs to filter the result. Only individuals whose person_id is in person_ids are included.
family_ids – list of family IDs to filter the result. Only individuals who are members of one of the specified families are included.
roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
- get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame [source]
Collect and format the values of the given measures in a dataframe.
measure_ids – list of IDs of the measures whose values should be returned.
person_ids – list of person IDs to filter the result. Only individuals whose person_id is in person_ids are included.
family_ids – list of family IDs to filter the result. Only individuals who are members of one of the specified families are included.
roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
- get_person_roles() list[str] [source]
Return the distinct roles of individuals in the phenotype database.
- get_person_set_collection(person_set_collection_id: str | None) PersonSetCollection | None [source]
- property instruments: dict[str, Instrument]
- is_browser_outdated(browser: PhenoBrowser) bool [source]
Check if a rebuild is required according to manifests.
- property is_group: bool
- property person_set_collections: dict[str, PersonSetCollection]
- property pheno_id: str
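The filter parameters documented for get_people_measure_values above can be illustrated with a plain generator over row dictionaries. This is a sketch under stated assumptions, not the library's implementation: the row keys `person_id`, `family_id` and `role`, the use of plain strings for roles instead of `Role` values, the AND combination of filters, and the name `filter_measure_rows` are all hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from typing import Any


def filter_measure_rows(
    rows: Iterable[dict[str, Any]],
    person_ids: list[str] | None = None,
    family_ids: list[str] | None = None,
    roles: list[str] | None = None,
) -> Iterator[dict[str, Any]]:
    """Yield one dict per row; a filter left as None is not applied."""
    for row in rows:
        if person_ids is not None and row["person_id"] not in person_ids:
            continue
        if family_ids is not None and row["family_id"] not in family_ids:
            continue
        if roles is not None and row["role"] not in roles:
            continue
        yield row


# Hypothetical rows: one measure column keyed by its measure ID.
ROWS = [
    {"person_id": "f1.p1", "family_id": "f1", "role": "prb", "i1.age": 10.5},
    {"person_id": "f1.s1", "family_id": "f1", "role": "sib", "i1.age": 12.0},
    {"person_id": "f2.p1", "family_id": "f2", "role": "prb", "i1.age": 9.0},
]
```

For example, `list(filter_measure_rows(ROWS, family_ids=["f1"], roles=["prb"]))` keeps only the first row, while calling it with no filters yields every row.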
- class dae.pheno.pheno_data.PhenotypeGroup(pheno_id: str, config: dict | None, children: list[PhenotypeData], cache_path: Path | None = None)[source]
Bases:
PhenotypeData
Represents a group of phenotype data studies or groups.
- count_measures(instrument: str | None, search_term: str | None, page: int | None = None) int [source]
Count measures in the DB according to filters.
- property families: FamiliesData
- generate_import_manifests() list[ImportManifest] [source]
Collect all manifests in a phenotype data instance.
- get_children_ids(*, leaves: bool = True) list[str] [source]
Return all phenotype studies’ ids in the group.
- get_leaves() list[PhenotypeStudy] [source]
Return all phenotype studies in the group.
- get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None] [source]
Collect and format the values of the given measures in dict format.
Yields a dict representing every row.
measure_ids – list of IDs of the measures whose values should be returned.
person_ids – list of person IDs to filter the result. Only individuals whose person_id is in person_ids are included.
family_ids – list of family IDs to filter the result. Only individuals who are members of one of the specified families are included.
roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
- get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame [source]
Collect and format the values of the given measures in a dataframe.
measure_ids – list of IDs of the measures whose values should be returned.
person_ids – list of person IDs to filter the result. Only individuals whose person_id is in person_ids are included.
family_ids – list of family IDs to filter the result. Only individuals who are members of one of the specified families are included.
roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
- get_person_roles() list[str] [source]
Return the distinct roles of individuals in the phenotype database.
- property is_group: bool
- property person_set_collections: dict[str, PersonSetCollection]
- class dae.pheno.pheno_data.PhenotypeStudy(pheno_id: str, dbfile: str, config: dict | None = None, *, read_only: bool = True, cache_path: Path | None = None)[source]
Bases:
PhenotypeData
Main class for accessing phenotype database in DAE.
To access the phenotype database create an instance of this class and call the method load().
Common fields of this class are:
persons – list of all individuals in the database
instruments – dictionary of all instruments
measures – dictionary of all measures
- count_measures(instrument: str | None, search_term: str | None, page: int | None = None) int [source]
Count measures in the DB according to filters.
- property families: FamiliesData
- generate_import_manifests() list[ImportManifest] [source]
Collect all manifests in a phenotype data instance.
- get_children_ids(*, leaves: bool = True) list[str] [source]
Return all phenotype studies’ ids in the group.
- get_people_measure_values(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) Generator[dict[str, Any], None, None] [source]
Collect and format the values of the given measures in dict format.
Yields a dict representing every row.
measure_ids – list of IDs of the measures whose values should be returned.
person_ids – list of person IDs to filter the result. Only individuals whose person_id is in person_ids are included.
family_ids – list of family IDs to filter the result. Only individuals who are members of one of the specified families are included.
roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
- get_people_measure_values_df(measure_ids: list[str], person_ids: list[str] | None = None, family_ids: list[str] | None = None, roles: list[Role] | None = None) DataFrame [source]
Collect and format the values of the given measures in a dataframe.
measure_ids – list of IDs of the measures whose values should be returned.
person_ids – list of person IDs to filter the result. Only individuals whose person_id is in person_ids are included.
family_ids – list of family IDs to filter the result. Only individuals who are members of one of the specified families are included.
roles – list of roles of individuals to select measure values for. If not specified, values for individuals in all roles are returned.
- property person_set_collections: dict[str, PersonSetCollection]
dae.pheno.pheno_import module
- class dae.pheno.pheno_import.ImportInstrument(files: list[pathlib.Path], name: str, delimiter: str, person_column: str)[source]
Bases:
object
- delimiter: str
- files: list[Path]
- name: str
- person_column: str
- class dae.pheno.pheno_import.MeasureReport(*, measure_name: str, instrument_name: str, db_name: str, measure_type: MeasureType, inference_report: InferenceReport)[source]
Bases:
BaseModel
- db_name: str
- inference_report: InferenceReport
- instrument_name: str
- measure_name: str
- measure_type: MeasureType
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'db_name': FieldInfo(annotation=str, required=True), 'inference_report': FieldInfo(annotation=InferenceReport, required=True), 'instrument_name': FieldInfo(annotation=str, required=True), 'measure_name': FieldInfo(annotation=str, required=True), 'measure_type': FieldInfo(annotation=MeasureType, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- dae.pheno.pheno_import.add_pheno_common_inference(config: dict[str, Any]) None [source]
Add pedigree columns as skipped columns to the inference config.
- dae.pheno.pheno_import.collect_instruments(import_config: PhenoImportConfig) list[ImportInstrument] [source]
Collect all instrument files for a given import config.
- dae.pheno.pheno_import.create_import_tasks(task_graph: TaskGraph, instruments: list[ImportInstrument], instrument_measure_names: dict[str, list[str]], inference_configs: dict[str, Any], histogram_configs: MeasureHistogramConfigs | None, import_config: PhenoImportConfig, descriptions: dict[str, str]) None [source]
Add measure tasks for importing pheno data.
- dae.pheno.pheno_import.create_tables(connection: DuckDBPyConnection) None [source]
Create phenotype data tables in DuckDB.
- dae.pheno.pheno_import.determine_destination(gpf_instance: GPFInstance | None, config: PhenoImportConfig) tuple[str | None, Path | None, Path | None] [source]
Determine where output should be placed based on configuration.
- dae.pheno.pheno_import.generate_phenotype_data_config(pheno_name: str, storage_id: str | None, overrides: StudyConfig | None) str [source]
Construct phenotype data configuration from command line arguments.
- dae.pheno.pheno_import.get_gpf_instance(config: PhenoImportConfig) GPFInstance | None [source]
Return a GPF instance for an import config if it can be found.
- dae.pheno.pheno_import.get_output_parquet_files_dir(import_config: PhenoImportConfig) Path [source]
- dae.pheno.pheno_import.handle_measure_inference_tasks(task_graph: TaskGraph, task_cache: TaskCache, task_graph_args: Namespace) dict[str, tuple[Path, Path]] [source]
Read the output of the measure inference tasks into dictionaries.
- dae.pheno.pheno_import.import_pheno_data(config: PhenoImportConfig, gpf_instance: GPFInstance | None = None, task_graph_args: Namespace | None = None) None [source]
Import pheno data into DuckDB.
- dae.pheno.pheno_import.infer_measures(instrument: ImportInstrument, person_ids: list[str], measure_names: list[str], db_names: list[str], inf_configs: list[InferenceConfig], measure_person_values: dict[str, dict[str, Any]]) tuple[dict[str, list[Any]], dict[str, MeasureReport]] [source]
Perform inference for measure values of an instrument.
- dae.pheno.pheno_import.load_description_file(input_dir: str, config: DataDictionaryConfig) dict[str, str] [source]
Load measure descriptions for a single data dictionary.
- dae.pheno.pheno_import.load_descriptions(input_dir: str, config: MeasureDescriptionsConfig | None) dict[str, str] [source]
Load measure descriptions from the given configuration.
- dae.pheno.pheno_import.load_histogram_configs(input_dir: str, histogram_config_filepath: str | None) MeasureHistogramConfigs | None [source]
Load import histogram configuration file.
- dae.pheno.pheno_import.load_inference_configs(input_dir: str, inference_config_filepath: str | None) dict[str, Any] [source]
Load import inference configuration file.
- dae.pheno.pheno_import.merge_histogram_configs(histogram_configs: MeasureHistogramConfigs | None, measure_report: MeasureReport) NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None [source]
Merge histogram configs in order of specificity.
- dae.pheno.pheno_import.merge_inference_configs(inference_configs: dict[str, Any], instrument_name: str, measure_name: str) InferenceConfig [source]
Merge inference configs in order of specificity.
- dae.pheno.pheno_import.pheno_cli_parser() ArgumentParser [source]
Construct argument parser for phenotype import tool.
- dae.pheno.pheno_import.read_and_classify_measure(instrument: ImportInstrument, measure_names: list[str], descriptions: dict[str, str], import_config: PhenoImportConfig, db_names: list[str], inf_configs: list[InferenceConfig], hist_configs: MeasureHistogramConfigs | None) tuple[str, Path, Path] [source]
Read measure values from an instrument file and classify them.
- dae.pheno.pheno_import.read_instrument_measure_names(instruments: list[ImportInstrument]) dict[str, list[str]] [source]
Read the headers of all the instrument files.
- dae.pheno.pheno_import.read_pedigree(connection: DuckDBPyConnection, input_dir: str, pedigree_filepath: str) DataFrame [source]
Read a pedigree file into a pandas DataFrame.
Also imports the pedigree data into the database.
- dae.pheno.pheno_import.transform_cli_args(args: Namespace) PhenoImportConfig [source]
Create a pheno import config instance from CLI arguments.
- dae.pheno.pheno_import.write_reports_to_parquet(output_file: Path, reports: dict[str, MeasureReport], descriptions: dict[str, str], hist_configs: MeasureHistogramConfigs | None) Path [source]
Write instrument measure reports to a parquet file.
- dae.pheno.pheno_import.write_results(connection: DuckDBPyConnection, instrument_pq_files: dict[str, tuple[Path, Path]], ped_df: DataFrame) None [source]
Write imported data into duckdb as measure value tables.
- dae.pheno.pheno_import.write_to_parquet(instrument_name: str, filepath: Path, reports: dict[str, MeasureReport], values_table: dict[str, list[Any]]) Path [source]
Write inferred instrument measure values to parquet file.
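The "order of specificity" merge described for merge_inference_configs and merge_histogram_configs can be pictured with a minimal sketch. It is not the dae implementation: the key scheme ("*.*", "instrument.*", "instrument.measure"), the helper name merge_by_specificity and the option names are all assumptions made for illustration.

```python
from typing import Any


def merge_by_specificity(
    configs: dict[str, dict[str, Any]],
    instrument_name: str,
    measure_name: str,
) -> dict[str, Any]:
    """Merge config layers so that more specific layers override general ones."""
    merged: dict[str, Any] = {}
    # Assumed key scheme, ordered from least to most specific.
    layers = (
        "*.*",
        f"{instrument_name}.*",
        f"{instrument_name}.{measure_name}",
    )
    for key in layers:
        # Later (more specific) layers overwrite earlier values.
        merged.update(configs.get(key, {}))
    return merged


# Hypothetical configuration with made-up option names.
configs = {
    "*.*": {"min_individuals": 1, "skip": False},
    "iq.*": {"min_individuals": 5},
    "iq.verbal": {"skip": True},
}
result = merge_by_specificity(configs, "iq", "verbal")
```

Here the instrument-level layer overrides the global min_individuals, and the measure-level layer overrides skip; options that a more specific layer does not mention keep the value from the more general layer.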
dae.pheno.prepare_data module
- class dae.pheno.prepare_data.PreparePhenoBrowserBase(pheno_db_dir: Path, storage_registry: PhenotypeStorageRegistry, phenotype_data: PhenotypeData, browser: PhenoBrowser, cache_dir: Path | None = None, images_dir: Path | None = None, pheno_regressions: Box | None = None)[source]
Bases:
object
Prepares phenotype data for the phenotype browser.
- LARGE_DPI = 150
- SMALL_DPI = 16
- add_measure_task(graph: TaskGraph, measure: Measure, pheno_dir: str, storage_registry: PhenotypeStorageRegistry, cache_dir: str) None [source]
Add task for building browser data to the task graph.
- classmethod browsable_figure_path(pheno_id: str, measure: Measure, suffix: str) str [source]
Construct the file path for storing a measure's figures.
- classmethod build_regression(phenotype_data: PhenotypeData, images_dir: str, dependent_measure: Measure, independent_measure: Measure, jitter: float) dict[str, str | float] [source]
Build measure regressions.
- classmethod build_values_categorical_distribution(pheno_id: str, images_dir: str, df: DataFrame, measure: Measure) dict[str, Any] [source]
Build a categorical value distribution figure.
- classmethod build_values_ordinal_distribution(pheno_id: str, images_dir: str, df: DataFrame, measure: Measure) dict[str, Any] [source]
Build an ordinal value distribution figure.
- classmethod build_values_violinplot(pheno_id: str, images_dir: str, df: DataFrame, measure: Measure) dict[str, Any] [source]
Build a violin plot figure for the measure.
- collect_child_configs(study: PhenotypeGroup) dict[str, dict] [source]
Collect child configurations.
- classmethod do_measure_build(pheno_id: str, measure: Measure, storage_registry: PhenotypeStorageRegistry, images_dir: str, regression_measures: dict[str, tuple[Box, Measure]], pheno_dir: str, cache_dir: str) tuple[dict[str, Any], list[dict[str, Any]] | None] [source]
Create images and regressions for a given measure.
- classmethod figure_filepath(pheno_id: str, images_dir: str, measure: Measure, suffix: str) str [source]
Construct the file path for storing a measure's figures.
dae.pheno.registry module
- class dae.pheno.registry.PhenoRegistry(storage_registry: PhenotypeStorageRegistry, configurations: list[dict] | None = None, browser_cache_path: Path | None = None)[source]
Bases:
object
Class for managing runtime instances of phenotype data.
Requires a PhenotypeStorageRegistry to function.
The registry has 2 main operations, register and get. Registering requires a study configuration and makes the registry aware of a phenotype study’s existence, making it loadable.
Getting a phenotype data requires the ID and will perform a load if necessary.
Both operations are synchronized and use a mutex to prevent faulty reads or duplicate loads of a phenotype data.
- CACHE_LOCK = <unlocked _thread.lock object>
- get_all_phenotype_data(*, lock: bool = True) list[PhenotypeData] [source]
Return all registered phenotype data.
- get_phenotype_data(data_id: str, *, lock: bool = True) PhenotypeData [source]
Return an instance of phenotype data from the registry.
If the phenotype data hasn't been loaded yet, load and cache it.
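The register/get behaviour described for PhenoRegistry (registration makes a study known, retrieval loads it on first access, and a mutex guards both) follows a common lazy-loading pattern. The sketch below illustrates that pattern only; the class LazyRegistry, its loader callback and the "id" config key are hypothetical stand-ins, and raising ValueError for an unknown ID is an assumption, not documented PhenoRegistry behaviour.

```python
import threading
from typing import Any, Callable


class LazyRegistry:
    """Toy registry: register configs cheaply, load data on first get."""

    def __init__(self, loader: Callable[[dict[str, Any]], Any]) -> None:
        self._lock = threading.Lock()
        self._configs: dict[str, dict[str, Any]] = {}
        self._cache: dict[str, Any] = {}
        self._loader = loader

    def register(self, config: dict[str, Any]) -> None:
        # Registration only records the configuration; nothing is loaded.
        with self._lock:
            self._configs[config["id"]] = config

    def get(self, data_id: str) -> Any:
        # The lock prevents two threads from loading the same data twice.
        with self._lock:
            if data_id not in self._configs:
                # Assumed error type for this sketch.
                raise ValueError(f"unknown phenotype data: {data_id}")
            if data_id not in self._cache:
                self._cache[data_id] = self._loader(self._configs[data_id])
            return self._cache[data_id]


load_calls: list[str] = []


def fake_loader(config: dict[str, Any]) -> dict[str, str]:
    """Stand-in for an expensive load; records each invocation."""
    load_calls.append(config["id"])
    return {"loaded": config["id"]}


registry = LazyRegistry(fake_loader)
registry.register({"id": "pheno_a"})
first = registry.get("pheno_a")
second = registry.get("pheno_a")
```

The second get returns the cached object without invoking the loader again, which is the "load if necessary" behaviour the class description refers to.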
dae.pheno.storage module
- class dae.pheno.storage.PhenotypeStorage(storage_config: dict[str, Any])[source]
Bases:
object
Class that manages phenotype data storage directories.
- build_phenotype_study(study_config: dict, browser_cache_path: Path | None) PhenotypeStudy [source]
Create a phenotype study object from a configuration.
- static from_config(storage_config: dict[str, Any]) PhenotypeStorage [source]
- class dae.pheno.storage.PhenotypeStorageRegistry[source]
Bases:
object
Class that manages phenotype storages.
- get_all_phenotype_storage_ids() list[str] [source]
Return list of all registered phenotype storage IDs.
- get_all_phenotype_storages() list[PhenotypeStorage] [source]
Return list of registered phenotype storages.
- get_default_phenotype_storage() PhenotypeStorage [source]
Return the default phenotype storage if one is defined.
Otherwise, return None.
- get_phenotype_storage(storage_id: str) PhenotypeStorage [source]
Return phenotype storage with specified storage_id.
If no storage with the specified ID is registered, the method raises a ValueError.
- register_default_storage(phenotype_storage: PhenotypeStorage) None [source]
Register a phenotype storage and make it the default storage.
- register_phenotype_storage(storage: PhenotypeStorage) PhenotypeStorage [source]
Register a phenotype storage instance.
- register_storage_config(storage_config: dict[str, Any]) PhenotypeStorage [source]
Create a phenotype storage from a storage config and register it.
- register_storages_configs(phenotype_storages_config: dict[str, Any]) None [source]
Create and register all phenotype storages defined in config.
When defining a GPF instance, we specify a phenotype_storage section in the configuration. If you pass this whole configuration section to this method, it will create and register all phenotype storages defined in that configuration section.
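As a rough illustration of how such a registry can combine bulk registration, a default storage and ID lookup, consider the sketch below. It does not use the dae classes: StorageRegistrySketch models storages as plain dictionaries, and the section layout with "default" and "storages" keys is an assumption about the configuration shape rather than something this reference states.

```python
from __future__ import annotations

from typing import Any


class StorageRegistrySketch:
    """Toy registry: storages are plain dicts keyed by their "id"."""

    def __init__(self) -> None:
        self._storages: dict[str, dict[str, Any]] = {}
        self._default_id: str | None = None

    def register_storage_config(self, config: dict[str, Any]) -> dict[str, Any]:
        # Register a single storage under its ID.
        self._storages[config["id"]] = config
        return config

    def register_storages_configs(self, section: dict[str, Any]) -> None:
        # Assumed layout: {"default": <id>, "storages": [<config>, ...]}.
        for config in section.get("storages", []):
            self.register_storage_config(config)
        self._default_id = section.get("default")

    def get_default(self) -> dict[str, Any] | None:
        # Return None when no default storage has been defined.
        if self._default_id is None:
            return None
        return self.get(self._default_id)

    def get(self, storage_id: str) -> dict[str, Any]:
        # Unknown IDs raise ValueError, mirroring get_phenotype_storage.
        if storage_id not in self._storages:
            raise ValueError(f"unknown phenotype storage: {storage_id}")
        return self._storages[storage_id]


registry = StorageRegistrySketch()
registry.register_storages_configs(
    {
        "default": "main",
        "storages": [{"id": "main", "base_dir": "/data/pheno"}],
    }
)
default_storage = registry.get_default()
```

Passing the whole section in one call registers every listed storage and records which one is the default, so callers that do not name a storage can fall back to it.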