DM

class pylluminator.dm.DM(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)

Bases: object

Methods

`__init__`(samples, formula[, ...])	Initialize the object by calculating the Differentially Methylated Probes (DMP). It fits an Ordinary Least Squares (OLS) model for each probe, following the given formula.
`compute_dmp`(samples, formula[, ...])	Find Differentially Methylated Probes (DMP) by fitting an Ordinary Least Squares (OLS) model for each probe, following the given formula.
`compute_dmr`([contrast, dist_cutoff, ...])	Find Differentially Methylated Regions (DMR) based on the Euclidean distance between beta values.
`get_top`(dm_type, contrast[, chromosome_col, ...])	Get the top DMRs from the dataframe returned by get_dmr(), ranked by the p-value of the given contrast.

Methods and attributes detail

__init__(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)

Initialize the object by calculating the Differentially Methylated Probes (DMP). It fits an Ordinary Least Squares (OLS) model for each probe, following the given formula. If a group column name is given, use a Mixed Model to account for random effects.

More info on design matrices and formulas:

Parameters:

samples (Samples) – samples to use
formula (str) – R-like formula used in the design matrix to describe the statistical model. e.g. ‘~age + sex’
reference_value (dict | None) – reference value for each factor. Dictionary where keys are the factor names, and values are their reference value. Default: None
custom_sheet (pandas.DataFrame) – a sample sheet to use. By default, use the samples’ sheet. Useful if you want to filter the samples to display
drop_na (bool) – drop probes that have NA values. Default: False
apply_mask (bool) – set to True to apply mask. Default: True
probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
group_column (str | None) – name of the sample sheet column that holds replicate information. If provided, a Mixed Model is used to account for replicates instead of an Ordinary Least Squares model. Default: None

Returns:

dataframe with probes as rows and p-values and model estimates as columns, and the list of contrast levels

Return type:

pandas.DataFrame, list[str]
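
As a rough illustration of how `formula` and `reference_value` relate, the sketch below builds a treatment-coded design matrix by hand for '~age + sex', with 'F' as the reference level of sex. The sample sheet, the helper `design_matrix` and the column naming are assumptions made for this example; it is not pylluminator's implementation.

```python
# Hypothetical sample sheet; sample names, columns and values are made up.
sheet = [
    {"sample": "s1", "age": 34, "sex": "F"},
    {"sample": "s2", "age": 51, "sex": "M"},
    {"sample": "s3", "age": 47, "sex": "F"},
]


def design_matrix(rows, numeric, factor, reference):
    """Build the columns of '~numeric + factor' using treatment coding."""
    # Every level except the reference gets its own 0/1 indicator column.
    levels = sorted({row[factor] for row in rows} - {reference})
    columns = ["Intercept", numeric] + [f"{factor}[T.{level}]" for level in levels]
    matrix = [
        [1.0, float(row[numeric])]
        + [1.0 if row[factor] == level else 0.0 for level in levels]
        for row in rows
    ]
    return columns, matrix


columns, matrix = design_matrix(sheet, "age", "sex", reference="F")
for name, row in zip((r["sample"] for r in sheet), matrix):
    print(name, dict(zip(columns, row)))
```

Samples at the reference level have 0 in every indicator column, so the estimate for an indicator column is read as the difference from the reference level.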

compute_dmp(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)

Find Differentially Methylated Probes (DMP) by fitting an Ordinary Least Squares (OLS) model for each probe, following the given formula. If a group column name is given, use a Mixed Model to account for random effects.

More info on design matrices and formulas:

Parameters:

samples (Samples) – samples to use
formula (str) – R-like formula used in the design matrix to describe the statistical model. e.g. ‘~age + sex’
reference_value (dict | None) – reference value for each factor. Dictionary where keys are the factor names, and values are their reference value. Default: None
custom_sheet (pandas.DataFrame) – a sample sheet to use. By default, use the samples’ sheet. Useful if you want to filter the samples to display
drop_na (bool) – drop probes that have NA values. Default: False
apply_mask (bool) – set to True to apply mask. Default: True
probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
group_column (str | None) – name of the sample sheet column that holds replicate information. If provided, a Mixed Model is used to account for replicates instead of an Ordinary Least Squares model. Default: None

Returns:

dataframe with probes as rows and p-values and model estimates as columns, and the list of contrast levels

Return type:

pandas.DataFrame, list[str]
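
The idea of fitting one model per probe can be sketched in plain Python for the simplest case, a single numeric covariate ('~age'). The probe IDs, beta values and the helper `ols_slope_intercept` are hypothetical, and the sketch only returns the estimates; it does not compute p-values or handle masks, missing values or mixed models.

```python
def ols_slope_intercept(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# One covariate value per sample (made-up ages).
age = [30.0, 40.0, 50.0, 60.0]

# Beta values per probe, in the same sample order (made-up values).
betas = {
    "probe_a": [0.20, 0.30, 0.40, 0.50],
    "probe_b": [0.80, 0.80, 0.80, 0.80],
}

# Fit the same model independently for every probe.
estimates = {probe: ols_slope_intercept(age, values) for probe, values in betas.items()}
for probe, (slope, intercept) in estimates.items():
    print(f"{probe}: slope={slope:.4f} intercept={intercept:.4f}")
```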

compute_dmr(contrast: str | list[str] | None = None, dist_cutoff: float | None = None, seg_per_locus: float = 0.5, probe_ids: None | list[str] = None)

Find Differentially Methylated Regions (DMR) based on the Euclidean distance between beta values.

Parameters:

contrast (str | list[str] | None) – contrast(s) to use for DMR detection
dist_cutoff (float | None) – cutoff used to find change points between DMRs, applied to the Euclidean distance between beta values. If None, it is calculated from the seg_per_locus parameter. Default: None
seg_per_locus (float) – used only if dist_cutoff is not set: defines which quantile of the distances is used as the distance cutoff. Higher values lead to more segments. Must satisfy 0 < seg_per_locus < 1. Default: 0.5
probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
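
To make the roles of `dist_cutoff` and `seg_per_locus` more concrete, here is a minimal sketch of distance-based segmentation. It assumes probes are already ordered along one chromosome, and it assumes the cutoff is taken at the (1 - seg_per_locus) quantile of the distances, which is one way to make higher values produce more segments. The function `segment_probes`, the quantile rule and the data are illustrative assumptions, not the library's code.

```python
import math


def segment_probes(betas, dist_cutoff=None, seg_per_locus=0.5):
    """Assign a segment index to each probe, breaking where neighbours differ too much."""
    # Euclidean distance between the beta-value vectors of neighbouring probes.
    dists = [math.dist(a, b) for a, b in zip(betas, betas[1:])]
    if dist_cutoff is None:
        # Assumed rule: a simple nearest-rank quantile at (1 - seg_per_locus).
        ordered = sorted(dists)
        rank = min(len(ordered) - 1, int((1 - seg_per_locus) * len(ordered)))
        dist_cutoff = ordered[rank]
    segments = [0]
    for dist in dists:
        # A distance above the cutoff marks a change point and opens a new segment.
        segments.append(segments[-1] + (1 if dist > dist_cutoff else 0))
    return segments


# Made-up beta values: one row per probe, one column per sample.
betas = [
    [0.1, 0.1],
    [0.1, 0.2],
    [0.8, 0.9],
    [0.8, 0.8],
    [0.2, 0.1],
]

fixed = segment_probes(betas, dist_cutoff=0.5)
default = segment_probes(betas)
finer = segment_probes(betas, seg_per_locus=0.9)
print(fixed, default, finer)
```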

get_top(dm_type: DM_TYPE | str, contrast: str, chromosome_col='chromosome', annotation_col: str = 'genes', n_dms=10, columns_to_keep: list[str] = None) → DataFrame | None

Get the top DMRs from the dataframe returned by get_dmr(), ranked by the p-value of the given contrast. If an annotation is provided, the DMRs will be annotated with the genes associated with the probes in the DMR.

Parameters:

dm_type (DM_TYPE | str) – type of Differentially Methylated object to get (DMR or DMP).
contrast (str) – contrast to use for ranking the DMRs
chromosome_col (str) – name of the column holding the chromosome information. Default: ‘chromosome’
annotation_col (str) – name of the column holding the annotation information. Default: ‘genes’
n_dms (int) – number of DM probes/segments to return. Default: 10
columns_to_keep (list[str] | None) – list of columns to keep in the output dataframe. Default: None

Returns:

dataframe with the top DMRs

Return type:

pandas.DataFrame | None
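
The ranking step can be pictured as keeping the `n_dms` rows with the smallest p-values. The sketch below uses plain dictionaries instead of a dataframe; the rows, the key names and the helper `top_by_p_value` are hypothetical and do not reflect pylluminator's actual column names.

```python
import heapq

# Hypothetical DMR rows; key names are placeholders.
rows = [
    {"segment_id": 1, "chromosome": "1", "p_value": 0.04},
    {"segment_id": 2, "chromosome": "7", "p_value": 0.001},
    {"segment_id": 3, "chromosome": "X", "p_value": 0.2},
]


def top_by_p_value(rows, n_dms=10):
    """Return up to n_dms rows, ordered from smallest to largest p-value."""
    return heapq.nsmallest(n_dms, rows, key=lambda row: row["p_value"])


top = top_by_p_value(rows, n_dms=2)
print([row["segment_id"] for row in top])
```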