DM

class pylluminator.dm.DM(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)

Bases: object

Methods

__init__(samples, formula[, ...])

Initialize the object by calculating the Differentially Methylated Probes (DMP). It fits an Ordinary Least Squares (OLS) model for each probe, following the given formula.

compute_dmp(samples, formula[, ...])

Find Differentially Methylated Probes (DMP) by fitting an Ordinary Least Squares (OLS) model for each probe, following the given formula.

compute_dmr([contrast, dist_cutoff, ...])

Find Differentially Methylated Regions (DMR) based on the Euclidean distance between beta values.

get_top(dm_type, contrast[, chromosome_col, ...])

Get the top DMRs from the dataframe returned by get_dmr(), ranked by the p-value of the given contrast.

Methods and attributes detail

__init__(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)
Initialize the object by calculating the Differentially Methylated Probes (DMP). It fits an Ordinary Least Squares (OLS) model for each probe, following the given formula. If a group column name is given, a Mixed Model is used instead to account for random effects.

More info on design matrices and formulas:
Parameters:
  • samples (Samples) – samples to use

  • formula (str) – R-like formula used in the design matrix to describe the statistical model. e.g. ‘~age + sex’

  • reference_value (dict | None) – reference value for each factor. Dictionary where keys are the factor names, and values are their reference value. Default: None

  • custom_sheet (pandas.DataFrame) – a sample sheet to use. By default, use the samples’ sheet. Useful if you want to filter the samples to display

  • drop_na (bool) – drop probes that have NA values. Default: False

  • apply_mask (bool) – set to True to apply mask. Default: True

  • probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None

  • group_column (str | None) – name of the sample sheet column that holds replicate information. If provided, a Mixed Model is used to account for replicates instead of an Ordinary Least Squares model. Default: None

Returns:

dataframe with probes as rows and p-values and model estimates as columns, and the list of contrast levels

Return type:

pandas.DataFrame, list[str]

compute_dmp(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)

Find Differentially Methylated Probes (DMP) by fitting an Ordinary Least Squares (OLS) model for each probe, following the given formula. If a group column name is given, a Mixed Model is used instead to account for random effects.

More info on design matrices and formulas:
Parameters:
  • samples (Samples) – samples to use

  • formula (str) – R-like formula used in the design matrix to describe the statistical model. e.g. ‘~age + sex’

  • reference_value (dict | None) – reference value for each factor. Dictionary where keys are the factor names, and values are their reference value. Default: None

  • custom_sheet (pandas.DataFrame) – a sample sheet to use. By default, use the samples’ sheet. Useful if you want to filter the samples to display

  • drop_na (bool) – drop probes that have NA values. Default: False

  • apply_mask (bool) – set to True to apply mask. Default: True

  • probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None

  • group_column (str | None) – name of the sample sheet column that holds replicate information. If provided, a Mixed Model is used to account for replicates instead of an Ordinary Least Squares model. Default: None

Returns:

dataframe with probes as rows and p-values and model estimates as columns, and the list of contrast levels

Return type:

pandas.DataFrame, list[str]
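
To illustrate what "fitting an OLS model for each probe" means, here is a minimal sketch using only the standard library. It is not pylluminator's implementation: the helper `fit_probe_ols` and the toy data are hypothetical, the model is restricted to a single numeric covariate (as in a formula like '~age'), and no p-values are computed.

```python
from statistics import mean


def fit_probe_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration only; it handles a single
    numeric covariate and returns the two model estimates.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        return None  # the covariate is constant, so the slope is undefined
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


# Toy data (assumed): one covariate value per sample, one list of beta values per probe.
age = [20, 30, 40, 50]
betas = {
    "probe_a": [0.10, 0.20, 0.30, 0.40],  # methylation rises with age
    "probe_b": [0.50, 0.50, 0.50, 0.50],  # no association with age
}

# One independent fit per probe, which is the core idea behind a DMP scan.
fits = {probe: fit_probe_ols(age, values) for probe, values in betas.items()}
```

Each probe gets its own intercept and slope. A real analysis would also test each estimate against zero to obtain the p-values mentioned in the return description.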

compute_dmr(contrast: str | list[str] | None = None, dist_cutoff: float | None = None, seg_per_locus: float = 0.5, probe_ids: None | list[str] = None)

Find Differentially Methylated Regions (DMR) based on the Euclidean distance between beta values.

Parameters:
  • contrast (str | list[str] | None) – contrast(s) to use for DMR detection

  • dist_cutoff (float | None) – cutoff used to find change points between DMRs, applied to the Euclidean distance between beta values. If None, the cutoff is derived from the seg_per_locus parameter. Default: None

  • seg_per_locus (float) – used only if dist_cutoff is not set: defines which quantile of the distances is used as the distance cutoff. Higher values lead to more segments. Must satisfy 0 < seg_per_locus < 1. Default: 0.5

  • probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
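
The interplay between dist_cutoff and seg_per_locus can be sketched as follows. This is an illustrative approximation, not the library's code: the helpers `quantile` and `segment_probes` are hypothetical, chromosome boundaries are ignored, and the cutoff is assumed to be the (1 - seg_per_locus) quantile of the distances, which is one way to obtain the documented behaviour that higher values lead to more segments.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = math.floor(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def segment_probes(betas, seg_per_locus=0.5, dist_cutoff=None):
    """Assign a segment ID to each probe (hypothetical helper).

    `betas` holds one list of per-sample beta values per probe, with
    probes already sorted by genomic position.
    """
    if len(betas) < 2:
        return [0] * len(betas)
    # Euclidean distance between the beta vectors of neighbouring probes.
    dists = [math.dist(a, b) for a, b in zip(betas, betas[1:])]
    if dist_cutoff is None:
        # Assumption: a higher seg_per_locus gives a lower cutoff, hence more segments.
        dist_cutoff = quantile(dists, 1 - seg_per_locus)
    segment_ids = [0]
    for d in dists:
        # A distance above the cutoff marks a change point and opens a new segment.
        segment_ids.append(segment_ids[-1] + (1 if d > dist_cutoff else 0))
    return segment_ids


# Toy data (assumed): five probes measured in two samples.
betas = [
    [0.10, 0.12],
    [0.11, 0.13],
    [0.80, 0.82],
    [0.81, 0.83],
    [0.30, 0.31],
]

default_segments = segment_probes(betas)                    # three segments
coarse_segments = segment_probes(betas, seg_per_locus=0.1)  # two segments
```

With the default value, both large jumps exceed the cutoff and the probes split into three segments. With seg_per_locus=0.1, only the largest jump does, so the last three probes stay in the same segment.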

get_top(dm_type: DM_TYPE | str, contrast: str, chromosome_col='chromosome', annotation_col: str = 'genes', n_dms=10, columns_to_keep: list[str] = None) -> DataFrame | None

Get the top DMRs from the dataframe returned by get_dmr(), ranked by the p-value of the given contrast. If an annotation is provided, the DMRs will be annotated with the genes associated with the probes in the DMR.

Parameters:
  • dm_type (DM_TYPE | str) – type of Differentially Methylated object to get (DMR or DMP).

  • contrast (str) – contrast to use for ranking the DMRs

  • chromosome_col (str) – name of the column holding the chromosome information. Default: ‘chromosome’

  • annotation_col (str) – name of the column holding the annotation information. Default: ‘genes’

  • n_dms (int) – number of DM probes/segments to return. Default: 10

  • columns_to_keep (list[str] | None) – list of columns to keep in the output dataframe. Default: None

Returns:

dataframe with the top DMRs

Return type:

pandas.DataFrame | None
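
The ranking step described above amounts to sorting by p-value and keeping the first n_dms entries. The sketch below shows that idea on plain dictionaries. The helper `top_by_p_value`, the key name `p_value`, and the toy rows are all hypothetical and do not reflect the column names of the library's dataframes.

```python
def top_by_p_value(rows, p_value_key, n=10):
    """Return the n rows with the smallest p-value, skipping rows without one."""
    scored = [row for row in rows if row.get(p_value_key) is not None]
    return sorted(scored, key=lambda row: row[p_value_key])[:n]


# Toy results (assumed): one row per segment, with a p-value for a single contrast.
rows = [
    {"segment_id": 1, "genes": "GENE_A", "p_value": 0.20},
    {"segment_id": 2, "genes": "GENE_B", "p_value": 0.001},
    {"segment_id": 3, "genes": "GENE_C", "p_value": None},  # no estimate available
    {"segment_id": 4, "genes": "GENE_D", "p_value": 0.03},
]

top_two = top_by_p_value(rows, "p_value", n=2)
```

Here `top_two` contains segments 2 and 4, in that order, since they carry the two smallest p-values.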