DM
- class pylluminator.dm.DM(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)
Bases: object
Methods
__init__(samples, formula[, ...]) – Initialize the object by calculating the Differentially Methylated Probes (DMP). It fits an Ordinary Least Squares (OLS) model for each probe, following the given formula.
compute_dmp(samples, formula[, ...]) – Find Differentially Methylated Probes (DMP) by fitting an Ordinary Least Squares (OLS) model for each probe, following the given formula.
compute_dmr([contrast, dist_cutoff, ...]) – Find Differentially Methylated Regions (DMR) based on the Euclidean distance between beta values.
get_top(dm_type, contrast[, chromosome_col, ...]) – Get the top DMRs from the dataframe returned by get_dmr(), ranked by the p-value of the given contrast.
Methods and attributes detail
- __init__(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)
- Initialize the object by calculating the Differentially Methylated Probes (DMP). It fits an Ordinary Least Squares (OLS) model for each probe, following the given formula. If a group column name is given, a Mixed Model is used to account for random effects.
- More info on design matrices and formulas:
- Parameters:
samples (Samples) – samples to use
formula (str) – R-like formula used in the design matrix to describe the statistical model. e.g. ‘~age + sex’
reference_value (dict | None) – reference value for each factor. Dictionary where keys are the factor names, and values are their reference value. Default: None
custom_sheet (pandas.DataFrame) – a sample sheet to use. By default, use the samples’ sheet. Useful if you want to filter the samples to display
drop_na (bool) – drop probes that have NA values. Default: False
apply_mask (bool) – set to True to apply mask. Default: True
probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
group_column (str | None) – name of the sample sheet column that holds replicate information. If provided, a Mixed Model is used to account for replicates instead of Ordinary Least Squares. Default: None
- Returns:
dataframe with probes as rows and p-values and model estimates in columns, list of contrast levels
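As a rough illustration of what reference_value controls, the sketch below dummy-codes one categorical factor in plain Python. It assumes treatment coding (an intercept plus one 0/1 column per non-reference level), which is the usual default for R-like formulas. The helper `treatment_code` and the sample values are hypothetical and are not part of pylluminator.

```python
def treatment_code(values, reference=None):
    """Dummy-code a categorical factor, dropping the reference level."""
    levels = sorted(set(values))
    # Assumption of this sketch: default to the first level in sorted order
    if reference is None:
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    contrasts = [lvl for lvl in levels if lvl != reference]
    # One intercept column, then one 0/1 indicator per non-reference level
    rows = [[1] + [int(v == lvl) for lvl in contrasts] for v in values]
    return contrasts, rows


sex = ["F", "M", "M", "F"]  # made-up sample sheet column
reference_value = {"sex": "M"}  # same shape as the reference_value parameter
contrast_levels, design = treatment_code(sex, reference_value.get("sex"))
```

With "M" as the reference, the only remaining contrast level is "F", so each estimate for that column would describe the difference of "F" relative to "M".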
- compute_dmp(samples: Samples, formula: str, reference_value: dict | None = None, custom_sheet: None | DataFrame = None, drop_na=False, apply_mask=True, probe_ids: None | list[str] = None, group_column: str | None = None)
Find Differentially Methylated Probes (DMP) by fitting an Ordinary Least Squares (OLS) model for each probe, following the given formula. If a group column name is given, a Mixed Model is used to account for random effects.
- More info on design matrices and formulas:
- Parameters:
samples (Samples) – samples to use
formula (str) – R-like formula used in the design matrix to describe the statistical model. e.g. ‘~age + sex’
reference_value (dict | None) – reference value for each factor. Dictionary where keys are the factor names, and values are their reference value. Default: None
custom_sheet (pandas.DataFrame) – a sample sheet to use. By default, use the samples’ sheet. Useful if you want to filter the samples to display
drop_na (bool) – drop probes that have NA values. Default: False
apply_mask (bool) – set to True to apply mask. Default: True
probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
group_column (str | None) – name of the sample sheet column that holds replicate information. If provided, a Mixed Model is used to account for replicates instead of Ordinary Least Squares. Default: None
- Returns:
dataframe with probes as rows and p-values and model estimates in columns, list of contrast levels
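The per-probe fit can be pictured with a minimal, dependency-free sketch: one simple least-squares regression of beta values on a single numeric covariate, repeated for every probe. The probe IDs, the ages and the helper `fit_ols` are made up for illustration. This is not pylluminator's implementation, which also covers multi-term formulas, p-values and the optional Mixed Model.

```python
def fit_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


age = [20, 30, 40, 50]  # hypothetical covariate, one value per sample
betas = {  # hypothetical probe IDs mapped to per-sample beta values
    "cg_example_1": [0.2, 0.3, 0.4, 0.5],
    "cg_example_2": [0.5, 0.5, 0.5, 0.5],
}
probe_ids = ["cg_example_1", "cg_example_2"]  # subset of probes to fit

# One independent model per probe: (intercept, slope) estimates
estimates = {probe: fit_ols(age, betas[probe]) for probe in probe_ids}
```

Here the first probe gains about 0.01 beta per year of age, while the second is flat, so only the first would be a candidate DMP for this covariate.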
- compute_dmr(contrast: str | list[str] | None = None, dist_cutoff: float | None = None, seg_per_locus: float = 0.5, probe_ids: None | list[str] = None)
Find Differentially Methylated Regions (DMR) based on the Euclidean distance between beta values.
- Parameters:
contrast (str | list[str] | None) – contrast(s) to use for DMR detection
dist_cutoff (float | None) – cutoff used to find change points between DMRs, applied to the Euclidean distance between beta values. If set to None, it is calculated from the seg_per_locus parameter. Default: None
seg_per_locus (float) – used if dist_cutoff is not set: defines which quantile should be used as the distance cutoff. Higher values lead to more segments. Must satisfy 0 < seg_per_locus < 1. Default: 0.5
probe_ids (list[str] | None) – list of probe IDs to use. Useful to work on a subset for testing purposes. Default: None
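The segmentation idea behind dist_cutoff and seg_per_locus can be sketched as follows: compute the Euclidean distance between the beta vectors of neighbouring probes, then start a new segment wherever that distance exceeds the cutoff. This sketch assumes the default cutoff is the (1 - seg_per_locus) quantile of those distances, which is one way to make higher values yield more segments; the actual computation in pylluminator may differ. The helpers `quantile` and `segment_probes` and the beta values are hypothetical.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def segment_probes(betas, dist_cutoff=None, seg_per_locus=0.5):
    """Assign a segment index to each probe (probes ordered by position)."""
    # Euclidean distance between each probe and its right-hand neighbour
    dists = [math.dist(a, b) for a, b in zip(betas, betas[1:])]
    if not dists:
        return [0] * len(betas)
    if dist_cutoff is None:
        # Assumed rule: higher seg_per_locus -> lower cutoff -> more segments
        dist_cutoff = quantile(sorted(dists), 1 - seg_per_locus)
    segment_ids = [0]
    for d in dists:
        # A distance above the cutoff marks a change point
        segment_ids.append(segment_ids[-1] + int(d > dist_cutoff))
    return segment_ids


# Five probes along one chromosome, two samples each (made-up values)
betas = [[0.1, 0.1], [0.1, 0.2], [0.8, 0.9], [0.8, 0.8], [0.2, 0.1]]
segments = segment_probes(betas)
```

Passing an explicit dist_cutoff bypasses the quantile step entirely, which mirrors how the two parameters are described above.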
- get_top(dm_type: DM_TYPE | str, contrast: str, chromosome_col='chromosome', annotation_col: str = 'genes', n_dms=10, columns_to_keep: list[str] = None) → DataFrame | None
Get the top DMRs from the dataframe returned by get_dmr(), ranked by the p-value of the given contrast. If an annotation is provided, the DMRs will be annotated with the genes associated with the probes in the DMR.
- Parameters:
dm_type (DM_TYPE | str) – type of Differentially Methylated object to get (DMR or DMP).
contrast (str) – contrast to use for ranking the DMRs
chromosome_col (str) – name of the column holding the chromosome information. Default: ‘chromosome’
annotation_col (str) – name of the column holding the annotation information. Default: ‘genes’
n_dms (int) – number of DM probes/segments to return. Default: 10
columns_to_keep (list[str] | None) – list of columns to keep in the output dataframe. Default: None
- Returns:
dataframe with the top DMRs
- Return type:
pandas.DataFrame | None
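The ranking step itself is simple, and a plain-Python sketch may help make n_dms and columns_to_keep concrete. It uses a list of dictionaries in place of a DataFrame, and the column names (`segment_id`, `p_value`) and the helper `top_by_p_value` are assumptions made for this example rather than names taken from pylluminator.

```python
def top_by_p_value(rows, p_value_col, n_dms=10, columns_to_keep=None):
    """Return the n_dms rows with the smallest p-values."""
    ranked = sorted(rows, key=lambda row: row[p_value_col])[:n_dms]
    if columns_to_keep is None:
        return ranked
    # Always keep the ranking column alongside the requested ones
    keep = [p_value_col, *columns_to_keep]
    return [{col: row[col] for col in keep} for row in ranked]


rows = [  # made-up DMR results for one contrast
    {"segment_id": 0, "chromosome": "1", "genes": "GENE_A", "p_value": 0.04},
    {"segment_id": 1, "chromosome": "2", "genes": "GENE_B", "p_value": 0.001},
    {"segment_id": 2, "chromosome": "X", "genes": "", "p_value": 0.2},
]
top = top_by_p_value(rows, "p_value", n_dms=2, columns_to_keep=["genes"])
```

The result lists GENE_B first and GENE_A second, and the third segment is dropped because n_dms is 2.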