3. DMPs and DMRs
[1]:
from pylluminator.samples import Samples
from pylluminator.visualizations import dmr_manhattan_plot, dmp_heatmap, visualize_gene, show_chromosome_legend
from pylluminator.dm import DM
from pylluminator.utils import save_object
from pylluminator.utils import set_logger
set_logger('WARNING') # set the verbosity level, can be DEBUG, INFO, WARNING, ERROR
3.1. Load pylluminator Samples
We assume that you have already processed the .idat files according to your preferences and saved them. If not, please refer to notebook 1 - Read data and get beta values before going any further.
[2]:
my_samples = Samples.load('preprocessed_samples')
Here, we want to filter out the probes on the X or Y chromosomes.
[3]:
my_samples.mask_xy_probes()
To speed up the demo, we will only calculate DMPs and DMRs on 10% of the probes
[4]:
ten_pct_probes = int(0.1 * my_samples.nb_probes)
probe_ids = my_samples.probe_ids[:ten_pct_probes]
print(f'Selected {ten_pct_probes:,} first probes')
Selected 93,768 first probes
3.2. Differentially Methylated Probes
The second parameter needed to create a DM object (here ~ sample_type) is a R-like formula that describes the model, and is used to create the design matrix. You can use one or more predictors in the formula, e.g. ~age + sex. The predictors names must be the column names of the sample sheet.
More info on design matrices and formulas:
[5]:
my_samples.sample_sheet
[5]:
| sample_id | sample_name | sample_type | |
|---|---|---|---|
| 0 | GSM7698443 | PREC_500_2 | PREC |
| 1 | GSM7698459 | PREC_500_3 | PREC |
| 2 | GSM7698435 | PREC_500_1 | PREC |
| 3 | GSM7698446 | LNCAP_500_2 | LNCAP |
| 4 | GSM7698462 | LNCAP_500_3 | LNCAP |
| 5 | GSM7698438 | LNCAP_500_1 | LNCAP |
[6]:
my_dms = DM(my_samples, '~ sample_type', probe_ids=probe_ids, use_m_values=True)
You can now plot the results, for the 25 most variable probes:
[7]:
dmp_heatmap(my_dms, my_dms.contrasts[0], nb_probes=25, figsize=(8, 5))
3.3. Differentially Methylated Regions
We can then identify ths DMRs by grouping neighboring probes with similar methylation patterns for a given predictor contrast. Similarity is calculated based on the Euclidean distance between probes’ beta values.
[8]:
my_dms.compute_dmr(my_dms.contrasts)
save_object(my_dms, 'dms')
[9]:
# visualize the DMRs for the first contrast
dmr_manhattan_plot(my_dms, 'sample_type[T.PREC]', nb_annotated_probes=20) # by default, plots the DMRs depending on log(p-values)
# dmr_manhattan_plot(my_dms, 'sample_type[T.PREC]', y_col='sample_type[T.PREC]_avg_beta_delta', log10=False, nb_annotated_probes=20) # example to plot DMR depending on the beta values difference between the two groups
[10]:
# get top DMRs and their associated genes for the first contrast, PREC
my_dms.get_top_dmr(my_dms.contrasts[0])
[10]:
| segment_id | sample_type[T.PREC]_avg_beta_delta | chromosome | sample_type[T.PREC]_p_value_adjusted | genes | |
|---|---|---|---|---|---|
| 6 | 23362 | 11.131983 | 10 | 6.728253e-07 | ADD3;ADD3-AS1 |
| 2 | 2717 | 10.830772 | 1 | 2.927506e-08 | CSF1 |
| 4 | 12902 | 10.801344 | 5 | 1.713826e-06 | FGF18 |
| 5 | 15935 | 10.710917 | 7 | 7.439611e-07 | PDGFA |
| 7 | 26201 | 10.669350 | 11 | 1.729029e-05 | NOX4 |
| 1 | 2632 | 10.662909 | 1 | 6.264565e-05 | GPR88 |
| 0 | 2064 | 10.568130 | 1 | 1.536196e-07 | |
| 9 | 41467 | 10.537498 | 19 | 6.023383e-06 | ENSG00000289915 |
| 3 | 11071 | 10.483354 | 4 | 7.994438e-07 | FAM149A |
| 8 | 39964 | 10.463715 | 19 | 3.608337e-06 | TGFBR3L |
3.4. Gene visualization
We can then have a look at a particular gene identified as differentially methylated, for example CSF1. The heatmap of the beta values of the probes associated to this gene shows a clear methylation difference between the healthy cells (PrEC) and the prostate cancer cells (LNCAP).
[11]:
show_chromosome_legend() # display the legend for chromosome regions colors, corresponding to Giemsa staining
visualize_gene(my_samples, 'CSF1', figsize=(10, 5))