Annotations

class pylluminator.annotations.Annotations(array_type: ArrayType, genome_version: GenomeVersion, name='illumina')

Bases: object

This class contains all the metadata associated with a given genome version (HG38, MM10…) and array type (EPICv2, 450K…). The metadata includes the manifest, the mask (if one exists), and the genome information (itself a combination of several dataframes, see class GenomeInfo). Masks and manifests are downloaded automatically the first time they are loaded, while GenomeInfo files are already stored in the repository.
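The download-on-first-use behaviour described above follows a common lazy caching pattern. The sketch below shows that pattern in plain Python. It is not pylluminator's implementation: the `load_cached` helper, the stand-in `fake_fetch` downloader and the file layout are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_cached(path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return the content of `path`, calling `fetch` only if the file is not cached yet.

    Hypothetical helper illustrating lazy caching; not part of pylluminator.
    """
    if not path.exists():
        # First use: download the data and store it on disk
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch())
    # Every call (including the first) reads from the on-disk copy
    return path.read_bytes()


# Stand-in for a real download; records how many times it is called
calls = []


def fake_fetch() -> bytes:
    calls.append(1)
    return b"probe_id,type\ncg00000001,I\n"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "EPICv2" / "manifest.csv"
    first = load_cached(target, fake_fetch)   # triggers the download
    second = load_cached(target, fake_fetch)  # served from the cache

print(len(calls))  # 1
```

Shipping some files with the package, as is done for GenomeInfo, avoids the network call entirely at the cost of a larger installation.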

Variables:
  • array_type (ArrayType) – Illumina array type (EPIC, MM285…)

  • genome_version (GenomeVersion) – version of the genome (HG38, MM10…)

  • name (str) – name of the annotation: ‘illumina’ for the pylluminator-data annotations, ‘updated’ for the updated annotation defined in DOI:10.1101/2025.03.12.642895 (EPICv2 only), or the name of your custom data. Default: ‘illumina’

  • genome_info (GenomeInfo) – genome metadata for the given genome version

  • probe_infos (pandas.DataFrame) – probes metadata (aka Manifest), contains the probes type, address, channel, mask info…

  • genomic_ranges (pyranges.PyRanges) – genomic ranges of the probes, extracted from the manifest (see make_genomic_ranges())

Methods

__init__(array_type, genome_version[, name])

Get annotation corresponding to the array type and genome version

copy()

Return a copy of the Annotations object

make_genomic_ranges()

Extract genomic ranges information from manifest dataframe

Attributes

non_unique_mask_names

Mask names for non-unique probes, as defined in Sesame.

quality_mask_names

Recommended mask names for each Infinium platform, as defined in Sesame.

Methods and attributes detail

__init__(array_type: ArrayType, genome_version: GenomeVersion, name='illumina')

Get annotation corresponding to the array type and genome version

Parameters:
  • array_type (ArrayType) – Illumina array type (EPIC, MSA…)

  • genome_version (GenomeVersion) – genome version to load (HG38, MM10…)

  • name (str) – name of the annotation to load. Set to ‘illumina’ for the default Illumina annotation, or to ‘updated’ for the updated annotation defined in DOI:10.1101/2025.03.12.642895 (EPICv2 only); any other value must match the name of the folder containing your custom data. Default: ‘illumina’
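The three accepted forms of name can be summarised as a small decision function. This is a hypothetical illustration of the rules stated above, not pylluminator's code: `resolve_annotation_source`, the plain-string `array_type` (used instead of the ArrayType enum) and the `data_dir` folder layout are assumptions, and the real constructor may report invalid values differently.

```python
from pathlib import Path


def resolve_annotation_source(name: str, array_type: str, data_dir: Path) -> str:
    """Hypothetical illustration of how `name` selects an annotation source."""
    if name == "illumina":
        # Default annotations from pylluminator-data
        return "illumina"
    if name == "updated":
        # The updated annotation (DOI:10.1101/2025.03.12.642895) only exists for EPICv2
        if array_type != "EPICv2":
            raise ValueError("the 'updated' annotation is only available for EPICv2")
        return "updated"
    # Any other value must match the name of a folder holding custom annotation data
    custom_dir = data_dir / name
    if not custom_dir.is_dir():
        raise FileNotFoundError(f"no custom annotation folder named {name!r} in {data_dir}")
    return str(custom_dir)
```

Checking the folder up front turns a typo in a custom name into an immediate, explicit error rather than a failure later during loading.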

copy()

Return a copy of the Annotations object

make_genomic_ranges() → DataFrame | None

Extract genomic ranges information from manifest dataframe
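A minimal sketch of the idea, in plain Python rather than pandas or pyranges: rows of the manifest that carry coordinates become ranges, and rows without coordinates are skipped. The manifest keys (`chromosome`, `start`, `end`, `strand`), the `ProbeRange` class and `extract_ranges` are hypothetical. Returning `None` when no probe has coordinates mirrors the `| None` in the documented return type, but when the real method returns `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProbeRange:
    """One probe placed on the genome (hypothetical structure)."""
    probe_id: str
    chromosome: str
    start: int
    end: int
    strand: str


def extract_ranges(manifest: List[Dict]) -> Optional[List[ProbeRange]]:
    """Build genomic ranges from manifest rows, or return None if no row has coordinates."""
    ranges = []
    for row in manifest:
        # Probes without coordinates (e.g. control probes) cannot be placed on the genome
        if row.get("chromosome") is None or row.get("start") is None or row.get("end") is None:
            continue
        ranges.append(ProbeRange(row["probe_id"], row["chromosome"], row["start"], row["end"],
                                 row.get("strand") or "*"))  # '*' = unknown strand
    # Sort by chromosome name (lexicographically), then by start position
    ranges.sort(key=lambda r: (r.chromosome, r.start))
    return ranges or None


manifest = [
    {"probe_id": "cg02", "chromosome": "chr1", "start": 500, "end": 501, "strand": "-"},
    {"probe_id": "ctl01", "chromosome": None, "start": None, "end": None},
    {"probe_id": "cg01", "chromosome": "chr1", "start": 100, "end": 101, "strand": "+"},
]
print([r.probe_id for r in extract_ranges(manifest)])  # ['cg01', 'cg02']
```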

property non_unique_mask_names: str

Mask names for non-unique probes, as defined in Sesame.

Returns:

mask names, separated by the | character

Return type:

str

property quality_mask_names: str

Recommended mask names for each Infinium platform, as defined in Sesame. EPIC+ arrays are assumed to have the same masks as EPICv2 arrays.

Returns:

mask names, separated by the | character

Return type:

str
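Because the names are joined with |, each string has the shape of a regular-expression alternation. The sketch below uses this to flag probes whose mask information mentions any of the given names. The `flag_masked_probes` function, the dict-of-strings input (the real probe_infos is a DataFrame) and the placeholder mask names are hypothetical. Each name is escaped and matched as a whole word, so `mask_A` does not match `mask_AB`.

```python
import re
from typing import Dict, List


def flag_masked_probes(mask_info: Dict[str, str], *mask_name_strings: str) -> List[str]:
    """Return the sorted IDs of probes whose mask info contains any of the given mask names.

    Each entry of `mask_name_strings` is a '|'-separated list of names, in the format
    returned by quality_mask_names and non_unique_mask_names.
    """
    names = [name for s in mask_name_strings for name in s.split("|") if name]
    if not names:
        # An empty alternation would match every string, so return early
        return []
    # Escape each name so it is matched literally; require a non-word character (or string edge) around it
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)")
    return sorted(pid for pid, info in mask_info.items() if info and pattern.search(info))


# Placeholder mask names and probe annotations, for illustration only
probe_masks = {"cg01": "mask_A;mask_C", "cg02": "", "cg03": "mask_B", "cg04": "mask_AB"}
print(flag_masked_probes(probe_masks, "mask_A|mask_C", "mask_B"))  # ['cg01', 'cg03']
```

With a real Annotations object, the quality_mask_names and non_unique_mask_names strings would be passed as the name arguments; how mask information is stored in probe_infos is not specified in this section.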