GenomeInfo

class pylluminator.annotations.GenomeInfo(name: str, genome_version: GenomeVersion)

Bases: object

Additional genome information provided by external files, downloaded from illumina-data.

Variables:
  • gap_info (pyranges.PyRanges) – contains information on gaps in the genomic sequence, i.e. regions that are not sequenced or are known to be problematic, such as low-coverage or difficult-to-sequence areas.

  • seq_length (dict) – keys are chromosome identifiers (e.g., chr1, chrX) and values are the corresponding sequence lengths in base pairs

  • transcripts_list (pandas.DataFrame) – high-level overview of the transcripts and their boundaries (start and end positions)

  • transcripts_exons (pandas.DataFrame) – information at the level of individual exons within each transcript (type, gene name, gene id…)

  • chromosome_regions (pandas.DataFrame) – names, addresses and Giemsa stain pattern of all chromosomes’ regions
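
As an illustration of how a mapping shaped like seq_length might be used, the sketch below works on a plain dict that follows the description above. The dict is stand-in data (values are illustrative and not read from pylluminator), and the helper names total_length and is_valid_position are hypothetical, not part of the library.

```python
# Stand-in for GenomeInfo.seq_length: chromosome identifier -> length in base pairs.
# Values are illustrative only; a real object would be populated from the genome files.
seq_length = {"chr1": 248_956_422, "chr2": 242_193_529, "chrX": 156_040_895}


def total_length(lengths: dict) -> int:
    """Sum the lengths of all chromosomes in the mapping."""
    return sum(lengths.values())


def is_valid_position(lengths: dict, chromosome: str, position: int) -> bool:
    """Check that a 1-based position falls within a known chromosome."""
    return chromosome in lengths and 1 <= position <= lengths[chromosome]


print(total_length(seq_length))
print(is_valid_position(seq_length, "chrX", 1_000_000))  # True
print(is_valid_position(seq_length, "chrY", 1_000_000))  # False: chrY is absent from the stand-in data
```

The same pattern should apply to the dict held by an actual GenomeInfo instance, assuming it follows the key and value layout documented above.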

Methods

__init__(name, genome_version)

Load the files corresponding to the given genome version, and structure the information.

copy()

Return a copy of the GenomeInfo object.

Methods and attributes detail

__init__(name: str, genome_version: GenomeVersion)

Load the files corresponding to the given genome version, and structure the information.

Parameters:
  • name (str) – Name of the genome annotation to load. Set to ‘illumina’ for the default Illumina version, or to ‘updated’ for the updated annotation defined in DOI:10.1101/2025.03.12.642895 (EPICv2 only); otherwise, it must match the name of the folder containing your custom data

  • genome_version (GenomeVersion) – genome version to load (hg38, mm10…)
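
A minimal usage sketch of the constructor follows. It assumes that GenomeVersion can be imported from pylluminator.annotations alongside GenomeInfo and that it exposes a member named HG38; neither is confirmed on this page, so check both against the installed version. The wrapper load_genome_info is a hypothetical helper, not part of pylluminator.

```python
def load_genome_info(name="illumina", version_name="HG38"):
    """Build a GenomeInfo, or return None if pylluminator is not installed.

    Assumptions (not confirmed by this page): GenomeVersion is importable
    from pylluminator.annotations and has a member called `version_name`.
    """
    try:
        from pylluminator.annotations import GenomeInfo, GenomeVersion
    except ImportError:
        return None
    # Look the member up by name so a wrong name fails with a clear AttributeError.
    version = getattr(GenomeVersion, version_name)
    # Loading may download the annotation files on first use.
    return GenomeInfo(name, version)
```

Calling load_genome_info() would then request the default Illumina annotation, while load_genome_info("updated") would request the updated annotation, which the parameter description above restricts to EPICv2.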

copy()

Return a copy of the GenomeInfo object.