read_idata

pylluminator.samples.read_idata(sample_sheet_df: pandas.DataFrame, datadir: str | pathlib.Path) -> (dict, str)

Reads IDAT files for each sample in the provided sample sheet, organizes the data by sample name and channel, and returns a dictionary with the IDAT data.

Parameters:
  • sample_sheet_df (pandas.DataFrame) – A DataFrame containing sample information, including columns for ‘sample_label’, ‘sample_id’, ‘sentrix_id’, and ‘sentrix_position’. Each row corresponds to a sample in the experiment.

  • datadir (str | pathlib.Path) – The directory where the IDAT files are located.
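A minimal sketch of a sample sheet with the four columns named above may help. Only the column names come from this page; every value below is a made-up placeholder, and a real sample sheet may carry additional columns.

```python
import pandas as pd

# Placeholder values for illustration; only the column names are taken
# from the parameter description.
sample_sheet_df = pd.DataFrame(
    {
        "sample_label": ["control_1", "treated_1"],
        "sample_id": ["GSM0000001", "GSM0000002"],
        "sentrix_id": ["200000000001", "200000000001"],
        "sentrix_position": ["R01C01", "R02C01"],
    }
)

# Each row describes one sample in the experiment.
n_samples = len(sample_sheet_df)
```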

Returns:

A dictionary where the keys are sample names (from the ‘sample_label’ column in sample_sheet_df), and the values are dictionaries mapping channel names (from Channel) to their respective IDAT data (as DataFrame objects, derived from the IdatDataset class).

Return type:

dict
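The nested layout described above can be walked with two loops. The sketch below uses a hand-built stand-in for the returned dictionary; the channel keys ("channel_a", "channel_b") and the "value" column are placeholders, not the actual Channel members or IDAT column names.

```python
import pandas as pd

# Stand-in for the dictionary described above: sample label -> channel -> DataFrame.
# Keys and column names are hypothetical placeholders.
idata = {
    "control_1": {
        "channel_a": pd.DataFrame({"value": [100, 250]}),
        "channel_b": pd.DataFrame({"value": [80, 300]}),
    },
}

# Collect the shape of each per-channel DataFrame.
shapes = {}
for sample_label, channels in idata.items():
    for channel, df in channels.items():
        shapes[(sample_label, channel)] = df.shape
```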

Notes:
  • The function searches for IDAT files by sample ID and channel. If no files are found, it attempts to search using the Sentrix ID and position.

  • If multiple files match the search pattern, an error is logged.

  • If no matching files are found, an error is logged and the sample is skipped.
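The lookup order in these notes can be sketched roughly as follows. This is not pylluminator's implementation: find_idat_file is a hypothetical helper, the glob patterns and the "Grn"/"Red" file suffixes are assumptions based on common IDAT file naming, and returning None on an ambiguous match is also an assumption, since the notes only say that an error is logged.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def find_idat_file(datadir, sample_id, sentrix_id, sentrix_position, channel):
    """Hypothetical helper: return the one matching IDAT path, or None."""
    patterns = [
        # First attempt: sample ID and channel.
        f"*{sample_id}*{channel}*.idat*",
        # Fallback: Sentrix ID and position.
        f"*{sentrix_id}*{sentrix_position}*{channel}*.idat*",
    ]
    for pattern in patterns:
        matches = sorted(Path(datadir).glob(pattern))
        if len(matches) > 1:
            logger.error("Multiple IDAT files match %s: %s", pattern, matches)
            return None  # assumption: an ambiguous match is treated as unusable
        if matches:
            return matches[0]
    logger.error("No IDAT file found for sample %s (%s)", sample_id, channel)
    return None  # the caller would skip this sample


# Small demonstration on empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("200000000001_R01C01_Grn.idat", "200000000001_R01C01_Red.idat"):
        (Path(tmp) / name).touch()
    # The sample ID is not in the file name, so the Sentrix fallback is used.
    found = find_idat_file(tmp, "GSM0000001", "200000000001", "R01C01", "Grn")
    found_name = found.name
    # No file exists for this position, so nothing is found.
    missing = find_idat_file(tmp, "GSM0000002", "200000000001", "R02C01", "Grn")
```

Trying the sample ID first and the Sentrix ID and position second mirrors the order given in the notes; the rest is one plausible way to arrange it.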

Example:

from pylluminator.samples import read_idata
idata = read_idata(sample_sheet_df, '/path/to/data')