read_idata

pylluminator.samples.read_idata(sample_sheet_df: DataFrame, datadir: str | Path) → tuple[dict, str]

Reads the IDAT files for each sample in the provided sample sheet, organizes the data by sample name and channel, and returns the resulting dictionary together with the name of the label column.

Parameters:
  • sample_sheet_df (pandas.DataFrame) – A DataFrame containing sample information, including columns for ‘sample_label’, ‘sample_id’, ‘sentrix_id’, and ‘sentrix_position’. Each row corresponds to a sample in the experiment.

  • datadir (str | Path) – The directory containing the IDAT files.
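As an illustration of the expected input, the sketch below builds a small sample sheet with the four columns listed above and checks that none is missing. All values are made up, and the check is only a suggested precaution, not something the function is documented to require.

```python
import pandas as pd

# Hypothetical sample sheet; every value below is invented for illustration.
sample_sheet_df = pd.DataFrame(
    {
        "sample_label": ["control_1", "treated_1"],
        "sample_id": ["S001", "S002"],
        "sentrix_id": ["206000000001", "206000000001"],
        "sentrix_position": ["R01C01", "R02C01"],
    }
)

# Columns named in the parameter description above.
required = {"sample_label", "sample_id", "sentrix_id", "sentrix_position"}
missing = required - set(sample_sheet_df.columns)
if missing:
    raise ValueError(f"sample sheet is missing columns: {sorted(missing)}")
```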

Returns:

A tuple of a dictionary (samples) and a string (the label column name). The keys of samples are sample names, taken from the ‘sample_label’ column of sample_sheet_df; each value is a dictionary mapping channel names (from Channel) to the corresponding IDAT data (DataFrame objects derived from the IdatDataset class).

Return type:

(dict, str)
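The nested layout of the returned dictionary can be walked with two loops. The sketch below uses a hand-built stand-in rather than real output: the channel keys "Grn" and "Red" and the column name mean_value are placeholders chosen for illustration, and the actual keys come from Channel.

```python
import pandas as pd

# Stand-in for the returned dictionary: sample label -> channel -> DataFrame.
# Keys and column names here are placeholders, not real read_idata output.
samples = {
    "control_1": {
        "Grn": pd.DataFrame({"mean_value": [120, 340]}),
        "Red": pd.DataFrame({"mean_value": [560, 80]}),
    },
}

# Collect the shape of each per-channel DataFrame.
shapes = {}
for sample_name, channels in samples.items():
    for channel, idat_df in channels.items():
        shapes[(sample_name, str(channel))] = idat_df.shape
        print(sample_name, channel, idat_df.shape)
```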

Notes:
  • The function searches for IDAT files by sample ID and channel. If no files are found, it attempts to search using the Sentrix ID and position.

  • If multiple files match the search pattern, an error is logged.

  • If no matching files are found, an error is logged and the sample is skipped.
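The lookup order described in these notes can be sketched as follows. This is not pylluminator's implementation: find_idat is a hypothetical helper, the glob patterns are assumptions about file naming, the search is limited to the top level of the directory, and returning None when several files match is a choice made for this sketch (the notes only state that an error is logged).

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def find_idat(datadir, sample_id, sentrix_id, sentrix_position, channel):
    """Hypothetical helper: return the single matching IDAT path, else None."""
    datadir = Path(datadir)
    # Assumed patterns: first by sample ID, then by Sentrix ID and position.
    patterns = [
        f"*{sample_id}*{channel}*.idat*",
        f"*{sentrix_id}*{sentrix_position}*{channel}*.idat*",
    ]
    matches = []
    for pattern in patterns:
        matches = sorted(datadir.glob(pattern))
        if matches:
            break  # stop at the first pattern that finds anything
    if len(matches) == 1:
        return matches[0]
    if matches:
        logger.error("Multiple IDAT files match for %s (%s)", sample_id, channel)
    else:
        logger.error("No IDAT file found for %s (%s)", sample_id, channel)
    return None  # caller skips this sample/channel
```

A caller would loop over the sample sheet rows and channels, and skip any sample for which the helper returns None.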

Example:

idata, label_column = read_idata(sample_sheet_df, '/path/to/data')