read_idata

pylluminator.samples.read_idata(sample_sheet_df: DataFrame, datadir: str | Path) → tuple[dict, str]

Reads the IDAT files for each sample listed in the sample sheet, organizes the data by sample name and channel, and returns a tuple containing a dictionary of IDAT data and the name of the label column.

Parameters:

sample_sheet_df (pandas.DataFrame) – A DataFrame containing sample information, including the columns 'sample_label', 'sample_id', 'sentrix_id', and 'sentrix_position'. Each row corresponds to one sample in the experiment.
datadir (str or pathlib.Path) – The directory containing the IDAT files.
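As a quick pre-flight check before calling read_idata, the sketch below reports which of the four columns named above are absent from a sample sheet. The helper missing_sample_sheet_columns is hypothetical (it is not part of pylluminator), and treating all four columns as strictly required is an assumption: the Notes suggest the Sentrix columns serve as a fallback.

```python
from collections.abc import Iterable

# Column names taken from the parameter description; whether all four are
# strictly required by read_idata is an assumption of this sketch.
REQUIRED_COLUMNS = ("sample_label", "sample_id", "sentrix_id", "sentrix_position")


def missing_sample_sheet_columns(columns: Iterable[str]) -> list[str]:
    """Return the expected sample sheet columns that are absent from `columns`.

    `columns` can be any iterable of column names, e.g. `sample_sheet_df.columns`.
    Hypothetical helper, not part of pylluminator.
    """
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# A plain list of names stands in for sample_sheet_df.columns here.
print(missing_sample_sheet_columns(["sample_label", "sample_id", "sentrix_id"]))
# → ['sentrix_position']
```

Accepting any iterable keeps the check independent of pandas, so the same call works on a DataFrame's columns or on a header row read by other means.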

Returns:

A tuple of a dictionary (samples) and a string (the label column name). The keys of samples are sample names, taken from the 'sample_label' column of sample_sheet_df; each value is a dictionary mapping channel names (from Channel) to the corresponding IDAT data, stored as a DataFrame obtained through the IdatDataset class.

Return type:

tuple[dict, str]
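The nested return structure can be walked with two loops. The sketch below uses a hand-built stand-in for samples in which, as assumptions, channels are plain strings ("GRN", "RED") rather than Channel members and small dicts replace the per-channel DataFrames. It also assumes that a skipped sample (see Notes) simply has no key in the dictionary, so skipped samples can be found by comparing the sheet's labels against the returned keys. describe and skipped_samples are hypothetical helpers.

```python
# Stand-in for the first element returned by read_idata. Assumptions: channel
# keys are plain strings instead of Channel members, and small dicts replace
# the per-channel IDAT DataFrames.
samples = {
    "sample_A": {"GRN": {"n_probes": 3}, "RED": {"n_probes": 3}},
    "sample_C": {"GRN": {"n_probes": 3}, "RED": {"n_probes": 3}},
}


def describe(samples: dict) -> list[str]:
    """Flatten the nested {sample: {channel: data}} mapping into readable lines."""
    lines = []
    for sample_name, channels in samples.items():
        for channel, data in channels.items():
            lines.append(f"{sample_name} / {channel}: {data}")
    return lines


def skipped_samples(sample_labels: list[str], samples: dict) -> list[str]:
    """Labels from the sample sheet that have no entry in the returned dictionary."""
    return [label for label in sample_labels if label not in samples]


for line in describe(samples):
    print(line)
print(skipped_samples(["sample_A", "sample_B", "sample_C"], samples))
# → ['sample_B']
```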

Notes:

The function searches for IDAT files by sample ID and channel. If no files are found, it attempts to search using the Sentrix ID and position.
If multiple files match the search pattern, an error is logged.
If no matching files are found, an error is logged and the sample is skipped.
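The search order described in these notes can be sketched with pathlib and logging. Everything here is an illustrative assumption: find_idat is a hypothetical helper, the glob patterns and the Grn/Red channel suffixes (following the common Illumina naming <sentrix_id>_<sentrix_position>_Grn.idat) are not taken from pylluminator, and returning None on multiple matches is just one possible choice, since the notes only say that an error is logged.

```python
from __future__ import annotations

import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def find_idat(datadir: Path, sample_id: str, sentrix_id: str,
              sentrix_position: str, channel: str) -> Path | None:
    """Return the single IDAT file for one sample and channel, or None.

    Hypothetical helper: tries the sample ID first, then falls back to
    '<sentrix_id>_<sentrix_position>'. The glob patterns are assumptions.
    """
    for stem in (sample_id, f"{sentrix_id}_{sentrix_position}"):
        # Match e.g. '<stem>_Grn.idat' or '<stem>_Grn.idat.gz' in any subfolder.
        matches = sorted(datadir.rglob(f"*{stem}*_{channel}.idat*"))
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            # Ambiguous match: log an error and give up (assumed behavior).
            logger.error("Multiple IDAT files match %s (%s): %s", stem, channel, matches)
            return None
        # No match for this stem: try the next (fallback) pattern.
    logger.error("No IDAT file found for sample %s, channel %s; skipping", sample_id, channel)
    return None


# Demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "S1_Grn.idat").touch()
    (root / "2000_R01C01_Red.idat.gz").touch()
    direct = find_idat(root, "S1", "2000", "R01C01", "Grn")    # found by sample ID
    fallback = find_idat(root, "S1", "2000", "R01C01", "Red")  # found via Sentrix fallback
    missing = find_idat(root, "S9", "3000", "R02C01", "Grn")   # logs an error, returns None
    print(direct.name, fallback.name, missing)
```

The loose *{stem}* pattern also shows why the multiple-match case matters: searching for sample ID S1 would also match a file named S10_Grn.idat.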

Example:

from pylluminator.samples import read_idata
idata, label_column = read_idata(sample_sheet_df, '/path/to/data')