API Reference

The first release documents these primary public entry points.

pyEpiAneufinder.epiAneufinder(fragment_file, outdir, genome_file, blacklist, windowSize, exclude=None, sort_fragment=True, GC=True, title_karyo=None, minFrags=20000, threshold_cells_nbins=0.35, threshold_blacklist_bins=0.85, ncores=1, k=2, n_permutations=1000, alpha=0.001, plotKaryo=True, resume=False, cellRangerInput=False, keep_sorted_fragfile=False, remove_barcodes=None, selected_cells=None)

Run the complete pyEpiAneufinder workflow on fragment or Cell Ranger input.

The workflow bins the genome, builds a count matrix, applies quality-control filters, optionally performs GC correction, segments each chromosome, assigns copy-number states, and optionally writes a karyogram plot.

Parameters:

fragment_file (str) – Path to an input fragments file, BED-like fragment file, or a Cell Ranger matrix directory. When cellRangerInput=True, this should point to the matrix directory that contains matrix.mtx(.gz), barcodes.tsv(.gz), and peaks.bed(.gz).
outdir (str) – Directory where intermediate files and final outputs are written.
genome_file (str) – Reference genome FASTA file used for genomic binning and optional GC correction.
blacklist (str) – BED file containing genomic regions to exclude from the analysis.
windowSize (int) – Genomic bin size in base pairs.
exclude (list[str] | None, optional) – Chromosomes to exclude from processing, for example ["chrX", "chrY"].
sort_fragment (bool, optional) – If True, sort the fragment input by barcode and genomic position before building the count matrix.
GC (bool, optional) – If True, perform GC correction on the count matrix before segmentation.
title_karyo (str | None, optional) – Optional title for the saved karyogram.
minFrags (int, optional) – Minimum number of fragments required for a cell to pass filtering.
threshold_cells_nbins (float, optional) – Minimum fraction of bins with non-zero counts required for a cell to be retained.
threshold_blacklist_bins (float, optional) – Remove bins where more than this fraction of cells has zero counts.
ncores (int, optional) – Number of worker processes or threads to use where parallelization is supported.
k (int, optional) – Segmentation depth parameter. The recursive breakpoint search aims for up to 2**k segments per chromosome.
n_permutations (int, optional) – Number of permutations used for significance testing during breakpoint detection.
alpha (float, optional) – Significance threshold for breakpoint detection.
plotKaryo (bool, optional) – If True, generate a karyogram figure from the inferred copy-number states.
resume (bool, optional) – If True, reuse intermediate files already present in outdir when possible.
cellRangerInput (bool, optional) – If True, interpret fragment_file as a Cell Ranger matrix directory instead of a fragment file.
keep_sorted_fragfile (bool, optional) – If True, keep the temporary sorted fragment file produced when sort_fragment=True.
remove_barcodes (str | None, optional) – Path to a one-column TSV file listing barcodes to exclude.
selected_cells (str | None, optional) – Path to a one-column file listing barcodes to retain. When provided, only those cells are analyzed.
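The three quality-control thresholds can be illustrated on a toy matrix. This is only a sketch of the filters as the parameter descriptions word them, not the library's implementation: the helper name qc_filter, the use of per-cell count totals as a stand-in for fragment counts, and the order in which the filters are applied are all assumptions.

```python
import numpy as np


def qc_filter(counts, minFrags=20000, threshold_cells_nbins=0.35,
              threshold_blacklist_bins=0.85):
    """Toy QC filter on a cells x bins count matrix (illustrative only)."""
    counts = np.asarray(counts)
    # Keep cells whose total count reaches minFrags (assumed proxy for fragments).
    enough_frags = counts.sum(axis=1) >= minFrags
    # Keep cells whose fraction of non-zero bins reaches threshold_cells_nbins.
    covered = (counts > 0).mean(axis=1) >= threshold_cells_nbins
    keep_cells = enough_frags & covered
    kept = counts[keep_cells]
    # Drop bins where more than threshold_blacklist_bins of the retained
    # cells have zero counts (assumption: computed after the cell filter).
    zero_frac = (kept == 0).mean(axis=0)
    keep_bins = zero_frac <= threshold_blacklist_bins
    return keep_cells, keep_bins


# Three cells, four bins; minFrags is lowered to suit the toy data.
toy_counts = [
    [5, 5, 0, 5],
    [1, 0, 0, 0],
    [4, 6, 0, 3],
]
keep_cells, keep_bins = qc_filter(toy_counts, minFrags=10)
```

Here the second cell fails the fragment threshold, and the third bin is empty in every retained cell, so both are dropped.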

Returns:

The workflow writes intermediate results and output files to outdir, including segmented copy-number tables, count_matrix.h5ad, parameter metadata, and optionally Karyogram.png.

Return type:

None
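A minimal call sketch, assuming the package is installed and importable as pyEpiAneufinder. The keyword names come from the signature above; every path and the windowSize and ncores values are hypothetical placeholders, and run_workflow is a helper name introduced here for illustration.

```python
# Hypothetical inputs: replace every path with your own files.
params = dict(
    fragment_file="data/fragments.tsv.gz",  # hypothetical path
    outdir="results/sample1",               # hypothetical path
    genome_file="reference/genome.fa",      # hypothetical path
    blacklist="reference/blacklist.bed",    # hypothetical path
    windowSize=100_000,                     # illustrative bin size in bp
    exclude=["chrX", "chrY"],
    minFrags=20000,
    ncores=4,                               # illustrative value
    plotKaryo=True,
)


def run_workflow(params):
    """Run the workflow; outputs are written to params['outdir']."""
    # Imported lazily so this snippet loads even without the package installed.
    import pyEpiAneufinder

    return pyEpiAneufinder.epiAneufinder(**params)


# run_workflow(params)  # uncomment once the input files exist
```

Because the function returns None, results are picked up from outdir afterwards rather than from the return value.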

pyEpiAneufinder.split_subclones(res, split_val, criterion='maxclust', dist_metric='euclidean', linkage_method='ward')

Split cells into subclones based on hierarchical clustering of CNV profiles.

Parameters:

res (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder() containing seq, start, and end columns plus one CNV profile column per cell.
split_val (int | float) – Threshold used to cut the clustering tree. With criterion="maxclust", this is the requested number of subclones. With criterion="distance", it is the clustering distance cutoff.
criterion ({"maxclust", "distance"}, optional) – Criterion passed to scipy.cluster.hierarchy.fcluster().
dist_metric (str, optional) – Distance metric used when comparing cell-level CNV profiles.
linkage_method (str, optional) – Linkage method used to build the hierarchical clustering tree.

Returns:

Data frame with two columns: barcode and subclone.

Return type:

pandas.DataFrame
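The clustering step described above can be sketched with SciPy directly. This is an illustration under assumptions, not the library's code: split_subclones_sketch is a hypothetical name, and the toy table only mimics the documented layout (seq, start, end, then one column per cell).

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage


def split_subclones_sketch(res, split_val, criterion="maxclust",
                           dist_metric="euclidean", linkage_method="ward"):
    """Cluster cells on their CNV profiles and cut the tree (illustrative)."""
    # Drop the coordinate columns and put one cell per row.
    profiles = res.drop(columns=["seq", "start", "end"]).T
    tree = linkage(profiles.to_numpy(dtype=float),
                   method=linkage_method, metric=dist_metric)
    # With criterion="maxclust", split_val is the requested number of clusters;
    # with criterion="distance", it is the distance cutoff.
    labels = fcluster(tree, t=split_val, criterion=criterion)
    return pd.DataFrame({"barcode": profiles.index, "subclone": labels})


# Toy result table: cellA/cellB are near-normal, cellC/cellD share a gain.
res = pd.DataFrame({
    "seq": ["chr1"] * 4,
    "start": [0, 100, 200, 300],
    "end": [100, 200, 300, 400],
    "cellA": [1, 1, 1, 1],
    "cellB": [1, 1, 1, 0],
    "cellC": [2, 2, 1, 1],
    "cellD": [2, 2, 1, 1],
})
subclones = split_subclones_sketch(res, split_val=2)
```

The numeric cluster labels carry no meaning beyond group membership, so downstream code should compare labels rather than rely on a particular numbering.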

pyEpiAneufinder.karyo_gainloss(res, outdir, title=None, annot_dt=None, state_type='categorical', n_states=5, linkage_method='ward', dist_metric='euclidean', plot_width=22, plot_height=8)

Plot a CNV karyogram from a pyEpiAneufinder result table.

Parameters:

res (pandas.DataFrame) – Result table containing seq, start, and end columns followed by one copy-number state column per cell. The function mutates res["seq"] in place by converting it to an ordered categorical for plotting.
outdir (str) – Output path where the PNG figure will be written.
title (str | None, optional) – Figure title.
annot_dt (pandas.DataFrame | None, optional) – Optional annotation table indexed by barcode and containing an annot column used to draw a side annotation bar. When provided, annot_dt["annot"] is coerced to categorical if needed for plotting.
state_type ({"categorical", "integer", "continuous"}, optional) – Interpretation of the copy-number values in res.
n_states (int, optional) – Number of categorical states to display when state_type="categorical". Supported values are 3 and 5.
linkage_method (str, optional) – Hierarchical clustering linkage method used to order cells.
dist_metric (str, optional) – Pairwise distance metric used for clustering cell profiles.
plot_width (float, optional) – Figure width in inches.
plot_height (float, optional) – Figure height in inches.

Returns:

The function saves the karyogram image to outdir.

Return type:

None
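Preparing the inputs might look like the sketch below. The barcodes, annotation labels, and table contents are made up for illustration, and my_output_location in the commented call is a hypothetical placeholder. Passing a copy of res is one way to keep the caller's table untouched, since the function converts res["seq"] in place.

```python
import pandas as pd

# Toy result table in the documented layout: seq, start, end, then cells.
res = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 2, 1, 1],
    "cellB": [1, 2, 0, 1],
    "cellC": [1, 1, 1, 1],
})

# Annotation table indexed by barcode with a single "annot" column.
annot_dt = pd.DataFrame(
    {"annot": ["tumor", "tumor", "normal"]},
    index=pd.Index(["cellA", "cellB", "cellC"], name="barcode"),
)
# Optional: make the column categorical up front to control category order.
annot_dt["annot"] = pd.Categorical(annot_dt["annot"],
                                   categories=["normal", "tumor"])

# Work on a copy so the in-place change to res["seq"] stays local.
res_for_plot = res.copy()

# import pyEpiAneufinder
# pyEpiAneufinder.karyo_gainloss(res_for_plot, outdir=my_output_location,
#                                annot_dt=annot_dt)
```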

pyEpiAneufinder.plot_single_cell_profile(outdir, cell_name, plot_path, mode=None)

Plot GC-corrected count distributions and genome-wide states for one cell.

Parameters:

outdir (str) – Output directory produced by pyEpiAneufinder.epiAneufinder().
cell_name (str) – Barcode of the cell to visualize.
plot_path (str) – Output path where the PNG figure will be written.
mode ({"holmes", "watson"} | None, optional) – Select a mode-specific result table. If None, use the combined five-state result_table.tsv.gz output.

Returns:

The function saves the figure to plot_path.

Return type:

None

pyEpiAneufinder.compute_aneuploidy_across_sample(df, offset=3)

Compute the mean fraction of altered bins across the full sample.

Parameters:

df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Sample-level aneuploidy score, computed as the fraction of entries in the per-cell CNV matrix whose state is not 1 (the unaltered state).

Return type:

float
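The score as described can be reproduced on a toy table in a few lines. This is a sketch of the stated definition, not the library's source; aneuploidy_across_sample_sketch and the toy data are assumptions made for illustration.

```python
import pandas as pd


def aneuploidy_across_sample_sketch(df, offset=3):
    """Fraction of CNV matrix entries whose state is not 1 (illustrative)."""
    # Skip the leading metadata columns (seq, start, end by default).
    states = df.iloc[:, offset:]
    return float((states != 1).to_numpy().mean())


# Four bins, two cells: cellA has two altered bins, cellB has none.
df = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 1, 2, 0],
    "cellB": [1, 1, 1, 1],
})
score = aneuploidy_across_sample_sketch(df)  # 2 altered entries out of 8
```

If a table carries extra metadata columns before the cell columns, offset has to be raised accordingly, otherwise those columns are counted as cells.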

pyEpiAneufinder.compute_aneuploidy_by_chr(df, offset=3)

Compute an aneuploidy score for each chromosome.

Parameters:

df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Single-row data frame with one column per chromosome.

Return type:

pandas.DataFrame
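A per-chromosome version can be sketched by grouping bins on the seq column. The sketch assumes the per-chromosome score is the same fraction of entries not equal to 1, restricted to that chromosome's bins; the reference does not state this explicitly, and aneuploidy_by_chr_sketch is a hypothetical name.

```python
import pandas as pd


def aneuploidy_by_chr_sketch(df, offset=3):
    """Per-chromosome fraction of entries not equal to 1 (illustrative)."""
    altered = df.iloc[:, offset:] != 1
    # Fraction of altered cells in each bin.
    per_bin = altered.mean(axis=1)
    # Average the bins of each chromosome, keeping the table's order.
    per_chr = per_bin.groupby(df["seq"], sort=False).mean()
    # Reshape to a single row with one column per chromosome.
    return per_chr.to_frame().T


df = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 2, 1, 1],
    "cellB": [1, 2, 0, 1],
})
by_chr = aneuploidy_by_chr_sketch(df)
```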

pyEpiAneufinder.compute_heterogeneity_across_sample(df, offset=3)

Compute the mean per-bin heterogeneity score across the full sample.

Parameters:

df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Mean heterogeneity score across all genomic bins.

Return type:

float

pyEpiAneufinder.compute_heterogeneity_by_chr(df, offset=3)

Compute a heterogeneity score for each chromosome.

Parameters:

df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Single-row data frame with one column per chromosome.

Return type:

pandas.DataFrame

pyEpiAneufinder.compute_cnv_burden_cell(df, offset=3)

Compute the fraction of altered bins for each cell.

Parameters:

df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Data frame with barcodes and cnv_burden columns, where cnv_burden is the fraction of bins whose state differs from 1.

Return type:

pandas.DataFrame
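The per-cell burden follows the same pattern, taken column-wise. As with the other sketches, this illustrates the documented definition on toy data; cnv_burden_cell_sketch is a hypothetical name, not the library's function.

```python
import pandas as pd


def cnv_burden_cell_sketch(df, offset=3):
    """Per-cell fraction of bins whose state differs from 1 (illustrative)."""
    states = df.iloc[:, offset:]
    # Column-wise mean of the "altered" mask gives one value per cell.
    burden = (states != 1).mean(axis=0)
    return pd.DataFrame({"barcodes": burden.index,
                         "cnv_burden": burden.to_numpy()})


df = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 2, 1, 1],
    "cellB": [1, 2, 0, 1],
})
burden = cnv_burden_cell_sketch(df)
```

Sorting the result by cnv_burden is a quick way to spot cells with unusually many altered bins.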