API Reference

This reference documents the primary public entry points of the first release.

pyEpiAneufinder.epiAneufinder(fragment_file, outdir, genome_file, blacklist, windowSize, exclude=None, sort_fragment=True, GC=True, title_karyo=None, minFrags=20000, threshold_cells_nbins=0.35, threshold_blacklist_bins=0.85, ncores=1, k=2, n_permutations=1000, alpha=0.001, plotKaryo=True, resume=False, cellRangerInput=False, keep_sorted_fragfile=False, remove_barcodes=None, selected_cells=None)

Run the complete pyEpiAneufinder workflow on fragment or Cell Ranger input.

The workflow bins the genome, builds a count matrix, applies quality-control filters, optionally performs GC correction, segments each chromosome, assigns copy-number states, and optionally writes a karyogram plot.

Parameters:
  • fragment_file (str) – Path to an input fragments file, BED-like fragment file, or a Cell Ranger matrix directory. When cellRangerInput=True, this should point to the matrix directory that contains matrix.mtx(.gz), barcodes.tsv(.gz), and peaks.bed(.gz).

  • outdir (str) – Directory where intermediate files and final outputs are written.

  • genome_file (str) – Reference genome FASTA file used for genomic binning and optional GC correction.

  • blacklist (str) – BED file containing genomic regions to exclude from the analysis.

  • windowSize (int) – Genomic bin size in base pairs.

  • exclude (list[str] | None, optional) – Chromosomes to exclude from processing, for example ["chrX", "chrY"].

  • sort_fragment (bool, optional) – If True, sort the fragment input by barcode and genomic position before building the count matrix.

  • GC (bool, optional) – If True, perform GC correction on the count matrix before segmentation.

  • title_karyo (str | None, optional) – Optional title for the saved karyogram.

  • minFrags (int, optional) – Minimum number of fragments required for a cell to pass filtering.

  • threshold_cells_nbins (float, optional) – Minimum fraction of bins with non-zero counts required for a cell to be retained.

  • threshold_blacklist_bins (float, optional) – Maximum fraction of cells allowed to have zero counts in a bin. Bins that exceed this fraction are removed.

  • ncores (int, optional) – Number of worker processes or threads to use where parallelization is supported.

  • k (int, optional) – Segmentation depth parameter. The recursive breakpoint search aims for up to 2**k segments per chromosome.

  • n_permutations (int, optional) – Number of permutations used for significance testing during breakpoint detection.

  • alpha (float, optional) – Significance threshold for breakpoint detection.

  • plotKaryo (bool, optional) – If True, generate a karyogram figure from the inferred copy-number states.

  • resume (bool, optional) – If True, reuse intermediate files already present in outdir when possible.

  • cellRangerInput (bool, optional) – If True, interpret fragment_file as a Cell Ranger matrix directory instead of a fragment file.

  • keep_sorted_fragfile (bool, optional) – If True, keep the temporary sorted fragment file produced when sort_fragment=True.

  • remove_barcodes (str | None, optional) – Path to a one-column TSV file listing barcodes to exclude.

  • selected_cells (str | None, optional) – Path to a one-column file listing barcodes to retain. When provided, only those cells are analyzed.
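
The three filtering parameters (minFrags, threshold_cells_nbins, threshold_blacklist_bins) can be read as cutoffs on a cell-by-bin count matrix. The sketch below applies them to a toy NumPy matrix. It is an illustration under assumptions, not the package's implementation: the data, the small min_frags value, the order of the two filters, and the choice of >= and <= at the boundaries are all made up for this example.

```python
import numpy as np

# Toy count matrix: rows are cells, columns are genomic bins (illustrative data).
counts = np.array([
    [5, 3, 0, 4, 2, 6],   # well-covered cell
    [1, 0, 0, 0, 0, 0],   # sparse cell
    [4, 2, 0, 3, 5, 1],   # well-covered cell
    [3, 4, 0, 0, 2, 2],   # moderately covered cell
])

min_frags = 10                   # stands in for minFrags (default 20000 on real data)
threshold_cells_nbins = 0.35     # minimum fraction of non-zero bins per cell
threshold_blacklist_bins = 0.85  # maximum fraction of zero-count cells per bin

# Cell filter: enough total fragments and enough bins with non-zero counts.
frags_per_cell = counts.sum(axis=1)
nonzero_frac_per_cell = (counts > 0).mean(axis=1)
keep_cells = (frags_per_cell >= min_frags) & (
    nonzero_frac_per_cell >= threshold_cells_nbins
)

# Bin filter (assumed to run on the retained cells): drop bins where too
# many cells have zero counts.
filtered = counts[keep_cells]
zero_frac_per_bin = (filtered == 0).mean(axis=0)
keep_bins = zero_frac_per_bin <= threshold_blacklist_bins

filtered = filtered[:, keep_bins]
print(keep_cells)      # the sparse second cell is dropped
print(keep_bins)       # the all-zero third bin is dropped
print(filtered.shape)  # (3, 5)
```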

Returns:

The workflow writes intermediate results and output files to outdir, including segmented copy-number tables, count_matrix.h5ad, parameter metadata, and optionally Karyogram.png.

Return type:

None
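
A call might look like the following sketch. All paths are hypothetical placeholders, the 100 kb window size is only an illustrative choice, and the import name pyEpiAneufinder is assumed from the qualified function name above. The helpers build_kwargs and run_workflow exist only in this example. The workflow call is wrapped in a function so that nothing runs until it is invoked with real input files.

```python
def build_kwargs(fragment_file, outdir, genome_file, blacklist, window_size=100_000):
    """Collect keyword arguments for epiAneufinder (hypothetical helper)."""
    return {
        "fragment_file": fragment_file,
        "outdir": outdir,
        "genome_file": genome_file,
        "blacklist": blacklist,
        "windowSize": window_size,     # bin size in base pairs (illustrative value)
        "exclude": ["chrX", "chrY"],   # chromosomes to skip
        "GC": True,                    # apply GC correction before segmentation
        "k": 2,                        # up to 2**k segments per chromosome
        "ncores": 4,
        "resume": True,                # reuse intermediate files found in outdir
    }


def run_workflow(kwargs):
    """Run the workflow; needs pyEpiAneufinder installed and real input files."""
    import pyEpiAneufinder  # assumed import name

    # Returns None; results are written to kwargs["outdir"].
    pyEpiAneufinder.epiAneufinder(**kwargs)


kwargs = build_kwargs(
    fragment_file="data/fragments.tsv.gz",  # hypothetical paths
    outdir="results/sample1",
    genome_file="ref/genome.fa",
    blacklist="ref/blacklist.bed",
)
max_segments = 2 ** kwargs["k"]
print(max_segments)  # 4
```

Keeping the arguments in a dictionary makes it easy to store them next to the results or to rerun the same configuration with resume=True.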

pyEpiAneufinder.split_subclones(res, split_val, criterion='maxclust', dist_metric='euclidean', linkage_method='ward')

Split cells into subclones based on hierarchical clustering of CNV profiles.

Parameters:
  • res (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder() containing seq, start, and end columns plus one CNV profile column per cell.

  • split_val (int | float) – Threshold used to cut the clustering tree. With criterion="maxclust", this is the requested number of subclones. With criterion="distance", it is the clustering distance cutoff.

  • criterion ({"maxclust", "distance"}, optional) – Criterion passed to scipy.cluster.hierarchy.fcluster().

  • dist_metric (str, optional) – Distance metric used when comparing cell-level CNV profiles.

  • linkage_method (str, optional) – Linkage method used to build the hierarchical clustering tree.

Returns:

Data frame with two columns: barcode and subclone.

Return type:

pandas.DataFrame
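
To make the two criterion values concrete, the sketch below clusters a toy result table directly with SciPy. It shows how a split value is interpreted by scipy.cluster.hierarchy.fcluster(); it is not the package's implementation. The toy table, the barcodes, and the step of transposing the per-cell columns into one row per cell are assumptions.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Toy result table: three metadata columns, then one CNV column per cell.
res = pd.DataFrame({
    "seq":   ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100_000, 0, 100_000],
    "end":   [100_000, 200_000, 100_000, 200_000],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 1, 2, 2],
    "cellC": [0, 0, 1, 1],
    "cellD": [0, 0, 1, 1],
})

# One row per cell, one column per bin.
profiles = res.drop(columns=["seq", "start", "end"]).T.to_numpy(dtype=float)
barcodes = list(res.columns[3:])

# Build the tree with the documented defaults (ward linkage, euclidean distance).
tree = linkage(profiles, method="ward", metric="euclidean")

# criterion="maxclust": the split value is the requested number of subclones.
by_count = fcluster(tree, t=2, criterion="maxclust")

# criterion="distance": the split value is a cutoff on the tree's merge distances.
by_distance = fcluster(tree, t=1.0, criterion="distance")

subclones = pd.DataFrame({"barcode": barcodes, "subclone": by_count})
print(subclones)
```

On this toy input both criteria separate cellA and cellB from cellC and cellD. On real data a distance cutoff usually needs tuning, whereas maxclust fixes the number of groups up front.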

pyEpiAneufinder.karyo_gainloss(res, outdir, title=None, annot_dt=None, state_type='categorical', n_states=5, linkage_method='ward', dist_metric='euclidean', plot_width=22, plot_height=8)

Plot a CNV karyogram from a pyEpiAneufinder result table.

Parameters:
  • res (pandas.DataFrame) – Result table containing seq, start, and end columns followed by one copy-number state column per cell. The function mutates res["seq"] in place by converting it to an ordered categorical for plotting.

  • outdir (str) – Output path where the PNG figure will be written.

  • title (str | None, optional) – Figure title.

  • annot_dt (pandas.DataFrame | None, optional) – Optional annotation table indexed by barcode and containing an annot column used to draw a side annotation bar. When provided, annot_dt["annot"] is coerced to categorical if needed for plotting.

  • state_type ({"categorical", "integer", "continuous"}, optional) – Interpretation of the copy-number values in res.

  • n_states (int, optional) – Number of categorical states to display when state_type="categorical". Supported values are 3 and 5.

  • linkage_method (str, optional) – Hierarchical clustering linkage method used to order cells.

  • dist_metric (str, optional) – Pairwise distance metric used for clustering cell profiles.

  • plot_width (float, optional) – Figure width in inches.

  • plot_height (float, optional) – Figure height in inches.

Returns:

The function saves the karyogram image to outdir.

Return type:

None
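
Because karyo_gainloss() converts res["seq"] in place, callers that reuse the table afterwards may prefer to pass a copy. The sketch below prepares a toy result table and an annotation table in the documented shape. The barcodes, annotation labels, chromosome order, and the helper name plot_on_copy are hypothetical, and the import name pyEpiAneufinder is assumed. The plotting call sits inside a function so the preparation steps can be run without the package installed.

```python
import pandas as pd

# Toy result table: seq, start, end, then one state column per cell.
res = pd.DataFrame({
    "seq":   ["chr1", "chr1", "chr2"],
    "start": [0, 100_000, 0],
    "end":   [100_000, 200_000, 100_000],
    "cellA": [1, 2, 1],
    "cellB": [1, 0, 1],
})

# Annotation table: indexed by barcode, with an "annot" column.
annot_dt = pd.DataFrame(
    {"annot": ["tumor", "normal"]},
    index=pd.Index(["cellA", "cellB"], name="barcode"),
)


def plot_on_copy(res, outdir, annot_dt=None, title=None):
    """Plot from a copy so the caller's res['seq'] keeps its original dtype."""
    import pyEpiAneufinder  # assumed import name; requires the package

    pyEpiAneufinder.karyo_gainloss(
        res.copy(), outdir, title=title, annot_dt=annot_dt
    )


# Roughly what an in-place conversion to an ordered categorical looks like,
# applied here to a copy (the category order is an assumption):
as_plotted = res.copy()
as_plotted["seq"] = pd.Categorical(
    as_plotted["seq"], categories=["chr1", "chr2"], ordered=True
)
print(res["seq"].dtype, as_plotted["seq"].dtype)
```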

pyEpiAneufinder.plot_single_cell_profile(outdir, cell_name, plot_path, mode=None)

Plot GC-corrected count distributions and genome-wide states for one cell.

Parameters:
  • outdir (str) – Output directory produced by pyEpiAneufinder.epiAneufinder().

  • cell_name (str) – Barcode of the cell to visualize.

  • plot_path (str) – Output path where the PNG figure will be written.

  • mode ({"holmes", "watson"} | None, optional) – Select a mode-specific result table. If None, use the combined five-state result_table.tsv.gz output.

Returns:

The function saves the figure to plot_path.

Return type:

None

pyEpiAneufinder.compute_aneuploidy_across_sample(df, offset=3)

Compute the mean fraction of altered bins across the full sample.

Parameters:
  • df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().

  • offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Sample-level aneuploidy score, computed as the fraction of entries in the per-cell CNV matrix that are not equal to 1.

Return type:

float
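
The documented score, the fraction of entries not equal to 1, can be reproduced on a toy table as a sanity check. This sketch is not the package's code. The table and barcodes are made up, and it assumes the three leading columns are seq, start, and end, which matches offset=3.

```python
import pandas as pd

# Toy result table: three metadata columns, then one CNV column per cell.
df = pd.DataFrame({
    "seq":   ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100_000, 0, 100_000],
    "end":   [100_000, 200_000, 100_000, 200_000],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 1, 1, 1],
    "cellC": [0, 1, 1, 1],
})

offset = 3
cnv = df.iloc[:, offset:]  # per-cell CNV matrix (bins x cells)

# Fraction of all bin-by-cell entries whose state differs from 1.
score = float((cnv != 1).to_numpy().mean())
print(score)  # 3 altered entries out of 12 -> 0.25
```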

pyEpiAneufinder.compute_aneuploidy_by_chr(df, offset=3)

Compute an aneuploidy score for each chromosome.

Parameters:
  • df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().

  • offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Single-row data frame with one column per chromosome.

Return type:

pandas.DataFrame
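
The per-chromosome formula is not spelled out here. One plausible reading, used in the sketch below, is the same fraction of entries not equal to 1, restricted to the bins of each chromosome. Treat this as an assumption rather than the package's definition; the toy table and barcodes are also made up.

```python
import pandas as pd

# Toy result table: three metadata columns, then one CNV column per cell.
df = pd.DataFrame({
    "seq":   ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100_000, 0, 100_000],
    "end":   [100_000, 200_000, 100_000, 200_000],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 1, 1, 1],
    "cellC": [0, 1, 1, 1],
})

offset = 3
altered = df.iloc[:, offset:] != 1

# Fraction of altered cells per bin, then averaged within each chromosome.
per_bin = altered.mean(axis=1)
by_chr = per_bin.groupby(df["seq"]).mean().to_frame().T  # single-row data frame
print(by_chr)
```

Averaging per-bin fractions within a chromosome equals the fraction of altered entries on that chromosome here because every bin has the same number of cells.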

pyEpiAneufinder.compute_heterogeneity_across_sample(df, offset=3)

Compute the mean per-bin heterogeneity score across the full sample.

Parameters:
  • df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().

  • offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Mean heterogeneity score across all genomic bins.

Return type:

float

pyEpiAneufinder.compute_heterogeneity_by_chr(df, offset=3)

Compute a heterogeneity score for each chromosome.

Parameters:
  • df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().

  • offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Single-row data frame with one column per chromosome.

Return type:

pandas.DataFrame

pyEpiAneufinder.compute_cnv_burden_cell(df, offset=3)

Compute the fraction of altered bins for each cell.

Parameters:
  • df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().

  • offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.

Returns:

Data frame with barcodes and cnv_burden columns, where cnv_burden is the fraction of bins whose state differs from 1.

Return type:

pandas.DataFrame
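
The per-cell burden described above, the fraction of bins whose state differs from 1, can be checked by hand on a toy table. This is a sketch of the documented definition, not the package's code. The table and barcodes are made up, and the three leading columns are assumed to be seq, start, and end.

```python
import pandas as pd

# Toy result table: three metadata columns, then one CNV column per cell.
df = pd.DataFrame({
    "seq":   ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100_000, 0, 100_000],
    "end":   [100_000, 200_000, 100_000, 200_000],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 1, 1, 1],
    "cellC": [0, 1, 1, 1],
})

offset = 3
cnv = df.iloc[:, offset:]  # per-cell CNV matrix (bins x cells)

# For each cell (column), the fraction of bins whose state is not 1.
burden = pd.DataFrame({
    "barcodes": list(cnv.columns),
    "cnv_burden": (cnv != 1).mean(axis=0).to_numpy(),
})
print(burden)  # cellA 0.50, cellB 0.00, cellC 0.25
```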