API Reference
The first release documents these primary public entry points.
- pyEpiAneufinder.epiAneufinder(fragment_file, outdir, genome_file, blacklist, windowSize, exclude=None, sort_fragment=True, GC=True, title_karyo=None, minFrags=20000, threshold_cells_nbins=0.35, threshold_blacklist_bins=0.85, ncores=1, k=2, n_permutations=1000, alpha=0.001, plotKaryo=True, resume=False, cellRangerInput=False, keep_sorted_fragfile=False, remove_barcodes=None, selected_cells=None)
Run the complete pyEpiAneufinder workflow on fragment or Cell Ranger input.
The workflow bins the genome, builds a count matrix, applies quality-control filters, optionally performs GC correction, segments each chromosome, assigns copy-number states, and optionally writes a karyogram plot.
- Parameters:
fragment_file (str) – Path to an input fragments file, BED-like fragment file, or a Cell Ranger matrix directory. When cellRangerInput=True, this should point to the matrix directory that contains matrix.mtx(.gz), barcodes.tsv(.gz), and peaks.bed(.gz).
outdir (str) – Directory where intermediate files and final outputs are written.
genome_file (str) – Reference genome FASTA file used for genomic binning and optional GC correction.
blacklist (str) – BED file containing genomic regions to exclude from the analysis.
windowSize (int) – Genomic bin size in base pairs.
exclude (list[str] | None, optional) – Chromosomes to exclude from processing, for example ["chrX", "chrY"].
sort_fragment (bool, optional) – If True, sort the fragment input by barcode and genomic position before building the count matrix.
GC (bool, optional) – If True, perform GC correction on the count matrix before segmentation.
title_karyo (str | None, optional) – Optional title for the saved karyogram.
minFrags (int, optional) – Minimum number of fragments required for a cell to pass filtering.
threshold_cells_nbins (float, optional) – Minimum fraction of bins with non-zero counts required for a cell to be retained.
threshold_blacklist_bins (float, optional) – Remove bins where more than this fraction of cells has zero counts.
ncores (int, optional) – Number of worker processes or threads to use where parallelization is supported.
k (int, optional) – Segmentation depth parameter. The recursive breakpoint search aims for up to 2**k segments per chromosome.
n_permutations (int, optional) – Number of permutations used for significance testing during breakpoint detection.
alpha (float, optional) – Significance threshold for breakpoint detection.
plotKaryo (bool, optional) – If True, generate a karyogram figure from the inferred copy-number states.
resume (bool, optional) – If True, reuse intermediate files already present in outdir when possible.
cellRangerInput (bool, optional) – If True, interpret fragment_file as a Cell Ranger matrix directory instead of a fragment file.
keep_sorted_fragfile (bool, optional) – If True, keep the temporary sorted fragment file produced when sort_fragment=True.
remove_barcodes (str | None, optional) – Path to a one-column TSV file listing barcodes to exclude.
selected_cells (str | None, optional) – Path to a one-column file listing barcodes to retain. When provided, only those cells are analyzed.
- Returns:
The workflow writes intermediate results and output files to outdir, including segmented copy-number tables, count_matrix.h5ad, parameter metadata, and optionally Karyogram.png.
- Return type:
None
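A minimal sketch of a call built from the signature above. The file paths and the windowSize, ncores, and resume values are hypothetical placeholders chosen for illustration, not recommended settings; the import is deferred into the helper so the snippet loads even where the package is not installed.

```python
# Hypothetical input paths and settings; replace them before running.
params = dict(
    fragment_file="sample/fragments.tsv.gz",
    outdir="results/sample",
    genome_file="reference/genome.fa",
    blacklist="reference/blacklist.bed",
    windowSize=100_000,        # assumed bin size in base pairs
    exclude=["chrX", "chrY"],  # chromosomes to skip
    minFrags=20000,            # documented default
    k=2,                       # documented default
    ncores=4,
    resume=True,               # reuse intermediate files in outdir when possible
)

# With k=2 the recursive breakpoint search aims for up to 2**k segments per chromosome.
max_segments = 2 ** params["k"]


def run_workflow(params):
    """Run the workflow with keyword arguments taken from the documented signature."""
    import pyEpiAneufinder  # deferred import: only needed when the workflow is run

    # The function returns None; all results are written to params["outdir"].
    pyEpiAneufinder.epiAneufinder(**params)
```

Calling run_workflow(params) would start the full pipeline, so it is left uncalled here.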
- pyEpiAneufinder.split_subclones(res, split_val, criterion='maxclust', dist_metric='euclidean', linkage_method='ward')
Split cells into subclones based on hierarchical clustering of CNV profiles.
- Parameters:
res (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder() containing seq, start, and end columns plus one CNV profile column per cell.
split_val (int | float) – Threshold used to cut the clustering tree. With criterion="maxclust", this is the requested number of subclones. With criterion="distance", it is the clustering distance cutoff.
criterion ({"maxclust", "distance"}, optional) – Criterion passed to scipy.cluster.hierarchy.fcluster().
dist_metric (str, optional) – Distance metric used when comparing cell-level CNV profiles.
linkage_method (str, optional) – Linkage method used to build the hierarchical clustering tree.
- Returns:
Data frame with two columns: barcode and subclone.
- Return type:
pandas.DataFrame
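To illustrate how the two criterion values change the meaning of split_val, the sketch below clusters a toy result table directly with SciPy. It is not the library's implementation; the toy table, the cell names, and the way the output frame is assembled are assumptions made for the example.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Toy result table: three metadata columns followed by one state column per cell.
res = pd.DataFrame({
    "seq": ["chr1"] * 4,
    "start": [0, 100, 200, 300],
    "end": [100, 200, 300, 400],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 1, 2, 2],
    "cellC": [0, 0, 1, 1],
    "cellD": [0, 0, 1, 1],
})

# One row per cell, one column per genomic bin.
profiles = res.drop(columns=["seq", "start", "end"]).T.to_numpy(dtype=float)
tree = linkage(profiles, method="ward", metric="euclidean")

# criterion="maxclust": the threshold is the requested number of clusters.
by_count = fcluster(tree, t=2, criterion="maxclust")

# criterion="distance": the threshold is a cutoff on the clustering distance.
by_distance = fcluster(tree, t=1.0, criterion="distance")

# Same two-column shape as the documented return value.
subclones = pd.DataFrame({"barcode": res.columns[3:], "subclone": by_count})
```

In this toy table both cuts separate cellA and cellB from cellC and cellD; on real data the two criteria generally give different partitions.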
- pyEpiAneufinder.karyo_gainloss(res, outdir, title=None, annot_dt=None, state_type='categorical', n_states=5, linkage_method='ward', dist_metric='euclidean', plot_width=22, plot_height=8)
Plot a CNV karyogram from a pyEpiAneufinder result table.
- Parameters:
res (pandas.DataFrame) – Result table containing seq, start, and end columns followed by one copy-number state column per cell. The function mutates res["seq"] in place by converting it to an ordered categorical for plotting.
outdir (str) – Output path where the PNG figure will be written.
title (str | None, optional) – Figure title.
annot_dt (pandas.DataFrame | None, optional) – Optional annotation table indexed by barcode and containing an annot column used to draw a side annotation bar. When provided, annot_dt["annot"] is coerced to categorical if needed for plotting.
state_type ({"categorical", "integer", "continuous"}, optional) – Interpretation of the copy-number values in res.
n_states (int, optional) – Number of categorical states to display when state_type="categorical". Supported values are 3 and 5.
linkage_method (str, optional) – Hierarchical clustering linkage method used to order cells.
dist_metric (str, optional) – Pairwise distance metric used for clustering cell profiles.
plot_width (float, optional) – Figure width in inches.
plot_height (float, optional) – Figure height in inches.
- Returns:
The function saves the karyogram image to outdir.
- Return type:
None
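A hedged sketch of how the inputs described above might be prepared. The toy table, the barcodes, the annotation labels, and the title are invented for illustration. Because the function is documented to modify res["seq"] in place, the helper passes a copy so the caller's table stays unchanged.

```python
import pandas as pd

# Toy result table with two cells (values are illustrative only).
res = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 0, 1, 1],
})

# Annotation table indexed by barcode with an "annot" column.
annot_dt = pd.DataFrame({"annot": ["tumor", "normal"]}, index=["cellA", "cellB"])


def plot_karyogram(res, annot_dt, png_path):
    """Plot a karyogram without altering the caller's result table."""
    import pyEpiAneufinder  # deferred import: only needed when plotting

    pyEpiAneufinder.karyo_gainloss(
        res.copy(),          # copy, since res["seq"] is converted in place
        png_path,            # the outdir argument is the PNG output path
        title="Toy sample",  # hypothetical title
        annot_dt=annot_dt,
        state_type="categorical",
        n_states=3,          # supported values are 3 and 5
    )
```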
- pyEpiAneufinder.plot_single_cell_profile(outdir, cell_name, plot_path, mode=None)
Plot GC-corrected count distributions and genome-wide states for one cell.
- Parameters:
outdir (str) – Output directory produced by pyEpiAneufinder.epiAneufinder().
cell_name (str) – Barcode of the cell to visualize.
plot_path (str) – Output path where the PNG figure will be written.
mode ({"holmes", "watson"} | None, optional) – Select a mode-specific result table. If None, use the combined five-state result_table.tsv.gz output.
- Returns:
The function saves the figure to plot_path.
- Return type:
None
- pyEpiAneufinder.compute_aneuploidy_across_sample(df, offset=3)
Compute the mean fraction of altered bins across the full sample.
- Parameters:
df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.
- Returns:
Sample-level aneuploidy score computed as the fraction of entries not equal to 1.
- Return type:
float
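The score described above can be reproduced on a toy table as a check on the definition. This is a sketch under the stated definition (fraction of entries not equal to 1 after skipping offset columns), not the library's source; the table and the function name aneuploidy_across_sample are made up for the example.

```python
import pandas as pd

# Toy result table: seq, start, end, then one state column per cell.
res = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 0, 1, 1],
})


def aneuploidy_across_sample(df, offset=3):
    """Fraction of bin-by-cell entries whose state is not 1."""
    states = df.iloc[:, offset:]  # skip the leading metadata columns
    return float((states != 1).to_numpy().mean())


score = aneuploidy_across_sample(res)  # 3 of the 8 entries differ from 1
```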
- pyEpiAneufinder.compute_aneuploidy_by_chr(df, offset=3)
Compute an aneuploidy score for each chromosome.
- Parameters:
df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.
- Returns:
Single-row data frame with one column per chromosome.
- Return type:
pandas.DataFrame
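The reference does not spell out the per-chromosome formula. The sketch below assumes it is the same "fraction of entries not equal to 1" score restricted to the bins of each chromosome, and shows one way to arrive at the documented single-row shape. The toy table and the function name aneuploidy_by_chr are hypothetical.

```python
import pandas as pd

# Toy result table: seq, start, end, then one state column per cell.
res = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 0, 1, 1],
})


def aneuploidy_by_chr(df, offset=3):
    """Assumed definition: fraction of altered entries within each chromosome."""
    altered = df.iloc[:, offset:] != 1
    per_bin = altered.mean(axis=1)  # fraction of altered cells in each bin
    per_chr = per_bin.groupby(df["seq"], sort=False).mean()
    # Reshape into a single row with one column per chromosome.
    return per_chr.to_frame().T.reset_index(drop=True)


by_chr = aneuploidy_by_chr(res)
```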
- pyEpiAneufinder.compute_heterogeneity_across_sample(df, offset=3)
Compute the mean per-bin heterogeneity score across the full sample.
- Parameters:
df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.
- Returns:
Mean heterogeneity score across all genomic bins.
- Return type:
float
- pyEpiAneufinder.compute_heterogeneity_by_chr(df, offset=3)
Compute a heterogeneity score for each chromosome.
- Parameters:
df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.
- Returns:
Single-row data frame with one column per chromosome.
- Return type:
pandas.DataFrame
- pyEpiAneufinder.compute_cnv_burden_cell(df, offset=3)
Compute the fraction of altered bins for each cell.
- Parameters:
df (pandas.DataFrame) – Result table from pyEpiAneufinder.epiAneufinder().
offset (int, optional) – Number of leading metadata columns to skip before the per-cell CNV matrix.
- Returns:
Data frame with barcodes and cnv_burden columns, where cnv_burden is the fraction of bins whose state differs from 1.
- Return type:
pandas.DataFrame
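As a worked illustration of the per-cell burden, the sketch below recomputes it on a toy table from the definition given above. It is not the library's code; the table and the function name cnv_burden_cell are assumptions for the example.

```python
import pandas as pd

# Toy result table: seq, start, end, then one state column per cell.
res = pd.DataFrame({
    "seq": ["chr1", "chr1", "chr2", "chr2"],
    "start": [0, 100, 0, 100],
    "end": [100, 200, 100, 200],
    "cellA": [1, 1, 2, 2],
    "cellB": [1, 0, 1, 1],
})


def cnv_burden_cell(df, offset=3):
    """Per-cell fraction of bins whose state differs from 1."""
    altered = df.iloc[:, offset:] != 1
    burden = altered.mean(axis=0)  # one value per cell column
    return pd.DataFrame({"barcodes": burden.index, "cnv_burden": burden.to_numpy()})


burden = cnv_burden_cell(res)
```

Here cellA has two altered bins out of four and cellB has one, giving burdens of 0.5 and 0.25.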