pyphotomol package#
PyPhotoMol: A Python package for mass photometry data analysis.
PyPhotoMol provides a comprehensive suite of tools for analyzing mass photometry data, including data import, histogram analysis, peak detection, Gaussian fitting, and comprehensive operation logging.
Main Classes#
PyPhotoMol : Main class for single-dataset analysis
MPAnalyzer : Class for batch processing multiple datasets
Key Features#
Import from HDF5 and CSV files
Histogram creation
Peak detection
Multi-Gaussian fitting
Mass-contrast calibration
Examples
Basic single-file analysis:
>>> from pyphotomol import PyPhotoMol
>>> model = PyPhotoMol()
>>> model.import_file('data.h5')
>>> model.create_histogram(use_masses=True, window=[0, 1000], bin_width=20)
>>> model.guess_peaks(min_height=5)
>>> model.fit_histogram(
...     peaks_guess=model.peaks_guess,
...     mean_tolerance=200,
...     std_tolerance=300
... )
>>> model.create_fit_table()
Batch processing:
>>> from pyphotomol import MPAnalyzer
>>> batch = MPAnalyzer()
>>> batch.import_files(['file1.h5', 'file2.h5', 'file3.h5'])
>>> batch.apply_to_all('count_binding_events')
>>> batch.apply_to_all('create_histogram', use_masses=True, window=[0, 1000], bin_width=20)
- class pyphotomol.PyPhotoMol[source]#
Bases:
object
Main class for analyzing mass photometry data.
The PyPhotoMol class provides a comprehensive suite of tools for importing, analyzing, and visualizing mass photometry data. It supports data import from HDF5 and CSV files, histogram creation and analysis, peak detection, Gaussian fitting, and mass-contrast calibration.
All operations are automatically logged to a comprehensive logbook that tracks parameters, results, and any errors encountered during analysis.
- contrasts#
Array of contrast values from imported data
- Type:
np.ndarray or None
- masses#
Array of mass values (in kDa) from imported data or converted from contrasts
- Type:
np.ndarray or None
- histogram_centers#
Bin centers for created histograms
- Type:
np.ndarray or None
- hist_counts#
Count values for histogram bins
- Type:
np.ndarray or None
- hist_nbins#
Number of bins in the histogram
- Type:
int or None
- hist_window#
[min, max] window used for histogram creation
- Type:
list or None
- bin_width#
Width of histogram bins
- Type:
float or None
- hist_data_type#
Type of data used for histogram (‘masses’ or ‘contrasts’)
- Type:
str or None
- peaks_guess#
Positions of detected peaks in the histogram
- Type:
np.ndarray or None
- nbinding#
Number of binding events detected
- Type:
int or None
- nunbinding#
Number of unbinding events detected
- Type:
int or None
- fitted_params#
Parameters from Multi-Gaussian fitting
- Type:
np.ndarray or None
- fitted_data#
Fitted curve data points
- Type:
np.ndarray or None
- fitted_params_errors#
Error estimates for fitted parameters
- Type:
np.ndarray or None
- masses_fitted#
Mass values corresponding to fitted peaks
- Type:
np.ndarray or None
- baseline#
Baseline value used for fitting operations (default: 0)
- Type:
float
- fit_table#
Summary table of fitting results
- Type:
pd.DataFrame or None
- calibration_dic#
Dictionary containing mass-contrast calibration parameters
- Type:
dict or None
- logbook#
List of all operations performed, with timestamps and parameters
- Type:
list
Examples
Basic workflow for mass photometry analysis:
>>> model = PyPhotoMol()
>>> model.import_file('data.h5')
>>> model.count_binding_events()
>>> model.create_histogram(use_masses=True, window=[0, 1000], bin_width=10)
>>> model.guess_peaks()
>>> model.fit_histogram(peaks_guess=model.peaks_guess, mean_tolerance=200, std_tolerance=300)
>>> model.print_logbook_summary()
- __init__()[source]#
Initialize a new PyPhotoMol instance.
Creates an empty instance with all data properties set to None and initializes an empty logbook for operation tracking.
- calibrate(calibration_standards)[source]#
Obtain a linear calibration of the form contrast = slope * mass + intercept.
- Parameters:
calibration_standards (list) – List with the known masses
- contrasts_to_masses(slope=1.0, intercept=0.0)[source]#
Convert contrasts to masses using a linear transformation. We assume a calibration was done using contrast = slope * mass + intercept.
- Parameters:
slope (float, default 1.0) – Slope of the linear transformation.
intercept (float, default 0.0) – Intercept of the linear transformation.
- create_histogram(use_masses=True, window=[0, 2000], bin_width=10)[source]#
Create a histogram from imported contrast or mass data.
This method generates a histogram from the imported data, which is essential for subsequent peak detection and fitting operations. The histogram parameters can be customized for different types of analysis.
- Parameters:
use_masses (bool, default True) – If True, create histogram from mass data (requires masses to be available). If False, create histogram from contrast data.
window (list of two floats, default [0, 2000]) – Range for the histogram as [min, max]. Units depend on data type:
- For masses: typically [0, 2000] kDa
- For contrasts: typically [-1, 0] (e.g., [-0.8, -0.2])
bin_width (float, default 10) – Width of histogram bins. Units depend on data type:
- For masses: typically 10-50 kDa
- For contrasts: typically 0.0004-0.001
- Raises:
AttributeError – If no data has been imported yet
ValueError – If use_masses=True but no mass data is available
Notes
After histogram creation, the following attributes are populated:
- self.histogram_centers : Bin center positions
- self.hist_counts : Count values for each bin
- self.hist_nbins : Number of bins created
- self.hist_window : Window used for histogram
- self.bin_width : Bin width used
- self.hist_data_type : Data type used (‘masses’ or ‘contrasts’)
Examples
Create mass histogram for protein analysis:
>>> model.create_histogram(use_masses=True, window=[0, 1000], bin_width=20)
Create contrast histogram for calibration:
>>> model.create_histogram(use_masses=False, window=[-0.8, -0.2], bin_width=0.0004)
High-resolution histogram for detailed analysis:
>>> model.create_histogram(use_masses=True, window=[50, 200], bin_width=5)
- fit_histogram(peaks_guess, mean_tolerance=None, std_tolerance=None, threshold=None, baseline=0.0, fit_baseline=False)[source]#
Fit the histogram data to the guessed peaks. We use a multi-Gaussian fit to the histogram data.
The data type (masses or contrasts) is automatically detected from the histogram that was previously created using create_histogram().
- Parameters:
peaks_guess (list) – List of guessed peaks.
mean_tolerance (float) – Tolerance for the mean of the Gaussian fit. If None, it will be inferred from the peaks guesses.
std_tolerance (float) – Tolerance for the standard deviation of the Gaussian fit. If None, it will be inferred from the peaks guesses.
threshold (float, optional) – For masses: minimum value that can be observed (in kDa units). Default is 40. For contrasts: maximum value that can be observed (should be negative). Default is -0.0024. If None, defaults are applied based on detected data type.
baseline (float, default 0.0) – Baseline value to be subtracted from the fit.
fit_baseline (bool, default False) – Whether to fit a baseline to the histogram. If True, a baseline will be included in the fit and the ‘baseline’ argument will be ignored.
Examples
Fit histogram after creating it:
>>> model.create_histogram(use_masses=True, window=[0, 2000], bin_width=10)
>>> model.guess_peaks()
>>> model.fit_histogram(model.peaks_guess, mean_tolerance=100, std_tolerance=200)
- get_logbook(as_dataframe=True, save_to_file=None)[source]#
Retrieve the logbook of all operations performed on this instance.
The logbook contains a complete history of all method calls, including parameters used, results obtained, timestamps, and any errors encountered. This provides full traceability of the analysis workflow.
- Parameters:
as_dataframe (bool, default True) – If True, return logbook as a pandas DataFrame for easy analysis. If False, return as a list of dictionaries.
save_to_file (str, optional) – If provided, save the logbook to this file path as JSON format.
- Returns:
Logbook entries containing operation history. DataFrame columns include:
- timestamp: ISO format timestamp of when operation was performed
- method: Name of the method that was called
- parameters: Dictionary of parameters passed to the method
- result_summary: Summary of results produced (for successful operations)
- notes: Additional notes about the operation
- success: Boolean indicating if operation completed successfully
- error: Error message (only present for failed operations)
- Return type:
pandas.DataFrame or list
Examples
Get logbook as DataFrame for analysis:
>>> model = PyPhotoMol()
>>> model.import_file('data.h5')
>>> logbook_df = model.get_logbook()
>>> print(logbook_df[['timestamp', 'method', 'success']])
Save logbook to file:
>>> model.get_logbook(save_to_file='analysis_log.json')
Get raw logbook data:
>>> raw_logbook = model.get_logbook(as_dataframe=False)
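When the logbook is returned as a list of dictionaries, it can be filtered and serialized with the standard library alone. The following is a minimal sketch that builds stand-in entries using the field names listed above; the helper `make_entry` and the example values are hypothetical and are not part of pyphotomol.

```python
import json
from datetime import datetime


def make_entry(method, parameters, success=True, error=None):
    """Build a stand-in logbook entry (hypothetical helper, not the package's)."""
    entry = {
        "timestamp": datetime.now().isoformat(),
        "method": method,
        "parameters": parameters,
        "success": success,
    }
    if error is not None:
        # The 'error' field is only present for failed operations
        entry["error"] = error
    return entry


# Stand-in for what get_logbook(as_dataframe=False) is described as returning
logbook = [
    make_entry("import_file", {"file_path": "data.h5"}),
    make_entry("create_histogram", {"bin_width": 10}, success=False, error="no data"),
]

# Names of the operations that failed
failed = [entry["method"] for entry in logbook if not entry["success"]]

# JSON text comparable in spirit to what save_to_file would write
serialized = json.dumps(logbook, indent=2)
```

Filtering on the `success` field is a quick way to spot failed steps in a long analysis session.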
- guess_peaks(min_height=10, min_distance=4, prominence=4)[source]#
Guess peaks in the histogram data.
The arguments are adjusted according to the region of the histogram. For example, for mass data the given distance is used unchanged between 0 and 650 kDa, multiplied by a factor of 3 between 650 and 1500 kDa, and multiplied by a factor of 8 above 1500 kDa. See the guess_peaks function in utils.helpers for more details.
- Example of min_height, min_distance and prominence for contrasts:
min_height=10, min_distance=4, prominence=4
- Parameters:
min_height (int, default 10) – Minimum height of the peaks.
min_distance (int, default 4) – Minimum distance between peaks.
prominence (int, default 4) – Minimum prominence of the peaks.
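The region-dependent distance described above can be sketched as a small lookup. This is an illustration only: the function name `scaled_distance_sketch` is hypothetical, and which region the boundary values 650 and 1500 kDa fall into is an assumption the documentation does not settle.

```python
def scaled_distance_sketch(mass_kda, min_distance=4):
    """Return the minimum peak distance for a given mass region.

    Factors (1, 3, 8) follow the description above; the treatment of the
    exact boundary values is an assumption made for this sketch.
    """
    if mass_kda < 650:
        return min_distance
    if mass_kda < 1500:
        return min_distance * 3
    return min_distance * 8


low = scaled_distance_sketch(100)    # region 0-650 kDa
mid = scaled_distance_sketch(1000)   # region 650-1500 kDa
high = scaled_distance_sketch(2000)  # region above 1500 kDa
```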
- import_file(file_path)[source]#
Import mass photometry data from HDF5 or CSV files.
This method loads contrast and mass data from supported file formats. NaN values are automatically removed from the imported data. The import operation is automatically logged with file information and data statistics.
- Parameters:
file_path (str) – Path to the data file. Supported formats are:
- ‘.h5’ : HDF5 files with standard mass photometry structure
- ‘.csv’ : CSV files with contrast and mass columns
- Raises:
ValueError – If the file format is not supported (not .h5 or .csv)
FileNotFoundError – If the specified file does not exist
KeyError – If required data columns are missing from the file
Notes
After import, the following attributes are populated:
- self.contrasts : Array of contrast values with NaN removed
- self.masses : Array of mass values with NaN removed (if available)
The logbook will record:
- File path and type
- Number of data points imported
- Range of contrast and mass values
Examples
Import HDF5 data:
>>> model = PyPhotoMol()
>>> model.import_file('experiment_data.h5')
>>> print(f"Imported {len(model.contrasts)} contrast measurements")
Import CSV data:
>>> model.import_file('processed_data.csv')
>>> print(f"Mass range: {model.masses.min():.1f} - {model.masses.max():.1f} kDa")
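The two behaviours documented for import, rejecting unsupported extensions and dropping NaN values, can be sketched in plain Python. The helper names below are hypothetical, and the case-insensitive extension check is an assumption of this sketch rather than documented behaviour.

```python
import math
import os


def file_kind_sketch(file_path):
    """Return the file extension if it is supported, else raise ValueError."""
    ext = os.path.splitext(file_path)[1].lower()  # lower-casing is an assumption
    if ext not in (".h5", ".csv"):
        raise ValueError(f"Unsupported file format: {ext!r}")
    return ext


def drop_nan_sketch(values):
    """Return the values with NaN entries removed."""
    return [v for v in values if not math.isnan(v)]


kind = file_kind_sketch("experiment_data.h5")
clean = drop_nan_sketch([-0.01, float("nan"), -0.02])
```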
- class pyphotomol.MPAnalyzer[source]#
Bases:
object
A class to handle multiple PyPhotoMol instances. This is useful for batch processing of multiple files.
- apply_to_all(method_name, *args, names=None, **kwargs)[source]#
Apply a method to all or selected PyPhotoMol instances.
- Parameters:
method_name (str) – Name of the method to apply to instances
*args (tuple) – Positional arguments to pass to the method
names (list or str, optional) – Names of specific models to apply method to. If None (default), applies to all models.
**kwargs (dict) – Keyword arguments to pass to the method
Examples
Count binding events for all models:
>>> pms.apply_to_all('count_binding_events')
Create histograms for specific models only:
>>> pms.apply_to_all('create_histogram', names=['model1', 'model2'],
...                  use_masses=True, window=[0, 2000], bin_width=10)
Guess peaks for a single model:
>>> pms.apply_to_all('guess_peaks', names='model1',
...                  min_height=10, min_distance=4, prominence=4)
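Dispatching a method by name to several objects is commonly done with `getattr`. The sketch below shows that pattern under the assumption that models are kept in a dictionary keyed by name; the `_Model` class and `apply_to_all_sketch` function are hypothetical stand-ins, not the MPAnalyzer implementation.

```python
class _Model:
    """Tiny stand-in for a PyPhotoMol instance that records calls."""

    def __init__(self):
        self.calls = []

    def create_histogram(self, **kwargs):
        self.calls.append(kwargs)


def apply_to_all_sketch(models, method_name, *args, names=None, **kwargs):
    """Call method_name on all models, or only on those listed in names."""
    if isinstance(names, str):
        names = [names]  # accept a single name as well as a list
    selected = models if names is None else {n: models[n] for n in names}
    for model in selected.values():
        getattr(model, method_name)(*args, **kwargs)
    return list(selected)


models = {"model1": _Model(), "model2": _Model()}
touched = apply_to_all_sketch(models, "create_histogram", names="model1", bin_width=10)
```

Keeping `names` keyword-only, as in the documented signature, avoids confusing it with positional arguments meant for the target method.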
- create_plotting_config(repeat_colors=True)[source]#
Create configuration dataframes for plotting multiple PhotoMol models.
- Parameters:
repeat_colors (bool, default True) – If True, repeat the same color scheme for each model’s peaks. If False, use sequential colors across all peaks from all models.
- Returns:
tuple –
legends_df: DataFrame with legends, colors, and selection flags for Gaussian traces
colors_hist_df: DataFrame with histogram colors for each model
- Return type:
(legends_df, colors_hist_df)
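The difference between repeating and sequential colors can be illustrated with plain lists. This sketch assumes a fixed palette that wraps around when exhausted; the function name, the palette, and the wrapping rule are all hypothetical and only illustrate the `repeat_colors` option as described.

```python
def assign_colors_sketch(peaks_per_model, palette, repeat_colors=True):
    """Return one list of colors per model.

    repeat_colors=True restarts the palette for every model;
    repeat_colors=False continues through the palette across models.
    """
    colors, offset = [], 0
    for n_peaks in peaks_per_model:
        start = 0 if repeat_colors else offset
        colors.append([palette[(start + i) % len(palette)] for i in range(n_peaks)])
        offset += n_peaks
    return colors


palette = ["red", "green", "blue", "black"]
repeated = assign_colors_sketch([2, 2], palette, repeat_colors=True)
sequential = assign_colors_sketch([2, 2], palette, repeat_colors=False)
```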
- get_all_logbooks(save_to_file=None)[source]#
Get combined logbooks from all models plus batch operations.
- Parameters:
save_to_file (str, optional) – Optional file path to save combined logbook as JSON
- Returns:
Combined logbooks with batch and individual model logs
- Return type:
dict
- get_batch_logbook(as_dataframe=True, save_to_file=None)[source]#
Retrieve the batch logbook of all operations performed.
- Parameters:
as_dataframe (bool, default True) – If True, return as pandas DataFrame, else as list of dicts
save_to_file (str, optional) – Optional file path to save logbook as JSON
- Returns:
Batch logbook entries
- Return type:
pandas.DataFrame or list
- get_properties(variable)[source]#
Get properties from all PyPhotoMol instances.
- Parameters:
variable (str) – The property to get from each instance.
- Returns:
List of the specified property from each instance.
- Return type:
list
Examples
Get masses from all models:
>>> masses_list = pms.get_properties('masses')
Get fit tables from all models:
>>> fit_tables = pms.get_properties('fit_table')
- pyphotomol.contrasts_to_mass(contrasts, slope, intercept)[source]#
Convert contrasts to masses using known calibration parameters.
Caution: slope and intercept describe contrast as a function of mass, that is, contrast = slope * mass + intercept.
- Parameters:
contrasts (np.ndarray) – Contrasts to convert.
slope (float) – Slope of the calibration line.
intercept (float) – Intercept of the calibration line.
- Returns:
Converted masses in kDa.
- Return type:
np.ndarray
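Because the calibration is stated as contrast = slope * mass + intercept, converting a contrast back to a mass means inverting that line: mass = (contrast - intercept) / slope. A minimal plain-Python sketch follows; the function name and the example calibration values are made up for illustration.

```python
def contrasts_to_mass_sketch(contrasts, slope, intercept):
    """Invert contrast = slope * mass + intercept for each contrast value."""
    if slope == 0:
        raise ValueError("slope must be non-zero to invert the calibration")
    return [(c - intercept) / slope for c in contrasts]


# Hypothetical calibration: each kDa contributes -0.0001 contrast units
masses = contrasts_to_mass_sketch([-0.01, -0.02], slope=-0.0001, intercept=0.0)
```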
- pyphotomol.create_histogram(vector, window=[0, 2000], bin_width=10)[source]#
Create a histogram of the provided vector within a specified window and bin width.
- Parameters:
vector (np.ndarray) – The data to create the histogram from.
window (list, default [0, 2000]) – The range of values to include in the histogram [min, max].
bin_width (float, default 10) – The width of each bin in the histogram.
- Returns:
histogram_centers (np.ndarray) – The x-coordinates of the histogram bins.
hist_counts (np.ndarray) – The counts of values in each bin (Y-axis of the histogram).
hist_nbins (int) – The number of bins in the histogram.
Examples
For contrast data:
>>> centers, counts, nbins = create_histogram(contrasts, window=[-1, 0], bin_width=0.0004)
For mass data:
>>> centers, counts, nbins = create_histogram(masses, window=[0, 2000], bin_width=10)
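To make the three return values concrete, here is a minimal plain-Python sketch of equal-width binning. It is not the package's implementation: the name `histogram_sketch`, the half-open bins, and the choice to drop values outside the window are assumptions made for this illustration.

```python
import math


def histogram_sketch(values, window=(0, 2000), bin_width=10):
    """Count values into equal-width bins; return (centers, counts, nbins)."""
    lo, hi = window
    nbins = int(round((hi - lo) / bin_width))
    counts = [0] * nbins
    for v in values:
        if math.isnan(v) or v < lo or v >= hi:
            continue  # skip NaN and values outside the window
        # Clamp the index to guard against floating-point rounding at the top edge
        idx = min(int((v - lo) / bin_width), nbins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * bin_width for i in range(nbins)]
    return centers, counts, nbins


centers, counts, nbins = histogram_sketch(
    [5, 12, 18, 95, 2500], window=(0, 100), bin_width=10
)
```

In this example the value 2500 lies outside the window and is not counted.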
- pyphotomol.guess_peaks(x, histogram_centers, height=14, distance=4, prominence=8, masses=True)[source]#
Try to find peaks in the histogram data.
Automatically finds peaks in histogram data with adaptive parameters based on the data range. For mass data, different distance thresholds are used for different mass ranges to account for peak spacing variations.
- Parameters:
x (np.ndarray) – The histogram counts data to find peaks in.
histogram_centers (np.ndarray) – The centers of the histogram bins corresponding to x values.
height (int, default 14) – Minimum height of peaks.
distance (int, default 4) – Minimum distance between peaks (will be scaled for different mass ranges).
prominence (int, default 8) – Minimum prominence of peaks.
masses (bool, default True) – If True, find peaks in mass data; if False, find peaks in contrast data.
- Returns:
Histogram centers of the found peaks.
- Return type:
np.ndarray
Examples
For contrast data:
>>> peaks = guess_peaks(hist_counts, hist_centers, height=10, distance=4, prominence=4, masses=False)
For mass data:
>>> peaks = guess_peaks(hist_counts, hist_centers, height=14, distance=4, prominence=8, masses=True)
- pyphotomol.fit_histogram(hist_counts, hist_centers, guess_positions=[66, 148, 480], mean_tolerance=None, std_tolerance=None, threshold=40, baseline=0, masses=True, fit_baseline=False)[source]#
Fit a histogram with multiple truncated gaussians.
- Parameters:
hist_counts (np.ndarray) – The counts of values in each bin of the histogram.
hist_centers (np.ndarray) – The centers of the histogram bins.
guess_positions (list, default [66,148,480]) – Initial guesses for the positions of the peaks.
mean_tolerance (int, default None) – Tolerance for the peak positions. If None, it is derived from guess_positions.
std_tolerance (int, default None) – Maximum standard deviation for the peaks. If None, it is derived from guess_positions.
threshold (int, default 40, in kDa for masses) – For masses, the minimum value that can be observed. For contrasts, the maximum value that can be observed; it should be negative.
baseline (float, default 0) – Baseline value to be added to the fit.
masses (bool, default True) – If True, the fit is for mass data; if False, it is for contrast data.
fit_baseline (bool, default False) – If True, the fit will include a baseline parameter. The ‘baseline’ argument will be ignored.
- Returns:
popt (np.ndarray) – Optimized parameters for the fit.
fit (np.ndarray) – Fitted values for the histogram. The first column is the x-coordinates, followed by the individual Gaussian fits and the total fit.
fit_error (np.ndarray) – Errors of the fitted parameters.
Examples
For contrast data:
>>> popt, fit, errors = fit_histogram(counts, centers, mean_tolerance=0.05, std_tolerance=0.1)
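The model behind a multi-Gaussian fit is a sum of Gaussian components plus a baseline. The sketch below only evaluates such a model at one point; it does not perform the fit, it does not model the truncation at the threshold, and the parameter ordering `(mean, std, amplitude)` is an assumption that may differ from the layout of `popt`.

```python
import math


def multi_gaussian_sketch(x, params, baseline=0.0):
    """Evaluate baseline + sum of Gaussians at x.

    params is a list of (mean, std, amplitude) tuples; this ordering is an
    assumption of the sketch, not the package's parameter layout.
    """
    total = baseline
    for mean, std, amplitude in params:
        total += amplitude * math.exp(-0.5 * ((x - mean) / std) ** 2)
    return total


# Two hypothetical components centred at 66 and 480 kDa
components = [(66, 10, 50), (480, 30, 20)]
at_first_peak = multi_gaussian_sketch(66, components)
with_baseline = multi_gaussian_sketch(66, [(66, 10, 50)], baseline=2.0)
```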
- pyphotomol.create_fit_table(popt, popt_error, fit, n_binding, n_unbinding, hist_centers, masses=True, include_errors=True)[source]#
Generate a pandas DataFrame that summarizes fit results
- Parameters:
popt (np.ndarray) – Optimized parameters from the fit.
popt_error (np.ndarray) – Errors of the fitted parameters.
fit (np.ndarray) – Fitted values for the histogram.
n_binding (int) – Number of binding events.
n_unbinding (int) – Number of unbinding events.
hist_centers (np.ndarray) – The centers of the histogram bins.
masses (bool, default True) – If True, the fit is for mass data; if False, it is for contrast data.
include_errors (bool, default True) – If True, include errors in the fit table.
- Returns:
fit_table – DataFrame containing the fit results.
- Return type:
pd.DataFrame
- pyphotomol.calibrate(calib_floats, fit_table)[source]#
Calibration based on contrasts histogram
- Parameters:
calib_floats (list) – List of calibration standards in kDa (e.g. [66, 146, 480]).
fit_table (pd.DataFrame) – DataFrame containing the fit results. Created by create_fit_table.
- Returns:
calibration_dic – Dictionary containing the calibration results:
- ‘standards’: Calibration standards used.
- ‘exp_points’: Expected points from the fit.
- ‘fit_params’: Parameters of the fit.
- ‘fit_r2’: R-squared value of the fit.
- Return type:
dict
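A linear calibration of this kind amounts to an ordinary least-squares line through (mass, contrast) pairs. The following plain-Python sketch shows the arithmetic and returns a dictionary whose keys mirror the ones listed above. The function name, the tuple layout of `fit_params`, and the example contrast values are assumptions for illustration, not the package's code or real measurements.

```python
def linear_calibration_sketch(masses, contrasts):
    """Least-squares fit of contrast = slope * mass + intercept, with R^2."""
    n = len(masses)
    mean_m = sum(masses) / n
    mean_c = sum(contrasts) / n
    sxx = sum((m - mean_m) ** 2 for m in masses)
    sxy = sum((m - mean_m) * (c - mean_c) for m, c in zip(masses, contrasts))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_m
    # Coefficient of determination from residual and total sums of squares
    ss_res = sum((c - (slope * m + intercept)) ** 2 for m, c in zip(masses, contrasts))
    ss_tot = sum((c - mean_c) ** 2 for c in contrasts)
    return {
        "standards": list(masses),
        "exp_points": list(contrasts),
        "fit_params": (slope, intercept),  # tuple layout is an assumption
        "fit_r2": 1 - ss_res / ss_tot,
    }


# Made-up contrasts lying exactly on contrast = -0.0001 * mass
standards = [66, 146, 480]
calibration = linear_calibration_sketch(standards, [-0.0066, -0.0146, -0.048])
```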
- pyphotomol.import_file_h5(filename)[source]#
Import mass photometry data from HDF5 files generated by Refeyn instruments.
This function reads contrast and mass data from standard Refeyn HDF5 file formats. It automatically handles different data structures and performs calibration conversions when necessary. NaN values are filtered out automatically.
- Parameters:
filename (str) – Path to the HDF5 file to import
- Returns:
contrasts (np.ndarray) – Array of contrast values with NaN values removed
masses_kDa (np.ndarray or None) – Array of mass values in kDa units, or None if not available. If masses are not directly available but calibration data exists, they will be computed from contrasts using the calibration parameters.
Notes
The function searches for data in the following order:
1. Direct ‘contrasts’ and ‘masses_kDa’ datasets
2. ‘calibrated_values’ with calibration parameters
3. ‘per_movie_events’ for movie-based data
- Raises:
FileNotFoundError – If the specified file does not exist
KeyError – If required datasets are not found in the HDF5 file
ValueError – If the file format is not recognized or data is corrupted
Examples
Import standard Refeyn data:
>>> contrasts, masses = import_file_h5('experiment.h5')
>>> print(f"Loaded {len(contrasts)} events")
>>> if masses is not None:
...     print(f"Mass range: {masses.min():.1f} - {masses.max():.1f} kDa")
- pyphotomol.import_csv(filename)[source]#
Import data from a CSV file generated by a Refeyn instrument.
The CSV file should contain columns ‘contrasts’ and optionally ‘masses_kDa’. NaN values are automatically filtered out from the imported data.
- Parameters:
filename (str) – Path to the CSV file to import
- Returns:
contrasts (np.ndarray) – Array of contrast values with NaN values removed
masses_kDa (np.ndarray or None) – Array of mass values in kDa, or None if not available in the CSV file
- Raises:
FileNotFoundError – If the specified file does not exist
KeyError – If the required ‘contrasts’ column is not found in the CSV file
ValueError – If the file format is not recognized or data is corrupted
Examples
Import CSV data with contrasts only:
>>> contrasts, masses = import_csv('contrasts_only.csv')
>>> print(f"Loaded {len(contrasts)} contrast measurements")
>>> print(f"Masses available: {masses is not None}")
Import CSV data with both contrasts and masses:
>>> contrasts, masses = import_csv('full_data.csv')
>>> if masses is not None:
...     print(f"Mass range: {masses.min():.1f} - {masses.max():.1f} kDa")
- pyphotomol.plot_histograms_and_fits(analyzer, legends_df=None, colors_hist=None, plot_config: PlotConfig = None, axis_config: AxisConfig = None, layout_config: LayoutConfig = None, legend_config: LegendConfig = None)[source]#
Create a comprehensive plot of PhotoMol fit data with histograms and Gaussian traces.
- Parameters:
analyzer (pyphotomol.MPAnalyzer or pyphotomol.PyPhotoMol) – MPAnalyzer instance containing multiple PyPhotoMol models, or a single PyPhotoMol instance.
legends_df (pd.DataFrame, optional) – DataFrame containing legends, colors, and selections with columns [‘legends’, ‘color’, ‘select’, ‘show_legend’]. This DataFrame affects the fitted curves only, not the histograms.
colors_hist (list, str, or pd.DataFrame, optional) – List of colors for histograms (one per model). If a string, it is used for all histograms. If a DataFrame, it should have a column ‘color’ with hex color codes.
plot_config (PlotConfig, optional) – General plot configuration (dimensions, format, contrasts, etc.)
axis_config (AxisConfig, optional) – Axis styling configuration (grid, line widths, etc.)
layout_config (LayoutConfig, optional) – Layout configuration (stacked, spacing, etc.)
legend_config (LegendConfig, optional) – Legend and labeling configuration
- Returns:
Configured plotly figure object
- Return type:
go.Figure
Examples
Simple plot with default settings:
>>> fig = plot_histograms_and_fits(analyzer, colors_hist=['blue', 'red'])
>>> fig.show()
Customized plot with configuration objects:
>>> plot_config = PlotConfig(plot_width=800, contrasts=True, x_range=[0, 500])
>>> layout_config = LayoutConfig(stacked=True, vertical_spacing=0.05)
>>> fig = plot_histograms_and_fits(analyzer, plot_config=plot_config,
...                                layout_config=layout_config)
>>> fig.show()
Plot with custom x-axis limits:
>>> plot_config = PlotConfig(x_range=[100, 800])  # Zoom to 100-800 kDa range
>>> fig = plot_histograms_and_fits(analyzer, plot_config=plot_config)
>>> fig.show()
- pyphotomol.plot_histogram(analyzer, colors_hist=None, plot_config: PlotConfig = None, axis_config: AxisConfig = None, layout_config: LayoutConfig = None)[source]#
Create a plot with only histograms from PhotoMol data (wrapper around plot_histograms_and_fits).
This function is a simplified wrapper that creates histogram-only plots without requiring fitted data or legend configuration.
- Parameters:
analyzer (pyphotomol.MPAnalyzer or pyphotomol.PyPhotoMol) – MPAnalyzer instance containing multiple PyPhotoMol models or a single PyPhotoMol instance
colors_hist (list, optional) – List of colors for histograms (one per model)
plot_config (PlotConfig, optional) – General plot configuration (dimensions, format, contrasts, etc.)
axis_config (AxisConfig, optional) – Axis styling configuration (grid, line widths, etc.)
layout_config (LayoutConfig, optional) – Layout configuration (stacked, spacing, etc.)
- Returns:
Configured plotly figure object with histograms only
- Return type:
go.Figure
Examples
Create a simple histogram plot:
>>> fig = plot_histogram(analyzer, ['#FF5733', '#33C3FF'])
>>> fig.show()
Create stacked normalized histograms:
>>> plot_config = PlotConfig(normalize=True)
>>> layout_config = LayoutConfig(stacked=True)
>>> fig = plot_histogram(analyzer, ['blue', 'red'],
...                      plot_config=plot_config, layout_config=layout_config)
>>> fig.show()
- pyphotomol.config_fig(fig, plot_width=800, plot_height=600, plot_type='png', plot_title_for_download='plot')[source]#
Configure plotly figure with download options and toolbar settings.
- Parameters:
fig (go.Figure) – Plotly figure object
plot_width (int, default 800) – Width of the plot in pixels
plot_height (int, default 600) – Height of the plot in pixels
plot_type (str, default "png") – Format for downloading the plot (e.g., “png”, “jpeg”)
plot_title_for_download (str, default "plot") – Title for the downloaded plot file
- Returns:
Configured plotly figure
- Return type:
go.Figure
- pyphotomol.plot_calibration(mass, contrast, slope, intercept, plot_config: PlotConfig = None, axis_config: AxisConfig = None)[source]#
Create a scatter plot of mass vs contrast with calibration line.
This function creates a visualization showing the relationship between mass and ratiometric contrast, with a fitted calibration line overlaid. This is useful for visualizing calibration quality and outliers.
- Parameters:
mass (array-like) – Array of mass values in kDa
contrast (array-like) – Array of corresponding ratiometric contrast values
slope (float) – Slope of the calibration line (contrast = slope * mass + intercept)
intercept (float) – Intercept of the calibration line
plot_config (PlotConfig, optional) – General plot configuration (dimensions, format, axis size, etc.)
axis_config (AxisConfig, optional) – Axis styling configuration (grid, line widths, etc.)
- Returns:
Plotly figure object containing the mass vs contrast calibration plot
- Return type:
go.Figure
Examples
Plot mass vs contrast calibration:
>>> import numpy as np
>>> from pyphotomol.utils.plotting import plot_calibration, PlotConfig, AxisConfig
>>>
>>> # Simulated calibration data
>>> mass = np.array([66, 146, 480])
>>> contrast = np.array([-0.1, -0.2, -0.5])
>>> slope = -0.001
>>> intercept = 0.02
>>>
>>> # Simple plot with defaults
>>> fig = plot_calibration(mass, contrast, slope, intercept)
>>> fig.show()
>>>
>>> # Customized plot
>>> plot_config = PlotConfig(plot_width=600, plot_height=400, font_size=12)
>>> axis_config = AxisConfig(showgrid_x=False, n_y_axis_ticks=6)
>>> fig = plot_calibration(mass, contrast, slope, intercept,
...                        plot_config=plot_config, axis_config=axis_config)
>>> fig.show()
- class pyphotomol.PlotConfig(plot_width: int = 1000, plot_height: int = 400, plot_type: str = 'png', font_size: int = 14, normalize: bool = False, contrasts: bool = False, cst_factor_for_contrast: float = 1, x_range: List[float] | None = None)[source]#
Bases:
object
General plot configuration
- __init__(plot_width: int = 1000, plot_height: int = 400, plot_type: str = 'png', font_size: int = 14, normalize: bool = False, contrasts: bool = False, cst_factor_for_contrast: float = 1, x_range: List[float] | None = None) None #
- contrasts: bool = False#
- cst_factor_for_contrast: float = 1#
- font_size: int = 14#
- normalize: bool = False#
- plot_height: int = 400#
- plot_type: str = 'png'#
- plot_width: int = 1000#
- x_range: List[float] | None = None#
- class pyphotomol.AxisConfig(showgrid_x: bool = True, showgrid_y: bool = True, n_y_axis_ticks: int = 3, axis_linewidth: int = 1, axis_tickwidth: int = 1, axis_gridwidth: int = 1)[source]#
Bases:
object
Axis styling configuration
- __init__(showgrid_x: bool = True, showgrid_y: bool = True, n_y_axis_ticks: int = 3, axis_linewidth: int = 1, axis_tickwidth: int = 1, axis_gridwidth: int = 1) None #
- axis_gridwidth: int = 1#
- axis_linewidth: int = 1#
- axis_tickwidth: int = 1#
- n_y_axis_ticks: int = 3#
- showgrid_x: bool = True#
- showgrid_y: bool = True#
- class pyphotomol.LayoutConfig(stacked: bool = False, show_subplot_titles: bool = False, vertical_spacing: float = 0.1, shared_yaxes: bool = True, extra_padding_y_label: float = 0)[source]#
Bases:
object
Layout and spacing configuration
- __init__(stacked: bool = False, show_subplot_titles: bool = False, vertical_spacing: float = 0.1, shared_yaxes: bool = True, extra_padding_y_label: float = 0) None #
- extra_padding_y_label: float = 0#
- show_subplot_titles: bool = False#
- stacked: bool = False#
- vertical_spacing: float = 0.1#
- class pyphotomol.LegendConfig(add_masses_to_legend: bool = True, add_percentage_to_legend: bool = False, add_labels: bool = True, add_percentages: bool = True, line_width: int = 3)[source]#
Bases:
object
Legend and labeling configuration
- __init__(add_masses_to_legend: bool = True, add_percentage_to_legend: bool = False, add_labels: bool = True, add_percentages: bool = True, line_width: int = 3) None #
- add_labels: bool = True#
- add_masses_to_legend: bool = True#
- add_percentage_to_legend: bool = False#
- add_percentages: bool = True#
- line_width: int = 3#
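The four configuration classes above are plain containers of typed fields with defaults, a shape that Python's `dataclasses` module produces directly. Whether pyphotomol defines them as dataclasses is an assumption here; the sketch below uses a hypothetical `PlotConfigSketch` with a subset of the documented PlotConfig fields to show how such an object can be created and copied with one field changed.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class PlotConfigSketch:
    """Hypothetical stand-in with a subset of the documented PlotConfig fields."""

    plot_width: int = 1000
    plot_height: int = 400
    plot_type: str = "png"
    x_range: Optional[List[float]] = None


base = PlotConfigSketch()
# Copy the defaults while overriding a single field
zoomed = replace(base, x_range=[100, 800])
```

Grouping options into small configuration objects keeps plotting function signatures short and lets one configuration be reused across several figures.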
Subpackages#
- pyphotomol.utils package
- Submodules
- pyphotomol.utils.data_import module
- pyphotomol.utils.helpers module
- pyphotomol.utils.palette module
- pyphotomol.utils.plotting module
Submodules#
pyphotomol.main module#
- pyphotomol.main.log_method(func)[source]#
Decorator to automatically handle errors and logging.
This decorator will:
1. Handle exceptions by logging them to the logbook
2. Re-raise exceptions after logging
Note: The actual success logging is handled by each method individually
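A decorator with this behaviour (log the failure, then re-raise) can be sketched as follows. This is an illustration of the pattern, not the package's source: the name `log_method_sketch`, the `Demo` class, and the exact fields written to the logbook are assumptions.

```python
import functools
from datetime import datetime


def log_method_sketch(func):
    """Record failures in self.logbook, then re-raise the exception."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception as exc:
            self.logbook.append({
                "timestamp": datetime.now().isoformat(),
                "method": func.__name__,
                "success": False,
                "error": str(exc),
            })
            raise  # the caller still sees the original exception

    return wrapper


class Demo:
    """Hypothetical class with a logbook list, used only for this sketch."""

    def __init__(self):
        self.logbook = []

    @log_method_sketch
    def fail(self):
        raise ValueError("no data imported")


demo = Demo()
try:
    demo.fail()
except ValueError:
    reraised = True
else:
    reraised = False
```

Using a bare `raise` inside the `except` block preserves the original traceback, so logging does not hide where the error occurred.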
- pyphotomol.main.log_batch_method(func)[source]#
Decorator to automatically handle errors and logging for MPAnalyzer methods.
- class pyphotomol.main.PyPhotoMol[source]#
Bases:
object
Main class for analyzing mass photometry data.
The PyPhotoMol class provides a comprehensive suite of tools for importing, analyzing, and visualizing mass photometry data. It supports data import from HDF5 and CSV files, histogram creation and analysis, peak detection, Gaussian fitting, and mass-contrast calibration.
All operations are automatically logged to a comprehensive logbook that tracks parameters, results, and any errors encountered during analysis.
- contrasts#
Array of contrast values from imported data
- Type:
np.ndarray or None
- masses#
Array of mass values (in kDa) from imported data or converted from contrasts
- Type:
np.ndarray or None
- histogram_centers#
Bin centers for created histograms
- Type:
np.ndarray or None
- hist_counts#
Count values for histogram bins
- Type:
np.ndarray or None
- hist_nbins#
Number of bins in the histogram
- Type:
int or None
- hist_window#
[min, max] window used for histogram creation
- Type:
list or None
- bin_width#
Width of histogram bins
- Type:
float or None
- hist_data_type#
Type of data used for histogram (‘masses’ or ‘contrasts’)
- Type:
str or None
- peaks_guess#
Positions of detected peaks in the histogram
- Type:
np.ndarray or None
- nbinding#
Number of binding events detected
- Type:
int or None
- nunbinding#
Number of unbinding events detected
- Type:
int or None
- fitted_params#
Parameters from Multi-Gaussian fitting
- Type:
np.ndarray or None
- fitted_data#
Fitted curve data points
- Type:
np.ndarray or None
- fitted_params_errors#
Error estimates for fitted parameters
- Type:
np.ndarray or None
- masses_fitted#
Mass values corresponding to fitted peaks
- Type:
np.ndarray or None
- baseline#
Baseline value used for fitting operations (default: 0)
- Type:
float
- fit_table#
Summary table of fitting results
- Type:
pd.DataFrame or None
- calibration_dic#
Dictionary containing mass-contrast calibration parameters
- Type:
dict or None
- logbook#
List of all operations performed, with timestamps and parameters
- Type:
list
Examples
Basic workflow for mass photometry analysis:
>>> from pyphotomol import PyPhotoMol
>>> model = PyPhotoMol()
>>> model.import_file('data.h5')
>>> model.count_binding_events()
>>> model.create_histogram(use_masses=True, window=[0, 1000], bin_width=10)
>>> model.guess_peaks()
>>> model.fit_histogram(peaks_guess=model.peaks_guess, mean_tolerance=200, std_tolerance=300)
>>> model.print_logbook_summary()
- __init__()[source]#
Initialize a new PyPhotoMol instance.
Creates an empty instance with all data properties set to None and initializes an empty logbook for operation tracking.
- get_logbook(as_dataframe=True, save_to_file=None)[source]#
Retrieve the logbook of all operations performed on this instance.
The logbook contains a complete history of all method calls, including parameters used, results obtained, timestamps, and any errors encountered. This provides full traceability of the analysis workflow.
- Parameters:
as_dataframe (bool, default True) – If True, return logbook as a pandas DataFrame for easy analysis. If False, return as a list of dictionaries.
save_to_file (str, optional) – If provided, save the logbook to this file path as JSON format.
- Returns:
Logbook entries containing operation history. DataFrame columns include:
- timestamp: ISO format timestamp of when operation was performed
- method: Name of the method that was called
- parameters: Dictionary of parameters passed to the method
- result_summary: Summary of results produced (for successful operations)
- notes: Additional notes about the operation
- success: Boolean indicating if operation completed successfully
- error: Error message (only present for failed operations)
- Return type:
pandas.DataFrame or list
Examples
Get logbook as DataFrame for analysis:
>>> model = PyPhotoMol()
>>> model.import_file('data.h5')
>>> logbook_df = model.get_logbook()
>>> print(logbook_df[['timestamp', 'method', 'success']])
Save logbook to file:
>>> model.get_logbook(save_to_file='analysis_log.json')
Get raw logbook data:
>>> raw_logbook = model.get_logbook(as_dataframe=False)
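To make the list-of-dictionaries form concrete, the sketch below builds a small made-up logbook, writes it as JSON, reads it back, and filters the failed operations. The entries and the error message are invented for illustration, and the exact on-disk layout written by `get_logbook` is an assumption.

```python
import json
import os
import tempfile
from datetime import datetime

# Hypothetical logbook: a list of dictionaries, one per operation.
logbook = [
    {
        "timestamp": datetime.now().isoformat(),
        "method": "import_file",
        "parameters": {"file_path": "data.h5"},
        "success": True,
    },
    {
        "timestamp": datetime.now().isoformat(),
        "method": "create_histogram",
        "parameters": {"use_masses": True, "window": [0, 1000], "bin_width": 20},
        "success": False,
        "error": "No mass data available",  # made-up message
    },
]

# Write the entries as JSON, then read them back.
path = os.path.join(tempfile.mkdtemp(), "analysis_log.json")
with open(path, "w") as handle:
    json.dump(logbook, handle, indent=2)

with open(path) as handle:
    restored = json.load(handle)

# Collect the names of failed operations for review.
failed = [entry["method"] for entry in restored if not entry["success"]]
```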
- import_file(file_path)[source]#
Import mass photometry data from HDF5 or CSV files.
This method loads contrast and mass data from supported file formats. NaN values are automatically removed from the imported data. The import operation is automatically logged with file information and data statistics.
- Parameters:
file_path (str) – Path to the data file. Supported formats are:
- ‘.h5’ : HDF5 files with standard mass photometry structure
- ‘.csv’ : CSV files with contrast and mass columns
- Raises:
ValueError – If the file format is not supported (not .h5 or .csv)
FileNotFoundError – If the specified file does not exist
KeyError – If required data columns are missing from the file
Notes
After import, the following attributes are populated:
- self.contrasts : Array of contrast values with NaN removed
- self.masses : Array of mass values with NaN removed (if available)
The logbook will record:
- File path and type
- Number of data points imported
- Range of contrast and mass values
Examples
Import HDF5 data:
>>> model = PyPhotoMol()
>>> model.import_file('experiment_data.h5')
>>> print(f"Imported {len(model.contrasts)} contrast measurements")
Import CSV data:
>>> model.import_file('processed_data.csv')
>>> print(f"Mass range: {model.masses.min():.1f} - {model.masses.max():.1f} kDa")
- create_histogram(use_masses=True, window=[0, 2000], bin_width=10)[source]#
Create a histogram from imported contrast or mass data.
This method generates a histogram from the imported data, which is essential for subsequent peak detection and fitting operations. The histogram parameters can be customized for different types of analysis.
- Parameters:
use_masses (bool, default True) – If True, create histogram from mass data (requires masses to be available). If False, create histogram from contrast data.
window (list of two floats, default [0, 2000]) – Range for the histogram as [min, max]. Units depend on data type:
- For masses: typically [0, 2000] kDa
- For contrasts: typically [-1, 0] (e.g., [-0.8, -0.2])
bin_width (float, default 10) – Width of histogram bins. Units depend on data type:
- For masses: typically 10-50 kDa
- For contrasts: typically 0.0004-0.001
- Raises:
AttributeError – If no data has been imported yet
ValueError – If use_masses=True but no mass data is available
Notes
After histogram creation, the following attributes are populated:
- self.histogram_centers : Bin center positions
- self.hist_counts : Count values for each bin
- self.hist_nbins : Number of bins created
- self.hist_window : Window used for histogram
- self.bin_width : Bin width used
- self.hist_data_type : Data type used (‘masses’ or ‘contrasts’)
Examples
Create mass histogram for protein analysis:
>>> model.create_histogram(use_masses=True, window=[0, 1000], bin_width=20)
Create contrast histogram for calibration:
>>> model.create_histogram(use_masses=False, window=[-0.8, -0.2], bin_width=0.0004)
High-resolution histogram for detailed analysis:
>>> model.create_histogram(use_masses=True, window=[50, 200], bin_width=5)
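As a rough illustration of how `window` and `bin_width` determine the bin centers and counts, here is a plain-Python sketch. The helper `fixed_width_histogram` is hypothetical, and the edge handling (half-open bins, values outside the window ignored) is an assumption rather than the package's documented behaviour.

```python
def fixed_width_histogram(values, window, bin_width):
    """Return (centers, counts) for fixed-width bins spanning the window."""
    low, high = window
    nbins = int(round((high - low) / bin_width))
    # Each bin center sits half a bin width above the bin's lower edge.
    centers = [low + (i + 0.5) * bin_width for i in range(nbins)]
    counts = [0] * nbins
    for value in values:
        if low <= value < high:  # assumption: values outside the window are dropped
            index = min(int((value - low) // bin_width), nbins - 1)
            counts[index] += 1
    return centers, counts


masses = [66.0, 68.5, 71.2, 132.0, 135.5, 480.0, 1200.0]  # made-up masses in kDa
centers, counts = fixed_width_histogram(masses, window=[0, 1000], bin_width=20)
```

With a 1000 kDa window and 20 kDa bins this gives 50 bins; the value at 1200 kDa falls outside the window and is not counted.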
- guess_peaks(min_height=10, min_distance=4, prominence=4)[source]#
Guess peaks in the histogram data.
The arguments are adjusted according to the region of the histogram. For example, for mass data the given distance is used as is between 0 and 650 kDa, multiplied by a factor of 3 between 650 and 1500 kDa, and multiplied by a factor of 8 above 1500 kDa. See the guess_peaks function in utils.helpers for more details.
- Example of min_height, min_distance and prominence for contrasts:
min_height=10, min_distance=4, prominence=4
- Parameters:
min_height (int, default 10) – Minimum height of the peaks.
min_distance (int, default 4) – Minimum distance between peaks.
prominence (int, default 4) – Minimum prominence of the peaks.
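The region-dependent scaling of the distance described above can be sketched as a small lookup. The helper name `scaled_min_distance` is hypothetical, and which region the boundary values 650 and 1500 kDa belong to is an assumption; the actual logic lives in the guess_peaks function in utils.helpers.

```python
def scaled_min_distance(mass_kda, min_distance):
    """Hypothetical helper: scale min_distance by histogram region."""
    if mass_kda < 650:
        factor = 1  # 0 to 650 kDa: distance used as given
    elif mass_kda < 1500:
        factor = 3  # 650 to 1500 kDa: distance multiplied by 3
    else:
        factor = 8  # above 1500 kDa: distance multiplied by 8
    return min_distance * factor


distances = {mass: scaled_min_distance(mass, min_distance=4)
             for mass in (100, 900, 1800)}
```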
- contrasts_to_masses(slope=1.0, intercept=0.0)[source]#
Convert contrasts to masses using a linear transformation. We assume a calibration was done using f(mass) = slope * contrast + intercept.
- Parameters:
slope (float, default 1.0) – Slope of the linear transformation.
intercept (float, default 0.0) – Intercept of the linear transformation.
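The docstring does not make the direction of the calibration explicit. The sketch below assumes the calibration line was fitted as contrast = slope * mass + intercept, so converting back to masses means inverting it. If the line was instead fitted with mass as the dependent variable, the conversion would simply be slope * contrast + intercept. The function name and the numbers are hypothetical.

```python
def masses_from_contrasts(contrasts, slope, intercept):
    """Invert contrast = slope * mass + intercept (assumed calibration form)."""
    if slope == 0:
        raise ValueError("slope must be non-zero to invert the calibration")
    return [(c - intercept) / slope for c in contrasts]


# Made-up calibration: each kDa lowers the contrast by 5e-5.
slope, intercept = -5e-5, 0.0
contrasts = [-0.0033, -0.0066, -0.024]
masses = masses_from_contrasts(contrasts, slope, intercept)
```

Under this assumption the three made-up contrasts map to roughly 66, 132 and 480 kDa.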
- fit_histogram(peaks_guess, mean_tolerance=None, std_tolerance=None, threshold=None, baseline=0.0, fit_baseline=False)[source]#
Fit a multi-Gaussian model to the histogram data, starting from the guessed peaks.
The data type (masses or contrasts) is automatically detected from the histogram that was previously created using create_histogram().
- Parameters:
peaks_guess (list) – List of guessed peaks.
mean_tolerance (float, optional) – Tolerance for the mean of the Gaussian fit. If None, it will be inferred from the peak guesses.
std_tolerance (float, optional) – Tolerance for the standard deviation of the Gaussian fit. If None, it will be inferred from the peak guesses.
threshold (float, optional) – Observation limit, whose meaning depends on the data type:
- For masses: minimum value that can be observed (in kDa units). Default is 40.
- For contrasts: maximum value that can be observed (should be negative). Default is -0.0024.
If None, defaults are applied based on detected data type.
baseline (float, default 0.0) – Baseline value to be subtracted from the fit.
fit_baseline (bool, default False) – Whether to fit a baseline to the histogram. If True, a baseline will be included in the fit and the ‘baseline’ argument will be ignored.
Examples
Fit histogram after creating it:
>>> model.create_histogram(use_masses=True, window=[0, 2000], bin_width=10)
>>> model.guess_peaks()
>>> model.fit_histogram(model.peaks_guess, mean_tolerance=100, std_tolerance=200)
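For readers unfamiliar with the term, a multi-Gaussian model is a sum of Gaussian curves, one per peak. The sketch below only evaluates such a model; it does not perform the fit. The flat `[amplitude, mean, std, ...]` parameter layout and the function name are assumptions, not the layout of `fitted_params`.

```python
import math


def multi_gaussian(x, params):
    """Sum of Gaussians; params is a flat list [amp1, mean1, std1, amp2, ...]."""
    total = 0.0
    for i in range(0, len(params), 3):
        amplitude, mean, std = params[i:i + 3]
        total += amplitude * math.exp(-0.5 * ((x - mean) / std) ** 2)
    return total


# Two made-up peaks: a monomer near 66 kDa and a dimer near 132 kDa.
params = [120.0, 66.0, 8.0, 40.0, 132.0, 12.0]
at_first_peak = multi_gaussian(66.0, params)
at_second_peak = multi_gaussian(132.0, params)
far_away = multi_gaussian(1000.0, params)
```

When the peaks are well separated, the model value at each mean is close to that peak's amplitude, which is why peak positions and heights from the histogram make reasonable starting guesses.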
- class pyphotomol.main.MPAnalyzer[source]#
Bases:
object
A class to handle multiple PyPhotoMol instances. This is useful for batch processing of multiple files.
- get_batch_logbook(as_dataframe=True, save_to_file=None)[source]#
Retrieve the batch logbook of all operations performed.
- Parameters:
as_dataframe (bool, default True) – If True, return as pandas DataFrame, else as list of dicts
save_to_file (str, optional) – Optional file path to save logbook as JSON
- Returns:
Batch logbook entries
- Return type:
pandas.DataFrame or list
- get_all_logbooks(save_to_file=None)[source]#
Get combined logbooks from all models plus batch operations.
- Parameters:
save_to_file (str, optional) – Optional file path to save combined logbook as JSON
- Returns:
Combined logbooks with batch and individual model logs
- Return type:
dict
- import_files(files, names=None)[source]#
Load multiple files into PyPhotoMol instances.
- Parameters:
files (list) – List of file paths to load.
names (list, optional) – List of names for the PyPhotoMol instances.
- apply_to_all(method_name, *args, names=None, **kwargs)[source]#
Apply a method to all or selected PyPhotoMol instances.
- Parameters:
method_name (str) – Name of the method to apply to instances
*args (tuple) – Positional arguments to pass to the method
names (list or str, optional) – Names of specific models to apply method to. If None (default), applies to all models.
**kwargs (dict) – Keyword arguments to pass to the method
Examples
Count binding events for all models:
>>> pms.apply_to_all('count_binding_events')
Create histograms for specific models only:
>>> pms.apply_to_all('create_histogram', names=['model1', 'model2'],
... use_masses=True, window=[0, 2000], bin_width=10)
Guess peaks for a single model:
>>> pms.apply_to_all('guess_peaks', names='model1',
... min_height=10, min_distance=4, prominence=4)
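Dispatching a method by name to all or selected instances can be sketched with `getattr`. Everything here is a stand-in: the `Model` class, the free function `apply_to_all`, and the dictionary of named models are assumptions chosen to mirror the documented `names` behaviour (None, a list, or a single string).

```python
class Model:
    """Stand-in for a PyPhotoMol-like object (assumption)."""

    def __init__(self):
        self.calls = []

    def create_histogram(self, use_masses=True, window=(0, 2000), bin_width=10):
        # Record the call instead of doing any real work.
        self.calls.append(("create_histogram", use_masses, tuple(window), bin_width))


def apply_to_all(models, method_name, *args, names=None, **kwargs):
    """Hypothetical dispatcher over a dict of named models."""
    if names is None:
        selected = list(models)  # all models
    elif isinstance(names, str):
        selected = [names]  # a single model given by name
    else:
        selected = list(names)
    for name in selected:
        getattr(models[name], method_name)(*args, **kwargs)


models = {"model1": Model(), "model2": Model(), "model3": Model()}
apply_to_all(models, "create_histogram", names=["model1", "model2"],
             window=[0, 1000], bin_width=20)
apply_to_all(models, "create_histogram", names="model3")
```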
- get_properties(variable)[source]#
Get properties from all PyPhotoMol instances.
- Parameters:
variable (str) – The property to get from each instance.
- Returns:
List of the specified property from each instance.
- Return type:
list
Examples
Get masses from all models:
>>> masses_list = pms.get_properties('masses')
Get fit tables from all models:
>>> fit_tables = pms.get_properties('fit_table')
- create_plotting_config(repeat_colors=True)[source]#
Create configuration dataframes for plotting multiple PhotoMol models.
- Parameters:
repeat_colors (bool, default True) – If True, repeat the same color scheme for each model’s peaks. If False, use sequential colors across all peaks from all models.
- Returns:
tuple –
legends_df: DataFrame with legends, colors, and selection flags for Gaussian traces
colors_hist_df: DataFrame with histogram colors for each model
- Return type:
(legends_df, colors_hist_df)
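The difference between the two `repeat_colors` modes can be illustrated with a small sketch. The palette, the helper `assign_peak_colors`, and the list-of-lists output are hypothetical; the real method returns DataFrames whose layout is not shown here.

```python
from itertools import cycle, islice

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]  # hypothetical palette


def assign_peak_colors(peaks_per_model, repeat_colors=True):
    """Return one list of colors per model, given the peak count of each model."""
    if repeat_colors:
        # Every model restarts from the first palette color.
        return [list(islice(cycle(PALETTE), n)) for n in peaks_per_model]
    # Otherwise colors continue sequentially across all models.
    colors = cycle(PALETTE)
    return [[next(colors) for _ in range(n)] for n in peaks_per_model]


repeated = assign_peak_colors([2, 1], repeat_colors=True)
sequential = assign_peak_colors([2, 1], repeat_colors=False)
```

With two peaks in the first model and one in the second, the repeated scheme reuses the first palette color for the second model, while the sequential scheme moves on to the third color.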