pychemelt.utils package
Submodules
pychemelt.utils.constants module
pychemelt.utils.files module
This module contains helper functions to parse Differential Scanning Fluorimetry files from different instrument providers.
Author: Osvaldo Burastero
All functions that import files should return:
- signal_data_dic: dictionary with the signal data, one entry per signal
- temp_data_dic: dictionary with the temperature data, one entry per signal
- conditions: list with the names of the samples
- signals: list with the names of the signals
A signal can be “350nm”, “330nm”, “Scattering”, “Ratio”, “Turbidity”, “Ratio 350nm/330nm”, etc. The length of the lists in signal_data_dic and temp_data_dic should be the same as the length of conditions.
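The return contract described above can be checked mechanically. The following is a minimal sketch, not part of pychemelt: `check_loader_output` is a hypothetical helper, the numbers are made up, and plain lists stand in for the numpy arrays that the loaders return.

```python
def check_loader_output(signal_data_dic, temp_data_dic, conditions, signals):
    """Raise ValueError if the four objects break the loader contract."""
    for name in signals:
        if name not in signal_data_dic or name not in temp_data_dic:
            raise ValueError(f"missing entry for signal {name!r}")
        n_sig = len(signal_data_dic[name])
        n_temp = len(temp_data_dic[name])
        # One curve per condition is expected in both dictionaries.
        if n_sig != len(conditions) or n_temp != len(conditions):
            raise ValueError(f"{name!r}: expected {len(conditions)} curves")
    return True


# Two conditions measured at two wavelengths (made-up numbers).
conditions = ["A1", "A2"]
signals = ["330nm", "350nm"]
temps = [[25.0, 26.0, 27.0], [25.0, 26.0, 27.0]]
signal_data_dic = {
    "330nm": [[1.0, 1.1, 1.3], [0.9, 1.0, 1.2]],
    "350nm": [[2.0, 2.2, 2.5], [1.8, 2.0, 2.4]],
}
temp_data_dic = {"330nm": temps, "350nm": temps}

contract_ok = check_loader_output(signal_data_dic, temp_data_dic, conditions, signals)
```

A check of this kind is mostly useful when writing a loader for a new instrument format.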
- pychemelt.utils.files.load_csv_file(file)[source]#
Load a CSV file containing temperature and signal columns and return structured data.
- Parameters:
file (str) – Path to the csv file
- Returns:
signal_data_dic (dict) – Dictionary mapping signal names to lists of 1D numpy arrays (one array per condition)
temp_data_dic (dict) – Dictionary mapping signal names to lists of temperature arrays corresponding to the signals
conditions (list) – List of condition names
signals (numpy.ndarray) – Array of signal name strings
- pychemelt.utils.files.load_aunty_xlsx(file_path)[source]#
Load AUNTY-format multi-sheet Excel file where each sheet is a condition.
- Parameters:
file_path (str) – Path to the AUNTY xlsx file
- pychemelt.utils.files.load_quantstudio_txt(QSfile)[source]#
Load QuantStudio TXT files (.txt) exported from QuantStudio instruments.
- Parameters:
QSfile (str) – Path to the QuantStudio txt file
- Returns:
signal_data_dic (dict) – Dictionary with signal data (key: ‘Fluorescence’)
temp_data_dic (dict) – Dictionary with temperature arrays per condition
conditions (list) – List of condition names (well identifiers)
signals (numpy.ndarray) – Array with signal name(s)
- pychemelt.utils.files.load_thermofluor_xlsx(thermofluor_file)[source]#
Load DSF Thermofluor xls file and extract data.
- Parameters:
thermofluor_file (str) – Path to the xls file
- Returns:
signal_data_dic (dict) – Dictionary with signal data
temp_data_dic (dict) – Dictionary with temperature data
conditions (list) – List of conditions
- pychemelt.utils.files.load_nanoDSF_xlsx(processed_dsf_file)[source]#
Load nanotemper processed xlsx file and extract relevant data.
- Parameters:
processed_dsf_file (str) – Path to the processed xlsx file
- Returns:
signal_data_dic (dict) – Dictionary with signal data
temp_data_dic (dict) – Dictionary with temperature data
conditions (list) – List of conditions
signals (numpy.ndarray) – Array of signal names
- pychemelt.utils.files.load_panta_xlsx(pantaFile)[source]#
Load the xlsx file generated by a Prometheus Panta instrument.
- Parameters:
pantaFile (str) – Path to the xlsx file
- Returns:
signal_data_dic (dict) – Dictionary with signal data
temp_data_dic (dict) – Dictionary with temperature data
conditions (list) – List of conditions
signals (numpy.ndarray) – List of signal names, such as 330nm and 350nm
- pychemelt.utils.files.load_uncle_multi_channel(uncle_file)[source]#
Function to load the data from the UNCLE instrument.
- Parameters:
uncle_file (str) – Path to the xlsx file
- Returns:
signal_data_dic (dict) – Dictionary with signal data (keys: wavelength strings like ‘350 nm’)
temp_data_dic (dict) – Dictionary with temperature arrays per condition
conditions (list) – List of sample names
signals (list) – List of wavelength strings
- pychemelt.utils.files.load_mx3005p_txt(filename)[source]#
Load Agilent MX3005P qPCR txt file and extract data
- Parameters:
filename (str) – Path to the MX3005P txt file. The second column has the fluorescence data, and the third column the temperature. Wells are separated by rows containing a sentence like this one: ‘Segment 2 Plateau 1 Well 1’
- Returns:
signal_data_dic (dict) – Dictionary with signal data
temp_data_dic (dict) – Dictionary with temperature data
conditions (list) – List of conditions (well numbers)
signals (numpy.ndarray) – List of signal names
- pychemelt.utils.files.detect_file_type(file)[source]#
Detect the type of file based on its extension and content.
- Parameters:
file (str) – Path to the file
- Returns:
Type of file (e.g., ‘supr’, ‘csv’, ‘prometheus’, ‘panta’, ‘uncle’, ‘mx3005p’, ‘quantstudio’, etc.) or None if unknown
- Return type:
str or None
- pychemelt.utils.files.detect_encoding(file_path)[source]#
Detect the encoding of a file by trying common encodings.
- Parameters:
file_path (str) – Path to the file
- Returns:
Detected encoding or the string ‘Unknown encoding’
- Return type:
str
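Detection by trial decoding can be sketched as follows. This is an illustration only: the name `guess_encoding` and the candidate list are assumptions, and the encodings that pychemelt actually tries are not listed on this page.

```python
import os
import tempfile


def guess_encoding(file_path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file."""
    for encoding in candidates:
        try:
            with open(file_path, "r", encoding=encoding) as handle:
                handle.read()
            return encoding
        except UnicodeError:
            # Decoding failed, so move on to the next candidate.
            continue
    return "Unknown encoding"


# Write a small cp1252 file; the degree sign is byte 0xB0, invalid as UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("Temp (°C)\t350nm\n".encode("cp1252"))
    path = tmp.name

detected = guess_encoding(path)
os.remove(path)
```

The order of the candidates matters: permissive encodings accept almost any byte sequence, so they belong at the end of the list.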
- pychemelt.utils.files.read_jasco_thermal_ramp(file)[source]#
Given a JASCO file with a thermal ramp, this function reads the data.
The data is given in chunks:
- Channel 1
4.94 14.93 25.08 35.02 45.03 55.07 64.99 75.04 85.08 95.03
250 -0.310564 -0.112003 0.0199744 -0.217282 -0.238716 -0.173046 0.00129784 -0.394731 -0.687165 -1.40543
- Parameters:
file (str) – Path to the JASCO thermal ramp file
- Returns:
signal_data_dic (dict) – Dictionary with signal data (key: wavelength string like ‘250 nm’)
temp_data_dic (dict) – Dictionary with temperature arrays per condition (only one condition in this case)
conditions (list) – File name as condition
wavelength_data (list) – List of wavelength strings (e.g., ‘250 nm’, ‘255 nm’, etc.)
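The chunk layout shown above (a row of temperatures followed by rows that start with a wavelength) can be parsed along these lines. This is a sketch under assumptions: `parse_jasco_chunk` is a hypothetical name, fields are assumed to be whitespace-separated, and the channel header line is assumed to have been removed already.

```python
def parse_jasco_chunk(lines):
    """Parse one chunk: a temperature row followed by wavelength rows."""
    temperatures = [float(v) for v in lines[0].split()]
    signal_by_wavelength = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        wavelength = fields[0]
        values = [float(v) for v in fields[1:]]
        if len(values) != len(temperatures):
            raise ValueError(f"row {wavelength}: wrong number of values")
        signal_by_wavelength[f"{wavelength} nm"] = values
    return temperatures, signal_by_wavelength


chunk = [
    "4.94 14.93 25.08 35.02 45.03 55.07 64.99 75.04 85.08 95.03",
    "250 -0.310564 -0.112003 0.0199744 -0.217282 -0.238716 "
    "-0.173046 0.00129784 -0.394731 -0.687165 -1.40543",
]
temperatures, signal_by_wavelength = parse_jasco_chunk(chunk)
```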
pychemelt.utils.fitting module
This module contains helper functions to fit unfolding data.
Author: Osvaldo Burastero
- pychemelt.utils.fitting.fit_line_robust(x, y)[source]#
Fit a line to the data using robust fitting
- Parameters:
x (array-like) – x data
y (array-like) – y data
- Returns:
m (float) – Slope of the fitted line
b (float) – Intercept of the fitted line
- pychemelt.utils.fitting.fit_quadratic_robust(x, y)[source]#
Fit a quadratic equation to the data using robust fitting
- Parameters:
x (array-like) – x data
y (array-like) – y data
- Returns:
a (float) – Quadratic coefficient of the fitted polynomial
b (float) – Linear coefficient of the fitted polynomial
c (float) – Constant coefficient of the fitted polynomial
- pychemelt.utils.fitting.fit_exponential_robust(x, y)[source]#
Fit an exponential function to the data using robust fitting.
Notes
Temperatures should be shifted to the reference (Tref) before calling this function.
- Parameters:
x (array-like) – x data
y (array-like) – y data
- Returns:
a (float) – Baseline
c (float) – Pre-exponential factor
alpha (float) – Exponential factor
- pychemelt.utils.fitting.fit_thermal_unfolding(list_of_temperatures, list_of_signals, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, Cp, list_of_oligomer_conc=None)[source]#
Fit the thermal unfolding profile of many curves at the same time.
This performs global fitting of shared thermodynamic parameters with per-curve baselines.
- Parameters:
list_of_temperatures (list of array-like) – List of temperature arrays for each dataset
list_of_signals (list of array-like) – List of signal arrays for each dataset
initial_parameters (array-like) – Initial guess for the parameters
low_bounds (array-like) – Lower bounds for the parameters
high_bounds (array-like) – Upper bounds for the parameters
signal_fx (callable) – Function to calculate the signal based on the parameters
baseline_native_fx (callable) – function to calculate the native state baseline
baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline
Cp (float) – Heat capacity change (passed to signal_fx)
list_of_oligomer_conc (list, optional) – List of oligomer concentrations for each dataset (if applicable)
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix of the fitted parameters
predicted_lst (list of numpy.ndarray) – Predicted signals for each dataset based on the fitted parameters
- pychemelt.utils.fitting.fit_tc_unfolding_single_slopes(list_of_temperatures, list_of_signals, denaturant_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, list_of_oligomer_conc=None, fit_m1=False, cp_value=None, tm_value=None, dh_value=None, method='least_sq')[source]#
Vectorized and optimized version of global thermal unfolding fitting.
- Parameters:
list_of_temperatures (list of array-like) – Temperature arrays for each dataset
list_of_signals (list of array-like) – Signal arrays for each dataset
denaturant_concentrations (list) – Denaturant concentrations (one per dataset)
initial_parameters (array-like) – Initial guess for parameters
low_bounds (array-like) – Lower bounds for parameters
high_bounds (array-like) – Upper bounds for parameters
signal_fx (callable) – Signal model function
baseline_native_fx (callable) – function to calculate the native state baseline
baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline
list_of_oligomer_conc (list, optional) – Oligomer concentrations per dataset
fit_m1 (bool, optional) – Whether to fit temperature dependence of m-value
cp_value, tm_value, dh_value (float or None, optional) – Optional fixed thermodynamic parameters
method (str, optional) – Optimization method (‘least_sq’ or ‘curve_fit’)
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix
predicted_lst (list of numpy.ndarray) – Predicted signals per dataset
Vectorized fitting of thermochemical unfolding curves for multiple signal types sharing thermodynamic parameters and slopes, using least_squares.
- Parameters:
list_of_temperatures (list of array-like) – Temperature arrays for each dataset
list_of_signals (list of array-like) – Signal arrays for each dataset
signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)
denaturant_concentrations (list) – Denaturant concentrations for each dataset (flattened across signals)
initial_parameters (array-like) – Initial guess for the parameters
low_bounds (array-like) – Lower bounds for the parameters
high_bounds (array-like) – Upper bounds for the parameters
signal_fx (callable) – Signal model function
baseline_native_fx (callable) – function to calculate the baseline for the native state
baseline_unfolded_fx (callable) – function to calculate the baseline for the unfolded state
list_of_oligomer_conc (list, optional) – Oligomer concentrations per dataset
fit_m1 (bool, optional) – Whether to fit temperature dependence of m-value
cp_value (float or None, optional) – Optional fixed thermodynamic parameters
tm_value (float or None, optional) – Optional fixed thermodynamic parameters
dh_value (float or None, optional) – Optional fixed thermodynamic parameters
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix
predicted_lst (list of numpy.ndarray) – Predicted signals per dataset
- pychemelt.utils.fitting.fit_tc_unfolding_many_signals(list_of_temperatures, list_of_signals, signal_ids, denaturant_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, oligomer_concentrations=None, fit_m1=False, model_scale_factor=False, scale_factor_exclude_ids=[], cp_value=None, fit_native_den_slope=True, fit_unfolded_den_slope=True)[source]#
Fit thermochemical unfolding curves for many signals (optimized variant).
- Parameters:
list_of_temperatures (list of array-like) – Temperature arrays for each dataset.
list_of_signals (list of array-like) – Signal arrays for each dataset.
signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)
denaturant_concentrations (list) – Denaturant concentrations for each dataset (flattened across signals)
initial_parameters (array-like) – Initial guess for the parameters
low_bounds (array-like) – Lower bounds for the parameters
high_bounds (array-like) – Upper bounds for the parameters
signal_fx (callable) – Signal model function
baseline_native_fx (callable) – function to calculate the native state baseline
baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline
oligomer_concentrations (list, optional) – Oligomer concentrations per dataset (used by oligomeric models)
fit_m1 (bool, optional) – Whether to include and fit temperature dependence of the m-value (m1)
model_scale_factor (bool, optional) – If True, include a per-denaturant concentration scale factor to account for intensity differences
scale_factor_exclude_ids (list, optional) – IDs of scale factors to exclude / fix to 1
cp_value (float or None, optional) – If provided, Cp is fixed to this value and not fitted
fit_native_den_slope (bool, optional) – Whether to fit the denaturant dependence of the native baseline.
fit_unfolded_den_slope (bool, optional) – Whether to fit the denaturant dependence of the unfolded baseline.
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix
predicted_lst (list of numpy.ndarray) – Predicted signals per dataset
- pychemelt.utils.fitting.fit_oligomer_unfolding_single_slopes(list_of_temperatures, list_of_signals, oligomer_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, cp_value=None, tm_value=None, dh_value=None)[source]#
Vectorized and optimized version of global thermal unfolding fitting of oligomers.
- Parameters:
list_of_temperatures (list of array-like) – Temperature arrays for each dataset
list_of_signals (list of array-like) – Signal arrays for each dataset
oligomer_concentrations (list) – sample concentrations of the oligomeric complex (one per dataset)
initial_parameters (array-like) – Initial guess for parameters
low_bounds (array-like) – Lower bounds for parameters
high_bounds (array-like) – Upper bounds for parameters
signal_fx (callable) – Signal model function
baseline_native_fx (callable) – function to calculate the native state baseline
baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline
cp_value (float or None, optional) – Optional fixed thermodynamic parameters
tm_value (float or None, optional) – Optional fixed thermodynamic parameters
dh_value (float or None, optional) – Optional fixed thermodynamic parameters
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix
predicted_lst (list of numpy.ndarray) – Predicted signals per dataset
Vectorized fitting of oligomer thermal unfolding curves for multiple signal types sharing thermodynamic parameters and slopes, using least_squares.
- Parameters:
list_of_temperatures (list of array-like) – Temperature arrays for each dataset.
list_of_signals (list of array-like) – Signal arrays for each dataset.
signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)
oligomer_concentrations (list) – sample concentrations of the oligomeric complex for each dataset (flattened across signals)
initial_parameters (array-like) – Initial guess for the parameters
low_bounds (array-like) – Lower bounds for the parameters
high_bounds (array-like) – Upper bounds for the parameters
signal_fx (callable) – Signal model function
baseline_native_fx (callable) – function to calculate the baseline for the native state
baseline_unfolded_fx (callable) – function to calculate the baseline for the unfolded state
cp_value (float or None, optional) – Optional fixed thermodynamic parameters
tm_value (float or None, optional) – Optional fixed thermodynamic parameters
dh_value (float or None, optional) – Optional fixed thermodynamic parameters
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix
predicted_lst (list of numpy.ndarray) – Predicted signals per dataset
- pychemelt.utils.fitting.fit_oligomer_unfolding_many_signals(list_of_temperatures, list_of_signals, signal_ids, oligomer_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, model_scale_factor=False, scale_factor_exclude_ids=[], cp_value=None, fit_native_olig_slope=True, fit_unfolded_olig_slope=True)[source]#
Fit thermal unfolding curves of oligomers for many signals (optimized variant).
- Parameters:
list_of_temperatures (list of array-like) – Temperature arrays for each dataset
list_of_signals (list of array-like) – Signal arrays for each dataset
signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)
oligomer_concentrations (list) – sample concentrations of the oligomeric complex for each dataset (flattened across signals)
initial_parameters (array-like) – Initial guess for the parameters
low_bounds (array-like) – Lower bounds for the parameters
high_bounds (array-like) – Upper bounds for the parameters
signal_fx (callable) – Signal model function
baseline_native_fx (callable) – function to calculate the native state baseline
baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline
model_scale_factor (bool, optional) – If True, include a per-oligomeric concentration scale factor to account for intensity differences
scale_factor_exclude_ids (list, optional) – IDs of scale factors to exclude / fix to 1
cp_value (float or None, optional) – If provided, Cp is fixed to this value and not fitted
fit_native_olig_slope, fit_unfolded_olig_slope (bool, optional) – Whether to fit the dependence of the slopes of the baselines
- Returns:
global_fit_params (numpy.ndarray) – Fitted global parameters
cov (numpy.ndarray) – Covariance matrix
predicted_lst (list of numpy.ndarray) – Predicted signals per dataset
pychemelt.utils.fractions module
This module contains helper functions to obtain the amount of folded/intermediate/unfolded (etc.) protein.
Author: Osvaldo Burastero
- pychemelt.utils.fractions.fn_two_state_monomer(K)[source]#
Given the equilibrium constant K of N <-> U, return the fraction of folded protein.
- Parameters:
K (float) – Equilibrium constant of the reaction N <-> U
- Returns:
Fraction of folded protein
- Return type:
float
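For a two-state monomer the folded fraction has a simple closed form. The sketch below assumes the usual convention K = [U]/[N]; the function name is hypothetical and this is not necessarily how pychemelt implements it.

```python
def fraction_folded_monomer(K):
    """Fraction of native protein for N <-> U, assuming K = [U]/[N]."""
    return 1.0 / (1.0 + K)


f_mid = fraction_folded_monomer(1.0)   # K = 1: half of the protein is folded
f_low = fraction_folded_monomer(0.01)  # small K: mostly folded
```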
- pychemelt.utils.fractions.fu_two_state_dimer(K, C)[source]#
Given the equilibrium constant K, of N2 <-> 2U, and the concentration of dimer equivalent C, return the fraction of unfolded protein
- Parameters:
K (float) – Equilibrium constant of the reaction N2 <-> 2U
C (float) – Total concentration of the protein in dimer equivalents
- Returns:
Fraction of unfolded protein
- Return type:
float
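The dimer case can be worked out in closed form. The sketch below assumes K = [U]^2/[N2] and a total concentration in dimer equivalents, C = [N2] + [U]/2, so that [U] = 2·C·fu and [N2] = C·(1 − fu). Substituting gives 4·C·fu² + K·fu − K = 0, whose positive root is the unfolded fraction. The function name is hypothetical, and the library's own implementation may differ.

```python
import math


def fraction_unfolded_dimer(K, C):
    """Positive root of 4*C*fu**2 + K*fu - K = 0 (N2 <-> 2U)."""
    return (-K + math.sqrt(K * K + 16.0 * K * C)) / (8.0 * C)


K, C = 1e-6, 1e-5  # made-up values in molar units
fu = fraction_unfolded_dimer(K, C)

# Back-substitute to confirm the equilibrium expression is satisfied.
U = 2.0 * C * fu
N2 = C * (1.0 - fu)
K_check = U * U / N2
```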
- pychemelt.utils.fractions.fu_two_state_trimer(K, C)[source]#
Given the equilibrium constant K, of N3 <-> 3U, and the concentration of trimer equivalent C, return the fraction of unfolded protein
- Parameters:
K (float) – Equilibrium constant of the reaction N3 <-> 3U
C (float) – Total concentration of the protein in trimer equivalents
- Returns:
Fraction of unfolded protein
- Return type:
float
- pychemelt.utils.fractions.fu_two_state_tetramer(K, C)[source]#
Given the equilibrium constant K, of N4 <-> 4U, and the concentration of tetramer equivalent C, return the fraction of unfolded protein
- Parameters:
K (float) – Equilibrium constant of the reaction N4 <-> 4U
C (float) – Total concentration of the protein in tetramer equivalents
- Returns:
Fraction of unfolded protein
- Return type:
float
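For trimers and tetramers the same mass balance leads to a cubic or quartic equation in the unfolded fraction. One generic way to solve it is bisection, sketched below for an n-mer under the assumptions K = [U]^n/[Nn] and C given in n-mer equivalents, which yields n^n · C^(n−1) · fu^n = K · (1 − fu). This is a numerical illustration with a hypothetical function name, not the method used by pychemelt.

```python
def fraction_unfolded_nmer(K, C, n, tol=1e-12):
    """Solve n**n * C**(n-1) * fu**n = K * (1 - fu) for fu in [0, 1]."""

    def balance(fu):
        return n**n * C ** (n - 1) * fu**n - K * (1.0 - fu)

    # balance is increasing in fu, with balance(0) < 0 < balance(1) for K > 0.
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if balance(mid) > 0.0:
            high = mid
        else:
            low = mid
    return 0.5 * (low + high)


fu_trimer = fraction_unfolded_nmer(K=1e-11, C=1e-5, n=3)
fu_tetramer = fraction_unfolded_nmer(K=1e-16, C=1e-5, n=4)
fu_monomer = fraction_unfolded_nmer(K=1.0, C=1e-5, n=1)  # reduces to K / (1 + K)
```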
pychemelt.utils.math module
This module contains helper functions for mathematical operations.
Author: Osvaldo Burastero
- pychemelt.utils.math.temperature_to_kelvin(T)[source]#
Convert temperature from Celsius to Kelvin if necessary.
- Parameters:
T (array-like) – Temperature values
- Returns:
Temperature values in Kelvin
- Return type:
array-like
- pychemelt.utils.math.temperature_to_celsius(T)[source]#
Convert temperature from Kelvin to Celsius if necessary.
- Parameters:
T (array-like) – Temperature values
- Returns:
Temperature values in Celsius
- Return type:
array-like
- pychemelt.utils.math.shift_temperature(T)[source]#
Shift temperature to be relative to Tref_cst in Kelvin.
- Parameters:
T (array-like) – Temperature values
- Returns:
Shifted temperature values
- Return type:
array-like
- pychemelt.utils.math.constant_baseline(dt, d, den_slope, a, *args)[source]#
Baseline function with no dependence on temperature and dependence on denaturant concentration
- Parameters:
dt (float) – delta temperature, not used here but required for compatibility with other baseline functions
d (float) – denaturant concentration
den_slope (float) – linear dependence of signal on denaturant concentration
a (float) – intercept of the baseline
- Returns:
Baseline signal
- Return type:
float
- pychemelt.utils.math.linear_baseline(dt, d, den_slope, a, b, *args)[source]#
Baseline function with linear dependence on temperature and linear dependence on denaturant concentration
- Parameters:
dt (float) – delta temperature
d (float) – denaturant concentration
den_slope (float) – linear dependence of signal on denaturant concentration
a (float) – intercept of the baseline
b (float) – linear dependence of signal on temperature
- Returns:
Baseline signal
- Return type:
float
- pychemelt.utils.math.quadratic_baseline(dt, d, den_slope, a, b, c)[source]#
Baseline function with quadratic dependence on temperature and linear dependence on denaturant concentration
- Parameters:
dt (float) – delta temperature
d (float) – denaturant concentration
den_slope (float) – linear dependence of signal on denaturant concentration
a (float) – intercept of the baseline
b (float) – linear dependence of signal on temperature
c (float) – quadratic dependence of signal on temperature
- Returns:
Baseline signal
- Return type:
float
- pychemelt.utils.math.exponential_baseline(dt, d, den_slope, a, c, alpha)[source]#
Baseline function with exponential dependence on temperature and linear dependence on denaturant concentration
- Parameters:
dt (float) – delta temperature
d (float) – denaturant concentration
den_slope (float) – linear dependence of signal on denaturant concentration
a (float) – intercept of the baseline
c (float) – pre-exponential factor for the dependence on temperature
alpha (float) – exponential coefficient for the dependence on temperature
- Returns:
Baseline signal
- Return type:
float
- pychemelt.utils.math.is_evenly_spaced(x, tol=0.0001)[source]#
Check if x is evenly spaced within a given tolerance.
- Parameters:
x (array-like) – x data
tol (float, optional) – Tolerance for considering spacing equal (default: 1e-4)
- Returns:
True if x is evenly spaced, False otherwise
- Return type:
bool
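One way to test for even spacing is to compare the consecutive differences. The sketch below is an assumption about the criterion (largest and smallest step differing by at most tol); the exact rule used by pychemelt is not stated on this page, and the function name is hypothetical.

```python
def evenly_spaced(x, tol=1e-4):
    """True if consecutive differences of x agree within tol."""
    steps = [b - a for a, b in zip(x, x[1:])]
    if not steps:
        return True  # zero or one point: trivially even
    return max(steps) - min(steps) <= tol


regular = evenly_spaced([25.0, 25.5, 26.0, 26.5])
irregular = evenly_spaced([25.0, 25.5, 26.2, 26.5])
```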
- pychemelt.utils.math.first_derivative_savgol(x, y, window_length=5, polyorder=4)[source]#
Estimate the first derivative using Savitzky-Golay filtering.
- Parameters:
x (array-like) – x data (must be evenly spaced)
y (array-like) – y data
window_length (int, optional) – Length of the filter window, in temperature units (default: 5)
polyorder (int, optional) – Order of the polynomial used to fit the samples (default: 4)
- Returns:
First derivative of y with respect to x
- Return type:
numpy.ndarray
Notes
This function will raise a ValueError if x is not evenly spaced.
- pychemelt.utils.math.relative_errors(params, cov)[source]#
Calculate the relative errors of the fitted parameters.
- Parameters:
params (numpy.ndarray) – Fitted parameters
cov (numpy.ndarray) – Covariance matrix of the fitted parameters
- Returns:
Relative errors of the fitted parameters (in percent)
- Return type:
numpy.ndarray
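A common definition of the relative error is the standard error (square root of the diagonal of the covariance matrix) divided by the absolute parameter value, times 100. The sketch below assumes that definition, uses made-up numbers and nested lists in place of numpy arrays, and uses a hypothetical function name.

```python
import math


def relative_errors_percent(params, cov):
    """100 * sqrt(diag(cov)) / |params|, element by element."""
    return [
        100.0 * math.sqrt(cov[i][i]) / abs(p)
        for i, p in enumerate(params)
    ]


params = [330.0, 100.0]          # made-up fitted values
cov = [[1.0, 0.0], [0.0, 25.0]]  # made-up covariance matrix
errors = relative_errors_percent(params, cov)
```

A parameter equal to zero would cause a division by zero here, so a real implementation needs to handle that case.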
- pychemelt.utils.math.find_line_outliers(m, b, x, y, sigma=2.5)[source]#
Find outliers in a linear fit using the sigma rule.
- Parameters:
m (float) – Slope of the line
b (float) – Intercept of the line
x (array-like) – x data
y (array-like) – y data
sigma (float, optional) – Number of standard deviations to use for outlier detection (default: 2.5)
- Returns:
Indices of the outliers
- Return type:
numpy.ndarray
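The sigma rule can be sketched as follows, assuming an outlier is a point whose residual lies more than sigma standard deviations from the mean residual. The exact statistic used by pychemelt is not documented here, so treat this as one plausible reading; `line_outliers` is a hypothetical name.

```python
import statistics


def line_outliers(m, b, x, y, sigma=2.5):
    """Indices whose residual deviates more than sigma std from the mean."""
    residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    center = statistics.mean(residuals)
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - center) > sigma * spread]


x = list(range(10))
y = [2.0 * xi + 1.0 for xi in x]
y[4] += 10.0  # inject one outlier
outliers = line_outliers(2.0, 1.0, x, y)
```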
- pychemelt.utils.math.get_rss(y, y_fit)[source]#
Compute the residual sum of squares.
- Parameters:
y (array-like) – Observed values
y_fit (array-like) – Fitted values
- Returns:
Residual sum of squares
- Return type:
float
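The residual sum of squares is the sum of squared differences between observed and fitted values. A minimal plain-Python sketch, with a hypothetical function name:

```python
def residual_sum_of_squares(y, y_fit):
    """Sum of squared residuals between observed and fitted values."""
    return sum((obs - fit) ** 2 for obs, fit in zip(y, y_fit))


rss = residual_sum_of_squares([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```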
pychemelt.utils.palette module
Viridis color palette.
A perceptually uniform color map that is readable by those with colorblindness. Contains hex color values transitioning from dark purple to yellow.
pychemelt.utils.plotting module
- class pychemelt.utils.plotting.PlotConfig(width: int = 1000, height: int = 800, type: str = 'png', font_size: int = 16, marker_size: int = 8, line_width: int = 3)[source]#
Bases: object
General plot configuration
- width: int = 1000#
- height: int = 800#
- type: str = 'png'#
- font_size: int = 16#
- marker_size: int = 8#
- line_width: int = 3#
- __init__(width: int = 1000, height: int = 800, type: str = 'png', font_size: int = 16, marker_size: int = 8, line_width: int = 3) None#
- class pychemelt.utils.plotting.AxisConfig(showgrid_x: bool = True, showgrid_y: bool = True, n_y_axis_ticks: int = 5, linewidth: int = 1, tickwidth: int = 1, ticklen: int = 5, gridwidth: int = 1)[source]#
Bases: object
Axis styling configuration
- showgrid_x: bool = True#
- showgrid_y: bool = True#
- n_y_axis_ticks: int = 5#
- linewidth: int = 1#
- tickwidth: int = 1#
- ticklen: int = 5#
- gridwidth: int = 1#
- __init__(showgrid_x: bool = True, showgrid_y: bool = True, n_y_axis_ticks: int = 5, linewidth: int = 1, tickwidth: int = 1, ticklen: int = 5, gridwidth: int = 1) None#
- class pychemelt.utils.plotting.LayoutConfig(show_subplot_titles: bool = False, vertical_spacing: float = 0.1)[source]#
Bases: object
Layout and spacing configuration
- show_subplot_titles: bool = False#
- vertical_spacing: float = 0.1#
- __init__(show_subplot_titles: bool = False, vertical_spacing: float = 0.1) None#
- class pychemelt.utils.plotting.LegendConfig[source]#
Bases: object
Legend and labeling configuration
- color_bar_length = 0.4#
- color_bar_orientation = 'v'#
- color_bar_x_pos = 1.05#
- color_bar_y_pos = 0.5#
- __init__() None#
- pychemelt.utils.plotting.config_fig(fig, plot_width=800, plot_height=600, plot_type='png', plot_title_for_download='plot')[source]#
Configure plotly figure with download options and toolbar settings.
- Parameters:
fig (go.Figure) – Plotly figure object
plot_width (int, default 800) – Width of the plot in pixels
plot_height (int, default 600) – Height of the plot in pixels
plot_type (str, default "png") – Format for downloading the plot (e.g., “png”, “jpeg”)
plot_title_for_download (str, default "plot") – Title for the downloaded plot file
- Returns:
Configured plotly figure
- Return type:
go.Figure
- pychemelt.utils.plotting.plot_unfolding(pychemelt_sample, plot_derivative=False, plot_config: PlotConfig = None, axis_config: AxisConfig = None, layout_config: LayoutConfig = None, legend_config: LegendConfig = None)[source]#
Plot the unfolding curves, including the signal and the predicted curves
- Parameters:
pychemelt_sample – pychemelt.Sample object
plot_derivative (bool) – Whether to plot the derivative of the signal
plot_config (PlotConfig, optional) – Configuration for the overall plot
axis_config (AxisConfig, optional) – Configuration for the axes
layout_config (LayoutConfig, optional) – Configuration for the layout
legend_config (LegendConfig, optional) – configuration for the legend
pychemelt.utils.processing module
This module contains helper functions to process data.
Author: Osvaldo Burastero
- pychemelt.utils.processing.set_param_bounds(p0, param_names)[source]#
Generate heuristic lower and upper bounds for fitting parameters based on initial guesses.
- Parameters:
p0 (array-like) – Initial parameter guesses.
param_names (list of str) – Names of the parameters to apply specific logic (e.g., non-negative constraints).
- Returns:
(low_bounds, high_bounds) as lists of numeric values.
- Return type:
tuple
- pychemelt.utils.processing.expand_temperature_list(temp_lst, signal_lst)[source]#
Expand the temperature list to match the length of the signal list.
- Parameters:
temp_lst (list) – List of temperatures
signal_lst (list) – List of signals
- Returns:
Expanded temperature list
- Return type:
list
- pychemelt.utils.processing.clean_conditions_labels(conditions)[source]#
Clean the conditions labels by removing unwanted characters and patterns.
- Parameters:
conditions (list) – List of condition strings.
- Returns:
List of cleaned condition strings.
- Return type:
list
- pychemelt.utils.processing.subset_signal_by_temperature(signal_lst, temp_lst, min_temp, max_temp)[source]#
Subset the signal and temperature lists based on the specified temperature range.
- Parameters:
signal_lst (list) – List of signal arrays.
temp_lst (list) – List of temperature arrays.
min_temp (float) – Minimum temperature for subsetting.
max_temp (float) – Maximum temperature for subsetting.
- Returns:
Tuple containing the subsetted signal and temperature lists.
- Return type:
tuple
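Subsetting by temperature amounts to keeping, for each curve, the points whose temperature falls inside the requested range. The sketch below assumes inclusive bounds, which this page does not confirm, and uses plain lists with a hypothetical function name.

```python
def subset_by_temperature(signal_lst, temp_lst, min_temp, max_temp):
    """Keep only points with min_temp <= temperature <= max_temp."""
    subset_signals, subset_temps = [], []
    for signal, temp in zip(signal_lst, temp_lst):
        keep = [i for i, t in enumerate(temp) if min_temp <= t <= max_temp]
        subset_signals.append([signal[i] for i in keep])
        subset_temps.append([temp[i] for i in keep])
    return subset_signals, subset_temps


signals_cut, temps_cut = subset_by_temperature(
    [[0.1, 0.2, 0.3, 0.4]], [[20.0, 30.0, 40.0, 50.0]], 25.0, 45.0
)
```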
- pychemelt.utils.processing.guess_Tm_from_derivative(temp_lst, deriv_lst, x1, x2)[source]#
Estimate the melting temperature (Tm) by finding the extremum of the first derivative.
- Parameters:
temp_lst (list of np.ndarray) – Temperature arrays for each dataset.
deriv_lst (list of np.ndarray) – First derivative of the signal for each dataset.
x1 (float) – Lower buffer from the temperature edges to exclude noise/artifacts.
x2 (float) – Upper buffer from the temperature edges to define the baseline median window.
- Returns:
Estimated Tm values for each dataset.
- Return type:
list of float
- pychemelt.utils.processing.estimate_signal_baseline_params(signal_lst, temp_lst, native_baseline_type, unfolded_baseline_type, window_range_native=12, window_range_unfolded=12, oligomer_number=1)[source]#
Estimate the baseline parameters for the sample
- Parameters:
signal_lst (list of np.ndarray) – List of signal arrays
temp_lst (list of np.ndarray) – List of temperature arrays
window_range_native (float) – Range of the temperature window to estimate the native state baseline
window_range_unfolded (float) – Range of the temperature window to estimate the unfolded state baseline
native_baseline_type (str) – options: ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’
unfolded_baseline_type (str) – options: ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’
oligomer_number (int) – number of subunits in the oligomer
- Returns:
Lists of estimated parameters (p1Ns, p1Us, p2Ns, p2Us, p3Ns, p3Us).
- Return type:
tuple
- pychemelt.utils.processing.fit_local_thermal_unfolding_to_signal_lst(signal_lst, temp_lst, t_melting_init, p1_Ns, p1_Us, p2_Ns, p2_Us, p3_Ns, p3_Us, baseline_native_fx, baseline_unfolded_fx)[source]#
Perform individual (local) fits for each signal curve in a list.
- Parameters:
signal_lst (list of np.ndarray) – List of signals.
temp_lst (list of np.ndarray) – List of temperatures.
t_melting_init (list of float) – Initial Tm guesses.
p1_Ns (list of float) – Estimated baseline parameters for each curve.
p1_Us (list of float) – Estimated baseline parameters for each curve.
p2_Ns (list of float) – Estimated baseline parameters for each curve.
p2_Us (list of float) – Estimated baseline parameters for each curve.
p3_Ns (list of float) – Estimated baseline parameters for each curve.
p3_Us (list of float) – Estimated baseline parameters for each curve.
baseline_native_fx (callable) – Function to calculate the native baseline.
baseline_unfolded_fx (callable) – Function to calculate the unfolded baseline.
- Returns:
(Tms, dHs, predicted_lst) containing fitted parameters and signal arrays.
- Return type:
tuple
- pychemelt.utils.processing.re_arrange_predictions(predicted_lst, n_signals, n_denaturants)[source]#
Re-arrange the flattened predictions to match the original signal list with sublists.
- Parameters:
predicted_lst (list) – Flattened list of predicted signals of length n_signals * n_denaturants.
n_signals (int) – Number of signal types (e.g., different wavelengths).
n_denaturants (int) – Number of denaturant concentrations or conditions per signal.
- Returns:
Re-arranged list of predicted signals of length n_signals, where each element is a sublist of length n_denaturants.
- Return type:
list
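The regrouping described above can be sketched as follows. This is only an illustration: `regroup_flat_list` is a hypothetical name, and the sketch assumes the flat list is ordered signal by signal, which this page does not confirm.

```python
def regroup_flat_list(flat, n_signals, n_denaturants):
    """Split a flat list into n_signals consecutive chunks of n_denaturants items."""
    if len(flat) != n_signals * n_denaturants:
        raise ValueError("flat list length must equal n_signals * n_denaturants")
    return [
        flat[i * n_denaturants:(i + 1) * n_denaturants]
        for i in range(n_signals)
    ]


# Two signals measured at three denaturant concentrations each (made-up labels)
flat = ["s0_d0", "s0_d1", "s0_d2", "s1_d0", "s1_d1", "s1_d2"]
nested = regroup_flat_list(flat, n_signals=2, n_denaturants=3)
```

Each sublist of `nested` then holds the predictions for one signal type.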
- pychemelt.utils.processing.re_arrange_params(params, n_signals)[source]#
Re-arrange flattened parameters into a list of sublists grouped by signal.
- Parameters:
params (list or np.ndarray) – Flattened list of parameters.
n_signals (int) – Number of signal types to group parameters by.
- Returns:
Re-arranged list of parameters of length n_signals containing parameter arrays for each signal.
- Return type:
list of np.ndarray
- pychemelt.utils.processing.subset_data(data, max_points)[source]#
Reduces the number of data points by repeated striding until the size is below a threshold.
- Parameters:
data (np.ndarray) – Input data array to be subsetted.
max_points (int) – The maximum number of points allowed in the resulting array.
- Returns:
Subsetted data array containing every $2^n$-th point of the original.
- Return type:
np.ndarray
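A minimal sketch of repeated striding, assuming the loop stops as soon as the length is at most max_points (whether the library uses a strict or non-strict comparison is not stated here). `subset_by_striding` is a hypothetical name; slicing with `[::2]` behaves the same way on lists and on 1D numpy arrays.

```python
def subset_by_striding(data, max_points):
    """Keep every second point, repeatedly, until len(data) <= max_points."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    while len(data) > max_points:
        data = data[::2]  # halve the number of points
    return data


points = list(range(100))
reduced = subset_by_striding(points, max_points=30)  # 100 -> 50 -> 25 points
```

Two halving steps were needed in this example, so `reduced` contains every 4th point of the input.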
- pychemelt.utils.processing.get_colors_from_numeric_values(values, min_val, max_val, use_log_scale=False)[source]#
Map numeric values to colors in the VIRIDIS palette based on a specified range.
- Parameters:
values (list or np.ndarray) – Numeric values to map to colors.
min_val (float) – Minimum value of the range.
max_val (float) – Maximum value of the range.
use_log_scale (bool, optional) – Whether to use logarithmic scaling for the values, default is False.
- Returns:
List of hex color codes corresponding to the input values.
- Return type:
list
- pychemelt.utils.processing.combine_sequences(seq1, seq2)[source]#
Combine two sequences to generate all possible combinations of their elements.
- Parameters:
seq1 (list) – First sequence of elements.
seq2 (list) – Second sequence of elements.
- Returns:
A list of tuples, where each tuple contains one element from seq1 and one from seq2.
- Return type:
list
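For illustration, the same kind of output can be produced with the standard library's Cartesian product. `all_pairs` is a hypothetical name, and the ordering shown (seq1 varying slowest) is an assumption about the library's behaviour.

```python
from itertools import product


def all_pairs(seq1, seq2):
    """Return every (a, b) tuple with a taken from seq1 and b from seq2."""
    return list(product(seq1, seq2))


pairs = all_pairs([1, 2], ["a", "b"])
```

The result has `len(seq1) * len(seq2)` tuples.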
- pychemelt.utils.processing.adjust_value_to_interval(value, lower_bound, upper_bound, shift)[source]#
Verify that a value is within the specified bounds. If the value is outside the bounds, adjust it to the nearest bound.
- Parameters:
value (float) – The value to be adjusted.
lower_bound (float) – The lower bound of the interval.
upper_bound (float) – The upper bound of the interval.
shift (float) – How much to shift the value if it is outside the bounds.
- pychemelt.utils.processing.oligomer_number(model)[source]#
Get the number of subunits in the oligomer based on the model.
- Returns:
The number of subunits (2 for ‘Dimer’, 3 for ‘Trimer’, 4 for ‘Tetramer’, 1 otherwise).
- Return type:
int
- pychemelt.utils.processing.parse_number(s)[source]#
Parse a string as a float, handling: - European decimal (comma) - Optional thousands separators - Standard decimal point
- Parameters:
s (str) – The string to parse
- Returns:
The parsed number
- Return type:
float
- Raises:
ValueError – If the string cannot be parsed as a float
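One way such parsing could work is sketched below. The disambiguation rules are assumptions, not the library's documented behaviour: when both separators appear, the right-most one is treated as the decimal mark, and a lone comma is treated as a decimal comma. `parse_number_sketch` is a hypothetical name.

```python
def parse_number_sketch(s):
    """Parse strings such as '3,14', '1.234,5' or '1,234.5' into a float."""
    s = s.strip()
    if "," in s and "." in s:
        if s.rfind(",") > s.rfind("."):
            # European style: '.' groups thousands, ',' is the decimal mark
            s = s.replace(".", "").replace(",", ".")
        else:
            # English style: ',' groups thousands, '.' is the decimal mark
            s = s.replace(",", "")
    elif "," in s:
        # Assumed to be a decimal comma (so '1,234' is read as 1.234)
        s = s.replace(",", ".")
    return float(s)  # raises ValueError for non-numeric input
```

Under these assumptions a string with several commas and no point, such as '1,234,567', raises ValueError rather than being read as a grouped integer.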
- pychemelt.utils.processing.are_all_strings_numeric(lst)[source]#
- Parameters:
lst (list of str) – List of strings to check
- Returns:
True if all strings in the list are numeric (can contain digits, ‘.’, ‘-’, ‘,’), False otherwise
- Return type:
bool
- pychemelt.utils.processing.transform_to_list(element_or_list)[source]#
- Parameters:
element_or_list (bool, str, int, float, list, or numpy array) – The input element or list to be transformed into a list.
- Returns:
A list containing the input element if it is not already a list, or the input itself if it is None, a numpy array, or a list.
- Return type:
list or None
- Raises:
ValueError – If the input is not a boolean, string, integer, float, list, or numpy array
pychemelt.utils.rates module#
This module contains helper functions to obtain equilibrium constants.
Author: Osvaldo Burastero
- Useful references for unfolding models:
Rumfeldt, Jessica AO, et al. “Conformational stability and folding mechanisms of dimeric proteins.” Progress in biophysics and molecular biology 98.1 (2008): 61-84.
Bedouelle, Hugues. “Principles and equations for measuring and interpreting protein stability: From monomer to tetramer.” Biochimie 121 (2016): 29-37.
Mazurenko, Stanislav, et al. “Exploration of protein unfolding by modelling calorimetry data from reheating.” Scientific reports 7.1 (2017): 16321.
All thermodynamic parameters are expressed in kcal/mol units
Unfolding functions for monomers have an argument called ‘extra_arg’ that is not used. This is because unfolding functions for oligomers require the protein concentration in that position
- pychemelt.utils.rates.eq_constant_thermo(T, DH1, T1, Cp)[source]#
T1 is the temperature at which ΔG(T) = 0, ΔH1 is the variation of enthalpy between the two considered states at T1, and Cp is the variation of heat capacity between the two states.
- Parameters:
T (array-like) – Temperature (Kelvin)
DH1 (float) – Variation of enthalpy between the two considered states at T1 (kcal/mol)
T1 (float) – Temperature at which the equilibrium constant equals one (Kelvin)
Cp (float) – Variation of heat capacity between the two states (kcal/mol/K)
- Returns:
Equilibrium constant at the given temperature
- Return type:
numpy.ndarray
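A minimal sketch of how an equilibrium constant can be obtained from these parameters, assuming the standard integrated Gibbs-Helmholtz relation with a temperature-independent Cp. The function name, the scalar-only input, and the value used for the gas constant are choices made for this illustration, not taken from the library source.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K), assumed unit convention


def eq_constant_sketch(T, DH1, T1, Cp):
    """Equilibrium constant at temperature T (Kelvin, scalar)."""
    # Free energy difference between the two states at temperature T
    dG = DH1 * (1 - T / T1) + Cp * (T - T1 - T * math.log(T / T1))
    return math.exp(-dG / (R_KCAL * T))


# Illustrative values only: DH1 = 100 kcal/mol, T1 = 333.15 K, Cp = 1.5 kcal/mol/K
K_at_T1 = eq_constant_sketch(333.15, DH1=100.0, T1=333.15, Cp=1.5)
K_below = eq_constant_sketch(323.15, DH1=100.0, T1=333.15, Cp=1.5)
K_above = eq_constant_sketch(343.15, DH1=100.0, T1=333.15, Cp=1.5)
```

With a positive DH1, the constant is below one under T1, equal to one at T1, and above one over T1, which matches the description of T1 as the temperature where ΔG = 0.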
- pychemelt.utils.rates.eq_constant_termochem(T, D, DHm, Tm, Cp0, m0, m1)[source]#
Ref: Louise Hamborg et al., 2020. Global analysis of protein stability by temperature and chemical denaturation
- Parameters:
T (array-like) – Temperature (Kelvin only!)
D (float) – Denaturant concentration (M)
DHm (float) – Enthalpy change at Tm (kcal/mol)
Tm (float) – Melting temperature where ΔG = 0 (Kelvin only!)
Cp0 (float) – Heat capacity change (kcal/mol/K)
m0 (float) – m-value at the reference temperature
m1 (float) – Temperature dependence of the m-value
- Returns:
Equilibrium constant at a certain temperature and denaturant agent concentration
- Return type:
numpy.ndarray
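For orientation, a commonly used form of the combined thermal and chemical denaturation model is written out below. It is a sketch under stated assumptions: the heat capacity change is taken as independent of denaturant, the m-value is taken as linear in temperature around a reference temperature $T_{\mathrm{ref}}$ whose value is not given in this section, and the sign of the denaturant term follows the usual convention that denaturant lowers the unfolding free energy. The library's exact expression may differ.

```latex
\Delta G(T, D) = \Delta H_m \left(1 - \frac{T}{T_m}\right)
  + C_{p0} \left(T - T_m - T \ln\frac{T}{T_m}\right)
  - \left[m_0 + m_1 \left(T - T_{\mathrm{ref}}\right)\right] D

K(T, D) = \exp\left(-\frac{\Delta G(T, D)}{R\,T}\right)
```

In this form, $\Delta G = 0$ at $T = T_m$ when $D = 0$, consistent with the description of Tm above.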
pychemelt.utils.signals module#
This module contains helper functions to obtain the signal, given certain parameters.
Author: Osvaldo Burastero
- pychemelt.utils.signals.signal_two_state_tc_unfolding(T, D, DHm, Tm, Cp0, m0, m1, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, extra_arg=None)[source]#
Ref: Louise Hamborg et al., 2020. Global analysis of protein stability by temperature and chemical denaturation
- Parameters:
T (array-like) – Temperature in Kelvin units
D (array-like) – Denaturant agent concentration
DHm (float) – Variation of enthalpy between the two considered states at Tm
Tm (float) – Temperature at which the equilibrium constant equals one, in Kelvin units
Cp0 (float) – Variation of heat capacity between the two states
m0 (float) – m-value at the reference temperature (Tref)
m1 (float) – Variation of m-value with temperature
p1_N (float) – parameters describing the native-state baseline
p2_N (float) – parameters describing the native-state baseline
p3_N (float) – parameters describing the native-state baseline
p4_N (float) – parameters describing the native-state baseline
p1_U (float) – parameters describing the unfolded-state baseline
p2_U (float) – parameters describing the unfolded-state baseline
p3_U (float) – parameters describing the unfolded-state baseline
p4_U (float) – parameters describing the unfolded-state baseline
baseline_N_fx (function) – for the native-state baseline
baseline_U_fx (function) – for the unfolded-state baseline
extra_arg (None, optional) – Not used but present for API compatibility with oligomeric models
- Returns:
Signal at the given temperatures and denaturant agent concentration, given the parameters
- Return type:
numpy.ndarray
- pychemelt.utils.signals.signal_two_state_t_unfolding(T, Tm, dHm, p1_N, p2_N, p3_N, p1_U, p2_U, p3_U, baseline_N_fx, baseline_U_fx, Cp=0, extra_arg=None)[source]#
Two-state temperature unfolding (monomer).
- Parameters:
T (array-like) – Temperature
Tm (float) – Temperature at which the equilibrium constant equals one
dHm (float) – Variation of enthalpy between the two considered states at Tm
p1_N (float) – baseline parameters for the native-state baseline
p2_N (float) – baseline parameters for the native-state baseline
p3_N (float) – baseline parameters for the native-state baseline
p1_U (float) – baseline parameters for the unfolded-state baseline
p2_U (float) – baseline parameters for the unfolded-state baseline
p3_U (float) – baseline parameters for the unfolded-state baseline
baseline_N_fx (callable) – function to calculate the baseline for the native state
baseline_U_fx (callable) – function to calculate the baseline for the unfolded state
Cp (float, optional) – Variation of heat capacity between the two states (default: 0)
extra_arg (None, optional) – Not used but present for compatibility
- Returns:
Signal at the given temperatures, given the parameters
- Return type:
numpy.ndarray
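The way a two-state signal is typically assembled can be sketched as below: the observed signal is the population-weighted sum of the native and unfolded baselines. The linear baseline function, its argument order, and the tuple-based parameter passing are hypothetical simplifications; the library passes baseline callables and individual p1/p2/p3 parameters whose exact meaning is not specified here.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K), assumed unit convention


def linear_baseline(T, intercept, slope):
    """Hypothetical baseline: signal of a pure state as a linear function of T."""
    return intercept + slope * T


def two_state_signal_sketch(T, Tm, dHm, native_params, unfolded_params, Cp=0.0):
    """Signal of a monomeric two-state (N <-> U) system at temperature T (Kelvin)."""
    dG = dHm * (1 - T / Tm) + Cp * (T - Tm - T * math.log(T / Tm))
    K = math.exp(-dG / (R_KCAL * T))
    fraction_unfolded = K / (1 + K)
    signal_native = linear_baseline(T, *native_params)
    signal_unfolded = linear_baseline(T, *unfolded_params)
    return (1 - fraction_unfolded) * signal_native + fraction_unfolded * signal_unfolded


# Flat baselines at 1.0 (native) and 3.0 (unfolded); illustrative values only
flat_native, flat_unfolded = (1.0, 0.0), (3.0, 0.0)
signal_cold = two_state_signal_sketch(300.0, 330.0, 120.0, flat_native, flat_unfolded)
signal_tm = two_state_signal_sketch(330.0, 330.0, 120.0, flat_native, flat_unfolded)
signal_hot = two_state_signal_sketch(360.0, 330.0, 120.0, flat_native, flat_unfolded)
```

At Tm both states are equally populated, so the signal sits halfway between the two baselines; far below and far above Tm it approaches the native and unfolded baselines respectively.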
- pychemelt.utils.signals.two_state_thermal_unfold_curve(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0)[source]#
Two-state temperature unfolding (monomer). N ⇔ U
- Parameters:
T (array-like) – Temperature
C (array-like) – Oligomer sample concentration
Tm (float) – Temperature at which the equilibrium constant equals one
dHm (float) – Variation of enthalpy between the two considered states at Tm
p1_N (float) – baseline parameters for the native-state baseline
p2_N (float) – baseline parameters for the native-state baseline
p3_N (float) – baseline parameters for the native-state baseline
p4_N (float) – baseline parameters for the native-state baseline
p1_U (float) – baseline parameters for the unfolded-state baseline
p2_U (float) – baseline parameters for the unfolded-state baseline
p3_U (float) – baseline parameters for the unfolded-state baseline
p4_U (float) – baseline parameters for the unfolded-state baseline
baseline_N_fx (callable) – function to calculate the baseline for the native state
baseline_U_fx (callable) – function to calculate the baseline for the unfolded state
Cp (float, optional) – Variation of heat capacity between the two states (default: 0)
- Returns:
Signal at the given temperatures, given the parameters
- Return type:
numpy.ndarray
- pychemelt.utils.signals.two_state_thermal_unfold_curve_dimer(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0)[source]#
Two-state temperature unfolding (dimer). N2 ⇔ 2U
- Parameters:
T (array-like) – Temperature
C (array-like) – Oligomer sample concentration
Tm (float) – Temperature at which the equilibrium constant equals one
dHm (float) – Variation of enthalpy between the two considered states at Tm
p1_N (float) – baseline parameters for the native-state baseline
p2_N (float) – baseline parameters for the native-state baseline
p3_N (float) – baseline parameters for the native-state baseline
p4_N (float) – baseline parameters for the native-state baseline
p1_U (float) – baseline parameters for the unfolded-state baseline
p2_U (float) – baseline parameters for the unfolded-state baseline
p3_U (float) – baseline parameters for the unfolded-state baseline
p4_U (float) – baseline parameters for the unfolded-state baseline
baseline_N_fx (callable) – function to calculate the baseline for the native state
baseline_U_fx (callable) – function to calculate the baseline for the unfolded state
Cp (float, optional) – Variation of heat capacity between the two states (default: 0)
- Returns:
Signal at the given temperatures, given the parameters
- Return type:
numpy.ndarray
Notes
C is the total concentration (M) of the protein in dimer equivalent.
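The concentration enters the dimer model because the unfolded fraction of an N2 ⇔ 2U equilibrium depends on how much protein is present. A short derivation-based sketch, assuming K = [U]²/[N2] in molar units and C expressed in dimer equivalents (so [U] = 2·C·fu and [N2] = C·(1 − fu)); the function name is hypothetical and this is not taken from the library source.

```python
import math


def fraction_unfolded_dimer(K, C):
    """Unfolded fraction fu for N2 <-> 2U given K (M) and C (M, dimer equivalents)."""
    # Substituting [U] = 2*C*fu and [N2] = C*(1 - fu) into K = [U]^2 / [N2]
    # gives 4*C*fu^2 + K*fu - K = 0; the positive root is the physical one.
    return (-K + math.sqrt(K * K + 16.0 * K * C)) / (8.0 * C)


C = 10e-6  # 10 micromolar dimer, illustrative value
K = 1e-6   # illustrative equilibrium constant
fu = fraction_unfolded_dimer(K, C)

# Diluting the sample shifts the equilibrium towards the unfolded monomers
fu_diluted = fraction_unfolded_dimer(K, C / 10)
```

For the same K, the diluted sample has a larger unfolded fraction, which is why the apparent melting temperature of an oligomer depends on concentration.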
- pychemelt.utils.signals.two_state_thermal_unfold_curve_trimer(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0)[source]#
Two-state temperature unfolding (trimer). N3 ⇔ 3U
- Parameters:
T (array-like) – Temperature
C (array-like) – Oligomer sample concentration
Tm (float) – Temperature at which the equilibrium constant equals one
dHm (float) – Variation of enthalpy between the two considered states at Tm
p1_N (float) – baseline parameters for the native-state baseline
p2_N (float) – baseline parameters for the native-state baseline
p3_N (float) – baseline parameters for the native-state baseline
p4_N (float) – baseline parameters for the native-state baseline
p1_U (float) – baseline parameters for the unfolded-state baseline
p2_U (float) – baseline parameters for the unfolded-state baseline
p3_U (float) – baseline parameters for the unfolded-state baseline
p4_U (float) – baseline parameters for the unfolded-state baseline
baseline_N_fx (callable) – function to calculate the baseline for the native state
baseline_U_fx (callable) – function to calculate the baseline for the unfolded state
Cp (float, optional) – Variation of heat capacity between the two states (default: 0)
- Returns:
Signal at the given temperatures, given the parameters
- Return type:
numpy.ndarray
Notes
C is the total concentration (M) of the protein in trimer equivalent.
- pychemelt.utils.signals.two_state_thermal_unfold_curve_tetramer(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0, extra_arg=None)[source]#
Two-state temperature unfolding (tetramer). N4 ⇔ 4U
- Parameters:
T (array-like) – Temperature
C (array-like) – Oligomer sample concentration
Tm (float) – Temperature at which the equilibrium constant equals one
dHm (float) – Variation of enthalpy between the two considered states at Tm
p1_N (float) – baseline parameters for the native-state baseline
p2_N (float) – baseline parameters for the native-state baseline
p3_N (float) – baseline parameters for the native-state baseline
p4_N (float) – baseline parameters for the native-state baseline
p1_U (float) – baseline parameters for the unfolded-state baseline
p2_U (float) – baseline parameters for the unfolded-state baseline
p3_U (float) – baseline parameters for the unfolded-state baseline
p4_U (float) – baseline parameters for the unfolded-state baseline
baseline_N_fx (callable) – function to calculate the baseline for the native state
baseline_U_fx (callable) – function to calculate the baseline for the unfolded state
Cp (float, optional) – Variation of heat capacity between the two states (default: 0)
extra_arg (None, optional) – Not used but present for compatibility
- Returns:
Signal at the given temperatures, given the parameters
- Return type:
numpy.ndarray
Notes
C is the total concentration (M) of the protein in tetramer equivalent.
pychemelt.utils.svd module#
Module containing functions to perform Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) on spectral data, along with utilities for manipulating basis spectra and coefficients.
Author: Osvaldo Burastero
- pychemelt.utils.svd.apply_svd(X)[source]#
Perform Singular Value Decomposition (SVD) on the input data matrix X.
- Parameters:
X (numpy array of shape (n_wavelengths, n_measurements)) – The input data matrix to decompose.
- Returns:
explained_variance (numpy array) – The cumulative explained variance for each component.
basis_spectra (numpy array) – The left singular vectors (U matrix) representing the basis spectra.
coefficients (numpy array) – The coefficients associated with each basis spectrum.
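A rough sketch of how these three outputs relate to a standard SVD, using `numpy.linalg.svd`. The percentage scaling of the explained variance and the definition of the coefficients as the singular values times the right singular vectors are assumptions made for illustration; `svd_sketch` is a hypothetical name.

```python
import numpy as np


def svd_sketch(X):
    """Decompose X (n_wavelengths x n_measurements) into basis spectra and coefficients."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Cumulative share of the squared singular values, in percent
    explained_variance = np.cumsum(s ** 2) / np.sum(s ** 2) * 100
    # Scale the right singular vectors so that U @ coefficients reproduces X
    coefficients = np.diag(s) @ Vt
    return explained_variance, U, coefficients


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # synthetic data: 50 wavelengths, 8 spectra
explained_variance, basis_spectra, coefficients = svd_sketch(X)
```

With all components kept, `basis_spectra @ coefficients` recovers `X` up to floating-point error.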
- pychemelt.utils.svd.filter_basis_spectra(explained_variance, basis_spectra_all, coefficients_all, explained_variance_threshold=99)[source]#
Filter the basis spectra and coefficients based on the explained variance threshold.
- Parameters:
explained_variance (numpy array) – The cumulative explained variance for each component.
basis_spectra_all (numpy array) – The left singular vectors (U matrix) representing the basis spectra.
coefficients_all (numpy array) – The coefficients associated with each basis spectrum.
explained_variance_threshold (float, optional) – The threshold for explained variance to filter components. Default is 99.
- Returns:
basis_spectra (numpy array) – The filtered basis spectra.
coefficients (numpy array) – The filtered coefficients.
k (int) – The number of components that meet the explained variance threshold.
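The selection of k can be sketched as picking the smallest number of components whose cumulative explained variance reaches the threshold. Whether the library uses "greater than or equal" or a strict comparison is an assumption here, and `n_components_for_threshold` is a hypothetical name.

```python
def n_components_for_threshold(explained_variance, threshold=99):
    """Smallest k such that the cumulative explained variance reaches the threshold."""
    for k, value in enumerate(explained_variance, start=1):
        if value >= threshold:
            return k
    return len(explained_variance)  # threshold never reached: keep everything


# Cumulative explained variance in percent (made-up values)
cumulative = [85.0, 97.5, 99.4, 99.9, 100.0]
k = n_components_for_threshold(cumulative)
```

With numpy arrays laid out as in apply_svd, the filtered outputs would then be slices such as `basis_spectra_all[:, :k]` and `coefficients_all[:k, :]`.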
- pychemelt.utils.svd.align_basis_spectra_and_coefficients(X, basis_spectra, coefficients)[source]#
Align the basis spectra peaks to the original data.
- Parameters:
X (numpy array of shape (n_samples, n_features)) – The input data matrix.
basis_spectra (numpy array) – The basis spectra obtained from SVD.
coefficients (numpy array) – The coefficients associated with each basis spectrum.
- Returns:
basis_spectra (numpy array) – The aligned basis spectra.
coefficients (numpy array) – The adjusted coefficients.
- pychemelt.utils.svd.angle_from_cathets(adjacent_leg, opposite_leg)[source]#
Calculate the angle between the hypotenuse and the adjacent leg of a right triangle.
- Parameters:
adjacent_leg (float) – Length of the adjacent leg.
opposite_leg (float) – Length of the opposite leg.
- Returns:
angle_in_radians – Angle in radians between the hypotenuse and the adjacent leg.
- Return type:
float
- pychemelt.utils.svd.get_2d_counterclockwise_rot_matrix(angle_in_radians)[source]#
Obtain the rotation matrix for a 2D coordinate system using a counterclockwise direction.
- Parameters:
angle_in_radians (float) – Angle in radians for the rotation.
- Returns:
rotM – 2x2 rotation matrix.
- Return type:
numpy array
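For reference, the standard counterclockwise 2D rotation matrix is [[cos θ, −sin θ], [sin θ, cos θ]]. The sketch below builds it with plain lists to stay dependency-free; the function names are hypothetical, and the library returns a numpy array instead.

```python
import math


def rotation_matrix_2d(angle_in_radians):
    """Counterclockwise 2x2 rotation matrix as nested lists."""
    c, s = math.cos(angle_in_radians), math.sin(angle_in_radians)
    return [[c, -s], [s, c]]


def apply_matrix(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Rotating the x unit vector by 90 degrees counterclockwise gives the y unit vector
rotated = apply_matrix(rotation_matrix_2d(math.pi / 2), [1.0, 0.0])
```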
- pychemelt.utils.svd.get_3d_counterclockwise_rot_matrix_around_z_axis(angle_in_radians)[source]#
Obtain the rotation matrix for a 3D coordinate system around the z axis using a counterclockwise direction.
- Parameters:
angle_in_radians (float) – Angle in radians for the rotation.
- Returns:
rotM – 3x3 rotation matrix.
- Return type:
numpy array
- pychemelt.utils.svd.get_3d_clockwise_rot_matrix_around_y_axis(angle_in_radians)[source]#
Obtain the rotation matrix for a 3D coordinate system around the y axis using a clockwise direction.
- Parameters:
angle_in_radians (float) – Angle in radians for the rotation.
- Returns:
rotM – 3x3 rotation matrix.
- Return type:
numpy array
- pychemelt.utils.svd.rotate_two_basis_spectra(X, basis_spectra, pca_based=False)[source]#
Create a new set of basis spectra using a linear combination of the first and second basis spectra
- Parameters:
X (numpy array) – The raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.
basis_spectra (numpy array) – The matrix containing the set of basis spectra.
pca_based (bool, optional) – Boolean to decide if we need to center the matrix X. Default is False.
- Returns:
basis_spectra_new (numpy array) – The new set of basis spectra.
coefficients (numpy array) – The new set of associated coefficients.
- pychemelt.utils.svd.rotate_three_basis_spectra(X, basis_spectra, pca_based=False)[source]#
Create a new set of basis spectra using a linear combination of the first, second and third basis spectra
- Parameters:
X (numpy array) – The raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.
basis_spectra (numpy array) – The matrix containing the set of basis spectra.
pca_based (bool, optional) – Boolean to decide if we need to center the matrix X. Default is False.
- Returns:
basis_spectra_new (numpy array) – The new set of basis spectra.
coefficients_subset (numpy array) – The new set of associated coefficients.
- pychemelt.utils.svd.reconstruct_spectra(basis_spectra, coefficients, X=None, pca_based=False)[source]#
Reconstruct the original spectra based on the set of basis spectra and the associated coefficients
- Parameters:
basis_spectra (numpy array) – The matrix containing the set of basis spectra.
coefficients (numpy array) – The associated coefficients of each basis spectrum.
X (numpy array, optional) – Only used if pca_based is True. X is the raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.
pca_based (bool, optional) – Boolean to decide if we need to extract the mean from the X raw data matrix. Default is False.
- Returns:
fitted (numpy array) – The reconstructed matrix, which should be close to the original raw data.
- pychemelt.utils.svd.explained_variance_from_orthogonal_vectors(vectors, coefficients, total_variance)[source]#
Useful to get the percentage of variance, not in the coordinate space provided by PCA/SVD, but against a different set of (rotated) vectors.
- Parameters:
vectors (numpy array) – The set of orthogonal vectors.
coefficients (numpy array) – The associated coefficients of each orthogonal vector.
total_variance (float) – The total variance of the original data (mean subtracted if we performed PCA…).
- Returns:
explained_variance – The amount of explained variance by each orthogonal vector.
- Return type:
list
- pychemelt.utils.svd.apply_pca(X)[source]#
Perform Principal Component Analysis (PCA) on the input data matrix X.
- Parameters:
X (numpy array of shape (n_wavelengths, n_measurements)) – The input data matrix to decompose.
- Returns:
cum_sum_eigenvalues (numpy array) – The cumulative explained variance for each principal component.
principal_components (numpy array) – The principal components (eigenvectors) representing the basis spectra.
coefficients (numpy array) – The coefficients associated with each principal component.
- pychemelt.utils.svd.recalc_explained_variance(basis_spectra, coefficients, X, pca_based=False)[source]#
Recalculate the explained variance of a set of basis spectra and associated coefficients.
- Parameters:
basis_spectra (numpy array) – The basis spectra.
coefficients (numpy array) – The associated coefficients of each basis spectrum.
X (numpy array) – The raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.
pca_based (bool, optional) – Boolean to decide if we need to center the matrix X. Default is False.
- Returns:
explained_variance – The cumulative explained variance for each component.
- Return type:
numpy array