pychemelt.utils package#

Submodules#

pychemelt.utils.constants module#

pychemelt.utils.files module#

This module contains helper functions to parse Differential Scanning Fluorimetry files from different instrument providers.

Author: Osvaldo Burastero

All functions that import files should return:

  • signal_data_dic – dictionary with the signal data, one entry per signal

  • temp_data_dic – dictionary with the temperature data, one entry per signal

  • conditions – list with the names of the samples

  • signals – list with the names of the signals

A signal can be “350nm”, “330nm”, “Scattering”, “Ratio”, “Turbidity”, “Ratio 350nm/330nm”, etc. Each list stored in signal_data_dic and temp_data_dic should have the same length as conditions.
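The return convention described above can be illustrated with a small consistency check. This is only a sketch: check_loader_output is a hypothetical helper, not part of pychemelt, and plain lists stand in for the numpy arrays that the real loaders return.

```python
def check_loader_output(signal_data_dic, temp_data_dic, conditions, signals):
    """Return True if the four objects follow the loader convention."""
    for name in signals:
        # Every signal needs an entry in both dictionaries
        if name not in signal_data_dic or name not in temp_data_dic:
            return False
        # One signal curve and one temperature vector per condition
        if len(signal_data_dic[name]) != len(conditions):
            return False
        if len(temp_data_dic[name]) != len(conditions):
            return False
    return True


# Toy data (made up): two conditions, one signal, three points per curve
conditions = ["Sample A", "Sample B"]
signals = ["350nm"]
signal_data_dic = {"350nm": [[1.0, 1.2, 1.9], [1.1, 1.3, 2.0]]}
temp_data_dic = {"350nm": [[25.0, 26.0, 27.0], [25.0, 26.0, 27.0]]}

ok = check_loader_output(signal_data_dic, temp_data_dic, conditions, signals)
```

A check of this kind can be useful when writing a loader for a new instrument, since the rest of the package presumably relies on the lengths matching.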

pychemelt.utils.files.load_csv_file(file)[source]#

Load a CSV file containing temperature and signal columns and return structured data.

Parameters:

file (str) – Path to the csv file

Returns:

  • signal_data_dic (dict) – Dictionary mapping signal names to lists of 1D numpy arrays (one array per condition)

  • temp_data_dic (dict) – Dictionary mapping signal names to lists of temperature arrays corresponding to the signals

  • conditions (list) – List of condition names

  • signals (numpy.ndarray) – Array of signal name strings

pychemelt.utils.files.load_aunty_xlsx(file_path)[source]#

Load AUNTY-format multi-sheet Excel file where each sheet is a condition.

Parameters:

file_path (str) – Path to the AUNTY xlsx file

pychemelt.utils.files.load_quantstudio_txt(QSfile)[source]#

Load QuantStudio TXT files (.txt) exported from QuantStudio instruments.

Parameters:

QSfile (str) – Path to the QuantStudio txt file

Returns:

  • signal_data_dic (dict) – Dictionary with signal data (key: ‘Fluorescence’)

  • temp_data_dic (dict) – Dictionary with temperature arrays per condition

  • conditions (list) – List of condition names (well identifiers)

  • signals (numpy.ndarray) – Array with signal name(s)

pychemelt.utils.files.load_thermofluor_xlsx(thermofluor_file)[source]#

Load DSF Thermofluor xls file and extract data.

Parameters:

thermofluor_file (str) – Path to the xls file

Returns:

  • signal_data_dic (dict) – Dictionary with signal data

  • temp_data_dic (dict) – Dictionary with temperature data

  • conditions (list) – List of conditions

pychemelt.utils.files.load_nanoDSF_xlsx(processed_dsf_file)[source]#

Load nanotemper processed xlsx file and extract relevant data.

Parameters:

processed_dsf_file (str) – Path to the processed xlsx file

Returns:

  • signal_data_dic (dict) – Dictionary with signal data

  • temp_data_dic (dict) – Dictionary with temperature data

  • conditions (list) – List of conditions

  • signals (numpy.ndarray) – Array of signal names

pychemelt.utils.files.load_panta_xlsx(pantaFile)[source]#

Load the xlsx file generated by a Prometheus Panta instrument.

Parameters:

pantaFile (str) – Path to the xlsx file

Returns:

  • signal_data_dic (dict) – Dictionary with signal data

  • temp_data_dic (dict) – Dictionary with temperature data

  • conditions (list) – List of conditions

  • signals (numpy.ndarray) – List of signal names, such as 330nm and 350nm

pychemelt.utils.files.load_uncle_multi_channel(uncle_file)[source]#

Function to load the data from the UNCLE instrument.

Parameters:

uncle_file (str) – Path to the xlsx file

Returns:

  • signal_data_dic (dict) – Dictionary with signal data (keys: wavelength strings like ‘350 nm’)

  • temp_data_dic (dict) – Dictionary with temperature arrays per condition

  • conditions (list) – List of sample names

  • signals (list) – List of wavelength strings

pychemelt.utils.files.load_mx3005p_txt(filename)[source]#

Load Agilent MX3005P qPCR txt file and extract data.

Parameters:

filename (str) – Path to the MX3005P txt file. The second column has the fluorescence data, and the third column the temperature. Wells are separated by rows containing a sentence like this one: ‘Segment 2 Plateau 1 Well 1’

Returns:

  • signal_data_dic (dict) – Dictionary with signal data

  • temp_data_dic (dict) – Dictionary with temperature data

  • conditions (list) – List of conditions (well numbers)

  • signals (numpy.ndarray) – List of signal names

pychemelt.utils.files.detect_file_type(file)[source]#

Detect the type of file based on its extension and content.

Parameters:

file (str) – Path to the file

Returns:

Type of file (e.g., ‘supr’, ‘csv’, ‘prometheus’, ‘panta’, ‘uncle’, ‘mx3005p’, ‘quantstudio’, etc.) or None if unknown

Return type:

str or None

pychemelt.utils.files.detect_encoding(file_path)[source]#

Detect the encoding of a file by trying common encodings.

Parameters:

file_path (str) – Path to the file

Returns:

Detected encoding or the string ‘Unknown encoding’

Return type:

str
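The "try common encodings" approach can be sketched as follows. The function name guess_encoding and the candidate list are assumptions made for illustration; the encodings that pychemelt actually tries are not listed in this documentation.

```python
import os
import tempfile


def guess_encoding(file_path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file."""
    for encoding in candidates:
        try:
            with open(file_path, "r", encoding=encoding) as handle:
                handle.read()
            return encoding
        except UnicodeError:
            # Decoding failed, try the next candidate
            continue
    return "Unknown encoding"


# Demo file: the degree sign is the single byte 0xB0, which is invalid UTF-8
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"Temperature (\xb0C)\t25.0\n")
    demo_path = tmp.name

detected = guess_encoding(demo_path)
strict = guess_encoding(demo_path, candidates=("ascii",))
os.remove(demo_path)
```

The order of the candidates matters: latin-1 can decode any byte sequence, so in this sketch it is placed last as a fallback.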

pychemelt.utils.files.read_jasco_thermal_ramp(file)[source]#

Read the data from a JASCO file containing a thermal ramp.

The data is given in chunks:

Channel 1

4.94 14.93 25.08 35.02 45.03 55.07 64.99 75.04 85.08 95.03

250 -0.310564 -0.112003 0.0199744 -0.217282 -0.238716 -0.173046 0.00129784 -0.394731 -0.687165 -1.40543

Parameters:

file (str) – Path to the JASCO thermal ramp file

Returns:

  • signal_data_dic (dict) – Dictionary with signal data (key: wavelength string like ‘250 nm’)

  • temp_data_dic (dict) – Dictionary with temperature arrays per condition (only one condition in this case)

  • conditions (list) – File name as condition

  • wavelength_data (list) – List of wavelength strings (e.g., ‘250 nm’, ‘255 nm’, etc.)
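A minimal sketch of how one such chunk could be parsed, assuming the layout of the example above: a channel header, one line of temperatures, then one line per wavelength whose first field is the wavelength. parse_ramp_chunk is a hypothetical helper and real JASCO files may contain additional header lines that this sketch does not handle.

```python
def parse_ramp_chunk(text):
    """Parse one 'Channel' chunk into temperatures and per-wavelength signals."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # lines[0] is the channel header, e.g. "Channel 1"
    temperatures = [float(v) for v in lines[1].split()]
    signal_by_wavelength = {}
    for row in lines[2:]:
        fields = row.split()
        key = f"{fields[0]} nm"  # e.g. "250 nm"
        values = [float(v) for v in fields[1:]]
        if len(values) != len(temperatures):
            raise ValueError(f"Row for {key} does not match the temperature axis")
        signal_by_wavelength[key] = values
    return temperatures, signal_by_wavelength


chunk = """Channel 1
4.94 14.93 25.08 35.02 45.03 55.07 64.99 75.04 85.08 95.03
250 -0.310564 -0.112003 0.0199744 -0.217282 -0.238716 -0.173046 0.00129784 -0.394731 -0.687165 -1.40543
"""

temps, ramp_signals = parse_ramp_chunk(chunk)
```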

pychemelt.utils.fitting module#

This module contains helper functions to fit unfolding data.

Author: Osvaldo Burastero

pychemelt.utils.fitting.fit_line_robust(x, y)[source]#

Fit a line to the data using robust fitting

Parameters:
  • x (array-like) – x data

  • y (array-like) – y data

Returns:

  • m (float) – Slope of the fitted line

  • b (float) – Intercept of the fitted line

pychemelt.utils.fitting.fit_quadratic_robust(x, y)[source]#

Fit a quadratic equation to the data using robust fitting

Parameters:
  • x (array-like) – x data

  • y (array-like) – y data

Returns:

  • a (float) – Quadratic coefficient of the fitted polynomial

  • b (float) – Linear coefficient of the fitted polynomial

  • c (float) – Constant coefficient of the fitted polynomial

pychemelt.utils.fitting.fit_exponential_robust(x, y)[source]#

Fit an exponential function to the data using robust fitting.

Notes

Temperatures should be shifted to the reference (Tref) before calling this function.

Parameters:
  • x (array-like) – x data

  • y (array-like) – y data

Returns:

  • a (float) – Baseline

  • c (float) – Pre-exponential factor

  • alpha (float) – Exponential factor

pychemelt.utils.fitting.fit_thermal_unfolding(list_of_temperatures, list_of_signals, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, Cp, list_of_oligomer_conc=None)[source]#

Fit the thermal unfolding profile of many curves at the same time.

This performs global fitting of shared thermodynamic parameters with per-curve baselines.

Parameters:
  • list_of_temperatures (list of array-like) – List of temperature arrays for each dataset

  • list_of_signals (list of array-like) – List of signal arrays for each dataset

  • initial_parameters (array-like) – Initial guess for the parameters

  • low_bounds (array-like) – Lower bounds for the parameters

  • high_bounds (array-like) – Upper bounds for the parameters

  • signal_fx (callable) – Function to calculate the signal based on the parameters

  • baseline_native_fx (callable) – function to calculate the native state baseline

  • baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline

  • Cp (float) – Heat capacity change (passed to signal_fx)

  • list_of_oligomer_conc (list, optional) – List of oligomer concentrations for each dataset (if applicable)

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix of the fitted parameters

  • predicted_lst (list of numpy.ndarray) – Predicted signals for each dataset based on the fitted parameters

pychemelt.utils.fitting.fit_tc_unfolding_single_slopes(list_of_temperatures, list_of_signals, denaturant_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, list_of_oligomer_conc=None, fit_m1=False, cp_value=None, tm_value=None, dh_value=None, method='least_sq')[source]#

Vectorized and optimized version of global thermal unfolding fitting.

Parameters:
  • list_of_temperatures (list of array-like) – Temperature arrays for each dataset

  • list_of_signals (list of array-like) – Signal arrays for each dataset

  • denaturant_concentrations (list) – Denaturant concentrations (one per dataset)

  • initial_parameters (array-like) – Initial guess for parameters

  • low_bounds (array-like) – Lower bounds for parameters

  • high_bounds (array-like) – Upper bounds for parameters

  • signal_fx (callable) – Signal model function

  • baseline_native_fx (callable) – function to calculate the native state baseline

  • baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline

  • list_of_oligomer_conc (list, optional) – Oligomer concentrations per dataset

  • fit_m1 (bool, optional) – Whether to fit temperature dependence of m-value

  • cp_value (float or None, optional) – Optional fixed thermodynamic parameters

  • tm_value (float or None, optional) – Optional fixed thermodynamic parameters

  • dh_value (float or None, optional) – Optional fixed thermodynamic parameters

  • method (str, optional) – Optimization method (‘least_sq’ or ‘curve_fit’)

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix

  • predicted_lst (list of numpy.ndarray) – Predicted signals per dataset

pychemelt.utils.fitting.fit_tc_unfolding_shared_slopes_many_signals(list_of_temperatures, list_of_signals, signal_ids, denaturant_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, list_of_oligomer_conc=None, fit_m1=False, cp_value=None, tm_value=None, dh_value=None)[source]#

Vectorized fitting of thermochemical unfolding curves for multiple signal types sharing thermodynamic parameters and slopes, using least_squares.

Parameters:
  • list_of_temperatures (list of array-like) – Temperature arrays for each dataset

  • list_of_signals (list of array-like) – Signal arrays for each dataset

  • signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)

  • denaturant_concentrations (list) – Denaturant concentrations for each dataset (flattened across signals)

  • initial_parameters (array-like) – Initial guess for the parameters

  • low_bounds (array-like) – Lower bounds for the parameters

  • high_bounds (array-like) – Upper bounds for the parameters

  • signal_fx (callable) – Signal model function

  • baseline_native_fx (callable) – function to calculate the baseline for the native state

  • baseline_unfolded_fx (callable) – function to calculate the baseline for the unfolded state

  • list_of_oligomer_conc (list, optional) – Oligomer concentrations per dataset

  • fit_m1 (bool, optional) – Whether to fit temperature dependence of m-value

  • cp_value (float or None, optional) – Optional fixed thermodynamic parameters

  • tm_value (float or None, optional) – Optional fixed thermodynamic parameters

  • dh_value (float or None, optional) – Optional fixed thermodynamic parameters

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix

  • predicted_lst (list of numpy.ndarray) – Predicted signals per dataset

pychemelt.utils.fitting.fit_tc_unfolding_many_signals(list_of_temperatures, list_of_signals, signal_ids, denaturant_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, oligomer_concentrations=None, fit_m1=False, model_scale_factor=False, scale_factor_exclude_ids=[], cp_value=None, fit_native_den_slope=True, fit_unfolded_den_slope=True)[source]#

Fit thermochemical unfolding curves for many signals (optimized variant).

Parameters:
  • list_of_temperatures (list of array-like) – Temperature arrays for each dataset.

  • list_of_signals (list of array-like) – Signal arrays for each dataset.

  • signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)

  • denaturant_concentrations (list) – Denaturant concentrations for each dataset (flattened across signals)

  • initial_parameters (array-like) – Initial guess for the parameters

  • low_bounds (array-like) – Lower bounds for the parameters

  • high_bounds (array-like) – Upper bounds for the parameters

  • signal_fx (callable) – Signal model function

  • baseline_native_fx (callable) – function to calculate the native state baseline

  • baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline

  • oligomer_concentrations (list, optional) – Oligomer concentrations per dataset (used by oligomeric models)

  • fit_m1 (bool, optional) – Whether to include and fit temperature dependence of the m-value (m1)

  • model_scale_factor (bool, optional) – If True, include a per-denaturant concentration scale factor to account for intensity differences

  • scale_factor_exclude_ids (list, optional) – IDs of scale factors to exclude / fix to 1

  • cp_value (float or None, optional) – If provided, Cp is fixed to this value and not fitted

  • fit_native_den_slope (bool, optional) – Whether to fit the denaturant dependence of the native state baseline

  • fit_unfolded_den_slope (bool, optional) – Whether to fit the denaturant dependence of the unfolded state baseline

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix

  • predicted_lst (list of numpy.ndarray) – Predicted signals per dataset

pychemelt.utils.fitting.fit_oligomer_unfolding_single_slopes(list_of_temperatures, list_of_signals, oligomer_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, cp_value=None, tm_value=None, dh_value=None)[source]#

Vectorized and optimized version of global thermal unfolding fitting of oligomers.

Parameters:
  • list_of_temperatures (list of array-like) – Temperature arrays for each dataset

  • list_of_signals (list of array-like) – Signal arrays for each dataset

  • oligomer_concentrations (list) – sample concentrations of the oligomeric complex (one per dataset)

  • initial_parameters (array-like) – Initial guess for parameters

  • low_bounds (array-like) – Lower bounds for parameters

  • high_bounds (array-like) – Upper bounds for parameters

  • signal_fx (callable) – Signal model function

  • baseline_native_fx (callable) – function to calculate the native state baseline

  • baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline

  • cp_value (float or None, optional) – Optional fixed thermodynamic parameters

  • tm_value (float or None, optional) – Optional fixed thermodynamic parameters

  • dh_value (float or None, optional) – Optional fixed thermodynamic parameters

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix

  • predicted_lst (list of numpy.ndarray) – Predicted signals per dataset

pychemelt.utils.fitting.fit_oligomer_unfolding_shared_slopes_many_signals(list_of_temperatures, list_of_signals, signal_ids, oligomer_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, cp_value=None, tm_value=None, dh_value=None)[source]#

Vectorized fitting of oligomer thermal unfolding curves for multiple signal types sharing thermodynamic parameters and slopes, using least_squares.

Parameters:
  • list_of_temperatures (list of array-like) – Temperature arrays for each dataset.

  • list_of_signals (list of array-like) – Signal arrays for each dataset.

  • signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)

  • oligomer_concentrations (list) – sample concentrations of the oligomeric complex for each dataset (flattened across signals)

  • initial_parameters (array-like) – Initial guess for the parameters

  • low_bounds (array-like) – Lower bounds for the parameters

  • high_bounds (array-like) – Upper bounds for the parameters

  • signal_fx (callable) – Signal model function

  • baseline_native_fx (callable) – function to calculate the baseline for the native state

  • baseline_unfolded_fx (callable) – function to calculate the baseline for the unfolded state

  • cp_value (float or None, optional) – Optional fixed thermodynamic parameters

  • tm_value (float or None, optional) – Optional fixed thermodynamic parameters

  • dh_value (float or None, optional) – Optional fixed thermodynamic parameters

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix

  • predicted_lst (list of numpy.ndarray) – Predicted signals per dataset

pychemelt.utils.fitting.fit_oligomer_unfolding_many_signals(list_of_temperatures, list_of_signals, signal_ids, oligomer_concentrations, initial_parameters, low_bounds, high_bounds, signal_fx, baseline_native_fx, baseline_unfolded_fx, model_scale_factor=False, scale_factor_exclude_ids=[], cp_value=None, fit_native_olig_slope=True, fit_unfolded_olig_slope=True)[source]#

Fit thermal unfolding curves of oligomers for many signals (optimized variant).

Parameters:
  • list_of_temperatures (list of array-like) – Temperature arrays for each dataset

  • list_of_signals (list of array-like) – Signal arrays for each dataset

  • signal_ids (list of int) – Signal-type id for each dataset (0..n_signals-1)

  • oligomer_concentrations (list) – sample concentrations of the oligomeric complex for each dataset (flattened across signals)

  • initial_parameters (array-like) – Initial guess for the parameters

  • low_bounds (array-like) – Lower bounds for the parameters

  • high_bounds (array-like) – Upper bounds for the parameters

  • signal_fx (callable) – Signal model function

  • baseline_native_fx (callable) – function to calculate the native state baseline

  • baseline_unfolded_fx (callable) – function to calculate the unfolded state baseline

  • model_scale_factor (bool, optional) – If True, include a per-oligomeric concentration scale factor to account for intensity differences

  • scale_factor_exclude_ids (list, optional) – IDs of scale factors to exclude / fix to 1

  • cp_value (float or None, optional) – If provided, Cp is fixed to this value and not fitted

  • fit_native_olig_slope (bool, optional) – Whether to fit the dependence of the slopes of the baselines

  • fit_unfolded_olig_slope (bool, optional) – Whether to fit the dependence of the slopes of the baselines

Returns:

  • global_fit_params (numpy.ndarray) – Fitted global parameters

  • cov (numpy.ndarray) – Covariance matrix

  • predicted_lst (list of numpy.ndarray) – Predicted signals per dataset

pychemelt.utils.fractions module#

This module contains helper functions to obtain the amount of folded/intermediate/unfolded (etc.) protein.

Author: Osvaldo Burastero

pychemelt.utils.fractions.fn_two_state_monomer(K)[source]#

Given the equilibrium constant K of N <-> U, return the fraction of folded protein.

Parameters:

K (float) – Equilibrium constant of the reaction N <-> U

Returns:

Fraction of folded protein

Return type:

float
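For a two-state monomer the folded fraction follows directly from the mass balance. A minimal sketch, assuming K is defined as [U]/[N]; the function name fraction_folded_monomer is illustrative and is not the pychemelt implementation.

```python
def fraction_folded_monomer(K):
    """Folded fraction for N <-> U with K = [U] / [N].

    Since [N] + [U] is the total protein, fn = [N] / ([N] + [U]) = 1 / (1 + K).
    """
    return 1.0 / (1.0 + K)


fn_at_midpoint = fraction_folded_monomer(1.0)  # K = 1 at the melting temperature
fn_native = fraction_folded_monomer(0.0)       # K = 0: everything is folded
```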

pychemelt.utils.fractions.fu_two_state_dimer(K, C)[source]#

Given the equilibrium constant K, of N2 <-> 2U, and the concentration of dimer equivalent C, return the fraction of unfolded protein

Parameters:
  • K (float) – Equilibrium constant of the reaction N2 <-> 2U

  • C (float) – Total concentration of the protein in dimer equivalents

Returns:

Fraction of unfolded protein

Return type:

float
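For the dimer the unfolded fraction depends on concentration and is the positive root of a quadratic. The sketch below assumes K = [U]^2/[N2] and that C counts dimer equivalents, so that [N2] + [U]/2 = C and fu = [U]/(2C). The function name is illustrative, and pychemelt may use a different but equivalent form.

```python
import math


def fraction_unfolded_dimer(K, C):
    """Unfolded fraction for N2 <-> 2U, with C in dimer equivalents.

    Substituting [U] = 2*C*fu and [N2] = C*(1 - fu) into K = [U]^2 / [N2]
    gives 4*C*fu^2 + K*fu - K = 0; the positive root is returned.
    """
    return (-K + math.sqrt(K * K + 16.0 * C * K)) / (8.0 * C)


K_demo = 1e-6  # equilibrium constant (molar), made-up value
C_demo = 1e-6  # total concentration in dimer equivalents (molar), made-up value
fu_demo = fraction_unfolded_dimer(K_demo, C_demo)

# Consistency check: recompute K from the returned fraction
K_back = 4.0 * C_demo * fu_demo ** 2 / (1.0 - fu_demo)
```

Unlike the monomer case, diluting the sample at constant K increases the unfolded fraction, which is why the oligomer fitting functions take the concentration as an input.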

pychemelt.utils.fractions.fu_two_state_trimer(K, C)[source]#

Given the equilibrium constant K, of N3 <-> 3U, and the concentration of trimer equivalent C, return the fraction of unfolded protein

Parameters:
  • K (float) – Equilibrium constant of the reaction N3 <-> 3U

  • C (float) – Total concentration of the protein in trimer equivalents

Returns:

Fraction of unfolded protein

Return type:

float

pychemelt.utils.fractions.fu_two_state_tetramer(K, C)[source]#

Given the equilibrium constant K, of N4 <-> 4U, and the concentration of tetramer equivalent C, return the fraction of unfolded protein

Parameters:
  • K (float) – Equilibrium constant of the reaction N4 <-> 4U

  • C (float) – Total concentration of the protein in tetramer equivalents

Returns:

Fraction of unfolded protein

Return type:

float

pychemelt.utils.math module#

This module contains helper functions for mathematical operations.

Author: Osvaldo Burastero

pychemelt.utils.math.temperature_to_kelvin(T)[source]#

Convert temperature from Celsius to Kelvin if necessary.

Parameters:

T (array-like) – Temperature values

Returns:

Temperature values in Kelvin

Return type:

array-like

pychemelt.utils.math.temperature_to_celsius(T)[source]#

Convert temperature from Kelvin to Celsius if necessary.

Parameters:

T (array-like) – Temperature values

Returns:

Temperature values in Celsius

Return type:

array-like

pychemelt.utils.math.shift_temperature(T)[source]#

Shift temperature to be relative to Tref_cst in Kelvin.

Parameters:

T (array-like) – Temperature values

Returns:

Shifted temperature values

Return type:

array-like

pychemelt.utils.math.constant_baseline(dt, d, den_slope, a, *args)[source]#

Baseline function with no dependence on temperature and dependence on denaturant concentration

Parameters:
  • dt (float) – delta temperature, not used here but required for compatibility with other baseline functions

  • d (float) – denaturant concentration

  • den_slope (float) – linear dependence of signal on denaturant concentration

  • a (float) – intercept of the baseline

Returns:

Baseline signal

Return type:

float

pychemelt.utils.math.linear_baseline(dt, d, den_slope, a, b, *args)[source]#

Baseline function with linear dependence on temperature and linear dependence on denaturant concentration

Parameters:
  • dt (float) – delta temperature (temperature relative to the reference temperature)

  • d (float) – denaturant concentration

  • den_slope (float) – linear dependence of signal on denaturant concentration

  • a (float) – intercept of the baseline

  • b (float) – linear dependence of signal on temperature

Returns:

Baseline signal

Return type:

float

pychemelt.utils.math.quadratic_baseline(dt, d, den_slope, a, b, c)[source]#

Baseline function with quadratic dependence on temperature and linear dependence on denaturant concentration

Parameters:
  • dt (float) – delta temperature (temperature relative to the reference temperature)

  • d (float) – denaturant concentration

  • den_slope (float) – linear dependence of signal on denaturant concentration

  • a (float) – intercept of the baseline

  • b (float) – linear dependence of signal on temperature

  • c (float) – quadratic dependence of signal on temperature

Returns:

Baseline signal

Return type:

float
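The baseline docstrings describe which dependencies each baseline has, but not the exact expression. The following is a sketch under the assumption that the terms are simply added (intercept, polynomial terms in dt, and a linear denaturant term). Both the function name and the additive form are assumptions and should be checked against the pychemelt source.

```python
def quadratic_baseline_sketch(dt, d, den_slope, a, b, c):
    """Assumed additive form of a quadratic baseline.

    dt        : temperature relative to the reference temperature
    d         : denaturant concentration
    den_slope : linear dependence on denaturant concentration
    a, b, c   : intercept, linear and quadratic temperature coefficients
    """
    return a + b * dt + c * dt ** 2 + den_slope * d


# Setting c = 0 gives a linear baseline; b = c = 0 gives a constant one
quad_value = quadratic_baseline_sketch(2.0, 1.0, 0.5, 10.0, 1.0, 0.25)
linear_value = quadratic_baseline_sketch(2.0, 1.0, 0.5, 10.0, 1.0, 0.0)
constant_value = quadratic_baseline_sketch(2.0, 1.0, 0.5, 10.0, 0.0, 0.0)
```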

pychemelt.utils.math.exponential_baseline(dt, d, den_slope, a, c, alpha)[source]#

Baseline function with exponential dependence on temperature and linear dependence on denaturant concentration

Parameters:
  • dt (float) – delta temperature (temperature relative to the reference temperature)

  • d (float) – denaturant concentration

  • den_slope (float) – linear dependence of signal on denaturant concentration

  • a (float) – intercept of the baseline

  • c (float) – pre-exponential factor for the dependence on temperature

  • alpha (float) – exponential coefficient for the dependence on temperature

Returns:

Baseline signal

Return type:

float

pychemelt.utils.math.is_evenly_spaced(x, tol=0.0001)[source]#

Check if x is evenly spaced within a given tolerance.

Parameters:
  • x (array-like) – x data

  • tol (float, optional) – Tolerance for considering spacing equal (default: 1e-4)

Returns:

True if x is evenly spaced, False otherwise

Return type:

bool
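One way to implement such a check is to compare the largest and smallest step between consecutive points. This is a sketch only: the name evenly_spaced and the way the tolerance is applied are assumptions, and plain lists are used instead of numpy arrays.

```python
def evenly_spaced(x, tol=1e-4):
    """Return True if all consecutive differences in x agree within tol."""
    steps = [b - a for a, b in zip(x, x[1:])]
    if not steps:
        # Zero or one point: nothing to compare
        return True
    return max(steps) - min(steps) <= tol


regular = evenly_spaced([25.0, 25.5, 26.0, 26.5])
irregular = evenly_spaced([25.0, 25.5, 26.5])
```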

pychemelt.utils.math.first_derivative_savgol(x, y, window_length=5, polyorder=4)[source]#

Estimate the first derivative using Savitzky-Golay filtering.

Parameters:
  • x (array-like) – x data (must be evenly spaced)

  • y (array-like) – y data

  • window_length (int, optional) – Length of the filter window, in temperature units (default: 5)

  • polyorder (int, optional) – Order of the polynomial used to fit the samples (default: 4)

Returns:

First derivative of y with respect to x

Return type:

numpy.ndarray

Notes

This function will raise a ValueError if x is not evenly spaced.

pychemelt.utils.math.relative_errors(params, cov)[source]#

Calculate the relative errors of the fitted parameters.

Parameters:
  • params (numpy.ndarray) – Fitted parameters

  • cov (numpy.ndarray) – Covariance matrix of the fitted parameters

Returns:

Relative errors of the fitted parameters (in percent)

Return type:

numpy.ndarray
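A common definition, assumed here, takes the standard error of each parameter as the square root of the corresponding diagonal element of the covariance matrix and divides it by the absolute parameter value. The function name and the nested-list inputs are illustrative only.

```python
import math


def relative_errors_percent(params, cov):
    """Relative standard error of each parameter, in percent."""
    errors = []
    for i, p in enumerate(params):
        std_err = math.sqrt(cov[i][i])  # standard error from the diagonal
        errors.append(100.0 * std_err / abs(p))
    return errors


# Made-up example: Tm = 50 with variance 0.25, second parameter = 2 with variance 0.04
params_demo = [50.0, 2.0]
cov_demo = [[0.25, 0.0], [0.0, 0.04]]
rel_err = relative_errors_percent(params_demo, cov_demo)
```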

pychemelt.utils.math.find_line_outliers(m, b, x, y, sigma=2.5)[source]#

Find outliers in a linear fit using the sigma rule.

Parameters:
  • m (float) – Slope of the line

  • b (float) – Intercept of the line

  • x (array-like) – x data

  • y (array-like) – y data

  • sigma (float, optional) – Number of standard deviations to use for outlier detection (default: 2.5)

Returns:

Indices of the outliers

Return type:

numpy.ndarray
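The sigma rule can be sketched as: compute the residuals from the line, estimate their spread, and flag points whose absolute residual exceeds sigma times that spread. The choice of the population standard deviation below is an assumption, and line_outliers is a hypothetical name.

```python
import statistics


def line_outliers(m, b, x, y, sigma=2.5):
    """Indices of points further than sigma standard deviations from the line."""
    residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > sigma * spread]


# Points on y = 2x + 1, with one point displaced by +10
x_demo = [float(i) for i in range(10)]
y_demo = [2.0 * xi + 1.0 for xi in x_demo]
y_demo[5] += 10.0
outlier_idx = line_outliers(2.0, 1.0, x_demo, y_demo)
```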

pychemelt.utils.math.get_rss(y, y_fit)[source]#

Compute the residual sum of squares.

Parameters:
  • y (array-like) – Observed values

  • y_fit (array-like) – Fitted values

Returns:

Residual sum of squares

Return type:

float
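The residual sum of squares is the sum of squared differences between observed and fitted values. A dependency-free sketch with an illustrative function name:

```python
def residual_sum_of_squares(y, y_fit):
    """Sum of squared differences between observed and fitted values."""
    return sum((obs - fit) ** 2 for obs, fit in zip(y, y_fit))


rss_demo = residual_sum_of_squares([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```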

pychemelt.utils.math.solve_one_root_quadratic(a, b, c)[source]#

Solution to one root quadratic: a * X**2 + b * X + c = 0

Parameters:
  • a (number type) – parameter a

  • b (number type) – parameter b

  • c (number type) – parameter c

Returns:

Solution of the formula

Return type:

float

pychemelt.utils.math.solve_one_root_depressed_cubic(p, q)[source]#

Solution to one root depressed cubic: X**3 + p * X + q = 0

Parameters:
  • p (number type) – parameter p

  • q (number type) – parameter q

Returns:

Solution of the formula

Return type:

float
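When the discriminant q^2/4 + p^3/27 is positive, a depressed cubic has exactly one real root, given by Cardano's formula. The sketch below covers only that case and is not necessarily how pychemelt computes the root; the function names are illustrative.

```python
import math


def real_cbrt(v):
    """Real cube root that also works for negative numbers."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def one_real_root_depressed_cubic(p, q):
    """Real root of X**3 + p*X + q = 0 when the discriminant is positive."""
    disc = q * q / 4.0 + p ** 3 / 27.0
    if disc <= 0.0:
        raise ValueError("This sketch handles only the single-real-root case")
    s = math.sqrt(disc)
    return real_cbrt(-q / 2.0 + s) + real_cbrt(-q / 2.0 - s)


root_demo = one_real_root_depressed_cubic(1.0, -2.0)  # X**3 + X - 2 = 0
```

Any cubic with p > 0 falls in this case, because the left-hand side is then strictly increasing in X.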

pychemelt.utils.palette module#

Viridis color palette.

A perceptually uniform color map that is readable by those with colorblindness. Contains hex color values transitioning from dark purple to yellow.

pychemelt.utils.plotting module#

class pychemelt.utils.plotting.PlotConfig(width: int = 1000, height: int = 800, type: str = 'png', font_size: int = 16, marker_size: int = 8, line_width: int = 3)[source]#

Bases: object

General plot configuration

width: int = 1000#
height: int = 800#
type: str = 'png'#
font_size: int = 16#
marker_size: int = 8#
line_width: int = 3#
__init__(width: int = 1000, height: int = 800, type: str = 'png', font_size: int = 16, marker_size: int = 8, line_width: int = 3) None#
class pychemelt.utils.plotting.AxisConfig(showgrid_x: bool = True, showgrid_y: bool = True, n_y_axis_ticks: int = 5, linewidth: int = 1, tickwidth: int = 1, ticklen: int = 5, gridwidth: int = 1)[source]#

Bases: object

Axis styling configuration

showgrid_x: bool = True#
showgrid_y: bool = True#
n_y_axis_ticks: int = 5#
linewidth: int = 1#
tickwidth: int = 1#
ticklen: int = 5#
gridwidth: int = 1#
__init__(showgrid_x: bool = True, showgrid_y: bool = True, n_y_axis_ticks: int = 5, linewidth: int = 1, tickwidth: int = 1, ticklen: int = 5, gridwidth: int = 1) None#
class pychemelt.utils.plotting.LayoutConfig(show_subplot_titles: bool = False, vertical_spacing: float = 0.1)[source]#

Bases: object

Layout and spacing configuration

show_subplot_titles: bool = False#
vertical_spacing: float = 0.1#
__init__(show_subplot_titles: bool = False, vertical_spacing: float = 0.1) None#
class pychemelt.utils.plotting.LegendConfig[source]#

Bases: object

Legend and labeling configuration

color_bar_length = 0.4#
color_bar_orientation = 'v'#
color_bar_x_pos = 1.05#
color_bar_y_pos = 0.5#
__init__() None#
pychemelt.utils.plotting.config_fig(fig, plot_width=800, plot_height=600, plot_type='png', plot_title_for_download='plot')[source]#

Configure plotly figure with download options and toolbar settings.

Parameters:
  • fig (go.Figure) – Plotly figure object

  • plot_width (int, default 800) – Width of the plot in pixels

  • plot_height (int, default 600) – Height of the plot in pixels

  • plot_type (str, default "png") – Format for downloading the plot (e.g., “png”, “jpeg”)

  • plot_title_for_download (str, default "plot") – Title for the downloaded plot file

Returns:

Configured plotly figure

Return type:

go.Figure

pychemelt.utils.plotting.plot_unfolding(pychemelt_sample, plot_derivative=False, plot_config: PlotConfig = None, axis_config: AxisConfig = None, layout_config: LayoutConfig = None, legend_config: LegendConfig = None)[source]#

Plot the unfolding curves, including the signal and the predicted curves

Parameters:
  • pychemelt_sample – pychemelt.Sample object

  • plot_derivative (bool) – Whether to plot the derivative of the signal

  • plot_config (PlotConfig, optional) – Configuration for the overall plot

  • axis_config (AxisConfig, optional) – Configuration for the axes

  • layout_config (LayoutConfig, optional) – Configuration for the layout

  • legend_config (LegendConfig, optional) – configuration for the legend

pychemelt.utils.processing module#

This module contains helper functions to process data.

Author: Osvaldo Burastero

pychemelt.utils.processing.set_param_bounds(p0, param_names)[source]#

Generate heuristic lower and upper bounds for fitting parameters based on initial guesses.

Parameters:
  • p0 (array-like) – Initial parameter guesses.

  • param_names (list of str) – Names of the parameters to apply specific logic (e.g., non-negative constraints).

Returns:

(low_bounds, high_bounds) as lists of numeric values.

Return type:

tuple

pychemelt.utils.processing.expand_temperature_list(temp_lst, signal_lst)[source]#

Expand the temperature list to match the length of the signal list.

Parameters:
  • temp_lst (list) – List of temperatures

  • signal_lst (list) – List of signals

Returns:

Expanded temperature list

Return type:

list

pychemelt.utils.processing.clean_conditions_labels(conditions)[source]#

Clean the conditions labels by removing unwanted characters and patterns.

Parameters:

conditions (list) – List of condition strings.

Returns:

List of cleaned condition strings.

Return type:

list

pychemelt.utils.processing.subset_signal_by_temperature(signal_lst, temp_lst, min_temp, max_temp)[source]#

Subset the signal and temperature lists based on the specified temperature range.

Parameters:
  • signal_lst (list) – List of signal arrays.

  • temp_lst (list) – List of temperature arrays.

  • min_temp (float) – Minimum temperature for subsetting.

  • max_temp (float) – Maximum temperature for subsetting.

Returns:

Tuple containing the subsetted signal and temperature lists.

Return type:

tuple
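A sketch of the subsetting step, assuming the bounds are inclusive and that every signal array is paired with a temperature array of the same length. Plain lists replace numpy arrays here, and subset_by_temperature is a hypothetical name.

```python
def subset_by_temperature(signal_lst, temp_lst, min_temp, max_temp):
    """Keep only the points with min_temp <= temperature <= max_temp."""
    sub_signals, sub_temps = [], []
    for signal, temp in zip(signal_lst, temp_lst):
        keep = [i for i, t in enumerate(temp) if min_temp <= t <= max_temp]
        sub_signals.append([signal[i] for i in keep])
        sub_temps.append([temp[i] for i in keep])
    return sub_signals, sub_temps


# Made-up data: one curve measured from 20 to 60 degrees
temp_demo = [[20.0, 30.0, 40.0, 50.0, 60.0]]
signal_demo = [[1.0, 1.1, 1.5, 1.9, 2.0]]
sub_sig, sub_temp = subset_by_temperature(signal_demo, temp_demo, 30.0, 50.0)
```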

pychemelt.utils.processing.guess_Tm_from_derivative(temp_lst, deriv_lst, x1, x2)[source]#

Estimate the melting temperature (Tm) by finding the extremum of the first derivative.

Parameters:
  • temp_lst (list of np.ndarray) – Temperature arrays for each dataset.

  • deriv_lst (list of np.ndarray) – First derivative of the signal for each dataset.

  • x1 (float) – Lower buffer from the temperature edges to exclude noise/artifacts.

  • x2 (float) – Upper buffer from the temperature edges to define the baseline median window.

Returns:

Estimated Tm values for each dataset.

Return type:

list of float

pychemelt.utils.processing.estimate_signal_baseline_params(signal_lst, temp_lst, native_baseline_type, unfolded_baseline_type, window_range_native=12, window_range_unfolded=12, oligomer_number=1)[source]#

Estimate the baseline parameters for the sample

Parameters:
  • signal_lst (list of np.ndarray) – List of signal arrays

  • temp_lst (list of np.ndarray) – List of temperature arrays

  • window_range_native (float) – Range of the temperature window to estimate the native state baseline

  • window_range_unfolded (float) – Range of the temperature window to estimate the unfolded state baseline

  • native_baseline_type (str) – options: ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’

  • unfolded_baseline_type (str) – options: ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’

  • oligomer_number (int) – number of subunits in the oligomer

Returns:

Lists of estimated parameters (p1Ns, p1Us, p2Ns, p2Us, p3Ns, p3Us).

Return type:

tuple

pychemelt.utils.processing.fit_local_thermal_unfolding_to_signal_lst(signal_lst, temp_lst, t_melting_init, p1_Ns, p1_Us, p2_Ns, p2_Us, p3_Ns, p3_Us, baseline_native_fx, baseline_unfolded_fx)[source]#

Perform individual (local) fits for each signal curve in a list.

Parameters:
  • signal_lst (list of np.ndarray) – List of signals.

  • temp_lst (list of np.ndarray) – List of temperatures.

  • t_melting_init (list of float) – Initial Tm guesses.

  • p1_Ns (list of float) – Estimated baseline parameters for each curve.

  • p1_Us (list of float) – Estimated baseline parameters for each curve.

  • p2_Ns (list of float) – Estimated baseline parameters for each curve.

  • p2_Us (list of float) – Estimated baseline parameters for each curve.

  • p3_Ns (list of float) – Estimated baseline parameters for each curve.

  • p3_Us (list of float) – Estimated baseline parameters for each curve.

  • baseline_native_fx (callable) – Function to calculate the native baseline.

  • baseline_unfolded_fx (callable) – Function to calculate the unfolded baseline.

Returns:

(Tms, dHs, predicted_lst) containing fitted parameters and signal arrays.

Return type:

tuple

pychemelt.utils.processing.re_arrange_predictions(predicted_lst, n_signals, n_denaturants)[source]#

Re-arrange the flattened predictions to match the original signal list with sublists.

Parameters:
  • predicted_lst (list) – Flattened list of predicted signals of length n_signals * n_denaturants.

  • n_signals (int) – Number of signal types (e.g., different wavelengths).

  • n_denaturants (int) – Number of denaturant concentrations or conditions per signal.

Returns:

Re-arranged list of predicted signals of length n_signals, where each element is a sublist of length n_denaturants.

Return type:

list
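The re-arrangement can be sketched as simple chunking of the flat list. This assumes the flat list is ordered signal by signal (all conditions of the first signal, then all conditions of the second, and so on), which the documentation does not state explicitly; `re_arrange_sketch` is a hypothetical name.

```python
def re_arrange_sketch(predicted_lst, n_signals, n_denaturants):
    """Split a flat list into n_signals sublists of length n_denaturants.

    Assumption: the flat list is ordered signal-major.
    """
    if len(predicted_lst) != n_signals * n_denaturants:
        raise ValueError("Length must equal n_signals * n_denaturants")
    return [
        predicted_lst[i * n_denaturants:(i + 1) * n_denaturants]
        for i in range(n_signals)
    ]


nested = re_arrange_sketch(["a1", "a2", "a3", "b1", "b2", "b3"], 2, 3)
```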

pychemelt.utils.processing.re_arrange_params(params, n_signals)[source]#

Re-arrange flattened parameters into a list of sublists grouped by signal.

Parameters:
  • params (list or np.ndarray) – Flattened list of parameters.

  • n_signals (int) – Number of signal types to group parameters by.

Returns:

Re-arranged list of parameters of length n_signals containing parameter arrays for each signal.

Return type:

list of np.ndarray

pychemelt.utils.processing.subset_data(data, max_points)[source]#

Reduces the number of data points by repeated striding until the size is below a threshold.

Parameters:
  • data (np.ndarray) – Input data array to be subsetted.

  • max_points (int) – The maximum number of points allowed in the resulting array.

Returns:

Subsetted data array containing every $2^n$-th point of the original.

Return type:

np.ndarray
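Repeated striding can be sketched as follows. The sketch assumes that the loop stops once the length is less than or equal to `max_points`; whether the library uses a strict or non-strict comparison is not stated. The name `subset_sketch` is hypothetical.

```python
def subset_sketch(data, max_points):
    """Halve the data (keep every second point) until it fits in max_points.

    Assumption: the stopping condition is len(data) <= max_points.
    """
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    while len(data) > max_points:
        data = data[::2]  # each pass doubles the effective stride
    return data


reduced = subset_sketch(list(range(100)), 30)
```

After n passes the result holds every 2^n-th point of the input, which is why the final size depends on the starting length and not only on `max_points`.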

pychemelt.utils.processing.get_colors_from_numeric_values(values, min_val, max_val, use_log_scale=False)[source]#

Map numeric values to colors in the VIRIDIS palette based on a specified range.

Parameters:
  • values (list or np.ndarray) – Numeric values to map to colors.

  • min_val (float) – Minimum value of the range.

  • max_val (float) – Maximum value of the range.

  • use_log_scale (bool, optional) – Whether to use logarithmic scaling for the values, default is False.

Returns:

List of hex color codes corresponding to the input values.

Return type:

list

pychemelt.utils.processing.combine_sequences(seq1, seq2)[source]#

Combine two sequences to generate all possible combinations of their elements.

Parameters:
  • seq1 (list) – First sequence of elements.

  • seq2 (list) – Second sequence of elements.

Returns:

A list of tuples, where each tuple contains one element from seq1 and one from seq2.

Return type:

list
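The description matches a Cartesian product. A minimal sketch using the standard library is shown below; the ordering of the output (first sequence varying slowest) is an assumption, and `combine_sequences_sketch` is a hypothetical name.

```python
from itertools import product


def combine_sequences_sketch(seq1, seq2):
    """Return every (element of seq1, element of seq2) pair as a list of tuples."""
    return list(product(seq1, seq2))


pairs = combine_sequences_sketch([1, 2], ["a", "b"])
```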

pychemelt.utils.processing.adjust_value_to_interval(value, lower_bound, upper_bound, shift)[source]#

Verify that a value is within the specified bounds. If the value is outside the bounds, adjust it to the nearest bound.

Parameters:
  • value (float) – The value to be adjusted.

  • lower_bound (float) – The lower bound of the interval.

  • upper_bound (float) – The upper bound of the interval.

  • shift (float) – How much to shift the value if it is outside the bounds.

pychemelt.utils.processing.oligomer_number(model)[source]#

Get the number of subunits in the oligomer based on the model.

Returns:

The number of subunits (2 for ‘Dimer’, 3 for ‘Trimer’, 4 for ‘Tetramer’, 1 otherwise).

Return type:

int
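The mapping stated above can be written as a dictionary lookup with a default. This sketch assumes exact, case-sensitive model names; `oligomer_number_sketch` is a hypothetical name and the library may match model strings differently.

```python
# Subunit counts taken from the return description above
_SUBUNITS = {"Dimer": 2, "Trimer": 3, "Tetramer": 4}


def oligomer_number_sketch(model):
    """Return the number of subunits for a model name, defaulting to 1."""
    return _SUBUNITS.get(model, 1)
```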

pychemelt.utils.processing.parse_number(s)[source]#

Parse a string as a float, handling:
  • European decimal (comma)

  • Optional thousands separators

  • Standard decimal point

Parameters:

s (str) – The string to parse

Returns:

The parsed number

Return type:

float

Raises:

ValueError – If the string cannot be parsed as a float
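A minimal sketch of this kind of parsing is given below. The disambiguation rule (when both separators are present, the last one is the decimal mark; a lone comma is a decimal mark) is an assumption and may differ from the library's behaviour. `parse_number_sketch` is a hypothetical name.

```python
def parse_number_sketch(s):
    """Parse strings such as '1,5', '1.234,56' or '1,234.56' as floats.

    Assumptions: when both ',' and '.' occur, whichever appears last is the
    decimal mark; a lone ',' is treated as a decimal mark.
    """
    text = s.strip().replace(" ", "")
    if "," in text and "." in text:
        if text.rfind(",") > text.rfind("."):
            # European style: '.' groups thousands, ',' is the decimal mark
            text = text.replace(".", "").replace(",", ".")
        else:
            # English style: ',' groups thousands
            text = text.replace(",", "")
    elif "," in text:
        text = text.replace(",", ".")
    return float(text)  # raises ValueError for unparsable input
```

A string such as "1,234" is inherently ambiguous (1.234 or 1234); under the rule assumed here it is read as 1.234.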

pychemelt.utils.processing.are_all_strings_numeric(lst)[source]#

Parameters:

lst (list of str) – List of strings to check

Returns:

True if all strings in the list are numeric (can contain digits, ‘.’, ‘-’, ‘,’), False otherwise

Return type:

bool
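The check described above can be sketched with a regular expression over the allowed characters. How the library treats empty strings or surrounding whitespace is not documented here, so this sketch simply rejects them; `all_numeric_sketch` is a hypothetical name.

```python
import re

# Allowed characters per the description: digits, '.', '-' and ','
_NUMERIC_CHARS = re.compile(r"[0-9.,\-]+")


def all_numeric_sketch(lst):
    """Return True if every string consists only of digits, '.', '-' and ','."""
    return all(_NUMERIC_CHARS.fullmatch(s) is not None for s in lst)
```

Note that a character-level check accepts strings such as "1..2" that are not valid numbers, so a subsequent parsing step can still fail.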

pychemelt.utils.processing.is_float(element)[source]#

pychemelt.utils.processing.transform_to_list(element_or_list)[source]#

Parameters:

element_or_list (bool, str, int, float, list, or numpy array) – The input element or list to be transformed into a list.

Returns:

A list containing the input element if it is not already a list, or the input itself if it is None, a numpy array, or a list.

Return type:

list or None

Raises:

ValueError – If the input is not a boolean, string, integer, float, list, or numpy array.

pychemelt.utils.rates module#

This module contains helper functions to obtain equilibrium constants.

Author: Osvaldo Burastero

Useful references for unfolding models:
  • Rumfeldt, Jessica AO, et al. “Conformational stability and folding mechanisms of dimeric proteins.” Progress in biophysics and molecular biology 98.1 (2008): 61-84.

  • Bedouelle, Hugues. “Principles and equations for measuring and interpreting protein stability: From monomer to tetramer.” Biochimie 121 (2016): 29-37.

  • Mazurenko, Stanislav, et al. “Exploration of protein unfolding by modelling calorimetry data from reheating.” Scientific reports 7.1 (2017): 16321.

All thermodynamic parameters use kcal/mol-based units (for example, kcal/mol for enthalpies and kcal/mol/K for heat capacities).

Unfolding functions for monomers have an argument called ‘extra_arg’ that is not used. It is present because unfolding functions for oligomers require the protein concentration in that position.

pychemelt.utils.rates.eq_constant_thermo(T, DH1, T1, Cp)[source]#

T1 is the temperature at which ΔG(T) = 0, ΔH1 is the variation of enthalpy between the two considered states at T1, and Cp is the variation of heat capacity between the two states.

Parameters:
  • T (array-like) – Temperature (Kelvin)

  • DH1 (float) – Variation of enthalpy between the two considered states at T1 (kcal/mol)

  • T1 (float) – Temperature at which the equilibrium constant equals one (Kelvin)

  • Cp (float) – Variation of heat capacity between the two states (kcal/mol/K)

Returns:

Equilibrium constant at the given temperature

Return type:

numpy.ndarray
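The parameters above correspond to the standard Gibbs-Helmholtz relation with a constant heat capacity change. The scalar sketch below assumes that relation, assumes K is defined for the unfolding direction, and uses the gas constant in kcal/(mol K); the library's own implementation works on arrays and may differ in detail. `eq_constant_sketch` is a hypothetical name.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def eq_constant_sketch(T, DH1, T1, Cp):
    """Equilibrium constant from dG(T) = DH1*(1 - T/T1) + Cp*(T - T1 - T*ln(T/T1)).

    All temperatures are in Kelvin; DH1 in kcal/mol; Cp in kcal/mol/K.
    """
    dG = DH1 * (1.0 - T / T1) + Cp * (T - T1 - T * math.log(T / T1))
    return math.exp(-dG / (R_KCAL * T))


k_at_t1 = eq_constant_sketch(330.0, 100.0, 330.0, 1.5)
```

At T = T1 both terms of ΔG vanish, so the constant equals one, which matches the definition of T1 given in the parameter list.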

pychemelt.utils.rates.eq_constant_termochem(T, D, DHm, Tm, Cp0, m0, m1)[source]#

Ref: Louise Hamborg et al., 2020. Global analysis of protein stability by temperature and chemical denaturation

Parameters:
  • T (array-like) – Temperature (Kelvin only!)

  • D (float) – Denaturant concentration (M)

  • DHm (float) – Enthalpy change at Tm (kcal/mol)

  • Tm (float) – Melting temperature where ΔG = 0 (Kelvin only!)

  • Cp0 (float) – Heat capacity change (kcal/mol/K)

  • m0 (float) – m-value at the reference temperature

  • m1 (float) – Temperature dependence of the m-value

Returns:

Equilibrium constant at a certain temperature and denaturant agent concentration

Return type:

numpy.ndarray

pychemelt.utils.signals module#

This module contains helper functions to obtain the signal, given certain parameters.

Author: Osvaldo Burastero

pychemelt.utils.signals.signal_two_state_tc_unfolding(T, D, DHm, Tm, Cp0, m0, m1, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, extra_arg=None)[source]#

Ref: Louise Hamborg et al., 2020. Global analysis of protein stability by temperature and chemical denaturation

Parameters:
  • T (array-like) – Temperature in Kelvin units

  • D (array-like) – Denaturant agent concentration

  • DHm (float) – Variation of enthalpy between the two considered states at Tm

  • Tm (float) – Temperature at which the equilibrium constant equals one, in Kelvin units

  • Cp0 (float) – Variation of heat capacity between the two states

  • m0 (float) – m-value at the reference temperature (Tref)

  • m1 (float) – Variation of m-value with temperature

  • p1_N (float) – parameters describing the native-state baseline

  • p2_N (float) – parameters describing the native-state baseline

  • p3_N (float) – parameters describing the native-state baseline

  • p4_N (float) – parameters describing the native-state baseline

  • p1_U (float) – parameters describing the unfolded-state baseline

  • p2_U (float) – parameters describing the unfolded-state baseline

  • p3_U (float) – parameters describing the unfolded-state baseline

  • p4_U (float) – parameters describing the unfolded-state baseline

  • baseline_N_fx (function) – for the native-state baseline

  • baseline_U_fx (function) – for the unfolded-state baseline

  • extra_arg (None, optional) – Not used but present for API compatibility with oligomeric models

Returns:

Signal at the given temperatures and denaturant agent concentration, given the parameters

Return type:

numpy.ndarray

pychemelt.utils.signals.signal_two_state_t_unfolding(T, Tm, dHm, p1_N, p2_N, p3_N, p1_U, p2_U, p3_U, baseline_N_fx, baseline_U_fx, Cp=0, extra_arg=None)[source]#

Two-state temperature unfolding (monomer).

Parameters:
  • T (array-like) – Temperature

  • Tm (float) – Temperature at which the equilibrium constant equals one

  • dHm (float) – Variation of enthalpy between the two considered states at Tm

  • p1_N (float) – baseline parameters for the native-state baseline

  • p2_N (float) – baseline parameters for the native-state baseline

  • p3_N (float) – baseline parameters for the native-state baseline

  • p1_U (float) – baseline parameters for the unfolded-state baseline

  • p2_U (float) – baseline parameters for the unfolded-state baseline

  • p3_U (float) – baseline parameters for the unfolded-state baseline

  • baseline_N_fx (callable) – function to calculate the baseline for the native state

  • baseline_U_fx (callable) – function to calculate the baseline for the unfolded state

  • Cp (float, optional) – Variation of heat capacity between the two states (default: 0)

  • extra_arg (None, optional) – Not used but present for compatibility

Returns:

Signal at the given temperatures, given the parameters

Return type:

numpy.ndarray
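A two-state signal is commonly modelled as the population-weighted sum of the native and unfolded baselines. The scalar sketch below assumes that model and uses hypothetical linear baselines (`intercept + slope * T`) in place of `baseline_N_fx` and `baseline_U_fx`, whose call signatures are not documented here. `two_state_signal_sketch` is a hypothetical name.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def two_state_signal_sketch(T, Tm, dHm, intercept_N, slope_N,
                            intercept_U, slope_U, Cp=0.0):
    """Signal = (1 - fU) * native baseline + fU * unfolded baseline.

    Temperatures in Kelvin; linear baselines are an illustrative assumption.
    """
    dG = dHm * (1.0 - T / Tm) + Cp * (T - Tm - T * math.log(T / Tm))
    K = math.exp(-dG / (R_KCAL * T))
    fU = K / (1.0 + K)  # fraction of unfolded protein for N <-> U
    base_N = intercept_N + slope_N * T
    base_U = intercept_U + slope_U * T
    return (1.0 - fU) * base_N + fU * base_U


midpoint = two_state_signal_sketch(330.0, 330.0, 100.0, 1.0, 0.0, 3.0, 0.0)
```

At T = Tm the two states are equally populated, so with flat baselines the signal is halfway between them.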

pychemelt.utils.signals.two_state_thermal_unfold_curve(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0)[source]#

Two-state temperature unfolding (monomer). N ⇔ U

Parameters:
  • T (array-like) – Temperature

  • C (array-like) – Oligomer sample concentration

  • Tm (float) – Temperature at which the equilibrium constant equals one

  • dHm (float) – Variation of enthalpy between the two considered states at Tm

  • p1_N (float) – baseline parameters for the native-state baseline

  • p2_N (float) – baseline parameters for the native-state baseline

  • p3_N (float) – baseline parameters for the native-state baseline

  • p1_U (float) – baseline parameters for the unfolded-state baseline

  • p2_U (float) – baseline parameters for the unfolded-state baseline

  • p3_U (float) – baseline parameters for the unfolded-state baseline

  • baseline_N_fx (callable) – function to calculate the baseline for the native state

  • baseline_U_fx (callable) – function to calculate the baseline for the unfolded state

  • Cp (float, optional) – Variation of heat capacity between the two states (default: 0)

Returns:

Signal at the given temperatures, given the parameters

Return type:

numpy.ndarray

pychemelt.utils.signals.two_state_thermal_unfold_curve_dimer(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0)[source]#

Two-state temperature unfolding (dimer). N2 ⇔ 2U

Parameters:
  • T (array-like) – Temperature

  • C (array-like) – Oligomer sample concentration

  • Tm (float) – Temperature at which the equilibrium constant equals one

  • dHm (float) – Variation of enthalpy between the two considered states at Tm

  • p1_N (float) – baseline parameters for the native-state baseline

  • p2_N (float) – baseline parameters for the native-state baseline

  • p3_N (float) – baseline parameters for the native-state baseline

  • p1_U (float) – baseline parameters for the unfolded-state baseline

  • p2_U (float) – baseline parameters for the unfolded-state baseline

  • p3_U (float) – baseline parameters for the unfolded-state baseline

  • baseline_N_fx (callable) – function to calculate the baseline for the native state

  • baseline_U_fx (callable) – function to calculate the baseline for the unfolded state

  • Cp (float, optional) – Variation of heat capacity between the two states (default: 0)

Returns:

Signal at the given temperatures, given the parameters

Return type:

numpy.ndarray

Notes

C is the total concentration (M) of the protein in dimer equivalent.
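For N2 ⇔ 2U, the unfolded fraction depends on the concentration. The sketch below assumes K = [U]²/[N2] in molar units, with [U] = 2·C·fU and [N2] = C·(1 − fU), where C is the dimer-equivalent concentration; this gives the quadratic 4·C·fU² + K·fU − K = 0. The library may use a different convention, and `fraction_unfolded_dimer_sketch` is a hypothetical name.

```python
import math


def fraction_unfolded_dimer_sketch(K, C):
    """Positive root of 4*C*fU**2 + K*fU - K = 0 (N2 <-> 2U).

    K: dissociation/unfolding constant in M; C: dimer-equivalent concentration in M.
    """
    return (-K + math.sqrt(K * K + 16.0 * C * K)) / (8.0 * C)


fU = fraction_unfolded_dimer_sketch(1e-6, 1e-5)
```

Unlike the monomer case, raising C at fixed K lowers the unfolded fraction, which is why the oligomer models take the concentration as an argument.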

pychemelt.utils.signals.two_state_thermal_unfold_curve_trimer(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0)[source]#

Two-state temperature unfolding (trimer). N3 ⇔ 3U

Parameters:
  • T (array-like) – Temperature

  • C (array-like) – Oligomer sample concentration

  • Tm (float) – Temperature at which the equilibrium constant equals one

  • dHm (float) – Variation of enthalpy between the two considered states at Tm

  • p1_N (float) – baseline parameters for the native-state baseline

  • p2_N (float) – baseline parameters for the native-state baseline

  • p3_N (float) – baseline parameters for the native-state baseline

  • p1_U (float) – baseline parameters for the unfolded-state baseline

  • p2_U (float) – baseline parameters for the unfolded-state baseline

  • p3_U (float) – baseline parameters for the unfolded-state baseline

  • baseline_N_fx (callable) – function to calculate the baseline for the native state

  • baseline_U_fx (callable) – function to calculate the baseline for the unfolded state

  • Cp (float, optional) – Variation of heat capacity between the two states (default: 0)

Returns:

Signal at the given temperatures, given the parameters

Return type:

numpy.ndarray

Notes

C is the total concentration (M) of the protein in trimer equivalent.

pychemelt.utils.signals.two_state_thermal_unfold_curve_tetramer(T, C, Tm, dHm, p1_N, p2_N, p3_N, p4_N, p1_U, p2_U, p3_U, p4_U, baseline_N_fx, baseline_U_fx, Cp=0, extra_arg=None)[source]#

Two-state temperature unfolding (tetramer). N4 ⇔ 4U

Parameters:
  • T (array-like) – Temperature

  • C (array-like) – Oligomer sample concentration

  • Tm (float) – Temperature at which the equilibrium constant equals one

  • dHm (float) – Variation of enthalpy between the two considered states at Tm

  • p1_N (float) – baseline parameters for the native-state baseline

  • p2_N (float) – baseline parameters for the native-state baseline

  • p3_N (float) – baseline parameters for the native-state baseline

  • p1_U (float) – baseline parameters for the unfolded-state baseline

  • p2_U (float) – baseline parameters for the unfolded-state baseline

  • p3_U (float) – baseline parameters for the unfolded-state baseline

  • baseline_N_fx (callable) – function to calculate the baseline for the native state

  • baseline_U_fx (callable) – function to calculate the baseline for the unfolded state

  • Cp (float, optional) – Variation of heat capacity between the two states (default: 0)

Returns:

Signal at the given temperatures, given the parameters

Return type:

numpy.ndarray

Notes

C is the total concentration (M) of the protein in tetramer equivalent.

pychemelt.utils.signals.map_two_state_model_to_signal_fx(model)[source]#

Map the model string to the corresponding signal function.

Parameters:

model (str) – String representation of the model type.

Returns:

signal function corresponding to the string

Return type:

function

pychemelt.utils.svd module#

Module containing functions to perform Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) on spectral data, along with utilities for manipulating basis spectra and coefficients.

Author: Osvaldo Burastero

pychemelt.utils.svd.apply_svd(X)[source]#

Perform Singular Value Decomposition (SVD) on the input data matrix X.

Parameters:

X (numpy array of shape (n_wavelengths, n_measurements)) – The input data matrix to decompose.

Returns:

  • explained_variance (numpy array) – The cumulative explained variance for each component.

  • basis_spectra (numpy array) – The left singular vectors (U matrix) representing the basis spectra.

  • coefficients (numpy array) – The coefficients associated with each basis spectrum.
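The cumulative explained variance is usually derived from the singular values: each component explains a share proportional to its squared singular value. The sketch below assumes that relation and a percent scale (suggested by the default threshold of 99 used elsewhere in this module); `cumulative_explained_variance_sketch` is a hypothetical name.

```python
from itertools import accumulate


def cumulative_explained_variance_sketch(singular_values):
    """Cumulative percentage of variance from singular values sorted descending."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [100.0 * partial / total for partial in accumulate(squared)]


cumulative = cumulative_explained_variance_sketch([4.0, 3.0])
```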

pychemelt.utils.svd.filter_basis_spectra(explained_variance, basis_spectra_all, coefficients_all, explained_variance_threshold=99)[source]#

Filter the basis spectra and coefficients based on the explained variance threshold.

Parameters:
  • explained_variance (numpy array) – The cumulative explained variance for each component.

  • basis_spectra_all (numpy array) – The left singular vectors (U matrix) representing the basis spectra.

  • coefficients_all (numpy array) – The coefficients associated with each basis spectrum.

  • explained_variance_threshold (float, optional) – The threshold for explained variance to filter components. Default is 99.

Returns:

  • basis_spectra (numpy array) – The filtered basis spectra.

  • coefficients (numpy array) – The filtered coefficients.

  • k (int) – The number of components that meet the explained variance threshold.
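Choosing `k` from a cumulative explained variance can be sketched as a search for the first component that reaches the threshold. Whether the library uses "greater than or equal" or a strict comparison is an assumption here, and `components_needed_sketch` is a hypothetical name.

```python
def components_needed_sketch(cumulative_variance, threshold=99):
    """Return the smallest k whose cumulative variance reaches the threshold.

    Assumption: the comparison is >=; falls back to all components otherwise.
    """
    for k, value in enumerate(cumulative_variance, start=1):
        if value >= threshold:
            return k
    return len(cumulative_variance)


k = components_needed_sketch([85.0, 97.5, 99.4, 100.0])
```

The first `k` basis spectra and their coefficients would then be kept and the rest discarded.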

pychemelt.utils.svd.align_basis_spectra_and_coefficients(X, basis_spectra, coefficients)[source]#

Align the basis spectra peaks to the original data.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input data matrix.

  • basis_spectra (numpy array) – The basis spectra obtained from SVD.

  • coefficients (numpy array) – The coefficients associated with each basis spectrum.

Returns:

  • basis_spectra (numpy array) – The aligned basis spectra.

  • coefficients (numpy array) – The adjusted coefficients.

pychemelt.utils.svd.angle_from_cathets(adjacent_leg, opposite_leg)[source]#

Calculate the angle between the hypotenuse and the adjacent leg of a right triangle.

Parameters:
  • adjacent_leg (float) – Length of the adjacent leg.

  • opposite_leg (float) – Length of the opposite leg.

Returns:

angle_in_radians – Angle in radians between the hypotenuse and the adjacent leg.

Return type:

float
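For a right triangle, this angle is the arctangent of the opposite leg over the adjacent leg. The sketch below uses `math.atan2`, which also handles a zero-length adjacent leg; whether the library uses `atan` or `atan2` is not stated, and `angle_from_legs_sketch` is a hypothetical name.

```python
import math


def angle_from_legs_sketch(adjacent_leg, opposite_leg):
    """Angle (radians) between the hypotenuse and the adjacent leg."""
    return math.atan2(opposite_leg, adjacent_leg)


angle = angle_from_legs_sketch(1.0, 1.0)
```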

pychemelt.utils.svd.get_2d_counterclockwise_rot_matrix(angle_in_radians)[source]#

Obtain the rotation matrix for a 2d coordinates system using a counterclockwise direction.

Parameters:

angle_in_radians (float) – Angle in radians for the rotation.

Returns:

rotM – 2x2 rotation matrix.

Return type:

numpy array
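The standard counterclockwise 2D rotation matrix is [[cos θ, −sin θ], [sin θ, cos θ]]. The sketch below builds it with nested lists instead of a numpy array to stay dependency-free; `rot2d_ccw_sketch` and `rotate_sketch` are hypothetical names.

```python
import math


def rot2d_ccw_sketch(angle_in_radians):
    """2x2 counterclockwise rotation matrix as nested lists."""
    c, s = math.cos(angle_in_radians), math.sin(angle_in_radians)
    return [[c, -s], [s, c]]


def rotate_sketch(matrix, vector):
    """Multiply a 2x2 matrix by a length-2 column vector."""
    x, y = vector
    return (matrix[0][0] * x + matrix[0][1] * y,
            matrix[1][0] * x + matrix[1][1] * y)


# Rotating the x unit vector by 90 degrees should give the y unit vector
rotated = rotate_sketch(rot2d_ccw_sketch(math.pi / 2), (1.0, 0.0))
```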

pychemelt.utils.svd.get_3d_counterclockwise_rot_matrix_around_z_axis(angle_in_radians)[source]#

Obtain the rotation matrix for a 3d coordinates system around the z axis using a counterclockwise direction.

Parameters:

angle_in_radians (float) – Angle in radians for the rotation.

Returns:

rotM – 3x3 rotation matrix.

Return type:

numpy array

pychemelt.utils.svd.get_3d_clockwise_rot_matrix_around_y_axis(angle_in_radians)[source]#

Obtain the rotation matrix for a 3d coordinates system around the y axis using a clockwise direction.

Parameters:

angle_in_radians (float) – Angle in radians for the rotation.

Returns:

rotM – 3x3 rotation matrix.

Return type:

numpy array

pychemelt.utils.svd.rotate_two_basis_spectra(X, basis_spectra, pca_based=False)[source]#

Create a new basis spectra using a linear combination of the first and second basis spectra

Parameters:
  • X (numpy array) – The raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.

  • basis_spectra (numpy array) – The matrix containing the set of basis spectra.

  • pca_based (bool, optional) – Boolean to decide if we need to center the matrix X. Default is False.

Returns:

  • basis_spectra_new (numpy array) – The new set of basis spectra.

  • coefficients (numpy array) – The new set of associated coefficients.

pychemelt.utils.svd.rotate_three_basis_spectra(X, basis_spectra, pca_based=False)[source]#

Create a new basis spectra using a linear combination from the first, second and third basis spectra

Parameters:
  • X (numpy array) – The raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.

  • basis_spectra (numpy array) – The matrix containing the set of basis spectra.

  • pca_based (bool, optional) – Boolean to decide if we need to center the matrix X. Default is False.

Returns:

  • basis_spectra_new (numpy array) – The new set of basis spectra.

  • coefficients_subset (numpy array) – The new set of associated coefficients.

pychemelt.utils.svd.reconstruct_spectra(basis_spectra, coefficients, X=None, pca_based=False)[source]#

Reconstruct the original spectra based on the set of basis spectra and the associated coefficients

Parameters:
  • basis_spectra (numpy array) – The matrix containing the set of basis spectra.

  • coefficients (numpy array) – The associated coefficients of each basis spectrum.

  • X (numpy array, optional) – Only used if pca_based is True. X is the raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.

  • pca_based (bool, optional) – Boolean to decide if we need to extract the mean from the X raw data matrix. Default is False.

Returns:

fitted – The reconstructed matrix, which should be close to the original raw data.

Return type:

numpy array

pychemelt.utils.svd.explained_variance_from_orthogonal_vectors(vectors, coefficients, total_variance)[source]#

Useful to get the percentage of variance, not in the coordinate space provided by PCA/SVD, but against a different set of (rotated) vectors.

Parameters:
  • vectors (numpy array) – The set of orthogonal vectors.

  • coefficients (numpy array) – The associated coefficients of each orthogonal vector.

  • total_variance (float) – The total variance of the original data (mean subtracted if we performed PCA…).

Returns:

explained_variance – The amount of explained variance by each orthogonal vector.

Return type:

list

pychemelt.utils.svd.apply_pca(X)[source]#

Perform Principal Component Analysis (PCA) on the input data matrix X.

Parameters:

X (numpy array of shape (n_wavelengths, n_measurements)) – The input data matrix to decompose.

Returns:

  • cum_sum_eigenvalues (numpy array) – The cumulative explained variance for each principal component.

  • principal_components (numpy array) – The principal components (eigenvectors) representing the basis spectra.

  • coefficients (numpy array) – The coefficients associated with each principal component.

pychemelt.utils.svd.recalc_explained_variance(basis_spectra, coefficients, X, pca_based=False)[source]#

Recalculate the explained variance of a set of basis spectra and associated coefficients.

Parameters:
  • basis_spectra (numpy array) – The basis spectra.

  • coefficients (numpy array) – The associated coefficients of each basis spectrum.

  • X (numpy array) – The raw data matrix of size n*m, where ‘n’ is the number of measured wavelengths and ‘m’ is the number of acquired spectra.

  • pca_based (bool, optional) – Boolean to decide if we need to center the matrix X. Default is False.

Returns:

explained_variance – The cumulative explained variance for each component.

Return type:

numpy array