pychemelt package

PyChemelt package for the analysis of chemical and thermal denaturation data

Subpackages

pychemelt.utils package

Submodules

pychemelt.main module

Main class to handle thermal and chemical denaturation data. The current model assumes that the unfolding is reversible.

class pychemelt.main.Sample(name='Test')

Bases: object

Class to hold, process, and fit thermal and chemical denaturation data.

This class manages multiple signal types (e.g., 350nm, 330nm, Ratio) and concentrations, providing an interface for global thermodynamic analysis under the assumption of reversible unfolding.

Parameters:: name (str, optional) – Identifier for the sample. Default is ‘Test’.

signal_dic

Raw signal data mapped by signal name.

Type:: dict

temp_dic

Temperature data mapped by signal name.

Type:: dict

conditions

Processed numeric values for experimental conditions (e.g., [Denaturant]).

Type:: list of float

labels

Original string labels for each condition.

Type:: list of str

signals

Names of all available signal types in the loaded files.

Type:: list of str

nr_signals

Number of distinct signal types selected for analysis.

Type:: int

single_fit_done

Flag indicating if individual dataset fits have been completed.

Type:: bool

global_fit_done

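Flag indicating if the global fit (global thermodynamic parameters, local baseline slopes and intercepts) has been completed.

Type:: bool

global_global_fit_done

Flag indicating if the global fit with global thermodynamic parameters and global baseline slopes (local intercepts) has been completed.

Type:: bool

global_global_global_fit_done

Flag indicating if the global fit with global thermodynamic parameters, baseline slopes, and baseline intercepts has been completed.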

Type:: bool

set_units(format='international')

read_file(file)

Read the file and load the data into the sample object

Parameters:: file (str or os.PathLike) – Path to the file
Returns:: True if the file was read and loaded into the sample object
Return type:: bool

read_multiple_files(files)

Read multiple files and load the data into the sample object

Parameters:: files (list or str) – List of paths to the files (or a single path)
Returns:: True if the files were read and loaded into the sample object
Return type:: bool

set_signal(signal_names)

Set the signals to be used for the analysis. This allows multiple signals, such as 350nm and 330nm, to be fitted globally at the same time.

Parameters:: signal_names (list or str) – List of names of the signals to be used. E.g., [‘350nm’,’330nm’] or a single name

Notes

This method creates/updates the following attributes on the instance:
- signal_lst_pre_multiple, temp_lst_pre_multiple : lists of lists
- signal_names : list of signal name strings
- nr_signals : int, number of signal types

set_temperature_range(min_temp=0, max_temp=100)

Set the temperature range for the sample

Parameters:

min_temp (float, optional) – Minimum temperature
max_temp (float, optional) – Maximum temperature

set_signal_id()[source]: Create a list with the same length as the total number of signals The elements of the list indicated the ID of the signal, e.g., all 350nm datasets are mapped to 0, all 330nm datasets to 1, etc.

estimate_derivative(window_length=8)

Estimate the derivative of the signal using a Savitzky-Golay filter

Parameters:: window_length (int, optional) – Length of the filter window in degrees

Notes

Creates/updates attributes:
- temp_deriv_lst_multiple, deriv_lst_multiple, deriv_lst_expanded : lists storing estimated derivatives and corresponding temps
- predicted_deriv_lst_multiple : list storing estimated derivatives of predicted values
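The sketch below shows how a window given in degrees can be converted into a number of points and used for a Savitzky-Golay first derivative. It assumes evenly spaced temperatures and uses the closed-form derivative weights for a polynomial order of 1 or 2 (for a symmetric window both orders give the same centre-point derivative). pychemelt's actual filter settings are not documented here, and savgol_derivative is a hypothetical name; scipy.signal.savgol_filter(..., deriv=1, delta=step) provides the general filter.

```python
# Hypothetical sketch; not pychemelt's implementation.
def savgol_derivative(temps, signal, window_length=8.0):
    """Savitzky-Golay first derivative (polyorder 1 or 2) of an evenly spaced signal.

    window_length is given in degrees and converted to a half-width in points.
    Points closer than half a window to either end are returned as None.
    """
    if len(temps) != len(signal) or len(temps) < 3:
        raise ValueError("temps and signal must have the same length (>= 3)")
    step = (temps[-1] - temps[0]) / (len(temps) - 1)  # assumes even spacing
    half = max(1, int(round(window_length / step)) // 2)
    # Least-squares slope weights: k / (step * sum(k^2)) for k = -half..half
    norm = step * sum(k * k for k in range(-half, half + 1))
    deriv = [None] * len(signal)
    for i in range(half, len(signal) - half):
        acc = sum(k * signal[i + k] for k in range(-half, half + 1))
        deriv[i] = acc / norm
    return deriv


temps = [20 + 0.5 * i for i in range(41)]   # 20 to 40 degrees, 0.5 degree steps
signal = [0.01 * t * t for t in temps]
d = savgol_derivative(temps, signal, window_length=8.0)
print(d[20])  # close to 0.6, the exact derivative of 0.01*T**2 at T = 30
```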

guess_Tm(x1=6, x2=11)

Guess the Tm of the sample using the derivative of the signal

Parameters:

x1 (float, optional) – Shift from the minimum and maximum temperature to estimate the median of the initial and final baselines
x2 (float, optional) – Shift from the minimum and maximum temperature to estimate the median of the initial and final baselines

Notes

x2 must be greater than x1.

This method creates/updates attributes:
- t_melting_init_multiple : list of initial Tm guesses per signal
- t_melting_df_multiple : list of pandas.DataFrame objects with Tm vs Denaturant
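A common way to turn a derivative into a Tm guess is to take the temperature at which the absolute derivative is largest, ignoring the edges of the curve. The sketch below assumes that approach; it does not reproduce how pychemelt uses x1 and x2, and guess_tm_from_derivative and its edge parameter are hypothetical.

```python
import math


# Hypothetical helper; not pychemelt's guess_Tm.
def guess_tm_from_derivative(temps, deriv, edge=5.0):
    """Return the temperature of the largest |dS/dT|, skipping `edge` degrees at each end."""
    t_min, t_max = min(temps), max(temps)
    candidates = [
        (abs(d), t)
        for t, d in zip(temps, deriv)
        if d is not None and t_min + edge <= t <= t_max - edge
    ]
    if not candidates:
        raise ValueError("no derivative values inside the selected range")
    return max(candidates)[1]


temps = list(range(20, 91))
# Analytic derivative of a sigmoid centred at 55 degrees with width 2
deriv = []
for t in temps:
    s = 1 / (1 + math.exp(-(t - 55) / 2))
    deriv.append(s * (1 - s) / 2)
print(guess_tm_from_derivative(temps, deriv))  # 55
```

Taking the absolute value means the same guess is obtained whether the signal rises or falls through the transition.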

estimate_baseline_parameters(native_baseline_type, unfolded_baseline_type, window_range_native=12, window_range_unfolded=12)

Estimate the baseline parameters for multiple signals

Parameters:

native_baseline_type (str) – one of ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’
unfolded_baseline_type (str) – one of ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’
window_range_native (int, float, or tuple(float, float), optional) – If scalar, range (in degrees) from the minimum temperature used to estimate the native-state baseline. If tuple, interpreted as an explicit temperature interval (min_temp, max_temp).
window_range_unfolded (int, float, or tuple(float, float), optional) – If scalar, range (in degrees) from the maximum temperature used to estimate the unfolded-state baseline. If tuple, interpreted as an explicit temperature interval (min_temp, max_temp).

Notes

This method sets or updates these attributes:
- bNs_per_signal, bUs_per_signal, kNs_per_signal, kUs_per_signal, qNs_per_signal, qUs_per_signal
- poly_order_native, poly_order_unfolded
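The window semantics of window_range_native and window_range_unfolded can be sketched as follows: a scalar selects a range measured from the edge of the curve, while a tuple is an explicit interval. The linear least-squares fit stands in for the 'linear' baseline type only. Both helper names are hypothetical, and pychemelt may reference its intercepts to a different temperature.

```python
# Hypothetical helpers; not pychemelt's implementation.
def select_window(temps, signal, window, native=True):
    """Return the (temps, signal) points inside the baseline window.

    A scalar window is measured from the minimum temperature (native) or back
    from the maximum temperature (unfolded); a tuple is an explicit interval.
    """
    if isinstance(window, tuple):
        low, high = window
    elif native:
        low, high = min(temps), min(temps) + window
    else:
        low, high = max(temps) - window, max(temps)
    pairs = [(t, s) for t, s in zip(temps, signal) if low <= t <= high]
    return [t for t, _ in pairs], [s for _, s in pairs]


def fit_linear_baseline(temps, signal):
    """Ordinary least-squares line through the points; returns (intercept, slope)."""
    n = len(temps)
    if n < 2:
        raise ValueError("at least two points are needed for a linear baseline")
    mean_t = sum(temps) / n
    mean_s = sum(signal) / n
    sxx = sum((t - mean_t) ** 2 for t in temps)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(temps, signal))
    slope = sxy / sxx
    return mean_s - slope * mean_t, slope


temps = list(range(20, 91))
signal = [10 - 0.02 * t if t < 50 else 4 + 0.01 * t for t in temps]
t_n, s_n = select_window(temps, signal, 12)               # 20 to 32 degrees
t_u, s_u = select_window(temps, signal, 12, native=False)  # 78 to 90 degrees
print(fit_linear_baseline(t_n, s_n), fit_linear_baseline(t_u, s_u))
```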

reset_fittings_results(): Deletes the results of previous fittings from the object

expand_multiple_signal()

Create a single list with all the signals and a single list with all the temperatures.

Notes

Creates/updates attributes:
- signal_lst_expanded, temp_lst_expanded
- signal_lst_expanded_subset, temp_lst_expanded_subset
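Flattening per-signal lists of curves into single lists can be done with itertools.chain. The sketch below also pairs each flattened curve with its condition value (for example, a denaturant concentration). The nesting, the assumption that every signal type lists its conditions in the same order, and the helper name expand are hypothetical.

```python
from itertools import chain


# Hypothetical helper; not pychemelt's expand_multiple_signal.
def expand(signal_lst_multiple, temp_lst_multiple, conditions):
    """Flatten one-list-of-curves-per-signal-type into single lists of curves.

    conditions holds one value per condition, shared by every signal type,
    and is repeated so that each flattened curve has its own condition value.
    """
    signal_expanded = list(chain.from_iterable(signal_lst_multiple))
    temp_expanded = list(chain.from_iterable(temp_lst_multiple))
    conditions_expanded = list(conditions) * len(signal_lst_multiple)
    return signal_expanded, temp_expanded, conditions_expanded


signals = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]          # 2 signal types x 2 conditions
temps = [[[20, 21], [20, 21]], [[20, 21], [20, 21]]]
s, t, c = expand(signals, temps, [0.0, 2.0])
print(s)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(c)  # [0.0, 2.0, 0.0, 2.0]
```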

create_params_df(): Create a dataframe of the parameters

pychemelt.monomer module

Main class to handle thermal and chemical denaturation data. The current model assumes that the protein is a monomer and that the unfolding is reversible.

class pychemelt.monomer.Monomer(name='Test')

Bases: Sample

Class to hold the data of a single sample and fit it

set_denaturant_concentrations(concentrations=None)

Set the denaturant concentrations for the sample

Parameters:: concentrations (list, optional) – List of denaturant concentrations. If None, use the sample conditions

Notes

Creates/updates attribute denaturant_concentrations_pre (numpy.ndarray)

select_conditions(boolean_lst=None, normalise_to_global_max=True)

For each signal, select the conditions to be used for the analysis

Parameters:

boolean_lst (list of bool, optional) – List of booleans selecting which conditions to keep. If None, keep all.
normalise_to_global_max (bool, optional) – If True, normalise the signal to its global maximum, computed per signal type

Notes

Creates/updates several attributes used by downstream fitting:
- signal_lst_multiple, temp_lst_multiple : lists of lists with selected data
- denaturant_concentrations : list of selected denaturant concentrations
- denaturant_concentrations_expanded : flattened numpy array matching expanded signals
- boolean_lst, normalise_to_global_max, nr_den : control flags/values
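A sketch of condition selection followed by normalisation to the global maximum of each signal type. It assumes that the maximum is taken over the selected curves of one signal type after filtering; the order of these two steps in pychemelt, and the helper name, are assumptions.

```python
# Hypothetical helper; not pychemelt's select_conditions.
def select_and_normalise(signal_lst_multiple, boolean_lst=None, normalise=True):
    """Keep the selected conditions and optionally divide by the per-signal-type maximum."""
    result = []
    for curves in signal_lst_multiple:
        keep = boolean_lst if boolean_lst is not None else [True] * len(curves)
        if len(keep) != len(curves):
            raise ValueError("boolean_lst must have one entry per condition")
        selected = [curve for curve, flag in zip(curves, keep) if flag]
        if normalise and selected:
            # One maximum shared by all selected curves of this signal type
            global_max = max(max(curve) for curve in selected)
            selected = [[value / global_max for value in curve] for curve in selected]
        result.append(selected)
    return result


signals = [[[1, 2], [3, 4], [5, 10]], [[2, 4], [1, 1], [8, 8]]]
print(select_and_normalise(signals, [True, True, False]))
# [[[0.25, 0.5], [0.75, 1.0]], [[0.5, 1.0], [0.25, 0.25]]]
```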

fit_thermal_unfolding_local(): Fit the thermal unfolding of the sample using the signal and temperature data. Curves are fitted one at a time, each with its own parameters.
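For orientation, a standard reversible two-state model with linear baselines is sketched below; the stability follows the Gibbs-Helmholtz equation with a temperature-independent Cp. This is a textbook formulation, not necessarily pychemelt's exact parameterisation (its baseline reference temperature or units may differ), and the function names are hypothetical. Temperatures are in kelvin and energies in kcal/mol.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


# Hypothetical textbook model; not pychemelt's fitting function.
def delta_g(T, Tm, dH, Cp):
    """Unfolding free energy (kcal/mol) from the Gibbs-Helmholtz equation."""
    return dH * (1 - T / Tm) - Cp * (Tm - T + T * math.log(T / Tm))


def two_state_signal(T, Tm, dH, Cp, bN, kN, bU, kU):
    """Signal of a two-state N <-> U equilibrium with linear baselines."""
    K = math.exp(-delta_g(T, Tm, dH, Cp) / (R * T))  # K = [U]/[N]
    f_u = K / (1 + K)                                # fraction unfolded
    native = bN + kN * T
    unfolded = bU + kU * T
    return (1 - f_u) * native + f_u * unfolded


# At T = Tm the protein is half unfolded, so the signal is the mean of the baselines
print(two_state_signal(330.0, 330.0, 100.0, 1.5, 1.0, 0.0, 0.0, 0.0))  # 0.5
```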

guess_Cp()

Guess the Cp of the sample by fitting a line to the Tm and dH values

Notes

This method creates/updates attributes used later in fitting:
- Tms, dHs, slope_dh_tm, intercept_dh_tm
- Cp0 (the Cp guess, assigned to self.Cp0)
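The Tm-dH line rests on Kirchhoff's relation, d(dH)/dT = Cp, so the slope of a straight line through the (Tm, dH) pairs from the individual fits gives a Cp guess. A minimal sketch with ordinary least squares; the helper name is hypothetical, and pychemelt may weight or filter the points differently.

```python
# Hypothetical helper; not pychemelt's guess_Cp.
def guess_cp(tms, dhs):
    """Return (slope, intercept) of dH versus Tm; the slope is the Cp guess."""
    n = len(tms)
    if n < 2 or n != len(dhs):
        raise ValueError("need at least two matching (Tm, dH) pairs")
    mean_tm = sum(tms) / n
    mean_dh = sum(dhs) / n
    sxx = sum((t - mean_tm) ** 2 for t in tms)
    if sxx == 0:
        raise ValueError("all Tm values are identical; the slope is undefined")
    sxy = sum((t - mean_tm) * (h - mean_dh) for t, h in zip(tms, dhs))
    slope = sxy / sxx
    intercept = mean_dh - slope * mean_tm
    return slope, intercept


# Tm and dH from individual fits at increasing denaturant concentration
slope_dh_tm, intercept_dh_tm = guess_cp([60, 55, 50, 45], [120, 112.5, 105, 97.5])
print(slope_dh_tm, intercept_dh_tm)  # 1.5 30.0
```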

guess_initial_parameters(native_baseline_type, unfolded_baseline_type, window_range_native=12, window_range_unfolded=12)

Estimate starting thermodynamic and baseline parameters for global fitting.

Parameters:

native_baseline_type ({'constant', 'linear', 'quadratic', 'exponential'}) – The model type for the native state baseline.
unfolded_baseline_type ({'constant', 'linear', 'quadratic', 'exponential'}) – The model type for the unfolded state baseline.
window_range_native (float, optional) – Temperature range at the start of the curve (in degrees) used for native baseline estimation. Default is 12.
window_range_unfolded (float, optional) – Temperature range at the end of the curve used for unfolded baseline estimation. Default is 12.

create_dg_df(): Create a dataframe of the dg values versus temperature
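The dG values can be tabulated from the fitted parameters. The sketch below combines the Gibbs-Helmholtz equation with the linear extrapolation model, dG(T, D) = dG(T) - m*D, using a temperature-independent m-value. Whether pychemelt uses this exact form is an assumption (fit_m_dep suggests the m-value can also depend on temperature), and the helper name is hypothetical. Units: kelvin, kcal/mol, and kcal/(mol M) for the m-value.

```python
import math


# Hypothetical helper; not pychemelt's create_dg_df.
def dg_table(temps, Tm, dH, Cp, m_value=0.0, denaturant=0.0):
    """Return (T, dG) pairs for one denaturant concentration."""
    rows = []
    for T in temps:
        dg_thermal = dH * (1 - T / Tm) - Cp * (Tm - T + T * math.log(T / Tm))
        rows.append((T, dg_thermal - m_value * denaturant))
    return rows


for T, dg in dg_table([300.0, 330.0, 360.0], Tm=330.0, dH=100.0, Cp=1.5):
    print(T, round(dg, 2))
```

dG is positive below Tm, zero at Tm, and negative above it; adding denaturant shifts every value down by m*D.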

set_thermodynamic_params_guess(cp_limits=None, dh_limits=None, tm_limits=None, cp_value=None, user_thermodynamic_params_guess=None)

Set and return the current guess for the thermodynamic parameters (Tm, dH, Cp, m-value)

Parameters:

cp_limits (list, optional) – List of two values, the lower and upper bounds for the Cp value. If None, bounds set automatically
dh_limits (list, optional) – List of two values, the lower and upper bounds for the dH value. If None, bounds set automatically
tm_limits (list, optional) – List of two values, the lower and upper bounds for the Tm value. If None, bounds set automatically
cp_value (float, optional) – If provided, the Cp value is fixed to this value, the bounds are ignored
user_thermodynamic_params_guess (list, optional) – List of four values, the user-defined guess for the thermodynamic parameters (Tm, dH, Cp, m-value). If None, the guess is calculated automatically

Returns:

List of four values, the current guess for the thermodynamic parameters (Tm, dH, Cp, m-value)

Return type:

list

fit_thermal_unfolding_global(fit_m_dep=False, cp_limits=None, dh_limits=None, tm_limits=None, cp_value=None, predict_baselines=True, set_init_params=True)

Fit the thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters but local slopes and local baselines. Multiple signals, such as 350nm and 330nm, can be fitted at the same time.

Parameters:

fit_m_dep (bool, optional) – If True, fit the temperature dependence of the m-value
cp_limits (list, optional) – List of two values, the lower and upper bounds for the Cp value. If None, bounds set automatically
dh_limits (list, optional) – List of two values, the lower and upper bounds for the dH value. If None, bounds set automatically
tm_limits (list, optional) – List of two values, the lower and upper bounds for the Tm value. If None, bounds set automatically
cp_value (float, optional) – If provided, the Cp value is fixed to this value, the bounds are ignored
predict_baselines (bool, optional) – If True, predict the baselines after fitting and store them in the object. Default is True.
set_init_params (bool, optional) – If True, an initial guess for the thermodynamic parameters is calculated. If False, self.p0_thermodynamics should be set before calling this method.

Notes

This is a heavy routine that creates/updates many fitting-related attributes, including:
- bNs_expanded, bUs_expanded, kNs_expanded, kUs_expanded, qNs_expanded, qUs_expanded
- p0, low_bounds, high_bounds, global_fit_params, rel_errors
- predicted_lst_multiple, params_names, params_df, dg_df
- flags: global_fit_done, fit_m_dep, limited_tm, limited_dh, limited_cp, fixed_cp

fit_thermal_unfolding_global_global(predict_baselines=True)

Fit the thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters and global slopes (but local baselines). Multiple signals, such as 350nm and 330nm, can be fitted at the same time. Must be run after fit_thermal_unfolding_global.

Notes

Updates global fitting attributes and sets global_global_fit_done when complete.

fit_thermal_unfolding_global_global_global(model_scale_factor=True, predict_baselines=True)

Fit the thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters, global slopes, and global baselines. Must be run after fit_thermal_unfolding_global_global.

Parameters:: model_scale_factor (bool, optional) – If True, model a scale factor for each denaturant concentration

Notes

Updates many global fitting attributes and sets global_global_global_fit_done when complete. If model_scale_factor is True, the method also creates scaled signal attributes:
- signal_lst_multiple_scaled, predicted_lst_multiple_scaled

predict_baselines()

leave_one_out_cross_validation(): Perform a leave-one-out cross-validation by fitting the model multiple times, each time leaving out one of the datasets (signal-temperature pairs). If two signals are selected, such as 350nm and 330nm, two datasets are left out at a time.
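The grouping of left-out datasets can be sketched as follows. The sketch assumes that the expanded datasets are ordered signal type by signal type with one curve per condition, and that "two datasets at a time" means both signals recorded at the same condition. The generator name is hypothetical.

```python
# Hypothetical fold generator; not pychemelt's implementation.
def leave_one_out_folds(n_conditions, n_signals):
    """Yield (train_indices, test_indices) over the expanded list of datasets.

    Dataset index = signal_id * n_conditions + condition_index, so each fold
    leaves out every signal type recorded at one condition.
    """
    all_indices = range(n_conditions * n_signals)
    for condition in range(n_conditions):
        test = [s * n_conditions + condition for s in range(n_signals)]
        train = [i for i in all_indices if i not in test]
        yield train, test


# Three conditions measured at 350nm and 330nm
for train, test in leave_one_out_folds(3, 2):
    print(train, test)
```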

create_fit_report(neff=None): Create a fit report using the lmfit result object.

calculate_confidence_intervals(percentage=0.95)

signal_to_df(signal_type='raw', scaled=False)

Create a dataframe with three columns: Temperature, Signal, and Denaturant. Optimized for speed by avoiding per-curve DataFrame creation.

Parameters:

signal_type ({'raw', 'fitted', 'derivative'}, optional) – Which signal to include in the dataframe. ‘raw’ uses experimental data, ‘fitted’ uses model predictions, ‘derivative’ uses the estimated derivative signal.
scaled (bool, optional) – If True and signal_type == ‘fitted’ or ‘raw’, use the scaled versions if available.

compare_models(native_baseline_types, unfolded_baseline_types, global_model_types=['global', 'global_global', 'global_global_global'], neff=None, gamma=1, **kwargs)

Compare different models with different baseline types and global/local parameters by fitting them and comparing their BIC values.

Parameters:

native_baseline_types (list of str) – List of native baseline types to compare. Each element should be one of ‘linear’, ‘quadratic’, ‘exponential’, or ‘constant’.
unfolded_baseline_types (list of str) – List of unfolded baseline types to compare. Each element should be one of ‘linear’, ‘quadratic’, ‘exponential’, or ‘constant’.
global_model_types (list of str) – List of global model types to fit. Each element should be one of ‘global’, ‘global_global’, or ‘global_global_global’.
neff (int, optional) – Effective number of data points to use for AIC, BIC, and EBIC calculation. If None, the total number of data points across all signals and temperatures will be used.
gamma (float, optional) – Tuning parameter for the Extended BIC (EBIC), typically between 0 and 1 (default: 1). When gamma=0, EBIC reduces to standard BIC. Higher values impose stronger penalties for model complexity.
**kwargs – Additional keyword arguments to pass to fit_thermal_unfolding_global (e.g., fit_m_dep, cp_limits, dh_limits, tm_limits, cp_value).

Returns:

A DataFrame summarizing the fitted models and their BIC and EBIC values, sorted by EBIC.

Return type:

pd.DataFrame
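For least-squares fits with Gaussian residuals, information criteria are commonly computed from the residual sum of squares (RSS). The sketch below uses BIC = n*ln(RSS/n) + k*ln(n) and the Chen-Chen extended form EBIC = BIC + 2*gamma*ln(C(P, k)), where k is the number of fitted parameters and P the number of candidate parameters; n is the quantity that neff would override. pychemelt's exact formulas and its choice of P are not documented here and are assumptions.

```python
import math


# Hypothetical helpers; not pychemelt's compare_models internals.
def bic(rss, n, k):
    """Bayesian information criterion for a least-squares fit (up to a constant)."""
    return n * math.log(rss / n) + k * math.log(n)


def ebic(rss, n, k, n_candidates, gamma=1.0):
    """Extended BIC; reduces to bic() when gamma == 0."""
    if not 0 <= k <= n_candidates:
        raise ValueError("k must lie between 0 and n_candidates")
    return bic(rss, n, k) + 2 * gamma * math.log(math.comb(n_candidates, k))


print(bic(2.0, 100, 5), ebic(2.0, 100, 5, n_candidates=12))
```

Lower values are better, so sorting models by EBIC in ascending order ranks them from most to least supported.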

pychemelt.thermal_oligomer module

Main class to handle thermal denaturation data of monomers and oligomers up to tetramers. The current model assumes that the unfolding of the proteins is reversible.

class pychemelt.thermal_oligomer.ThermalOligomer(name='Test')

Bases: Sample

Class to hold the data of a DSF experiment of thermal unfolding with different concentrations of an oligomer.

set_model(model_name, intermediate_name=None)

Set the thermodynamic oligomer model used for the analysis. Currently supported are two-state models of monomers, dimers, trimers, and tetramers.

Parameters:: model_name (str) – name of the used model. Can be: “Monomer”, “Dimer”, “Trimer”, “Tetramer”. Case insensitive
Raises:: ValueError – If the provided model name is not in the supported list.

Notes

This method creates/updates the following attributes on the instance:
- self.model: oligomeric model used for analysis
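Case-insensitive validation of the model name, as described above, can be sketched like this. The mapping to a number of subunits and the helper name parse_model_name are illustrative assumptions, not pychemelt's internal representation.

```python
# Hypothetical mapping and helper; not pychemelt's set_model.
SUPPORTED_MODELS = {"monomer": 1, "dimer": 2, "trimer": 3, "tetramer": 4}


def parse_model_name(model_name):
    """Return (canonical_name, n_subunits) or raise ValueError for unknown names."""
    key = model_name.strip().lower()
    if key not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model_name!r}; choose one of "
            + ", ".join(name.capitalize() for name in SUPPORTED_MODELS)
        )
    return key.capitalize(), SUPPORTED_MODELS[key]


print(parse_model_name("DIMER"))  # ('Dimer', 2)
```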

set_concentrations(concentrations=None)

Set the oligomeric concentrations for the sample

Parameters:: concentrations (list, optional) – List of oligomer concentrations. If None, use the sample conditions

Notes

Creates/updates attribute oligomer_concentrations_pre (numpy.ndarray)
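The oligomer concentration matters because, for a two-state dimer N2 <-> 2U, the fraction unfolded depends on the total concentration as well as on K. Writing Ct for the total monomer-equivalent concentration and K = [U]^2/[N2], mass balance gives 2*Ct*fU^2 + K*fU - K = 0, whose positive root is used below in a numerically stable form. Whether pychemelt expresses concentrations per monomer or per oligomer is not stated here; the monomer-equivalent convention and the helper name are assumptions.

```python
import math


# Hypothetical helper illustrating concentration dependence; not pychemelt's model code.
def dimer_fraction_unfolded(K, total_conc):
    """Fraction unfolded for N2 <-> 2U with K = [U]^2/[N2] (concentration units).

    total_conc is the total monomer-equivalent concentration, in the same units as K.
    """
    if K <= 0 or total_conc <= 0:
        raise ValueError("K and total_conc must be positive")
    # Stable form of the positive root of 2*Ct*f^2 + K*f - K = 0
    return 2 * K / (K + math.sqrt(K * K + 8 * total_conc * K))


# Same K, ten-fold higher concentration: less unfolded (the dimer is stabilised)
print(dimer_fraction_unfolded(1e-6, 1e-6), dimer_fraction_unfolded(1e-6, 1e-5))
```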

select_conditions(boolean_lst=None)

For each signal, select the conditions to be used for the analysis

Parameters:: boolean_lst (list of bool, optional) – List of booleans selecting which conditions to keep. If None, keep all.

Notes

Creates/updates several attributes used by downstream fitting:
- signal_lst_multiple, temp_lst_multiple : lists of lists with selected data
- oligomer_concentrations : list of selected oligomer concentrations
- oligomer_concentrations_expanded : flattened numpy array matching expanded signals
- boolean_lst, nr_olig : control flags/values

guess_Cp()

Guess the Cp of the assembled oligomer by the number of residues.

Raises:: ValueError – If self.n_residues is not set.

Notes

The number of residues represents the total number of residues in the oligomer.

This method creates/updates attributes used later in fitting:
- Cp0 (assigned to self.Cp0)

estimate_baseline_parameters(native_baseline_type, unfolded_baseline_type, window_range_native=12, window_range_unfolded=12)

Estimate the baseline parameters for multiple signals of the oligomer. The native baseline represents the curve of the assembled oligomer, while the unfolded baseline represents the curve of the unfolded and disassembled oligomer.

Parameters:

native_baseline_type (str) – one of ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’
unfolded_baseline_type (str) – one of ‘constant’, ‘linear’, ‘quadratic’, ‘exponential’
window_range_native (int, optional) – Range of the window (in degrees) to estimate the baselines and slopes of the native state
window_range_unfolded (int, optional) – Range of the window (in degrees) to estimate the baselines and slopes of the unfolded state

Notes

This method sets or updates these attributes:
- bNs_per_signal, bUs_per_signal, kNs_per_signal, kUs_per_signal, qNs_per_signal, qUs_per_signal
- poly_order_native, poly_order_unfolded

create_dg_df(): Create a dataframe of the dg values versus temperature

fit_thermal_unfolding_global(cp_limits=None, dh_limits=None, tm_limits=None, cp_value=None)

Fit the thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters but local slopes and local baselines. Multiple signals, such as 350nm and 330nm, can be fitted at the same time.

Parameters:

cp_limits (list, optional) – List of two values, the lower and upper bounds for the Cp value. If None, bounds set automatically
dh_limits (list, optional) – List of two values, the lower and upper bounds for the dH value. If None, bounds set automatically
tm_limits (list, optional) – List of two values, the lower and upper bounds for the Tm value. If None, bounds set automatically
cp_value (float, optional) – If provided, the Cp value is fixed to this value, the bounds are ignored

Notes

This is a heavy routine that creates/updates many fitting-related attributes, including:
- bNs_expanded, bUs_expanded, kNs_expanded, kUs_expanded, qNs_expanded, qUs_expanded
- p0, low_bounds, high_bounds, global_fit_params, rel_errors
- predicted_lst_multiple, params_names, params_df, dg_df
- flags: global_fit_done, limited_tm, limited_dh, limited_cp, fixed_cp

fit_thermal_unfolding_three_state_global(t1_init=0, t2_init=0, dh_limits=None, tm_limits=None, CpTh=None)

Fit the thermal unfolding of the sample using the signal and temperature data with a three-state model. All curves are fitted at once, with global thermodynamic parameters but local slopes and local baselines. Multiple signals, such as 350nm and 330nm, can be fitted at the same time.

Parameters:

t1_init (float, optional) – User-given initial value of the melting temperature of the native-to-intermediate transition.
t2_init (float, optional) – User-given initial value of the melting temperature of the intermediate-to-unfolded transition.
dh_limits (list of lists, optional) – List of two lists with two values each, the lower and upper bounds for the dH values. If None, bounds set automatically
tm_limits (list of lists, optional) – List of two lists with two values each, the lower and upper bounds for the Tm values. If None, bounds set automatically
CpTh (float, optional) – Estimate of the total Cp of the system. If given, the Cp of the native-to-intermediate transition is fitted as Cp1. If not given, a total Cp of 0 is assumed.

Notes

This is a heavy routine that creates/updates many fitting-related attributes, including:
- bNs_expanded, bUs_expanded, kNs_expanded, kUs_expanded, qNs_expanded, qUs_expanded
- p0, low_bounds, high_bounds, global_fit_params, rel_errors
- predicted_lst_multiple, params_names, params_df, dg_df
- flags: global_fit_done, limited_tm, limited_dh, limited_cp, fixed_cp
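For the three-state scheme N <-> I <-> U, the populations follow from the two equilibrium constants K1 = [I]/[N] and K2 = [U]/[I]. The sketch below evaluates them for a monomer with each transition described by its own Tm and dH and, matching the default described for CpTh, zero heat-capacity change. It ignores oligomer stoichiometry, and the function name is hypothetical. Temperatures are in kelvin and energies in kcal/mol.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)


# Hypothetical sketch of a monomeric three-state equilibrium; not pychemelt's model code.
def three_state_populations(T, t1, dh1, t2, dh2):
    """Return (fN, fI, fU) for N <-> I <-> U with zero heat-capacity changes."""
    dg1 = dh1 * (1 - T / t1)  # N -> I
    dg2 = dh2 * (1 - T / t2)  # I -> U
    k1 = math.exp(-dg1 / (R * T))
    k2 = math.exp(-dg2 / (R * T))
    z = 1 + k1 + k1 * k2  # partition function relative to N
    return 1 / z, k1 / z, k1 * k2 / z


# At T = t1 the native and intermediate populations are equal
print(three_state_populations(320.0, 320.0, 80.0, 345.0, 120.0))
```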

fit_thermal_unfolding_global_global()

Fit the thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters and global slopes (but local baselines). Multiple signals, such as 350nm and 330nm, can be fitted at the same time. Must be run after fit_thermal_unfolding_global.

Notes

Updates global fitting attributes and sets global_global_fit_done when complete.

fit_thermal_unfolding_three_state_global_global()

Fit the three-state thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters and global slopes (but local baselines). Multiple signals, such as 350nm and 330nm, can be fitted at the same time. Must be run after fit_thermal_unfolding_global.

Notes

Updates global fitting attributes and sets global_global_fit_done when complete.

fit_thermal_unfolding_global_global_global(model_scale_factor=True)

Fit the thermal unfolding of the sample using the signal and temperature data. All curves are fitted at once, with global thermodynamic parameters, global slopes, and global baselines. Must be run after fit_thermal_unfolding_global_global.

Parameters:: model_scale_factor (bool, optional) – If True, model a scale factor for each oligomer concentration

Notes

Updates many global fitting attributes and sets global_global_global_fit_done when complete. If model_scale_factor is True, the method also creates scaled signal attributes:
- signal_lst_multiple_scaled, predicted_lst_multiple_scaled

fit_thermal_unfolding_three_state_global_global_global(model_scale_factor=True)

Fit the thermal unfolding of the sample using the signal and temperature data with a three-state model. All curves are fitted at once, with global thermodynamic parameters, global slopes, and global baselines. Must be run after fit_thermal_unfolding_global_global.

Parameters:: model_scale_factor (bool, optional) – If True, model a scale factor for each oligomer concentration

Notes

Updates many global fitting attributes and sets global_global_global_fit_done when complete. If model_scale_factor is True, the method also creates scaled signal attributes:
- signal_lst_multiple_scaled, predicted_lst_multiple_scaled

signal_to_df(signal_type='raw', scaled=False)

Create a dataframe with the columns Temperature, Signal, Oligomer, and ID. Optimized for speed by avoiding per-curve DataFrame creation.

Parameters:

signal_type ({'raw', 'fitted', 'derivative'}, optional) – Which signal to include in the dataframe. ‘raw’ uses experimental data, ‘fitted’ uses model predictions, ‘derivative’ uses the estimated derivative signal.
scaled (bool, optional) – If True and signal_type == ‘fitted’ or ‘raw’, use the scaled versions if available.

Returns:

A DataFrame with columns: [‘Temperature’, ‘Signal’, ‘Oligomer’, ‘ID’].

Return type:

pd.DataFrame
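Building the long-format table column by column, rather than creating one DataFrame per curve, is what makes this approach fast: all curves are appended to a few flat lists and a single DataFrame is built at the end (for example with pandas.DataFrame(columns)). The stdlib-only sketch below stops at the column dictionary. Its input layout, the helper name, and the reading of ID as the signal-type index are assumptions.

```python
# Hypothetical helper; not pychemelt's signal_to_df.
def long_format_columns(temp_lst, signal_lst, concentrations, signal_ids):
    """Return a dict of equal-length columns: Temperature, Signal, Oligomer, ID.

    Each argument has one entry per curve; temperatures and signals are lists of
    equal length per curve, while concentrations and signal_ids are scalars.
    """
    columns = {"Temperature": [], "Signal": [], "Oligomer": [], "ID": []}
    for temps, signal, conc, sid in zip(temp_lst, signal_lst, concentrations, signal_ids):
        if len(temps) != len(signal):
            raise ValueError("each curve needs matching temperature and signal lengths")
        columns["Temperature"].extend(temps)
        columns["Signal"].extend(signal)
        # Repeat the per-curve values once per data point
        columns["Oligomer"].extend([conc] * len(temps))
        columns["ID"].extend([sid] * len(temps))
    return columns


cols = long_format_columns(
    [[20, 21], [20, 21, 22]], [[1.0, 0.9], [2.0, 1.8, 1.6]], [1.0, 5.0], [0, 0]
)
print(cols["Oligomer"])  # [1.0, 1.0, 5.0, 5.0, 5.0]
```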

create_params_df_three_state(): Create a dataframe of the parameters