API Reference#

Top-level package for spectoprep. SpectroPrep is a comprehensive toolkit for spectroscopic data preprocessing and modeling.

This package provides tools for preprocessing spectroscopic data, pipeline optimization, and modeling using Ridge regression.

class spectoprep.OptimizedRidgeCV(alphas=None, cv=5, scoring='neg_mean_squared_error', fit_intercept=True, normalize=False, gcv_mode=None, store_cv_values=False, groups=None)[source]#

Bases: BaseEstimator, RegressorMixin

Ridge regression with built-in cross-validation and optimization capabilities.

Parameters#

alphas (array-like, default=np.logspace(-3, 3, 10)): Array of alpha values to try. A large array of values will slow down the computation.
cv (int, cross-validation generator, or iterable, default=5): Determines the cross-validation splitting strategy.
scoring (str or callable, default='neg_mean_squared_error'): A string or a scorer callable object or function with signature scorer(estimator, X, y).
fit_intercept (bool, default=True): Whether to calculate the intercept for this model.
normalize (bool, default=False): This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm.
gcv_mode ({None, 'auto', 'svd', 'eigen'}, default=None): Flag indicating which strategy to use when performing Generalized Cross-Validation.
store_cv_values (bool, default=False): Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute.
groups (array-like, default=None): Group labels for the samples. Only used if cv is a group-based cross-validation splitter.
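The groups parameter only matters for group-based splitters, which keep all samples sharing a label on the same side of a split. The following is a minimal NumPy sketch of that idea, not the package's own splitter; the helper name group_shuffle_split and its arguments are hypothetical.

```python
import numpy as np

def group_shuffle_split(groups, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) so that no group appears in both sets."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique)
    # Hold out a fraction of the *groups*, not of the individual samples.
    n_test = max(1, int(round(test_size * len(unique))))
    test_mask = np.isin(groups, shuffled[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Eight samples in four groups, e.g. repeated scans of the same specimen.
groups = np.array(["a", "a", "b", "b", "c", "c", "d", "d"])
train_idx, test_idx = group_shuffle_split(groups, test_size=0.25)
```

Splitting by group rather than by sample avoids leakage when several spectra come from the same physical specimen.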

fit(X, y, sample_weight=None)[source]#

Fit Ridge regression model with cross-validation.

Parameters#

X (array-like of shape (n_samples, n_features)): Training data.
y (array-like of shape (n_samples,) or (n_samples, n_targets)): Target values.
sample_weight (float or array-like of shape (n_samples,), default=None): Individual weights for each sample.

Returns#

self (object): Returns self.
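To make "Ridge regression with cross-validation" concrete, the sketch below selects alpha by plain K-fold cross-validation using the closed-form ridge solution. It is an illustration under simplifying assumptions (unshuffled weights, mean squared error scoring, no sample_weight, no groups, no generalized cross-validation) and is not the implementation used by OptimizedRidgeCV; ridge_fit and select_alpha are hypothetical helper names.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge with an unpenalized intercept (handled by centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def select_alpha(X, y, alphas, n_folds=5, seed=0):
    """Return the alpha with the lowest fold-averaged MSE, plus all mean MSEs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mean_mse = []
    for alpha in alphas:
        fold_mse = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            coef, intercept = ridge_fit(X[train], y[train], alpha)
            resid = y[test] - (X[test] @ coef + intercept)
            fold_mse.append(np.mean(resid ** 2))
        mean_mse.append(np.mean(fold_mse))
    mean_mse = np.array(mean_mse)
    return alphas[int(np.argmin(mean_mse))], mean_mse

# Synthetic data: 60 samples, 8 features, small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=60)
alphas = np.logspace(-3, 3, 10)  # same grid as the documented default
best_alpha, mean_mse = select_alpha(X, y, alphas)
```

The cost grows with the number of alphas times the number of folds, which is why a large alphas array slows the fit.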

get_cv_results()[source]#

Return cross-validation results.

Returns#

cv_results (dict): Results from cross-validation.

predict(X)[source]#

Predict using the Ridge model.

Parameters#

X (array-like of shape (n_samples, n_features)): Samples.

Returns#

y_pred (array-like of shape (n_samples,) or (n_samples, n_targets)): Predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination R^2 of the prediction.

Parameters#

X (array-like of shape (n_samples, n_features)): Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_targets)): True values for X.
sample_weight (array-like of shape (n_samples,), default=None): Sample weights.

Returns#

score (float): R^2 of self.predict(X) with respect to y.
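For reference, the coefficient of determination is R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. A minimal unweighted sketch (sample_weight is ignored here; r2_manual is a hypothetical helper, not part of the package):

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Unweighted coefficient of determination for a single target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2_manual(y_true, y_true)        # 1.0: predictions match exactly
mean_only = r2_manual(y_true, [6.0] * 4)   # 0.0: always predicting the mean
```

A model that does worse than predicting the mean yields a negative value, so R^2 is not bounded below by zero.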

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OptimizedRidgeCV#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in fit.

Returns#

self (object): The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OptimizedRidgeCV#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters#

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns#

self (object): The updated object.

class spectoprep.PipelineOptimizer(X_train: ndarray[tuple[int, ...], dtype[_ScalarType_co]], y_train: ndarray[tuple[int, ...], dtype[_ScalarType_co]], preprocessing_steps: List[str] | None = None, X_test: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, y_test: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, cv_method: str = 'group_shuffle_split', n_splits: int = 3, test_size: float = 0.3, n_groups_out: int = 2, random_state: int = 42, groups: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, max_pipeline_length: int = 5, n_jobs: int = -1, allowed_preprocess_combinations: int | List[int] | Tuple[int, ...] | None = [1, 2], log_level: str = 'INFO')[source]#

Bases: object

A class for optimizing machine learning pipelines using Bayesian optimization. It precomputes possible pipeline configurations and then searches over both the pipeline configuration (encoded as an index) and the hyperparameters.
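The idea of precomputing configurations and encoding each one as an index can be sketched with the standard library. This is an assumption-laden illustration, not the class's actual logic: the step names are hypothetical, and treating step order as significant (permutations rather than combinations) is a guess.

```python
from itertools import permutations

def enumerate_pipelines(steps, allowed_lengths=(1, 2)):
    """List every ordered selection of steps for each allowed length."""
    configs = []
    for length in allowed_lengths:
        configs.extend(permutations(steps, length))
    return configs

def decode_index(value, configs):
    """Map a continuous optimizer proposal onto a valid configuration index."""
    idx = int(round(value))
    idx = min(max(idx, 0), len(configs) - 1)  # clamp to the valid range
    return configs[idx]

steps = ["snv", "detrend", "savgol"]  # hypothetical preprocessing step names
configs = enumerate_pipelines(steps)  # 3 single steps + 6 ordered pairs = 9
first = decode_index(0.2, configs)    # ('snv',)
last = decode_index(8.7, configs)     # ('savgol', 'detrend')
```

Encoding the discrete choice as one number lets a continuous optimizer search over pipeline structure and hyperparameters in the same parameter space.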

bayes_objective(**params) → float[source]#

Objective function for Bayesian optimization.

Args: **params: Parameters to evaluate, passed as keyword arguments.
Returns: float: Negative RMSE, or a penalty score if the evaluation raises an error.
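The contract above (negative RMSE on success, a penalty on failure) can be sketched as follows. The sign is flipped presumably because the optimizer maximizes its objective. The names objective, evaluate, and PENALTY, and the penalty value itself, are hypothetical and not taken from the package.

```python
import math

PENALTY = -1e6  # hypothetical score returned when a configuration fails

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def objective(evaluate, **params):
    """Return -RMSE so that larger is better; fall back to PENALTY on any error."""
    try:
        y_true, y_pred = evaluate(**params)
        score = -rmse(y_true, y_pred)
        return score if math.isfinite(score) else PENALTY
    except Exception:
        return PENALTY

def good_eval(shift):
    # Stand-in for fitting a pipeline: predictions are offset by `shift`.
    y_true = [1.0, 2.0, 3.0]
    return y_true, [v + shift for v in y_true]

def bad_eval(shift):
    # Stand-in for a configuration that fails to fit.
    raise ValueError("invalid preprocessing configuration")

ok_score = objective(good_eval, shift=0.5)    # -0.5
failed_score = objective(bad_eval, shift=0.5) # PENALTY
```

Returning a large negative penalty instead of raising keeps a single failing configuration from aborting the whole search.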

bayesian_optimize(init_points: int = 10, n_iter: int = 50, acquisition_function: str = 'ei') → Tuple[Dict, Pipeline][source]#

Run Bayesian optimization to find the best pipeline configuration and hyperparameters.

Args:

init_points: Number of random initial points
n_iter: Number of Bayesian optimization iterations
acquisition_function: Acquisition function for Bayesian optimization

Returns:

Tuple containing:

Dict of best parameters
Fitted Pipeline with best configuration

export_best_pipeline(file_path: str) → None[source]#

Export the best pipeline configuration and hyperparameters to a file.

Args: file_path: Path to save the export file
Raises: AttributeError: If the optimizer hasn't been run yet

get_all_tested_pipelines() → List[Dict][source]#

Get details of all tested pipeline configurations.

Returns: List of dictionaries with pipeline details

get_best_pipeline_predictions(best_pipeline: Pipeline) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], float, float][source]#

Get predictions using the best pipeline.

Args:

best_pipeline: Fitted pipeline object

Returns:

Tuple containing:

Predictions array
RMSE score
R² score

print_evaluated_pipelines() → None[source]#

Print details for all evaluated pipelines from the Bayesian optimizer.

This method assumes that bayesian_optimize() has been run and that self.optimizer exists.

summarize_optimization() → Dict[source]#

Generate a summary of the optimization results.

Returns: Dictionary containing optimization summary metrics

class spectoprep.SpectroPrepPlotter[source]#

Bases: object

A class for creating high-quality plots for spectroscopy data.

This class provides various plotting functions specifically designed for spectroscopy data and pipeline optimization results.

static plot_feature_importance(wavenumbers: ndarray, coefficients: ndarray, title: str = 'Feature Importance', xlabel: str = 'Wavenumber (cm$^{-1}$)', ylabel: str = 'Coefficient Value', figsize: Tuple[int, int] = (12, 6), color: str = 'purple', highlight_threshold: float | None = None, highlight_color: str = 'red', save_path: str | None = None)[source]#

Plot feature importance from model coefficients.

Parameters#

wavenumbers (array-like): The x-axis values (wavenumbers).
coefficients (array-like): Model coefficients corresponding to each wavenumber.
title (str, default='Feature Importance'): Plot title.
xlabel (str, default='Wavenumber (cm$^{-1}$)'): X-axis label.
ylabel (str, default='Coefficient Value'): Y-axis label.
figsize (tuple, default=(12, 6)): Figure size.
color (str, default='purple'): Color of the line.
highlight_threshold (float, optional): If provided, highlights coefficients with absolute values above this threshold.
highlight_color (str, default='red'): Color for highlighted coefficients.
save_path (str, optional): If provided, save the figure to this path.

Returns#

fig (matplotlib.figure.Figure): The figure object.
ax (matplotlib.axes.Axes): The axes object.
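The highlight_threshold rule (absolute coefficient value above the threshold) amounts to a simple mask. The sketch below shows only that selection step with made-up numbers; it does not call the plotting method, and highlighted_indices is a hypothetical helper.

```python
import numpy as np

def highlighted_indices(coefficients, threshold):
    """Indices whose coefficient magnitude is strictly above the threshold."""
    coefficients = np.asarray(coefficients, dtype=float)
    return np.flatnonzero(np.abs(coefficients) > threshold)

# Illustrative values only.
wavenumbers = np.array([1000.0, 1100.0, 1200.0, 1300.0, 1400.0])
coefficients = np.array([0.02, -0.40, 0.10, 0.35, -0.05])
idx = highlighted_indices(coefficients, threshold=0.3)
important = wavenumbers[idx]  # array([1100., 1300.])
```

Because the comparison uses the absolute value, strongly negative coefficients are highlighted along with strongly positive ones.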

static plot_optimization_progress(optimizer: PipelineOptimizer, figsize: Tuple[int, int] = (12, 6), title: str = 'Optimization Progress', save_path: str | None = None)[source]#

Plot optimization progress over iterations.

Parameters#

optimizer (PipelineOptimizer): The fitted pipeline optimizer.
figsize (tuple, default=(12, 6)): Figure size.
title (str, default='Optimization Progress'): Plot title.
save_path (str, optional): If provided, save the figure to this path.

Returns#

fig (matplotlib.figure.Figure): The figure object.
ax (matplotlib.axes.Axes): The axes object.

static plot_optimization_results(optimizer: PipelineOptimizer, top_n: int = 5, figsize: Tuple[int, int] = (12, 8), title: str = 'Pipeline Optimization Results', save_path: str | None = None)[source]#

Plot optimization results from PipelineOptimizer.

Parameters#

optimizer (PipelineOptimizer): The fitted pipeline optimizer.
top_n (int, default=5): Number of top pipelines to display.
figsize (tuple, default=(12, 8)): Figure size.
title (str, default='Pipeline Optimization Results'): Plot title.
save_path (str, optional): If provided, save the figure to this path.

Returns#

fig (matplotlib.figure.Figure): The figure object.

static plot_prediction_scatter(y_true: ndarray, y_pred: ndarray, title: str = 'Prediction Performance', xlabel: str = 'Measured', ylabel: str = 'Predicted', figsize: Tuple[int, int] = (10, 8), alpha: float = 0.7, color: str = 'blue', add_metrics: bool = True, save_path: str | None = None)[source]#

Create a scatter plot of predicted vs true values.

Parameters#

y_true (array-like): True target values.
y_pred (array-like): Predicted target values.
title (str, default='Prediction Performance'): Plot title.
xlabel (str, default='Measured'): X-axis label.
ylabel (str, default='Predicted'): Y-axis label.
figsize (tuple, default=(10, 8)): Figure size.
alpha (float, default=0.7): Transparency of the points.
color (str, default='blue'): Color of the scatter points.
add_metrics (bool, default=True): Whether to add RMSE and R² metrics to the plot.
save_path (str, optional): If provided, save the figure to this path.

Returns#

fig (matplotlib.figure.Figure): The figure object.
ax (matplotlib.axes.Axes): The axes object.

static plot_preprocessing_comparison(wavenumbers: ndarray, original_spectra: ndarray, processed_spectra: Dict[str, ndarray], sample_indices: List[int] | None = None, figsize: Tuple[int, int] = (15, 10), title: str = 'Preprocessing Comparison', color_map: str = 'tab10', save_path: str | None = None)[source]#

Plot comparison of original and processed spectra.

Parameters#

wavenumbers (array-like): The x-axis values (wavenumbers).
original_spectra (array-like): The original spectra data of shape (n_samples, n_features).
processed_spectra (dict): Dictionary mapping preprocessing method names to processed spectra.
sample_indices (list of int, optional): Indices of samples to plot. If None, all samples are plotted.
figsize (tuple, default=(15, 10)): Figure size.
title (str, default='Preprocessing Comparison'): Main title for the figure.
color_map (str, default='tab10'): Colormap for differentiating samples.
save_path (str, optional): If provided, save the figure to this path.

Returns#

fig (matplotlib.figure.Figure): The figure object.

static plot_spectra(wavenumbers: ndarray, spectra: ndarray, labels: List[str] | None = None, title: str = 'Spectral Data', xlabel: str = 'Wavenumber (cm$^{-1}$)', ylabel: str = 'Absorbance', alpha: float = 0.7, figsize: Tuple[int, int] = (12, 6), color_map: str = 'viridis', legend_loc: str = 'best', grid: bool = True, save_path: str | None = None)[source]#

Plot spectral data.

Parameters#

wavenumbers (array-like): The x-axis values (wavenumbers).
spectra (array-like): The spectra data of shape (n_samples, n_features).
labels (list of str, optional): Labels for each spectrum. If None, spectra are numbered.
title (str, default='Spectral Data'): Plot title.
xlabel (str, default='Wavenumber (cm$^{-1}$)'): X-axis label.
ylabel (str, default='Absorbance'): Y-axis label.
alpha (float, default=0.7): Transparency of the lines.
figsize (tuple, default=(12, 6)): Figure size.
color_map (str, default='viridis'): Colormap for the spectra.
legend_loc (str, default='best'): Location of the legend.
grid (bool, default=True): Whether to show grid.
save_path (str, optional): If provided, save the figure to this path.

Returns#

fig (matplotlib.figure.Figure): The figure object.
ax (matplotlib.axes.Axes): The axes object.

static set_style(style='whitegrid', context='paper', font_scale=1.2)[source]#

Set the visual style for the plots.

Parameters#

style (str, default='whitegrid'): The seaborn style.
context (str, default='paper'): The seaborn context.
font_scale (float, default=1.2): The font scale.
