Fit A Local Cluster Expansion
This page starts after you have built the LocalClusterExpansion in Choose And Build A Model and written fitting files with NEBDataLoader in Local Environments And NEB Data.
Fitting does not decide the local environment. It only finds coefficients for the fixed feature order already defined by the LCE object.
Cluster, Orbit, Correlation
A cluster is a set of local sites: point, pair, triplet, or quadruplet. An orbit is a group of symmetry-equivalent clusters. The correlation vector evaluates the basis-decorated orbit functions for one local occupation.
The fitted scalar is
\[ E(\sigma) = E_0 + \sum_j \alpha_j \, \Phi_j(\sigma) \]
where:
sigma is the ordered local occupation vector,
Phi_j is one decorated cluster-orbit feature,
alpha_j is the fitted coefficient,
E_0 is the empty-cluster term.
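The definitions above describe a linear model: the empty-cluster term plus a coefficient-weighted sum of the correlation features. A minimal sketch of that evaluation for one local occupation follows. The function name `evaluate_lce` and the numbers are hypothetical and only illustrate the arithmetic; they are not kmcpy API or fitted values.

```python
def evaluate_lce(correlation, coefficients, empty_cluster):
    """Return E_0 + sum_j alpha_j * Phi_j(sigma) for one local occupation."""
    if len(correlation) != len(coefficients):
        raise ValueError("correlation and coefficient vectors differ in length")
    return empty_cluster + sum(a * p for a, p in zip(coefficients, correlation))


# Illustrative numbers only, not fitted values.
phi = [1.0, -1.0, 0.5]      # Phi_j(sigma): one entry per decorated orbit
alpha = [0.25, 0.0, -0.5]   # alpha_j: a zero coefficient means an inactive feature
e0 = 2.0                    # E_0: empty-cluster term

value = evaluate_lce(phi, alpha, e0)
print(value)
```

Both vectors must follow the same feature order, which is why the order fixed by the LCE object has to stay with the fitted coefficients.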
For a multicomponent site with q allowed states, the Chebyshev basis uses
q - 1 non-constant site functions. Cluster features are products of these site
functions, so multicomponent sites add more decorated features.
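To see how quickly decorations grow, the count for a single cluster can be sketched as a product of `q - 1` over its sites. This is only an upper bound before symmetry is applied, since symmetry-equivalent decorations may end up in the same orbit; the helper name `count_decorations` is hypothetical.

```python
from math import prod


def count_decorations(states_per_site):
    """Number of site-function products for one cluster, before symmetry reduction."""
    return prod(q - 1 for q in states_per_site)


pair_binary = count_decorations([2, 2])        # two binary sites
pair_ternary = count_decorations([3, 3])       # two ternary sites
triplet_mixed = count_decorations([2, 3, 3])   # one binary and two ternary sites

print(pair_binary, pair_ternary, triplet_mixed)  # prints: 1 4 4
```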
Fit Parameters
After writing fitting inputs with NEBDataLoader.write_fitting_inputs(...), fit
the coefficients:
fit_files = loader.write_fitting_inputs(output_dir="fit_kra")
params, y_pred, y_true = kra_lce.fit(
    **fit_files,
    alpha=1e-4,
    lce_params_fname="fit_kra/lce_params.json",
)
kra_lce.set_parameters(params)
kra_lce.to("kra_lce.json")
The important LocalClusterExpansion.fit(...) arguments are:
alpha: Lasso regularization strength. Larger values usually produce fewer active coefficients.
corr_fname: correlation matrix file from NEBDataLoader.
ekra_fname: target-value file. The name is historical; the values can be E_KRA or another fitted scalar as long as the model usage is consistent.
weight_fname: sample weights, one per target value.
lce_params_fname: output JSON for fitted coefficients and metadata.
max_iter: maximum Lasso iterations.
fit(...) returns:
params: fitted LCE parameters.
y_pred: model predictions for the training rows.
y_true: target values loaded from ekra_fname.
Call set_parameters(params) before saving or using the LCE in kMC.
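The effect of alpha can be illustrated outside kmcpy with a small, self-contained Lasso solver. The sketch below assumes the common objective (1/2n)*||y - b - Xw||^2 + alpha*||w||_1 and solves it by coordinate descent on synthetic data; it ignores sample weights, and the solver kmcpy uses internally is not stated on this page, so treat every name here as hypothetical.

```python
from itertools import product


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, max_iter=1000, tol=1e-10):
    """Minimise (1/2n)*||y - b - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(p):
            # Residual with feature j's current contribution left out.
            partial = [
                y[i] - b - sum(X[i][k] * w[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z if z > 0 else 0.0
            largest_change = max(largest_change, abs(new_wj - w[j]))
            w[j] = new_wj
        # Refit the unpenalised intercept.
        new_b = sum(y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)) / n
        largest_change = max(largest_change, abs(new_b - b))
        b = new_b
        if largest_change < tol:
            break
    return b, w


# Synthetic "correlation matrix": all +/-1 combinations of three features.
X = [list(row) for row in product([-1.0, 1.0], repeat=3)]
# Synthetic targets: one strong feature, one weak feature, one irrelevant feature.
y = [0.3 + 0.2 * r[0] + 0.01 * r[1] for r in X]

b_small, w_small = lasso_fit(X, y, alpha=1e-4)
b_large, w_large = lasso_fit(X, y, alpha=0.05)

active_small = sum(1 for c in w_small if c != 0.0)
active_large = sum(1 for c in w_large if c != 0.0)
print(active_small, active_large)  # prints: 2 1
```

With the weaker penalty both real features stay active; with the stronger one the weak feature is dropped and the strong coefficient is shrunk from 0.2 to about 0.15, which is the trade-off alpha controls.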
For a composite model, fit the KRA LCE and site-energy-difference model separately, then combine them:
from kmcpy.models import CompositeLCEModel
model = CompositeLCEModel(kra_model=kra_lce, site_model=site_lce)
model.to("model.json")
If you only have a KRA model, omit site_model.
Underfit And Overfit
NEB data is usually expensive, so the number of training structures is often small compared with the number of possible local environments. Do not chase a perfect training error without checking whether the model is physically useful.
Typical symptoms:
Underfit: the model has too few active features or too strong regularization; both training RMSE and validation error are large.
Overfit: training RMSE is very small, but leave-one-out or held-out error is large; the model is fitting noise or sparse sampling artifacts.
Some residual fitting error is normal for sparse NEB datasets. Prefer a stable model with sensible errors and few active coefficients over a model that only reproduces the training set.
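The two symptoms can be told apart by comparing training RMSE with a leave-one-out estimate. The sketch below is generic and deliberately uses two toy models rather than an LCE: one that predicts the mean and one that memorises its training rows. All names and numbers are hypothetical; in practice `fit_fn` would wrap whatever fitting routine you use.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def loocv_rmse(rows, targets, fit_fn):
    """Refit with each sample held out once and score the held-out predictions."""
    held_out = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_targets = targets[:i] + targets[i + 1:]
        predict = fit_fn(train_rows, train_targets)
        held_out.append(predict(rows[i]))
    return rmse(targets, held_out)


def fit_mean(rows, targets):
    """Toy model: always predict the mean of the training targets."""
    mean = sum(targets) / len(targets)
    return lambda row: mean


def fit_memorise(rows, targets):
    """Toy model: reproduce seen rows exactly, fall back to the mean otherwise."""
    table = {tuple(r): t for r, t in zip(rows, targets)}
    mean = sum(targets) / len(targets)
    return lambda row: table.get(tuple(row), mean)


# Illustrative data only: four distinct feature rows and made-up targets.
rows = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
targets = [0.30, 0.35, 0.45, 0.50]

train_mean = rmse(targets, [fit_mean(rows, targets)(r) for r in rows])
train_memo = rmse(targets, [fit_memorise(rows, targets)(r) for r in rows])
loocv_mean = loocv_rmse(rows, targets, fit_mean)
loocv_memo = loocv_rmse(rows, targets, fit_memorise)

print(train_mean, loocv_mean)  # both nonzero
print(train_memo, loocv_memo)  # zero training error, nonzero held-out error
```

The memorising model reproduces the training set exactly yet does no better than the mean on held-out rows, which is the overfit pattern described above.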
Practical Fitting Checks
Before using an LCE in kMC:
confirm all training structures map to the expected local occupation length,
inspect the correlation matrix shape,
compare y_true and y_pred,
inspect RMSE and LOOCV,
keep the model, fitting parameters, local site order, and training data together.
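The shape checks in this list can be automated once the fitting inputs are loaded into memory. The sketch below works on plain Python lists and makes no assumption about the file format written by NEBDataLoader; `check_fit_inputs` and its arguments are hypothetical names, not part of kmcpy.

```python
def check_fit_inputs(corr_rows, targets, weights, n_features):
    """Return a list of problems; an empty list means the shapes are consistent."""
    problems = []
    if len(corr_rows) != len(targets):
        problems.append(
            f"{len(corr_rows)} correlation rows for {len(targets)} targets"
        )
    if len(weights) != len(targets):
        problems.append(f"{len(weights)} weights for {len(targets)} targets")
    for i, row in enumerate(corr_rows):
        if len(row) != n_features:
            problems.append(
                f"row {i} has {len(row)} features, expected {n_features}"
            )
    return problems


# Illustrative inputs with two deliberate mistakes: a short row and a missing weight.
corr_rows = [[1.0, -1.0, 1.0], [1.0, 1.0], [-1.0, 1.0, 1.0]]
targets = [0.30, 0.35, 0.45]
weights = [1.0, 1.0]

problems = check_fit_inputs(corr_rows, targets, weights, n_features=3)
for message in problems:
    print(message)
```

Here `n_features` stands for the number of coefficients the LCE object expects, so a mismatch usually points to inputs generated with a different local environment or cutoff.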
Next: Prepare Input And Run kMC.