Choose Basis Functions

Basis functions convert occupation states into numerical features for the local cluster expansion.

Occupation State Indices

kMCpy stores active-site occupations as state indices. For a site with allowed species:

["Na", "X"]

state 0 means Na and state 1 means vacancy X.

For:

["Si", "P", "Al"]

state 0 means Si, state 1 means P, and state 2 means Al.

The allowed species order in site_mapping therefore matters.
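As a minimal sketch of why the order matters, the snippet below encodes species as state indices by their position in the allowed-species list. The helper `encode_occupation` is hypothetical and written only for illustration; it is not part of the kMCpy API.

```python
# Hypothetical helper for illustration only; not a kMCpy function.
def encode_occupation(site_species, allowed_per_site):
    """Map each site's species to its index in that site's allowed-species list."""
    return [allowed.index(sp) for sp, allowed in zip(site_species, allowed_per_site)]


na_site = ["Na", "X"]               # state 0 = Na, state 1 = vacancy X
framework_site = ["Si", "P", "Al"]  # state 0 = Si, state 1 = P, state 2 = Al

# A vacancy on the first site and Al on the second site.
occupation = encode_occupation(["X", "Al"], [na_site, framework_site])
print(occupation)  # [1, 2]

# Reversing the allowed-species order of the first site changes its state index,
# so the same physical configuration would produce different features.
reordered = encode_occupation(["X", "Al"], [["X", "Na"], framework_site])
print(reordered)  # [0, 2]
```

The same configuration encodes to different indices under the two orderings, which is why a fitted model is only meaningful with the ordering used during fitting.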

Chebyshev Basis

For a site with q allowed species, kMCpy uses q - 1 non-constant Chebyshev site functions. A cluster feature is a product of the selected site functions over the sites in that cluster.
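The construction above can be sketched as follows. This is an illustration under stated assumptions, not kMCpy's implementation: it assumes the q states are mapped to evenly spaced points on [-1, 1] and that the site functions are the Chebyshev polynomials of degree 1 to q - 1 evaluated at those points. The actual mapping and normalization used by kMCpy may differ, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev


def chebyshev_site_functions(q):
    """Return a (q - 1, q) table: row k holds T_{k+1} evaluated at each state."""
    # Assumption: states 0..q-1 sit at evenly spaced points on [-1, 1].
    sigma = np.linspace(-1.0, 1.0, q)
    rows = []
    for degree in range(1, q):
        coeffs = np.zeros(degree + 1)
        coeffs[degree] = 1.0  # select the single polynomial T_degree
        rows.append(chebyshev.chebval(sigma, coeffs))
    return np.array(rows)


def cluster_feature(states, function_indices, site_tables):
    """Product of the selected site function on each site of one cluster."""
    value = 1.0
    for state, k, table in zip(states, function_indices, site_tables):
        value *= table[k, state]
    return float(value)


binary = chebyshev_site_functions(2)   # one non-constant function
ternary = chebyshev_site_functions(3)  # two non-constant functions

# Pair cluster: a binary site in state 1 and a ternary site in state 2,
# decorated with function 0 on the first site and function 1 on the second.
feature = cluster_feature([1, 2], [0, 1], [binary, ternary])
```

For a binary site this reduces to a single function taking the values -1 and +1, which matches the familiar spin-like occupation variable.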

This means multicomponent sites naturally create more decorated features:

Site States    Non-Constant Site Functions
2              1
3              2
4              3

A pair of two four-state sites can therefore contribute up to 3 x 3 = 9 decorated pair features.
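The count can be reproduced by enumerating every combination of non-constant site-function indices over the sites of a cluster. The helper `decorations` is a hypothetical name used only for this sketch. It gives an upper bound, since symmetry-equivalent decorations may be merged in practice.

```python
from itertools import product


def decorations(states_per_site):
    """All combinations of non-constant site-function indices for one cluster."""
    return list(product(*(range(q - 1) for q in states_per_site)))


pair_four_state = decorations([4, 4])
print(len(pair_four_state))  # 9

pair_binary = decorations([2, 2])
print(len(pair_binary))  # 1
```

A binary pair has a single decoration, while a pair of four-state sites has nine, so the feature count grows multiplicatively with the number of species on each site.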

Cost

Allowing more species per site increases the number of decorated cluster features. The largest cost usually appears during correlation-matrix construction and fitting, not during the accepted-hop update itself.

Keep the local cutoff and cluster cutoffs physically motivated. A larger basis is useful only if the training data can constrain it.

Practical Rules

  • Use basis_type="chebyshev" for multicomponent active sites.

  • Keep the same site_mapping species order throughout fitting and kMC.

  • Refit if you change the basis, local site order, cutoff, or allowed species.

  • Check the correlation matrix shape before fitting; its feature dimension should equal the number of decorated cluster features you expect.
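A shape check before fitting might look like the sketch below. It assumes the correlation matrix is a 2D array with one row per training sample and one column per feature; that layout, the function name `check_correlation_matrix`, and the random example data are assumptions made for illustration, not kMCpy behavior.

```python
import numpy as np


def check_correlation_matrix(corr, n_expected_features):
    """Validate the matrix shape and report whether the data can constrain the basis."""
    corr = np.asarray(corr, dtype=float)
    if corr.ndim != 2:
        raise ValueError(f"expected a 2D matrix, got {corr.ndim} dimension(s)")
    n_samples, n_features = corr.shape
    if n_features != n_expected_features:
        raise ValueError(
            f"expected {n_expected_features} features, got {n_features}"
        )
    rank = int(np.linalg.matrix_rank(corr))
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "rank": rank,
        # Fewer independent rows than features: the fit is underdetermined.
        "underdetermined": bool(rank < n_features),
    }


# Placeholder data: 5 training samples and 9 features.
rng = np.random.default_rng(0)
corr = rng.choice([-1.0, 1.0], size=(5, 9))
report = check_correlation_matrix(corr, n_expected_features=9)
print(report["underdetermined"])  # True
```

With 5 samples and 9 features the rank cannot exceed 5, so the report flags the fit as underdetermined. In that situation, a smaller basis or more training data is needed before the coefficients are meaningful.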