Tutorial¶
This notebook is a guided tutorial for running and understanding a full kMCpy workflow.
What you will learn:
Which input artifacts are required and why.
How to run a reproducible NASICON simulation through the Python API.
How to inspect and interpret the resulting transport metrics.
How to build an LCE model and prepare fitting matrices for your own dataset.
How to fit LCE parameters.
0) Prerequisites¶
From the repository root:
uv sync --extra doc
uv sync --extra dev
The notebook is stored in docs and rendered with nbsphinx. By default docs rendering does not execute cells (nbsphinx.execute = never).
1) Understand the required inputs¶
For a production kMC run, kMCpy needs these categories of files:
Structure: a CIF file defining the host lattice.
Event library:
events.jsondescribing allowed hops and their dependency graph.Model file:
model.jsoncontaining the local cluster expansion structure and fitted coefficients.Initial state:
initial_state.jsonoccupation vector at simulation start.
In this tutorial we use the repository’s validated NASICON regression dataset under tests/files.
Input snapshot (used in this tutorial)¶
Below is the modern Configuration payload used for the bundled NASICON regression run:
{
"structure_file": "tests/files/EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif",
"model_file": "tests/files/input/model.json",
"event_file": "tests/files/input/events.json",
"initial_state_file": "tests/files/input/initial_state.json",
"mobile_ion_specie": "Na",
"temperature": 298,
"attempt_frequency": 5000000000000.0,
"equilibration_passes": 1,
"kmc_passes": 100,
"supercell_shape": [2, 1, 1],
"site_mapping": {"Na": ["Na", "X"], "Zr": "Zr", "Si": ["Si", "P"], "O": "O"},
"convert_to_primitive_cell": true,
"elementary_hop_distance": 3.47782,
"random_seed": 12345,
"name": "NASICON_Regression"
}
If you run the model-building cells first, the simulation cell uses the generated example/output/tutorial/model/model.json; otherwise it falls back to the validated bundled model file above.
Raw input data excerpts¶
These are actual excerpts from the input dataset used in this tutorial.
A) ``tests/files/input/initial_state.json`` (occupation preview)
{
"occupation_preview_first_24": [
0,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
0
],
"occupation_length": 84
}
B) ``tests/files/input/events.json`` (first event)
{
"mobile_ion_indices": [
0,
8
],
"local_env_indices": [
8,
10,
14,
4,
6,
12,
22,
16,
20,
24,
26,
18
],
"local_env_indices_site": [
8,
10,
14,
4,
6,
12,
22,
16,
20,
24,
26,
18
]
}
C) ``tests/files/input/fitting_results.json`` (latest fitted record preview)
{
"alpha": 1.5,
"empty_cluster": 339.1083600548,
"rmse": 29.1495050791,
"loocv": 149.8671381416,
"keci_preview": [
10.5029085379,
4.0803809727,
0.0,
0.0,
-11.2639080302,
0.0,
-4.0578126572,
0.0
]
}
D) ``NASICON cif file (ICSD-15546)`` (header excerpt)
#(C) 2021 by FIZ Karlsruhe - Leibniz Institute for Information Infrastructure. All rights reserved.
data_15546-ICSD
_database_code_ICSD 15546
_chemical_formula_structural 'Na4 Zr2 Si3 O12'
_chemical_formula_sum 'Na4 O12 Si3 Zr2'
_diffrn_ambient_temperature 573.
2) Build a local cluster expansion (LCE) model¶
Use this when starting from a new crystal system and you need model topology (lce.json) before fitting.
basis_type='chebyshev' is selected once for the lattice. Active-site occupations are stored as species-state indices (0..q-1), and sites with q allowed species are expanded into q - 1 decorated Chebyshev site functions.
[ ]:
from pathlib import Path
from kmcpy.io.cif import load_labeled_structure_from_cif
from kmcpy.models.local_cluster_expansion import LocalClusterExpansion
from kmcpy.structure.local_lattice_structure import LocalLatticeStructure
repo = Path('.').resolve()
out_dir = repo / 'example/output/tutorial/model'
out_dir.mkdir(parents=True, exist_ok=True)
structure = load_labeled_structure_from_cif(
str(repo / 'tests' / 'files' / 'EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif'),
primitive=True,
)
site_mapping = {
'Na': ['Na', 'X'],
'Zr': 'Zr',
'Si': ['Si', 'P'],
'O': 'O',
}
local_lattice = LocalLatticeStructure(
template_structure=structure,
center=0,
cutoff=4.0,
site_mapping=site_mapping,
basis_type='chebyshev',
)
lce = LocalClusterExpansion()
lce.build(local_lattice_structure=local_lattice, cutoff_cluster=[6, 6, 0])
lce.to(str(out_dir / 'lce.json'))
3) Generate correlation matrix¶
For your own training dataset (for example NEB barriers), build a list of structures + target values and convert them into fitting matrices. If you leave the training lists empty, this tutorial uses bundled fitting matrices so the workflow still runs end-to-end.
Fitting data format (what each file contains)¶
When preparing fitting inputs, LocalClusterExpansion.fit(...) delegates to the registered LCE fitter and expects:
correlation_matrix.txt: shape(n_samples, n_features)where multicomponent sites may create multiple decorated features per orbite_kra.txt: shape(n_samples,)target barrier valuesweight.txt: shape(n_samples,)sample weights
Minimal conceptual example:
correlation_matrix.txt
0.12 0.00 -0.34 ...
0.10 0.05 -0.29 ...
...
e_kra.txt
120.5
140.1
...
weight.txt
1.0
1.0
...
[ ]:
from pathlib import Path
import shutil
import numpy as np
from pymatgen.core import Structure
from kmcpy.io.neb import NEBDataLoader
from kmcpy.models.local_cluster_expansion import LocalClusterExpansion
repo = Path('.').resolve()
model = LocalClusterExpansion.from_file(str(repo / 'example/output/tutorial/model/lce.json'))
fit_dir = repo / 'example/output/tutorial/fit_data'
fit_dir.mkdir(parents=True, exist_ok=True)
training_structures = [
# Replace with your local-environment structure paths
# 'path/to/neb_local_env_0001.cif',
# 'path/to/neb_local_env_0002.cif',
]
target_ekra = [
# Replace with matching barrier values in meV
# 120.5,
# 140.1,
]
if not training_structures and not target_ekra:
reference_dir = repo / 'tests' / 'files' / 'fitting' / 'local_cluster_expansion'
for name in ('correlation_matrix.txt', 'e_kra.txt', 'weight.txt'):
shutil.copy(reference_dir / name, fit_dir / name)
print('No custom NEB data supplied; copied bundled tutorial fitting data from', reference_dir)
else:
if len(training_structures) != len(target_ekra):
raise ValueError('training_structures and target_ekra must have the same length')
loader = NEBDataLoader(model=model)
for cif_file, ekra in zip(training_structures, target_ekra):
structure = Structure.from_file(cif_file)
loader.add_structure(structure, float(ekra))
if len(loader) == 0:
raise ValueError('No samples were added. Provide at least one (structure, e_kra) pair.')
np.savetxt(fit_dir / 'correlation_matrix.txt', loader.get_correlation_matrix())
np.savetxt(fit_dir / 'e_kra.txt', loader.get_properties())
np.savetxt(fit_dir / 'weight.txt', np.ones(len(loader)))
print('Generated fitting data from custom NEB structures in', fit_dir)
corr = np.atleast_2d(np.loadtxt(fit_dir / 'correlation_matrix.txt'))
e_kra = np.atleast_1d(np.loadtxt(fit_dir / 'e_kra.txt'))
weight = np.atleast_1d(np.loadtxt(fit_dir / 'weight.txt'))
print('correlation_matrix shape:', corr.shape)
print('e_kra shape:', e_kra.shape)
print('weight shape:', weight.shape)
4) Fit LCE parameters¶
Fit using generated training files. This writes keci.txt and fitting_results.json for later model loading:
[ ]:
from pathlib import Path
from kmcpy.models.local_cluster_expansion import LocalClusterExpansion
repo = Path('.').resolve()
fit_dir = repo / 'example/output/tutorial/fit_data'
lce = LocalClusterExpansion()
model_params, y_pred, y_true = lce.fit(
alpha=1.5,
max_iter=1_000_000,
ekra_fname=str(fit_dir / 'e_kra.txt'),
keci_fname=str(fit_dir / 'keci.txt'),
weight_fname=str(fit_dir / 'weight.txt'),
corr_fname=str(fit_dir / 'correlation_matrix.txt'),
fit_results_fname=str(fit_dir / 'fitting_results.json'),
lce_params_fname=None,
lce_params_history_fname=None,
)
print('Non-zero KECI terms:', sum(abs(x) > 1e-8 for x in model_params.keci))
print('RMSE (meV):', round(float(model_params.rmse), 4))
print('Saved fitting outputs in', fit_dir)
5) Construct a CompositeLCEModel¶
The simulator consumes a composite model with both barrier (kra_model) and site-energy-difference (site_model) contributions. Load each LocalClusterExpansion, apply its fitted parameters, then merge the prepared models into a CompositeLCEModel.
[ ]:
from pathlib import Path
from kmcpy.models.composite_lce_model import CompositeLCEModel
from kmcpy.models.fitting.fitter import LCEFitter
from kmcpy.models.local_cluster_expansion import LocalClusterExpansion
root = Path('.').resolve()
input_dir = root / 'tests' / 'files' / 'input'
def load_fitted_lce(lce_file, fit_file):
model = LocalClusterExpansion.from_file(str(lce_file))
fitter = LCEFitter.from_file(str(fit_file))
model.set_parameters(fitter.model_parameters)
metadata = {
'time_stamp': fitter.model_parameters.time_stamp,
'time': fitter.model_parameters.time,
}
return model, metadata
kra_model, kra_metadata = load_fitted_lce(
input_dir / 'lce.json',
input_dir / 'fitting_results.json',
)
site_model, site_metadata = load_fitted_lce(
input_dir / 'lce_site.json',
input_dir / 'fitting_results_site.json',
)
compound_lce_model = CompositeLCEModel(
site_model=site_model,
kra_model=kra_model,
kra_fit_metadata=kra_metadata,
site_fit_metadata=site_metadata,
)
model_file = root / 'example' / 'output' / 'tutorial' / 'model' / 'model.json'
model_file.parent.mkdir(parents=True, exist_ok=True)
compound_lce_model.to(str(model_file))
print(type(compound_lce_model).__name__)
print('Has site model:', compound_lce_model.site_model is not None)
print('Has KRA model:', compound_lce_model.kra_model is not None)
print('Wrote model file:', model_file)
Optional: build a simple LocalBarrierModel instead¶
If you do not need an LCE, use LocalBarrierModel for direct barrier logic: constant barriers, local occupation counts, species-count rules, wildcard patterns, or exact catalog-style matches.
[ ]:
from pathlib import Path
from kmcpy.models import LocalBarrierModel
root = Path('.').resolve()
local_barrier_model = LocalBarrierModel(
default_barrier=300.0,
site_species={
1: {0: 'P', 1: 'Si'},
2: {0: 'Si', 1: 'P'},
3: {0: 'Si', 1: 'P'},
4: {0: 'Al', 1: 'Si'},
},
)
local_barrier_model.add_species_count_rule(
name='si_rich',
sites='local_env',
species='Si',
min_count=4,
barrier=420.0,
)
local_barrier_file = root / 'example' / 'output' / 'tutorial' / 'model' / 'local_barrier_model.json'
local_barrier_file.parent.mkdir(parents=True, exist_ok=True)
local_barrier_model.to(str(local_barrier_file))
print('Wrote local barrier model:', local_barrier_file)
6) Run a reproducible NASICON simulation (API path)¶
This section runs the same setup used by the regression test suite. It is a good sanity check that your environment and dependencies are working.
If the previous section wrote example/output/tutorial/model/model.json, that generated model file is used. If not, the code falls back to the bundled validated tests/files/input/model.json.
[ ]:
from pathlib import Path
import os
from kmcpy.simulator.config import Configuration
from kmcpy.simulator.kmc import KMC
root = Path('.').resolve()
file_path = root / 'tests' / 'files'
model_file = root / 'example' / 'output' / 'tutorial' / 'model' / 'model.json'
if not model_file.exists():
model_file = file_path / 'input' / 'model.json'
run_dir = root / 'example' / 'output' / 'tutorial' / 'run'
run_dir.mkdir(parents=True, exist_ok=True)
config = Configuration(
structure_file=str(file_path / 'EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif'),
model_file=str(model_file),
event_file=str(file_path / 'input' / 'events.json'),
initial_state_file=str(file_path / 'input' / 'initial_state.json'),
mobile_ion_specie='Na',
temperature=298,
attempt_frequency=5e12,
equilibration_passes=1,
kmc_passes=100,
supercell_shape=(2, 1, 1),
site_mapping={'Na': ['Na', 'X'], 'Zr': 'Zr', 'Si': ['Si', 'P'], 'O': 'O'},
convert_to_primitive_cell=True,
elementary_hop_distance=3.47782,
random_seed=12345,
name='NASICON_Regression',
)
kmc = KMC.from_config(config)
original_cwd = Path.cwd()
try:
os.chdir(run_dir)
tracker = kmc.run()
finally:
os.chdir(original_cwd)
tracker.return_current_info()
(1.1193006038758543e-06, np.float64(307.3744449426362), np.float64(1.4630573145769372e-08), np.float64(4.576882562174338e-09), np.float64(1.1823906621661553), np.float64(0.31283002494661705), np.float64(0.21998150220477225))
7) Results (reference + run output)¶
For the input snapshot above, the expected final metrics are:
Metric |
Value |
Unit |
|---|---|---|
time |
1.1193006038758543e-06 |
s |
msd |
307.37444494263616 |
Angstrom^2 |
jump_diffusivity |
1.4630573145769372e-08 |
cm^2/s |
tracer_diffusivity |
4.5768825621743376e-09 |
cm^2/s |
conductivity |
1.1823906621661553 |
mS/cm |
havens_ratio |
0.312830024946617 |
dimensionless |
correlation_factor |
0.21998150220477225 |
dimensionless |
These values are intentionally embedded here so the tutorial remains informative even when cells are not executed during docs build.
Metric definitions¶
tracker.return_current_info() returns:
(time, msd, jump_diffusivity, tracer_diffusivity, conductivity, havens_ratio, correlation_factor)
time: simulated physical time (s)msd: mean squared displacement (Angstrom^2)jump_diffusivity: jump diffusivity (cm^2/s)tracer_diffusivity: tracer diffusivity (cm^2/s)conductivity: ionic conductivity (mS/cm)havens_ratio: Haven ratio (dimensionless)f: correlation factor (dimensionless)
tracker.result_units returns the same unit metadata, and Tracker.write_results(...) writes it to a results_units*.json.gz sidecar.
Output files¶
After the run, kMCpy writes compressed CSV artifacts in example/output/tutorial/run:
results_NASICON_Regression.csv.gzresults_units_NASICON_Regression.json.gzdisplacement_NASICON_Regression_<pass>.csv.gzhop_counter_NASICON_Regression_<pass>.csv.gzcurrent_occ_NASICON_Regression_<pass>.csv.gz
[2]:
from pathlib import Path
import pandas as pd
run_dir = Path('example/output/tutorial/run')
results_file = run_dir / 'results_NASICON_Regression.csv.gz'
df = pd.read_csv(results_file)
print(df.tail(1).to_string(index=False))
time jump_diffusivity tracer_diffusivity conductivity f havens_ratio msd
0.000001 1.463057e-08 4.576883e-09 1.182391 0.219982 0.31283 307.374445
8) Next steps¶
Swap NASICON regression inputs with your own CIF/event/model files.
Run temperature sweeps by varying
temperatureinConfiguration.Track convergence by increasing
kmc_passesand comparingjump_diffusivity,tracer_diffusivity, and conductivity.