{ "cells": [ { "cell_type": "markdown", "id": "cb44274d", "metadata": {}, "source": [ "# Tutorial\n", "\n", "This notebook is a guided tutorial for running and understanding a full kMCpy workflow.\n", "\n", "What you will learn:\n", "\n", "1. Which input artifacts are required and why.\n", "2. How to build an LCE model and prepare fitting matrices for your own dataset.\n", "3. How to fit LCE parameters.\n", "4. How to run a reproducible NASICON simulation through the Python API.\n", "5. How to inspect and interpret the resulting transport metrics.\n" ] }, { "cell_type": "markdown", "id": "e764e59e", "metadata": {}, "source": [ "## 0) Prerequisites\n", "\n", "From the repository root:\n", "\n", "```bash\n", "uv sync --extra doc --extra dev\n", "```\n", "\n", "The notebook is stored in `docs` and rendered with `nbsphinx`.\n", "By default, docs rendering does not execute cells (`nbsphinx.execute = never`).\n" ] }, { "cell_type": "markdown", "id": "24c27a26", "metadata": {}, "source": [ "## 1) Understand the required inputs\n", "\n", "For a production kMC run, kMCpy needs these categories of files:\n", "\n", "- **Structure**: a CIF file defining the host lattice.\n", "- **Event library**: `events.json` describing allowed hops and their dependency graph.\n", "- **Model file**: `model.json` containing the local cluster expansion structure and fitted coefficients.\n", "- **Initial state**: `initial_state.json` containing the occupation vector at simulation start.\n", "\n", "In this tutorial we use the repository's validated NASICON regression dataset under `tests/files`.\n" ] }, { "cell_type": "markdown", "id": "632f3c9a", "metadata": {}, "source": [ "### Input snapshot (used in this tutorial)\n", "\n", "Below is the modern `Configuration` payload used for the bundled NASICON regression run:\n", "\n", "```json\n", "{\n", " \"structure_file\": \"tests/files/EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif\",\n", " \"model_file\": \"tests/files/input/model.json\",\n", " 
\"event_file\": \"tests/files/input/events.json\",\n", " \"initial_state_file\": \"tests/files/input/initial_state.json\",\n", " \"mobile_ion_specie\": \"Na\",\n", " \"temperature\": 298,\n", " \"attempt_frequency\": 5000000000000.0,\n", " \"equilibration_passes\": 1,\n", " \"kmc_passes\": 100,\n", " \"supercell_shape\": [2, 1, 1],\n", " \"site_mapping\": {\"Na\": [\"Na\", \"X\"], \"Zr\": \"Zr\", \"Si\": [\"Si\", \"P\"], \"O\": \"O\"},\n", " \"convert_to_primitive_cell\": true,\n", " \"elementary_hop_distance\": 3.47782,\n", " \"random_seed\": 12345,\n", " \"name\": \"NASICON_Regression\"\n", "}\n", "```\n", "\n", "If you run the model-building cells first, the simulation cell uses the generated `example/output/tutorial/model/model.json`; otherwise it falls back to the validated bundled model file above." ] }, { "cell_type": "markdown", "id": "b2bf09a7", "metadata": {}, "source": [ "### Raw input data excerpts\n", "\n", "These are **actual excerpts** from the input dataset used in this tutorial.\n", "\n", "**A) `tests/files/input/initial_state.json` (occupation preview)**\n", "\n", "```json\n", "{\n", " \"occupation_preview_first_24\": [\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 1,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 1,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0\n", " ],\n", " \"occupation_length\": 84\n", "}\n", "```\n", "\n", "**B) `tests/files/input/events.json` (first event)**\n", "\n", "```json\n", "{\n", " \"mobile_ion_indices\": [\n", " 0,\n", " 8\n", " ],\n", " \"local_env_indices\": [\n", " 8,\n", " 10,\n", " 14,\n", " 4,\n", " 6,\n", " 12,\n", " 22,\n", " 16,\n", " 20,\n", " 24,\n", " 26,\n", " 18\n", " ],\n", " \"local_env_indices_site\": [\n", " 8,\n", " 10,\n", " 14,\n", " 4,\n", " 6,\n", " 12,\n", " 22,\n", " 16,\n", " 20,\n", " 24,\n", " 26,\n", " 18\n", " ]\n", "}\n", "```\n", "\n", "**C) `tests/files/input/fitting_results.json` (latest fitted record 
preview)**\n", "\n", "```json\n", "{\n", " \"alpha\": 1.5,\n", " \"empty_cluster\": 339.1083600548,\n", " \"rmse\": 29.1495050791,\n", " \"loocv\": 149.8671381416,\n", " \"keci_preview\": [\n", " 10.5029085379,\n", " 4.0803809727,\n", " 0.0,\n", " 0.0,\n", " -11.2639080302,\n", " 0.0,\n", " -4.0578126572,\n", " 0.0\n", " ]\n", "}\n", "```\n", "\n", "**D) `tests/files/EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif` (NASICON CIF, ICSD-15546; header excerpt)**\n", "\n", "```text\n", "#(C) 2021 by FIZ Karlsruhe - Leibniz Institute for Information Infrastructure. All rights reserved.\n", "data_15546-ICSD\n", "_database_code_ICSD 15546\n", "_chemical_formula_structural 'Na4 Zr2 Si3 O12'\n", "_chemical_formula_sum 'Na4 O12 Si3 Zr2'\n", "_diffrn_ambient_temperature 573.\n", "```\n", "" ] }, { "cell_type": "markdown", "id": "c2c55370", "metadata": {}, "source": [ "## 2) Build a local cluster expansion (LCE) model\n", "\n", "Use this step when you start from a new crystal system and need the model topology (`lce.json`) before fitting.\n", "\n", "`basis_type='chebyshev'` is selected once for the lattice. Active-site occupations are stored as species-state indices (`0..q-1`), and sites with `q` allowed species are expanded into `q - 1` decorated Chebyshev site functions." 
] }, { "cell_type": "code", "execution_count": null, "id": "24b6d941", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from kmcpy.io.cif import load_labeled_structure_from_cif\n", "from kmcpy.models.local_cluster_expansion import LocalClusterExpansion\n", "from kmcpy.structure.local_lattice_structure import LocalLatticeStructure\n", "\n", "repo = Path('.').resolve()\n", "out_dir = repo / 'example/output/tutorial/model'\n", "out_dir.mkdir(parents=True, exist_ok=True)\n", "\n", "structure = load_labeled_structure_from_cif(\n", " str(repo / 'tests' / 'files' / 'EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif'),\n", " primitive=True,\n", ")\n", "\n", "site_mapping = {\n", " 'Na': ['Na', 'X'],\n", " 'Zr': 'Zr',\n", " 'Si': ['Si', 'P'],\n", " 'O': 'O',\n", "}\n", "\n", "local_lattice = LocalLatticeStructure(\n", " template_structure=structure,\n", " center=0,\n", " cutoff=4.0,\n", " site_mapping=site_mapping,\n", " basis_type='chebyshev',\n", ")\n", "\n", "lce = LocalClusterExpansion()\n", "lce.build(local_lattice_structure=local_lattice, cutoff_cluster=[6, 6, 0])\n", "lce.to(str(out_dir / 'lce.json'))" ] }, { "cell_type": "markdown", "id": "89889ab3", "metadata": {}, "source": [ "## 3) Generate correlation matrix\n", "\n", "For your own training dataset (for example NEB barriers), build a list of structures + target values and convert them into fitting matrices.\n", "If you leave the training lists empty, this tutorial uses bundled fitting matrices so the workflow still runs end-to-end.\n" ] }, { "cell_type": "markdown", "id": "0b96ed84", "metadata": {}, "source": [ "### Fitting data format (what each file contains)\n", "\n", "When preparing fitting inputs, `LocalClusterExpansion.fit(...)` delegates to the registered LCE fitter and expects:\n", "\n", "- `correlation_matrix.txt`: shape `(n_samples, n_features)` where multicomponent sites may create multiple decorated features per orbit\n", "- `e_kra.txt`: shape `(n_samples,)` target barrier 
values\n", "- `weight.txt`: shape `(n_samples,)` sample weights\n", "\n", "Minimal conceptual example:\n", "\n", "```text\n", "correlation_matrix.txt\n", "0.12 0.00 -0.34 ...\n", "0.10 0.05 -0.29 ...\n", "...\n", "\n", "e_kra.txt\n", "120.5\n", "140.1\n", "...\n", "\n", "weight.txt\n", "1.0\n", "1.0\n", "...\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "e7ebfe7c", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import shutil\n", "import numpy as np\n", "\n", "from pymatgen.core import Structure\n", "from kmcpy.io.neb import NEBDataLoader\n", "from kmcpy.models.local_cluster_expansion import LocalClusterExpansion\n", "\n", "repo = Path('.').resolve()\n", "model = LocalClusterExpansion.from_file(str(repo / 'example/output/tutorial/model/lce.json'))\n", "fit_dir = repo / 'example/output/tutorial/fit_data'\n", "fit_dir.mkdir(parents=True, exist_ok=True)\n", "\n", "training_structures = [\n", " # Replace with your local-environment structure paths\n", " # 'path/to/neb_local_env_0001.cif',\n", " # 'path/to/neb_local_env_0002.cif',\n", "]\n", "target_ekra = [\n", " # Replace with matching barrier values in meV\n", " # 120.5,\n", " # 140.1,\n", "]\n", "\n", "if not training_structures and not target_ekra:\n", " reference_dir = repo / 'tests' / 'files' / 'fitting' / 'local_cluster_expansion'\n", " for name in ('correlation_matrix.txt', 'e_kra.txt', 'weight.txt'):\n", " shutil.copy(reference_dir / name, fit_dir / name)\n", " print('No custom NEB data supplied; copied bundled tutorial fitting data from', reference_dir)\n", "else:\n", " if len(training_structures) != len(target_ekra):\n", " raise ValueError('training_structures and target_ekra must have the same length')\n", "\n", " loader = NEBDataLoader(model=model)\n", " for cif_file, ekra in zip(training_structures, target_ekra):\n", " structure = Structure.from_file(cif_file)\n", " loader.add_structure(structure, float(ekra))\n", "\n", " if len(loader) == 0:\n", " raise 
ValueError('No samples were added. Provide at least one (structure, e_kra) pair.')\n", "\n", " np.savetxt(fit_dir / 'correlation_matrix.txt', loader.get_correlation_matrix())\n", " np.savetxt(fit_dir / 'e_kra.txt', loader.get_properties())\n", " np.savetxt(fit_dir / 'weight.txt', np.ones(len(loader)))\n", " print('Generated fitting data from custom NEB structures in', fit_dir)\n", "\n", "corr = np.atleast_2d(np.loadtxt(fit_dir / 'correlation_matrix.txt'))\n", "e_kra = np.atleast_1d(np.loadtxt(fit_dir / 'e_kra.txt'))\n", "weight = np.atleast_1d(np.loadtxt(fit_dir / 'weight.txt'))\n", "print('correlation_matrix shape:', corr.shape)\n", "print('e_kra shape:', e_kra.shape)\n", "print('weight shape:', weight.shape)" ] }, { "cell_type": "markdown", "id": "621e18cf", "metadata": {}, "source": [ "## 4) Fit LCE parameters\n", "\n", "Fit using generated training files. This writes `keci.txt` and `fitting_results.json` for later model loading:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a2731e21", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from kmcpy.models.local_cluster_expansion import LocalClusterExpansion\n", "\n", "repo = Path('.').resolve()\n", "fit_dir = repo / 'example/output/tutorial/fit_data'\n", "\n", "lce = LocalClusterExpansion()\n", "model_params, y_pred, y_true = lce.fit(\n", " alpha=1.5,\n", " max_iter=1_000_000,\n", " ekra_fname=str(fit_dir / 'e_kra.txt'),\n", " keci_fname=str(fit_dir / 'keci.txt'),\n", " weight_fname=str(fit_dir / 'weight.txt'),\n", " corr_fname=str(fit_dir / 'correlation_matrix.txt'),\n", " fit_results_fname=str(fit_dir / 'fitting_results.json'),\n", " lce_params_fname=None,\n", " lce_params_history_fname=None,\n", ")\n", "\n", "print('Non-zero KECI terms:', sum(abs(x) > 1e-8 for x in model_params.keci))\n", "print('RMSE (meV):', round(float(model_params.rmse), 4))\n", "print('Saved fitting outputs in', fit_dir)" ] }, { "cell_type": "markdown", "id": "e36eb661", "metadata": {}, "source": [ 
"## 5) Construct a `CompositeLCEModel`\n", "\n", "The simulator consumes a composite model with both barrier (`kra_model`) and site-energy-difference (`site_model`) contributions.\n", "Load each `LocalClusterExpansion`, apply its fitted parameters, then merge the prepared models into a `CompositeLCEModel`.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b0dddb62", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from kmcpy.models.composite_lce_model import CompositeLCEModel\n", "from kmcpy.models.fitting.fitter import LCEFitter\n", "from kmcpy.models.local_cluster_expansion import LocalClusterExpansion\n", "\n", "root = Path('.').resolve()\n", "input_dir = root / 'tests' / 'files' / 'input'\n", "\n", "def load_fitted_lce(lce_file, fit_file):\n", " model = LocalClusterExpansion.from_file(str(lce_file))\n", " fitter = LCEFitter.from_file(str(fit_file))\n", " model.set_parameters(fitter.model_parameters)\n", " metadata = {\n", " 'time_stamp': fitter.model_parameters.time_stamp,\n", " 'time': fitter.model_parameters.time,\n", " }\n", " return model, metadata\n", "\n", "kra_model, kra_metadata = load_fitted_lce(\n", " input_dir / 'lce.json',\n", " input_dir / 'fitting_results.json',\n", ")\n", "site_model, site_metadata = load_fitted_lce(\n", " input_dir / 'lce_site.json',\n", " input_dir / 'fitting_results_site.json',\n", ")\n", "compound_lce_model = CompositeLCEModel(\n", " site_model=site_model,\n", " kra_model=kra_model,\n", " kra_fit_metadata=kra_metadata,\n", " site_fit_metadata=site_metadata,\n", ")\n", "model_file = root / 'example' / 'output' / 'tutorial' / 'model' / 'model.json'\n", "model_file.parent.mkdir(parents=True, exist_ok=True)\n", "compound_lce_model.to(str(model_file))\n", "\n", "print(type(compound_lce_model).__name__)\n", "print('Has site model:', compound_lce_model.site_model is not None)\n", "print('Has KRA model:', compound_lce_model.kra_model is not None)\n", "print('Wrote model file:', model_file)\n" ] 
}, { "cell_type": "markdown", "id": "local-barrier-note", "metadata": {}, "source": [ "### Optional: build a simple `LocalBarrierModel` instead\n", "\n", "If you do not need an LCE, use `LocalBarrierModel` for direct barrier logic: constant barriers, local occupation counts, species-count rules, wildcard patterns, or exact catalog-style matches.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "local-barrier-example", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from kmcpy.models import LocalBarrierModel\n", "\n", "root = Path('.').resolve()\n", "local_barrier_model = LocalBarrierModel(\n", " default_barrier=300.0,\n", " site_species={\n", " 1: {0: 'P', 1: 'Si'},\n", " 2: {0: 'Si', 1: 'P'},\n", " 3: {0: 'Si', 1: 'P'},\n", " 4: {0: 'Al', 1: 'Si'},\n", " },\n", ")\n", "local_barrier_model.add_species_count_rule(\n", " name='si_rich',\n", " sites='local_env',\n", " species='Si',\n", " min_count=4,\n", " barrier=420.0,\n", ")\n", "local_barrier_file = root / 'example' / 'output' / 'tutorial' / 'model' / 'local_barrier_model.json'\n", "local_barrier_file.parent.mkdir(parents=True, exist_ok=True)\n", "local_barrier_model.to(str(local_barrier_file))\n", "print('Wrote local barrier model:', local_barrier_file)\n" ] }, { "cell_type": "markdown", "id": "c666b4c7", "metadata": {}, "source": [ "## 6) Run a reproducible NASICON simulation (API path)\n", "\n", "This section runs the same setup used by the regression test suite.\n", "It is a good sanity check that your environment and dependencies are working.\n", "\n", "If the previous section wrote `example/output/tutorial/model/model.json`, that generated model file is used. If not, the code falls back to the bundled validated `tests/files/input/model.json`." 
] }, { "cell_type": "code", "execution_count": null, "id": "be00d113", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1.1193006038758543e-06, np.float64(307.3744449426362), np.float64(1.4630573145769372e-08), np.float64(4.576882562174338e-09), np.float64(1.1823906621661553), np.float64(0.31283002494661705), np.float64(0.21998150220477225))" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pathlib import Path\n", "import os\n", "\n", "from kmcpy.simulator.config import Configuration\n", "from kmcpy.simulator.kmc import KMC\n", "\n", "root = Path('.').resolve()\n", "file_path = root / 'tests' / 'files'\n", "model_file = root / 'example' / 'output' / 'tutorial' / 'model' / 'model.json'\n", "if not model_file.exists():\n", " model_file = file_path / 'input' / 'model.json'\n", "\n", "run_dir = root / 'example' / 'output' / 'tutorial' / 'run'\n", "run_dir.mkdir(parents=True, exist_ok=True)\n", "\n", "config = Configuration(\n", " structure_file=str(file_path / 'EntryWithCollCode15546_Na4Zr2Si3O12_573K.cif'),\n", " model_file=str(model_file),\n", " event_file=str(file_path / 'input' / 'events.json'),\n", " initial_state_file=str(file_path / 'input' / 'initial_state.json'),\n", " mobile_ion_specie='Na',\n", " temperature=298,\n", " attempt_frequency=5e12,\n", " equilibration_passes=1,\n", " kmc_passes=100,\n", " supercell_shape=(2, 1, 1),\n", " site_mapping={'Na': ['Na', 'X'], 'Zr': 'Zr', 'Si': ['Si', 'P'], 'O': 'O'},\n", " convert_to_primitive_cell=True,\n", " elementary_hop_distance=3.47782,\n", " random_seed=12345,\n", " name='NASICON_Regression',\n", ")\n", "\n", "kmc = KMC.from_config(config)\n", "original_cwd = Path.cwd()\n", "try:\n", " os.chdir(run_dir)\n", " tracker = kmc.run()\n", "finally:\n", " os.chdir(original_cwd)\n", "\n", "tracker.return_current_info()" ] }, { "cell_type": "markdown", "id": "a163370c", "metadata": {}, "source": [ "## 7) Results (reference + run output)\n", "\n", "For the input 
snapshot above, the expected final metrics are:\n", "\n", "| Metric | Value | Unit |\n", "|---|---:|---|\n", "| time | 1.1193006038758543e-06 | s |\n", "| msd | 307.37444494263616 | Angstrom^2 |\n", "| jump_diffusivity | 1.4630573145769372e-08 | cm^2/s |\n", "| tracer_diffusivity | 4.5768825621743376e-09 | cm^2/s |\n", "| conductivity | 1.1823906621661553 | mS/cm |\n", "| havens_ratio | 0.312830024946617 | dimensionless |\n", "| correlation_factor | 0.21998150220477225 | dimensionless |\n", "\n", "These values are intentionally embedded here so the tutorial remains informative even when cells are not executed during the docs build.\n" ] }, { "cell_type": "markdown", "id": "78ffdb13", "metadata": {}, "source": [ "### Metric definitions\n", "\n", "`tracker.return_current_info()` returns:\n", "\n", "`(time, msd, jump_diffusivity, tracer_diffusivity, conductivity, havens_ratio, correlation_factor)`\n", "\n", "- `time`: simulated physical time (s)\n", "- `msd`: mean squared displacement (Angstrom^2)\n", "- `jump_diffusivity`: jump diffusivity (cm^2/s)\n", "- `tracer_diffusivity`: tracer diffusivity (cm^2/s)\n", "- `conductivity`: ionic conductivity (mS/cm)\n", "- `havens_ratio`: Haven ratio (dimensionless)\n", "- `correlation_factor`: correlation factor (dimensionless); it appears as column `f` in the results CSV\n", "\n", "`tracker.result_units` returns the same unit metadata, and `Tracker.write_results(...)` writes it to a `results_units*.json.gz` sidecar." 
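, "\n", "\n", "As a quick sanity check on these definitions, the reference values in the Results table are consistent with the Haven ratio being the tracer diffusivity divided by the jump diffusivity. This relation is inferred from the reported numbers, not taken from the kMCpy source, so treat it as a consistency check rather than a statement of how `havens_ratio` is computed:\n", "\n", "$$\n", "H_R = \\frac{D_{\\mathrm{tr}}}{D_J} = \\frac{4.5768825621743376 \\times 10^{-9}}{1.4630573145769372 \\times 10^{-8}} \\approx 0.31283\n", "$$\n", "\n", "Here $D_{\\mathrm{tr}}$ and $D_J$ stand for `tracer_diffusivity` and `jump_diffusivity` (both in cm^2/s); the symbols are assumed notation introduced only for this check."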
] }, { "cell_type": "markdown", "id": "a615ca60", "metadata": {}, "source": [ "### Output files\n", "\n", "After the run, kMCpy writes compressed CSV and JSON artifacts in `example/output/tutorial/run`:\n", "\n", "- `results_NASICON_Regression.csv.gz`\n", "- `results_units_NASICON_Regression.json.gz`\n", "- `displacement_NASICON_Regression_.csv.gz`\n", "- `hop_counter_NASICON_Regression_.csv.gz`\n", "- `current_occ_NASICON_Regression_.csv.gz`" ] }, { "cell_type": "code", "execution_count": 2, "id": "c7e8a573", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " time jump_diffusivity tracer_diffusivity conductivity f havens_ratio msd\n", "0.000001 1.463057e-08 4.576883e-09 1.182391 0.219982 0.31283 307.374445\n" ] } ], "source": [ "from pathlib import Path\n", "import pandas as pd\n", "\n", "run_dir = Path('example/output/tutorial/run')\n", "results_file = run_dir / 'results_NASICON_Regression.csv.gz'\n", "df = pd.read_csv(results_file)\n", "print(df.tail(1).to_string(index=False))" ] }, { "cell_type": "markdown", "id": "e55f0573", "metadata": {}, "source": [ "## 8) Next steps\n", "\n", "- Swap the NASICON regression inputs for your own CIF, event, and model files.\n", "- Run temperature sweeps by varying `temperature` in `Configuration`.\n", "- Track convergence by increasing `kmc_passes` and comparing `jump_diffusivity`, `tracer_diffusivity`, and `conductivity`.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.13" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 5 }