# Use Site-Energy-Difference Models

This guide shows how to use site-energy-difference models inside a kMCpy KMC
simulation, including direct Python callables and external codes such as smol,
CLEASE, ASE, or a project-specific cluster expansion.

The main rule is simple:

```text
delta_e_site = E_after_hop - E_before_hop
```

The site model must return that signed energy difference for the proposed event.
kMCpy combines it with the KRA barrier as:

```text
effective_barrier = e_kra + delta_e_site / 2
```

In kMCpy, barriers and site-energy differences are consumed in `meV`. If a
callable or external code returns `eV`, set `units="eV"` and kMCpy will convert
it.
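As a worked example of the two formulas above, the sketch below combines a KRA barrier with a site-energy difference returned in `eV`. The function name `effective_barrier_mev` and the numbers are illustrative assumptions, not part of the kMCpy API; the sketch also assumes the reverse hop shares the same `e_kra`.

```python
EV_TO_MEV = 1000.0


def effective_barrier_mev(e_kra_mev, delta_e_site, units="meV"):
    """Combine a KRA barrier (meV) with a signed site-energy difference."""
    if units == "eV":
        delta_e_site_mev = delta_e_site * EV_TO_MEV
    elif units == "meV":
        delta_e_site_mev = delta_e_site
    else:
        raise ValueError(f"Unsupported units: {units!r}")
    # effective_barrier = e_kra + delta_e_site / 2
    return e_kra_mev + delta_e_site_mev / 2.0


# Hypothetical hop: E_after - E_before = +0.04 eV, e_kra = 300 meV.
forward = effective_barrier_mev(300.0, 0.04, units="eV")
# The reverse hop sees the opposite sign of the same difference.
backward = effective_barrier_mev(300.0, -0.04, units="eV")
```

With these inputs the forward barrier is 320 meV and the backward barrier is 280 meV, so the two differ by exactly the site-energy difference (40 meV). A sign error in `delta_e_site` would swap the two barriers.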

## Choose The Interface

Use one of these options:

| Use case | Interface |
| --- | --- |
| No site-energy-difference term | `site_model=None` |
| A kMCpy `LocalClusterExpansion` supplies the site-energy difference | `LocalClusterExpansion` as `site_model` |
| A Python function or external runtime supplies the site-energy difference | `SiteEnergyModel` |

`SiteEnergyModel` handles both simple kMCpy-native callables and mapped external
codes. If no mapping is supplied, it uses kMCpy's active-site order and
occupation labels directly. If an external code has its own site order or state
labels, provide `site_mapping` and `state_mapping`; kMCpy builds the mapping once
before the KMC loop and then updates only event endpoints.

When the site contribution is another kMCpy `LocalClusterExpansion`, it still
uses the regular `compute(simulation_state=..., event=...)` method. Its role in
`CompositeLCEModel` determines the meaning: as `kra_model` it returns
`E_KRA`, and as `site_model` it returns the site-energy-difference
contribution. `SiteEnergyModel` uses the same public `compute(...)` method and
returns `E_after_hop - E_before_hop` directly.

## What kMCpy Stores

kMCpy stores the simulation occupation as a compact active-site list:

```python
simulation_state.occupations
```

For a binary mobile-ion/vacancy model this often looks like:

```text
0 = mobile ion
1 = vacancy
```

For multicomponent models, states may be `0`, `1`, `2`, etc. These integers are
kMCpy state labels. External codes often use a different site order and
different occupation labels; when they do, you must provide mappings.

## Optional Mappings

Mappings are only needed when an external code uses a different site order or
different occupation labels.

`site_mapping` maps from kMCpy active-site index to the external code's site
index:

```python
site_mapping = {
    0: 12,  # kMCpy active site 0 is external site 12
    1: 18,
    2: 19,
}
```

`state_mapping` maps from kMCpy state index to the external occupation value:

```python
state_mapping = {
    0: 0,  # kMCpy mobile-ion state -> smol occupation code
    1: 1,  # kMCpy vacancy state -> smol occupation code
    2: 2,  # another species
}
```

For CLEASE or ASE, the external value may be a symbol:

```python
state_mapping = {
    0: "Li",
    1: "X",
}
```

If the same kMCpy state number means different external values on different
sublattices, use `state_mapping_by_site`:

```python
state_mapping_by_site = {
    0: {0: 0, 1: 1},
    1: {0: 3, 1: 4},
}
```

kMCpy validates the site mapping during initialization. If an `event_lib` is
available, it also checks that the endpoint state mappings needed by the events
are present before the KMC loop starts.

During initialization, readable dictionaries/lists are converted to runtime
lookup arrays:

```text
site_lookup[kmcpy_active_site] -> external_site
state_lookup[kmcpy_state] -> external_occupation_value
state_lookup_by_site[kmcpy_site][kmcpy_state] -> external_occupation_value
```

The KMC loop then uses array indexing for event endpoints. It does not walk the
mapping dictionaries for every proposed hop.
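The dictionary-to-array conversion described above can be pictured with a small NumPy sketch. The helper names `build_site_lookup` and `build_state_lookup` are hypothetical and only illustrate the idea; kMCpy's internal implementation may differ. The sketch also assumes integer external occupation values.

```python
import numpy as np


def build_site_lookup(site_mapping, n_active_sites):
    """Turn {kmcpy_site: external_site} into an array indexed by kmcpy_site."""
    lookup = np.full(n_active_sites, -1, dtype=np.int64)
    for kmcpy_site, external_site in site_mapping.items():
        lookup[kmcpy_site] = external_site
    if (lookup < 0).any():
        missing = np.flatnonzero(lookup < 0).tolist()
        raise ValueError(f"site_mapping is missing active sites {missing}")
    return lookup


def build_state_lookup(state_mapping):
    """Turn {kmcpy_state: external_value} into an array indexed by kmcpy_state."""
    lookup = np.full(max(state_mapping) + 1, -1, dtype=np.int64)
    for kmcpy_state, external_value in state_mapping.items():
        lookup[kmcpy_state] = external_value
    return lookup


site_lookup = build_site_lookup({0: 12, 1: 18, 2: 19}, n_active_sites=3)
state_lookup = build_state_lookup({0: 0, 1: 1, 2: 2})

# Per-event work is plain array indexing on the two hop endpoints.
endpoints = np.array([0, 2])
external_endpoints = site_lookup[endpoints]
```

Here `external_endpoints` holds the external site indices `12` and `19`. Symbol-valued mappings such as `{0: "Li", 1: "X"}` would need an object or string array instead of an integer one.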

## Build The External Site Mapping

Build `site_mapping` after the external code has finalized its site order. The
dictionary direction is always:

```python
site_mapping[kmcpy_active_site_index] = external_site_index
```

If the external object is built directly from kMCpy's active-site structure and
keeps the same order, no site mapping is needed:

```python
active_structure = active_site_order.active_structure()
external_runtime = build_external_runtime(active_structure)

site_model = SiteEnergyModel(
    runtime=external_runtime,
    compute_fn=external_delta,
    state_mapping=kmcpy_state_to_external_value,
    units="eV",
)
```

If the external structure preserves site properties, use the
`_kmcpy_active_site_index` property. This works for either active-only
structures or full structures with fixed sites:

```python
from kmcpy.structure.active_site_order import ACTIVE_SITE_PROPERTY


def site_mapping_from_property(external_structure, active_site_order):
    site_mapping = {}

    for external_index, site in enumerate(external_structure):
        active_index = site.properties.get(ACTIVE_SITE_PROPERTY)
        if active_index is None or int(active_index) < 0:
            continue
        site_mapping[int(active_index)] = int(external_index)

    if len(site_mapping) != active_site_order.active_site_count:
        raise ValueError("External structure does not contain every active site.")

    return site_mapping
```

If the external code drops site properties, match coordinates once before the
KMC run:

```python
import numpy as np


def site_mapping_from_coordinates(
    external_structure,
    active_site_order,
    tol=1e-3,
):
    active_structure = active_site_order.active_structure()
    site_mapping = {}
    used_external_sites = set()

    for active_index, active_site in enumerate(active_structure):
        distances = external_structure.lattice.get_all_distances(
            [active_site.frac_coords],
            external_structure.frac_coords,
        )[0]
        external_index = int(np.argmin(distances))

        if float(distances[external_index]) > tol:
            raise ValueError(f"No external site matches active site {active_index}.")
        if external_index in used_external_sites:
            raise ValueError("Two active sites map to the same external site.")

        site_mapping[active_index] = external_index
        used_external_sites.add(external_index)

    return site_mapping
```

Coordinate matching is a setup step, not a per-event operation. Store the
resulting dictionary in the `SiteEnergyModel`; kMCpy converts it to an array in
`initialize_state(...)`.

## Site-Order Traceability

kMCpy's own compact active-site order is defined by `ActiveSiteOrder`, the
same object used when structures, events, and occupations are loaded. When a
`SiteEnergyModel` is initialized during a normal KMC run, kMCpy passes
that `ActiveSiteOrder` to the model. The model records:

- `active_site_order_hash`: the `ActiveSiteOrder.fingerprint` for the kMCpy
  active-site order.
- `external_site_order_hash`: a hash of the active-site to external-site
  mapping and external occupation size.

These hashes are serialized with the model so model files can be traced
back to the exact active-site order and external mapping they were built
against. If you construct the model manually, pass the `ActiveSiteOrder`
explicitly through `active_site_order`:

```python
site_model = SiteEnergyModel(
    runtime=external_evaluator,
    compute_fn=external_delta,
    site_mapping=kmcpy_to_external_site,
    state_mapping=kmcpy_state_to_external_value,
    active_site_order=active_site_order,
    units="eV",
)
```
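To illustrate why an order hash is useful for traceability, the sketch below fingerprints a site mapping together with the external occupation size. The function `mapping_fingerprint`, the JSON payload layout, and the choice of SHA-256 are assumptions made for this example; they are not kMCpy's actual hash format.

```python
import hashlib
import json


def mapping_fingerprint(site_mapping, external_size):
    """Hash a site mapping so that dictionary insertion order does not matter."""
    payload = {
        "site_mapping": sorted((int(k), int(v)) for k, v in site_mapping.items()),
        "external_size": int(external_size),
    }
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


same_a = mapping_fingerprint({0: 12, 1: 18, 2: 19}, external_size=32)
same_b = mapping_fingerprint({2: 19, 0: 12, 1: 18}, external_size=32)
swapped = mapping_fingerprint({0: 12, 1: 19, 2: 18}, external_size=32)
```

The first two fingerprints are equal because they describe the same mapping, while `swapped` differs because two external sites are exchanged. Comparing a stored hash with a freshly computed one is a cheap way to detect a model file that was built against a different site order.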

## Runtime Lifecycle

When mappings or an external runtime are used, `SiteEnergyModel` does three
things:

1. At setup, it builds site/state lookup arrays and one external occupation
   array from the initial kMCpy occupation.
2. For a proposed event, it maps only the two endpoint changes and calls your
   `compute_fn`.
3. After an accepted event, it optionally calls your `apply_fn`, then updates
   only the two changed external occupation entries.

So the full occupation conversion happens once, not every event.
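The three lifecycle steps can be sketched with a toy cache. The class `ToyExternalCache` and the `(external_site, old_value, new_value)` tuples are hypothetical stand-ins for kMCpy's internal bookkeeping, and the sketch assumes a hop simply swaps the states of its two endpoints.

```python
import numpy as np


class ToyExternalCache:
    def __init__(self, site_lookup, state_lookup, n_external_sites):
        self.site_lookup = np.asarray(site_lookup)
        self.state_lookup = np.asarray(state_lookup)
        self.n_external_sites = n_external_sites
        self.external_occupation = None

    def initialize(self, kmcpy_occupations):
        # Step 1: convert the full occupation once, at setup.
        occupation = np.zeros(self.n_external_sites, dtype=np.int32)
        states = np.asarray(kmcpy_occupations)
        occupation[self.site_lookup] = self.state_lookup[states]
        self.external_occupation = occupation

    def proposed_changes(self, kmcpy_occupations, site_a, site_b):
        # Step 2: map only the two endpoints; the cache is not modified.
        changes = []
        for src, dst in ((site_a, site_b), (site_b, site_a)):
            external_site = int(self.site_lookup[src])
            old_value = int(self.state_lookup[kmcpy_occupations[src]])
            new_value = int(self.state_lookup[kmcpy_occupations[dst]])
            changes.append((external_site, old_value, new_value))
        return changes

    def apply(self, changes):
        # Step 3: after acceptance, update only the changed entries.
        for external_site, _old_value, new_value in changes:
            self.external_occupation[external_site] = new_value


cache = ToyExternalCache(site_lookup=[2, 0, 1], state_lookup=[10, 20],
                         n_external_sites=3)
kmcpy_occupations = [0, 1, 0]
cache.initialize(kmcpy_occupations)
before = cache.external_occupation.tolist()
changes = cache.proposed_changes(kmcpy_occupations, 0, 1)
after_proposal = cache.external_occupation.tolist()
cache.apply(changes)
after_accept = cache.external_occupation.tolist()
```

In this toy run the cached occupation is identical before and after the proposal, and only two entries differ once the event is applied.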

## Smol-Style Example

For smol, keep the smol processor as the runtime object. The delta function
turns kMCpy's mapped endpoint changes into smol-style flips:

```python
import numpy as np

from kmcpy.models import CompositeLCEModel, SiteEnergyModel


def smol_delta(runtime, external_occupation, changes, coefficients):
    flips = [change.as_flip() for change in changes]
    delta_features = runtime.compute_feature_vector_change(
        external_occupation,
        flips,
    )
    return float(np.dot(delta_features, coefficients))  # eV


site_model = SiteEnergyModel(
    runtime=smol_processor,
    compute_fn=smol_delta,
    compute_kwargs={"coefficients": smol_coefficients},
    site_mapping=kmcpy_to_smol_site,
    state_mapping=kmcpy_state_to_smol_code,
    external_dtype="int32",
    units="eV",
)

model = CompositeLCEModel(
    kra_model=kra_lce_model,
    site_model=site_model,
)
```

`compute(...)` does not mutate `external_occupation`. kMCpy updates the
cached external occupation only after the event is accepted.

## CLEASE-Style Example

For CLEASE or ASE, keep the evaluator as the runtime object. The delta function
evaluates the proposed local changes; the apply function commits accepted
changes to the external evaluator.

```python
from kmcpy.models import CompositeLCEModel, SiteEnergyModel


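# make_system_change is a user-supplied helper (not defined here) that builds
# the change object expected by the external evaluator.
def clease_delta(runtime, external_occupation, changes):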
    system_changes = [
        make_system_change(
            change.external_site,
            change.old_value,
            change.new_value,
        )
        for change in changes
    ]
    return runtime.get_energy_given_change(system_changes) - runtime.get_energy()


def clease_apply(runtime, external_occupation, changes):
    system_changes = [
        make_system_change(
            change.external_site,
            change.old_value,
            change.new_value,
        )
        for change in changes
    ]
    runtime.apply_system_changes(system_changes, keep=True)


site_model = SiteEnergyModel(
    runtime=clease_evaluator,
    compute_fn=clease_delta,
    apply_fn=clease_apply,
    site_mapping=kmcpy_to_clease_site,
    state_mapping={0: "Li", 1: "X"},
    units="eV",
)

model = CompositeLCEModel(
    kra_model=kra_lce_model,
    site_model=site_model,
)
```

The `external_occupation` argument is still updated by kMCpy after accepted
events. Use `apply_fn` only for external runtime state that kMCpy cannot update
itself, such as a live evaluator or calculator.

## Model Files

Live Python objects cannot be serialized into model files. For file-based
workflows, provide factory references instead:

```python
site_model = SiteEnergyModel(
    runtime_ref="my_project.smol_site_energy:build_processor",
    runtime_kwargs={"model_file": "smol_ce.json"},
    compute_ref="my_project.smol_site_energy:smol_delta",
    compute_kwargs={"coefficients_file": "eci.npy"},
    site_mapping=kmcpy_to_smol_site,
    state_mapping=kmcpy_state_to_smol_code,
    external_dtype="int32",
    units="eV",
)

site_model.to("site_energy.json")
```

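At runtime, kMCpy resolves `runtime_ref`, `compute_ref`, and `apply_ref` from
their import paths. Because `compute_kwargs` are passed to the callable as
keyword arguments, the `smol_delta` referenced here must accept
`coefficients_file` and load the coefficients itself, rather than take an
in-memory `coefficients` array.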
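A reference of the form `"package.module:attribute"` can be resolved with the standard library alone. The helper `resolve_ref` below is a minimal sketch of that idea, not kMCpy's own resolver, and it is demonstrated on standard-library targets so that it runs anywhere.

```python
import importlib


def resolve_ref(ref):
    """Resolve 'package.module:attr' (or 'module:obj.attr') to a Python object."""
    module_name, separator, attr_path = ref.partition(":")
    if not separator or not module_name or not attr_path:
        raise ValueError(f"Expected 'module:attribute', got {ref!r}")
    obj = importlib.import_module(module_name)
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


sqrt = resolve_ref("math:sqrt")
path_join = resolve_ref("os.path:join")
```

Because resolution imports the module by name, the referenced module has to be importable in the environment that loads the model file, for example by installing the project or adding it to `PYTHONPATH`.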

## Direct Callable Example

For a kMCpy-native callable, omit `site_mapping` and `state_mapping`. The
callable can accept only the arguments it needs.

```python
from kmcpy.models import SiteEnergyModel

site_model = SiteEnergyModel(
    compute_ref="my_project.site_energy:site_energy_difference",
    units="eV",
    compute_kwargs={"model_file": "site_ce.json"},
)
```

The callable receives:

```python
def site_energy_difference(event, simulation_state, model_file):
    return 0.04  # E_after - E_before, in eV
```

Do not use this pattern if the callable rebuilds a full smol or CLEASE
occupation object every time it is called. Keep the external runtime in
`SiteEnergyModel` and provide mappings instead.
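One way to picture "the callable can accept only the arguments it needs" is signature-based dispatch. The helper `call_with_supported_args` is a hypothetical sketch of that technique using `inspect.signature`; whether kMCpy dispatches exactly this way is an assumption.

```python
import inspect


def call_with_supported_args(fn, **available):
    """Call fn with only those keyword arguments that its signature accepts."""
    params = inspect.signature(fn).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_any:
        return fn(**available)
    return fn(**{k: v for k, v in available.items() if k in params})


def only_event(event):
    return 0.04  # E_after - E_before, in eV


def event_and_file(event, model_file):
    return (event, model_file)


available = {"event": "hop", "simulation_state": None, "model_file": "site_ce.json"}
first = call_with_supported_args(only_event, **available)
second = call_with_supported_args(event_and_file, **available)
```

Both callables are invoked from the same pool of available arguments, and each receives only the names it declares.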

## No Site-Energy Term

If the KRA model already contains everything you need, omit the site model:

```python
model = CompositeLCEModel(
    kra_model=kra_lce_model,
    site_model=None,
)
```

## Checklist

Before running a simulation, check:

- `compute_fn` returns `E_after_hop - E_before_hop`, not an absolute energy.
- `units` matches the model output.
- `site_mapping` covers every kMCpy active site.
- `state_mapping` or `state_mapping_by_site` covers every state needed by the
  events.
- `compute_fn` does not mutate the external occupation for proposed events.
- `apply_fn` commits only accepted events to the external runtime.
- Full occupation and mapping conversion happens in `initialize_state(...)`,
  not in `compute(...)`.
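Some of the checklist items can be verified mechanically before a run. The function `preflight_problems` below is a hypothetical helper written for this guide, not part of kMCpy; it assumes the occupation list has one entry per active site and checks only the mapping-related items.

```python
def preflight_problems(site_mapping, state_mapping, occupations):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # Every kMCpy active site needs an external site.
    missing_sites = sorted(set(range(len(occupations))) - set(site_mapping))
    if missing_sites:
        problems.append(f"site_mapping is missing active sites {missing_sites}")

    # No two active sites may share one external site.
    external_sites = list(site_mapping.values())
    if len(set(external_sites)) != len(external_sites):
        problems.append("site_mapping sends two active sites to one external site")

    # Every state present in the occupation needs an external value.
    missing_states = sorted(set(occupations) - set(state_mapping))
    if missing_states:
        problems.append(f"state_mapping is missing states {missing_states}")

    return problems


clean = preflight_problems({0: 12, 1: 18, 2: 19}, {0: "Li", 1: "X"}, [0, 1, 0])
broken = preflight_problems({0: 12, 1: 12}, {0: "Li"}, [0, 1, 0])
```

For the first call the list is empty; the second call reports a missing active site, a duplicated external site, and a missing state. The sign and unit conventions of `compute_fn` still need a manual check, for instance by comparing one hop against a full-energy calculation.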