Merged
30 commits
c595976
Integrate all estimators into estiMINT
CosmoNaught Jun 23, 2026
a56f7ee
update README
CosmoNaught Jun 23, 2026
992af02
update README, more tests and remove fluff
CosmoNaught Jun 23, 2026
6c32614
Fix hardcoded paths in prepare scripts; trim bednet docstrings
CosmoNaught Jun 23, 2026
7c99d0c
Add run_scenarios pipeline, CI/publish workflows, split deps into ext…
CosmoNaught Jun 23, 2026
f9c1c53
update Py cli ver, README and reqs
CosmoNaught Jun 24, 2026
a0a6528
update publish rules
CosmoNaught Jun 24, 2026
90a8371
Refactor code structure for improved readability and maintainability
absternator Jun 24, 2026
b6fb4dd
Bump estimint version to 1.4.2 in uv.lock
absternator Jun 24, 2026
528d9b2
Add dependency groups for development dependencies in pyproject.toml …
absternator Jun 24, 2026
2b224e7
refactor: restore canonical scenario key names
CosmoNaught Jun 25, 2026
9963a04
refactor(hbr): rename DT to dt, lower prev floor to 0.01
CosmoNaught Jun 25, 2026
8ca87e0
style: remove unused imports
CosmoNaught Jun 25, 2026
8a01454
chore: drop committed coverage and metrics
CosmoNaught Jun 25, 2026
4b27322
docs: sync run_scenarios example to canonical keys
CosmoNaught Jun 25, 2026
013b471
refactor(hbr): use df for the dataframe in eir_to_hbr trainer
CosmoNaught Jun 25, 2026
8c5327f
refactor: use df for the dataframe in remaining trainers
CosmoNaught Jun 25, 2026
4a7493e
docs: clarify y9 columns are year-9 means, not 'aggregates'
CosmoNaught Jun 25, 2026
eb676f7
fix(scenarios): map current/future nets, IRS, LSM and routine covaria…
CosmoNaught Jun 25, 2026
d8a4e74
update manifest
CosmoNaught Jun 25, 2026
e6ba653
verbump
CosmoNaught Jun 25, 2026
70c43e7
verbump
CosmoNaught Jun 25, 2026
0516fbc
fix(ci): sync uv.lock to estimint 1.4.4
CosmoNaught Jun 25, 2026
121e9a2
build: add scripts/release.sh for atomic version bumps
CosmoNaught Jun 25, 2026
90bc0fe
remove scripts folder
CosmoNaught Jun 25, 2026
b2e9ae0
refactor: update scenarios to use Scenario dataclass for improved cla…
absternator Jun 25, 2026
dda2434
bump version to 1.5.0 in pyproject.toml
absternator Jun 25, 2026
346d838
fix: update revision to 2 and bump estimint version to 1.5.0 in uv.lock
absternator Jun 25, 2026
06c57a1
refactor: move Scenario and EirTarget dataclasses to types.py and upd…
absternator Jun 26, 2026
c1219f6
update version
absternator Jun 26, 2026
33 changes: 33 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,33 @@
name: Publish Release to PyPI

on:
  push:
    tags:
      - "v*.*.*"

jobs:
  run:
    runs-on: ubuntu-latest
    environment:
      name: pypi
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b
        with:
          enable-cache: true
          version: "0.11.18"
      - name: Set up Python
        run: uv python install
      - name: Build
        run: uv build
      - name: Smoke test (wheel)
        run: uv run --isolated --no-project --with dist/*.whl tests/smoke_test.py
      - name: Smoke test (source distribution)
        run: uv run --isolated --no-project --with dist/*.tar.gz tests/smoke_test.py
      - name: Publish
        run: uv publish
38 changes: 38 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,38 @@
name: tests

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - "*"

concurrency:
  group: tests-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    name: pytest (Python ${{ matrix.python-version }})
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.12", "3.13"]

    steps:
      - uses: actions/checkout@v6

      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b
        with:
          version: "0.11.18"
          enable-cache: true
          python-version: ${{ matrix.python-version }}

      - name: Install project
        run: uv sync --locked

      - name: Run tests
        run: uv run pytest
13 changes: 13 additions & 0 deletions .gitignore
@@ -40,3 +40,16 @@ uv.lock
# Training outputs
output/
scripts/output/

# Model training artifacts (regenerable; shipped copies live in src/estimint/data/)
models/**/*.parquet
models/**/*.pkl
models/**/*.model
models/**/plots/
models/**/metrics/

# Test / coverage
.coverage
.coverage.*
htmlcov/
.pytest_cache/
3 changes: 1 addition & 2 deletions MANIFEST.in

Review comment (Contributor): Is this file needed?

@@ -1,4 +1,3 @@
include README.md
include requirements.txt
recursive-include src/estimint/data *
recursive-include src/estimint/inst *
prune tests
192 changes: 144 additions & 48 deletions README.md
@@ -1,62 +1,77 @@
# estiMINT (Python)
# estiMINT

Python port of the estiMINT R package for EIR (Entomological Inoculation Rate) estimation using machine learning.
Package for EIR (Entomological Inoculation Rate) estimation using machine learning.

It estimates EIR from prevalence, converts between EIR and human biting rate (including the effect of changes in mosquito density), and turns a bednet specification (net type and resistance level) into the `dn0` killing parameter.

## Installation

```bash
pip install -e .
pip install estimint # core: inference only (numpy, pandas, xgboost, scipy)
```

Or install dependencies directly:
Optional extras, by use case:

```bash
pip install -r requirements.txt
pip install "estimint[train]" # data prep + model training (duckdb, scikit-learn, pyarrow)
pip install "estimint[viz]" # plotting (matplotlib)
pip install "estimint[scenarios]" # run_scenarios pipeline (stateMINT emulator)
pip install "estimint[all]"
pip install "estimint[dev]" # test/lint/type-check toolchain
```

## File Mapping (R → Python)
The `run_scenarios` pipeline also needs the stateMINT emulator (Python 3.12+). For now it
is installed from the `mamba2-train` branch of the stateMINT repository. With uv this is handled for you:

```bash
uv sync --extra scenarios
```

| R File | Python File | Description |
|--------|-------------|-------------|
| `estiMINT-package.R` | `__init__.py` | Package initialization and exports |
| `globals.R` | `globals.py` | Global variables and constants |
| `utils.R` | `utils.py` | Utility functions (metrics, QMAP, etc.) |
| `data_processing.R` | `data_processing.py` | Data loading and preprocessing |
| `models.R` | `models.py` | XGBoost model training |
| `train.R` | `train.py` | Main training pipeline with K-fold CV |
| `plotting.R` | `plotting.py` | Visualization functions |
| `storage.R` | `storage.py` | Model persistence and loading |
| `run.R` | `run.py` | Model inference |
With plain pip, install stateMINT from the branch yourself, then estiMINT:

## API Reference
```bash
pip install "git+https://github.com/mrc-ide/stateMINT.git@mamba2-train"
pip install estimint
```

### Training
For local development with [uv](https://docs.astral.sh/uv/):

```python
from estimint import train_xgb_model

model = train_xgb_model(
    in_parquet="data/input.parquet",
    out_dir="output/",
    thr_lo=0.02,          # Lower prevalence threshold
    thr_hi=0.95,          # Upper prevalence threshold
    k_strata=16,          # K-means strata for EIR
    K=10,                 # CV folds
    seed=42,
    save_pkl=True,
    save_plots=True,
    save_artifacts=True
)
```bash
uv sync --all-extras --dev
```

## Data & retraining pipeline

All training data lives in `datasets/estimint_simulations_y9.parquet`. Two model folders
derive their views from it and train:

```
datasets/        # training data (see datasets/README.md)
models/
  prevalence/    # prev_y9 -> EIR (estiMINT_model.pkl)
  hbr/           # HBR<->EIR sub-models (estiMINT_HBR_model.pkl, estiMINT_EIR_to_HBR_model.pkl)
```

Retrain a model end-to-end, e.g. the prevalence model:

```bash
python models/prevalence/prepare.py # derive the training view from the parquet
python models/prevalence/train.py # train -> estiMINT_model.pkl + metrics/ + plots/
```

The deployed models shipped with the package live in `src/estimint/data/` and are loaded by
name (`prevalence`, `hbr`, `eir_to_hbr`). This is independent of the training pipeline above.

## API Reference

### Inference

```python
from estimint import load_xgb_model, run_xgb_model
import pandas as pd

# Load model
model = load_xgb_model("output/models/estiMINT_model.pkl")
# Load a bundled model by name: "prevalence", "hbr", or "eir_to_hbr"
model = load_xgb_model("prevalence")

# Prepare input data
new_data = pd.DataFrame({
@@ -80,13 +95,80 @@ print(f"Predicted EIR: {eir_predictions[0]:.2f}")
from estimint import load_xgb_model, run_xgb_model, set_global_model

# Set global model once
model = load_xgb_model("output/models/estiMINT_model.pkl")
model = load_xgb_model("prevalence")
set_global_model(model)

# Run predictions without passing model
predictions = run_xgb_model(new_data) # Uses global model
```

### Bednet to dn0

Turn a bednet specification (a mix of net types and an insecticide resistance level) into
the `dn0` covariate, the probability a mosquito dies on contact, along with total ITN usage.

```python
from estimint import calculate_dn0, net_types

net_types() # ['pyrethroid_only', 'pyrethroid_pbo', 'pyrethroid_ppf', 'pyrethroid_pyrrole']
res = calculate_dn0(0.5, py_only=0.4, py_pbo=0.3, py_pyrrole=0.2, py_ppf=0.1)
res.dn0, res.itn_use # weighted dn0, total net usage
```
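
As a rough illustration of what a "weighted dn0" could mean, the sketch below takes a usage-weighted mean of per-net-type values and sums the shares into total usage. Everything in it is an assumption made for illustration: the per-type numbers are placeholders, the function name `weighted_dn0_sketch` is hypothetical, and the real `calculate_dn0` also depends on the resistance level passed as its first argument.

```python
from dataclasses import dataclass

# Placeholder dn0 values per net type at one fixed resistance level.
# These are invented for illustration and are NOT estiMINT's lookup values.
ILLUSTRATIVE_DN0 = {
    "py_only": 0.30,
    "py_pbo": 0.40,
    "py_pyrrole": 0.50,
    "py_ppf": 0.35,
}


@dataclass
class Dn0Sketch:
    dn0: float      # usage-weighted killing parameter
    itn_use: float  # total net usage (sum of the shares)


def weighted_dn0_sketch(**shares: float) -> Dn0Sketch:
    """Hypothetical helper: usage-weighted mean of per-type dn0 values."""
    itn_use = sum(shares.values())
    if itn_use == 0:
        # Assumption: no nets means no net-induced killing.
        return Dn0Sketch(dn0=0.0, itn_use=0.0)
    weighted_sum = sum(ILLUSTRATIVE_DN0[name] * share for name, share in shares.items())
    return Dn0Sketch(dn0=weighted_sum / itn_use, itn_use=itn_use)


res = weighted_dn0_sketch(py_only=0.4, py_pbo=0.3, py_pyrrole=0.2, py_ppf=0.1)
print(round(res.dn0, 3), round(res.itn_use, 3))
```

Dividing by the total share keeps the result a mean even when the shares do not sum to 1; whether the package normalises this way is not stated in the README.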

### Run scenarios

`run_scenarios` runs the whole pipeline in one call. You give it a list of scenarios and
get back a DataFrame. For each scenario it works out the bednet killing effect, estimates
the EIR (from prevalence, from biting rate, or taken directly), optionally adjusts for a
change in mosquito density, then runs the stateMINT emulator forward to the prevalence and
cases trajectories.

This needs the [stateMINT](https://github.com/mrc-ide/stateMINT) package installed as well
as estiMINT. estiMINT only loads it when you call `run_scenarios`, and the model weights
download from HuggingFace.

```python
from estimint import run_scenarios
from estimint.scenarios import Scenario

scenarios = [
    Scenario(name="PBO nets, prevalence input, 60% more mosquitoes",
             input="prevalence", value=0.30,
             res_use=0.55, py_pbo=0.85,
             Q0=0.90, phi=0.85, seasonal=1, irs=0.40, lsm=0.0,
             mosquito_delta=0.60),
    Scenario(name="Biting rate input, mixed nets",
             input="hbr", value=250000.0,
             res_use=0.45, py_only=0.30, py_ppf=0.20,
             Q0=0.80, phi=0.82, seasonal=0, irs=0.0),
    Scenario(name="EIR supplied directly, no nets",
             input="eir", value=20.0, res_use=0.0,
             Q0=0.88, phi=0.78, seasonal=1, irs=0.60),
]

df = run_scenarios(scenarios)
print(df[["name", "eir_baseline", "eir_final", "prev_y9", "cases_endline"]])
```

Every scenario is a `Scenario` and needs `name`, `res_use`, `input`, `value`, `Q0`,
`phi`, `seasonal` and `irs`. `lsm`, `routine` and `irs_future` default to 0 (note
`irs_future` does **not** default to `irs` — set it explicitly if you want IRS to
continue). **Current nets:** give a net-type usage mix (`py_only`, `py_pbo`,
`py_pyrrole`, `py_ppf` shares), or leave the net keys out for none; current and
future legs share the same `res_use`. **Future nets:** give `net_type_future` +
`itn_future` to switch net type; omit `net_type_future` and the future leg is zeroed
(it does **not** carry the current mix forward), or set `itn_future=0` to remove
nets explicitly. `mosquito_delta` only applies when `input` is `"prevalence"`.
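
The defaults described above are easy to get wrong, so here is a minimal stand-in that mirrors them. `ScenarioSketch` is a hypothetical dataclass written for this example, not the real `Scenario` from `estimint`; it only copies the required fields and the three zero defaults named in the paragraph, and leaves out the net-type keys.

```python
from dataclasses import dataclass, replace


@dataclass
class ScenarioSketch:
    """Hypothetical stand-in; field names follow the README text."""
    # Required fields
    name: str
    res_use: float
    input: str       # "prevalence", "hbr" or "eir", as in the examples
    value: float
    Q0: float
    phi: float
    seasonal: int
    irs: float
    # Optional fields, all defaulting to 0
    lsm: float = 0.0
    routine: float = 0.0
    irs_future: float = 0.0  # does NOT inherit from `irs`


base = ScenarioSketch(name="EIR supplied directly, no nets",
                      input="eir", value=20.0, res_use=0.0,
                      Q0=0.88, phi=0.78, seasonal=1, irs=0.60)

# IRS stops in the future leg unless it is carried over explicitly.
continued = replace(base, irs_future=base.irs)
print(base.irs_future, continued.irs_future)
```

Setting `irs_future` explicitly, as in `continued`, is the way to express "IRS continues" under the rules above.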

The returned DataFrame has one row per scenario. Alongside the inputs it gives the
estimated EIR (`eir_baseline`, and `eir_final` after any mosquito-density change) and the
stateMINT output. That output is year-9 prevalence (`prev_y9`), endline prevalence and
cases, and the full 157-step `prevalence` and `cases` series. What you do with it is up to
you.

The `estimint.scenarios` module is also where the simulation-based inference and experiment
code will go.

## Utility Functions

```python
@@ -110,6 +192,9 @@ y_calibrated = predict_qmap_w(y_pred, cal)

## Data Processing

These functions need the training extras. Install them with `pip install "estimint[train]"`,
which adds duckdb, scikit-learn and pyarrow.

```python
from estimint import load_and_filter, make_value_weights, strata_and_split

@@ -126,24 +211,35 @@ df["eir_log10"] = np.log10(df["eir"])
df = strata_and_split(df, k_strata=16, seed=42)
```

## Testing

```bash
uv sync --extra dev # or: pip install -e ".[dev]"
uv run pytest # or: pytest
```

This covers the metric and utility helpers, the EIR estimators (prevalence, HBR and direct
EIR), the mosquito-density HBR pipeline, and the bednet calculation.

## CI and releases

The test suite runs on every push to `master` and on every pull request, across Python 3.12 and 3.13, as defined in
[`.github/workflows/tests.yml`](.github/workflows/tests.yml).

Releases publish to PyPI from [`.github/workflows/publish.yml`](.github/workflows/publish.yml).
It builds with `uv build`, smoke-tests the wheel and the source distribution, and uploads
with `uv publish` using
[PyPI trusted publishing](https://docs.astral.sh/uv/guides/integration/github/#publishing-to-pypi),
so no token is stored. To cut a release, bump `version` in `pyproject.toml` and push a tag
matching `v*.*.*` (for example `v1.5.0`); the workflow triggers on the tag push. The first
time, register this repository as a trusted publisher in the PyPI project settings.

## Key Differences from R Version

1. **File format**: Models saved as `.pkl` (pickle) instead of `.rds`
2. **Data handling**: Uses pandas instead of data.table
3. **Plotting**: Uses matplotlib instead of ggplot2
4. **Global model**: Use `set_global_model()` / `get_global_model()` instead of `.GlobalEnv`

## Dependencies

- numpy >= 1.20.0
- pandas >= 1.3.0
- duckdb >= 0.8.0
- xgboost >= 1.6.0
- scikit-learn >= 1.0.0
- matplotlib >= 3.4.0
- requests >= 2.28.0 (optional, for model download)
- appdirs >= 1.4.0 (optional, for cache directory)

## License

MIT License
11 changes: 11 additions & 0 deletions datasets/README.md
@@ -0,0 +1,11 @@
# datasets/

Training data for retraining the estiMINT models. Not shipped with the package.

**`estimint_simulations_y9.parquet`** — 16,384 rows (4,096 parameter sets × 4 sims).
`prev_y9` and `hbr_y9` are year-9 means: prevalence and human biting rate averaged over
the 365 days of simulation year 9 (the year ending at the intervention on day 3285).
Columns: `parameter_index`, `simulation_index`, `eir`, `dn0_use`, `Q0`, `phi_bednets`,
`seasonal`, `itn_use`, `irs_use`, `prev_y9`, `hbr_y9`.
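
To make the year-9 window concrete: day 3285 is 9 × 365, so the year ending at the intervention covers the 365 days before it. The sketch below averages a toy daily series over that window. The 0-based, half-open indexing (`2920:3285`) and the name `year9_mean` are assumptions made for this example, and the toy values are invented.

```python
from statistics import fmean

DAYS_PER_YEAR = 365
INTERVENTION_DAY = 3285  # 9 * 365, from the description above


def year9_mean(daily: list[float]) -> float:
    """Mean over the 365 days ending at the intervention day (assumed half-open)."""
    start = INTERVENTION_DAY - DAYS_PER_YEAR  # day 2920
    window = daily[start:INTERVENTION_DAY]
    if len(window) != DAYS_PER_YEAR:
        raise ValueError("series is too short to cover simulation year 9")
    return fmean(window)


# Toy daily prevalence: 0.2 before year 9, 0.4 during it, 0.1 afterwards.
daily_prev = [0.2] * 2920 + [0.4] * 365 + [0.1] * 100
print(year9_mean(daily_prev))

# Row count stated above: 4,096 parameter sets x 4 simulations.
print(4096 * 4)
```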

Each model's `prepare.py` filters this source and sorts by key into its training view.
Binary file added datasets/estimint_simulations_y9.parquet
Binary file not shown.
17 changes: 17 additions & 0 deletions models/hbr/README.md
@@ -0,0 +1,17 @@
# models/hbr

The HBR feature's two sub-models, both used by `estimate_eir_with_mosquito_delta`
(`src/estimint/hbr.py`) to answer "what happens to EIR if mosquito density changes by X%?".

| Sub-model | Direction | Bundle name | File |
|---|---|---|---|
| `train_hbr_to_eir.py` | HBR + interventions → EIR | `hbr` | `estiMINT_HBR_model.pkl` |
| `train_eir_to_hbr.py` | EIR + interventions → HBR | `eir_to_hbr` | `estiMINT_EIR_to_HBR_model.pkl` |
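
One plausible way the two directions combine to answer the mosquito-density question is a round trip: map EIR to HBR, scale the biting rate by the density change, then map back to EIR. The sketch below shows that shape only. The composition is an assumption, not a description of `estimate_eir_with_mosquito_delta`, and the two toy functions stand in for the fitted models, which also take intervention covariates.

```python
from typing import Callable


def eir_after_density_change_sketch(
    eir_baseline: float,
    mosquito_delta: float,
    eir_to_hbr: Callable[[float], float],
    hbr_to_eir: Callable[[float], float],
) -> float:
    """Hypothetical round trip: EIR -> HBR, scale by (1 + delta), HBR -> EIR."""
    hbr_baseline = eir_to_hbr(eir_baseline)
    hbr_scaled = hbr_baseline * (1.0 + mosquito_delta)  # e.g. 0.60 means 60% more mosquitoes
    return hbr_to_eir(hbr_scaled)


# Toy linear stand-ins for the two trained sub-models (invented for illustration).
def toy_eir_to_hbr(eir: float) -> float:
    return 1000.0 * eir


def toy_hbr_to_eir(hbr: float) -> float:
    return hbr / 1000.0


eir_final = eir_after_density_change_sketch(20.0, 0.60, toy_eir_to_hbr, toy_hbr_to_eir)
print(round(eir_final, 6))
```

With real models the two mappings are not exact inverses, so a delta of 0 would not necessarily return the baseline EIR unchanged.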

```bash
python models/hbr/prepare.py # source -> hbr_training.parquet + eir_to_hbr_training.parquet
python models/hbr/train_hbr_to_eir.py # -> estiMINT_HBR_model.pkl
python models/hbr/train_eir_to_hbr.py # -> estiMINT_EIR_to_HBR_model.pkl
```

Deployed copies live in `src/estimint/data/`.