AIline docs

Concepts

Four ideas cover almost everything AIline does: the lineage row, the snapshot, the MLflow link strategy, and the two project files .ailine.yml and .ailineignore.

Lineage row vs. MLflow run

A lineage row is what AIline writes to its local SQLite database when you run ailine track. It carries the code identity (Git commit or snapshot id), the exact argv, the environment fingerprint, DVC linkage, and a pointer to the matching MLflow run. The MLflow run is whatever your training script (or AIline in wrap mode) opens; AIline does not own its metrics or params.

ailine track --
   |
   v
+---------------------+        +--------------------+
| AIline lineage row  |  <---- | MLflow run         |
| (.ailine/tree.db)   |        | (your tracking     |
|  - id, snapshot,    |  link  |  server / file://) |
|  - argv, env, dvc,  |  ----> |  - metrics         |
|  - mlflow_run id    |        |  - params          |
+---------------------+        +--------------------+

AIline persists lineage; MLflow persists metrics. The two are stitched together by the mlflow_run column on the lineage row.
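
To make the split concrete, here is a minimal sketch of a lineage row stored in SQLite and linked to an MLflow run afterwards. The table name, column types, and helper functions are assumptions chosen to mirror the diagram; AIline's actual schema in .ailine/tree.db may differ.

```python
import json
import sqlite3

# Hypothetical schema: column names follow the diagram, but the real
# layout of AIline's database is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS lineage (
    id         TEXT PRIMARY KEY,  -- Git commit SHA or snapshot id
    type       TEXT NOT NULL CHECK (type IN ('git', 'snapshot')),
    argv       TEXT NOT NULL,     -- JSON-encoded argument list
    env        TEXT,              -- environment fingerprint
    dvc        TEXT,              -- DVC linkage
    mlflow_run TEXT               -- MLflow run id; NULL until linked
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_run(conn, row_id, row_type, argv, env=None, dvc=None):
    """Write the lineage row; the MLflow pointer is filled in later."""
    conn.execute(
        "INSERT INTO lineage (id, type, argv, env, dvc) VALUES (?, ?, ?, ?, ?)",
        (row_id, row_type, json.dumps(argv), env, dvc),
    )
    conn.commit()


def link_mlflow_run(conn, row_id, mlflow_run_id):
    """Stitch the lineage row to an MLflow run via the mlflow_run column."""
    conn.execute(
        "UPDATE lineage SET mlflow_run = ? WHERE id = ?",
        (mlflow_run_id, row_id),
    )
    conn.commit()


def get_mlflow_run(conn, row_id):
    """Return the linked MLflow run id, or None if the row is unlinked or missing."""
    row = conn.execute(
        "SELECT mlflow_run FROM lineage WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None
```

Note that the sketch never stores metrics or params: those stay in MLflow, and the lineage row only keeps the run id needed to find them.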

Snapshot vs. commit

When you run ailine track on a clean Git tree, the lineage row is identified by the Git commit SHA (type=git). When the tree is dirty (modified or untracked files), AIline takes a content-addressed snapshot first (type=snapshot):

  • A manifest of every included file with its sha256.
  • Per-file objects under .ailine/snapshots/objects/, deduplicated across runs.
  • A git diff HEAD patch and the list of untracked files.
  • The parent commit SHA, plus the snapshot's own content hash, which serves as the row id.
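
The content-addressed part of the list above can be sketched as follows. This is an illustration only, not AIline's implementation: the function names are hypothetical, the snapshot id is assumed to be the sha256 of a canonical JSON manifest, and the patch, untracked-file list, and parent SHA are left out.

```python
import hashlib
import json
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    """Hex sha256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def take_snapshot(root: Path, files: list[str], objects_dir: Path) -> tuple[str, dict[str, str]]:
    """Store each file under its sha256 and return (snapshot_id, manifest)."""
    objects_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for rel in sorted(files):
        data = (root / rel).read_bytes()
        digest = sha256_bytes(data)
        obj = objects_dir / digest
        if not obj.exists():
            # Identical content is written once, across files and across runs.
            obj.write_bytes(data)
        manifest[rel] = digest
    # Assumption: the snapshot id is the hash of the canonical manifest,
    # so the same file contents always produce the same id.
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return sha256_bytes(canonical), manifest
```

Because objects are keyed by content, a second snapshot of a mostly unchanged tree only writes the files that actually changed.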

Files matching .ailineignore are skipped. Non-DVC files larger than snapshot.large_file_mb follow the policy configured in snapshot.large_file_mode (prompt, skip, or include). DVC-tracked files are recorded as pointers and are never copied into the snapshot.
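
The per-file decision described above can be sketched as a small pure function. The function name and return values are hypothetical; the sketch also assumes that "large" means strictly more than large_file_mb mebibytes and that the DVC check comes before the size check.

```python
VALID_MODES = {"prompt", "skip", "include"}


def snapshot_action(
    size_bytes: int,
    dvc_tracked: bool,
    large_file_mb: float = 50,
    large_file_mode: str = "prompt",
) -> str:
    """Return what to do with one non-ignored file: pointer, include, skip, or prompt."""
    if large_file_mode not in VALID_MODES:
        raise ValueError(f"unknown large_file_mode: {large_file_mode!r}")
    if dvc_tracked:
        # DVC-tracked files are recorded as pointers, whatever their size.
        return "pointer"
    if size_bytes > large_file_mb * 1024 * 1024:
        # Assumption: the threshold is counted in MiB and is exclusive.
        return large_file_mode
    return "include"
```

With the defaults, a 60 MiB checkpoint that is not tracked by DVC yields "prompt", while the same file under DVC yields "pointer".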

.ailine.yml and .ailineignore

Two project-level files at the repo root drive AIline's behavior. ailine init-workspace seeds both with sensible defaults.

.ailine.yml

This file is the behavior contract. It sets the MLflow mode and link strategy, the snapshot storage dir and large-file policy, the DVC linkage rules, and cleanup defaults. Every key is documented in the schema, and the generated file lists every default explicitly, so behavior never depends on hidden defaults.

track:
  mlflow:
    mode: inherit
    link_strategy: tag
    link_poll_seconds: 3.0

snapshot:
  storage_dir: .ailine/snapshots
  large_file_mb: 50
  large_file_mode: prompt

.ailineignore

Snapshot ignore patterns in standard gitignore syntax. The patterns are used both by the snapshot scan (matching paths are not stored) and by ailine restore (matching paths are preserved on disk during a strict-sync). A built-in default set is always active, even if the file is missing; it covers virtualenvs, build dirs, IDE caches, MLflow / W&B / DVC internals, and editor scratch files.

__pycache__/
.venv/
.idea/
.cursor/
mlruns/
wandb/
# negate a default with `!`
!dist/keep.txt
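
The double role of the ignore patterns (skipped by the scan, preserved by restore) can be sketched as below. The matcher is deliberately simplified and is not full gitignore semantics: the last matching pattern wins, `*` may cross `/`, and a negation can re-include a file inside an ignored directory. The function names, and the assumption that a strict-sync deletes files absent from the snapshot, are illustrative.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore-style check: last matching pattern wins, `!` negates."""
    ignored = False
    parts = PurePosixPath(path).parts
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comment lines carry no pattern
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if pattern.endswith("/"):
            # Directory pattern: match any directory component of the path.
            hit = any(fnmatchcase(part, pattern[:-1]) for part in parts[:-1])
        elif "/" in pattern:
            # Pattern with a slash: match against the whole relative path.
            hit = fnmatchcase(path, pattern)
        else:
            # Bare name: match any single component.
            hit = any(fnmatchcase(part, pattern) for part in parts)
        if hit:
            ignored = not negate
    return ignored


def files_to_delete_on_restore(
    on_disk: set[str], in_snapshot: set[str], patterns: list[str]
) -> set[str]:
    """Files a strict-sync would remove: absent from the snapshot and not ignored."""
    return {p for p in on_disk - in_snapshot if not is_ignored(p, patterns)}
```

In this model a stray scratch.py is removed by a restore, while .venv/ and mlruns/ are left alone because they match the ignore patterns. For real gitignore matching, a dedicated library would be a safer choice than this hand-rolled matcher.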