SparqlModel production guide

Operator and architect guide for running SparqlModel in production. Normative API detail: SparqlModel Technical Specification. Feature schedule: SparqlModel Roadmap. Task guides: FastAPI integration, Sessions and stores.

When to use which store

Store	Use case
MemoryStore	Unit tests, local prototypes, single-process tools
HttpStore	Remote Fuseki/Jena/compatible SPARQL 1.1 endpoint

Do not use HttpStore as a shared cache across many writers without a mirror sync strategy. Prefer one writer per endpoint. Since 0.9.1, get can CONSTRUCT-pull individual subjects into the mirror when they are missing locally; since 0.9.2, refresh does the same. Since 0.12.0, sync_mirror() reloads the entire default graph into the mirror via Graph Store HTTP GET.

HttpStore mirror model (0.2)

Operation	Reads / writes
`put`, `delete`, `update_graph`	Remote + local mirror
`query`, `execute`	Remote only
`get`, `session.graph`, cascade	Mirror only

Symptom: execute returns IRIs that get cannot load, because the data exists on the server but not in the mirror. Mitigation: get (0.9.1+) and refresh (0.9.2+) attempt pull_subjects_into_mirror automatically. You can also call pull_subjects_into_mirror explicitly, put through the same session/store, or use MemoryStore for single-process apps.

Do not mutate session.graph directly on HttpStore / AsyncHttpStore. session.graph is the local mirror only; calling add or remove on it does not update the remote endpoint. query and execute still read the server, so the mirror and remote can diverge permanently. Use session.put / delete, or use MemoryStore for tests that need direct graph edits.

Shipped (0.9.1): Optional read_endpoint / write_endpoint, pull_subjects_into_mirror, auto-pull on get, pyoxigraph.parse_query_results for SELECT JSON.

Shipped (0.9.2): Auto-pull on refresh; merge partial-field semantics and hydration invalidation.

Shipped (0.10.0): Replace-on-pull (no stale predicates per IRI after pull); mirror_mode on HTTP stores.

Shipped (0.11.0): Retries, batched UPDATE, optional SELECT GET — see HTTP resilience (0.11+).

Shipped (0.12.0): GSP sync_mirror() when graph_store_url is set — see Mirror sync (0.12+).

Mirror sync (0.12+)

When another process or admin UI changes the remote dataset, reconcile the local mirror with sync_mirror() (read-only on the server):

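# Person is an application-defined model and IRI must be imported as well;
# both imports are omitted from this snippet.
from sparqlmodel import HttpStore, SPARQLSession

store = HttpStore(
    "http://localhost:3030/myds/sparql",
    read_endpoint="http://localhost:3030/myds/sparql",
    write_endpoint="http://localhost:3030/myds/update",
    # Explicit Graph Store URL; sync_mirror() requires graph_store_url to be set.
    graph_store_url="http://localhost:3030/myds/data",
)
with SPARQLSession(store=store) as session:
    # Reload the entire default graph into the local mirror (read-only on the server).
    store.sync_mirror()
    # get reads from the freshly synced mirror.
    person = session.get(Person, IRI("http://example.org/p/1"))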

Since 0.13.1, sync_mirror() and pull_subjects_into_mirror() bump the store's mirror_generation, which clears the session identity map and hydration cache on the next get, refresh, or query hydration. You do not need expunge_all() after a mirror sync.
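
The generation check described above can be pictured with a toy model. This is a minimal sketch, not SparqlModel's implementation: ToyStore, ToySession, and their attributes are hypothetical stand-ins, and only the name mirror_generation comes from this guide.

```python
class ToyStore:
    """Hypothetical stand-in for a store that keeps a local mirror."""

    def __init__(self):
        self.mirror = {}            # iri -> dict of properties
        self.mirror_generation = 0  # bumped whenever the mirror is reloaded

    def sync_mirror(self, remote):
        # Replace the mirror with a copy of the remote data, then bump the generation.
        self.mirror = {iri: dict(props) for iri, props in remote.items()}
        self.mirror_generation += 1


class ToySession:
    """Hypothetical session that drops cached instances when the generation moves."""

    def __init__(self, store):
        self.store = store
        self._identity_map = {}
        self._seen_generation = store.mirror_generation

    def get(self, iri):
        # Lazy invalidation: the generation is compared on the next read.
        if self._seen_generation != self.store.mirror_generation:
            self._identity_map.clear()
            self._seen_generation = self.store.mirror_generation
        if iri not in self._identity_map:
            self._identity_map[iri] = dict(self.store.mirror[iri])
        return self._identity_map[iri]


store = ToyStore()
store.sync_mirror({"http://example.org/p/1": {"name": "Ada"}})
session = ToySession(store)
first = session.get("http://example.org/p/1")

# An external change arrives; a new sync bumps the generation.
store.sync_mirror({"http://example.org/p/1": {"name": "Ada Lovelace"}})
second = session.get("http://example.org/p/1")
```

In this sketch the second read returns a fresh object, so the session never has to be expunged by hand after a sync.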

Mechanism	Scope	When to use
`pull_subjects_into_mirror([iri, ...])`	Listed subjects (CONSTRUCT)	Targeted refresh after known IRIs changed
`mirror_mode="remote_authoritative"`	Per `get` / `refresh`	Every read must match remote for that IRI
`sync_mirror()`	Entire default graph (GSP GET)	Bulk external load, admin UI edits, startup warm-cache

Fuseki URLs (typical):

Service	Path
SPARQL query	`http://host:3030/{dataset}/sparql`
SPARQL update	`http://host:3030/{dataset}/update`
Graph Store HTTP	`http://host:3030/{dataset}/data`

http_common.default_graph_store_url(sparql_endpoint) guesses .../sparql → .../data; production apps should pass an explicit graph_store_url.
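
To see why a guessed URL is fragile, here is a rough sketch of that kind of path rewrite. guess_graph_store_url is a hypothetical helper written for illustration, not the library's default_graph_store_url; it assumes the endpoint path ends in /sparql and fails loudly otherwise.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_graph_store_url(sparql_endpoint: str) -> str:
    """Hypothetical helper: swap a trailing /sparql path segment for /data."""
    parts = urlsplit(sparql_endpoint)
    path = parts.path.rstrip("/")
    if not path.endswith("/sparql"):
        # Endpoints with other service names cannot be guessed safely.
        raise ValueError(f"cannot guess Graph Store URL from {sparql_endpoint!r}")
    new_path = path[: -len("/sparql")] + "/data"
    # Query string and fragment are dropped; they belong to the query service.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


guessed = guess_graph_store_url("http://localhost:3030/myds/sparql")
```

Any endpoint whose services are renamed or proxied under different paths breaks the guess, which is the reason to pass graph_store_url explicitly.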

CI / local integration tests: make fuseki-up (sets ADMIN_PASSWORD=testadmin and FUSEKI_DATASET_1=sparqlmodel_test), export FUSEKI_BASE_URL=http://127.0.0.1:3030 and FUSEKI_ADMIN_PASSWORD=testadmin, then run pytest.

HTTP resilience (0.11+)

Configure on HttpStore / AsyncHttpStore (and via http_store_lifespan / async_http_store_lifespan — kwargs forward to the store constructor):

Parameter	Default	Notes
`max_retries`	`2`	Up to 3 attempts total (`0..max_retries`)
`retry_backoff`	`0.5`	Seconds; exponential backoff per attempt (capped at 30s)
`max_triples_per_update`	`500`	Chunk size for `INSERT DATA` / `DELETE DATA` per HTTP request
`query_method`	`"post"`	`"get"` sends SELECT via query string (SPARQL Protocol)

Retries apply to: remote SELECT (query), CONSTRUCT pull (pull_subjects_into_mirror), Graph Store GET (sync_mirror), and each UPDATE chunk. Retried failures: status codes 502, 503, and 504, plus httpx connection errors and timeouts. 4xx and other errors fail immediately (no retry).
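
The retry policy can be sketched as two small functions. This is an illustration under stated assumptions, not the store's code: the helper names are hypothetical, and the doubling formula is an assumption, since this guide only says the backoff is exponential and capped at 30 seconds.

```python
RETRYABLE_STATUS = frozenset({502, 503, 504})  # status codes listed in this guide
MAX_DELAY_SECONDS = 30.0                       # cap from the parameter table


def should_retry(status_code: int) -> bool:
    """True only for the gateway-style errors that are retried."""
    return status_code in RETRYABLE_STATUS


def backoff_delays(max_retries: int = 2, retry_backoff: float = 0.5) -> list[float]:
    """Delay before each retry; assumes the delay doubles per attempt (assumption)."""
    return [
        min(retry_backoff * (2 ** attempt), MAX_DELAY_SECONDS)
        for attempt in range(max_retries)
    ]


default_delays = backoff_delays()        # two retries after the first attempt
long_delays = backoff_delays(8, 0.5)     # later delays hit the 30-second cap
```

With the defaults there is one initial attempt plus two retries, which matches the "up to 3 attempts total" note in the table.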

Batched UPDATE: update_graph sends all DELETE chunks first, then all INSERT chunks. The local mirror is updated only after every remote chunk succeeds. If chunk k fails after earlier chunks succeeded, the remote dataset may be partially updated; the mirror is not changed — treat as an operator incident and reconcile manually.
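
The ordering described above (all DELETE chunks, then all INSERT chunks) can be sketched as a request plan. The function names are hypothetical and triples are plain tuples here; the sketch only shows chunk sizes and order, not how the store serializes or sends them.

```python
from typing import Iterator, Sequence

Triple = tuple[str, str, str]


def chunked(triples: Sequence[Triple], size: int) -> Iterator[Sequence[Triple]]:
    """Yield consecutive slices of at most `size` triples."""
    for start in range(0, len(triples), size):
        yield triples[start:start + size]


def plan_update(deletes: Sequence[Triple], inserts: Sequence[Triple],
                max_triples_per_update: int = 500) -> list:
    """Return (operation, chunk) pairs: every DELETE chunk precedes every INSERT chunk."""
    plan = [("DELETE DATA", chunk) for chunk in chunked(deletes, max_triples_per_update)]
    plan += [("INSERT DATA", chunk) for chunk in chunked(inserts, max_triples_per_update)]
    return plan


deletes = [(f"urn:s{i}", "urn:p", "urn:o") for i in range(1200)]
inserts = [(f"urn:s{i}", "urn:p", "urn:new") for i in range(10)]
plan = plan_update(deletes, inserts)
```

Each entry in the plan is one HTTP request, so a failure at entry k leaves entries before k applied on the server while the mirror stays untouched.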

UPDATE retries: Retries are safe for idempotent chunks only. If the server applies an UPDATE then returns 503, a retry may send the same INSERT DATA / DELETE DATA again. Use max_retries=0 for sensitive writes or design remote data so duplicate chunks are harmless.

GET SELECT: Useful behind caches or strict read-only proxies. Very long queries can exceed URL length limits on some servers; prefer POST for large SELECT text. CONSTRUCT pull always uses POST. When read_endpoint already includes query parameters (for example Fuseki default-graph-uri), GET merges query= with & rather than adding a second ?.
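
The separator rule for GET can be sketched with the standard library. build_select_get_url is a hypothetical helper for illustration, not the store's internal function; the endpoint and graph URI below are example values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_get_url(read_endpoint: str, sparql: str) -> str:
    """Append query=... with '&' when the endpoint already has a query string."""
    separator = "&" if "?" in read_endpoint else "?"
    return read_endpoint + separator + urlencode({"query": sparql})


endpoint = "http://localhost:3030/myds/sparql?default-graph-uri=http://example.org/g"
url = build_select_get_url(endpoint, "SELECT * WHERE { ?s ?p ?o }")

# Both parameters survive a round trip through the query string.
params = parse_qs(urlsplit(url).query)
```

Because the SPARQL text is percent-encoded, the final URL still contains a single literal "?".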

Compact IRIs: Pass prefixes= on the HTTP store (or use absolute IRIs) so pull_subjects_into_mirror expands schema:Person/1 consistently in CONSTRUCT VALUES and mirror removal.
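
Prefix expansion itself is simple string substitution, sketched below. expand_compact_iri and the prefix map are hypothetical; the sketch treats anything of the form scheme://... as already absolute and rejects unknown prefixes, which is one possible policy rather than the library's documented behaviour.

```python
def expand_compact_iri(value: str, prefixes: dict[str, str]) -> str:
    """Expand prefix:local using a prefix map; pass absolute IRIs through."""
    prefix, sep, local = value.partition(":")
    if sep and local.startswith("//"):
        return value  # already absolute, for example http://...
    if not sep or prefix not in prefixes:
        # Schemes without '//' (such as urn:) would need a registered prefix here.
        raise ValueError(f"unknown prefix in {value!r}")
    return prefixes[prefix] + local


PREFIXES = {"schema": "https://schema.org/"}
expanded = expand_compact_iri("schema:Person/1", PREFIXES)
```

Expanding once, before both the CONSTRUCT and the mirror removal, keeps the two steps working on the same absolute IRI.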

Set max_retries=0 in tests or when you need immediate failure without transport retries.

Mirror modes (0.10+)

Configure on HttpStore / AsyncHttpStore (and via http_store_lifespan(..., mirror_mode=...)):

`mirror_mode`	When `get` / `refresh` pull from remote	Typical use
`writer` (default)	Only when the subject lacks the expected `rdf:type` in the mirror	This app is the primary writer for the endpoint
`remote_authoritative`	Every `get` / `refresh` (CONSTRUCT + replace-on-pull)	Read replicas, admin UIs, or multi-reader apps that must see remote truth

Replace-on-pull applies to explicit pull_subjects_into_mirror in both modes: outgoing mirror triples (subject, ?, ?) for each requested IRI are removed before remote triples are merged. Triples where the IRI appears only as object are not removed by per-subject pull; use sync_mirror() for a full-graph refresh.
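
Replace-on-pull can be modelled on plain sets of triples. This is a sketch of the semantics described above, with hypothetical names and string tuples instead of a real RDF graph.

```python
def replace_on_pull(mirror: set, remote_triples: set, subjects: set) -> set:
    """Drop (subject, ?, ?) for each pulled subject, then merge the remote triples."""
    kept = {triple for triple in mirror if triple[0] not in subjects}
    return kept | remote_triples


P1 = "http://example.org/p/1"
P2 = "http://example.org/p/2"

mirror = {
    (P1, "name", "Old name"),
    (P1, "nick", "stale value"),  # no longer present on the server
    (P2, "knows", P1),            # P1 appears only as object here
}
remote = {(P1, "name", "New name")}

updated = replace_on_pull(mirror, remote, {P1})
```

The stale nick triple disappears, while the triple that mentions P1 only as an object is left alone, which is the gap a full sync_mirror() closes.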

Authority:

API	Source of truth
`query`, `execute`, `Query.count()`	Remote endpoint
`get`, `refresh`, cascade, `session.graph`	Mirror (after any pull for that read)

With writer, a subject already in the mirror may have stale properties until you call pull_subjects_into_mirror, sync_mirror(), or switch to remote_authoritative. Auto-pull on get/refresh runs only when the expected rdf:type is missing from the mirror, not when literals or links are outdated. With remote_authoritative, each get/refresh re-syncs that IRI from remote.
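
The auto-pull rule for the two modes reduces to a small decision function. should_pull is a hypothetical name; the logic mirrors the table and paragraph above.

```python
def should_pull(mirror_mode: str, has_expected_type: bool) -> bool:
    """Decide whether get / refresh pulls the subject from remote first."""
    if mirror_mode == "remote_authoritative":
        return True  # every read re-syncs that IRI
    if mirror_mode == "writer":
        # Only a missing rdf:type triggers a pull; stale literals do not.
        return not has_expected_type
    raise ValueError(f"unknown mirror_mode: {mirror_mode!r}")


stale_but_typed = should_pull("writer", has_expected_type=True)
missing_locally = should_pull("writer", has_expected_type=False)
```

The first case is the one to watch in writer mode: the subject is present and typed, so outdated properties are served from the mirror without a pull.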

Caution: remote_authoritative does not flush pending put(..., flush=False); avoid reading the same IRI with unflushed local writes.

Session per request (FastAPI)

Use one SPARQLSession per HTTP request, the same pattern as a per-request SQLAlchemy session:

from sparqlmodel.fastapi import SessionDep, http_store_lifespan, init_app

# Lifespan registers shared HttpStore on app.state
# Route handlers: def handler(session: SessionDep): ...

Shared store on app.state; new session per request.
close_on_exit=False on shared stores (default via init_app).
Pending put(..., flush=False) is flushed on successful request end; rolled back on error.
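
The flush-on-success and rollback-on-error behaviour can be illustrated with a toy context manager. ToyRequestSession and request_session are hypothetical and use a dict as the store; they are not the FastAPI dependency shipped with SparqlModel.

```python
from contextlib import contextmanager


class ToyRequestSession:
    """Hypothetical session with a pending-write queue."""

    def __init__(self, store: dict):
        self.store = store
        self.pending = []

    def put(self, iri: str, data: dict, flush: bool = True) -> None:
        self.pending.append((iri, data))
        if flush:
            self.flush()

    def flush(self) -> None:
        for iri, data in self.pending:
            self.store[iri] = data
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()


@contextmanager
def request_session(store: dict):
    """One session per request: flush on success, roll back on error."""
    session = ToyRequestSession(store)
    try:
        yield session
    except Exception:
        session.rollback()
        raise
    else:
        session.flush()


shared_store = {}
with request_session(shared_store) as session:
    session.put("urn:a", {"name": "kept"}, flush=False)

try:
    with request_session(shared_store) as session:
        session.put("urn:b", {"name": "discarded"}, flush=False)
        raise RuntimeError("handler failed")
except RuntimeError:
    pass
```

The shared dict plays the role of the store on app.state; each with block stands in for one request.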

Threading: Do not share one SPARQLSession across threads. See SPECS — Session lifecycle.

Asyncio (0.6+): Use AsyncSPARQLSession with AsyncHttpStore in async FastAPI routes. Do not share one async session across concurrent asyncio tasks (same rule as sync: one session per request/task). In-memory graph work (compiler, hydration, cascade) runs synchronously on the event loop thread; only HTTP store I/O is non-blocking. For CPU-heavy batch jobs in a sync codebase, run_in_executor with a sync session remains valid.

Pagination and sorting (0.8+)

session.query(Person).where(...).order_by(Person.name).offset(20).limit(10).all()
total = session.query(Person).where(...).count()

count() hits the store with a COUNT(DISTINCT ?root) query and does not hydrate rows. On HttpStore, it uses the remote endpoint only (no mirror hydration). For list routes, prefer .all() with .offset() / .limit() and a separate .count() with the same .where() filters.
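
For a list route, the offset and page count are plain arithmetic. The helpers below are hypothetical and independent of SparqlModel; they assume 1-based page numbers.

```python
import math


def page_window(page: int, page_size: int) -> tuple[int, int]:
    """Translate a 1-based page number into (offset, limit)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size, page_size


def total_pages(total: int, page_size: int) -> int:
    """Number of pages needed for `total` rows."""
    return math.ceil(total / page_size)


offset, limit = page_window(page=3, page_size=10)
pages = total_pages(total=95, page_size=10)
```

Page 3 with a page size of 10 gives the offset(20).limit(10) window used in the query above, and the total would come from the separate count() call with the same filters.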

Identity map and caching

After put, get(Model, iri, depth=0) returns the same instance when relationships are not materialized on the in-memory object.
expire(Model, iri) clears cache for that resource (and drops a pending put for that IRI).
expunge(model) / expunge_all() detach instances from the session without changing the store.
refresh(model, *, depth=0) reloads from the store; merge(model) reconciles a detached instance with the identity map (no store write).
depth=0 vs depth=1 may cache separate hydrated views.

Validation and quality

Concern	Today	Planned
Write validation	Pydantic on `SPARQLModel` at construct / `put`	SHACL on `put` (1.0, TripleModel) — complements Pydantic
Load validation	Pydantic via hydration (`HydrationError` on type mismatch)	Same; multi-valued fields 1.1
Query logging	None	Structured SPARQL log (1.0)
Bulk import	Repeated `put`	Bulk helpers (1.0)
Async FastAPI routes	Sync `SessionDep` (blocking)	`AsyncSessionDep` + `AsyncHttpStore` (0.6+)

Security

Use HTTPS for remote endpoints; configure bearer_token or auth on HttpStore.
Do not pass user-controlled strings into raw execute() without parameterization patterns supported by your endpoint.
Filter values in the query DSL are serialized via SparqlModel N3 helpers (rdf_n3) for pyoxigraph-compatible SPARQL.
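
If raw execute() cannot be avoided, user input should at least be escaped as a SPARQL string literal before it is embedded. The helper below is a hypothetical sketch of that escaping, following the escape sequences of the SPARQL 1.1 grammar; it is not the library's rdf_n3 helper and covers only plain string literals, not IRIs or typed values.

```python
# Escape sequences allowed inside a SPARQL string literal.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}


def sparql_string_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL literal with special characters escaped."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


user_input = 'x" . ?s ?p ?o'
literal = sparql_string_literal(user_input)
query = f"SELECT ?s WHERE {{ ?s <http://example.org/name> {literal} }}"
```

The embedded quote is escaped, so the input stays inside the literal instead of closing it and injecting extra patterns.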

Monitoring checklist

Log SPARQL execution time and HTTP status from HttpStore (custom middleware until 1.0).
Alert on mirror divergence if you use both execute and get on the same dataset.
Track pending-queue failures after flush() (partial writes possible; see SPECS limitations).