SparqlModel production guide
Operator and architect guide for running SparqlModel in production. Normative API detail: SparqlModel Technical Specification. Feature schedule: SparqlModel Roadmap. Task guides: FastAPI integration, Sessions and stores.
When to use which store
| Store | Use case |
|---|---|
| MemoryStore | Unit tests, local prototypes, single-process tools |
| HttpStore | Remote Fuseki/Jena/compatible SPARQL 1.1 endpoint |
Do not use HttpStore as a shared cache across many writers without a mirror sync strategy. Prefer one writer per endpoint. Since 0.9.1, get can CONSTRUCT-pull individual subjects into the mirror when they are missing locally; since 0.9.2, refresh does the same. Since 0.12.0, sync_mirror() reloads the entire default graph into the mirror via Graph Store HTTP GET.
HttpStore mirror model (0.2)
| Operation | Reads / writes |
|---|---|
| put, delete | Remote + local mirror |
| query, execute | Remote only |
| get | Mirror only |
Symptom: execute returns IRIs that get cannot load, because the data exists on the server but not in the mirror. Mitigation: get (0.9.1+) and refresh (0.9.2+) attempt pull_subjects_into_mirror automatically. You can also call pull_subjects_into_mirror explicitly, write the data with put through the same session/store so the mirror is updated too, or use MemoryStore for single-process apps.
Do not mutate session.graph directly on HttpStore / AsyncHttpStore. session.graph is the local mirror only; calling add or remove on it does not update the remote endpoint. Because query and execute still read the server, the mirror and remote can diverge permanently. Use session.put / delete, or MemoryStore for tests that need direct graph edits.
Shipped (0.9.1): Optional read_endpoint / write_endpoint, pull_subjects_into_mirror, auto-pull on get, pyoxigraph.parse_query_results for SELECT JSON.
Shipped (0.9.2): Auto-pull on refresh; merge partial-field semantics and hydration invalidation.
Shipped (0.10.0): Replace-on-pull (no stale predicates per IRI after pull); mirror_mode on HTTP stores.
Shipped (0.11.0): Retries, batched UPDATE, optional SELECT GET — see HTTP resilience (0.11+).
Shipped (0.12.0): GSP sync_mirror() when graph_store_url is set — see Mirror sync (0.12+).
Mirror sync (0.12+)
When another process or admin UI changes the remote dataset, reconcile the local mirror with sync_mirror() (read-only on the server):
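```python
from sparqlmodel import HttpStore, SPARQLSession

# Person is one of your SparqlModel classes and IRI is the IRI type your
# SparqlModel version exports; neither is defined in this snippet.
store = HttpStore(
    "http://localhost:3030/myds/sparql",
    read_endpoint="http://localhost:3030/myds/sparql",
    write_endpoint="http://localhost:3030/myds/update",
    graph_store_url="http://localhost:3030/myds/data",  # required for sync_mirror()
)

with SPARQLSession(store=store) as session:
    # Reload the whole default graph into the local mirror (GSP GET, read-only on the server).
    store.sync_mirror()
    person = session.get(Person, IRI("http://example.org/p/1"))
```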
Since 0.13.1, sync_mirror() and pull_subjects_into_mirror() bump the store’s mirror_generation, which clears the session identity map and hydration cache on the next get, refresh, or query hydration — you do not need expunge_all() after a mirror sync.
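The invalidation rule can be pictured with the toy model below. It is a hypothetical sketch, not SparqlModel code: ToyStore, ToySession and _check_generation are invented names, and plain dicts stand in for the RDF mirror and the identity map. Only the behaviour comes from the paragraph above: a sync bumps mirror_generation, and the next read clears cached instances.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for an HTTP store with a local mirror."""
    mirror: dict = field(default_factory=dict)  # iri -> properties
    mirror_generation: int = 0

    def sync_mirror(self, remote: dict) -> None:
        # Replace the whole mirror, then bump the generation counter.
        self.mirror = {iri: dict(props) for iri, props in remote.items()}
        self.mirror_generation += 1


@dataclass
class ToySession:
    store: ToyStore
    identity_map: dict = field(default_factory=dict)
    _seen_generation: int = 0

    def _check_generation(self) -> None:
        # Drop cached instances if the mirror changed since this session last looked.
        if self._seen_generation != self.store.mirror_generation:
            self.identity_map.clear()
            self._seen_generation = self.store.mirror_generation

    def get(self, iri: str) -> dict:
        self._check_generation()
        if iri not in self.identity_map:
            self.identity_map[iri] = dict(self.store.mirror[iri])
        return self.identity_map[iri]


store = ToyStore(mirror={"ex:p1": {"name": "Ada"}})
session = ToySession(store)
first = session.get("ex:p1")
store.sync_mirror({"ex:p1": {"name": "Ada Lovelace"}})
second = session.get("ex:p1")
print(first["name"], "->", second["name"])  # Ada -> Ada Lovelace
```

A counter means the store never has to track which sessions exist; each session compares one integer on its next read.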
| Mechanism | Scope | When to use |
|---|---|---|
| pull_subjects_into_mirror | Listed subjects (CONSTRUCT) | Targeted refresh after known IRIs changed |
| remote_authoritative mirror mode | Per get / refresh (one IRI) | Every read must match remote for that IRI |
| sync_mirror() | Entire default graph (GSP GET) | Bulk external load, admin UI edits, startup warm-cache |
Fuseki URLs (typical):
| Service | Path |
|---|---|
| SPARQL query | /{dataset}/sparql |
| SPARQL update | /{dataset}/update |
| Graph Store HTTP | /{dataset}/data |
http_common.default_graph_store_url(sparql_endpoint) guesses .../sparql → .../data; production apps should pass an explicit graph_store_url.
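The guess amounts to a path rewrite. The function below is an assumed equivalent written for illustration: guess_graph_store_url is a hypothetical name, and its handling of query strings and of paths that do not end in /sparql is a choice made here, not documented behaviour of http_common.default_graph_store_url.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_graph_store_url(sparql_endpoint: str) -> str:
    """Hypothetical: rewrite a trailing /sparql path segment to /data."""
    parts = urlsplit(sparql_endpoint)
    path = parts.path.rstrip("/")
    if not path.endswith("/sparql"):
        raise ValueError(f"cannot guess a Graph Store URL from {sparql_endpoint!r}")
    new_path = path[: -len("/sparql")] + "/data"
    # Query string and fragment are dropped: they belong to the SPARQL endpoint.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(guess_graph_store_url("http://localhost:3030/myds/sparql"))
# http://localhost:3030/myds/data
```

Raising on anything that does not end in /sparql follows the advice above: when the server layout is unusual, pass graph_store_url explicitly.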
CI / local integration tests: make fuseki-up (sets ADMIN_PASSWORD=testadmin and FUSEKI_DATASET_1=sparqlmodel_test), export FUSEKI_BASE_URL=http://127.0.0.1:3030 and FUSEKI_ADMIN_PASSWORD=testadmin, then run pytest.
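Integration tests can be made opt-in by reading the same environment variables and skipping when they are missing. The helper below is a hypothetical sketch (fuseki_settings is not part of SparqlModel); the dataset name matches FUSEKI_DATASET_1 above, and the /sparql, /update and /data paths assume the typical Fuseki layout.

```python
import os
from typing import Optional


def fuseki_settings(env=os.environ) -> Optional[dict]:
    """Hypothetical helper: Fuseki connection settings, or None when tests should skip."""
    base = env.get("FUSEKI_BASE_URL")
    password = env.get("FUSEKI_ADMIN_PASSWORD")
    if not base or not password:
        return None
    base = base.rstrip("/")
    dataset = "sparqlmodel_test"  # FUSEKI_DATASET_1 as set by `make fuseki-up`
    return {
        "query": f"{base}/{dataset}/sparql",
        "update": f"{base}/{dataset}/update",
        "data": f"{base}/{dataset}/data",
        "admin_password": password,
    }
```

In a pytest fixture, call it and use pytest.skip(...) when it returns None.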
HTTP resilience (0.11+)
Configure on HttpStore / AsyncHttpStore (and via http_store_lifespan / async_http_store_lifespan — kwargs forward to the store constructor):
| Parameter | Default | Notes |
|---|---|---|
| max_retries |  | Up to 3 attempts total |
|  |  | Seconds; exponential backoff per attempt (capped at 30s) |
|  |  | Chunk size for batched UPDATE |
|  |  | Optional GET for SELECT (see GET SELECT below) |
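Read literally, the backoff row describes a delay that grows per attempt and is clamped at 30 seconds. The sketch below assumes a doubling factor and takes the base delay as a parameter because the shipped default is not reproduced in the table; backoff_delay is a hypothetical name, and whether SparqlModel adds jitter is not stated here.

```python
def backoff_delay(attempt: int, base: float, cap: float = 30.0) -> float:
    """Hypothetical delay before retry number `attempt` (1 = first retry)."""
    if attempt < 1:
        raise ValueError("attempt starts at 1")
    # Assumed doubling per attempt, clamped to the documented 30 s ceiling.
    return min(cap, base * 2 ** (attempt - 1))


print([backoff_delay(n, base=0.5) for n in range(1, 4)])  # [0.5, 1.0, 2.0]
```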
Retries apply to remote SELECT (query), CONSTRUCT pull (pull_subjects_into_mirror), Graph Store GET (sync_mirror), and each UPDATE chunk. Retried conditions: HTTP 502, 503 and 504, plus httpx connection errors and timeouts. 4xx and other errors fail immediately (no retry).
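The retry rules reduce to a small classifier. This is a hypothetical sketch: should_retry is not a SparqlModel function, and the built-in ConnectionError and TimeoutError stand in for httpx's transport exceptions (for example httpx.ConnectError and httpx.TimeoutException) so the example runs without httpx.

```python
from typing import Optional

RETRYABLE_STATUS = frozenset({502, 503, 504})


def should_retry(status: Optional[int] = None, exc: Optional[BaseException] = None) -> bool:
    """Hypothetical classifier for the rules above.

    Built-in ConnectionError / TimeoutError stand in for httpx transport errors.
    """
    if exc is not None:
        return isinstance(exc, (ConnectionError, TimeoutError))
    # 502/503/504 retry; every other status (including 4xx and 500) fails fast.
    return status in RETRYABLE_STATUS
```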
Batched UPDATE: update_graph sends all DELETE chunks first, then all INSERT chunks. The local mirror is updated only after every remote chunk succeeds. If chunk k fails after earlier chunks succeeded, the remote dataset may be partially updated; the mirror is not changed — treat as an operator incident and reconcile manually.
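The ordering and commit rule can be modelled with Python sets of triples standing in for the remote dataset and the mirror. Everything here is hypothetical (chunked, apply_update and the send callback are invented names); it shows the documented sequence of DELETE chunks, then INSERT chunks, then the mirror, not the actual update_graph code.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def chunked(triples: Iterable[Triple], size: int) -> List[List[Triple]]:
    """Split triples into fixed-size chunks (sorted for a stable order)."""
    items = sorted(triples)
    return [items[i : i + size] for i in range(0, len(items), size)]


def apply_update(
    send: Callable[[str, List[Triple]], None],
    mirror: Set[Triple],
    deletes: Set[Triple],
    inserts: Set[Triple],
    chunk_size: int,
) -> None:
    """Hypothetical: all DELETE chunks, then all INSERT chunks, then the mirror."""
    for chunk in chunked(deletes, chunk_size):
        send("DELETE DATA", chunk)  # an exception here leaves the mirror untouched
    for chunk in chunked(inserts, chunk_size):
        send("INSERT DATA", chunk)
    # Reached only when every remote chunk succeeded.
    mirror.difference_update(deletes)
    mirror.update(inserts)


sent: List[Tuple[str, int]] = []
mirror = {("ex:a", "ex:name", '"old"')}
apply_update(
    lambda op, chunk: sent.append((op, len(chunk))),
    mirror,
    deletes={("ex:a", "ex:name", '"old"')},
    inserts={("ex:a", "ex:name", '"new"'), ("ex:b", "ex:name", '"b"')},
    chunk_size=1,
)
print(sent)  # [('DELETE DATA', 1), ('INSERT DATA', 1), ('INSERT DATA', 1)]
```

If send raised on the second call in this sketch, the DELETE would already have reached the server while the mirror still held the old triple: the partial-update case described above.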
UPDATE retries: Retries are safe for idempotent chunks only. If the server applies an UPDATE then returns 503, a retry may send the same INSERT DATA / DELETE DATA again. Use max_retries=0 for sensitive writes or design remote data so duplicate chunks are harmless.
GET SELECT: Useful behind caches or strict read-only proxies. Very long queries can exceed URL length limits on some servers; prefer POST for large SELECT text. CONSTRUCT pull always uses POST. When read_endpoint already includes query parameters (for example Fuseki default-graph-uri), GET merges query= with & rather than adding a second ?.
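The separator rule for GET SELECT can be sketched with the standard library. select_get_url is a hypothetical helper, not the store's code; it only shows why an endpoint that already carries default-graph-uri needs & rather than a second ?.

```python
from urllib.parse import urlencode


def select_get_url(read_endpoint: str, sparql: str) -> str:
    """Hypothetical: attach query= with '&' if the endpoint already has a query string."""
    separator = "&" if "?" in read_endpoint else "?"
    return f"{read_endpoint}{separator}{urlencode({'query': sparql})}"


print(select_get_url("http://localhost:3030/myds/sparql", "ASK {}"))
# http://localhost:3030/myds/sparql?query=ASK+%7B%7D
print(select_get_url("http://localhost:3030/myds/sparql?default-graph-uri=urn%3Ax", "ASK {}"))
# http://localhost:3030/myds/sparql?default-graph-uri=urn%3Ax&query=ASK+%7B%7D
```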
Compact IRIs: Pass prefixes= on the HTTP store (or use absolute IRIs) so pull_subjects_into_mirror expands schema:Person/1 consistently in CONSTRUCT VALUES and mirror removal.
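The point of prefixes= is that the same absolute IRI is used both in the CONSTRUCT VALUES block and when removing mirror triples. The sketch below assumes a prefix map shaped like {"schema": "https://schema.org/"}; expand_curie and values_clause are hypothetical helpers, not the store's API.

```python
from typing import Dict, Iterable


def expand_curie(value: str, prefixes: Dict[str, str]) -> str:
    """Hypothetical: turn prefix:local into an absolute IRI; absolute IRIs pass through."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    if "://" in value or value.startswith("urn:"):
        return value
    raise ValueError(f"cannot expand {value!r}: unknown prefix")


def values_clause(subjects: Iterable[str], prefixes: Dict[str, str]) -> str:
    """Build a VALUES block from expanded IRIs (the same strings used for mirror removal)."""
    iris = " ".join(f"<{expand_curie(s, prefixes)}>" for s in subjects)
    return f"VALUES ?s {{ {iris} }}"


PREFIXES = {"schema": "https://schema.org/"}
print(values_clause(["schema:Person/1", "http://example.org/p/2"], PREFIXES))
# VALUES ?s { <https://schema.org/Person/1> <http://example.org/p/2> }
```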
Set max_retries=0 in tests or when you need immediate failure without transport retries.
Mirror modes (0.10+)
Configure on HttpStore / AsyncHttpStore (and via http_store_lifespan(..., mirror_mode=...)):
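| mirror_mode | When get / refresh pulls from remote | Typical use |
|---|---|---|
| writer | Only when the subject lacks the expected rdf:type in the mirror | This app is the primary writer for the endpoint |
| remote_authoritative | Every get / refresh | Read replicas, admin UIs, or multi-reader apps that must see remote truth |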
Replace-on-pull applies to explicit pull_subjects_into_mirror in both modes: outgoing mirror triples (subject, ?, ?) for each requested IRI are removed before remote triples are merged. Triples where the IRI appears only as object are not removed by per-subject pull; use sync_mirror() for a full-graph refresh.
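Replace-on-pull can be expressed as two set operations on the mirror. In this hypothetical sketch, tuples of strings stand in for RDF terms, replace_on_pull is an invented name, and remote is assumed to hold what the CONSTRUCT returned.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def replace_on_pull(mirror: Set[Triple], subjects: Iterable[str], remote: Set[Triple]) -> None:
    """Hypothetical: drop outgoing mirror triples per subject, then merge remote ones."""
    wanted = set(subjects)
    # Remove (subject, ?, ?) for every requested IRI; object-position triples stay.
    mirror.difference_update({t for t in mirror if t[0] in wanted})
    # Merge what the CONSTRUCT returned for those subjects.
    mirror.update(t for t in remote if t[0] in wanted)


mirror = {
    ("ex:p1", "ex:name", '"Old"'),
    ("ex:p1", "ex:nick", '"Stale"'),  # gone from the server: removed by the pull
    ("ex:p2", "ex:knows", "ex:p1"),   # ex:p1 only as object: kept
}
remote = {("ex:p1", "ex:name", '"New"')}
replace_on_pull(mirror, ["ex:p1"], remote)
print(sorted(mirror))
```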
Authority:
| API | Source of truth |
|---|---|
| query, execute | Remote endpoint |
| get, refresh | Mirror (after any pull for that read) |
With writer, a subject already in the mirror may have stale properties until you call pull_subjects_into_mirror, sync_mirror(), or switch to remote_authoritative. Auto-pull on get/refresh runs only when the expected rdf:type is missing from the mirror, not when literals or links are outdated. With remote_authoritative, each get/refresh re-syncs that IRI from remote.
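The difference between the two modes comes down to when get / refresh decide to pull. The function below is a hypothetical sketch: needs_pull is not a SparqlModel function, the mode names are written as plain strings as an assumption, and the mirror is a set of string triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def needs_pull(mode: str, mirror: set, iri: str, expected_type: str) -> bool:
    """Hypothetical: should get/refresh pull `iri` from the remote first?"""
    if mode == "remote_authoritative":
        return True  # every get/refresh re-syncs that IRI
    if mode == "writer":
        # Only a missing rdf:type triggers a pull; stale literals are not detected.
        return (iri, RDF_TYPE, expected_type) not in mirror
    raise ValueError(f"unknown mirror mode {mode!r}")


mirror = {
    ("ex:p1", RDF_TYPE, "ex:Person"),
    ("ex:p1", "ex:name", '"Possibly stale"'),
}
print(needs_pull("writer", mirror, "ex:p1", "ex:Person"))                # False
print(needs_pull("writer", mirror, "ex:p2", "ex:Person"))                # True
print(needs_pull("remote_authoritative", mirror, "ex:p1", "ex:Person"))  # True
```

In writer mode ex:p1 is never re-pulled here even if its name changed remotely, which is the staleness the paragraph above warns about.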
Caution: remote_authoritative does not flush pending put(..., flush=False); avoid reading the same IRI with unflushed local writes.
Session per request (FastAPI)
Use one SPARQLSession per HTTP request — same pattern as SQLAlchemy:
```python
from sparqlmodel.fastapi import SessionDep, http_store_lifespan, init_app

# Lifespan registers a shared HttpStore on app.state.
# Route handlers: def handler(session: SessionDep): ...
```
Shared store on app.state; new session per request.
close_on_exit=False on shared stores (default via init_app).
Pending put(..., flush=False) is flushed on successful request end and rolled back on error.
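The request-scoped behaviour resembles a context manager that flushes on success and discards pending writes on error. This is a hypothetical sketch (ToySession and request_session are invented, and a list stands in for the shared store); the real SessionDep wiring in sparqlmodel.fastapi may differ in detail.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToySession:
    """Hypothetical session holding deferred writes, like put(..., flush=False)."""

    def __init__(self, store: List[str]) -> None:
        self.store = store
        self.pending: List[str] = []

    def put(self, iri: str) -> None:
        self.pending.append(iri)

    def flush(self) -> None:
        self.store.extend(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()


@contextmanager
def request_session(store: List[str]) -> Iterator[ToySession]:
    session = ToySession(store)  # new session per request, shared store (not closed)
    try:
        yield session
        session.flush()  # successful request end
    except Exception:
        session.rollback()  # error: pending writes are dropped
        raise


shared_store: List[str] = []
with request_session(shared_store) as s:
    s.put("ex:p1")
try:
    with request_session(shared_store) as s:
        s.put("ex:p2")
        raise RuntimeError("handler failed")
except RuntimeError:
    pass
print(shared_store)  # ['ex:p1']
```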
Threading: Do not share one SPARQLSession across threads. See SPECS — Session lifecycle.
Asyncio (0.6+): Use AsyncSPARQLSession with AsyncHttpStore in async FastAPI routes. Do not share one async session across concurrent asyncio tasks (same rule as sync: one session per request/task). In-memory graph work (compiler, hydration, cascade) runs synchronously on the event loop thread; only HTTP store I/O is non-blocking. For CPU-heavy batch jobs in a sync codebase, run_in_executor with a sync session remains valid.
Pagination and sorting (0.8+)
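```python
# .where(...) stands for your filter expressions; reuse the same ones for count().
page = session.query(Person).where(...).order_by(Person.name).offset(20).limit(10).all()
total = session.query(Person).where(...).count()
```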
count() hits the store with a COUNT(DISTINCT ?root) query and does not hydrate rows. On HttpStore, it uses the remote endpoint only (no mirror hydration). For list routes, prefer .all() with .offset() / .limit() and a separate .count() with the same .where() filters.
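Keeping .where() identical matters because the page and the total are separate queries over the same pattern. The SPARQL below is only an assumed shape, not what SparqlModel's compiler emits; page_query, count_query, offset_for and the WHERE pattern are hypothetical.

```python
def offset_for(page: int, page_size: int) -> int:
    """1-based page number to OFFSET (page 3 with 10 rows per page -> 20)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size start at 1")
    return (page - 1) * page_size


def page_query(where: str, order_var: str, offset: int, limit: int) -> str:
    """Hypothetical shape of the page query: root IRIs only, sorted and sliced."""
    return (
        f"SELECT DISTINCT ?root WHERE {{ {where} }} "
        f"ORDER BY ?{order_var} OFFSET {offset} LIMIT {limit}"
    )


def count_query(where: str) -> str:
    """Hypothetical shape of count(): same pattern, no hydration."""
    return f"SELECT (COUNT(DISTINCT ?root) AS ?n) WHERE {{ {where} }}"


WHERE = "?root a <https://schema.org/Person> ; <https://schema.org/name> ?name ."
print(page_query(WHERE, "name", offset=offset_for(3, 10), limit=10))
print(count_query(WHERE))
```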
Identity map and caching
After put, get(Model, iri, depth=0) returns the same instance when relationships are not materialized on the in-memory object.
expire(Model, iri) clears the cache for that resource (and drops a pending put for that IRI).
expunge(model) / expunge_all() detach instances from the session without changing the store.
refresh(model, *, depth=0) reloads from the store; merge(model) reconciles a detached instance with the identity map (no store write).
depth=0 vs depth=1 may cache separate hydrated views.
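The identity-map rules can be pictured as a dictionary keyed by model, IRI and depth. This is a hypothetical sketch (IdentityMap and Person are invented here); it models separate depth=0 / depth=1 views and expire(), but not the pending-put bookkeeping that expire() also performs.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[type, str, int]  # (model class, IRI, depth)


class IdentityMap:
    """Hypothetical identity map: one cached view per (model, iri, depth)."""

    def __init__(self) -> None:
        self._views: Dict[Key, Any] = {}

    def get(self, model: type, iri: str, depth: int) -> Optional[Any]:
        return self._views.get((model, iri, depth))

    def add(self, model: type, iri: str, depth: int, obj: Any) -> None:
        self._views[(model, iri, depth)] = obj

    def expire(self, model: type, iri: str) -> None:
        # Drop every depth's view of this resource.
        for key in [k for k in self._views if k[0] is model and k[1] == iri]:
            del self._views[key]

    def expunge_all(self) -> None:
        self._views.clear()


class Person:  # hypothetical model class
    pass


imap = IdentityMap()
shallow, deep = Person(), Person()
imap.add(Person, "ex:p1", 0, shallow)
imap.add(Person, "ex:p1", 1, deep)
same = imap.get(Person, "ex:p1", 0) is shallow   # depth=0 view reused
separate = imap.get(Person, "ex:p1", 1) is deep  # depth=1 cached separately
imap.expire(Person, "ex:p1")
```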
Validation and quality
| Concern | Today | Planned |
|---|---|---|
| Write validation | Pydantic on | SHACL on |
| Load validation | Pydantic via hydration | Same; multi-valued fields 1.1 |
| Query logging | None | Structured SPARQL log (1.0) |
| Bulk import | Repeated | Bulk helpers (1.0) |
| Async FastAPI routes | Sync |  |
Security
Use HTTPS for remote endpoints; configure bearer_token or auth on HttpStore.
Do not pass user-controlled strings into raw execute() without parameterization patterns supported by your endpoint.
Filter values in the query DSL are serialized via SparqlModel N3 helpers (rdf_n3) for pyoxigraph-compatible SPARQL.
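When a raw execute() string is unavoidable, a value must at least be escaped as a SPARQL 1.1 string literal (the double-quoted STRING_LITERAL2 form with ECHAR escapes). The helper below is a hedged illustration, not rdf_n3; prefer the query DSL or your endpoint's parameterization. sparql_string_literal is a hypothetical name.

```python
import re

# ECHAR escapes from the SPARQL 1.1 grammar; ", \, LF and CR must be escaped.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
_LANGTAG = re.compile(r"[A-Za-z]+(-[A-Za-z0-9]+)*")


def sparql_string_literal(value: str, lang: str = "") -> str:
    """Hypothetical helper: render `value` as a double-quoted SPARQL literal."""
    body = "".join(_ESCAPES.get(ch, ch) for ch in value)
    if lang and not _LANGTAG.fullmatch(lang):
        raise ValueError(f"invalid language tag {lang!r}")
    return f'"{body}"' + (f"@{lang}" if lang else "")


user_input = 'x" } ; DROP ALL ; #'
# The embedded quote is escaped, so the input stays inside one literal.
print(f"SELECT ?s WHERE {{ ?s <https://schema.org/name> {sparql_string_literal(user_input)} }}")
```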
Monitoring checklist
Log SPARQL execution time and HTTP status from HttpStore (custom middleware until 1.0).
Alert on mirror divergence if you use both execute and get on the same dataset.
Track pending-queue failures after flush() (partial writes possible; see SPECS limitations).
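Until structured logging ships, a thin wrapper around whatever performs the HTTP call can record status and latency, and a set difference can flag IRIs that execute returns but the mirror lacks. Both helpers are hypothetical (timed_call, mirror_gaps) and store-agnostic; they do not hook into HttpStore internals, and the (status, result) return convention is an assumption.

```python
import logging
import time
from typing import Callable, Iterable, Set, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger("sparql.timing")


def timed_call(label: str, fn: Callable[[], Tuple[int, T]]) -> T:
    """Run fn (returning (http_status, result)); log status and elapsed ms either way."""
    start = time.perf_counter()
    status = None
    try:
        status, result = fn()
        return result
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info("%s status=%s elapsed_ms=%.1f", label, status, elapsed_ms)


def mirror_gaps(remote_iris: Iterable[str], mirror_subjects: Iterable[str]) -> Set[str]:
    """IRIs seen remotely (e.g. via execute) that the mirror does not hold."""
    return set(remote_iris) - set(mirror_subjects)


logging.basicConfig(level=logging.INFO)
rows = timed_call("select-people", lambda: (200, ["ex:p1", "ex:p2"]))
print(mirror_gaps(rows, ["ex:p1"]))  # {'ex:p2'}
```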
Further reading
ORM.md — developer guide