SparqlModel Technical Specification

Overview

This document specifies the SparqlModel ORM layer: session API, query compilation, hydration, cascade policy, and stores.

SparqlModel — the SQLModel of SPARQL.

Mapping (literals, terms, to_graph, sync_to_graph, from_graph, parse, serialize) is specified and implemented by TripleModel >=0.10, a required dependency (Pydantic TripleModel classes). SparqlModel integrates TripleModel internally; application code uses SPARQLSession and Pydantic v2 SPARQLModel unless doing stateless file I/O.

| Concern | SparqlModel | TripleModel |
| --- | --- | --- |
| SPARQLSession CRUD | Yes | No |
| Query DSL + compiler | Yes | No |
| Cascade / orphans on put | Yes | No |
| Hydration depth | Yes | No |
| Stores | Yes | No |
| Model ↔ triples, terms, files | SPARQLModel(TripleModel) (0.4+); thin serializers wrappers (0.7) | Yes |

ORM.md · ECOSYSTEM.md · ROADMAP.md · PRODUCTION.md


Production ORM checklist (1.3 GA gate)

Normative checklist for declaring SparqlModel production-ready (version 0.15). See ROADMAP.md — Forward roadmap and SPARQLMojo parity backlog for milestone versions.

Parity tiers: P0 = required for production HTTP/API apps; P1 = SQLModel / SPARQLMojo parity; P2 = advanced RDF / ecosystem.

P0 — Production APIs

  • SPARQLModel, Field, Relationship, IRI, Pydantic validation

  • SPARQLSession with add, put, delete, get, query, execute, context manager

  • Query filters: ==, !=, &, |, ordering, in_, multi-hop paths; (A & B) | C precedence (0.2+)

  • limit, first, all with hydration depth 0–2

  • Identity map + flush / rollback_pending / put(..., flush=False)

  • MemoryStore and HttpStore (documented mirror semantics)

  • FastAPI SessionDep, init_app, http_store_lifespan

  • Session I/O via TripleModel (put / get / hydrate) — 0.3.0

  • Option A: SPARQLModel(TripleModel); _triple.py removed; rdf_bridge + direct from_graph (0.4.0)

  • AsyncSPARQLSession with async CRUD, async with, async def execute (0.6.0)

  • AsyncStoreProtocol + AsyncHttpStore (httpx.AsyncClient) + AsyncMemoryStore (0.6.0)

  • AsyncQuery with async def all() / first(); same expression DSL as sync (0.6.0)

  • FastAPI AsyncSessionDep + async_http_store_lifespan (0.6.0)

  • Async/sync parity contract tests on memory and HTTP stores — 0.6.0

  • Query.offset(n) (0.8.0)

  • Query.order_by(...) (0.8.0)

  • Query.count() (0.8.0)

  • OPTIONAL / absence filters for nullable Relationship | None (0.8.0)

  • HttpStore partial mirror sync: pull_subjects_into_mirror, auto-pull on get (0.9.1); on refresh (0.9.2)

  • HttpStore replace-on-pull + mirror_mode (writer / remote_authoritative) — 0.10.0

  • HttpStore retries, batched UPDATE, SELECT query_method GET/POST — 0.11.0

  • HttpStore full mirror sync (GSP sync_mirror) — 0.12.0

  • Scoped session pattern documented (FastAPI + scripts) — 0.9.0

  • Threading / asyncio concurrency model documented — 0.6 (async) + 0.9 (threads)

P1 — SQLModel / SPARQLMojo parity

See ROADMAP — SPARQLMojo parity backlog for the full catch-up list.

  • merge, refresh, expunge, expunge_all on session — 0.9.0

  • HttpStore separate read/write endpoint URLs — 0.9.1

  • Multi-valued scalar and relationship fields (set[...] / list[...] where TripleModel allows) — 0.13.0

  • Language-tagged literals (LangString, MultiLangString) — 0.13.0

  • Polymorphic queries (Query.polymorphic(), Rdf.ontology_registry) — 0.13.0

  • Property paths (inverse ^, +/*, property_eq escape hatch) — 0.13.0

  • Inverse / back_populates / Relationship(..., inverse=) (0.13.0)

  • SchemaRegistry (OntologyRegistry alias, lite hints) — 0.13.0

  • IRI field string filters (FieldRef.str() / lower() / upper()) — 0.13.0

  • VALUES clause in query DSL (Query.values(...)) — 0.13.0

  • Query negation (not_() / ~expr) — 0.13.0

  • Filters on collection fields (.in_() on set/list refs and scalars) — 0.13.0

  • HttpStore query_method GET vs POST — 0.11.0

  • Optional SHACL validation on put (0.14)

P2 — Advanced

  • session.ask(...) or Query.exists() helper wrapping ASK — 0.14+

  • CONSTRUCT / DESCRIBE helpers — 0.14+

  • Named graph scope on session/store — 0.14+

  • Oxigraph or additional store backends — 0.14+

  • SPARQL federation in query layer — future

Explicit non-goals: OWL editor, built-in reasoner, duplicate TripleModel mapping in graph.py.


SPARQLSession

ORM entry point. Binds a Store (default MemoryStore) and namespace registry.

with SPARQLSession() as session:
    session.put(person)
    found = session.query(Person).where(Person.name == "Odos").first()

Methods

| Method | Behavior |
| --- | --- |
| add(model) | Append triples; no removal of existing subject triples |
| put(model, *, flush=True) | Remove owned subjects (cascade), then write; queue when flush=False |
| delete(model) | Remove owned triples for root + embedded composition |
| get(model_cls, iri, *, depth=0) | Load one resource; optional relationship depth 0–2 |
| query(model_cls) | Return Query builder |
| execute(sparql) | Raw SELECT; auto-prefixes when configured |
| flush() / rollback_pending() | Apply or discard pending put queue |
| close() | Call store.close() when available |
| expire(model_cls, iri) | Evict identity and hydration cache for an IRI (0.9 also drops pending put for that subject) |
| expunge(model) | Detach one instance from session cache; store unchanged (0.9) |
| expunge_all() | Clear identity map and hydration cache; pending queue unchanged (0.9) |
| refresh(model, *, depth=0) | Reload from store into cached instance when present (0.9) |
| merge(model) | Return canonical session instance for identity key; no store write (0.9) |

Context manager

On clean exit: flush() if the pending queue is non-empty. On exception: rollback_pending() when rollback_on_error=True (default). Always calls close() when close_on_exit=True (default). Does not undo already-flushed writes.
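
The exit rules above can be sketched with a toy session. This is not SparqlModel's implementation: `ToySession` and its list-backed store are hypothetical stand-ins, and only the flag names `rollback_on_error` and `close_on_exit` are taken from the text.

```python
class ToySession:
    """Toy stand-in that mimics the documented exit rules (not the real class)."""

    def __init__(self, *, rollback_on_error=True, close_on_exit=True):
        self.rollback_on_error = rollback_on_error
        self.close_on_exit = close_on_exit
        self.pending = []   # queued writes, as with put(..., flush=False)
        self.store = []     # writes that have already been flushed
        self.closed = False

    def put(self, item, *, flush=True):
        self.pending.append(item)
        if flush:
            self.flush()

    def flush(self):
        self.store.extend(self.pending)
        self.pending.clear()

    def rollback_pending(self):
        self.pending.clear()

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:
                if self.pending:
                    self.flush()          # clean exit: apply the queue
            elif self.rollback_on_error:
                self.rollback_pending()   # error: discard the queue only
        finally:
            if self.close_on_exit:
                self.close()
        return False  # never swallow the exception


def demo():
    ok = ToySession()
    with ok:
        ok.put("a", flush=False)

    failed = ToySession()
    try:
        with failed:
            failed.put("b")               # flushed immediately
            failed.put("c", flush=False)  # queued, then discarded
            raise ValueError("boom")
    except ValueError:
        pass
    return ok.store, failed.store, failed.pending, failed.closed
```

In the failing case the already-flushed write `"b"` stays in the store, which mirrors the statement that the context manager does not undo flushed writes.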

Properties

  • store — backing store

  • graph is a triplemodel.Store (MemoryStore graph, or HttpStore local mirror, not the remote dataset)

  • namespaces is the NamespaceRegistry for compiler and serialization

Session lifecycle (target API)

Current (0.9): Context manager flushes pending put queue on success; rollback_pending on error; expire(model_cls, iri) evicts identity and hydration cache; merge, refresh, expunge, and expunge_all for explicit cache control (sync and async). Not thread-safe.

Target (1.2):

| Method | Behavior |
| --- | --- |
| merge(model) | Attach detached/transient instance to session; reconcile with identity map |
| refresh(model, *, depth=0) | Reload from store; replace cached attributes |
| expunge(model) | Remove one instance from identity map |
| expunge_all() | Clear identity map and hydration cache |
| scoped_session(...) | Factory for request-scoped sessions (FastAPI pattern) |

Object states (SQLAlchemy-aligned):

transient → (add|put) → pending (flush=False) → persistent (in store + identity map)
persistent → delete → (removed from store; expunge clears session)
persistent → expunge → detached (no session; may merge again)

Threading: One SPARQLSession per task/request unless documented otherwise; shared HttpStore requires external synchronization or single-writer discipline.


Query builder

with SPARQLSession() as session:
    session.query(Person).where(Person.name == "Odos").all()
    session.query(Person).where(Person.works_for.name == "Acme").limit(10).all()

  • .where(*expr) accepts CompareExpr, AndExpr, or top-level OrExpr

  • .limit(n) — non-negative integer

  • .offset(n) — non-negative integer (0.8)

  • .order_by(field, *, desc=False) — scalar field only; repeatable (0.8)

  • .count() — returns int; ignores limit/offset/order_by (0.8)

  • .first() — always uses LIMIT 1; ignores any prior .limit() or .offset() on the same query

  • .use_not_exists_for_ne() — compile != with NOT EXISTS (default since 0.5.2)

  • .use_inequality_for_ne() — legacy inequality != (pre-0.5.2 default)

  • .all(*, depth=0) / .first(*, depth=0) — execute and hydrate

Query builder (target API)

Current (0.8): .offset(n), .order_by(field, *, desc=False), .count() (ignores limit/offset/order_by). .first() always LIMIT 1 and ignores .limit() / .offset(). Nullable relationship hops use OPTIONAL; relationship.is_(None) / is_not(None) for absence/presence. No distinct or field projection.

Target (post-1.3):

| Method | SPARQL |
| --- | --- |
| .distinct() | DISTINCT projection (if supported) |

Precedence: Python & binds tighter than |; (A & B) | C is two disjuncts (fixed 0.2).
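
The precedence rule follows from Python's own operator table, which a query DSL built on `__and__` / `__or__` inherits. A minimal sketch, assuming hypothetical `Cmp`, `And`, and `Or` node classes (the real `CompareExpr`, `AndExpr`, and `OrExpr` may be shaped differently):

```python
from dataclasses import dataclass


class Expr:
    """Base class: & and | build tree nodes instead of evaluating."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)


@dataclass(frozen=True)
class Cmp(Expr):
    name: str


@dataclass(frozen=True)
class And(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Or(Expr):
    left: Expr
    right: Expr


A, B, C = Cmp("A"), Cmp("B"), Cmp("C")

unparenthesized = A & B | C      # Python groups this as (A & B) | C
parenthesized = (A & B) | C
regrouped = A & (B | C)          # explicit parentheses change the tree
```

Note that `==` binds more loosely than `&` and `|` in Python, so each comparison in a combined filter needs its own parentheses, for example `(Person.name == "Odos") & (Person.works_for.name == "Acme")`.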


SPARQL compilation

Person.name == "Odos" → SPARQL triple patterns bound to ?person.

| Operator | Semantics |
| --- | --- |
| == | Pattern match |
| != | NOT EXISTS by default (or Query.use_inequality_for_ne() for legacy inequality) |
| & | Conjoin patterns (AndExpr or multiple .where) |
| \| | Disjunction via FILTER + EXISTS branches (OrExpr) |
| <, >, <=, >= | Ordering on bound literal variables |
| .in_(tuple) / .in_(list) | FILTER(?var IN (...)); bare str raises QueryError (use ("value",) or ["value"]) |
| None | Raises QueryError |

Nested attribute paths (Person.works_for.located_in.name) support arbitrary hop length via join variables and related-type patterns.
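
To make the `==` and `!=` rows concrete, here is a minimal sketch of how a single-hop filter could be rendered as SPARQL text. The function names and the exact output layout are assumptions for illustration; the real compiler.py output is not shown in this document.

```python
def quote(value: str) -> str:
    """Wrap a string as a SPARQL literal (simplified: escapes \\ and " only)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def compile_filter(var: str, predicate: str, op: str, value: str) -> str:
    """Render one comparison on a root variable (hypothetical helper)."""
    pattern = f"?{var} {predicate} {quote(value)} ."
    if op == "==":
        return pattern                               # plain pattern match
    if op == "!=":
        return f"FILTER NOT EXISTS {{ {pattern} }}"  # documented default for !=
    raise ValueError(f"unsupported operator: {op}")


def compile_select(var: str, rdf_type: str, filters: list[str]) -> str:
    """Combine the type pattern and the filters into one SELECT."""
    body = "\n  ".join([f"?{var} a {rdf_type} ."] + filters)
    return f"SELECT ?{var} WHERE {{\n  {body}\n}}"


query = compile_select(
    "person",
    "schema:Person",
    [compile_filter("person", "schema:name", "==", "Odos")],
)
not_equal = compile_filter("person", "schema:name", "!=", "Odos")
```

In standard SPARQL the `NOT EXISTS` form also matches subjects that have no value for the predicate at all, whereas an inequality `FILTER` requires the variable to be bound, which is one reason the two `!=` modes can return different rows.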

Implementation: compiler.py (SparqlModel only); TripleModel does not compile Python filters.


Hydration

with SPARQLSession() as session:
    session.get(Person, iri, depth=2)
    session.query(Person).where(...).all(depth=1)

| depth | Loads |
| --- | --- |
| 0 | Scalars on root |
| 1 | One hop of Relationship fields |
| 2 | Two hops |

validate_depth rejects values outside 0–2.
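
The depth rule can be illustrated with a toy in-memory graph. Everything here is an assumption made for the sketch: the dictionary layout, raising `ValueError`, and leaving an unloaded relationship as its bare IRI are not confirmed by this specification.

```python
MAX_DEPTH = 2

# Toy data: subject IRI -> field values; relationship fields hold target IRIs.
GRAPH = {
    "ex:odos": {"name": "Odos", "works_for": "ex:acme"},
    "ex:acme": {"name": "Acme", "located_in": "ex:berlin"},
    "ex:berlin": {"name": "Berlin"},
}
RELATIONSHIPS = {"works_for", "located_in"}


def validate_depth(depth: int) -> int:
    """Reject anything outside 0..2 (error type is an assumption)."""
    if isinstance(depth, bool) or not isinstance(depth, int):
        raise ValueError("depth must be an integer")
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be between 0 and {MAX_DEPTH}")
    return depth


def _load(iri: str, depth: int) -> dict:
    record = {"id": iri}
    for field, value in GRAPH[iri].items():
        if field in RELATIONSHIPS and depth > 0:
            record[field] = _load(value, depth - 1)  # follow one more hop
        else:
            record[field] = value  # scalar, or relationship left as an IRI
    return record


def hydrate(iri: str, depth: int = 0) -> dict:
    return _load(iri, validate_depth(depth))
```

With this data, `hydrate("ex:odos", 1)` loads the organization but keeps `located_in` as an IRI, while depth 2 also loads the location.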

Integration note (0.3.x): scalar and relationship loading uses sparql_from_graph → TripleModel from_graph via interim _triple.py. 0.4+: SPARQLModel.from_graph on the unified subclass + SparqlModel depth hydration.


SPARQLModel

ORM entity base class. SQLModel-style declaration:

class Person(SPARQLModel):
    rdf_type = "schema:Person"
    __prefixes__ = {"schema": "https://schema.org/"}

    id: IRI
    name: str = Field("schema:name")

  • Metaclass enables Person.name == "x" in queries (FieldRef)

  • ensure_id() assigns urn:uuid:… when id is unset

  • JSON-LD helpers: model_dump_jsonld / model_validate_jsonld (ORM dict API; file JSON-LD via serialize, 0.7)

  • Subclasses TripleModel (Option A, 0.4+); merged metaclass for query FieldRef

  • model_config uses extra="forbid"

  • Field / Relationship are ORM sugar over rdf_field / Predicate (built at class creation, no exec)
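
As an illustration of the `ensure_id()` bullet, the standard library already produces identifiers in the `urn:uuid:` form. The sketch below uses a hypothetical `ToyEntity` dataclass and assumes a random (version 4) UUID; which UUID version SparqlModel uses is not stated here.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEntity:
    """Hypothetical stand-in for a model with an optional id."""

    id: Optional[str] = None

    def ensure_id(self) -> str:
        # Assign an identifier only when none is set; keep explicit ids as-is.
        if self.id is None:
            self.id = uuid.uuid4().urn  # e.g. "urn:uuid:..." (random each run)
        return self.id


anonymous = ToyEntity()
first = anonymous.ensure_id()
second = anonymous.ensure_id()   # idempotent: same value on the second call
named = ToyEntity(id="https://example.org/people/odos")
```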

Interim (0.3.x): dynamic shadow TripleModel classes via sparqlmodel._triple, removed in 0.4.

See also Models and Pydantic validation for application patterns.

Validation architecture

Three layers; all are complementary, not interchangeable.

| Layer | When | Mechanism |
| --- | --- | --- |
| Application (Pydantic) | SPARQLModel(...) / model_validate | Field types, Field constraints, extra="forbid" |
| Mapping (TripleModel) | from_graph(..., validate_type=True) | Expected rdf:type on subject; literal coercion per field |
| Graph shapes (optional) | put (0.14) | SHACL via triplemodel[shacl]; after Pydantic passes |

Write path (0.4+): validated SPARQLModel → cascade in graph.py → sync_to_graph(model, store.graph, …).

Write path (0.3.x interim): validated SPARQLModel → to_triplemodel → TripleModel.model_validate → sync_to_graph.

Read path (0.4+): graph → SPARQLModel.from_graph → optional depth hydration → identity map; Pydantic ValidationError surfaced as HydrationError.

Planning rule: new ORM features should extend Pydantic annotations and Field kwargs before adding ad-hoc validation in session or compiler code. See SparqlModel Roadmap (Pydantic-first).


Relationships

works_for: Organization | None = Relationship("schema:worksFor", model=Organization)

| Value type | Semantics |
| --- | --- |
| Embedded SPARQLModel | Composition: cascade on put/delete |
| IRI | Reference: no cascade delete of target |

Relationships and hydration (target API)

Current (0.2): Single object per predicate on load; depth 0–2 eager-loads Relationship fields; composition cascade on put/delete.

Target (0.13):

  • list[T] / collection fields for multi-valued literals and IRIs (via TripleModel) — 0.13

  • Language-tagged fields (LangString, multi-lang maps) — 0.13 (TripleModel)

  • Polymorphic session.query(Base).where(...) matching subclasses — 0.13

  • Property paths, VALUES clause, IRI string filters, query negation — 0.13 (parity backlog)

  • Compiler emits OPTIONAL for nullable relationship paths in filters — 0.8.0

  • Optional Relationship(..., back_populates=...) / inverse navigation — 0.13


Persistence policy

SparqlModel-specific; orchestrates which subjects TripleModel (or interim graph.py) syncs.

put

  1. Compute cascade_subjects_for_removal (root, nested embeds, orphans on relationship change)

  2. Remove owned_triples_for_subjects from store graph

  3. Add current model graph (model_to_graph → future: TripleModel export + cascade)

delete

Remove owned triples for cascade subject set (no re-add).

Ownership rules

  • Only declared predicates + rdf:type are owned

  • Extension triples on a subject are not removed by put/delete

  • Orphan keys use expanded IRIs and stable _:bnode keys
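
The put steps and ownership rules can be sketched on a plain set of triples. This is a toy model, not graph.py: the helper names, the tuple representation, and the `ex:note` extension predicate are illustrative assumptions.

```python
Triple = tuple[str, str, str]

# Predicates declared on the toy model, plus rdf:type.
OWNED_PREDICATES = {"rdf:type", "schema:name"}


def owned_triples(graph: set[Triple], subjects: set[str]) -> set[Triple]:
    """Triples that put/delete may remove: declared predicates on cascade subjects."""
    return {t for t in graph if t[0] in subjects and t[1] in OWNED_PREDICATES}


def put(graph: set[Triple], subjects: set[str], new: set[Triple]) -> set[Triple]:
    """Remove owned triples for the cascade subjects, then add the new state."""
    return (graph - owned_triples(graph, subjects)) | new


def add(graph: set[Triple], new: set[Triple]) -> set[Triple]:
    """Append only; stale triples stay in place."""
    return graph | new


before = {
    ("ex:odos", "rdf:type", "schema:Person"),
    ("ex:odos", "schema:name", "Odo"),        # stale value
    ("ex:odos", "ex:note", "kept"),           # extension triple, not owned
}
update = {
    ("ex:odos", "rdf:type", "schema:Person"),
    ("ex:odos", "schema:name", "Odos"),
}

after_put = put(before, {"ex:odos"}, update)
after_add = add(before, update)
```

After `put` the stale name is gone and the extension triple survives; after `add` both names are present, which is the difference the `add` vs `put` rows describe.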


Mapping integration (TripleModel)

Dependencies (0.5+): triplemodel>=0.10.0,<2, pyoxigraph>=0.5,<0.6 in pyproject.toml (no core rdflib).

Today (0.7+): SPARQLModel(TripleModel); session graphs are triplemodel.Store; graph.py holds cascade/orphan policy; rdf_bridge owns graph I/O. serializers.py is thin wrappers over TripleModel infer_format, load_graph, and serialize.

Target wiring (0.4+):

| SparqlModel surface | TripleModel API |
| --- | --- |
| put graph write | sync_to_graph(model, graph, mode=...) + cascade (same instance type) |
| get / query load | SPARQLModel.from_graph + depth hydration |
| export_model | to_graph().serialize(...) or serialize() |
| Predicate metadata | rdf_field, Predicate, nested class Rdf |

Cascade orchestration remains in SparqlModel after wiring.


HttpStore

SPARQL 1.1 over HTTP (httpx) with a local mirror (stores/http.py).

| Method | Target |
| --- | --- |
| update_graph | Remote INSERT DATA / DELETE DATA, then mirror delta on success |
| query / execute (via session) | Remote SELECT |
| graph, get, cascade/orphan | Mirror only |

External writers or SELECT-only visibility without a matching mirror update can make get return None while execute returns bindings. Single-writer per endpoint is assumed. If both auth and bearer_token are set, Basic auth wins.

put may send DELETE DATA followed by INSERT DATA in one SPARQL Update request; whether that is atomic depends on the endpoint (not guaranteed in 0.2). After HttpStore.close(), query, update_graph, and pull_subjects_into_mirror raise RuntimeError (same for AsyncHttpStore.aclose()).
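
The mirror/remote split is easiest to see with two in-memory sets. The `ToyHttpStore` below is a hypothetical stand-in with no HTTP involved; only the method names `update_graph`, `get`, and `pull_subjects_into_mirror` echo the text, and their signatures here are assumptions.

```python
class ToyHttpStore:
    """Two triple sets standing in for the endpoint and the local mirror."""

    def __init__(self):
        self.remote = set()   # what the SPARQL endpoint holds
        self.mirror = set()   # local copy used by get and cascade

    def update_graph(self, add=(), remove=()):
        # Remote first, then the same delta on the mirror.
        self.remote = (self.remote - set(remove)) | set(add)
        self.mirror = (self.mirror - set(remove)) | set(add)

    def get(self, subject):
        triples = {t for t in self.mirror if t[0] == subject}
        return triples or None

    def select_subjects(self):
        # Stand-in for a remote SELECT issued through execute().
        return sorted({s for s, _, _ in self.remote})

    def pull_subjects_into_mirror(self, subjects):
        wanted = set(subjects)
        # Replace the mirror's view of these subjects with the remote one.
        self.mirror = {t for t in self.mirror if t[0] not in wanted}
        self.mirror |= {t for t in self.remote if t[0] in wanted}


store = ToyHttpStore()
store.update_graph(add=[("ex:odos", "schema:name", "Odos")])
# Another process writes to the endpoint without touching this mirror.
store.remote.add(("ex:kira", "schema:name", "Kira"))

visible_remotely = store.select_subjects()
before_pull = store.get("ex:kira")
store.pull_subjects_into_mirror(["ex:kira"])
after_pull = store.get("ex:kira")
```

The external subject shows up in the remote SELECT but `get` returns `None` until the subject is pulled into the mirror.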

HTTP resilience (0.11+)

Constructor kwargs on HttpStore / AsyncHttpStore (sync + async parity):

| Parameter | Default | Behavior |
| --- | --- | --- |
| max_retries | 2 | Retry 502/503/504 and connection/timeouts on SELECT, CONSTRUCT pull, and each UPDATE chunk |
| retry_backoff | 0.5 | Exponential backoff between attempts (cap 30s) |
| max_triples_per_update | 500 | Split update_graph into multiple INSERT DATA / DELETE DATA requests |
| query_method | "post" | "get" for remote SELECT only (CONSTRUCT stays POST) |

Mirror updates run only after all remote UPDATE chunks succeed. Mid-batch remote failure leaves the mirror unchanged; remote state may be partial.
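
A minimal sketch of the chunking, backoff, and mirror-after-success rules, using the documented defaults. The doubling formula `retry_backoff * 2**attempt` is an assumption (the text only says "exponential" with a 30s cap), the helper names are hypothetical, and integers stand in for triples.

```python
from itertools import islice


def chunked(triples, size=500):
    """Yield lists of at most `size` items (max_triples_per_update)."""
    it = iter(triples)
    while batch := list(islice(it, size)):
        yield batch


def backoff_delays(max_retries=2, retry_backoff=0.5, cap=30.0):
    """Delay before each retry; doubling per attempt is an assumed formula."""
    return [min(cap, retry_backoff * 2 ** attempt) for attempt in range(max_retries)]


def update_graph(mirror, triples, send, size=500):
    """Send every chunk remotely; touch the mirror only if all chunks succeed."""
    for batch in chunked(triples, size):
        send(batch)          # may raise; the mirror is still untouched here
    mirror.update(triples)


def demo_partial_failure():
    remote, mirror = [], set()

    def flaky_send(batch):
        if remote:           # the second chunk fails
            raise ConnectionError("endpoint unavailable")
        remote.extend(batch)

    try:
        update_graph(mirror, list(range(1200)), flaky_send)
    except ConnectionError:
        pass
    return len(remote), len(mirror)
```

In the failure demo the endpoint has accepted the first 500 items while the mirror is still empty, which matches the "remote state may be partial" caveat.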

Store protocol (target API)

Current (0.2): Store with graph, query(sparql), update_graph(add=, remove=).

Target (Production HttpStore 0.10–0.12):

| Capability | Notes |
| --- | --- |
| query | SPARQL 1.1 SELECT (required) |
| update | Chunked INSERT DATA / DELETE DATA, shipped 0.11.0 (atomic multi-op sequences still endpoint-dependent) |
| query_method | GET vs POST for remote SELECT, shipped 0.11.0 |
| ask / construct | Optional protocol methods for existence and graph-shaped reads, 0.14 (P2) |
| HttpStore read_endpoint / write_endpoint | Fuseki-style split URLs, 0.9.1 (shipped) |
| Replace-on-pull, mirror_mode | Shipped 0.10.0 |
| Mirror sync (GSP sync_mirror) | Shipped 0.12.0 (graph_store_url, sync_mirror) |
| Retries, batch size limits | Shipped 0.11.0 (max_retries, max_triples_per_update) |
| OxigraphStore / embedded backends | Optional, 0.14+ |

Protocols: SPARQL 1.1 Query, SPARQL 1.1 Update, Graph Store HTTP.


Security (SPARQL generation)

Current (0.5+): Filter values serialized via SparqlModel N3 helpers (rdf_n3) on pyoxigraph terms and string IRIs. IRIs with invalid characters raise QueryError. Predicates come from model metadata (trusted code).

Target (1.3 GA):

  • No public API that concatenates untrusted strings into SPARQL text

  • Predicates and class IRIs remain declaration-time only

  • LIMIT / OFFSET remain integer-typed at API boundary

  • Security review documented before 1.3 GA
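
The points above (escaped filter values, rejected IRIs, integer-typed LIMIT) can be sketched with standard-library code only. This is not the `rdf_n3` helper: the function names are hypothetical, `QueryError` is a local stand-in class, and the forbidden-character set follows the SPARQL 1.1 `IRIREF` grammar rule.

```python
import re

# Characters the SPARQL 1.1 IRIREF production forbids between < and >.
_INVALID_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')

# Escapes for a double-quoted SPARQL string literal.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


class QueryError(ValueError):
    """Local stand-in for the library's error type."""


def iri_term(iri: str) -> str:
    if _INVALID_IRI_CHARS.search(iri):
        raise QueryError(f"invalid character in IRI: {iri!r}")
    return f"<{iri}>"


def string_literal(value: str) -> str:
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


def limit_clause(n: int) -> str:
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or n < 0:
        raise QueryError("LIMIT must be a non-negative integer")
    return f"LIMIT {n}"


hostile = 'x" } ; DROP ALL #'
escaped = string_literal(hostile)   # the quote is escaped, so it stays one literal
```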


Async API (target 0.6)

Parallel to the sync stack; sync API remains supported.

| Component | Sync (shipped) | Async (0.6.0) |
| --- | --- | --- |
| Session | SPARQLSession | AsyncSPARQLSession |
| Store | Store, MemoryStore, HttpStore | AsyncStore, AsyncMemoryStore, AsyncHttpStore |
| Query | Query.all() / first() | AsyncQuery: await .all() / .first() |
| FastAPI | SessionDep, sync get_session | AsyncSessionDep, async get_async_session |

Semantics: Same identity map, cascade, compiler, and hydration rules as sync. One session per asyncio task (not shared across concurrent tasks). HttpStore uses httpx.Client; AsyncHttpStore uses httpx.AsyncClient with the same mirror contract.
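
The "one session per asyncio task" rule can be illustrated with a toy async session. `ToyAsyncSession` is a hypothetical stand-in backed by a dict, not `AsyncSPARQLSession`; it only shows that each task opens its own session and identity map while sharing the store.

```python
import asyncio


class ToyAsyncSession:
    """Toy session: per-session identity map over a shared dict store."""

    def __init__(self, store):
        self.store = store
        self.identity_map = {}

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.identity_map.clear()
        return False

    async def get(self, iri):
        if iri not in self.identity_map:
            await asyncio.sleep(0)  # stand-in for awaiting the store
            self.identity_map[iri] = dict(self.store[iri])
        return self.identity_map[iri]


async def handle(store, iri):
    # Each task opens its own session instead of sharing one.
    async with ToyAsyncSession(store) as session:
        first = await session.get(iri)
        second = await session.get(iri)
        return first is second, session


async def main():
    store = {"ex:odos": {"name": "Odos"}}
    return await asyncio.gather(handle(store, "ex:odos"), handle(store, "ex:odos"))


results = asyncio.run(main())
```

Within one session the two `get` calls return the same object; across tasks the sessions are distinct, so no identity map is mutated concurrently.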

Non-goals for 0.6: Replacing sync session; async TripleModel mapping APIs (unified model stays sync; in-memory graph work stays on the event loop thread).


Known limitations

Until 0.4 (unified model)

| Area | Behavior |
| --- | --- |
| Dual model types | 0.3 uses interim _triple.py dynamic adapter; 0.4 unifies on SPARQLModel(TripleModel) |

HttpStore mirror (0.12+)

| Area | Behavior |
| --- | --- |
| Mirror vs remote | get / cascade use mirror; query uses remote |
| External writers | Use pull_subjects_into_mirror, mirror_mode="remote_authoritative", or sync_mirror() (GSP GET; requires graph_store_url) |
| Multi-writer endpoints | Assume single writer per endpoint; reconcile with sync_mirror() after bulk external changes |

Permanent constraints

| Area | Behavior |
| --- | --- |
| Composition vs reference | Embedded SPARQLModel cascades; IRI references do not |
| Owned triples | Only declared predicates + rdf:type removed on put/delete |
| add vs put | add does not remove stale triples |
| put(..., flush=False) | Pending models not visible in get until flush |
| flush() | Not a full remote transaction; partial failure re-queues remainder (0.2+) |
| Sessions | Not thread-safe; one session per task unless scoped externally |
| Closed session | After close(), all CRUD/query methods raise RuntimeError; share the store via a new session |
| Interim mapping | 0.3.0: _triple.py adapter; 0.4 Option A removes it; serializers.py thin wrappers since 0.7 |

Other (current)

| Area | Behavior |
| --- | --- |
| Duplicate predicates | Two fields with the same expanded predicate on one model class → ConfigurationError at class definition |
| Write-path cycles | Cyclic embedded SPARQLModel graphs → ConfigurationError on put / model_to_graph |
| Shared composition | Orphan cleanup skips embedded targets still linked from subjects outside the current put cascade |
| Pending put | Identity for that subject evicted when queued; close() with pending writes raises RuntimeError |
| Nested query filters | Related resource must have expected rdf:type |
| AND filters, same path | Compiler reuses join variables per relationship path within one WHERE / EXISTS block |
| JSON-LD | model_dump_jsonld vs export_model(..., "json-ld") differ; non-cascade embeds omitted from model_to_jsonld |
| Export without id | ensure_id() may assign urn:uuid:… |


Optional: export and files

ORM workflows do not require sparqlmodel.serializers.

Long term: all formats via TripleModel; SparqlModel may expose session-scoped helpers only.


FastAPI (optional extra)

Install sparqlmodel[fastapi]. fastapi/deps.py provides init_app, get_session, SessionDep, http_store_lifespan. fastapi/__init__.py provides turtle_response, jsonld_response, negotiated_response.


Feature ownership

| Feature | Owner |
| --- | --- |
| SHACL shapes / validation engine | TripleModel [shacl] |
| SHACL on session.put | SparqlModel hook calling TripleModel |
| Named graphs / Dataset | TripleModel; SparqlModel consumes |
| SPARQL federation in apps | SparqlModel |
| Alternate store backends | SparqlModel stores/ |
| OWL reasoner | Out of scope |


Maintainer boundaries

For end users, use ORM.md. This table is for contributors.

| Symptom | Fix in |
| --- | --- |
| Wrong XSD / literal on export | TripleModel |
| Subject IRI collision | TripleModel |
| Stale predicate after put | TripleModel sync + SparqlModel cascade |
| Orphan after relationship change | SparqlModel graph.py |
| != / nested filter wrong | SparqlModel compiler.py |
| New RDF format | TripleModel |
| Fuseki / HTTP store | SparqlModel stores/ |

Anti-patterns: new mapping code only in graph.py; session/compiler in TripleModel; triplemodel importing sparqlmodel.


Package layout

sparqlmodel/
  session.py       # ORM unit of work
  query.py         # query builder
  compiler.py      # ORM-only
  hydration.py     # depth; → TripleModel load
  model.py         # SPARQLModel(TripleModel) — 0.4+
  fields.py        # Field/Relationship sugar → rdf_field / Predicate
  graph.py         # cascade/orphan policy only
  serializers.py   # thin TripleModel parse/serialize wrappers (0.7)
  stores/
  rdf_bridge.py    # graph I/O (Option A; replaced _triple.py in 0.4)

Dependencies

pydantic>=2.5,<3
pyoxigraph>=0.5,<0.6
triplemodel>=0.10.0,<2
typing-extensions>=4.8

Optional: httpx, fastapi