Real-world examples

These examples use real vocabularies, public datasets, and typical integration problems rather than synthetic http://example.org/ toys. They are adapted from the TripleModel real-world suite to show SparqlModel patterns: load bundled Turtle into a MemoryStore, then use SPARQLSession for session.query, session.get, and session.execute.

Source tree: examples/realworld/ (the scripts below are included from that directory at doc build time).

Overview

Example              Script                  Data
Nobel laureates      nobel_laureates.py      data/nobel_laureates_1901.ttl
DCAT catalog         dcat_data_catalog.py    data/dcat_nobel_catalog.ttl
Wikidata capitals    wikidata_capitals.py    data/wikidata_capitals.ttl
Schema.org NGOs      schema_org_ngos.py      data/schema_org_ngos.ttl

Provenance and licenses: DATA_SOURCES.md.

Run locally

pip install sparqlmodel

From the SparqlModel repository root:

PYTHONPATH=src python examples/realworld/nobel_laureates.py
PYTHONPATH=src python examples/realworld/dcat_data_catalog.py
PYTHONPATH=src python examples/realworld/wikidata_capitals.py
PYTHONPATH=src python examples/realworld/schema_org_ngos.py
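The four commands above can also be driven from one Python script. This is a sketch, not part of the repository: example_env, example_commands, and run_all are hypothetical names, and the sketch assumes the repository layout used above (src/ and examples/realworld/). Unlike the shell form, it prepends src to any existing PYTHONPATH instead of replacing it.

```python
"""Hypothetical helper: run every real-world example with src/ on PYTHONPATH."""
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path

# Script names as listed in the overview table.
EXAMPLES = [
    "nobel_laureates.py",
    "dcat_data_catalog.py",
    "wikidata_capitals.py",
    "schema_org_ngos.py",
]


def example_env(repo_root: Path) -> dict[str, str]:
    """Copy the current environment and put <repo_root>/src first on PYTHONPATH."""
    env = dict(os.environ)
    src = str(repo_root / "src")
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = src if not existing else src + os.pathsep + existing
    return env


def example_commands(repo_root: Path) -> list[list[str]]:
    """Build one argv list per example, using the current interpreter."""
    script_dir = repo_root / "examples" / "realworld"
    return [[sys.executable, str(script_dir / name)] for name in EXAMPLES]


def run_all(repo_root: Path) -> None:
    """Run the examples in order; stop at the first one that exits non-zero."""
    env = example_env(repo_root)
    for cmd in example_commands(repo_root):
        subprocess.run(cmd, env=env, check=True)
```

From the repository root you would call run_all(Path(".")); check=True makes a failing example raise CalledProcessError instead of being skipped silently.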

Load bundled Turtle

Each script opens its data with the public API sparqlmodel.session.SPARQLSession.from_rdf_file, which backs the session with an in-memory sparqlmodel.stores.memory.MemoryStore:

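from pathlib import Path

from sparqlmodel import SPARQLSession

DATA_DIR = Path(__file__).resolve().parent / "data"

# Prefix map passed to the session (the same PREFIXES as the Nobel example).
PREFIXES = {
    "nobel": "http://data.nobelprize.org/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

with SPARQLSession.from_rdf_file(
    DATA_DIR / "nobel_laureates_1901.ttl",
    prefixes=PREFIXES,
) as session:
    ...  # session.query(...), session.get(...), session.execute(...)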

Pass a pathlib.Path (or a path string), not the file contents, because TripleModel treats long strings as path-like sources.
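If the Turtle you want to load lives in a string (for example, a test fixture), one way to respect this rule is to write it to a temporary file and pass that path. The helper below is a hypothetical standard-library sketch; the sample triple is illustrative, and the commented call assumes the from_rdf_file usage shown above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Illustrative Turtle snippet in the Nobel vocabulary.
SAMPLE_TTL = """\
@prefix nobel: <http://data.nobelprize.org/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://data.nobelprize.org/resource/laureate/1> a nobel:Laureate ;
    rdfs:label "Wilhelm Conrad Röntgen"@en .
"""


def turtle_text_to_path(text: str, directory: Path) -> Path:
    """Write Turtle text to a .ttl file inside `directory` and return its path."""
    # delete=False keeps the file after the handle closes so the loader can reopen it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".ttl", dir=directory, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
    return Path(handle.name)


# Usage (assumes sparqlmodel is installed and PREFIXES is defined as above):
# with tempfile.TemporaryDirectory() as tmp:
#     path = turtle_text_to_path(SAMPLE_TTL, Path(tmp))
#     with SPARQLSession.from_rdf_file(path, prefixes=PREFIXES) as session:
#         ...
```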

For production, swap the default in-memory store for HttpStore (see the SparqlModel production guide) and keep the same session API.
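For orientation, the sketch below shows the kind of request a remote store has to make: a SPARQL 1.1 Protocol query sent with HTTP GET, asking for SPARQL JSON results. It is not HttpStore and does not show its API; build_select_request and rows_from_json are hypothetical names, and nothing is sent over the network unless you call urlopen yourself. The endpoint is the Nobel SPARQL accessURL from the DCAT example.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://data.nobelprize.org/sparql"  # accessURL from the DCAT example


def build_select_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(payload: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dicts (unbound vars omitted)."""
    data = json.loads(payload)
    return [
        {var: binding["value"] for var, binding in row.items()}
        for row in data["results"]["bindings"]
    ]


# Usage (performs a network call, so it is left commented out):
# from urllib.request import urlopen
# req = build_select_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 3")
# with urlopen(req) as resp:
#     print(rows_from_json(resp.read().decode("utf-8")))
```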


Nobel Prize linked data (1901)

Problem: Cultural heritage and science datasets publish stable URIs and a shared ontology; you need typed models and filters over an existing graph.

Data: nobel_laureates_1901.ttl, an excerpt aligned with the Nobel Prize linked data examples.

#!/usr/bin/env python3
"""Nobel Prize linked data (1901): query laureates with :class:`~sparqlmodel.session.SPARQLSession`.

Problem: integrate biographical linked open data where resources already have
stable URIs and a published ontology (common in cultural heritage and science).

Data: ``examples/realworld/data/nobel_laureates_1901.ttl``
Source: https://www.nobelprize.org/about/linked-data-examples/
"""

from __future__ import annotations

from pathlib import Path

from sparqlmodel import IRI, Field, SPARQLModel, SPARQLSession

DATA_DIR = Path(__file__).resolve().parent / "data"

NOBEL = "http://data.nobelprize.org/terms/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
PREFIXES = {
    "nobel": NOBEL,
    "rdfs": RDFS,
    "foaf": "http://xmlns.com/foaf/0.1/",
}


class Laureate(SPARQLModel):
    """Person or organisation receiving a Nobel Prize (``nobel:Laureate``)."""

    rdf_type = "nobel:Laureate"
    __prefixes__ = PREFIXES

    id: IRI
    name: str = Field("rdfs:label")
    gender: str | None = Field("foaf:gender", default=None)


class NobelPrize(SPARQLModel):
    """Award instance for a category and year (``nobel:NobelPrize``)."""

    rdf_type = "nobel:NobelPrize"
    __prefixes__ = PREFIXES

    id: IRI
    title: str = Field("rdfs:label")
    year: str = Field("nobel:year")


def main() -> None:
    with SPARQLSession.from_rdf_file(
        DATA_DIR / "nobel_laureates_1901.ttl", prefixes=PREFIXES
    ) as session:
        laureates = session.query(Laureate).all()
        prizes = session.query(NobelPrize).all()
        print(
            f"Loaded {len(laureates)} laureates and {len(prizes)} prizes from 1901 excerpt"
        )
        for person in sorted(laureates, key=lambda m: m.name):
            print(f"  {person.name} ({person.gender})")

        roentgen = next(p for p in laureates if "Röntgen" in p.name)
        prizes_1901 = session.query(NobelPrize).where(NobelPrize.year == "1901").all()
        physics_1901 = next(p for p in prizes_1901 if "Physics" in p.title)
        assert physics_1901.year == "1901"

        male_laureates = session.query(Laureate).where(Laureate.gender == "male").all()
        assert roentgen in male_laureates

        loaded = session.get(Laureate, roentgen.id)
        assert loaded is not None and loaded.name == roentgen.name
        print("Round-trip OK for Wilhelm Conrad Röntgen")


if __name__ == "__main__":
    main()


# Example output:
# Loaded 6 laureates and 5 prizes from 1901 excerpt
#   Emil Adolf von Behring (male)
#   Frédéric Passy (male)
#   Jacobus Henricus van 't Hoff (male)
#   Jean Henry Dunant (male)
#   Sully Prudhomme (male)
#   Wilhelm Conrad Röntgen (male)
# Round-trip OK for Wilhelm Conrad Röntgen

Note

rdfs:label values in the bundle carry language tags (@en), so an equality filter on name must match the stored literal form, tag included. This example therefore filters on gender and uses session.get by IRI for the round-trip check.
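One way to filter on a language-tagged label is to compare its lexical form with SPARQL's STR() function instead of matching the full literal. The sketch only builds the query text; it assumes you would pass it to session.execute (a SELECT, which the Wikidata example runs on MemoryStore), and sparql_string and laureates_by_label are hypothetical helpers.

```python
from __future__ import annotations


def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL string literal, escaping \\ " and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def laureates_by_label(name: str) -> str:
    """SELECT laureates whose rdfs:label has lexical form `name`, ignoring the @lang tag."""
    return f"""\
PREFIX nobel: <http://data.nobelprize.org/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?laureate ?label WHERE {{
  ?laureate a nobel:Laureate ; rdfs:label ?label .
  FILTER(STR(?label) = {sparql_string(name)})
}}"""


# Alternative: match the exact stored form, tag included, e.g. "Sully Prudhomme"@en.
# rows = session.execute(laureates_by_label("Sully Prudhomme"))
```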


DCAT open data catalog

Problem: Governments and EU portals publish DCAT metadata so users can discover datasets and SPARQL endpoints before downloading data.

Data: dcat_nobel_catalog.ttl.

Use IRI for object fields that are resources in the graph (e.g. dcat:accessURL). dcat:keyword is multi-valued in the bundle, but the single str field hydrates only the first value (see Troubleshooting).
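When you need every keyword rather than the first one, one option is to select dataset/keyword pairs with session.execute and group them in Python. The sketch assumes result rows can be indexed by variable name (as row["city"] is in the Wikidata example) and converted with str(); KEYWORDS_QUERY and group_keywords are illustrative names.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Mapping

KEYWORDS_QUERY = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset ?keyword WHERE {
  ?dataset a dcat:Dataset ; dcat:keyword ?keyword .
}
ORDER BY ?dataset ?keyword
"""


def group_keywords(rows: Iterable[Mapping[str, object]]) -> dict[str, list[str]]:
    """Collect every ?keyword per ?dataset, keeping first-seen order and dropping repeats."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        dataset, keyword = str(row["dataset"]), str(row["keyword"])
        if keyword not in grouped[dataset]:
            grouped[dataset].append(keyword)
    return dict(grouped)


# Usage (assumes a session opened as in the script that follows):
# keywords = group_keywords(session.execute(KEYWORDS_QUERY))
```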

#!/usr/bin/env python3
"""DCAT data catalog: discover datasets and SPARQL endpoints with the query DSL.

Problem: governments and EU institutions publish metadata as DCAT/DCAT-AP so
users can find datasets and SPARQL/HTTP distributions before downloading data.

Data: ``examples/realworld/data/dcat_nobel_catalog.ttl``
"""

from __future__ import annotations

from pathlib import Path

from sparqlmodel import IRI, Field, SPARQLModel, SPARQLSession

DATA_DIR = Path(__file__).resolve().parent / "data"

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
PREFIXES = {"dcat": DCAT, "dct": DCT}


class DataCatalog(SPARQLModel):
    rdf_type = "dcat:Catalog"
    __prefixes__ = PREFIXES

    id: IRI
    title: str = Field("dct:title")
    description: str | None = Field("dct:description", default=None)


class Dataset(SPARQLModel):
    rdf_type = "dcat:Dataset"
    __prefixes__ = PREFIXES

    id: IRI
    title: str = Field("dct:title")
    description: str | None = Field("dct:description", default=None)
    keyword: str | None = Field("dcat:keyword", default=None)


class Distribution(SPARQLModel):
    rdf_type = "dcat:Distribution"
    __prefixes__ = PREFIXES

    id: IRI
    title: str = Field("dct:title")
    access_url: IRI = Field("dcat:accessURL")


def main() -> None:
    with SPARQLSession.from_rdf_file(
        DATA_DIR / "dcat_nobel_catalog.ttl", prefixes=PREFIXES
    ) as session:
        catalogs = session.query(DataCatalog).all()
        datasets = session.query(Dataset).all()
        distributions = session.query(Distribution).all()

        print(f"Catalog: {catalogs[0].title}")
        for ds in datasets:
            print(f"  Dataset: {ds.title}")
            if ds.keyword:
                print(f"    Keyword: {ds.keyword}")
        for dist in distributions:
            print(f"  Distribution: {dist.title}")
            print(f"    accessURL: {dist.access_url}")

        sparql_dist = session.query(Distribution).where(
            Distribution.access_url == IRI("http://data.nobelprize.org/sparql")
        ).first()
        assert sparql_dist is not None
        assert "Nobel prize" in (datasets[0].keyword or "")
        print("DCAT catalog query OK")


if __name__ == "__main__":
    main()


# Example output:
# Catalog: Nobel Media Dataset catalog
#   Dataset: Linked Nobel prizes
#     Keyword: Nobel prize
#   Distribution: Nobel Prize SPARQL endpoint
#     accessURL: http://data.nobelprize.org/sparql
# DCAT catalog query OK


Wikidata capital cities

Problem: Wikidata (and similar KGs) often assert types with wdt:P31 rather than rdf:type, so the default session.query type pattern (?s a <Class>) may not match.

Data: wikidata_capitals.ttl, with Paris and London, their population, and their country (CC0).

Approach: run session.execute with Wikidata property patterns, then load each result with Model.from_graph(..., validate_type=False). On MemoryStore, session.execute supports SELECT queries (not ASK).
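The mismatch is easy to reproduce on plain triples. The sketch below uses only the standard library; the triples are illustrative, shaped like the Wikidata excerpt (types asserted with wdt:P31), and match is a toy one-pattern matcher, not a SPARQL engine.

```python
from __future__ import annotations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# Type assertions as Wikidata publishes them: via wdt:P31, not rdf:type.
TRIPLES = {
    (WD + "Q90", WDT + "P31", WD + "Q174844"),
    (WD + "Q84", WDT + "P31", WD + "Q174844"),
    (WD + "Q90", WDT + "P17", WD + "Q142"),
}


def match(predicate: str, obj: str) -> list[str]:
    """Return sorted subjects s such that (s, predicate, obj) is in TRIPLES."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


# What `?s a wd:Q174844` (the default type pattern) looks for:
by_rdf_type = match(RDF_TYPE, WD + "Q174844")
# What `?s wdt:P31 wd:Q174844` looks for:
by_p31 = match(WDT + "P31", WD + "Q174844")
print(by_rdf_type)  # []
print(by_p31)  # ['http://www.wikidata.org/entity/Q84', 'http://www.wikidata.org/entity/Q90']
```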

#!/usr/bin/env python3
"""Wikidata capital cities: session execute + graph load (P31, not only rdf:type).

Problem: knowledge-graph pipelines (Wikidata) often use property assertions
(``wdt:P31``) instead of ``rdf:type``; combine raw SPARQL with ``from_graph``.

Data: ``examples/realworld/data/wikidata_capitals.ttl``
Source: Wikidata Q90, Q84 — CC0 1.0
"""

from __future__ import annotations

from pathlib import Path

from sparqlmodel import IRI, Field, SPARQLModel, SPARQLSession

DATA_DIR = Path(__file__).resolve().parent / "data"

WD = "http://www.wikidata.org/entity/"
WIKIDATA_PREFIXES = {
    "wd": WD,
    "wdt": "http://www.wikidata.org/prop/direct/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


class Country(SPARQLModel):
    """Wikidata item with an English label (e.g. France, United Kingdom)."""

    rdf_type = "wd:Q6256"
    __prefixes__ = WIKIDATA_PREFIXES

    id: IRI
    label_en: str | None = Field("rdfs:label", default=None)


class CapitalCity(SPARQLModel):
    """Capital city facts: label, population, country item IRI."""

    rdf_type = "wd:Q174844"
    __prefixes__ = WIKIDATA_PREFIXES

    id: IRI
    label_en: str | None = Field("rdfs:label", default=None)
    population: int = Field("wdt:P1082")
    country: IRI = Field("wdt:P17")


def main() -> None:
    with SPARQLSession.from_rdf_file(
        DATA_DIR / "wikidata_capitals.ttl", prefixes=WIKIDATA_PREFIXES
    ) as session:
        large_cities = session.execute(
            """
            PREFIX wd: <http://www.wikidata.org/entity/>
            PREFIX wdt: <http://www.wikidata.org/prop/direct/>
            SELECT ?city WHERE {
              wd:Q90 wdt:P1082 ?pop .
              FILTER(?pop > 2000000)
              BIND(wd:Q90 AS ?city)
            }
            """
        )
        assert len(large_cities) == 1

        capital_rows = session.execute(
            """
            PREFIX wd: <http://www.wikidata.org/entity/>
            PREFIX wdt: <http://www.wikidata.org/prop/direct/>
            SELECT ?city ?pop WHERE {
              ?city wdt:P31 wd:Q174844 ; wdt:P1082 ?pop .
            }
            ORDER BY DESC(?pop)
            """
        )
        cities: list[CapitalCity] = []
        for row in capital_rows:
            city = CapitalCity.from_graph(
                session.graph,
                row["city"],
                validate_type=False,
            )
            cities.append(city)

        print("European capitals (Wikidata excerpt):")
        for city in cities:
            country = Country.from_graph(
                session.graph,
                str(city.country),
                validate_type=False,
            )
            print(
                f"  {city.label_en}: population={city.population:,} "
                f"country={country.label_en} ({city.country})"
            )

        paris = next(c for c in cities if str(c.id).endswith("Q90"))
        assert paris.label_en == "Paris"
        assert paris.population == 2_103_778
        france = Country.from_graph(session.graph, str(paris.country), validate_type=False)
        assert france.label_en == "France"
        print("Paris load OK (country link via wdt:P17)")


if __name__ == "__main__":
    main()


# Example output:
# European capitals (Wikidata excerpt):
#   London: population=8,799,728 country=United Kingdom (http://www.wikidata.org/entity/Q145)
#   Paris: population=2,103,778 country=France (http://www.wikidata.org/entity/Q142)
# Paris load OK (country link via wdt:P17)


Schema.org NGO registry

Problem: Transparency and search pipelines expose schema:NGO records; you want Pydantic validation and session APIs over that graph.

Data: schema_org_ngos.ttl.

#!/usr/bin/env python3
"""Schema.org NGOs: nonprofit registry records via session query and get.

Problem: transparency portals publish organization metadata with schema.org;
map it into Pydantic models for validation and filter with the ORM query DSL.

Data: ``examples/realworld/data/schema_org_ngos.ttl``
"""

from __future__ import annotations

from pathlib import Path

from sparqlmodel import IRI, Field, SPARQLModel, SPARQLSession

DATA_DIR = Path(__file__).resolve().parent / "data"

SCHEMA = "https://schema.org/"


class NgoOrganization(SPARQLModel):
    rdf_type = "schema:NGO"
    __prefixes__ = {"schema": SCHEMA, "xsd": "http://www.w3.org/2001/XMLSchema#"}

    id: IRI
    name: str = Field("schema:name")
    url: str = Field("schema:url")
    nonprofit_status: str | None = Field("schema:nonprofitStatus", default=None)
    founding_year: int | None = Field("schema:foundingDate", default=None)


def main() -> None:
    with SPARQLSession.from_rdf_file(DATA_DIR / "schema_org_ngos.ttl") as session:
        ngos = session.query(NgoOrganization).all()
        print(f"Loaded {len(ngos)} NGO records")
        for org in sorted(ngos, key=lambda o: o.name):
            founded = org.founding_year if org.founding_year is not None else "n/a"
            print(f"  {org.name} (founded {founded}) — {org.url}")

        wwf = session.get(NgoOrganization, IRI("https://example.org/org/wwf"))
        assert wwf is not None
        ttl = wwf.serialize(format="turtle")
        assert "World Wide Fund" in ttl or "schema:name" in ttl

        with_status = session.query(NgoOrganization).where(
            NgoOrganization.nonprofit_status == "NonprofitANBI"
        ).all()
        assert len(with_status) == len(ngos)
        print("Schema.org NGO session OK")


if __name__ == "__main__":
    main()


# Example output:
# Loaded 3 NGO records
#   International Committee of the Red Cross (founded 1863) — https://www.icrc.org/
#   Médecins Sans Frontières (founded n/a) — https://www.msf.org/
#   World Wide Fund for Nature (founded 1961) — https://www.worldwildlife.org/
# Schema.org NGO session OK
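schema.org defines schema:foundingDate as a Date, while NgoOrganization maps it to founding_year: int. If a record carries a full date instead of a bare year, validating it as int would presumably fail. A hypothetical normalizer like the one below reduces either form to a year; how to hook it into the model (for example as a Pydantic validator) is left open, since this page does not show that API.

```python
from __future__ import annotations

import re
from datetime import date

_YEAR_ONLY = re.compile(r"^\d{4}$")  # bare xsd:gYear-style value such as "1863"


def founding_year(value: str | int | None) -> int | None:
    """Reduce a foundingDate value ("1863", 1863, "1863-02-17") to its year."""
    if value is None or isinstance(value, int):
        return value
    text = value.strip()
    if _YEAR_ONLY.match(text):
        return int(text)
    # Full ISO dates such as "1961-04-29"; anything else raises ValueError.
    return date.fromisoformat(text).year
```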


TripleModel vs SparqlModel in these examples

Task                   TripleModel (upstream)                SparqlModel (here)
Parse bundled TTL      load_models, parse_file               load_graph + MemoryStore + SPARQLSession
Filter rows            Python list comprehensions            session.query(Model).where(...)
Load one resource      Model.from_graph                      session.get(Model, IRI(...), depth=...)
Wikidata P31 typing    instance_of / validate_type=False     execute + from_graph(..., validate_type=False)
Remote SPARQL          load_sparql, construct_from_sparql    HttpStore + same session (see the SparqlModel production guide)

Mapping details (literals, serialize, parse) remain in TripleModel; SparqlModel adds the session and query layer on top, through SPARQLModel, which subclasses TripleModel.

What’s next