Relationship Modeling Patterns in Knowledge Graphs

Core Models
Relationship Patterns
Relationship Properties
Real-World Implementations
- Wikidata
- Schema.org
- FOAF
- SKOS
Validation and Constraints
- SHACL
- OWL Constraints
Controversies and Open Questions
Anti-Patterns and Pitfalls
Best Practices Summary

Core Models

RDF Triples

RDF (Resource Description Framework) represents relationships as triples: subject-predicate-object.

:Joe :knows :Alice .
:Joe :employer :Acme .

Each triple has three components:

Subject: The entity being described (always an IRI or blank node)
Predicate: The relationship/property (always an IRI)
Object: The target (IRI, blank node, or literal value)

Node types in RDF:

Type	Description	Example
IRI	Globally unique identifier	`http://example.org/Joe`
Blank Node	Anonymous node, no global ID	`_:b1`
Literal	Concrete value with datatype	`"Joe"^^xsd:string`

Key characteristics:

Properties are first-class citizens (IRIs)
Graphs are sets of triples
Open world assumption (absence of data ≠ negation)
No native support for relationship metadata
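To make "graphs are sets of triples" concrete, here is a minimal Python sketch. It assumes prefixed names are plain strings (no real IRI handling) and uses a hypothetical `match` helper in which `None` acts as a wildcard. The `set` absorbs the duplicate triple, and a pattern with no matches returns an empty set. Under the open world assumption, that means "not stated", not "false".

```python
from typing import NamedTuple, Optional, Set

class Triple(NamedTuple):
    s: str  # subject: IRI or blank node label
    p: str  # predicate: always an IRI
    o: str  # object: IRI, blank node label, or literal

# An RDF graph is a *set* of triples, so asserting the same triple twice adds nothing.
graph: Set[Triple] = {
    Triple(":Joe", ":knows", ":Alice"),
    Triple(":Joe", ":employer", ":Acme"),
    Triple(":Joe", ":knows", ":Alice"),
}

def match(g: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {t for t in g
            if (s is None or t.s == s)
            and (p is None or t.p == p)
            and (o is None or t.o == o)}

print(len(graph))                               # 2
print(match(graph, p=":knows"))
print(match(graph, s=":Alice", p=":employer"))  # set(): unknown, not false
```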

Property Graphs

Property graphs (Neo4j, etc.) allow properties on both nodes AND edges.

(:Person {name: "Joe"})-[:KNOWS {since: 2020}]->(:Person {name: "Alice"})

Key characteristics:

Relationships can have properties directly
Relationships have types (labels) and direction
Closed world assumption typical
No built-in global identifiers
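A minimal in-memory sketch of the property graph model in Python shows the contrast with RDF. It uses hypothetical `Node`/`Edge` classes, not the Neo4j driver API. The `since` value sits on the edge object itself, and node ids are local integers rather than global IRIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

@dataclass
class Node:
    id: int  # local to this database, not a global identifier
    labels: Set[str]
    props: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Edge:
    src: int
    dst: int
    type: str  # relationship type, e.g. KNOWS
    props: Dict[str, Any] = field(default_factory=dict)  # properties live on the edge itself

joe = Node(1, {"Person"}, {"name": "Joe"})
alice = Node(2, {"Person"}, {"name": "Alice"})
edges: List[Edge] = [Edge(joe.id, alice.id, "KNOWS", {"since": 2020})]

def outgoing(edges: List[Edge], node_id: int, rel_type: str) -> List[Edge]:
    """Follow directed edges of one type from a node."""
    return [e for e in edges if e.src == node_id and e.type == rel_type]

for e in outgoing(edges, joe.id, "KNOWS"):
    print(e.dst, e.props["since"])  # 2 2020
```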

Naming conventions (Neo4j):

Element	Convention	Example
Node labels	UpperCamelCase	`Person`, `Movie`
Relationship types	ALL_CAPS	`ACTED_IN`, `KNOWS`
Properties	camelCase	`name`, `startDate`

Comparison: RDF vs Property Graphs

Aspect	RDF	Property Graph
Relationship metadata	Requires reification/workarounds	Native edge properties
Scalability	Harder for analytics workloads	Better for large-scale analytics
Semantic richness	OWL reasoning, inference	Limited inference
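Standards	W3C standards (SPARQL, OWL, SHACL)	Largely vendor-specific, though ISO GQL (2024) now standardizes a query language
Flexibility	Built for schema evolution	Schema-optional, but schemas are local to one database rather than shared
Query language	SPARQL	Cypher, Gremlin, GQL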
Learning curve	Steeper	More intuitive

When to use RDF:

Semantic reasoning required
Data from multiple distributed sources
Long-term data reuse and extension
Standards compliance matters

When to use Property Graphs:

Performance-critical applications
OLTP (transactional) workloads
Developer productivity priority
Deep, multi-hop traversals starting from known nodes

Relationship Patterns

N-ary Relationships

RDF properties are inherently binary (subject→object). N-ary relationships involve more than two participants.

Problem: How do you model “Joe bought a book from Alice for $20”?

Solution: Create a class representing the relationship itself.

:purchase1 a :Purchase ;
    :buyer :Joe ;
    :seller :Alice ;
    :item :Book123 ;
    :price 20 .

The relationship becomes a first-class entity that connects all participants.
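Here is a minimal Python sketch of the pattern. It assumes triples are plain tuples and uses hypothetical helper names. Each n-ary fact becomes one relationship node, and every participant attaches to it through a role predicate. As a result, adding a `:date` role later does not change any existing triples.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]

def nary_to_triples(rel_id: str, rel_class: str, roles: Dict[str, Any]) -> List[Triple]:
    """Turn one n-ary fact into a typed relationship node plus one triple per participant."""
    triples: List[Triple] = [(rel_id, "a", rel_class)]
    triples += [(rel_id, role, value) for role, value in roles.items()]
    return triples

def role_of(triples: List[Triple], rel_id: str, role: str) -> Any:
    """Read one participant back out of the relationship node (None if absent)."""
    return next((o for s, p, o in triples if s == rel_id and p == role), None)

purchase = nary_to_triples(":purchase1", ":Purchase", {
    ":buyer": ":Joe", ":seller": ":Alice", ":item": ":Book123", ":price": 20,
})
for t in purchase:
    print(t)
print(role_of(purchase, ":purchase1", ":seller"))  # :Alice
```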

Use cases:

Transactions (buyer, seller, item, price, date)
Events (organizer, attendees, location, time)
Measurements (subject, value, unit, method, time)
Attributions (source, claim, confidence, context)

W3C guidance: “Defining N-ary Relations on the Semantic Web” (W3C Working Group Note, 2006)

Qualified Relations

A qualified relation adds context to what would otherwise be a simple binary relationship.

Problem: “Joe worked at Acme” needs temporal context.

Without qualification:

:Joe :employer :Acme .

With qualification:

:Joe :employment :emp1 .
:emp1 a :Employment ;
    :organization :Acme ;
    :startDate "2020-01-01"^^xsd:date ;
    :endDate "2023-06-15"^^xsd:date ;
    :role "Engineer" .

Common qualifiers:

Temporal (start, end, duration)
Provenance (source, confidence, method)
Role/context (position, capacity, relationship type)
Attribution (who said it, when, certainty)

Trade-off: Each qualified relation requires 2+ predicates and a class, expanding the vocabulary significantly.
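The qualified form still contains the original fact, so the simple binary triple can be recomputed on demand. Below is a minimal Python sketch that assumes plain tuples and date strings and uses a hypothetical `shortcuts` helper. In a triple store, the same step would typically be a SPARQL CONSTRUCT query or an OWL 2 property chain axiom.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

qualified: Set[Triple] = {
    (":Joe", ":employment", ":emp1"),
    (":emp1", "a", ":Employment"),
    (":emp1", ":organization", ":Acme"),
    (":emp1", ":startDate", "2020-01-01"),
    (":emp1", ":endDate", "2023-06-15"),
    (":emp1", ":role", "Engineer"),
}

def shortcuts(g: Set[Triple], link: str, target: str, shortcut: str) -> Set[Triple]:
    """Collapse (x link n) + (n target y) into the unqualified binary triple (x shortcut y)."""
    return {(x, shortcut, y)
            for x, p1, n in g if p1 == link
            for n2, p2, y in g if n2 == n and p2 == target}

print(shortcuts(qualified, ":employment", ":organization", ":employer"))
# {(':Joe', ':employer', ':Acme')}
```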

Reification

Reification makes statements about statements—turning a triple into a resource.

Standard RDF reification (verbose, rarely used):

:statement1 a rdf:Statement ;
    rdf:subject :Joe ;
    rdf:predicate :knows ;
    rdf:object :Alice .

:statement1 :source :LinkedIn ;
    :confidence 0.9 .

Problems with standard reification:

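4 triples just to describe 1 triple (3 if the rdf:type triple is omitted)
Complex queries: every lookup must join on rdf:subject, rdf:predicate, and rdf:object
3-4 additional triples per annotated statement, before any metadata is added
Doesn’t assert the original triple; it must be stated separately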

When reification is appropriate:

Provenance tracking (who said what, when)
Uncertainty/confidence scores
Describing changes to a graph
Reasoning about statements
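A minimal Python sketch makes the overhead and the non-assertion visible. It uses plain tuples, and `reify` is a hypothetical helper.

```python
from typing import Any, List, Tuple

Triple = Tuple[str, str, Any]

def reify(stmt_id: str, s: str, p: str, o: Any) -> List[Triple]:
    """Standard RDF reification: four triples that *describe* (s p o) without asserting it."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]

graph = reify(":statement1", ":Joe", ":knows", ":Alice")
graph += [(":statement1", ":source", ":LinkedIn"),
          (":statement1", ":confidence", 0.9)]

print(len(graph))                             # 6 triples for one annotated fact
print((":Joe", ":knows", ":Alice") in graph)  # False: the fact itself is not asserted
```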

RDF-star

RDF-star is a modern extension that solves reification’s verbosity problem.

Syntax (Turtle-star):

<<:Joe :knows :Alice>> :since 2020 ;
                        :source :LinkedIn .

The triple <<:Joe :knows :Alice>> can be used as a subject or object.

Key concepts:

Quoted triple: Referenced but not necessarily asserted
Asserted triple: Makes a factual claim
A triple can be both quoted and asserted
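Here is a minimal Python sketch that uses nested tuples as a stand-in for quoted triples. Because the triple is itself a value, it can sit in subject position directly. Whether it is also *asserted* is a separate question: does it appear as a top-level member of the graph?

```python
from typing import Any, Dict, Set, Tuple

Triple = Tuple[Any, str, Any]  # the subject may itself be a (quoted) triple

knows = (":Joe", ":knows", ":Alice")

graph: Set[Triple] = {
    knows,                            # asserted: the claim itself
    (knows, ":since", 2020),          # quoted: statements *about* the claim
    (knows, ":source", ":LinkedIn"),
}

def annotations(g: Set[Triple], t: Triple) -> Dict[str, Any]:
    """Collect the properties whose subject is the quoted triple t."""
    return {p: o for s, p, o in g if s == t}

print(annotations(graph, knows))  # maps :since -> 2020 and :source -> :LinkedIn
print(knows in graph)             # True only because we also asserted it
```

Removing the bare `knows` tuple from the set leaves the annotations intact. The result corresponds to the Turtle-star example above, where the triple is quoted but not asserted.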

Advantages over standard reification:

Compact syntax
Intuitive model (feels like edge properties)
SPARQL-star for querying
Backward compatible: existing RDF data and SPARQL queries remain valid

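Current status: RDF-star is being standardized as part of RDF 1.2 (a W3C work in progress at the time of writing), where quoted triples become “triple terms”; the final syntax differs in details from the Turtle-star shown here. Database support is growing.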

Named Graphs

Named graphs assign an IRI to a collection of triples.

GRAPH :graph1 {
    :Joe :knows :Alice .
    :Joe :knows :Bob .
}

:graph1 :source :LinkedIn ;
        :retrievedDate "2024-01-15"^^xsd:date .
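Here is a minimal Python sketch of a dataset as a set of quads, with a fourth position naming the graph. The `:graph2`, `:Carol`, and `:CRM` entries are hypothetical. In TriG, the graph metadata would live in the default graph; here it is kept in a separate dict for brevity. Filtering by source then becomes a lookup on the graph name rather than on each triple.

```python
from typing import Dict, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)

dataset: Set[Quad] = {
    (":Joe", ":knows", ":Alice", ":graph1"),
    (":Joe", ":knows", ":Bob", ":graph1"),
    (":Joe", ":knows", ":Carol", ":graph2"),
}

# Metadata about each named graph, stored outside the graphs themselves.
graph_meta: Dict[str, Dict[str, str]] = {
    ":graph1": {":source": ":LinkedIn", ":retrievedDate": "2024-01-15"},
    ":graph2": {":source": ":CRM"},
}

def triples_from_source(ds: Set[Quad], meta: Dict[str, Dict[str, str]],
                        source: str) -> Set[Tuple[str, str, str]]:
    """Return the triples whose named graph was recorded as coming from `source`."""
    return {(s, p, o) for s, p, o, g in ds
            if meta.get(g, {}).get(":source") == source}

print(triples_from_source(dataset, graph_meta, ":LinkedIn"))
```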

Use cases:

Provenance: Track where data came from
Trust/authority: Different sources have different trust levels
Versioning: Snapshots of data at different times
Access control: Different visibility for different graphs
Partitioning: Organize large datasets

Named graphs vs reification:

Named graphs: metadata about groups of statements
Reification: metadata about individual statements
Often used together

Relationship Properties

Property Hierarchies

Properties can form hierarchies using rdfs:subPropertyOf.

:hasMother rdfs:subPropertyOf :hasParent .
:hasFather rdfs:subPropertyOf :hasParent .

If :Joe :hasMother :Mary, a reasoner infers :Joe :hasParent :Mary.
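The inference step can be sketched in a few lines of Python. This minimal forward-chainer applies only the RDFS rule for `rdfs:subPropertyOf` (rdfs7) and assumes plain tuples. `:Tom` is a hypothetical example value. The rule is repeated until nothing new appears, so chains such as `:hasMother` → `:hasParent` → `:hasRelative` would also be followed.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def infer_subproperties(triples: Set[Triple], super_of: Dict[str, Set[str]]) -> Set[Triple]:
    """Apply rdfs7 ((s p o) and p subPropertyOf q => (s q o))
    repeatedly until no new triples are derived."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            for q in super_of.get(p, set()):
                if (s, q, o) not in result:
                    result.add((s, q, o))
                    changed = True
    return result

super_of = {":hasMother": {":hasParent"}, ":hasFather": {":hasParent"}}
data = {(":Joe", ":hasMother", ":Mary"), (":Joe", ":hasFather", ":Tom")}
inferred = infer_subproperties(data, super_of)
print(sorted(t for t in inferred if t[1] == ":hasParent"))
# [(':Joe', ':hasParent', ':Mary'), (':Joe', ':hasParent', ':Tom')]
```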

Benefits:

Query for broader relationships
Organize vocabularies
Enable reasoning/inference

SKOS hierarchies (for concept relationships):

:Poodle skos:broader :Dog .
:Dog skos:broader :Animal .
:Dog skos:related :Pet .

Property	Meaning
`skos:broader`	More general concept
`skos:narrower`	More specific concept
`skos:related`	Associative (non-hierarchical)

Inverse, Symmetric, Transitive

OWL property characteristics:

Characteristic	Meaning	Example
Inverse	If P(A,B) then P⁻¹(B,A)	`hasChild` ↔ `hasParent`
Symmetric	If P(A,B) then P(B,A)	`marriedTo`, `knows`
Transitive	If P(A,B) and P(B,C) then P(A,C)	`ancestorOf`, `partOf`
Functional	At most one value	`hasBirthDate`
InverseFunctional	Each value identifies at most one subject	`hasSocialSecurityNumber`

Declaration example:

:knows a owl:SymmetricProperty .
:hasParent owl:inverseOf :hasChild .
:ancestorOf a owl:TransitiveProperty .
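To show what these declarations let a reasoner derive, here is a minimal Python sketch that forward-chains only the symmetric, inverse, and transitive characteristics. It is not a real OWL reasoner. It assumes plain tuples, and the individuals `:Ann` and `:Adam` are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def owl_closure(triples: Set[Triple], symmetric: Set[str],
                inverse: Dict[str, str], transitive: Set[str]) -> Set[Triple]:
    """Forward-chain three OWL property characteristics until a fixpoint."""
    # owl:inverseOf works in both directions, so register each pair both ways.
    inv = {**inverse, **{q: p for p, q in inverse.items()}}
    result = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in result:
            if p in symmetric:
                new.add((o, p, s))
            if p in inv:
                new.add((o, inv[p], s))
            if p in transitive:
                new |= {(s, p, o2) for s2, p2, o2 in result if s2 == o and p2 == p}
        if new <= result:
            return result
        result |= new

facts = {
    (":Joe", ":knows", ":Alice"),
    (":Ann", ":hasChild", ":Joe"),
    (":Adam", ":ancestorOf", ":Ann"),
    (":Ann", ":ancestorOf", ":Joe"),
}
closed = owl_closure(facts, symmetric={":knows"},
                     inverse={":hasParent": ":hasChild"},
                     transitive={":ancestorOf"})
print((":Alice", ":knows", ":Joe") in closed)      # True (symmetric)
print((":Joe", ":hasParent", ":Ann") in closed)    # True (inverse)
print((":Adam", ":ancestorOf", ":Joe") in closed)  # True (transitive)
```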

Temporal Relationships

Temporal Knowledge Graphs (TKGs) track when relationships are valid.

Representation approaches:

Quadruples (subject, predicate, object, time):

(:Joe, :worksAt, :Acme, [2020-01-01, 2023-06-15])

Qualifiers (Wikidata style):

:Joe :employer :Acme .
# Plus qualifiers: P580 (start time), P582 (end time)

Reification/RDF-star:

<<:Joe :employer :Acme>> :validFrom "2020-01-01"^^xsd:date ;
                         :validUntil "2023-06-15"^^xsd:date .

The W3C Time Ontology in OWL (OWL-Time) provides vocabulary for:

Instants vs intervals
Before, after, during relations
Duration types
Calendar systems

TKG applications:

Completion: Fill in missing facts at a time point
Forecasting: Predict future relationships
Change detection: Track relationship evolution
Historical queries: “Who was CEO in 2015?”
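A historical query reduces to an interval test once each fact carries a validity period. Here is a minimal Python sketch of the quadruple approach, assuming closed intervals and `None` for "still valid". The CEO facts are hypothetical illustration data.

```python
from datetime import date
from typing import List, Optional, Tuple

# (subject, predicate, object, valid_from, valid_until); valid_until=None means "still true"
TemporalFact = Tuple[str, str, str, date, Optional[date]]

facts: List[TemporalFact] = [
    (":Alice", ":ceoOf", ":Acme", date(2010, 3, 1), date(2016, 8, 31)),
    (":Bob", ":ceoOf", ":Acme", date(2016, 9, 1), None),
    (":Joe", ":worksAt", ":Acme", date(2020, 1, 1), date(2023, 6, 15)),
]

def valid_at(facts: List[TemporalFact], when: date) -> List[Tuple[str, str, str]]:
    """Return the facts whose closed validity interval contains `when`."""
    return [(s, p, o) for s, p, o, start, end in facts
            if start <= when and (end is None or when <= end)]

def who_held(facts: List[TemporalFact], predicate: str, obj: str, when: date) -> List[str]:
    """Answer "who had relationship `predicate` to `obj` on date `when`?"."""
    return [s for s, p, o in valid_at(facts, when) if p == predicate and o == obj]

print(who_held(facts, ":ceoOf", ":Acme", date(2015, 6, 30)))  # [':Alice']
print(who_held(facts, ":ceoOf", ":Acme", date(2024, 1, 1)))   # [':Bob']
```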

Real-World Implementations

Wikidata

Wikidata is one of the largest open knowledge graphs, and it uses a sophisticated relationship model.

Core structure:

Item (Q-number) → Property (P-number) → Value
Q80 (Tim Berners-Lee) → P108 (employer) → Q42944 (CERN)

Qualifiers: Properties that add context to statements

Q80 (Tim Berners-Lee)
  P108 (employer): Q42944 (CERN)
    P580 (start time): 1984
    P582 (end time): 1994
    P794 (as): software engineer

Key features:

Statements can have multiple qualifiers
Properties can have constraints on allowed qualifiers
Ranks (preferred, normal, deprecated) for conflicting values
References for provenance
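Ranks resolve conflicts at query time rather than by deleting data. The minimal Python sketch below is modeled on, but does not use, the Wikidata data model. It follows the rule behind Wikidata's "truthy" direct claims: preferred statements win if any exist, otherwise normal ones are used, and deprecated statements never count. The class and the population values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Statement:
    value: str
    rank: str = "normal"  # "preferred", "normal", or "deprecated"
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[str] = field(default_factory=list)

def best_values(statements: List[Statement]) -> List[str]:
    """Preferred statements win; otherwise all normal ones; deprecated never count."""
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s.value for s in statements if s.rank == "normal"]

# Hypothetical conflicting values for one property of one item.
population = [
    Statement("1200", "deprecated", {"point in time": "2010"}, ["old census"]),
    Statement("1350", "normal", {"point in time": "2015"}, ["estimate"]),
    Statement("1400", "preferred", {"point in time": "2020"}, ["latest census"]),
]
print(best_values(population))      # ['1400']
print(best_values(population[:2]))  # ['1350']
```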

Lessons from Wikidata:

Qualifiers are essential for real-world complexity
Constraints prevent data quality issues
Ranks handle contradictions elegantly
Property constraints guide data entry

Schema.org

Schema.org provides a vocabulary for structured web data.

Relationship model:

Types (classes) like Person, Organization, Event
Properties link types to values or other types
Designed for simplicity and broad adoption

Example (JSON-LD):

{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Joe",
  "worksFor": {
    "@type": "Organization",
    "name": "Acme Inc"
  }
}
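JSON-LD like this is usually generated rather than written by hand, for example for a `<script type="application/ld+json">` tag. Here is a minimal Python sketch using only the standard `json` module. `person_jsonld` is a hypothetical helper.

```python
import json

def person_jsonld(name: str, employer: str) -> str:
    """Build a schema.org Person with a nested Organization via worksFor."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "worksFor": {"@type": "Organization", "name": employer},
    }
    return json.dumps(doc, indent=2)

markup = person_jsonld("Joe", "Acme Inc")
print(markup)
```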

Key characteristics:

Shallow type hierarchy rooted at `Thing`
Pragmatic over theoretical purity
Designed for search engine consumption
Extensible via hosted extensions (such as the pending area) and external extensions

FOAF

FOAF (Friend of a Friend) pioneered social graph modeling.

Core vocabulary:

:Joe a foaf:Person ;
    foaf:name "Joe" ;
    foaf:knows :Alice ;
    foaf:mbox <mailto:joe@example.com> .

Key properties:

Property	Description
`foaf:knows`	Social connection
`foaf:mbox`	Email (for identity)
`foaf:homepage`	Personal website
`foaf:depiction`	Photo/image
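FOAF declares `foaf:mbox` an inverse functional property, so two descriptions sharing a mailbox can be treated as the same person. FOAF users call this merging "smushing". Here is a minimal Python sketch using union-find. `smush` is a hypothetical helper, and each merged group takes its alphabetically smallest node id.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

def smush(triples: List[Triple], ifp: str = "foaf:mbox") -> Set[Triple]:
    """Merge nodes sharing a value of an inverse functional property."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # IFP value -> first node seen with it
    for s, p, o in triples:
        if p == ifp:
            if o in owner:
                a, b = find(owner[o]), find(s)
                if a != b:
                    parent[max(a, b)] = min(a, b)  # keep the smaller id as representative
            else:
                owner[o] = s
    return {(find(s) if s in parent else s, p, find(o) if o in parent else o)
            for s, p, o in triples}

data = [
    ("_:a", "foaf:name", "Joe"),
    ("_:a", "foaf:mbox", "mailto:joe@example.com"),
    ("_:b", "foaf:mbox", "mailto:joe@example.com"),
    ("_:b", "foaf:homepage", "http://example.com/joe"),
]
for t in sorted(smush(data)):
    print(t)
```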

Lessons from FOAF:

Simple vocabularies get adoption
Identity is hard (email, homepage, or WebID?)
Decentralization requires global identifiers
Limited mainstream adoption outside the Linked Data community, despite good design

SKOS

SKOS (Simple Knowledge Organization System) is a W3C standard for representing taxonomies, thesauri, and classification schemes.

Core model:

:dog a skos:Concept ;
    skos:prefLabel "Dog"@en ;
    skos:altLabel "Canine"@en ;
    skos:broader :mammal ;
    skos:related :pet .

Relationship types:

Property	Semantic
`broader` / `narrower`	Hierarchical
`related`	Associative
`exactMatch`	Cross-scheme equivalence
`closeMatch`	Approximate equivalence
`broaderTransitive`	Transitive super-property of `broader` (`broader` itself is not declared transitive)

Use cases:

Library classification schemes
Corporate taxonomies
Thesaurus management
Vocabulary alignment
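A common use of these relationships is search expansion: resolve a user's term through `prefLabel` and `altLabel`, then include every narrower concept. Here is a minimal Python sketch with dicts standing in for SKOS triples. The `:poodle` concept is hypothetical in this scheme.

```python
from typing import Dict, List, Set

pref_label: Dict[str, str] = {":dog": "Dog", ":poodle": "Poodle", ":mammal": "Mammal"}
alt_labels: Dict[str, List[str]] = {":dog": ["Canine"]}
broader: Dict[str, Set[str]] = {":poodle": {":dog"}, ":dog": {":mammal"}}

def resolve(label: str) -> Set[str]:
    """Find concepts whose preferred or alternative label matches (case-insensitive)."""
    needle = label.casefold()
    return {c for c in pref_label
            if pref_label[c].casefold() == needle
            or needle in (a.casefold() for a in alt_labels.get(c, []))}

def narrower_transitive(concept: str) -> Set[str]:
    """All concepts below `concept`: the inverse of the transitive closure of skos:broader."""
    found: Set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parents in broader.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found

# A search for "canine" also covers items indexed under narrower concepts.
for c in resolve("canine"):
    print(c, narrower_transitive(c))  # :dog {':poodle'}
```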

Validation and Constraints

SHACL

SHACL (Shapes Constraint Language) validates RDF graphs against structural rules.

Core concepts:

Shapes: Describe expected structure
Targets: Which nodes a shape applies to
Constraints: Rules that must be satisfied

Example shape:

:PersonShape a sh:NodeShape ;
    sh:targetClass :Person ;
    sh:property [
        sh:path :name ;
        sh:minCount 1 ;
        sh:datatype xsd:string
    ] ;
    sh:property [
        sh:path :knows ;
        sh:class :Person ;
        sh:nodeKind sh:IRI
    ] .

Constraint types:

Constraint	Purpose
`sh:minCount` / `sh:maxCount`	Cardinality
`sh:datatype`	Value type
`sh:class`	Target node type
`sh:nodeKind`	IRI vs literal vs blank
`sh:pattern`	Regex validation
`sh:in`	Allowed values list

Key insight: SHACL uses closed world assumption (CWA), unlike OWL’s open world.
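The closed-world behavior becomes clear when the checks are written out by hand. The minimal Python sketch below mirrors `:PersonShape` above, but it is not a SHACL engine. It treats Python `str` as `xsd:string`, checks `sh:class` by direct `rdf:type` only (no subclass reasoning), and omits `sh:nodeKind`. The data is hypothetical.

```python
from typing import Any, List, Set, Tuple

Triple = Tuple[str, str, Any]

data: Set[Triple] = {
    (":Joe", "rdf:type", ":Person"), (":Joe", ":name", "Joe"), (":Joe", ":knows", ":Alice"),
    (":Alice", "rdf:type", ":Person"),  # no :name, so sh:minCount fails
    (":Bob", "rdf:type", ":Person"), (":Bob", ":name", "Bob"), (":Bob", ":knows", ":Acme"),
    (":Acme", "rdf:type", ":Organization"),  # :knows target is not a :Person
}

def values(g: Set[Triple], s: str, p: str) -> List[Any]:
    return [o for s2, p2, o in g if s2 == s and p2 == p]

def validate_person_shape(g: Set[Triple]) -> List[str]:
    """Hand-coded checks mirroring :PersonShape; missing data counts as a violation (CWA)."""
    report: List[str] = []
    targets = sorted(s for s, p, o in g if p == "rdf:type" and o == ":Person")
    for node in targets:
        names = values(g, node, ":name")
        if len(names) < 1:
            report.append(f"{node}: sh:minCount 1 on :name")
        report += [f"{node}: :name value {n!r} is not xsd:string"
                   for n in names if not isinstance(n, str)]
        for friend in values(g, node, ":knows"):
            if (friend, "rdf:type", ":Person") not in g:
                report.append(f"{node}: :knows value {friend} is not a :Person")
    return report

for line in validate_person_shape(data):
    print(line)
```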

OWL Constraints

OWL provides semantic constraints through ontology axioms.

Cardinality:

:Person rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty :hasBirthDate ;
    owl:maxCardinality 1
] .

Domain/Range (these license inferences about types; they do not reject data):

:knows rdfs:domain :Person ;
       rdfs:range :Person .

Disjointness:

:Person owl:disjointWith :Organization .

OWL vs SHACL:

Aspect	OWL	SHACL
Purpose	Inference	Validation
Assumption	Open world	Closed world
Missing data	Unknown	Violation
Use case	Reasoning	Data quality

Controversies and Open Questions

Open World vs Closed World

The fundamental divide in semantic web systems.

Open World Assumption (OWA):

Absence of information ≠ negation
“Joe doesn’t have a spouse in my data” → “I don’t know if Joe has a spouse”
Used by: RDF, OWL
Enables: Data extension, distributed knowledge

Closed World Assumption (CWA):

Absence of information = negation
“Joe doesn’t have a spouse in my data” → “Joe has no spouse”
Used by: Databases, SHACL, most applications
Enables: Definite answers, validation

Practical implications:

OWL cardinality constraints don’t validate, they infer
SHACL constraints validate against CWA
Most applications expect CWA behavior
Mixing assumptions causes confusion
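The first two implications can be seen side by side on one cardinality rule. This minimal Python sketch uses a hypothetical object property `:hasBiologicalMother` with a maximum of one value. With two values, an OWL reasoner (open world, no unique names) concludes that the two IRIs denote the same individual. A SHACL-style check (closed world) reports a violation instead. For literal values, such as two different birth dates, OWL would instead find the data inconsistent, because distinct literals cannot be identified.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

data: Set[Triple] = {
    (":Joe", ":hasBiologicalMother", ":Mary"),
    (":Joe", ":hasBiologicalMother", ":MaryS"),
}

def objects(g: Set[Triple], s: str, p: str) -> List[str]:
    return sorted(o for s2, p2, o in g if s2 == s and p2 == p)

def owl_reading(g: Set[Triple], s: str, p: str) -> Set[Triple]:
    """OWA without unique names: maxCardinality 1 on an object property lets a
    reasoner infer that all the values denote one individual."""
    vals = objects(g, s, p)
    return {(a, "owl:sameAs", b) for a in vals for b in vals if a < b}

def shacl_reading(g: Set[Triple], s: str, p: str) -> List[str]:
    """CWA: more than one value is simply a violation of sh:maxCount 1."""
    vals = objects(g, s, p)
    return [f"{s}: sh:maxCount 1 on {p}, found {len(vals)}"] if len(vals) > 1 else []

print(owl_reading(data, ":Joe", ":hasBiologicalMother"))  # {(':Mary', 'owl:sameAs', ':MaryS')}
print(shacl_reading(data, ":Joe", ":hasBiologicalMother"))
```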

Unique Name Assumption (UNA):

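CWA systems such as databases typically assume different names = different entities
OWL makes no unique name assumption: two IRIs may denote the same entity unless declared owl:differentFrom
owl:sameAs links, which assert that two IRIs denote the same entity, are common in Linked Data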

Blank Nodes

Blank nodes are anonymous nodes without global identifiers—a source of ongoing controversy.

The case for blank nodes:

Represent existential statements (“Joe has a parent”)
Avoid minting unnecessary URIs
Common in real data (25% of RDF terms in surveys)

The problems:

Not globally referenceable
Comparing graphs gets hard once blank nodes are involved: isomorphism checking is GI-complete and simple entailment is NP-complete
SPARQL results can differ for “equivalent” graphs
Inconsistent semantics across W3C specs

Skolemization (the standard mitigation, described in RDF 1.1): Replace blank nodes with generated IRIs:

# Before
:Joe :knows _:b1 .
_:b1 :name "Mystery Person" .

# After (skolemized)
:Joe :knows <http://example.org/.well-known/genid/abc123> .
<http://example.org/.well-known/genid/abc123> :name "Mystery Person" .

Best practice: Avoid blank nodes when possible; use skolemization when they’re necessary.
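Skolemization is a simple rewrite, provided each blank node label maps consistently to one new IRI. Here is a minimal Python sketch that assumes terms are strings with blank nodes written `_:label`. It mints identifiers with `uuid4` under the `.well-known/genid/` base from the example; in practice the base must be under an authority you control. `skolemize` is a hypothetical helper. Predicates are left untouched because RDF predicates cannot be blank nodes.

```python
import uuid
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def skolemize(triples: List[Triple],
              base: str = "http://example.org/.well-known/genid/") -> List[Triple]:
    """Replace each blank node label ("_:x") with a fresh IRI under `base`.
    The same label always maps to the same IRI, so the graph's shape is preserved."""
    mapping: Dict[str, str] = {}

    def skolem(term: str) -> str:
        if not term.startswith("_:"):
            return term
        if term not in mapping:
            mapping[term] = f"<{base}{uuid.uuid4().hex}>"
        return mapping[term]

    return [(skolem(s), p, skolem(o)) for s, p, o in triples]

before = [(":Joe", ":knows", "_:b1"), ("_:b1", ":name", '"Mystery Person"')]
after = skolemize(before)
for t in after:
    print(t)
```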

Community Fragmentation

The semantic web community lacks consensus on fundamental questions.

Key tensions:

Expressivity vs Practicality
- One camp: Rich ontologies, inference, formal semantics
- Other camp: Simple linked data, minimal overhead
- Result: Disconnected toolchains and communities
Standards vs Reality
- W3C specs are complex and sometimes inconsistent
- Real-world usage often simpler than standards allow
- “RDF in the wild” differs from textbook RDF
Promise vs Delivery
- Early semantic web vision: Intelligent agents reasoning over web data
- Practical successes: Schema.org SEO, enterprise knowledge graphs
- Gap between academic research and production systems

What actually worked:

Schema.org (simple, broad adoption, search engine support)
Knowledge graphs at Google, Microsoft, Amazon (closed systems)
Wikidata (open, well-maintained, funded)
Library/museum linked data (domain-specific)

Anti-Patterns and Pitfalls

Common Modeling Mistakes

1. Over-reification

# Bad: Reifying everything
:s1 a rdf:Statement ; rdf:subject :Joe ; rdf:predicate :likes ; rdf:object :Pizza .

# Good: Only reify when you need metadata
:Joe :likes :Pizza .

2. Modeling values as relationships

# Bad: Unnecessary indirection
:Joe :hasAge :age1 .
:age1 :value 30 .

# Good: Direct literal
:Joe :age 30 .

3. Bidirectional relationship duplication

# Bad: Redundant triples
:Joe :knows :Alice .
:Alice :knows :Joe .

# Good: Model once, query both directions
:Joe :knows :Alice .  # Use inverse traversal in queries

4. Ignoring existing vocabularies

# Bad: Inventing your own
:Joe :personName "Joe" .

# Good: Reuse standards
:Joe foaf:name "Joe" .

5. Flat vs deep hierarchies

# Bad: Everything is a direct subclass of Thing
:Dog rdfs:subClassOf :Thing .
:Cat rdfs:subClassOf :Thing .
:Poodle rdfs:subClassOf :Thing .

# Good: Proper hierarchy
:Poodle rdfs:subClassOf :Dog .
:Dog rdfs:subClassOf :Mammal .

Structural Anti-Patterns

Anti-pattern	Problem	Solution
Blank node soup	Unqueryable, unmergeable	Use IRIs or skolemize
Property proliferation	Too many predicates	Use qualified relations
Missing inverse declarations	Incomplete inference	Declare `owl:inverseOf`
Implicit typing	Nodes without `rdf:type`	Always type nodes
Literal abuse	URIs stored as strings	Use proper IRI references

Query Anti-Patterns

1. Not using OPTIONAL correctly

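Forgetting OWA: a required triple pattern for data that may be missing silently drops the whole result row instead of raising an error; wrap such patterns in OPTIONAL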

2. Expensive blank node patterns

Blank nodes in query patterns behave like undistinguished variables; patterns that chain many of them force large joins and can be very slow

3. Ignoring graph partitioning

Querying across all named graphs when not necessary
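The difference behind item 1 is the difference between an inner join and a left outer join. Here is a minimal Python sketch in which a hypothetical `email` mapping stands in for `:email` triples. A required pattern silently drops `:Alice`, while `OPTIONAL` keeps her with the variable unbound.

```python
from typing import Dict, List, Optional, Tuple

people = [":Joe", ":Alice"]
email: Dict[str, str] = {":Joe": "mailto:joe@example.com"}  # :Alice has no email recorded

def inner_join(subjects: List[str]) -> List[Tuple[str, str]]:
    """Like a required `?p :email ?e` pattern: subjects without a match vanish."""
    return [(p, email[p]) for p in subjects if p in email]

def optional_join(subjects: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Like OPTIONAL { ?p :email ?e }: keep every subject and leave ?e unbound (None)."""
    return [(p, email.get(p)) for p in subjects]

print(inner_join(people))     # [(':Joe', 'mailto:joe@example.com')]
print(optional_join(people))  # [(':Joe', 'mailto:joe@example.com'), (':Alice', None)]
```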

Best Practices Summary

Relationship Design Checklist

Start with use cases
- What questions must the graph answer?
- Let queries drive the model
Reuse existing vocabularies
- Schema.org for general concepts
- Domain-specific ontologies (FOAF, Dublin Core, etc.)
- Only invent when necessary
Choose the right pattern
- Simple binary? → Direct triple
- Need metadata? → RDF-star or qualified relation
- Multiple participants? → N-ary relation
- Grouping statements? → Named graphs
Define property characteristics
- Symmetric relationships → declare owl:SymmetricProperty
- Inverse pairs → declare owl:inverseOf
- Hierarchies → use rdfs:subPropertyOf
Add temporal context when relevant
- Validity periods for changing relationships
- Timestamps for events
- Use standard time ontology
Validate with SHACL
- Define shapes for expected structure
- Catch constraint violations early
- Document expected cardinality
Avoid blank nodes
- Use IRIs for referenceable entities
- Skolemize when blank nodes are unavoidable
- Be aware of query performance implications

Model Complexity Spectrum

Complexity	When to use	Example
Simple triple	Static, unqualified facts	`:Joe :knows :Alice`
Typed triple	Needs class information	`:Joe a :Person`
Qualified relation	Needs context/metadata	Employment with dates
N-ary relation	Multiple participants	Purchase transaction
Named graph	Provenance, trust, versioning	Data source tracking

Technology Selection

Need	Recommendation
Semantic reasoning	RDF + OWL
Performance-critical	Property graph (Neo4j)
Web publishing	JSON-LD + Schema.org
Validation	SHACL
Hierarchies/taxonomies	SKOS
Edge properties	RDF-star or property graph
