Overview

Shapes are agentOS’s ontology. A shape declares what a record looks like — field names, types, relations to other records, display rules. Think of shapes as ISO standards: skills that write to the graph and apps that read from it agree on the shape of a message, a product, a person. The engine itself is shape-agnostic — it’s a generic entity store.

Shapes live in shapes/*.yaml. They exist to:

Document the ontology — so skill authors and app authors agree on field names, types, and relations.
Drive codegen — generate.py turns YAML into typed classes in skills-sdk/agentos/_generated.py (Python TypedDict) and apps-sdk/src/shapes.ts (TypeScript interfaces).
Stay human-reviewable — PR review catches bad field design before it ships.

The engine doesn’t load shape YAML. Skills return conformant dicts; apps read whatever is in the graph. If no skill has ever returned a post, there is no post anywhere — the YAML is the standard, the graph is the data.

Authoring a shape

Write the YAML in shapes/<name>.yaml — the source of truth for the ontology.

Run codegen so SDK consumers get updated types:

python generate.py             # Python + TypeScript

Use it in a skill — @returns("your_shape") + return a conformant dict.
That’s it. No engine restart, no graph seeding, no “push shape to graph” step. The YAML is the standard; skills comply; the engine moves data through.

To update a shape, edit the YAML and re-run codegen. Renaming or removing fields is a breaking change — update consumers at the same time.
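
As a rough illustration of what the codegen step does, here is a minimal sketch that renders a shape's fields as TypedDict source text. It is not the real generate.py: the names TYPE_MAP, py_type, and render_typeddict are hypothetical, the shape is passed in as an already-parsed dict to avoid a YAML dependency, and mapping datetime to str and using total=False are assumptions.

```python
# Hypothetical sketch of YAML -> TypedDict codegen; not the real generate.py.
TYPE_MAP = {
    "string": "str", "text": "str", "url": "str", "datetime": "str",
    "integer": "int", "number": "float", "boolean": "bool", "json": "Any",
}

def py_type(shape_type: str) -> str:
    """Map a shape field type such as 'string[]' to a Python annotation."""
    if shape_type.endswith("[]"):
        return f"list[{py_type(shape_type[:-2])}]"
    return TYPE_MAP[shape_type]

def render_typeddict(name: str, spec: dict) -> str:
    """Render the declared fields of one shape as TypedDict source text."""
    lines = [f"class {name.title()}(TypedDict, total=False):"]
    for field, shape_type in spec.get("fields", {}).items():
        lines.append(f"    {field}: {py_type(shape_type)}")
    if len(lines) == 1:
        lines.append("    pass")  # a shape with no declared fields
    return "\n".join(lines)

# Already-parsed stand-in for shapes/product.yaml
product = {"fields": {"price": "string", "categories": "string[]", "available": "boolean"}}
generated = render_typeddict("product", product)
print(generated)
```

Because the output is plain source text, renaming a field in the YAML changes the generated class, which is why consumers need updating in the same change.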

Format

# A purchasable item. Base type for book and aircraft.
#
# Example sources: Amazon, Uber Eats (grocery), Open Food Facts
product:
  plural: products                    # UI plural form
  subtitle: brand                     # field used as row subtitle in list views
  also: [other_shape]                 # "a product is also a ..." (optional, transitive)
  identity: [at, id]                  # dedupe keys for upsert (optional;
                                       # may include relations — see Identity below)
  fields:
    price:           string           # formatted price
    priceAmount:     number           # numeric price for math
    currency:        string
    categories:      string[]
    available:       boolean
  relations:
    brand:           brand             # single relation → brand
    manufacturer:    organization      # single relation → organization
    tagged:          tag[]             # array relation → tag nodes

Shape YAML is a single top-level key (the shape name) containing metadata + fields + relations. Top-level metadata keys: plural, subtitle, icon, also, identity, identity_any.

`also` (shape implication)

Declares that this shape is also another shape. An email is also a message. A book is also a product. When the engine applies shape email to a node, it transitively applies message too — both shape memberships are written to node_shapes. Both shapes’ fields contribute to the record’s type context.

also is transitive: if A is also B and B is also C, then A is also B and C.
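
The transitive rule can be sketched as a small graph walk. This is an illustration only: implied_shapes and the ALSO mapping are hypothetical names, and the engine's actual resolution code is not shown in this document.

```python
def implied_shapes(shape: str, also: dict) -> list:
    """Return every shape implied by `shape`, following `also` transitively."""
    seen = []
    queue = list(also.get(shape, []))
    while queue:
        current = queue.pop(0)
        if current in seen or current == shape:
            continue  # guards against duplicates and accidental cycles
        seen.append(current)
        queue.extend(also.get(current, []))
    return seen

# Hypothetical `also` declarations, keyed by shape name
ALSO = {"A": ["B"], "B": ["C"], "email": ["message"]}
print(implied_shapes("A", ALSO))      # ['B', 'C']
print(implied_shapes("email", ALSO))  # ['message']
```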

Field types

Type	Stored as	Notes
`string`	text	Short text
`text`	text	Long text, FTS eligible
`integer`	digits	Parsed from strings, floats truncated
`number`	decimal	Parsed from strings
`boolean`	true/false	Coerced from 1/0, “yes”/“no”, “true”/“false”
`datetime`	ISO 8601	Unix timestamps auto-converted, human dates parsed
`url`	text	Stored as-is, rendered as clickable link
`string[]`	JSON array	Each element coerced to string
`integer[]`	JSON array	Each element coerced to integer
`json`	JSON string	Opaque blob, no coercion
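
To make a few rows of the table concrete, here is a minimal sketch of the boolean, integer, and datetime coercions. The function names are hypothetical and the details (case-insensitive matching, UTC output, timestamps in seconds) are assumptions, not documented engine behavior.

```python
from datetime import datetime, timezone

def coerce_boolean(value):
    """Coerce 1/0, "yes"/"no", "true"/"false" to a bool (case-insensitive by assumption)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"1", "yes", "true"}:
        return True
    if text in {"0", "no", "false"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")

def coerce_integer(value):
    """Parse strings and truncate floats toward zero."""
    return int(float(value))

def coerce_datetime(value):
    """Convert a Unix timestamp in seconds to an ISO 8601 UTC string; pass strings through."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
    return value

print(coerce_boolean("Yes"), coerce_integer("3.9"), coerce_datetime(0))
```

Going through float keeps the sketch short but loses precision for very large integers; a real implementation would likely handle pure digit strings separately.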

Standard fields

These are available on every record without declaring them in a shape:

Field	Type	Purpose
`id`	string	Record identifier
`name`	string	Primary label
`text`	text	Short summary
`url`	url	Canonical link
`image`	url	Thumbnail
`author`	string	Creator
`published`	datetime	Temporal anchor
`content`	text	Long body text (FTS, stored separately)

Relations

Relations declare connections to other records. Keys are edge labels, values are target shapes (shape or shape[] for arrays).

When a skill returns a nested dict under a relation key, the engine extracts it as a child node and creates an edge from parent to child. See Typed references below.
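
The extraction step can be pictured as splitting one returned dict into its own fields, its child records, and the edges between them. The sketch below is an assumption about the mechanics, not engine code; split_relations is a hypothetical name, and every record is assumed to carry an id.

```python
def split_relations(record: dict, relations: dict):
    """Split a returned dict into own fields, child records, and (parent, label, child) edges."""
    fields, children, edges = {}, [], []
    for key, value in record.items():
        if key not in relations:
            fields[key] = value          # plain field, stays on the parent
            continue
        if value is None:
            continue                     # empty relation, nothing to extract
        target = relations[key].removesuffix("[]")
        for child in (value if isinstance(value, list) else [value]):
            # a "shape" key on the child overrides the declared target shape
            children.append({"shape": target, **child})
            edges.append((record["id"], key, child["id"]))
    return fields, children, edges

email = {
    "id": "m1",
    "name": "Hi",
    "from": {"id": "a@example.com"},
    "to": [{"id": "b@example.com"}, {"id": "c@example.com"}],
}
fields, children, edges = split_relations(email, {"from": "account", "to": "account[]"})
print(edges)
```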

Identity

identity declares the keys used to dedupe records. When two different skill calls return the same identity tuple, the engine upserts instead of creating duplicates.

message:
  identity: [at, id]         # all keys must match to count as the same record
  identity_any: [email]      # OR: any single key in this list is sufficient

Use identity for compound keys (must all match) and identity_any for alternative keys (any one matches). Identity keys may be fields (string/number values) or relations (edges to other nodes — see “Identity via relation” below).
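
The difference between the two key lists is easiest to see in a toy upsert. The following is a minimal in-memory sketch, assuming a record matches when either its full identity tuple or any single identity_any key has been seen before; Store and identity_keys are hypothetical names, and "node:gmail" stands in for whatever the at key resolves to.

```python
def identity_keys(record: dict, identity: list, identity_any: list) -> list:
    """Dedupe keys a record offers: one compound key, plus one per alternative key."""
    keys = []
    if identity and all(record.get(k) is not None for k in identity):
        keys.append(("all", tuple(record[k] for k in identity)))
    for k in identity_any:
        if record.get(k) is not None:
            keys.append(("any", k, record[k]))
    return keys

class Store:
    """Toy in-memory store; the real engine's storage is not shown in this document."""

    def __init__(self, identity, identity_any=()):
        self.identity, self.identity_any = list(identity), list(identity_any)
        self.records = []
        self.index = {}

    def upsert(self, record: dict) -> dict:
        keys = identity_keys(record, self.identity, self.identity_any)
        existing = next((self.index[k] for k in keys if k in self.index), None)
        if existing is None:          # no key matched: create a new record
            existing = {}
            self.records.append(existing)
        existing.update(record)       # otherwise merge into the matched record
        for k in identity_keys(existing, self.identity, self.identity_any):
            self.index[k] = existing
        return existing

messages = Store(identity=["at", "id"], identity_any=["email"])
messages.upsert({"at": "node:gmail", "id": "1", "name": "first"})
messages.upsert({"at": "node:gmail", "id": "1", "name": "second"})   # same tuple, merged
messages.upsert({"email": "a@example.com", "id": "9"})               # no `at`, new record
messages.upsert({"email": "a@example.com", "name": "third"})         # matches via email
print(len(messages.records))
```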

Identity via relation

For platform-scoped shapes (account, message, post, email, etc.), one identity key is a relation to another node, not a string. The convention:

account:
  identity: [at, identifier]
  relations:
    at:    actor   # the namespace this account exists within
                   # (organization, product, software, or protocol node)

This is identity-via-relation: the dedupe tuple is (node_id_of_at_target, identifier_string). Two skills returning at: <github.com product node>, identifier: octocat resolve to the same account, regardless of how each skill named github.com.

Why not strings? A string-keyed identity (e.g. [platform, id] where platform: "github.com") fragments the graph the moment two skills disagree on naming (github.com vs git/github vs vcs/git), and silently merges different entities when handles get reused. Companies rebrand (Twitter → X), banks merge (First Republic → Chase), instances move — the node persists across these events; a string doesn’t. Identity-via-relation makes “what bank operates this account today” a graph traversal (account → at → mergedInto), which is impossible with strings.

Skills declare relation-keyed identity inline by passing a node descriptor where a string would otherwise go:

# inline upsert: the engine resolves the at target node first, then keys identity off it
return {
    "identifier": "octocat",
    "at": {"shape": "product", "url": "https://github.com"},
}

Multi-shape inline targets pass shape: ["product", "software"] (e.g. github.com is both a product and software).

Required nodes for identity targets. A skill’s at target node must be resolvable — either pre-existing (seeded) or upsertable inline. The engine resolves nested upserts in dependency order.
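
A minimal sketch of that dependency order: resolve the at target to a node first, then build the account's dedupe tuple from the node id. Keying the target node by its url is an assumption made for this illustration, and resolve_at, upsert_account, and the two dicts are hypothetical names.

```python
import itertools

_ids = itertools.count(1)
nodes_by_url = {}   # stand-in for however the engine looks up the `at` target
accounts = {}       # (node_id_of_at_target, identifier) -> account fields

def resolve_at(descriptor: dict) -> int:
    """Return the node id for an inline `at` descriptor, creating the node if needed."""
    url = descriptor["url"]
    if url not in nodes_by_url:
        nodes_by_url[url] = next(_ids)
    return nodes_by_url[url]

def upsert_account(record: dict) -> tuple:
    """Resolve the nested target first, then key the account off its node id."""
    key = (resolve_at(record["at"]), record["identifier"])
    own_fields = {k: v for k, v in record.items() if k != "at"}
    accounts.setdefault(key, {}).update(own_fields)
    return key

# Two skills describe the same target differently but land on the same account.
k1 = upsert_account({"identifier": "octocat",
                     "at": {"shape": "product", "url": "https://github.com"}})
k2 = upsert_account({"identifier": "octocat", "name": "The Octocat",
                     "at": {"shape": ["product", "software"], "url": "https://github.com"}})
print(k1 == k2, len(accounts))
```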

Subtitle, plural, icon

subtitle — the field name to display as a secondary label in list views (e.g. subtitle: brand on product shows the brand under the name).
plural — the plural form of the shape name (e.g. products for product). Used by UI and by agent-sdk shapes listings.
icon — optional icon name for UI rendering.

Design Principles

Tactical principles — concrete patterns for field types, relations, and display. Use the review checklist below after writing or editing a shape. For the higher-order framing (“a shape describes what something IS, not where it came from”), see Shape design principles.

1. Entities over fields

If a field value is itself a thing with identity, it should be a relation to another shape, not a string field.

Bad: shipping_address: string (an address is a thing)
Good: shipping_address: place (a relation to a place record)

Bad: email: string on a person (an email is an account)
Good: accounts: account[] relation on person

Ask: “Could this field value have its own page?” If yes, it’s a relation.

2. Separate identity from role

A person doesn’t have a job title. A person holds a role at an organization for a period of time. The role is the relationship, not a field on the person.

Bad: job_title: string on person
Good: roles: role[] relation where each role record carries title, organization, start_date, end_date

Same pattern applies to education, membership, authorship. If it has a time dimension or involves another entity, it’s a role/relationship, not a field.

3. Currency always accompanies price

Any field representing a monetary amount needs a companion currency field. Never assume USD.

Bad: price_amount: number alone
Good: price_amount: number + currency: string

4. URLs that reference other things are relations

The standard url field is the record’s own canonical link. But URLs that point to other things should be relations to the appropriate shape.

Bad: website: url on an organization (a website is its own entity)
Good: website: website relation

Bad: external_url: url on a post (the linked page is a thing)
Good: links_to: webpage relation

Ask: “Is this URL the record itself, or does it point to something else?”

Record’s own link: keep as url (standard field)
Points to another thing: make it a relation

5. Keep shapes domain-agnostic

A shape should describe the kind of thing, not the source it came from. Flight details don’t belong on an offer shape. Browser-specific fields don’t belong on a webpage shape.

Bad: total_duration: integer, flights: json, layovers: json on offer (that’s a flight, not an offer)
Good: offer has price + currency + offer_type. Flight is its own shape. Offer relates to flight.

6. Use `also` for genuine “is-a” relationships

also means shape implication: applying shape A to a node also applies shape B. Use it when querying by B should include A.

Good uses:

email also message (querying messages should include emails)
video also post (querying posts should include videos)
book also product (querying products should include books)
review also post (querying posts should include reviews)

Bad uses:

Don’t use also just because shapes share some fields
Don’t create deep chains (A also B also C also D) — keep it shallow

7. Author is a shape, not just a string

The standard author field is a string for convenience. But when the author is a real entity with their own identity (a book author, a blog writer, a video creator), use a relation to the author or account shape.

Quick attribution: author: "Paul Graham" (standard string field)
Rich attribution: written_by: author or posted_by: account (relation)

Both can coexist — the string is for display, the relation is for traversal.

8. Address/Place is structured, not a string

Physical locations should be a place shape with structured fields (name, street, city, region, postal_code, country, coordinates). Inspired by Mapbox’s geocoding model.

9. Playlists, shelves, and lists belong to accounts

Any collection (playlist, shelf, list, board) should have a belongs_to: account relation. Collections are owned.

10. Use ISO standards for standardized values

When a field represents something with an international standard, use the standard code:

Human languages — ISO 639-1 codes (en, es, ja), optionally followed by an ISO 3166-1 region as in BCP 47 tags (pt-BR). Applies to transcript.language, webpage.language, content language fields. NOT programming languages (those use conventional names like Python, Rust).
Countries — ISO 3166-1 alpha-2 codes (US, GB, JP). Use country_code field.
Currencies — ISO 4217 codes (USD, EUR, JPY). Use currency field.
Timezones — IANA timezone names (America/New_York, Europe/London).

Don’t enforce via enum (too many values). Document the convention and let agentos test flag non-compliant values. Run agent-sdk validate to check skill return dicts against their declared shape.
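
A convention check of this kind can stay lightweight by testing only the format of a value, not its membership in the official code list. The sketch below is an illustration under that assumption; CONVENTIONS and convention_warnings are hypothetical names and do not describe what agentos test or agent-sdk validate actually do.

```python
import re

# Format-only patterns: they do not prove a code is actually assigned.
CONVENTIONS = {
    "language": re.compile(r"[a-z]{2}(-[A-Z]{2})?"),   # en, ja, pt-BR
    "country_code": re.compile(r"[A-Z]{2}"),           # US, GB, JP
    "currency": re.compile(r"[A-Z]{3}"),               # USD, EUR, JPY
}

def convention_warnings(record: dict) -> list:
    """Return one warning per field whose value does not look like a standard code."""
    return [
        f"{field}: {record[field]!r} does not match the expected format"
        for field, pattern in CONVENTIONS.items()
        if field in record and not pattern.fullmatch(str(record[field]))
    ]

print(convention_warnings({"language": "en", "country_code": "US", "currency": "usd"}))
```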

11. Separate content from context (NEPOMUK principle)

A video is a file. The social engagement around it is a post. A transcript is text. The meeting it came from is the context. Don’t mix artifact properties with social properties on the same shape.

Bad: video has view_count, like_count, comment_count, posted_by (those are social context)
Good: video is a file with duration + resolution. A post contains the video and carries the engagement.

Ask: “If I downloaded this to my hard drive, which fields would still make sense?” Those are the artifact fields. Everything else is context that belongs on a wrapper entity.

12. Comments are nested posts, not a separate shape

A comment is a post that replies_to another post. A reply to a message is still a message. Don’t create separate shapes for nested versions of the same thing — use the replies_to relation to express the hierarchy.

13. Booleans describe state, relations describe lineage

is_fork: boolean tells you nothing. forked_from: repository tells you the lineage. If a boolean implies a relationship to another entity, model the relationship instead.

Bad: is_fork: boolean (from what?)
Good: forked_from: repository (the source is traversable)

14. Booleans that encode direction are really relationships

is_outgoing: boolean on a message means “I sent this.” But that information already lives in the from: account relation — if the from account is the user, it’s outgoing. Don’t duplicate relationship semantics as boolean flags.

Bad: is_outgoing: boolean on message
Good: from: account relation (direction is derived by comparing from to the current user)

Same pattern: is_sent, is_received, is_mine — all derivable from a directional relation.

15. Booleans that encode cardinality are derivable

is_group: boolean on a conversation means “has more than two participants.” That’s not state — it’s a count. Don’t store what you can derive from the structure.

Bad: is_group: boolean on conversation
Good: participant: account[] relation (is_group is len(participant) > 2)

Same pattern: has_attachments (derive from attachment: file[]), has_unread (derive from messages), is_empty (derive from children).
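
Principles 14 and 15 both come down to computing a flag at read time instead of storing it. A minimal sketch, assuming records are plain dicts with the relation keys used in the examples above (from, participant, attachment); the helper names are hypothetical.

```python
def is_outgoing(message: dict, current_user_id: str) -> bool:
    """Direction is derived from the `from` relation, not stored as a flag."""
    sender = message.get("from") or {}
    return sender.get("id") == current_user_id

def is_group(conversation: dict) -> bool:
    """A group is a conversation with more than two participants."""
    return len(conversation.get("participant", [])) > 2

def has_attachments(message: dict) -> bool:
    """Derived from the `attachment` relation being non-empty."""
    return bool(message.get("attachment"))

message = {"id": "m1", "from": {"id": "me"}, "attachment": []}
conversation = {"id": "c1", "participant": [{"id": "me"}, {"id": "joe"}, {"id": "ann"}]}
print(is_outgoing(message, "me"), is_group(conversation), has_attachments(message))
```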

16. Source data doesn’t dictate shape

A skill’s source (API, database, scrape) returns whatever it returns. That doesn’t constrain the shape. The Python function is the transformation boundary — it takes raw source data and returns shape-native dicts.

Apple Contacts gives flat strings: Organization: Anthropic, Title: Engineer. That doesn’t mean person gets organization: string. It means the skill transforms those strings into a roles: role[] typed ref.

Bad: “The API returns platform: string, so the shape needs a platform field”
Good: “What kind of thing is this? Model it correctly. The skill transforms source data to fit.”

Design shapes for the domain, not for the source. Every skill file is a template — other agents copy the patterns they see.
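
The Apple Contacts case might look like the sketch below. The raw keys Organization and Title come from the example above; the id and name keys on the raw record, the function name contact_to_person, and the exact layout of the role dict are assumptions for illustration.

```python
def contact_to_person(raw: dict) -> dict:
    """Turn flat contact strings into a person dict with a roles typed ref."""
    person = {"id": raw["id"], "name": raw["name"]}
    organization, title = raw.get("Organization"), raw.get("Title")
    if organization:
        # The role is the relationship; the organization is its own record.
        role = {"organization": {"shape": "organization", "name": organization}}
        if title:
            role["title"] = title
        person["roles"] = [role]
    return person

raw_contact = {"id": "c-42", "name": "Joe", "Organization": "Anthropic", "Title": "Engineer"}
person = contact_to_person(raw_contact)
print(person["roles"])
```

A contact with no Organization simply gets no roles key, so partial source data does not produce an empty role.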

17. Model life like LinkedIn, not like a spreadsheet

People have roles at organizations. Roles have titles, departments, start dates, end dates. Education is a role at a school. Membership is a role in a community. Authorship is a role on a publication.

The LinkedIn mental model: a person has a timeline of positions, each connecting them to an organization with a title and time range. This is principle #2 made concrete.

person --roles--> role[] --organization--> organization
                         --title: "Engineer"
                         --department: "Research"
                         --start_date: 2024-01-15
                         --end_date: null (current)

This applies broadly: board membership, team membership, project assignment, course enrollment. If a relationship has a time dimension or a title, it’s a role.

Review Checklist

After writing or editing a shape, ask yourself:

Returning shape-native data from operations

When an operation declares returns: email[], the Python function returns dicts whose keys match the shape. The shape is the contract — no separate mapping layer sits between the Python code and the engine.

# shapes/email.yaml
email:
  also: [message]
  fields:
    from_email: string
    to: string[]
    cc: string[]
    labels: string[]
    thread_id: string
  relations:
    from: account
    conversation: conversation
  display:
    title: name
    subtitle: from_email
    date: published

# gmail.py — @returns points at the shape, function returns email-shaped dicts directly
from agentos import returns, connection

@returns("email")
@connection("gmail")
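async def get_email(*, id: str, **params) -> dict:
    # msg_id, subject, snippet, etc. are extracted from the Gmail API response for `id` (fetch omitted)
    return {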
        "id": msg_id,
        "name": subject,              # standard field
        "text": snippet,              # standard field
        "url": web_url,               # standard field
        "published": date,            # standard field
        "content": body_text,         # standard field (FTS)
        "from_email": sender,         # shape-specific field
        "to": recipients,             # shape-specific field
        "labels": label_ids,          # shape-specific field
    }

The Python code does the field mapping — it transforms raw API responses into shape-native dicts. Standard fields (id, name, text, url, image, author, published, content) are available on every shape without declaring them.

Canonical fields

The renderer resolves entity display from standard fields. Every Python return should populate as many of these as the source data supports — they drive consistent previews, detail views, and search results across all skills.

Field	Purpose
`name`	Primary label / title
`text`	Short summary or snippet for preview rows
`url`	Clickable link
`image`	Thumbnail / hero image
`author`	Creator / brand / owner
`published`	Temporal anchor
`content`	Long body text (stored separately, FTS-indexed)

Not every entity has all of these — a product may have no published, an order may have no image. Map what the source provides; skip what doesn’t apply.

Typed references (entity relationships)

To create linked entities and graph edges, return flat nested dicts (no wrapper). The relation key in the YAML shape declares the target type; the nested dict is the child record.

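from agentos import returns

@returns("email")
def get_email(*, id: str, **params) -> dict:
    # msg_id, subject, and sender_* come from the source response (fetch omitted);
    # recipients is a list of (address, name) pairs
    return {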
        "id": msg_id,
        "name": subject,
        # Single relation — shape YAML says `from: account`,
        # so a flat dict here becomes: email --from--> account
        "from": {
            "id": sender_email,
            "handle": sender_email,
            "displayName": sender_name,
        },
        # Array relation — shape YAML says `to: account[]`,
        # so a flat list here becomes: email --to--> account (one edge per recipient)
        "to": [
            {"id": addr, "handle": addr, "displayName": name}
            for addr, name in recipients
        ],
    }

Polymorphic children — shape hint. If a relation target is a supertype (like participant: actor[], where an actor might be a person OR an organization OR an agent), the child dict can include a shape key to declare the concrete shape:

return {
    "id": thread_id,
    "name": thread_title,
    # shape YAML: participant: actor[]
    "participant": [
        {"shape": "person",       "id": "joe",       "name": "Joe"},
        {"shape": "organization", "id": "anthropic", "name": "Anthropic"},
    ],
}

shape accepts a string for one shape or a list for multi-shape membership: "shape": ["product", "software"].

Identity — _identity hint. For child records, the engine decides dedupe keys from the YAML shape’s identity declaration. If you want to override or be explicit, include _identity with a list of field names:

"from": {
    "_identity": ["platform", "id"],
    "platform": "gmail",
    "id": msg_id,
    "handle": sender_email,
}

A relation is collapsed to null if none of its identity fields survive — partial data doesn’t create ghost entities.
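
The collapse rule can be sketched as a small guard applied to each child before it is stored. This is an illustration, not engine code: collapse_ghost is a hypothetical name, and treating empty strings as missing is an assumption.

```python
def collapse_ghost(child, identity: list):
    """Return None when the child carries none of its identity fields."""
    if child is None:
        return None
    keys = child.get("_identity", identity)   # an explicit hint overrides the shape's keys
    survived = [k for k in keys if child.get(k) not in (None, "")]
    return child if survived else None

print(collapse_ghost({"displayName": "Joe", "id": None}, ["id"]))   # None
print(collapse_ghost({"id": "joe@example.com"}, ["id"]))            # kept as-is
```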

Validation

Shape conformance is checked statically, pre-commit.

agent-sdk validate parses Python return dict literals via AST and warns if keys don’t match the declared shape. It:

Loads shape YAML from shapes/
Walks each skill’s *.py files looking for @returns("shape")-decorated functions
Checks every dict literal the tool returns against the shape’s declared fields + relations
Flags unknown keys, missing identity fields (id / name), and scalar-fields-that-should-be-relations

Runs automatically on every commit via pre-commit hook. Run manually:

agent-sdk validate                 # audit every skill under cwd
agent-sdk validate claude          # single skill by id or directory
agent-sdk validate --all           # walk the whole skills/ tree
agent-sdk validate --sandbox       # only the banned-import sandbox check

# Or without install:
python3 -m agentos validate --all

Dict-literal returns get checked statically. Helper functions, dynamic construction, and composition through _call aren’t caught here — integration testing via MCP is the catch-all. If a tool ships bad data, fix the tool and add a regression test.
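
The static check described above can be approximated in a few lines with Python's ast module. The sketch below only flags unknown keys in dict-literal returns; check_returns is a hypothetical name, the allowed-key sets are passed in directly instead of being loaded from shapes/, and it does not reproduce the other checks agent-sdk validate performs.

```python
import ast

def check_returns(source: str, shapes: dict) -> list:
    """Warn about dict-literal return keys that the @returns shape does not declare."""
    warnings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        shape = None
        for dec in func.decorator_list:
            # match decorators of the form @returns("shape_name")
            if (isinstance(dec, ast.Call) and isinstance(dec.func, ast.Name)
                    and dec.func.id == "returns" and dec.args
                    and isinstance(dec.args[0], ast.Constant)):
                shape = dec.args[0].value
        if shape not in shapes:
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Return) and isinstance(node.value, ast.Dict):
                for key in node.value.keys:
                    if isinstance(key, ast.Constant) and key.value not in shapes[shape]:
                        warnings.append(f"{func.name}: unknown key {key.value!r} for shape {shape!r}")
    return warnings

SOURCE = '''
@returns("email")
def get_email(*, id, **params):
    return {"id": id, "name": "Hi", "sender": "a@example.com"}
'''
ALLOWED = {"email": {"id", "name", "from_email", "to"}}
print(check_returns(SOURCE, ALLOWED))
```

Since only ast.Dict nodes in return statements are inspected, a dict built by a helper or assembled dynamically passes through unchecked, which matches the limitation noted above.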

The engine itself does not validate shapes at runtime. It’s a generic entity store — it accepts whatever dict shape a skill returns and stores it. Conformance is a contract between skill authors, app authors, and PR review, not a runtime check.

Prior Research

Our ontology research lives in Research → Ontology. It’s exploratory, not authoritative — some pieces are outdated — but contains principles and platform analysis worth consulting when designing new shapes.

Entity & Ontology Research

Schema entities — Core entity type definitions, OGP foundation, hypotheses on note vs article
Schema relationships — Relationship type catalog and design patterns
Open Graph Protocol — OGP types, why flat beats hierarchical
Google structured data — Schema.org structured data patterns

Platform Research

Google Takeout — 72 Google products analyzed for entity types (Contacts, Calendar, Drive, Gmail, Photos, YouTube, Maps, Chrome, Pay, Play)
Facebook Graph — Facebook Graph API entity model
FamilySearch — GEDCOM X genealogical data model (two relationship types + qualifiers, computed derivations, source citations)

Relationship Research

Genealogical relationships — Family relationship modeling patterns
Relationship modeling — General relationship design
Schema.org relationships — Schema.org relationship types
OGP relationships — OGP relationship patterns
No-orphans constraint — Why every entity needs at least one connection

Systems Research

Outcome entity — Outcome/goal entity modeling
PKM community — Personal knowledge management patterns
Semantic file systems — NEPOMUK and semantic desktop research