
Entity Schema

This file defines what entity types exist in our graph. For relationships, see schema-relationships.md.


Core Philosophy: Entity-First, Not Hierarchical

We are entity-first. This means:

  1. Flat types, not hierarchies — A book IS a book. It doesn’t “extend” CreativeWork.
  2. Types are labels, not classes — Entities can have multiple types
  3. Relationships express connections — “book relates_to creative_work” not “book inherits from creative_work”
  4. No inheritance tax — No deep Thing > CreativeWork > Book chains

Research (2026-01-24) found:

| System | Approach | Finding |
| --- | --- | --- |
| Neo4j | Flat labels | Types are tags, not classes. No inheritance. |
| Dgraph | Flat labels | Multiple types per entity. No inheritance mechanism. |
| Wikidata | Hierarchical (P31/P279) | “Large-scale conceptual disarray”: circular dependencies, type confusion |
| Schema.org | Deep hierarchy | 827 types in Thing > ... > ... chains; hard to maintain |
| OGP | Flat + dot notation | video.movie groups without inheritance. 69.8% web adoption. |

Conclusion: Entity-first systems (Neo4j, Dgraph, OGP) use flat types. Hierarchical systems (Wikidata, Schema.org) have documented problems. We follow the entity-first approach.


We use OGP as our primary influence because:

  1. 69.8% web adoption — The web has voted
  2. Radical simplicity — 17 types cover most content
  3. Entity-first — Flat types, no inheritance
  4. Import-friendly — Extract from web with minimal transformation
  5. Battle-tested — Facebook reduced 39→17 types based on real usage

| Namespace | Types |
| --- | --- |
| Global | website, article, book, profile |
| music.* | music.song, music.album, music.playlist, music.radio_station |
| video.* | video.movie, video.episode, video.tv_show, video.other |
| Business | business.business, product |

OGP originally had more types that were consolidated:

| Category | Types (now deprecated/consolidated) |
| --- | --- |
| Organizations | band, government, non_profit, school, university, company |
| People | actor, athlete, author, director, musician, politician, public_figure |
| Places | city, country, landmark, state_province, bar, cafe, hotel, restaurant |
| Activities | activity, sport |
| Groups | cause, sports_league, sports_team |

| Gap | Why | OGP Status |
| --- | --- | --- |
| event | Calendar events, meetings | Never existed in OGP |
| place | Standalone locations | Existed in 1.0, now deprecated |
| organization | Companies, bands, institutions | Existed in 1.0 as separate types, now just business.business |

  1. IDs are random — NanoID (8 chars), not paths or slugs
  2. Types are flat labels — No inheritance
  3. Multiple types allowed — An entity can be [book, product]
  4. Relationships are first-class — As important as entities (see schema-relationships.md)
  5. Properties are flexible — Core fields defined; data JSON holds the rest
  6. OGP data preserved — Import stores original og:* in data.og

{
  "id": "abc12345",
  "types": ["article"],
  "name": "As We May Think",
  "created_at": "2026-01-24T12:00:00Z",
  "data": {
    "published": "1945-07",
    "author": "Vannevar Bush",
    "og": {
      "og:type": "article",
      "og:title": "As We May Think",
      "og:url": "https://..."
    }
  }
}
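
The six principles above can be sketched as a small data structure. This is a minimal illustration, not the project's implementation: `new_id`, `ID_ALPHABET`, and the `Entity` class are hypothetical names, and the ID generator is a standard-library stand-in for NanoID rather than the NanoID library itself.

```python
import secrets
import string
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Assumption: a URL-safe alphabet like NanoID's default; the real generator may differ.
ID_ALPHABET = string.ascii_letters + string.digits + "_-"


def new_id(size: int = 8) -> str:
    """Return a random ID of `size` characters (stand-in for NanoID)."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(size))


@dataclass
class Entity:
    name: str
    types: list[str]  # flat labels; more than one is allowed
    data: dict[str, Any] = field(default_factory=dict)  # flexible properties, incl. data.og
    id: str = field(default_factory=new_id)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def has_type(self, label: str) -> bool:
        # Plain membership test: there is no inheritance, so "book"
        # never implies any parent type.
        return label in self.types


# An entity carrying two flat labels at once.
book = Entity(
    name="As We May Think",
    types=["book", "product"],
    data={"og": {"og:type": "book"}},
)
```

Because types are only labels, asking whether `book` is a `product` is a list lookup; nothing is resolved through a class hierarchy.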

News article, blog post, essay.

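| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Title |
| types | string[] | yes | ["article"] |
| data.published_time | datetime | no | Publication date |
| data.author | string | no | Author name (or relationship) |
| data.section | string | no | Category/section |
| data.tags | string[] | no | Tags |
| data.url | string | no | Source URL |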

OGP equivalent: og:type="article", article:published_time, article:author, article:section, article:tag


A published book.

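| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Title |
| types | string[] | yes | ["book"] |
| data.isbn | string | no | ISBN |
| data.release_date | date | no | Publication date |
| data.author | string | no | Author name |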

OGP equivalent: og:type="book", book:isbn, book:release_date, book:author


A human being. OGP calls it “profile” (page-centric); we use “person” (entity-centric).

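| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Display name |
| types | string[] | yes | ["person"] |
| data.first_name | string | no | First name |
| data.last_name | string | no | Last name |
| data.username | string | no | Username/handle |
| data.born | date | no | Birth date |
| data.died | date | no | Death date |
| data.roles | string[] | no | What they do |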

OGP equivalent: og:type="profile", profile:first_name, profile:last_name, profile:username

Decision: Use person not profile. Profile is page-centric; person is entity-centric.


A single track/recording.

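| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Song title |
| types | string[] | yes | ["music.song"] |
| data.duration | integer | no | Length in seconds |
| data.album | string | no | Album name (or relationship) |
| data.track | integer | no | Track number |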

OGP equivalent: og:type="music.song", music:duration, music:album, music:album:track


A collection of songs.

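| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Album title |
| types | string[] | yes | ["music.album"] |
| data.release_date | date | no | Release date |
| data.musician | string | no | Artist name |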

OGP equivalent: og:type="music.album", music:release_date, music:musician


A curated list of songs.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Playlist title |
| types | string[] | yes | ["music.playlist"] |
| data.creator | string | no | Curator name |

A film.

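| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Movie title |
| types | string[] | yes | ["video.movie"] |
| data.duration | integer | no | Length in seconds |
| data.release_date | date | no | Release date |
| data.director | string | no | Director name |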

OGP equivalent: og:type="video.movie", video:duration, video:release_date, video:director


A TV episode.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Episode title |
| types | string[] | yes | ["video.episode"] |
| data.series | string | no | Series name (or relationship to video.tv_show) |

OGP equivalent: og:type="video.episode", video:series


A TV series.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Series title |
| types | string[] | yes | ["video.tv_show"] |

Something you can buy/use.

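| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Product name |
| types | string[] | yes | ["product"] |
| data.price | number | no | Price |
| data.currency | string | no | Currency code |
| data.availability | string | no | In stock, etc. |
| data.url | string | no | Product URL |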

OGP equivalent: og:type="product", product:price:amount, product:price:currency


A company, band, institution, team, group. (OGP 1.0 had separate types — now consolidated)

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Organization name |
| types | string[] | yes | ["organization"] |
| data.org_type | string | no | company, band, university, government, nonprofit, group |
| data.founded | year | no | Year founded |
| data.url | string | no | Website |

OGP equivalent: og:type="business.business" (partial), or historical types

Note: Groups (Facebook Groups, Google Groups, etc.) are organization with org_type: group.


Default web page type.

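| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Site/page title |
| types | string[] | yes | ["website"] |
| data.url | string | yes | URL |
| data.description | string | no | Description |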

OGP equivalent: og:type="website" (default)


A calendar event, meeting, scheduled occurrence. OGP never had this.

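| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Event title |
| types | string[] | yes | ["event"] |
| data.start | datetime | yes | Start time |
| data.end | datetime | no | End time |
| data.location | string | no | Location name |
| data.description | string | no | Description |
| data.recurrence | string | no | RFC 5545 RRULE |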

References: Google Calendar, Facebook Events (Graph API), Schema.org Event


A location. OGP 1.0 had city, country, landmark — now deprecated.

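| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Place name |
| types | string[] | yes | ["place"] |
| data.place_type | string | no | city, country, venue, landmark |
| data.latitude | number | no | Latitude |
| data.longitude | number | no | Longitude |
| data.address | string | no | Street address |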

References: Google Maps, OGP 1.0 place types


An idea, pattern, methodology. Not in OGP.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Concept name |
| types | string[] | yes | ["concept"] |
| data.description | string | no | Explanation |

Status: EXPERIMENTAL — see Joe’s Hypotheses below.


These types emerge from PKM research, Google Takeout analysis, and real-world usage patterns. OGP was designed for web content sharing — these fill gaps for personal knowledge management.

Freeform personal capture. The atomic unit of PKM.

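| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Title (can be auto-generated) |
| types | string[] | yes | ["note"] |
| data.content | string | no | Note body (markdown) |
| data.color | string | no | Visual color coding |
| data.pinned | boolean | no | Pinned state |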

Research source: Google Keep, Obsidian, PKM community research


An action item with status, due date, and hierarchy.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Task title |
| types | string[] | yes | ["task"] |
| data.status | string | yes | pending, in_progress, completed, cancelled |
| data.due | datetime | no | Due date |
| data.completed_at | datetime | no | When completed |
| data.priority | string | no | high, medium, low |

Research source: Google Tasks, Todoist, Things — all have: title, notes, due, status, parent, position

Note: Tasks have a lifecycle (pending → completed) and support subtask hierarchy via part_of relationships. They are distinct from events, which have duration and attendees.


A communication from one party to another.

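| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Subject line (if any) |
| types | string[] | yes | ["message"] |
| data.content | string | no | Message body |
| data.sent_at | datetime | yes | When sent |
| data.channel | string | no | email, sms, chat, dm |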

Research source: Gmail (MBOX), Google Chat, Facebook Messages

Relationships:

  • from → person (sender)
  • to → person[] (recipients)
  • part_of → conversation/thread
  • reply_to → message (threading)
  • attachment → file[]

A container for related messages. Linkable and shareable.

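| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Subject/title |
| types | string[] | yes | ["conversation"] |
| data.channel | string | no | email, slack, imessage, whatsapp |
| data.started_at | datetime | no | First message time |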

Research source: Gmail threads, Slack channels, WhatsApp groups, iMessage


A sub-conversation within a larger context. Distinct from conversation.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Thread topic |
| types | string[] | yes | ["thread"] |
| data.channel | string | no | Platform/context |

Research source: Reddit threads, YouTube comment threads, Slack threads, email threads

Examples:

  • Reddit: A post spawns comment threads
  • YouTube: A comment spawns reply threads
  • Slack: A message spawns a thread
  • Email: Replies create a thread (via In-Reply-To headers)
  • iMessage: Inline replies create sub-threads

Relationships:

  • part_of → conversation (parent context)
  • started_by → message (root message)
  • Messages part_of → thread

A notable saying with speaker attribution and source provenance.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | The quote text |
| types | string[] | yes | ["quote"] |
| data.context | string | no | When/where it was said |

Research source: Wikiquote, Goodreads, quote attribution patterns

Critical distinction:

  • Speaker — Who said it (FDR)
  • Source — Where you found it (a biography)
  • Author of source — Who wrote the source (Jean Edward Smith, not FDR)

Relationships:

  • spoken_by → person (the speaker)
  • appears_in → work (the source where you found it)
  • authored_by → person (author of the source, if different from speaker)

A personal extraction from a source with position/context.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | The highlighted text |
| types | string[] | yes | ["highlight"] |
| data.position | string | no | Location in source (page, timestamp, etc.) |
| data.color | string | no | Highlight color |

Research source: Kindle highlights, Readwise, Hypothesis, PDF annotations

Relationships:

  • extracted_from → book/article/video (the source)
  • created_by → person (who highlighted it)

Distinction from quote: A highlight is personal (you extracted it). A quote is canonical (attributed to a speaker).


A response to another entity (post, video, article, etc.).

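| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Comment text (first line as name) |
| types | string[] | yes | ["comment"] |
| data.content | string | yes | Full comment text |
| data.posted_at | datetime | yes | When posted |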

Research source: YouTube, Facebook, Reddit, blog comments

Relationships:

  • comment_on → any (what it’s responding to)
  • authored_by → person
  • reply_to → comment (for nested comments)

Proposed Types (Research Complete, Pending Approval)

Based on subagent research (2026-01-24). These have been researched but not yet added to the schema.

The podcast series — analogous to video.tv_show. Following OGP dot notation.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Show title |
| types | string[] | yes | ["podcast.show"] |
| data.description | string | no | Show description |
| data.author | string | no | Primary creator/host name |
| data.show_type | enum | no | episodic (any order) or serial (sequential) |
| data.explicit | boolean | no | Contains explicit content |
| data.language | string | no | ISO language code |
| data.category | string[] | no | Apple Podcasts categories |
| data.image | string | no | Artwork URL |
| data.feed_url | string | no | RSS feed URL |
| data.website | string | no | Show website |

Research source: Apple Podcasts RSS, Spotify API, Podcast 2.0 namespace, Schema.org PodcastSeries

Relationships:

  • Episodes part_of → show
  • host_of ← person (with role)
  • published_by → organization (podcast network)

A single episode — analogous to video.episode.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Episode title |
| types | string[] | yes | ["podcast.episode"] |
| data.description | string | no | Show notes |
| data.duration | integer | no | Length in seconds |
| data.published | datetime | no | Release date |
| data.episode_number | integer | no | Episode number |
| data.season_number | integer | no | Season number |
| data.episode_type | enum | no | full, trailer, or bonus |
| data.audio_url | string | no | Enclosure URL (MP3/M4A) |
| data.transcript_url | string | no | Transcript file URL |
| data.guid | string | no | RSS GUID for import tracking |

Research source: Apple Podcasts RSS, Spotify API, Podcast 2.0 namespace, Schema.org PodcastEpisode

Relationships:

  • part_of → podcast.show
  • host_of ← person
  • guest_on ← person

Import: Parse RSS with iTunes + Podcast 2.0 namespaces. Store original RSS in data.rss.
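
As a rough sketch of the import step, the following parses `<item>` elements from a feed using only the Python standard library and fills a few of the podcast.episode properties listed above. The function names (`parse_episodes`, `_to_int`, `_duration_seconds`) are hypothetical, only the iTunes namespace is handled (Podcast 2.0 tags are omitted), and ID assignment is left out.

```python
import xml.etree.ElementTree as ET

# iTunes podcast namespace in ElementTree's {uri}tag notation.
ITUNES = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"


def _to_int(text):
    """Convert an optional numeric string to int, or None if absent/invalid."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def _duration_seconds(text):
    """Accept plain seconds ("1800") or clock notation ("MM:SS", "HH:MM:SS")."""
    if not text:
        return None
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_episodes(rss_xml: str) -> list[dict]:
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append({
            "types": ["podcast.episode"],
            "name": item.findtext("title", default=""),
            "data": {
                "guid": item.findtext("guid"),
                "duration": _duration_seconds(item.findtext(f"{ITUNES}duration")),
                "episode_number": _to_int(item.findtext(f"{ITUNES}episode")),
                "season_number": _to_int(item.findtext(f"{ITUNES}season")),
                "episode_type": item.findtext(f"{ITUNES}episodeType"),
                "audio_url": enclosure.get("url") if enclosure is not None else None,
                # Original item kept under data.rss; re-serialization may
                # rewrite namespace prefixes.
                "rss": ET.tostring(item, encoding="unicode"),
            },
        })
    return episodes
```

Keeping `data.guid` from the feed gives a stable key for detecting episodes that were already imported on an earlier run.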


A user assessment of another entity with an optional rating.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Review title/headline |
| types | string[] | yes | ["review"] |
| data.content | string | no | Review body text |
| data.rating | number | no | Rating value (e.g., 4) |
| data.rating_max | number | no | Scale maximum (e.g., 5) |
| data.posted_at | datetime | yes | When posted |
| data.helpful_count | integer | no | Helpful/useful votes |
| data.verified | boolean | no | Verified purchase/experience |
| data.spoiler | boolean | no | Contains spoilers |
| data.aspect | string | no | What facet is rated (food, service) |
| data.pros | string[] | no | Positive points |
| data.cons | string[] | no | Negative points |

Research source: Schema.org Review, Amazon, Goodreads, IMDB, Yelp, Rotten Tomatoes

Relationships:

  • review_of → any (what’s being reviewed: book, movie, product, place)
  • authored_by → person

Why not comment? Reviews have structured rating data, engagement metrics (helpful votes), verification flags, and aggregate into ratings. Comments are just text.

Aggregate ratings: Computed from individual reviews, not stored separately.
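
A minimal sketch of computing an aggregate on demand, assuming each review carries the `rating` and `rating_max` fields from the table above. The function name `aggregate_rating` and the choice to normalize everything to a common scale are illustrative assumptions, not a settled design.

```python
def aggregate_rating(reviews, scale_max=5.0):
    """Average review ratings after normalizing each to `scale_max`.

    `reviews` is a list of dicts shaped like the `data` object of a review.
    Reviews without a rating (or without a positive rating_max) are skipped.
    """
    scores = [
        r["rating"] / r["rating_max"] * scale_max
        for r in reviews
        if r.get("rating") is not None and r.get("rating_max")
    ]
    if not scores:
        return None  # nothing to aggregate
    return {
        "value": round(sum(scores) / len(scores), 2),
        "count": len(scores),
        "scale_max": scale_max,
    }


reviews = [
    {"rating": 4, "rating_max": 5},
    {"rating": 8, "rating_max": 10},       # different scale, same relative score
    {"content": "Text-only review, no rating"},
]
summary = aggregate_rating(reviews)
```

Normalizing by `rating_max` lets reviews imported from sources with different scales contribute to one figure without storing a separate aggregate entity.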


Recommendation: Do NOT add activity as an entity type.

Reasoning:

  1. Most activities are already captured implicitly:
    • Highlights have created_by and created_at
    • Comments have comment_on and posted_at
    • Messages have sent_at and sender/recipient relationships
  2. Relationships with timestamps cover most cases (e.g., joe --[watched, {at: "2026-01-24"}]--> movie_x)
  3. Schema.org’s set of 100+ Action types is over-engineered
  4. Follows the “compute don’t store” principle from FamilySearch

Alternative: Use timestamped relationship types:

  • watched with {at, duration, completion}
  • listened_to with {at, duration, play_count}
  • read with {at, started_at, finished_at}

Exception: If we need play counts/aggregations, consider specific types (music.listen) not generic activity.
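
To illustrate the alternative, here is a sketch of timestamped relationships with an activity log and play counts derived from them rather than stored. The `Relationship` class and both helper functions are hypothetical names; the property keys follow the examples above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Relationship:
    source: str   # entity ID of the actor, e.g. a person
    type: str     # specific relationship type, e.g. "watched"
    target: str   # entity ID of the thing acted on
    props: dict[str, Any] = field(default_factory=dict)  # e.g. {"at": ..., "duration": ...}


def activity_log(rels, source):
    """Return (at, type, target) tuples for one actor, oldest first.

    Assumes `at` is an ISO 8601 string, so string order equals time order.
    """
    events = [
        (r.props["at"], r.type, r.target)
        for r in rels
        if r.source == source and "at" in r.props
    ]
    return sorted(events)


def play_counts(rels, rel_type="listened_to"):
    """Count relationships of one type per target (computed, not stored)."""
    return Counter(r.target for r in rels if r.type == rel_type)


rels = [
    Relationship("joe", "watched", "movie_x", {"at": "2026-01-24"}),
    Relationship("joe", "listened_to", "song_a", {"at": "2026-01-22"}),
    Relationship("joe", "listened_to", "song_a", {"at": "2026-01-23"}),
]
```

Under this model a "recent activity" view is a query over relationships, which is why a separate activity entity adds little.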


Recommendation: Keep both conversation and thread — they serve different purposes.

| Type | Use Case | Example |
| --- | --- | --- |
| conversation | Top-level venue | Email inbox, Slack channel, WhatsApp group |
| thread | Discussion topic/sub-conversation | Slack thread, Reddit post, email thread |

Key insight: The distinction is containment hierarchy:

  • conversation = the venue (channel, chat, inbox)
  • thread = the discussion (may be the whole conversation or a sub-part)

Universal relationship: reply_to is the primitive across all platforms.

Platform mapping examples:

  • Reddit: Post is [thread], comments reply_to each other, part_of the post
  • Slack: Channel is [conversation], threaded replies spawn [thread] entities
  • Email: Mailbox is [conversation], JWZ-reconstructed chains are [thread] entities

Store platform-specific metadata in data.threading:

{
  "data": {
    "threading": {
      "platform": "slack",
      "thread_ts": "1234567890.000000"
    }
  }
}
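
Since reply_to is described as the universal primitive, thread structure can in principle be derived from it. The sketch below groups messages into reply trees; the function names and the flat `reply_to` field on each message dict are assumptions for illustration, and this is plain parent-pointer grouping, not the full JWZ algorithm mentioned for email.

```python
from collections import defaultdict


def build_reply_tree(messages):
    """Group messages by their reply_to parent.

    `messages` is a list of dicts with "id" and an optional "reply_to".
    Returns (roots, children): root message IDs in input order, and a
    mapping from parent ID to the IDs that reply to it.
    """
    known_ids = {m["id"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        parent = m.get("reply_to")
        if parent in known_ids:
            children[parent].append(m["id"])
        else:
            # No parent, or the parent is not in this batch: treat as a root.
            roots.append(m["id"])
    return roots, dict(children)


def thread_members(root, children):
    """Return every message ID under `root`, depth-first, root included."""
    ordered, stack = [], [root]
    while stack:
        node = stack.pop()
        ordered.append(node)
        # Reverse so that earlier replies are visited first.
        stack.extend(reversed(children.get(node, [])))
    return ordered


messages = [
    {"id": "m1"},
    {"id": "m2", "reply_to": "m1"},
    {"id": "m3", "reply_to": "m2"},
    {"id": "m4", "reply_to": "m1"},
    {"id": "m5"},
]
roots, children = build_reply_tree(messages)
```

Each root could then back one thread entity, with its members linked via part_of.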

Cross-referencing our schema against research from Google Takeout, Facebook Graph, Schema.org, OGP, and PKM community.

| Entity | Research Source | Notes |
| --- | --- | --- |
| note | Google Keep, PKM research | Freeform capture; different from structured article |
| task | Google Tasks, PKM research | Has status, due, subtask hierarchy, lifecycle |
| message | Gmail, Chat, Facebook | Has from/to, thread relationship, attachments |
| conversation | Gmail threads, Slack, WhatsApp | Container for messages; linkable |
| thread | Reddit, YouTube, Slack, email | Sub-conversations within conversations |
| quote | README, Wikiquote | Speaker vs author vs source distinction |
| highlight | Readwise, Kindle, Hypothesis | Personal extraction with position |
| comment | YouTube, Facebook, Reddit | Response entity, supports nesting |

| Entity | Research Source | Question | Current Thinking |
| --- | --- | --- | --- |
| channel | YouTube, Podcasts | Distinct type or organization with org_type: channel? | Probably organization |
| podcast / podcast.episode | Google Takeout, web content | Missing from OGP; growing content category | Likely needed |
| post | Facebook Graph | Social media posts; is this just article subtype? | Probably article with data.format: post |
| group | Facebook, Google Groups | Distinct or organization with org_type: group? | Decided: organization with org_type: group |
| review | Google, Schema.org | User assessment of another entity | Likely needed |
| recipe | Schema.org (popular type) | Common web content | Probably article with structure |
| visit | Google Timeline | Location stay with duration | Maybe event subtype? |
| activity | Google My Activity | User action with timestamp | Schema.org has 100+ Action types; avoid? |

| Entity | Why Skip | Alternative |
| --- | --- | --- |
| album (photo) | Covered by relationships | photo part_of collection entity |
| device | Rare use case | product with subtype |
| health_record | Very domain-specific | data.health on generic entities |
| tag / label | Implementation detail | Properties, not entities |
| file | Too generic | Use specific types (note, photo, etc.) or data.file_path |

Hypothesis: concept as an explicit type may be wrong. Concepts are abstractions that emerge from clustering and relationships in the graph, not atomic entities you create directly.

Current thinking: Keep concept for now but mark as experimental. Ideas, methodologies, and patterns might be better represented as:

  • Tags/properties on other entities
  • Emergent clusters from graph analysis
  • Notes that evolve into something more structured

Open question: What makes something a “concept” vs a well-linked “note”?

Hypothesis: Everything published started as a note. Publishing is a relationship, not a type change.

  • A note becomes “published” when it has published_to relationships
  • An article on the web is just a note with data.url and published_to: website
  • This unifies personal capture with published content

Implication: Maybe article is redundant? Or article is specifically for web imports where we don’t have the original note?

Hypothesis: Threading is more complex than a single conversation type suggests.

  • Reddit: post → comments → replies (tree structure)
  • Email: messages → thread (linear with quotes)
  • Slack: channel → messages → threads (two-level)
  • iMessage: conversation → messages → inline threads (emerging)
  • YouTube: video → comments → replies (two-level)

Current approach: conversation and thread as separate types, connected via part_of. Threads can be part of conversations. Messages can be part of threads or directly part of conversations.

Hypothesis: related_to is too vague to be useful. Every relationship should have a specific type.

  • If we can’t name the relationship, we don’t understand it yet
  • Better to have many specific types than one catch-all
  • Exception: AI-extracted relationships where the type is uncertain (use related_to with a confidence score)

For family relationships specifically: Joe said “related_to is bullshit”. FamilySearch shows that two primitives (couple, parent_child) are enough; everything else can be computed.
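
As an illustration of computing kinship instead of storing it, the sketch below derives siblings and grandparents from parent_child pairs alone. The function names and the `(parent, child)` tuple representation are assumptions; this is not FamilySearch's data model.

```python
def parents_of(person, parent_child):
    """Direct parents of `person`; parent_child holds (parent, child) pairs."""
    return {parent for parent, child in parent_child if child == person}


def siblings(person, parent_child):
    """Anyone sharing at least one parent with `person` (half-siblings included)."""
    shared = parents_of(person, parent_child)
    return {
        child
        for parent, child in parent_child
        if parent in shared and child != person
    }


def grandparents(person, parent_child):
    """Parents of the person's parents."""
    result = set()
    for parent in parents_of(person, parent_child):
        result |= parents_of(parent, parent_child)
    return result


parent_child = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "dan"),
    ("eve", "dan"),
]
```

No sibling_of or grandparent_of edge is ever stored; both follow from the one stored primitive.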


When importing from web:

  1. Extract og:* meta tags from page
  2. Map og:type to our type
  3. Create entity with data.og containing original OGP data
  4. Extract relationships (article:author → created_by, etc.)

| OGP Type | Our Type |
| --- | --- |
| website | website |
| article | article |
| book | book |
| profile | person |
| video.movie | video.movie |
| video.episode | video.episode |
| video.tv_show | video.tv_show |
| music.song | music.song |
| music.album | music.album |
| music.playlist | music.playlist |
| product | product |
| business.business | organization |
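
The four import steps and the mapping table can be sketched as follows. This assumes the meta tags have already been extracted into a flat dict; `import_from_ogp` is a hypothetical name, ID assignment is omitted, and falling back to "website" for unmapped types is an assumption based on website being OGP's default type.

```python
from typing import Any

# Mapping copied from the table above.
OGP_TYPE_MAP = {
    "website": "website",
    "article": "article",
    "book": "book",
    "profile": "person",
    "video.movie": "video.movie",
    "video.episode": "video.episode",
    "video.tv_show": "video.tv_show",
    "music.song": "music.song",
    "music.album": "music.album",
    "music.playlist": "music.playlist",
    "product": "product",
    "business.business": "organization",
}


def import_from_ogp(meta: dict[str, str]) -> dict[str, Any]:
    # Step 1: keep only og:* tags; they are preserved verbatim in data.og.
    og = {key: value for key, value in meta.items() if key.startswith("og:")}

    # Step 2: map og:type to our type (assumed fallback: "website").
    our_type = OGP_TYPE_MAP.get(og.get("og:type", "website"), "website")

    # Step 3: build the entity with the original OGP data attached.
    entity = {
        "types": [our_type],
        "name": og.get("og:title", ""),
        "data": {"og": og},
    }

    # Step 4: extract relationships. Only article:author -> created_by is
    # named in the text; the "target_name" key is illustrative.
    relationships = []
    if "article:author" in meta:
        relationships.append(
            {"type": "created_by", "target_name": meta["article:author"]}
        )
    return {"entity": entity, "relationships": relationships}
```

Because the original tags survive under data.og, the mapping can be revised later and entities re-typed without re-fetching the pages.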

Entity-first is validated:

  • Neo4j, Dgraph use flat labels (no inheritance)
  • Wikidata’s hierarchy is documented as problematic
  • OGP’s flat approach has 69.8% web adoption

OGP as foundation:

  • 17 types serve the 69.8% of the web that has adopted OGP
  • Simplicity was intentional design choice
  • Import-friendly for web data

OGP gaps we fill:

  • event — never existed
  • place — deprecated from 1.0
  • organization — consolidated from 6 types into business.business

Beyond OGP (PKM/Personal Data):

  • note, task, message, conversation, thread — core PKM types
  • quote, highlight, comment — content extraction and response

| Person | System | Known For |
| --- | --- | --- |
| Bret Taylor | OGP | Open Graph Protocol, Like button |
| David Recordon | OGP | OGP design decisions |
| R.V. Guha | Schema.org | CycL, RSS, RDF, Schema.org |
| John Giannandrea | Knowledge Graph | Built Google KG |
| Nathan Bronson | TAO | Facebook’s graph database |


  • schema-relationships.md — Relationship types and patterns
  • SEED_DATA.md — Sample entities to populate
  • research/ — Deep dives on specific systems

Divergent thinking phase. Everything from Google Takeout and Facebook Graph research goes here. We’ll converge and cull later.

These fill gaps in OGP’s content types, especially for audio and newer media formats.

podcast.episode ✅ (Already proposed above)

A photograph. Distinct from generic file — has geo, faces, albums.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Filename or caption |
| types | string[] | yes | ["photo"] |
| data.taken_at | datetime | no | Photo timestamp |
| data.latitude | number | no | GPS latitude |
| data.longitude | number | no | GPS longitude |
| data.width | number | no | Width in pixels |
| data.height | number | no | Height in pixels |
| data.camera | string | no | Camera/device |

Research source: Google Photos, Facebook Photo, Instagram IG Media

Relationships:

  • appears_in ← person (faces)
  • part_of → album
  • taken_at → place

A video file. Already partially covered by OGP’s video.* but we may need a standalone type for user-generated content.

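| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Title |
| types | string[] | yes | ["video"] |
| data.duration | integer | no | Length in seconds |
| data.thumbnail | string | no | Thumbnail URL |
| data.width | number | no | Width in pixels |
| data.height | number | no | Height in pixels |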

Research source: YouTube, Google Photos, Facebook Video, Instagram IG Media (VIDEO type)

Open question: Is this redundant with video.movie, video.episode? Maybe video is the generic type, and video.movie etc. are specific forms?


An audio file or recording. Not well-covered by OGP (only music.*).

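| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Title/filename |
| types | string[] | yes | ["audio"] |
| data.duration | integer | no | Length in seconds |
| data.transcript | string | no | Transcription text |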

Research source: Google Voice (voicemails), Threads (AUDIO type), Google Recorder

Use cases: Voice memos, voicemails, audio messages, recordings.


A voice call record. Distinct from message — has duration, no text content.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Auto-generated (“Call with X”) |
| types | string[] | yes | ["phone_call"] |
| data.direction | enum | yes | inbound, outbound, missed |
| data.duration | integer | no | Length in seconds |
| data.started_at | datetime | yes | Call start time |

Research source: Google Voice

Relationships:

  • from → person (caller)
  • to → person (recipient)
  • recording → audio (if recorded)

A voicemail message with audio and transcription.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Auto-generated or first line of transcript |
| types | string[] | yes | ["voicemail"] |
| data.transcript | string | no | Transcription text |
| data.duration | integer | no | Length in seconds |
| data.received_at | datetime | yes | When received |

Research source: Google Voice

Relationships:

  • from → person (caller)
  • to → person (recipient)
  • audio_file → audio

Open question: Is this just message with channel: voicemail and an audio attachment? Or distinct enough to be its own type?


An e-commerce order/purchase.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Order number or summary |
| types | string[] | yes | ["order"] |
| data.order_number | string | no | External order ID |
| data.status | enum | no | processing, shipped, delivered, cancelled |
| data.total | number | no | Total amount |
| data.currency | string | no | ISO currency code |
| data.ordered_at | datetime | yes | Order timestamp |

Research source: Gmail schema.org parsing (schema.org/Order), Google Play Store

Relationships:

  • contains → product[] (line items)
  • purchased_by → person
  • sold_by → organization (merchant)
  • shipped_via → delivery

A booking for a future event/service.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Reservation summary |
| types | string[] | yes | ["reservation"] |
| data.reservation_type | enum | no | flight, hotel, restaurant, event, car |
| data.confirmation_number | string | no | Booking reference |
| data.status | enum | no | confirmed, cancelled, pending |
| data.start_time | datetime | no | Check-in/arrival time |
| data.end_time | datetime | no | Check-out/departure time |

Research source: Gmail schema.org parsing (FlightReservation, HotelReservation, etc.)

Relationships:

  • reserved_for → person
  • at → place (venue/hotel)
  • for → event (if event reservation)

A package shipment being tracked.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | “Package from X” |
| types | string[] | yes | ["delivery"] |
| data.tracking_number | string | no | Carrier tracking ID |
| data.carrier | string | no | FedEx, UPS, USPS, etc. |
| data.status | enum | no | in_transit, out_for_delivery, delivered |
| data.expected_by | datetime | no | Expected delivery date |
| data.delivered_at | datetime | no | Actual delivery time |

Research source: Gmail schema.org parsing (ParcelDelivery)

Relationships:

  • part_of → order
  • delivered_to → place (address)
  • shipped_by → organization (carrier)

A financial transaction (payment, transfer).

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Transaction description |
| types | string[] | yes | ["transaction"] |
| data.amount | number | yes | Amount |
| data.currency | string | yes | ISO currency code |
| data.direction | enum | yes | credit, debit |
| data.timestamp | datetime | yes | When occurred |
| data.method | string | no | Card, bank, etc. |

Research source: Google Pay/Wallet

Relationships:

  • from → person/organization
  • to → person/organization
  • for → order/product

Google Wallet’s Class-Object pattern is interesting: templates (Class) vs instances (Object). We might adopt this.

Generic digital pass. Could be subtyped or use pass_type discriminator.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Pass title |
| types | string[] | yes | ["pass"] |
| data.pass_type | enum | no | boarding_pass, event_ticket, loyalty, transit, coupon, generic |
| data.barcode | string | no | Barcode/QR data |
| data.barcode_type | enum | no | QR_CODE, AZTEC, PDF_417, CODE_128 |
| data.valid_from | datetime | no | Start of validity |
| data.valid_until | datetime | no | End of validity |
| data.status | enum | no | active, expired, redeemed, void |

Research source: Google Wallet (EventTicketObject, LoyaltyObject, TransitObject, FlightObject)

Relationships:

  • held_by → person
  • issued_by → organization
  • for_event → event (if event ticket)
  • for_transit → transit route/line

Open question: Should we have separate types (boarding_pass, event_ticket, loyalty_card) or one pass with discriminator? Google uses separate types. Leaning toward discriminator for simplicity.


loyalty_card (Alternative to pass discriminator)

Rewards program membership.

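| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Program name |
| types | string[] | yes | ["loyalty_card"] |
| data.member_id | string | no | Membership number |
| data.points_balance | integer | no | Current points |
| data.tier | string | no | Membership tier |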

Research source: Google Wallet LoyaltyObject

Relationships:

  • held_by → person
  • program_of → organization

A blog container. Holds posts.

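| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Blog title |
| types | string[] | yes | ["blog"] |
| data.description | string | no | Blog description |
| data.url | string | no | Blog URL |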

Research source: Blogger (Blog resource)

Relationships:

  • authored_by → person
  • published_by → organization
  • Contains posts via part_of relationship

A social media post or blog post. Distinct from article?

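| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Title (if any) |
| types | string[] | yes | ["post"] |
| data.content | string | no | Post text |
| data.posted_at | datetime | yes | Publication time |
| data.platform | string | no | Source platform |
| data.permalink | string | no | Permanent URL |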

Research source: Facebook Post, Blogger Post, Instagram IG Media, Threads Post

Relationships:

  • authored_by → person
  • part_of → blog/page/group
  • comment_on ← comment[]
  • mentions → person[]
  • tagged_at → place

Open question: Is post redundant with article? Maybe:

  • article = long-form, has title, structured (news, essays, blog posts)
  • post = short-form, social media, ephemeral (tweets, status updates)

A bookmarked/saved piece of content.

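| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Title of saved item |
| types | string[] | yes | ["saved_item"] |
| data.url | string | no | Source URL |
| data.note | string | no | User annotation |
| data.saved_at | datetime | yes | When saved |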

Research source: Google Saved, Chrome Bookmarks, Facebook Saved

Relationships:

  • saved_by → person
  • saved_from → webpage/article
  • part_of → collection

A user-created grouping of items.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Collection name |
| types | string[] | yes | ["collection"] |
| data.description | string | no | Description |

Research source: Google Saved (collections), Chrome bookmark folders, Facebook Collections

Relationships:

  • created_by → person
  • Contains items via part_of relationship

A stay at a location with duration.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Auto-generated from place name |
| types | string[] | yes | ["visit"] |
| data.arrived_at | datetime | yes | Arrival time |
| data.departed_at | datetime | no | Departure time |
| data.confidence | number | no | 0-100 confidence score |

Research source: Google Timeline (placeVisit)

Relationships:

  • visitor → person
  • at → place

Open question: Is this an event subtype? Or distinct because it’s inferred/tracked rather than scheduled?


A movement between places (transit, walk, drive).

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Auto-generated |
| types | string[] | yes | ["trip"] |
| data.mode | enum | no | walking, driving, transit, cycling, flying |
| data.started_at | datetime | yes | Start time |
| data.ended_at | datetime | no | End time |
| data.distance | number | no | Distance in meters |

Research source: Google Timeline (activitySegment)

Relationships:

  • traveler → person
  • from → place (origin)
  • to → place (destination)

A typed response to content (beyond simple like).

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Reaction type name |
| types | string[] | yes | ["reaction"] |
| data.reaction_type | enum | yes | like, love, haha, wow, sad, angry, care |
| data.reacted_at | datetime | yes | When reacted |

Research source: Facebook Reactions

Open question: Is this better modeled as a relationship with metadata rather than an entity? e.g., person --[reacted_to {type: "love", at: "..."}]--> post

Leaning toward relationship, not entity. Reactions don’t have identity beyond the relationship.
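
A minimal sketch of the relationship-with-metadata option, for comparison with the entity table above. The `Relationship` record and its field names (`source_id`, `target_id`, `metadata`) are assumptions made for illustration and are not defined anywhere in this schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge record; field names are assumptions for illustration."""
    source_id: str
    type: str
    target_id: str
    metadata: dict = field(default_factory=dict)


# Reaction types taken from data.reaction_type in the table above.
REACTION_TYPES = {"like", "love", "haha", "wow", "sad", "angry", "care"}


def react(person_id: str, post_id: str, reaction_type: str, at: str) -> Relationship:
    """Build a reacted_to edge instead of a standalone reaction entity."""
    if reaction_type not in REACTION_TYPES:
        raise ValueError(f"unknown reaction type: {reaction_type}")
    return Relationship(person_id, "reacted_to", post_id,
                        {"type": reaction_type, "at": at})


# Illustrative IDs and timestamp only.
edge = react("person_1", "post_9", "love", "2026-01-24T10:00:00Z")
```

In this shape the reaction has no `id` of its own, which matches the observation that reactions have no identity beyond the relationship.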


A discoverable tag.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Tag text (without #) |
| types | string[] | yes | ["hashtag"] |

Research source: Instagram IG Hashtag, Twitter

Relationships:

  • tagged_with ← post/photo/video

Open question: Is this an entity or just a property/label? Instagram treats it as an entity with its own ID. But for us, maybe data.hashtags: ["tag1", "tag2"] on content is simpler?


A software application.

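| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | App name |
| types | string[] | yes | ["app"] |
| data.developer | string | no | Developer name |
| data.platform | enum | no | android, ios, web, desktop |
| data.version | string | no | Current version |
| data.category | string | no | App category |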

Research source: Google Play Store (App)

Relationships:

  • developed_by → organization
  • installed_by → person (relationship with device)

Open question: Is this just product with product_type: app? Probably yes.


A hardware device.

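| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | yes | Device name |
| types | string[] | yes | ["device"] |
| data.device_type | enum | no | phone, tablet, computer, tv, speaker |
| data.manufacturer | string | no | Manufacturer |
| data.model | string | no | Model name |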

Research source: Google Play Store (Devices), Google Home

Relationships:

  • owned_by → person
  • runs → app[]

Open question: Is this product with properties? Devices have serial numbers, OS versions, installed apps — more than typical products. Probably worth keeping separate.


An exercise session.

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | yes | NanoID |
| name | string | no | Workout type/name |
| types | string[] | yes | ["workout"] |
| data.activity_type | enum | no | running, cycling, swimming, walking, gym |
| data.started_at | datetime | yes | Start time |
| data.ended_at | datetime | no | End time |
| data.duration | integer | no | Duration in seconds |
| data.distance | number | no | Distance in meters |
| data.calories | number | no | Calories burned |

Research source: Google Fit

Relationships:

  • performed_by → person
  • at → place (gym, route)

Summary: Proposed Entity Types (Comprehensive)

| Category | Types | Status |
| --- | --- | --- |
| Core (OGP) | article, book, person, website, product, event, place, organization, concept | ✅ Defined |
| Music (OGP) | music.song, music.album, music.playlist | ✅ Defined |
| Video (OGP) | video.movie, video.episode, video.tv_show | ✅ Defined |
| PKM | note, task, message, conversation, thread, quote, highlight, comment | ✅ Defined |
| Podcast | podcast.show, podcast.episode | ✅ Proposed |
| Review | review | ✅ Proposed |
| Media | photo, video (generic), audio | 🆕 Proposed |
| Communication | phone_call, voicemail | 🆕 Proposed |
| Commerce | order, reservation, delivery, transaction | 🆕 Proposed |
| Passes | pass (with discriminator) OR boarding_pass, event_ticket, loyalty_card, transit_pass | 🆕 Proposed |
| Content Org | blog, post, saved_item, collection | 🆕 Proposed |
| Location | visit, trip | 🆕 Proposed |
| Platform | app, device | 🆕 Proposed (maybe product subtypes) |
| Health | workout | 🆕 Low priority |

Open Design Questions

  1. post vs article — Are these distinct types or is post just short-form article?

  2. photo vs video vs generic media — Separate types or one media type with discriminator?

  3. pass types — One pass with pass_type discriminator or separate types?

  4. Reactions as entities or relationships? — Leaning toward relationships with metadata.

  5. Hashtags as entities or properties? — Leaning toward properties on content.

  6. app and device — Subtypes of product or distinct?

  7. visit and trip — Are these event subtypes or distinct?

  8. voicemail — Is this message with channel/attachment or distinct?


Separate templates (shared structure) from instances (individual items). Could apply to:

  • Event tickets (event class → individual tickets)
  • Loyalty programs (program → memberships)
  • Subscriptions (plan → user subscriptions)

Use dot notation for domain-specific types: music.song, video.movie, podcast.episode. Keeps types flat while grouping related concepts.
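
A small sketch of how dot notation can group types without any inheritance: the namespace is simply the text before the first dot. The helper names `namespace_of` and `group_by_namespace` are hypothetical.

```python
from collections import defaultdict


def namespace_of(type_name: str):
    """Return the namespace prefix of a dotted type, or None for a global type."""
    head, sep, _ = type_name.partition(".")
    return head if sep else None


def group_by_namespace(type_names):
    """Group flat type names by namespace; global types are keyed by None."""
    groups = defaultdict(list)
    for name in type_names:
        groups[namespace_of(name)].append(name)
    return dict(groups)


groups = group_by_namespace(["music.song", "music.album", "podcast.episode", "book"])
```

The grouping is derived from the label text alone, so `music.song` stays a flat type and never "extends" `music`.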

Instead of many types, use one type with a discriminator:

  • message with channel: email|chat|sms|voicemail
  • media with media_type: photo|video|audio
  • pass with pass_type: boarding_pass|event_ticket|loyalty
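
As a rough sketch, the discriminator approach can be checked with a single lookup table. The discriminator fields and values come from the list above; the `DISCRIMINATORS` layout, the `validate_discriminator` helper, and the sample entities are assumptions for illustration.

```python
# Type name -> (discriminator field, allowed values), from the list above.
DISCRIMINATORS = {
    "message": ("channel", {"email", "chat", "sms", "voicemail"}),
    "media": ("media_type", {"photo", "video", "audio"}),
    "pass": ("pass_type", {"boarding_pass", "event_ticket", "loyalty"}),
}


def validate_discriminator(entity: dict) -> bool:
    """Return True if every discriminated type on the entity has a valid value."""
    data = entity.get("data", {})
    for type_name in entity.get("types", []):
        if type_name not in DISCRIMINATORS:
            continue  # this type has no discriminator to check
        field_name, allowed = DISCRIMINATORS[type_name]
        if data.get(field_name) not in allowed:
            return False
    return True


# Hypothetical sample entities.
ticket = {"id": "p1", "types": ["pass"], "data": {"pass_type": "event_ticket"}}
bad = {"id": "p2", "types": ["pass"], "data": {"pass_type": "coupon"}}
```

Adding a new kind of pass then means adding one value to the allowed set rather than defining a new type.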

Gmail parses schema.org from emails. Our types should map cleanly:

  • order ↔ schema.org/Order
  • reservation ↔ schema.org/Reservation
  • delivery ↔ schema.org/ParcelDelivery
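
A minimal sketch of that mapping applied to a JSON-LD object. Only the three pairs listed above are included, and the `entity_type_for` helper is hypothetical. More specific schema.org types (for example FlightReservation) would need their own entries or a separate lookup, which this sketch does not attempt.

```python
from typing import Optional

# schema.org @type -> our type name, per the list above.
SCHEMA_ORG_TO_TYPE = {
    "Order": "order",
    "Reservation": "reservation",
    "ParcelDelivery": "delivery",
}


def entity_type_for(json_ld: dict) -> Optional[str]:
    """Return our type for a JSON-LD object, or None if nothing maps."""
    raw = json_ld.get("@type", [])
    # JSON-LD allows @type to be a single string or a list of strings.
    candidates = raw if isinstance(raw, list) else [raw]
    for candidate in candidates:
        if candidate in SCHEMA_ORG_TO_TYPE:
            return SCHEMA_ORG_TO_TYPE[candidate]
    return None


parcel = {"@context": "https://schema.org", "@type": "ParcelDelivery"}
```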

Google’s My Activity provides cross-product activity records. Pattern:

{
  "header": "product_name",
  "title": "action description",
  "time": "timestamp",
  "products": ["source_products"]
}

We rejected activity as a type, but the pattern of “timestamped action on entity” is valuable — model as relationships with timestamps.
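
One way this could look, as a sketch: an activity record becomes a relationship that carries its timestamp. The output shape (`from`, `type`, `to`, `at`, `source_products`), the `activity_to_relationship` helper, and the sample record values are all assumptions. Resolving the person, the target entity, and the relationship type from the record is left to the caller.

```python
from datetime import datetime


def activity_to_relationship(record: dict, person_id: str,
                             target_id: str, rel_type: str) -> dict:
    """Turn an activity record into a timestamped relationship (hypothetical shape)."""
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalize it.
    at = datetime.fromisoformat(record["time"].replace("Z", "+00:00"))
    return {
        "from": person_id,
        "type": rel_type,
        "to": target_id,
        "at": at.isoformat(),
        "source_products": record.get("products", []),
    }


# Illustrative record following the pattern above; the values are made up.
record = {
    "header": "YouTube",
    "title": "Watched Example Video",
    "time": "2026-01-24T10:15:00Z",
    "products": ["YouTube"],
}
rel = activity_to_relationship(record, "person_1", "video_7", "watched")
```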

Facebook distinguishes:

  • Containers: Group, Album, Event, Page (have members/items)
  • Content: Post, Photo, Video, Comment (created by actors)

Our parallel:

  • Containers: conversation, thread, collection, blog, album
  • Content: message, note, article, photo, comment

Design Principles from Research (2026-01-24)


Synthesis of key learnings across all research sources. These principles should guide the convergence phase.

Principle 1: Entity-First, Not Hierarchical


Source: Neo4j, Dgraph, OGP, Wikidata critique

  • Types are labels, not classes
  • No inheritance trees (avoid Schema.org’s 827-type hierarchy)
  • Entities can have multiple types
  • Flat structure with dot-notation grouping (OGP pattern: music.song, video.movie)

Principle 2: OGP as Foundation, Fill the Gaps


Source: OGP research, web adoption data (69.8%)

  • OGP’s 17 types cover most web content (OGP itself has 69.8% web adoption)
  • Fill gaps: event, place, organization (deprecated from OGP 1.0)
  • Add PKM types: note, task, message, conversation, thread
  • Preserve original OGP data in data.og on import

Principle 3: Discriminator Fields Over Type Proliferation


Source: Google Wallet, OGP, Schema.org lessons

  • One pass type with pass_type: boarding_pass | event_ticket | loyalty
  • One message type with channel: email | chat | sms
  • One media type with media_type: photo | video | audio
  • Keeps type count manageable, properties flexible

Principle 4: Class-Object Pattern for Templates


Source: Google Wallet

  • Separate shared templates (Class) from individual instances (Object)
  • Event ticket: EventTicketClass (venue, date) + EventTicketObject (seat, holder)
  • Useful for: tickets, loyalty programs, subscriptions
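
A sketch of the split using the event ticket example above. The class names and the venue/date/seat/holder fields come from the bullet. The `id` and `class_id` fields, the `resolve` helper, and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTicketClass:
    """Shared template: fields common to every ticket for one event."""
    id: str
    venue: str
    date: str


@dataclass(frozen=True)
class EventTicketObject:
    """Individual instance: points at its class and adds per-ticket fields."""
    id: str
    class_id: str
    seat: str
    holder: str


def resolve(ticket: EventTicketObject, classes: dict) -> dict:
    """Merge template and instance fields into one flat view."""
    template = classes[ticket.class_id]
    return {"venue": template.venue, "date": template.date,
            "seat": ticket.seat, "holder": ticket.holder}


# Illustrative data only.
concert = EventTicketClass(id="cls_1", venue="City Hall", date="2026-03-01")
ticket = EventTicketObject(id="tkt_1", class_id="cls_1", seat="A12", holder="person_1")
view = resolve(ticket, {concert.id: concert})
```

A venue or date change is then made once on the class and is visible through every ticket that references it.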

Principle 5: Schema.org Alignment for Import


Source: Gmail, schema.org research

  • Gmail parses schema.org from emails (Order, FlightReservation, ParcelDelivery)
  • Our types should map cleanly for import
  • Supports web data extraction via JSON-LD

Principle 6: Index Layer, Not File System Replacement


Source: WinFS post-mortem, Spotlight success, Baloo

  • Don’t build a “semantic file system”
  • Build an index/query layer over existing files
  • Purpose-built storage per entity type
  • Graceful degradation if indexing fails

Source: PKM community research, semantic file systems

  • Manual tagging fails at scale (users abandon after months)
  • AI extraction is consistent regardless of user state
  • Extracted entities carry confidence scores
  • Humans confirm important entities

Source: OGP vs Schema.org, FOAF, CommonMark vs full Markdown

  • OGP: 4 required properties, 69.8% adoption
  • Schema.org: 827 types, 52.6% adoption (despite Google backing)
  • Simple vocabularies get adopted; complex ones don’t
  • Start minimal, extend later

| Category | Types | Source |
| --- | --- | --- |
| Core (OGP) | article, book, person, website, product | OGP spec |
| OGP Gaps | event, place, organization, concept | OGP 1.0 + our additions |
| Music (OGP) | music.song, music.album, music.playlist | OGP spec |
| Video (OGP) | video.movie, video.episode, video.tv_show | OGP spec |

| Category | Types | Source |
| --- | --- | --- |
| PKM Core | note, task, message, conversation, thread | PKM research, Google Takeout |
| Content | quote, highlight, comment, review | Readwise, Goodreads, YouTube |
| Podcast | podcast.show, podcast.episode | Apple Podcasts, Spotify |

Tier 3: Extended Types (Medium Confidence)

| Category | Types | Source |
| --- | --- | --- |
| Media | photo, video (generic), audio | Google Photos, YouTube |
| Communication | phone_call, voicemail | Google Voice |
| Commerce | order, reservation, delivery, transaction | Gmail schema.org |
| Passes | pass (with discriminator) | Google Wallet |
| Content Org | blog, post, saved_item, collection | Blogger, Google Saved |
| Location | visit, trip | Google Timeline |

| Entity | Status | Notes |
| --- | --- | --- |
| device | Maybe | Could be product with properties |
| app | Maybe | Could be product with product_type: app |
| workout | Low priority | Domain-specific |
| reaction | Skip | Model as relationship with metadata |
| hashtag | Skip | Model as property on content |

Abstracted from FamilySearch/GEDCOM research. These patterns apply universally across entities, relationships, and properties — not just genealogy.

Where do we apply these patterns?

| Level | Meaning | Example |
| --- | --- | --- |
| Entity-level | One provenance/evidence per entity | “This person came from GEDCOM import” |
| Property-level | Each property has its own metadata | “Birthdate from census (secondary), death date from certificate (primary)” |

FamilySearch does property-level — each fact has its own sources, confidence, status.

Current leaning: Property-level (more granular, more accurate, FamilySearch-proven pattern)

Tradeoff: More complexity, but better fidelity. Worth it for knowledge graphs where source quality matters.


Source: FamilySearch QUAY system, GEDCOM X

Two orthogonal dimensions:

| Dimension | What it measures | Scale |
| --- | --- | --- |
| Quality | How good is the underlying evidence? | 0-3 (unreliable → primary) |
| Confidence | How certain are we this is true? | 0.0-1.0 |

Why both matter:

| Scenario | Quality | Confidence |
| --- | --- | --- |
| Tabloid article claims X | Low (unreliable source) | High (we know what it says) |
| Damaged primary document | High (original source) | Low (barely readable) |
| AI extraction from reliable source | High | Medium (AI uncertainty) |
| Bible says event X happened | Low (religious text) | High (text clearly states it) |

Schema:

evidence:
  quality: enum
    - 0: unreliable     # Hearsay, rumors, unverified
    - 1: questionable   # Circumstantial, secondary interpretation
    - 2: secondary      # Derived from primary, official but not original
    - 3: primary        # Original document, firsthand account
  confidence: float     # 0.0-1.0
  method: enum
    - manual            # Human entered
    - imported          # From external source
    - ai_extracted      # AI determined
    - computed          # Derived from other data
    - inferred          # Logical inference

Use cases:

  • Genealogy: Census record (quality: 2, secondary) vs birth certificate (quality: 3, primary)
  • Research: Peer-reviewed paper (quality: 3) vs blog post (quality: 1)
  • Investigation: Eyewitness account (quality: 3) vs rumor (quality: 0)
  • AI: “Extracted with 85% confidence from a primary source”
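
A minimal sketch of the evidence record with its two independent dimensions. The enum values and ranges come from the schema above. The Python class names and the choice to validate in `__post_init__` are assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Quality(IntEnum):
    """Evidence quality scale, 0-3, from the schema above."""
    UNRELIABLE = 0
    QUESTIONABLE = 1
    SECONDARY = 2
    PRIMARY = 3


METHODS = {"manual", "imported", "ai_extracted", "computed", "inferred"}


@dataclass(frozen=True)
class Evidence:
    quality: Quality
    confidence: float
    method: str

    def __post_init__(self):
        # Quality and confidence are validated separately: they are orthogonal.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0-1.0")
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method}")


# The AI use case above: extracted with 85% confidence from a primary source.
ai_claim = Evidence(Quality.PRIMARY, 0.85, "ai_extracted")
```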

Source: GEDCOM 7 date modifiers

Not all dates are exact. We need to represent uncertainty without forcing false precision.

Schema:

temporal:
  # Core value
  value: datetime       # ISO 8601 (best approximation)
  # Precision modifier
  modifier: enum
    - exact             # We know this precisely
    - about             # ABT - approximately
    - estimated         # EST - educated guess
    - calculated        # CAL - computed from other data
    - before            # BEF - happened before this date
    - after             # AFT - happened after this date
  # For ranges
  range:
    start: datetime
    end: datetime
    type: enum [between, from_to, or]   # BET, FROM/TO, OR
  # Original text (preserves source)
  original: string      # "about 1850", "Q2 2025", "during the Renaissance"

Examples:

| Scenario | Representation |
| --- | --- |
| Exact date | { value: "2024-01-15", modifier: "exact" } |
| Approximate | { value: "1850-01-01", modifier: "about", original: "about 1850" } |
| Before | { value: "1900-01-01", modifier: "before", original: "before 1900" } |
| Range | { range: { start: "1848", end: "1852", type: "between" }, original: "between 1848 and 1852" } |
| Vague | { value: "1400-01-01", modifier: "about", original: "during the Renaissance" } |

Where this applies:

  • Birthdates, death dates (genealogy)
  • Event dates (history, calendar)
  • Publication dates (articles, books)
  • Any temporal property in the schema
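
A rough sketch of turning a GEDCOM-style modified year into the temporal shape above. It handles only the single-date prefixes (ABT, EST, CAL, BEF, AFT) followed by a four-digit year. Ranges and full dates are out of scope, and the `parse_modified_year` name is hypothetical.

```python
import re

# GEDCOM prefixes from the schema comments above, mapped to our modifier names.
PREFIXES = {"ABT": "about", "EST": "estimated", "CAL": "calculated",
            "BEF": "before", "AFT": "after"}

_PATTERN = re.compile(r"^(ABT|EST|CAL|BEF|AFT)\s+(\d{4})$")


def parse_modified_year(text: str) -> dict:
    """Parse strings like 'ABT 1850' into a temporal value."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        # Anything unrecognized keeps only the source text rather than guessing.
        return {"original": text}
    prefix, year = match.groups()
    return {
        "value": f"{year}-01-01",  # best approximation, as in the examples above
        "modifier": PREFIXES[prefix],
        "original": text,
    }
```

Keeping `original` in every result means a later, better parser can reprocess the source text without data loss.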

Pattern 3: Provenance (Source Attribution)


Source: FamilySearch source citations, GEDCOM X

Everything should be traceable to where it came from.

Schema:

provenance:
  source: entity_id | url | text   # What's the source?
  contributor: entity_id           # Who added this? (person or system)
  method: enum                     # How was it determined?
    - manual
    - imported
    - ai_extracted
    - computed
    - inferred
  created_at: datetime             # When was it added?
  modified_at: datetime            # When was it last changed?
  quality: 0-3                     # Evidence quality (from Pattern 1)
  status: enum                     # Verification status
    - unverified                   # Not yet checked
    - verified                     # Human confirmed
    - challenged                   # Disputed
    - disproven                    # Known to be false

Examples:

| Scenario | Provenance |
| --- | --- |
| Manual entry | { contributor: "user_joe", method: "manual", status: "verified" } |
| Web import | { source: "https://...", method: "imported", quality: 2 } |
| AI extraction | { method: "ai_extracted", contributor: "system", status: "unverified" } |
| Computed | { method: "computed", source: "derived from parent relationships" } |
| GEDCOM import | { source: "family.ged", method: "imported", contributor: "user_joe" } |

Where this applies:

  • Entities: “Where did we learn about this person/place/organization?”
  • Relationships: “Who said these two things are connected?”
  • Properties: “What’s the source for this birthdate/event/claim?”

Source: FamilySearch names with types, Facebook Graph

Any entity can have multiple names over time.

Schema:

entity:
  name: string                # Canonical/display name
  names:
    - value: string           # The name
      type: enum              # Type of name
        - primary             # Main/current name
        - birth               # Name at birth
        - married             # Name after marriage
        - former              # Previous name
        - legal               # Legal/official name
        - nickname            # Informal name
        - alias               # Alternative name
        - stage               # Stage/pen name
        - dba                 # "Doing business as" (organizations)
        - maiden              # Pre-marriage name
        - religious           # Religious/adopted name
      valid_from: temporal    # When this name started (optional)
      valid_until: temporal   # When this name ended (optional)
      provenance: provenance  # Where we learned this (optional)

Examples:

| Entity Type | Aliases |
| --- | --- |
| Person | Birth name, married name, nickname, stage name, pen name |
| Organization | Legal name, DBA, abbreviation, former name (“Facebook” → “Meta”) |
| Place | Official name, local name, historical name (“Peking” → “Beijing”) |
| Product | Product name, SKU, model number, codename |

Example person:

person:
  id: "abc123"
  name: "John Smith"            # Display name
  names:
    - value: "John Robert Smith"
      type: "birth"
    - value: "Johnny"
      type: "nickname"
    - value: "J.R. Smith"
      type: "alias"             # Professional name; "professional" is not in the type enum
    - value: "John Smith-Jones"
      type: "married"
      valid_from: { value: "2020-06-15", modifier: "exact" }
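
A small sketch of how the validity window might be queried: given a date, return the names in effect on that day. It compares the temporal `value` fields as ISO 8601 date strings and ignores modifiers, which is a simplification. The `names_valid_on` helper is hypothetical.

```python
def names_valid_on(names: list, day: str) -> list:
    """Return name values whose validity window contains the ISO date `day`."""
    result = []
    for entry in names:
        start = entry.get("valid_from", {}).get("value")
        end = entry.get("valid_until", {}).get("value")
        # A missing bound means the window is open on that side.
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        result.append(entry["value"])
    return result


names = [
    {"value": "John Robert Smith", "type": "birth"},
    {"value": "John Smith-Jones", "type": "married",
     "valid_from": {"value": "2020-06-15", "modifier": "exact"}},
]
```

String comparison works here only because every date uses the same `YYYY-MM-DD` format; mixed precisions such as a bare year would need real date handling.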

A fully-attributed property might look like:

person:
  id: "abc123"
  name: "John Smith"
  birthdate:
    value:
      value: "1850-01-01"
      modifier: "about"
      original: "abt 1850"
    evidence:
      quality: 2                # Secondary (census record)
      confidence: 0.8
      method: "imported"
    provenance:
      source: "1850 US Census, Page 42"
      contributor: "user_joe"
      status: "verified"

This is verbose but captures exactly what we know and how we know it. For most properties, we’d use simpler representations and only add full attribution where it matters.
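
If simple and fully attributed properties coexist, readers of the data need one way to get the bare value from either form. A minimal sketch, assuming attributed properties are nested dicts that keep their payload under a `value` key as in the example above; the `plain_value` helper is hypothetical.

```python
def plain_value(prop):
    """Return the bare value of a property, whether simple or fully attributed."""
    # Unwrap nested {"value": ...} layers (attribution wrapper, then temporal).
    while isinstance(prop, dict) and "value" in prop:
        prop = prop["value"]
    return prop


attributed = {
    "value": {"value": "1850-01-01", "modifier": "about", "original": "abt 1850"},
    "evidence": {"quality": 2, "confidence": 0.8, "method": "imported"},
}
simple = "1850-01-01"
```

Note that unwrapping discards the modifier, so callers that care about precision should read the temporal object directly.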


  1. Prioritize Tier 1 + Tier 2 — These are the core schema
  2. Resolve Open Design Questions — See “Open Design Questions” section above
  3. Decide on discriminator patterns — Which types use discriminators vs separate types?
  4. Define import mappings — OGP → our types, schema.org → our types
  5. Decide on granularity level — Entity-level vs property-level metadata (leaning property-level)
  6. Spec the data model — SQLite schema, JSON representation
  7. Build seed data — See SEED_DATA.md