Reverse Engineering — Social Network Patterns
How to model people, relationships, and social data across platforms like Goodreads, Twitter/X, MySpace, LinkedIn, Instagram, etc.
Core Principle: People First, Accounts Second
Every social platform has users. But the same person exists across many platforms. The graph should model this in two layers:
| Entity | What it represents | Cross-platform? |
|---|---|---|
| person | A real human being | Yes — mergeable across platforms |
| account | Their profile on one platform | No — platform-specific |
A person has accounts. An account belongs to a person.
```yaml
adapters:
  person:
    id: .user_id
    name: .name
    image: .photo_url
    location: .location
    data.gender: .gender
    data.age: .age
    data.birthday: .birthday
    data.website: .website

    has_account:
      account:
        id: .user_id
        name: .name
        handle: .handle
        url: .profile_url
        image: .photo_url
```

Why this matters: When you later build a Twitter skill and find the same person (by name, website, or explicit cross-link), you can merge the person entities while keeping both accounts distinct. The person is the anchor.
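The merge described above can be sketched in plain Python. This is a minimal illustration, not the graph's actual implementation: the `Person` and `Account` classes and the `merge_people` helper are hypothetical names, and the rule "prefer the first record's scalar fields" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Account:
    platform: str                 # e.g. "goodreads"
    id: str                       # platform-specific user ID
    handle: Optional[str] = None
    url: Optional[str] = None


@dataclass
class Person:
    name: str
    website: Optional[str] = None
    accounts: List[Account] = field(default_factory=list)


def merge_people(a: Person, b: Person) -> Person:
    """Merge two person records believed to be the same human.

    Scalar fields prefer `a` and fall back to `b` (an assumed policy).
    Accounts from both records are kept, de-duplicated on (platform, id).
    """
    merged = Person(name=a.name or b.name, website=a.website or b.website)
    seen = set()
    for acct in a.accounts + b.accounts:
        key = (acct.platform, acct.id)
        if key not in seen:
            seen.add(key)
            merged.accounts.append(acct)
    return merged
```

Merging a Goodreads-derived person with a Twitter-derived person yields one `Person` carrying two `Account` entries, so neither platform's profile is lost.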
Social Relationship Types
Every social network has some subset of these relationship patterns:
Symmetric (mutual)
Both parties agree. The relationship is bidirectional.
| Relationship | Examples |
|---|---|
| friends | Facebook, Goodreads, MySpace |
Operation pattern: list_friends(user_id) → person[]
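Because a friendship is mutual, the same edge turns up whether you scrape one person's friends list or the other's. A minimal sketch of storing it once, assuming string user IDs; `friend_edge` and `add_friends` are hypothetical helpers, not part of any existing skill:

```python
def friend_edge(a: str, b: str) -> frozenset:
    """Return an order-independent key for a mutual friendship."""
    if a == b:
        raise ValueError("a person cannot befriend themselves")
    return frozenset((a, b))


def add_friends(edges: set, user_id: str, friend_ids: list) -> None:
    """Record the result of list_friends(user_id) as undirected edges."""
    for fid in friend_ids:
        edges.add(friend_edge(user_id, fid))
```

Using a `frozenset` as the edge key means scraping user 1's list and then user 2's list does not create a duplicate 1-2 edge.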
Asymmetric (directed)
One party follows, the other may or may not follow back.
| Relationship | Examples |
|---|---|
| following | Twitter, Instagram, Goodreads |
| followers | Twitter, Instagram, Goodreads |
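Following and followers are two views of the same directed edge. The sketch below shows one way to index that edge in both directions; the `FollowGraph` class and its method names are hypothetical and assume string user IDs.

```python
from collections import defaultdict


class FollowGraph:
    """Directed follow edges, indexed in both directions."""

    def __init__(self) -> None:
        self._following = defaultdict(set)  # follower -> people they follow
        self._followers = defaultdict(set)  # followee -> people following them

    def add_follow(self, follower: str, followee: str) -> None:
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def following(self, user_id: str) -> set:
        return set(self._following.get(user_id, ()))

    def followers(self, user_id: str) -> set:
        return set(self._followers.get(user_id, ()))

    def mutuals(self, user_id: str) -> set:
        """People user_id follows who also follow user_id back."""
        return self.following(user_id) & self.followers(user_id)
```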
Operation pattern: two separate operations with different directions.
```yaml
list_following:
  description: People this user follows
  returns: person[]

list_followers:
  description: People following this user
  returns: person[]
```

Group membership
User belongs to a group/community.
| Relationship | Examples |
|---|---|
| member_of | Goodreads groups, Facebook groups, Reddit subreddits, Discord servers |
```yaml
list_groups:
  returns: group[]
```

Profile Depth: Light vs Rich
Social operations return people at two levels of depth:
Light (from list operations)
When you scrape a friends list or followers page, you get limited data per person:
```python
{
    "user_id": "10000001",
    "name": "Alex Reader",
    "photo_url": "https://...",
    "location": "Berlin",
    "books_count": 414,
    "friends_count": 138,
}
```

This is what list_friends, list_following, list_followers return.
Enough to create the person entity and the relationship edge.
Rich (from profile scrape)
When you scrape an individual profile page, you get the full picture:
```python
{
    "user_id": "10000001",
    "name": "Alex Reader",
    "handle": "alexreader",
    "photo_url": "https://...",
    "gender": "...",
    "age": 32,
    "birthday": "...",
    "location": "Berlin, Germany",
    "website": "https://example.com",
    "about": "...",
    "interests": "...",
    "joined_date": "January 2015",
    "ratings_count": 159,
    "avg_rating": "3.82",
    "friends_count": 138,
    "favorite_books": [...],
    "currently_reading": [...],
    "favorite_genres": [...],
}
```

This is what get_person(user_id) returns.
Pattern: Always provide both. The light operations populate the graph with
stubs. The rich operation fills them in when you need the detail. The adapter
handles both — missing fields are just null.
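One consequence of treating missing fields as null is that a later light scrape should not erase detail a rich scrape already filled in. A minimal sketch of that overlay rule, assuming profiles are plain dicts; `merge_profile` is a hypothetical helper, not an existing adapter function:

```python
def merge_profile(existing: dict, incoming: dict) -> dict:
    """Overlay incoming fields on an existing record, ignoring None values.

    A stub from a list operation therefore cannot blank out fields that
    an earlier rich profile scrape populated.
    """
    merged = dict(existing)
    for key, value in incoming.items():
        if value is not None:
            merged[key] = value
    return merged
```

Applied in either order, a stub and a rich record for the same `user_id` converge on the rich record's detail, with any newer non-null values from the stub taking precedence.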
Authors Are People Too
On platforms with content creators (Goodreads authors, Twitter blue-checks, YouTube channels), the creators are people with special roles. Model them as:
- person entity (they’re a human being)
- author/creator entity (their creative identity)
- account entity (their platform presence)
On Goodreads, an author appears in multiple contexts:
| Context | How we encounter them |
|---|---|
| Book’s written_by relationship | author entity with ID and URL |
| list_following results | person entity (they follow authors) |
| Quote attribution | author entity |
| Author profile page | full author entity with books |
The key insight: extract real author IDs everywhere, not just name strings.
When a book list shows “Christie, Agatha” as a link to /author/show/123715,
capture the ID so the graph can connect the book → author → their other books.
```python
import re

# `row` and `_abs_url` come from the surrounding scraper code.
author_el = row.select_one("td.field.author a")
if author_el:
    href = author_el.get("href", "")
    m = re.search(r"/author/show/(\d+)", href)
    if m:
        author_id = m.group(1)
        author_url = _abs_url(href)
```

Also fix name ordering: many platforms store names as “LastName, FirstName” in table views:
```python
def _flip_name(name: str) -> str:
    if "," in name:
        parts = [p.strip() for p in name.split(",", 1)]
        if len(parts) == 2 and parts[1]:
            return f"{parts[1]} {parts[0]}"
    return name
```

Content People Create
Social platforms aren’t just about connections: people create content. Each platform has its own content types that should map to entities:
| Platform | Content types | Entity mapping |
|---|---|---|
| Goodreads | Books read, reviews, ratings, quotes | book, review, quote |
| Twitter | Tweets, retweets, likes | post, engagement |
| MySpace | Music, blog posts, comments | track, post, comment |
| Instagram | Photos, stories, reels | media, story |
| LinkedIn | Posts, articles, endorsements | post, article |
The person’s relationship to content matters:
```
# Things a person created
person → wrote → review
person → posted → post

# Things a person engaged with
person → rated → book (with rating value)
person → liked → quote
person → saved → book (to shelf)

# Things attributed to a person
quote → attributed_to → author
book → written_by → author
```

Profile Page Parsing Patterns
Social profile pages follow remarkably similar structures across platforms. Common patterns:
Info box / details section
Most profiles have a key-value info section:
```python
titles = soup.select(".infoBoxRowTitle")
items = soup.select(".infoBoxRowItem")
info = {}
for t, v in zip(titles, items):
    label = clean(t.get_text()).lower()
    value = clean(v.get_text())
    info[label] = value
```

Stats bar
Ratings, posts, and followers, usually near the top:
```python
stats_text = clean(stats_el.get_text())
ratings = re.search(r"([\d,]+)\s+ratings?", stats_text)
avg = re.search(r"\(([\d.]+)\s+avg\)", stats_text)
```

Section headers → content blocks
Profile pages have named sections (favorite books, currently reading, groups). The header-to-content relationship varies by platform:
```python
# Pattern 1: Header is inside a container, content is a sibling div
for hdr in soup.select("h2.brownBackground"):
    parent_box = hdr.find_parent("div", class_="bigBox")
    body = parent_box.select_one(".bigBoxBody") if parent_box else None

# Pattern 2: Header IS the container, content follows
for hdr in soup.select(".sectionHeader"):
    body = hdr.find_next_sibling()

# Pattern 3: Header + content share a parent
for section in soup.select(".profileSection"):
    title = section.select_one("h3")
    content = section.select_one(".sectionContent")
```

Always check the actual DOM structure; don’t assume.
Pagination for Social Lists
Social lists (friends, followers, following) almost always paginate. Key patterns from Goodreads that will apply elsewhere:
Auto-pagination with page=0
```python
def list_friends(user_id, page=0, ...):
    """page=0 means fetch all pages automatically."""
    # Assumes _fetch_single_page returns (items, html_text) so the raw HTML
    # is available for next-page detection.
    if page > 0:
        items, _ = _fetch_single_page(page)
        return items

    all_items = []
    seen = set()
    for p in range(1, MAX_PAGES + 1):
        items, html_text = _fetch_single_page(p)
        new = [i for i in items if i["user_id"] not in seen]
        all_items.extend(new)
        seen.update(i["user_id"] for i in new)
        if not _has_next(html_text):
            break
    return all_items
```

Next-page detection
```python
def _has_next(html_text: str) -> bool:
    return 'class="next_page"' in html_text or "rel=\"next\"" in html_text
```

Safety limits
Always cap pagination to prevent infinite loops:
```python
MAX_PAGES = 50
```

Cross-Platform Identity Signals
When building skills for multiple social networks, look for identity signals that help merge person entities across platforms:
| Signal | Reliability | Example |
|---|---|---|
| Explicit cross-link | High | Website URL in bio pointing to another profile |
| Same handle | Medium | @jcontini on both Twitter and Goodreads |
| Same name + location | Low | “Joe Contini, Austin TX” |
| Same profile photo | Medium | Image similarity matching |
| Email (if available) | High | Unique identifier |
For now, just capture everything. The website field on a person’s profile
is particularly valuable — it often links to a personal site that aggregates
all their social profiles.
Checklist for a New Social Network Skill
When building a skill for a new social platform:
- Identify the entity types — what do people create, consume, and engage with?
- Map relationships — friends? followers? groups? what content do they produce?
- Model as person → account — not just accounts
- Light + rich profiles — list operations for stubs, get_person for detail
- Extract real IDs everywhere — not just name strings; follow links for IDs
- Capture cross-platform signals — website, handle, email
- Auto-paginate social lists (friends, followers, etc. are almost always paginated)
- Handle name formatting — “LastName, FirstName” flipping, Unicode, etc.
- Look for section-based profile data — favorite X, currently Y, groups, etc.
- Test with a real profile — verify data richness against what you see in the browser
Real-World Examples
| Skill | Social patterns used | Reference |
|---|---|---|
| skills/goodreads/ | person → account, friends, following/followers, groups, quotes, authors as people, favorite books, currently reading, profile scraping | web_scraper.py |
| Future: skills/myspace/ | person → account, friends, followers, music, blog posts | n/a |
| Future: skills/twitter/ | person → account, following/followers, tweets, likes, retweets | n/a |