Graph Databases - Learning Module

Nodes and Edges

The Atoms of Graph Data

If the graph model is the blueprint, nodes and edges are the bricks and mortar. Every entity you store, every connection you represent, every query you write—all reduce to operations on these two fundamental structures.

But the simplicity is deceptive. How you design your node labels determines query efficiency. How you structure relationship types affects data modeling flexibility. How you distribute properties between nodes and edges impacts both storage and traversal performance.

Mastering nodes and edges isn't about learning what they are—that's trivial. It's about developing the judgment to use them effectively across diverse graph modeling challenges.

What You Will Learn

By the end of this page, you will understand nodes and edges at an expert level—their internal structure, property systems, labeling strategies, relationship type design, the critical role of schema constraints, and the practical patterns that distinguish professional graph modeling from naive approaches.

Nodes In Depth

A node (or vertex) represents an entity in your domain—a person, product, transaction, document, location, or any discrete concept that participates in relationships. But nodes in property graph databases are far richer than simple vertices in mathematical graphs.

Anatomy of a Node:

Every node consists of three core components:

Identity: A unique, immutable identifier assigned by the database. This internal ID enables O(1) lookups and relationship pointers.
Labels: Zero or more type classifications that categorize the node. Labels serve as the primary mechanism for organizing and querying nodes.
Properties: A map of key-value pairs storing the node's attributes. Properties hold the actual data about the entity.

node-anatomy.txt
NODE STRUCTURE ANATOMY
======================
 
┌─────────────────────────────────────────────────────────────────────┐
│                         NODE INSTANCE                                │
├─────────────────────────────────────────────────────────────────────┤
│  IDENTITY                                                            │
│  ─────────                                                           │
│  Internal ID: 4298731 (database-assigned, immutable, reusable)       │
│  Element ID: "4:abc123:4298731" (Neo4j 5.x; not a stable key)        │
│                                                                      │
│  LABELS                                                              │
│  ──────                                                              │
│  Primary:    :Person                                                 │
│  Secondary:  :Employee, :Engineer, :TeamLead                         │
│                                                                      │
│  PROPERTIES                                                          │
│  ──────────                                                          │
│  {                                                                   │
│    "name": "Alice Chen",           // String                         │
│    "email": "alice@techcorp.com",  // String (indexed, unique)       │
│    "age": 32,                      // Integer                        │
│    "salary": 145000.00,            // Float                          │
│    "isActive": true,               // Boolean                        │
│    "skills": ["Python", "Go"],     // List<String>                   │
│    "hiredDate": date("2019-03-15"),// Temporal                       │
│    "location": point({             // Spatial                        │
│        latitude: 37.7749,                                            │
│        longitude: -122.4194                                          │
│    }),                                                               │
│    "metadataSource": "HR_IMPORT",  // Flattened: Neo4j properties    │
│    "metadataVersion": 3            // cannot hold nested maps        │
│  }                                                                   │
└─────────────────────────────────────────────────────────────────────┘

Internal vs External IDs

Internal node IDs are for database use only. They may be recycled after node deletion (Neo4j) or change during database operations (some implementations). Never expose internal IDs to external systems or use them as foreign keys in other databases. Instead, create explicit identity properties (e.g., userId, productSku) and index them for lookups.
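A minimal sketch of that advice, assuming a :User label with an application-owned userId property (both names are illustrative, borrowed from the examples on this page). In Neo4j, the uniqueness constraint also creates the index that backs the lookup:

```cypher
// Assumption: :User and userId are illustrative names, not a required schema.
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User)
REQUIRE u.userId IS UNIQUE;

// Look up by the application-owned key; the constraint's index backs this match.
MATCH (u:User {userId: "USR-001"})
RETURN u.name, u.email;

// Usable inside a single query or transaction, but not something to persist elsewhere.
MATCH (u:User {userId: "USR-001"})
RETURN elementId(u) AS transientId;
```

Only userId would be handed to other systems here; the value returned by elementId() stays inside the database session.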

Label Design Strategies

Labels are the graph equivalent of table names—but with crucial differences. A node can have multiple labels simultaneously, enabling flexible, overlapping categorization that would require complex joins in relational systems.

Label Functions:

Query Targeting: MATCH (p:Person) scans only Person-labeled nodes, not the entire graph
Index Scoping: Indexes are label-specific, enabling efficient property lookups within categories
Constraint Binding: Uniqueness and existence constraints apply per label
Schema Documentation: Labels communicate domain semantics to developers

Label Design Patterns
Pattern	Example	Use Case	Trade-offs
Single Primary Label	`:Person`, `:Product`, `:Order`	Core entity types with distinct schemas	Simple, clear; may miss cross-cutting concerns
Hierarchical Labels	`:Animal`, `:Mammal`, `:Dog`	Taxonomy with inheritance semantics	Flexible queries at any level; redundant storage
Role-Based Labels	`:User`, `:Admin`, `:Moderator`	Same entity acting in different capacities	Query by role; nodes may accumulate many labels
State Labels	`:Order:Pending`, `:Order:Shipped`	Entity lifecycle stages	Fast state queries; requires label updates on transitions
Capability Labels	`:Searchable`, `:Cacheable`	Cross-cutting technical concerns	Infrastructure queries; mixes domain and technical concerns

label-patterns.cypher

Cypher

// MULTI-LABEL STRATEGY: E-COMMERCE USER ROLES
// =============================================
 
// Creating a user with multiple roles
CREATE (u:User:Customer:PremiumMember {
    userId: "USR-001",
    email: "alice@example.com",
    name: "Alice Chen",
    memberSince: date("2020-01-15"),
    premiumTier: "Gold"
})
 
// Later, user becomes a seller too
MATCH (u:User {userId: "USR-001"})
SET u:Seller
 
// Query by any applicable label:
// - All users (broad)
MATCH (u:User) RETURN count(u)
 
// - Only customers who can purchase
MATCH (c:Customer) RETURN c.name
 
// - Premium members for special offers
MATCH (p:PremiumMember) RETURN p.email
 
// - Sellers for marketplace analytics
MATCH (s:Seller) RETURN s.userId
 
// =============================================
// HIERARCHICAL LABELS: CONTENT MANAGEMENT
// =============================================
 
// Base content type with specializations
CREATE (a:Content:Article:NewsArticle {
    contentId: "ART-2024-001",
    title: "Breaking Technology News",
    publishedAt: datetime(),
    category: "Technology"
})
 
CREATE (v:Content:Article:OpinionPiece {
    contentId: "ART-2024-002", 
    title: "Why AI Matters",
    author: "Tech Columnist"
})
 
CREATE (m:Content:Media:Video {
    contentId: "VID-2024-001",
    title: "Product Demo",
    duration: 300
})
 
// Query at different hierarchy levels:
// All content for sitemap
MATCH (c:Content) RETURN c.contentId
 
// Only articles for reading list
MATCH (a:Article) RETURN a.title
 
// Only opinion pieces for editor review
MATCH (o:OpinionPiece) RETURN o.author, o.title

Label Best Practices

1. Use nouns, not adjectives: :Person, not :Personal.
2. Prefer the singular form: :Product, not :Products.
3. Use PascalCase: :ShoppingCart, not :shopping_cart.
4. Limit nodes to 4-5 labels: too many labels slow writes and complicate queries.
5. Avoid labels for simple boolean flags: use properties for those, and reserve labels for query-critical categorization.
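One way to read rules 4 and 5 together, as a hedged sketch: a simple flag stays a property, while a lifecycle state that drives frequent queries becomes a label that is swapped on transition. The :Order label and the orderId and newsletterOptIn properties are assumptions made for illustration.

```cypher
// Simple boolean flag: a property is enough.
MATCH (u:User {userId: "USR-001"})
SET u.newsletterOptIn = true;

// Query-critical state: swap labels in one statement when the order ships.
MATCH (o:Order:Pending {orderId: "ORD-1001"})
REMOVE o:Pending
SET o:Shipped;

// The state query then becomes a label scan rather than a property filter.
MATCH (o:Order:Shipped)
RETURN count(o) AS shippedOrders;
```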

Property Types and Indexing

Properties store the actual data within nodes (and edges). Unlike schemaless document databases where properties can contain arbitrary structures, graph database properties are typically scalar values or homogeneous lists—a constraint that enables efficient indexing and comparison operations.

Supported Property Types (Neo4j as reference):

Property Data Types
Category	Types	Examples	Indexable
Numeric	Integer, Float	`42`, `3.14159`, `-273`	Yes (range queries)
Text	String	`"Alice"`, `"product-sku-123"`	Yes (exact, prefix, full-text)
Boolean	Boolean	`true`, `false`	Yes (exact match)
Temporal	Date, Time, DateTime, Duration	`date('2024-01-15')`, `duration('P1Y2M')`	Yes (range queries)
Spatial	Point (2D/3D, Cartesian/WGS84)	`point({latitude: 37.77, longitude: -122.41})`	Yes (distance, bounding box)
Collections	List (homogeneous)	`['red', 'blue', 'green']`, `[1, 2, 3]`	Partial (contains checks)
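Nested	Map (query-time value only)	`{key: 'value', nested: {a: 1}}`	No (Neo4j cannot store a map as a property; flatten it into separate properties or serialize it to a string)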
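As a rough illustration of these types on a single node, assuming Neo4j and hypothetical property names: scalars and homogeneous lists are stored directly, and map-like data is flattened into separate properties.

```cypher
// Assumption: property names are illustrative.
CREATE (p:Person {
    name: "Alice Chen",                                          // String
    age: 32,                                                     // Integer
    isActive: true,                                              // Boolean
    hiredDate: date("2019-03-15"),                               // Temporal
    location: point({latitude: 37.7749, longitude: -122.4194}),  // Spatial (WGS-84)
    skills: ["Python", "Go"],                                    // Homogeneous list
    metadataSource: "HR_IMPORT",                                 // Flattened map entries
    metadataVersion: 3
});

// List membership is checked with IN.
MATCH (p:Person)
WHERE "Go" IN p.skills
RETURN p.name;
```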

indexing-strategies.cypher

Cypher

// INDEX TYPES AND THEIR PURPOSES
// ================================
 
// 1. RANGE INDEX (default since Neo4j 5.0)
// Used for: Exact lookups, prefix searches, range queries
CREATE INDEX person_email_idx FOR (p:Person) ON (p.email)
 
// Query patterns supported:
MATCH (p:Person) WHERE p.email = 'alice@example.com' RETURN p   // Exact
MATCH (p:Person) WHERE p.email STARTS WITH 'alice' RETURN p     // Prefix
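// Range predicates are index-backed too, but only on an indexed property.
// p.age needs its own index, e.g. CREATE INDEX person_age_idx FOR (p:Person) ON (p.age)
MATCH (p:Person) WHERE p.age > 25 AND p.age < 35 RETURN p       // Range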
 
// 2. COMPOSITE INDEX (multiple properties)
// Used for: Queries filtering on multiple properties together
CREATE INDEX person_composite_idx FOR (p:Person) ON (p.country, p.city)
 
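// Efficient for (predicates on every indexed property):
MATCH (p:Person) WHERE p.country = 'USA' AND p.city = 'NYC' RETURN p

// NOT served by this index: Neo4j uses a composite index only when the
// query has predicates on all of its properties (no left-prefix matching)
MATCH (p:Person) WHERE p.country = 'USA' RETURN p  // Composite index not used
MATCH (p:Person) WHERE p.city = 'NYC' RETURN p     // Composite index not used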
 
// 3. FULL-TEXT INDEX (Lucene-backed)
// Used for: Natural language search, fuzzy matching
CREATE FULLTEXT INDEX product_search_idx 
FOR (p:Product) 
ON EACH [p.name, p.description]
 
// Query with full-text search:
CALL db.index.fulltext.queryNodes("product_search_idx", "wireless bluetooth")
YIELD node, score
RETURN node.name, score
ORDER BY score DESC
 
// 4. POINT INDEX (spatial)
// Used for: Distance queries, bounding box searches
CREATE POINT INDEX location_point_idx FOR (l:Location) ON (l.coordinates)
 
// Find locations within 10km
MATCH (l:Location)
WHERE point.distance(l.coordinates, point({latitude: 37.77, longitude: -122.41})) < 10000
RETURN l.name
 
// 5. TOKEN LOOKUP INDEX (label-based)
// Used for: Label scans, existence checks
// New databases already have one, so IF NOT EXISTS avoids a duplicate-index error
CREATE LOOKUP INDEX node_label_idx IF NOT EXISTS FOR (n) ON EACH labels(n)
 
// Fast label existence check:
MATCH (n:RareLabel) RETURN count(n)
 
// ================================
// CONSTRAINT TYPES
// ================================
 
// Unique constraint (creates index automatically)
CREATE CONSTRAINT person_email_unique 
FOR (p:Person) 
REQUIRE p.email IS UNIQUE
 
// Node key constraint (uniqueness + existence for composite key; Enterprise Edition)
CREATE CONSTRAINT order_key 
FOR (o:Order) 
REQUIRE (o.orderId, o.region) IS NODE KEY
 
// Property existence constraint (Enterprise Edition)
CREATE CONSTRAINT person_name_exists 
FOR (p:Person) 
REQUIRE p.name IS NOT NULL
 
// Property type constraint (Neo4j 5.9+, Enterprise Edition)
CREATE CONSTRAINT person_age_type 
FOR (p:Person) 
REQUIRE p.age IS :: INTEGER

Index Selection Strategy

Create indexes based on query patterns, not data structure. Profile your common queries with EXPLAIN and PROFILE commands. Indexes accelerate reads but slow writes—every indexed property requires index updates on modification. Rule of thumb: index properties used in WHERE clauses of frequent queries, especially those filtering large portions of the graph.
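A short sketch of that workflow, assuming the person_email_idx index from the listing above exists. EXPLAIN shows the plan without running the query; PROFILE runs it and reports rows and database hits per operator.

```cypher
// Plan only: look for an index seek operator (e.g. NodeIndexSeek)
// rather than NodeByLabelScan followed by a Filter.
EXPLAIN
MATCH (p:Person)
WHERE p.email = 'alice@example.com'
RETURN p;

// Executes the query and reports actual rows and db hits per operator.
PROFILE
MATCH (p:Person)
WHERE p.email = 'alice@example.com'
RETURN p;

// Review which indexes already exist before adding another one.
SHOW INDEXES YIELD name, type, labelsOrTypes, properties;
```

Comparing the db hits reported by PROFILE before and after creating an index is a simple way to check that the index is worth its write cost.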

Relationships (Edges) In Depth

If nodes are the nouns of your graph, relationships are the verbs—they express how entities connect, interact, and relate. In property graph databases, relationships are first-class citizens with their own identities, types, and properties.

Relationship Anatomy:

Every relationship consists of:

Identity: Unique internal ID (like nodes)
Type: A single mandatory type name classifying the relationship (the relationship counterpart of a node label)
Direction: Start node → End node (always directed in storage)
Properties: Key-value pairs describing the relationship
Endpoints: References to start and end nodes

relationship-anatomy.txt
RELATIONSHIP STRUCTURE ANATOMY
==============================
 
┌─────────────────────────────────────────────────────────────────────┐
│                      RELATIONSHIP INSTANCE                           │
├─────────────────────────────────────────────────────────────────────┤
│  IDENTITY                                                            │
│  ─────────                                                           │
│  Internal ID: 89234571                                               │
│                                                                      │
│  TYPE (MANDATORY, SINGLE)                                            │
│  ────                                                                │
│  :WORKS_FOR                                                          │
│                                                                      │
│  DIRECTION                                                           │
│  ─────────                                                           │
│  Start Node: (Person:4298731) ──────────►  End Node: (Company:892)  │
│         "Alice Chen"                           "TechCorp"            │
│                                                                      │
│  PROPERTIES                                                          │
│  ──────────                                                          │
│  {                                                                   │
│    "since": date("2019-03-15"),      // When relationship started   │
│    "role": "Senior Engineer",         // Current position            │
│    "department": "Platform",          // Organizational unit         │
│    "salary": 145000.00,               // Relationship-specific data  │
│    "isRemote": true,                  // Work arrangement            │
│    "performanceRating": 4.5           // Annual review score         │
│  }                                                                   │
└─────────────────────────────────────────────────────────────────────┘
 
VISUAL REPRESENTATION:
                                               
   ┌──────────────┐       :WORKS_FOR         ┌──────────────┐
   │    Alice     │  ──────────────────────► │   TechCorp   │
   │   :Person    │   since: 2019-03-15      │   :Company   │
   └──────────────┘   role: Sr. Engineer     └──────────────┘
                       salary: 145000
 
NOTE: Direction is always stored, but queries can traverse either way:
      (alice)-[:WORKS_FOR]->(company)   // Follow direction
      (company)<-[:WORKS_FOR]-(alice)   // Reverse direction (same relationship)
      (alice)-[:WORKS_FOR]-(company)    // Either direction

Key Differences from Nodes:

Single Type Requirement: Nodes can have multiple labels; relationships have exactly ONE type. This reflects their semantic specificity—a connection is ONE kind of relationship.
Mandatory Direction: Relationships are always stored with direction, though queries can ignore it. This enables asymmetric relationship modeling (followers vs following).
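Type Filtering During Traversal: Neo4j 5.x maintains a relationship type lookup index and supports relationship property indexes, but in most queries type filtering happens while traversing from an already-matched node, not via an index lookup. This affects query planning.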
Endpoint Coupling: Relationships exist only between nodes, so a node cannot be deleted while relationships are still attached. In Cypher, a plain DELETE on such a node fails; DETACH DELETE removes the node and its relationships together.

Direction Matters for Semantics, Not Speed

Storage is directional, but traversal is not: Neo4j links each relationship into the relationship chains of both endpoints, so following it against its stored direction costs the same as following it forward. What direction does affect is meaning. :FOLLOWS, :MANAGES, and :PURCHASED have natural directions; model them correctly to keep queries intuitive and maintainable.
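A brief sketch of how one stored direction serves both readings, assuming :Person nodes connected by :FOLLOWS relationships that point from follower to followed (illustrative data, not defined elsewhere on this page):

```cypher
// Who does Alice follow? Traverse with the stored direction.
MATCH (:Person {name: "Alice"})-[:FOLLOWS]->(followed:Person)
RETURN followed.name;

// Who follows Alice? Same relationships, traversed against the stored direction.
MATCH (:Person {name: "Alice"})<-[:FOLLOWS]-(follower:Person)
RETURN follower.name;

// Direction-agnostic: anyone connected to Alice by :FOLLOWS either way.
MATCH (:Person {name: "Alice"})-[:FOLLOWS]-(other:Person)
RETURN DISTINCT other.name;
```

Because the second query reads the same relationships backwards, there is usually no need to store a mirrored :FOLLOWED_BY relationship.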

Relationship Type Design

Relationship types are the semantic backbone of your graph model. Well-designed types make queries expressive and efficient; poor choices lead to property filtering and complex workarounds.

Design Principles:

Relationship Type Best Practices

•Use verbs, not nouns — :MANAGES, :PURCHASED, :AUTHORED not :Manager, :Purchase, :Authorship
•Be specific over generic — :PURCHASED, :VIEWED, :WISHLISTED not simply :INTERACTED_WITH
•Use SCREAMING_SNAKE_CASE — :WORKS_FOR, :REPORTED_BY following Neo4j conventions
•Direction should be natural — :FOLLOWS points from follower to followed, :MANAGES from manager to report
•Avoid relationship explosion — Don't encode property values in types (:RATED_5_STARS); use properties instead
•Consider query patterns — If you frequently filter by a property, consider whether it should be a separate relationship type
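The "relationship explosion" point can be sketched as follows, assuming :User and :Product nodes keyed by userId and sku already exist; the :RATED type, the stars property, and the index name are illustrative choices, not a prescribed schema.

```cypher
// One :RATED type with a stars property instead of :RATED_5_STARS, :RATED_4_STARS, ...
MATCH (u:User {userId: "USR-001"}), (p:Product {sku: "WIDGET-A"})
MERGE (u)-[r:RATED]->(p)
SET r.stars = 5;

// A single pattern covers every rating value.
MATCH (:User)-[r:RATED]->(p:Product)
WHERE r.stars >= 4
RETURN p.sku, count(r) AS highRatings
ORDER BY highRatings DESC;

// If the stars filter becomes hot, a relationship property index is one option.
CREATE INDEX rated_stars_idx IF NOT EXISTS
FOR ()-[r:RATED]-()
ON (r.stars);
```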

poor-design.cypher

Poor Design

// ANTIPATTERN: Generic relationships
// ==================================
 
// Everything is RELATED_TO
CREATE (a)-[:RELATED_TO {
    type: "manages",
    since: date("2020-01-01")
}]->(b)
 
CREATE (c)-[:RELATED_TO {
    type: "follows",
    since: date("2023-05-15")  
}]->(d)
 
// Query requires property filtering:
MATCH (m)-[r:RELATED_TO {type: "manages"}]->(e)
RETURN m.name, e.name
 
// Problems:
// 1. No type-based filtering (slower)
// 2. No type-specific indexes
// 3. Unclear semantics
// 4. Property typos cause silent bugs

good-design.cypher

Good Design

// BEST PRACTICE: Specific types
// ===============================
 
// Distinct relationship types
CREATE (a)-[:MANAGES {
    since: date("2020-01-01")
}]->(b)
 
CREATE (c)-[:FOLLOWS {
    since: date("2023-05-15")
}]->(d)
 
// Query is type-based:
MATCH (m)-[:MANAGES]->(e)
RETURN m.name, e.name
 
// Benefits:
// 1. Type-based traversal (faster)
// 2. Clear, self-documenting model
// 3. Types are listed by the schema (CALL db.relationshipTypes()), so typos are easier to catch
// 4. Easier query optimization

When to Use Multiple Relationship Types

Create separate relationship types when: 1) The relationships have different semantic meanings, 2) You frequently query only one type, 3) The relationship properties differ significantly. Keep a single type when: 1) You usually query all connection types together, 2) The difference is a simple enumerable property, 3) You'd create hundreds of relationship types.

Properties on Relationships

One of the most powerful features of property graphs—and a key differentiator from basic graph models—is the ability to attach properties directly to relationships. This enables rich contextual data about connections without intermediate nodes.

What to Store on Relationships:

Relationship Property Categories
Category	Examples	When to Use
Temporal Context	`since`, `until`, `lastInteraction`	When the relationship has a time dimension
Quantitative Measures	`weight`, `strength`, `score`, `distance`	When connections have numeric intensity
Transactional Data	`price`, `quantity`, `orderId`	When the relationship represents an event
Qualitative Descriptors	`role`, `type`, `context`	When relationships are subcategorized
Computational Results	`similarity`, `confidence`, `correlation`	When relationships are algorithmically derived

relationship-properties-examples.cypher

Cypher

// RELATIONSHIP PROPERTIES IN PRACTICE
// ====================================
 
// 1. SOCIAL NETWORK - Connection Strength
CREATE (alice:Person {name: "Alice"})-[:KNOWS {
    since: date("2015-09-01"),
    context: "college_roommate",
    interactionFrequency: "weekly",
    trustScore: 0.95
}]->(bob:Person {name: "Bob"})
 
// 2. E-COMMERCE - Purchase Transaction
CREATE (customer:Customer {id: "C-001"})-[:PURCHASED {
    transactionId: "TXN-2024-001",
    purchasedAt: datetime("2024-01-15T14:30:00"),
    quantity: 2,
    unitPrice: 29.99,
    discount: 0.10,
    paymentMethod: "credit_card"
}]->(product:Product {sku: "WIDGET-A"})
 
// 3. RECOMMENDATION - Computed Similarity
CREATE (p1:Product {name: "Running Shoes"})-[:SIMILAR_TO {
    algorithm: "collaborative_filtering",
    similarityScore: 0.87,
    computedAt: datetime("2024-01-01"),
    basedOnPurchases: 15420,
    confidence: 0.92
}]->(p2:Product {name: "Running Socks"})
 
// 4. AUTHORIZATION - Access Permission
CREATE (user:User {email: "dev@company.com"})-[:HAS_ACCESS {
    role: "editor",
    grantedBy: "admin@company.com",
    grantedAt: datetime("2023-06-15"),
    expiresAt: datetime("2024-06-15"),
    permissions: ["read", "write", "delete"],
    auditRequired: true
}]->(resource:Document {id: "DOC-SECRET-001"})
 
// 5. COLLABORATION - Weighted Contribution
CREATE (author1:Author {name: "Dr. Smith"})-[:CO_AUTHORED {
    paperDoi: "10.1234/science.2024.001",
    contributionPercentage: 45,
    role: "lead_author",
    sections: ["Introduction", "Methods", "Conclusion"],
    correspondingAuthor: true
}]->(paper:Paper {title: "Breakthrough Research"})
 
// Query using relationship properties:
// Find strong connections in social network
MATCH (a:Person)-[k:KNOWS]->(b:Person)
WHERE k.trustScore > 0.8 AND k.since < date() - duration('P5Y')
RETURN a.name, b.name, k.context
ORDER BY k.trustScore DESC

Node vs Relationship Properties Decision

Put on Node if: the property describes the entity itself and is independent of any relationship.
Put on Relationship if: the property describes the connection and might differ for connections to different nodes (e.g., Alice's role at Company A differs from her role at Company B).
Create Intermediate Node if: you need to connect the relationship itself to other nodes (e.g., a Purchase node that connects to Payment and Shipment nodes).
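The third case might look like the sketch below, which promotes the earlier :PURCHASED relationship to a :Purchase node. It assumes the customer and product nodes already exist; the :MADE, :OF_PRODUCT, :PAID_WITH, and :SHIPPED_IN types and the Payment and Shipment properties are hypothetical names chosen for illustration.

```cypher
MATCH (c:Customer {id: "C-001"}), (p:Product {sku: "WIDGET-A"})
// The purchase becomes a node, so other nodes can attach to it.
CREATE (c)-[:MADE]->(purchase:Purchase {
    transactionId: "TXN-2024-001",
    purchasedAt: datetime("2024-01-15T14:30:00")
})-[:OF_PRODUCT {quantity: 2, unitPrice: 29.99}]->(p)
// Payment and shipment hang off the purchase, which a relationship could not support.
CREATE (purchase)-[:PAID_WITH]->(:Payment {method: "credit_card"})
CREATE (purchase)-[:SHIPPED_IN]->(:Shipment {status: "pending"});
```

The cost is one extra hop between customer and product, so this shape is worth it mainly when the payment or shipment links are actually queried.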

Handling Complex Scenarios

Real-world modeling often encounters scenarios that don't fit neatly into simple node-relationship patterns. Here's how to handle common challenges:

complex-scenarios.cypher

Cypher

// SCENARIO 1: HYPEREDGES (N-ary Relationships)
// ==============================================
// Problem: A meeting connects MULTIPLE people simultaneously
// (Not just pairs)
 
// Solution: Intermediate "Event" node
CREATE (meeting:Meeting {
    id: "MTG-001",
    title: "Q4 Planning",
    startTime: datetime("2024-01-15T10:00:00"),
    duration: duration("PT2H")
})
 
CREATE (alice:Person {name: "Alice"})-[:ATTENDED {role: "organizer"}]->(meeting)
CREATE (bob:Person {name: "Bob"})-[:ATTENDED {role: "participant"}]->(meeting)
CREATE (carol:Person {name: "Carol"})-[:ATTENDED {role: "participant"}]->(meeting)
CREATE (meeting)-[:HELD_IN]->(room:Room {name: "Conference Room A"})
 
// Query: Who attended meetings with Alice?
MATCH (alice:Person {name: "Alice"})-[:ATTENDED]->(m:Meeting)<-[:ATTENDED]-(other:Person)
RETURN DISTINCT other.name
 
// SCENARIO 2: TIME-VARYING RELATIONSHIPS
// =======================================
// Problem: Employment history - same person, same company, multiple periods
 
// Solution: Multiple relationships with temporal properties
CREATE (alice:Person {name: "Alice"})
CREATE (techcorp:Company {name: "TechCorp"})
 
CREATE (alice)-[:EMPLOYED_BY {
    startDate: date("2015-03-01"),
    endDate: date("2018-06-30"),
    role: "Junior Developer",
    department: "Engineering"
}]->(techcorp)
 
CREATE (alice)-[:EMPLOYED_BY {
    startDate: date("2021-09-01"),
    endDate: null,  // Current employment (null is not stored; the property is simply absent)
    role: "Engineering Manager",
    department: "Platform"
}]->(techcorp)
 
// Query: Current employers
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE e.endDate IS NULL
RETURN p.name, c.name, e.role
 
// SCENARIO 3: VERSIONED DATA
// ==========================
// Problem: Track historical product prices
 
// Solution A: Relationship-based versioning
CREATE (product:Product {sku: "WIDGET-A", name: "Widget A"})
CREATE (product)-[:HAS_PRICE {
    amount: 29.99,
    currency: "USD",
    effectiveFrom: date("2023-01-01"),
    effectiveTo: date("2023-06-30")
}]->(p1:PriceRecord)
CREATE (product)-[:HAS_PRICE {
    amount: 34.99,
    currency: "USD",
    effectiveFrom: date("2023-07-01"),
    effectiveTo: null
}]->(p2:PriceRecord)
 
// Solution B: Event sourcing pattern
CREATE (product)-[:PRICE_CHANGED {
    newPrice: 34.99,
    oldPrice: 29.99,
    changedAt: datetime("2023-07-01T00:00:00"),
    changedBy: "pricing_service",
    reason: "cost_increase"
}]->(priceEvent:PriceChange)
 
// SCENARIO 4: SELF-REFERENTIAL HIERARCHIES
// ========================================
// Problem: Organizational hierarchy, category trees, bill of materials
 
CREATE (ceo:Employee {name: "Alice", title: "CEO"})
CREATE (vp1:Employee {name: "Bob", title: "VP Engineering"})
CREATE (vp2:Employee {name: "Carol", title: "VP Sales"})
CREATE (director:Employee {name: "Dave", title: "Engineering Director"})
CREATE (engineer:Employee {name: "Eve", title: "Senior Engineer"})
 
CREATE (vp1)-[:REPORTS_TO]->(ceo)
CREATE (vp2)-[:REPORTS_TO]->(ceo)
CREATE (director)-[:REPORTS_TO]->(vp1)
CREATE (engineer)-[:REPORTS_TO]->(director)
 
// Query: All people in Eve's management chain
MATCH (eve:Employee {name: "Eve"})-[:REPORTS_TO*]->(manager)
RETURN manager.name, manager.title
 
// Query: All direct and indirect reports of Bob
MATCH (bob:Employee {name: "Bob"})<-[:REPORTS_TO*]-(report)
RETURN report.name, report.title

Intermediate Nodes: Power and Responsibility

Intermediate nodes (reifying relationships as nodes) add modeling flexibility but increase traversal cost. Every hop costs performance. Use intermediate nodes when you need to: connect a relationship to other nodes, version relationship history, or model true n-ary associations. Avoid over-engineering—simple direct relationships are usually sufficient.

Summary: Nodes and Edges

This page examined the building blocks of graph databases. The essential points:

Key Takeaways

•Nodes hold entity data — Identity, labels, and properties combine to represent discrete entities. Labels enable categorization and query targeting; properties store actual data.
•Multi-labeling is powerful — Unlike relational tables, nodes can have multiple labels (:Person:Employee:Manager), enabling flexible, overlapping categorization without schema changes.
•Indexes are label-scoped — Create indexes on frequently queried properties within specific labels. Composite indexes optimize multi-property filters.
•Relationships are typed and directed — Every relationship has exactly one type and stored direction. Types should be verbs (:PURCHASED, :MANAGES) using specific semantics.
•Relationship properties are essential — Store temporal, quantitative, and contextual data on relationships. Properties describe the connection, not the endpoints.
•Design for queries, not normalization — Unlike relational modeling, graph design optimizes for traversal patterns. Explicit relationships beat derived JOINs.
•Intermediate nodes handle complexity — For n-ary relationships, versioning, or when relationships need their own connections, promote relationships to nodes.

What's next:

With nodes and edges understood, we'll explore a concrete implementation. The next page examines Neo4j—the most popular graph database—including its architecture, Cypher query language, and practical examples of graph operations.

Spatial	Point (2D/3D, Cartesian/WGS84)	`point({latitude: 37.77, longitude: -122.41})`	Yes (distance, bounding box)
Collections	List (homogeneous)	`['red', 'blue', 'green']`, `[1, 2, 3]`	Partial (whole-list equality; `IN` membership tests on a list property are not index-backed)
Nested	Map	`{key: 'value', nested: {a: 1}}`	Not storable as a property value in Neo4j; flatten into separate keys or serialize to a JSON string
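
A short sketch of several of these types in use, assuming Neo4j 5. The :Employee label and all property names are illustrative and not part of any required schema.

```cypher
// Hypothetical :Employee node using several property types.
CREATE (e:Employee {
    name: "Alice Chen",                 // String
    age: 34,                            // Integer
    skills: ["Python", "Go"],           // List of String (one element type)
    hiredDate: date("2019-03-15"),      // Date
    location: point({latitude: 37.7749, longitude: -122.4194})  // WGS-84 point
});

// Membership test on a list property plus a temporal range filter.
MATCH (e:Employee)
WHERE "Go" IN e.skills
  AND e.hiredDate >= date("2019-01-01")
RETURN e.name, size(e.skills) AS skillCount, e.hiredDate.year AS hiredYear;
```

Temporal values expose components such as year directly, so there is no need to store a separate year property just for filtering or grouping.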

indexing-strategies.cypher

Cypher

// INDEX TYPES AND THEIR PURPOSES
// ================================
 
// 1. RANGE INDEX (default since Neo4j 5.0)
// Used for: Exact lookups, prefix searches, range queries
CREATE INDEX person_email_idx FOR (p:Person) ON (p.email)
 
// Query patterns supported:
MATCH (p:Person) WHERE p.email = 'alice@example.com' RETURN p   // Exact
MATCH (p:Person) WHERE p.email STARTS WITH 'alice' RETURN p     // Prefix
// Range (needs its own range index on p.age; person_email_idx does not cover it)
MATCH (p:Person) WHERE p.age > 25 AND p.age < 35 RETURN p
 
// 2. COMPOSITE INDEX (multiple properties)
// Used for: Queries filtering on multiple properties together
CREATE INDEX person_composite_idx FOR (p:Person) ON (p.country, p.city)
 
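// Used when the query has a predicate on every indexed property:
MATCH (p:Person) WHERE p.country = 'USA' AND p.city = 'NYC' RETURN p
 
// NOT served by this index (Neo4j has no left-prefix matching for
// composite indexes; a predicate on only one of the properties falls
// back to another index or a label scan):
MATCH (p:Person) WHERE p.country = 'USA' RETURN p
MATCH (p:Person) WHERE p.city = 'NYC' RETURN p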
 
// 3. FULL-TEXT INDEX (Lucene-backed)
// Used for: Natural language search, fuzzy matching
CREATE FULLTEXT INDEX product_search_idx 
FOR (p:Product) 
ON EACH [p.name, p.description]
 
// Query with full-text search:
CALL db.index.fulltext.queryNodes("product_search_idx", "wireless bluetooth")
YIELD node, score
RETURN node.name, score
ORDER BY score DESC
 
// 4. POINT INDEX (spatial)
// Used for: Distance queries, bounding box searches
CREATE POINT INDEX location_point_idx FOR (l:Location) ON (l.coordinates)
 
// Find locations within 10km
MATCH (l:Location)
WHERE point.distance(l.coordinates, point({latitude: 37.77, longitude: -122.41})) < 10000
RETURN l.name
 
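// 5. TOKEN LOOKUP INDEX (label-based)
// Used for: Label scans, existence checks
// Neo4j creates one for node labels by default, so this is only needed
// if it was dropped; IF NOT EXISTS makes the statement safe to re-run.
CREATE LOOKUP INDEX node_label_idx IF NOT EXISTS FOR (n) ON EACH labels(n)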
 
// Fast label existence check:
MATCH (n:RareLabel) RETURN count(n)
 
// ================================
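// CONSTRAINT TYPES
// (uniqueness works in all editions; key, existence, and type
//  constraints require Neo4j Enterprise Edition)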
// ================================
 
// Unique constraint (creates index automatically)
CREATE CONSTRAINT person_email_unique 
FOR (p:Person) 
REQUIRE p.email IS UNIQUE
 
// Node key constraint (uniqueness + existence for composite key)
CREATE CONSTRAINT order_key 
FOR (o:Order) 
REQUIRE (o.orderId, o.region) IS NODE KEY
 
// Property existence constraint
CREATE CONSTRAINT person_name_exists 
FOR (p:Person) 
REQUIRE p.name IS NOT NULL
 
// Property type constraint (Neo4j 5.9+)
CREATE CONSTRAINT person_age_type 
FOR (p:Person) 
REQUIRE p.age IS :: INTEGER
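
After creating indexes and constraints, it is worth confirming that they exist and that the planner picks them up. A minimal sketch, assuming Neo4j 5 and the person_email_idx index (or the uniqueness constraint) defined above:

```cypher
// List indexes with their type and state; an index is usable once ONLINE.
SHOW INDEXES
YIELD name, type, labelsOrTypes, properties, state
RETURN name, type, labelsOrTypes, properties, state;

// List constraints (uniqueness and key constraints own a backing index).
SHOW CONSTRAINTS;

// Inspect the plan without running the query; look for an index seek
// operator (for example NodeIndexSeek or NodeUniqueIndexSeek) on Person(email).
EXPLAIN
MATCH (p:Person)
WHERE p.email = 'alice@example.com'
RETURN p;
```

If the plan shows a label scan followed by a filter instead of a seek, the index is either missing, still populating, or does not match the predicate.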

Index Selection Strategy

Relationships (Edges) In Depth

Relationship Anatomy:

Every relationship consists of:

Identity: Unique internal ID (like nodes)
Type: A single mandatory label classifying the relationship
Direction: Start node → End node (always directed in storage)
Properties: Key-value pairs describing the relationship
Endpoints: References to start and end nodes

relationship-anatomy.txt
RELATIONSHIP STRUCTURE ANATOMY
==============================
 
┌─────────────────────────────────────────────────────────────────────┐
│                      RELATIONSHIP INSTANCE                           │
├─────────────────────────────────────────────────────────────────────┤
│  IDENTITY                                                            │
│  ─────────                                                           │
│  Internal ID: 89234571                                               │
│                                                                      │
│  TYPE (MANDATORY, SINGLE)                                            │
│  ────                                                                │
│  :WORKS_FOR                                                          │
│                                                                      │
│  DIRECTION                                                           │
│  ─────────                                                           │
│  Start Node: (Person:4298731) ──────────►  End Node: (Company:892)  │
│         "Alice Chen"                           "TechCorp"            │
│                                                                      │
│  PROPERTIES                                                          │
│  ──────────                                                          │
│  {                                                                   │
│    "since": date("2019-03-15"),      // When relationship started   │
│    "role": "Senior Engineer",         // Current position            │
│    "department": "Platform",          // Organizational unit         │
│    "salary": 145000.00,               // Relationship-specific data  │
│    "isRemote": true,                  // Work arrangement            │
│    "performanceRating": 4.5           // Annual review score         │
│  }                                                                   │
└─────────────────────────────────────────────────────────────────────┘
 
VISUAL REPRESENTATION:
                                               
   ┌──────────────┐       :WORKS_FOR         ┌──────────────┐
   │    Alice     │  ──────────────────────► │   TechCorp   │
   │   :Person    │   since: 2019-03-15      │   :Company   │
   └──────────────┘   role: Sr. Engineer     └──────────────┘
                       salary: 145000
 
NOTE: Direction is always stored, but queries can traverse either way:
      (alice)-[:WORKS_FOR]->(company)   // Follow direction
      (company)<-[:WORKS_FOR]-(alice)   // Same relationship, read from the company side
      (alice)-[:WORKS_FOR]-(company)    // Either direction

Key Differences from Nodes:

Single Type Requirement: Nodes can have multiple labels; relationships have exactly ONE type. This reflects their semantic specificity—a connection is ONE kind of relationship.
Mandatory Direction: Relationships are always stored with direction, though queries can ignore it. This enables asymmetric relationship modeling (followers vs following).
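Type Lookup Differs from Label Lookup: Relationship types are usually checked as the engine follows a node's relationships rather than through a separate lookup. Neo4j 4.3 and later also maintain a relationship type lookup index and allow property indexes on relationships, so the planner can start a query from relationships when that is cheaper. This affects query planning.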
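Endpoint Coupling: Relationships exist only between nodes, so a node cannot be removed while relationships are still attached. In Cypher, a plain DELETE on such a node fails; DETACH DELETE removes the node together with all of its relationships.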
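
The asymmetric modeling mentioned above (followers vs following) can be sketched with a single relationship type read in both directions. This assumes :User nodes with a userId property and :FOLLOWS relationships pointing from follower to followed; all of these names are hypothetical.

```cypher
// Who does USR-001 follow? Traverse along the stored direction.
MATCH (me:User {userId: "USR-001"})-[:FOLLOWS]->(followed:User)
RETURN followed.userId;

// Who follows USR-001? Same relationship type, opposite direction.
MATCH (me:User {userId: "USR-001"})<-[:FOLLOWS]-(follower:User)
RETURN follower.userId;

// Mutual follows: both directions must exist as separate relationships.
MATCH (me:User {userId: "USR-001"})-[:FOLLOWS]->(other:User)-[:FOLLOWS]->(me)
RETURN other.userId;
```

One type covers both questions, so there is no need for a mirrored :FOLLOWED_BY relationship that would have to be kept in sync.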

Direction Matters for Performance

Relationship Type Design

Relationship types are the semantic backbone of your graph model. Well-designed types make queries expressive and efficient; poor choices lead to property filtering and complex workarounds.

Design Principles:

Relationship Type Best Practices

•Use verbs, not nouns — :MANAGES, :PURCHASED, :AUTHORED not :Manager, :Purchase, :Authorship
•Be specific over generic — :PURCHASED, :VIEWED, :WISHLISTED not simply :INTERACTED_WITH
•Use SCREAMING_SNAKE_CASE — :WORKS_FOR, :REPORTED_BY following Neo4j conventions
•Direction should be natural — :FOLLOWS points from follower to followed, :MANAGES from manager to report
•Avoid relationship explosion — Don't encode property values in types (:RATED_5_STARS); use properties instead
•Consider query patterns — If you frequently filter by a property, consider whether it should be a separate relationship type
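
The "avoid relationship explosion" guideline can be illustrated with ratings. A sketch under assumed names: the :Movie label, the title value, and the stars and ratedAt properties are hypothetical.

```cypher
// Avoid: one relationship type per rating value
// (:RATED_1_STAR ... :RATED_5_STARS), which multiplies types.

// Prefer: one type, with the value stored as a property.
MATCH (u:User {userId: "USR-001"}), (m:Movie {title: "Example Film"})
CREATE (u)-[:RATED {stars: 5, ratedAt: datetime()}]->(m);

// The value stays filterable and aggregatable:
MATCH (:User)-[r:RATED]->(m:Movie)
WHERE r.stars >= 4
RETURN m.title, count(r) AS highRatings
ORDER BY highRatings DESC;
```

With the value as a property, a threshold such as "4 stars or more" is a single comparison rather than a list of relationship types to enumerate.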

poor-design.cypher

Poor Design

// ANTIPATTERN: Generic relationships
// ==================================
 
// Everything is RELATED_TO
CREATE (a)-[:RELATED_TO {
    type: "manages",
    since: date("2020-01-01")
}]->(b)
 
CREATE (c)-[:RELATED_TO {
    type: "follows",
    since: date("2023-05-15")  
}]->(d)
 
// Query requires property filtering:
MATCH (m)-[r:RELATED_TO {type: "manages"}]->(e)
RETURN m.name, e.name
 
// Problems:
// 1. No type-based filtering (slower)
// 2. No type-specific indexes
// 3. Unclear semantics
// 4. Property typos cause silent bugs

good-design.cypher

Good Design

// BEST PRACTICE: Specific types
// ===============================
 
// Distinct relationship types
CREATE (a)-[:MANAGES {
    since: date("2020-01-01")
}]->(b)
 
CREATE (c)-[:FOLLOWS {
    since: date("2023-05-15")
}]->(d)
 
// Query is type-based:
MATCH (m)-[:MANAGES]->(e)
RETURN m.name, e.name
 
// Benefits:
// 1. Type-based traversal (faster)
// 2. Clear, self-documenting model
// 3. Misspelled types trigger unknown-type warnings
// 4. Easier query optimization
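
Moving an existing graph from the generic design to the specific one can be sketched as below. This is one possible approach, not a prescribed migration: it handles a single discriminator value per statement and assumes the data set is small enough to rewrite in one transaction.

```cypher
// Rewrite every generic "manages" relationship as a typed :MANAGES one.
// Cypher cannot change a relationship's type in place, so a new
// relationship is created and the old one is deleted.
MATCH (a)-[old:RELATED_TO {type: "manages"}]->(b)
CREATE (a)-[typed:MANAGES]->(b)
SET typed = properties(old)    // copy all properties across
REMOVE typed.type              // the discriminator is now redundant
DELETE old;
```

The same statement would be repeated for "follows" and any other discriminator values; for large graphs the work would normally be split into batches.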

When to Use Multiple Relationship Types

Properties on Relationships

What to Store on Relationships:

Relationship Property Categories
Category	Examples	When to Use
Temporal Context	`since`, `until`, `lastInteraction`	When the relationship has a time dimension
Quantitative Measures	`weight`, `strength`, `score`, `distance`	When connections have numeric intensity
Transactional Data	`price`, `quantity`, `orderId`	When the relationship represents an event
Qualitative Descriptors	`role`, `type`, `context`	When relationships are subcategorized
Computational Results	`similarity`, `confidence`, `correlation`	When relationships are algorithmically derived

relationship-properties-examples.cypher

Cypher

// RELATIONSHIP PROPERTIES IN PRACTICE
// ====================================
 
// 1. SOCIAL NETWORK - Connection Strength
CREATE (alice:Person {name: "Alice"})-[:KNOWS {
    since: date("2015-09-01"),
    context: "college_roommate",
    interactionFrequency: "weekly",
    trustScore: 0.95
}]->(bob:Person {name: "Bob"})
 
// 2. E-COMMERCE - Purchase Transaction
CREATE (customer:Customer {id: "C-001"})-[:PURCHASED {
    transactionId: "TXN-2024-001",
    purchasedAt: datetime("2024-01-15T14:30:00"),
    quantity: 2,
    unitPrice: 29.99,
    discount: 0.10,
    paymentMethod: "credit_card"
}]->(product:Product {sku: "WIDGET-A"})
 
// 3. RECOMMENDATION - Computed Similarity
CREATE (p1:Product {name: "Running Shoes"})-[:SIMILAR_TO {
    algorithm: "collaborative_filtering",
    similarityScore: 0.87,
    computedAt: datetime("2024-01-01"),
    basedOnPurchases: 15420,
    confidence: 0.92
}]->(p2:Product {name: "Running Socks"})
 
// 4. AUTHORIZATION - Access Permission
CREATE (user:User {email: "dev@company.com"})-[:HAS_ACCESS {
    role: "editor",
    grantedBy: "admin@company.com",
    grantedAt: datetime("2023-06-15"),
    expiresAt: datetime("2024-06-15"),
    permissions: ["read", "write", "delete"],
    auditRequired: true
}]->(resource:Document {id: "DOC-SECRET-001"})
 
// 5. COLLABORATION - Weighted Contribution
CREATE (author1:Author {name: "Dr. Smith"})-[:CO_AUTHORED {
    paperDoi: "10.1234/science.2024.001",
    contributionPercentage: 45,
    role: "lead_author",
    sections: ["Introduction", "Methods", "Conclusion"],
    correspondingAuthor: true
}]->(paper:Paper {title: "Breakthrough Research"})
 
// Query using relationship properties:
// Find strong connections in social network
MATCH (a:Person)-[k:KNOWS]->(b:Person)
WHERE k.trustScore > 0.8 AND k.since < date() - duration('P5Y')
RETURN a.name, b.name, k.context
ORDER BY k.trustScore DESC
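
When a relationship property is filtered this often, it can be indexed much like a node property. A minimal sketch, assuming Neo4j 4.3 or later; the index name knows_trust_idx is hypothetical, and whether the planner uses the index depends on the query shape and data statistics.

```cypher
// Range index on a relationship property.
CREATE INDEX knows_trust_idx IF NOT EXISTS
FOR ()-[k:KNOWS]-()
ON (k.trustScore);

// A query driven by the relationship predicate can then start from the index
// instead of scanning every :Person node first.
MATCH (a:Person)-[k:KNOWS]->(b:Person)
WHERE k.trustScore > 0.8
RETURN a.name, b.name, k.trustScore
ORDER BY k.trustScore DESC;
```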

Node vs Relationship Properties Decision

Handling Complex Scenarios

Real-world modeling often encounters scenarios that don't fit neatly into simple node-relationship patterns. Here's how to handle common challenges:

complex-scenarios.cypher

Cypher

// SCENARIO 1: HYPEREDGES (N-ary Relationships)
// ==============================================
// Problem: A meeting connects MULTIPLE people simultaneously
// (Not just pairs)
 
// Solution: Intermediate "Event" node
CREATE (meeting:Meeting {
    id: "MTG-001",
    title: "Q4 Planning",
    startTime: datetime("2024-01-15T10:00:00"),
    duration: duration("PT2H")
})
 
CREATE (alice:Person {name: "Alice"})-[:ATTENDED {role: "organizer"}]->(meeting)
CREATE (bob:Person {name: "Bob"})-[:ATTENDED {role: "participant"}]->(meeting)
CREATE (carol:Person {name: "Carol"})-[:ATTENDED {role: "participant"}]->(meeting)
CREATE (meeting)-[:HELD_IN]->(room:Room {name: "Conference Room A"})
 
// Query: Who attended meetings with Alice?
MATCH (alice:Person {name: "Alice"})-[:ATTENDED]->(m:Meeting)<-[:ATTENDED]-(other:Person)
RETURN DISTINCT other.name
 
// SCENARIO 2: TIME-VARYING RELATIONSHIPS
// =======================================
// Problem: Employment history - same person, same company, multiple periods
 
// Solution: Multiple relationships with temporal properties
CREATE (alice:Person {name: "Alice"})
CREATE (techcorp:Company {name: "TechCorp"})
 
CREATE (alice)-[:EMPLOYED_BY {
    startDate: date("2015-03-01"),
    endDate: date("2018-06-30"),
    role: "Junior Developer",
    department: "Engineering"
}]->(techcorp)
 
CREATE (alice)-[:EMPLOYED_BY {
    startDate: date("2021-09-01"),
    endDate: null,  // Current employment (null is not stored; the property is simply absent)
    role: "Engineering Manager",
    department: "Platform"
}]->(techcorp)
 
// Query: Current employers
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE e.endDate IS NULL
RETURN p.name, c.name, e.role
 
// SCENARIO 3: VERSIONED DATA
// ==========================
// Problem: Track historical product prices
 
// Solution A: Relationship-based versioning
CREATE (product:Product {sku: "WIDGET-A", name: "Widget A"})
CREATE (product)-[:HAS_PRICE {
    amount: 29.99,
    currency: "USD",
    effectiveFrom: date("2023-01-01"),
    effectiveTo: date("2023-06-30")
}]->(p1:PriceRecord)
CREATE (product)-[:HAS_PRICE {
    amount: 34.99,
    currency: "USD",
    effectiveFrom: date("2023-07-01"),
    effectiveTo: null  // Current price (null is not stored; the property is simply absent)
}]->(p2:PriceRecord)
 
// Solution B: Event sourcing pattern
CREATE (product)-[:PRICE_CHANGED {
    newPrice: 34.99,
    oldPrice: 29.99,
    changedAt: datetime("2023-07-01T00:00:00"),
    changedBy: "pricing_service",
    reason: "cost_increase"
}]->(priceEvent:PriceChange)
 
// SCENARIO 4: SELF-REFERENTIAL HIERARCHIES
// ========================================
// Problem: Organizational hierarchy, category trees, bill of materials
 
CREATE (ceo:Employee {name: "Alice", title: "CEO"})
CREATE (vp1:Employee {name: "Bob", title: "VP Engineering"})
CREATE (vp2:Employee {name: "Carol", title: "VP Sales"})
CREATE (director:Employee {name: "Dave", title: "Engineering Director"})
CREATE (engineer:Employee {name: "Eve", title: "Senior Engineer"})
 
CREATE (vp1)-[:REPORTS_TO]->(ceo)
CREATE (vp2)-[:REPORTS_TO]->(ceo)
CREATE (director)-[:REPORTS_TO]->(vp1)
CREATE (engineer)-[:REPORTS_TO]->(director)
 
// Query: All people in Eve's management chain
MATCH (eve:Employee {name: "Eve"})-[:REPORTS_TO*]->(manager)
RETURN manager.name, manager.title
 
// Query: All direct and indirect reports of Bob
MATCH (bob:Employee {name: "Bob"})<-[:REPORTS_TO*]-(report)
RETURN report.name, report.title
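
Unbounded variable-length patterns such as [:REPORTS_TO*] can become expensive on deep or cyclic data. The sketch below bounds the depth and returns the distance to each manager; the upper bound of 10 is an arbitrary choice for illustration.

```cypher
// Eve's management chain with the distance to each manager,
// capped at 10 levels.
MATCH path = (eve:Employee {name: "Eve"})-[:REPORTS_TO*1..10]->(manager:Employee)
RETURN manager.name, manager.title, length(path) AS levelsAbove
ORDER BY levelsAbove;

// Size of each manager's organization (direct and indirect reports).
MATCH (m:Employee)<-[:REPORTS_TO*1..10]-(report:Employee)
RETURN m.name, count(DISTINCT report) AS totalReports
ORDER BY totalReports DESC;
```

Run against only the five sample employees above, the first query returns Dave, Bob, and Alice at levels 1, 2, and 3, and the second reports 4 for Alice, 2 for Bob, and 1 for Dave.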

Intermediate Nodes: Power and Responsibility

Summary: Nodes and Edges

We've comprehensively examined the building blocks of graph databases. Let's consolidate the essential insights:

Key Takeaways

•Nodes hold entity data — Identity, labels, and properties combine to represent discrete entities. Labels enable categorization and query targeting; properties store actual data.
•Multi-labeling is powerful — Unlike relational tables, nodes can have multiple labels (:Person:Employee:Manager), enabling flexible, overlapping categorization without schema changes.
•Indexes are label-scoped — Create indexes on frequently queried properties within specific labels. Composite indexes optimize multi-property filters.
•Relationships are typed and directed — Every relationship has exactly one type and stored direction. Types should be verbs (:PURCHASED, :MANAGES) using specific semantics.
•Relationship properties are essential — Store temporal, quantitative, and contextual data on relationships. Properties describe the connection, not the endpoints.
•Design for queries, not normalization — Unlike relational modeling, graph design optimizes for traversal patterns. Explicit relationships beat derived JOINs.
•Intermediate nodes handle complexity — For n-ary relationships, versioning, or when relationships need their own connections, promote relationships to nodes.

What's next:
