If the graph model is the blueprint, nodes and edges are the bricks and mortar. Every entity you store, every connection you represent, every query you write—all reduce to operations on these two fundamental structures.
But the simplicity is deceptive. How you design your node labels determines query efficiency. How you structure relationship types affects data modeling flexibility. How you distribute properties between nodes and edges impacts both storage and traversal performance.
Mastering nodes and edges isn't about learning what they are—that's trivial. It's about developing the judgment to use them effectively across diverse graph modeling challenges.
By the end of this page, you will understand nodes and edges at an expert level—their internal structure, property systems, labeling strategies, relationship type design, the critical role of schema constraints, and the practical patterns that distinguish professional graph modeling from naive approaches.
A node (or vertex) represents an entity in your domain—a person, product, transaction, document, location, or any discrete concept that participates in relationships. But nodes in property graph databases are far richer than simple vertices in mathematical graphs.
Anatomy of a Node:
Every node consists of three core components:
Identity: A unique identifier assigned by the database. It stays fixed while the node exists (though it may be reused after the node is deleted) and enables O(1) lookups and relationship pointers.
Labels: Zero or more type classifications that categorize the node. Labels serve as the primary mechanism for organizing and querying nodes.
Properties: A map of key-value pairs storing the node's attributes. Properties hold the actual data about the entity.
```text
NODE STRUCTURE ANATOMY
======================

NODE INSTANCE

IDENTITY
  Internal ID: 4298731             (database-assigned; stable while the node exists,
                                    may be reused after deletion)
  Element ID:  "4:abc123:4298731"  (Neo4j 5.x string ID; also not guaranteed
                                    stable outside a single transaction)

LABELS  (primary/secondary is a modeling convention; the database treats all labels alike)
  Primary:   :Person
  Secondary: :Employee, :Engineer, :TeamLead

PROPERTIES
  {
    "name":      "Alice Chen",           // String
    "email":     "alice@techcorp.com",   // String (indexed, unique)
    "age":       32,                     // Integer
    "salary":    145000.00,              // Float
    "isActive":  true,                   // Boolean
    "skills":    ["Python", "Go"],       // List<String>
    "hiredDate": date("2019-03-15"),     // Temporal
    "location":  point({                 // Spatial
                   latitude: 37.7749,
                   longitude: -122.4194
                 }),
    "metadataSource":  "HR_IMPORT",      // Nested maps cannot be stored as
    "metadataVersion": 3                 // properties, so flatten them
  }
```

Internal node IDs are for database use only. They may be recycled after node deletion (Neo4j) or change during database operations (some implementations). Never expose internal IDs, including Neo4j 5.x element IDs, to external systems or use them as foreign keys in other databases. Instead, create explicit identity properties (e.g., userId, productSku) and index them for lookups.
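To see why internal IDs make poor external keys, here is a minimal Python sketch of a toy node store that, like the Neo4j behavior described above, hands freed IDs to new nodes. The `NodeStore` class, its free-list reuse policy, and the `find_by` lookup are hypothetical illustrations, not Neo4j's actual ID allocator.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A property-graph node: internal identity, labels, and properties."""
    internal_id: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)


class NodeStore:
    """Hypothetical toy store; recycling freed IDs is an assumed policy."""

    def __init__(self) -> None:
        self._nodes: dict[int, Node] = {}
        self._free_ids: list[int] = []
        self._next_id = 0

    def create(self, labels: set[str], properties: dict[str, object]) -> Node:
        # Reuse a freed ID if one exists, otherwise allocate a fresh one.
        if self._free_ids:
            node_id = self._free_ids.pop()
        else:
            node_id = self._next_id
            self._next_id += 1
        node = Node(node_id, set(labels), dict(properties))
        self._nodes[node_id] = node
        return node

    def delete(self, node_id: int) -> None:
        del self._nodes[node_id]
        self._free_ids.append(node_id)

    def get(self, node_id: int) -> Node | None:
        # Constant-time lookup by internal ID.
        return self._nodes.get(node_id)

    def find_by(self, label: str, key: str, value: object) -> Node | None:
        # Lookup by an application-level identity property; a real database
        # would back this with an index or a uniqueness constraint.
        for node in self._nodes.values():
            if label in node.labels and node.properties.get(key) == value:
                return node
        return None


store = NodeStore()
alice = store.create({"Person"}, {"userId": "USR-001", "name": "Alice Chen"})
saved_id = alice.internal_id  # an external system caches the internal ID

store.delete(saved_id)
bob = store.create({"Person"}, {"userId": "USR-002", "name": "Bob"})

print(store.get(saved_id).properties["name"])        # Bob
print(store.find_by("Person", "userId", "USR-001"))  # None
```

The cached internal ID silently resolves to a different entity, while the `userId` business key correctly reports that Alice no longer exists.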
Labels are the graph equivalent of table names—but with crucial differences. A node can have multiple labels simultaneously, enabling flexible, overlapping categorization that would require complex joins in relational systems.
Label Functions:
- `MATCH (p:Person)` scans only `Person`-labeled nodes, not the entire graph.

| Pattern | Example | Use Case | Trade-offs |
|---|---|---|---|
| Single Primary Label | :Person, :Product, :Order | Core entity types with distinct schemas | Simple, clear; may miss cross-cutting concerns |
| Hierarchical Labels | :Animal, :Mammal, :Dog | Taxonomy with inheritance-like semantics | Flexible queries at any level; labels do not inherit, so every ancestor label must be added to each node |
| Role-Based Labels | :User, :Admin, :Moderator | Same entity acting in different capacities | Query by role; nodes may accumulate many labels |
| State Labels | :Order:Pending, :Order:Shipped | Entity lifecycle stages | Fast state queries; requires label updates on transitions |
| Capability Labels | :Searchable, :Cacheable | Cross-cutting technical concerns | Infrastructure queries; mixes domain and technical concerns |
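The scan scoping described above (`MATCH (p:Person)` touches only `Person` nodes) and the role and state patterns in the table can be sketched as an inverted index from label to node IDs. This Python illustration assumes the store keeps one ID set per label, loosely analogous to a label lookup index; `LabelIndex` and its methods are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict


class LabelIndex:
    """Hypothetical inverted index: label -> IDs of nodes carrying it."""

    def __init__(self) -> None:
        self._by_label: dict[str, set[int]] = defaultdict(set)

    def add_labels(self, node_id: int, *labels: str) -> None:
        for label in labels:
            self._by_label[label].add(node_id)

    def remove_label(self, node_id: int, label: str) -> None:
        self._by_label[label].discard(node_id)

    def scan(self, label: str) -> set[int]:
        # Reads only the nodes that carry this label, never the whole graph.
        return set(self._by_label.get(label, set()))

    def scan_all(self, *labels: str) -> set[int]:
        # Nodes carrying every listed label, as in MATCH (n:Order:Shipped).
        if not labels:
            return set()
        result = self.scan(labels[0])
        for label in labels[1:]:
            result &= self.scan(label)
        return result


label_index = LabelIndex()
label_index.add_labels(1, "User", "Customer", "PremiumMember")
label_index.add_labels(2, "User", "Customer")
label_index.add_labels(10, "Order", "Pending")
label_index.add_labels(11, "Order", "Shipped")

# Order 10 ships: a state-label transition is a remove plus an add.
label_index.remove_label(10, "Pending")
label_index.add_labels(10, "Shipped")

print(sorted(label_index.scan("Customer")))              # [1, 2]
print(sorted(label_index.scan_all("Order", "Shipped")))  # [10, 11]
print(sorted(label_index.scan("Pending")))               # []
```

The state-label transition shows the trade-off from the table: queries by state are a direct set lookup, but every transition costs two index updates.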
```cypher
// MULTI-LABEL STRATEGY: E-COMMERCE USER ROLES
// =============================================

// Creating a user with multiple roles
CREATE (u:User:Customer:PremiumMember {
  userId: "USR-001",
  email: "alice@example.com",
  name: "Alice Chen",
  memberSince: date("2020-01-15"),
  premiumTier: "Gold"
});

// Later, user becomes a seller too
MATCH (u:User {userId: "USR-001"})
SET u:Seller;

// Query by any applicable label:
// - All users (broad)
MATCH (u:User) RETURN count(u);

// - Only customers who can purchase
MATCH (c:Customer) RETURN c.name;

// - Premium members for special offers
MATCH (p:PremiumMember) RETURN p.email;

// - Sellers for marketplace analytics
MATCH (s:Seller) RETURN s.userId;

// =============================================
// HIERARCHICAL LABELS: CONTENT MANAGEMENT
// =============================================

// Base content type with specializations
CREATE (a:Content:Article:NewsArticle {
  contentId: "ART-2024-001",
  title: "Breaking Technology News",
  publishedAt: datetime(),
  category: "Technology"
});

CREATE (v:Content:Article:OpinionPiece {
  contentId: "ART-2024-002",
  title: "Why AI Matters",
  author: "Tech Columnist"
});

CREATE (m:Content:Media:Video {
  contentId: "VID-2024-001",
  title: "Product Demo",
  duration: 300
});

// Query at different hierarchy levels:
// All content for sitemap
MATCH (c:Content) RETURN c.contentId;

// Only articles for reading list
MATCH (a:Article) RETURN a.title;

// Only opinion pieces for editor review
MATCH (o:OpinionPiece) RETURN o.author, o.title;
```

Label naming guidelines:

1. Use nouns, not adjectives: `:Person`, not `:Personal`.
2. Prefer the singular form: `:Product`, not `:Products`.
3. Use PascalCase: `:ShoppingCart`, not `:shopping_cart`.
4. Limit each node to roughly 4-5 labels: too many labels slow writes and complicate queries.
5. Avoid labels for boolean states: use properties for simple flags and reserve labels for query-critical categorization.
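Because labels carry no built-in inheritance, the application has to apply every ancestor label itself, as in `:Content:Article:NewsArticle` in the example above. A minimal Python sketch, assuming a hand-maintained parent map; `TAXONOMY`, `labels_for`, and `cypher_label_string` are hypothetical names, and the label cap mirrors the 4-5 label guideline rather than any engine limit.

```python
from __future__ import annotations

# Hypothetical taxonomy: each specialized label points at its parent.
TAXONOMY: dict[str, str | None] = {
    "Content": None,
    "Article": "Content",
    "Media": "Content",
    "NewsArticle": "Article",
    "OpinionPiece": "Article",
    "Video": "Media",
}

MAX_LABELS = 5  # style guideline, not a database limit


def labels_for(leaf: str) -> list[str]:
    """Return the leaf label plus all its ancestors, root first."""
    chain: list[str] = []
    current: str | None = leaf
    while current is not None:
        if current not in TAXONOMY:
            raise KeyError(f"unknown label: {current}")
        if current in chain:
            raise ValueError(f"cycle in taxonomy at {current}")
        chain.append(current)
        current = TAXONOMY[current]
    chain.reverse()
    if len(chain) > MAX_LABELS:
        raise ValueError(f"{leaf} would need {len(chain)} labels")
    return chain


def cypher_label_string(leaf: str) -> str:
    # Label part of a CREATE pattern, e.g. ":Content:Media:Video".
    return "".join(f":{label}" for label in labels_for(leaf))


print(labels_for("NewsArticle"))     # ['Content', 'Article', 'NewsArticle']
print(cypher_label_string("Video"))  # :Content:Media:Video
```

Keeping the taxonomy in one place means a query on `:Article` cannot silently miss a node whose writer forgot the intermediate label.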
Properties store the actual data within nodes (and edges). Unlike schemaless document databases where properties can contain arbitrary structures, graph database properties are typically scalar values or homogeneous lists—a constraint that enables efficient indexing and comparison operations.
Supported Property Types (Neo4j as reference):
| Category | Types | Examples | Indexable |
|---|---|---|---|
| Numeric | Integer, Float | 42, 3.14159, -273 | Yes (range queries) |
| Text | String | "Alice", "product-sku-123" | Yes (exact, prefix, full-text) |
| Boolean | Boolean | true, false | Yes (exact match) |
| Temporal | Date, Time, DateTime, Duration | date('2024-01-15'), duration('P1Y2M') | Yes (range queries) |
| Spatial | Point (2D/3D, Cartesian/WGS84) | point({latitude: 37.77, longitude: -122.41}) | Yes (distance, bounding box) |
| Collections | List (homogeneous) | ['red', 'blue', 'green'], [1, 2, 3] | Partial (contains checks) |
| Nested | Map | {key: 'value', nested: {a: 1}} | No: maps exist only as query values and cannot be stored as properties in Neo4j; flatten or serialize them |
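The scalar-or-homogeneous-list rule behind this table can be expressed as a small validator. This Python sketch approximates Neo4j's storable property types under stated assumptions: Python `bool`, `int`, `float`, `str`, `datetime.date`, and `datetime.datetime` stand in for the Cypher types, spatial and duration values are omitted, and `is_storable` is a hypothetical helper, not a driver API.

```python
from __future__ import annotations

import datetime

# Assumed Python stand-ins for Neo4j's scalar property types.
SCALAR_TYPES = (bool, int, float, str, datetime.date, datetime.datetime)


def is_scalar(value: object) -> bool:
    # Exact type check so bool is not treated as int
    # and datetime is not treated as date.
    return type(value) in SCALAR_TYPES


def is_storable(value: object) -> bool:
    """Approximate check: a scalar, or a homogeneous list of scalars."""
    if value is None:
        # Setting a property to null removes it rather than storing it.
        return False
    if is_scalar(value):
        return True
    if isinstance(value, list):
        if not all(is_scalar(item) for item in value):
            return False  # rejects nested lists, maps, and nulls in lists
        return len({type(item) for item in value}) <= 1
    return False  # maps and any other structure


print(is_storable(["red", "blue", "green"]))  # True
print(is_storable([1, "two"]))                # False (mixed types)
print(is_storable({"source": "HR_IMPORT"}))   # False (map)
print(is_storable(["Python", None]))          # False (null inside a list)
```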
```cypher
// INDEX TYPES AND THEIR PURPOSES
// ================================

// 1. RANGE INDEX (default since Neo4j 5.0)
// Used for: Exact lookups, prefix searches, range queries
CREATE INDEX person_email_idx FOR (p:Person) ON (p.email);
CREATE INDEX person_age_idx FOR (p:Person) ON (p.age);

// Query patterns supported:
MATCH (p:Person) WHERE p.email = 'alice@example.com' RETURN p;  // Exact
MATCH (p:Person) WHERE p.email STARTS WITH 'alice' RETURN p;    // Prefix
MATCH (p:Person) WHERE p.age > 25 AND p.age < 35 RETURN p;      // Range (uses person_age_idx)

// 2. COMPOSITE INDEX (multiple properties)
// Used for: Queries filtering on multiple properties together
CREATE INDEX person_composite_idx FOR (p:Person) ON (p.country, p.city);

// Efficient for:
MATCH (p:Person) WHERE p.country = 'USA' AND p.city = 'NYC' RETURN p;
// Leading property only: whether the composite index is used depends on
// the version and planner, so confirm with EXPLAIN
MATCH (p:Person) WHERE p.country = 'USA' RETURN p;

// NOT efficient for (no predicate on the leading property, so no index seek):
MATCH (p:Person) WHERE p.city = 'NYC' RETURN p;

// 3. FULL-TEXT INDEX (Lucene-backed)
// Used for: Natural language search, fuzzy matching
CREATE FULLTEXT INDEX product_search_idx FOR (p:Product) ON EACH [p.name, p.description];

// Query with full-text search:
CALL db.index.fulltext.queryNodes("product_search_idx", "wireless bluetooth")
YIELD node, score
RETURN node.name, score
ORDER BY score DESC;

// 4. POINT INDEX (spatial)
// Used for: Distance queries, bounding box searches
CREATE POINT INDEX location_point_idx FOR (l:Location) ON (l.coordinates);

// Find locations within 10 km (point.distance returns meters for WGS 84 points)
MATCH (l:Location)
WHERE point.distance(l.coordinates, point({latitude: 37.77, longitude: -122.41})) < 10000
RETURN l.name;

// 5. TOKEN LOOKUP INDEX (label-based)
// Used for: Label scans, existence checks
// Neo4j 4.3+ creates a node label lookup index by default and allows only one,
// so this statement is needed only if that index was dropped.
CREATE LOOKUP INDEX node_label_idx FOR (n) ON EACH labels(n);

// Fast label existence check:
MATCH (n:RareLabel) RETURN count(n);

// ================================
// CONSTRAINT TYPES
// ================================

// Unique constraint (creates a backing index automatically)
CREATE CONSTRAINT person_email_unique FOR (p:Person) REQUIRE p.email IS UNIQUE;

// Node key constraint (uniqueness + existence for a composite key; Enterprise Edition)
CREATE CONSTRAINT order_key FOR (o:Order) REQUIRE (o.orderId, o.region) IS NODE KEY;

// Property existence constraint (Enterprise Edition)
CREATE CONSTRAINT person_name_exists FOR (p:Person) REQUIRE p.name IS NOT NULL;

// Property type constraint (Neo4j 5.9+, Enterprise Edition)
CREATE CONSTRAINT person_age_type FOR (p:Person) REQUIRE p.age IS :: INTEGER;
```

Create indexes based on query patterns, not data structure. Profile your common queries with EXPLAIN and PROFILE. Indexes accelerate reads but slow writes, because every change to an indexed property also updates the index. Rule of thumb: index properties used in the WHERE clauses of frequent queries, especially selective predicates that narrow a large set of nodes down to a few.
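Why a composite index on `(country, city)` can support a seek on both keys, and at best on `country`, but never on `city` alone becomes clearer with a sorted-tuple model. This Python sketch assumes the index is a list of `(country, city, node_id)` entries kept in lexicographic order and uses `bisect` to seek; it is a conceptual model, not Neo4j's on-disk structure, and the real planner decision should still be checked with `EXPLAIN`. `CompositeIndex` is a hypothetical name.

```python
from __future__ import annotations

import bisect


class CompositeIndex:
    """Hypothetical model: sorted (country, city, node_id) entries."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, int]] = []

    def insert(self, country: str, city: str, node_id: int) -> None:
        bisect.insort(self._entries, (country, city, node_id))

    def seek(self, country: str, city: str | None = None) -> list[int]:
        # Leading-prefix seek: jump to the first matching entry, then
        # read forward while the prefix still matches.
        prefix = (country,) if city is None else (country, city)
        i = bisect.bisect_left(self._entries, prefix)
        matches: list[int] = []
        while i < len(self._entries) and self._entries[i][: len(prefix)] == prefix:
            matches.append(self._entries[i][2])
            i += 1
        return matches

    def filter_city_only(self, city: str) -> tuple[list[int], int]:
        # No leading key, so the sort order cannot help: every entry is
        # examined. Returns (matches, entries_examined).
        matches = [entry[2] for entry in self._entries if entry[1] == city]
        return matches, len(self._entries)


place_index = CompositeIndex()
for country, city, node_id in [("USA", "NYC", 1), ("USA", "Boston", 2),
                               ("Canada", "Toronto", 3), ("USA", "NYC", 4),
                               ("UK", "London", 5)]:
    place_index.insert(country, city, node_id)

print(place_index.seek("USA", "NYC"))         # [1, 4]
print(place_index.seek("USA"))                # [2, 1, 4]
print(place_index.filter_city_only("NYC"))    # ([1, 4], 5)
```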
If nodes are the nouns of your graph, relationships are the verbs—they express how entities connect, interact, and relate. In property graph databases, relationships are first-class citizens with their own identities, types, and properties.
Relationship Anatomy:
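Every relationship consists of an identity, exactly one type, a direction (start node to end node), and an optional map of properties: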
```text
RELATIONSHIP STRUCTURE ANATOMY
==============================

RELATIONSHIP INSTANCE

IDENTITY
  Internal ID: 89234571

TYPE (MANDATORY, SINGLE)
  :WORKS_FOR

DIRECTION
  Start Node: (Person:4298731) ──────────► End Node: (Company:892)
              "Alice Chen"                           "TechCorp"

PROPERTIES
  {
    "since": date("2019-03-15"),      // When relationship started
    "role": "Senior Engineer",        // Current position
    "department": "Platform",         // Organizational unit
    "salary": 145000.00,              // Relationship-specific data
    "isRemote": true,                 // Work arrangement
    "performanceRating": 4.5          // Annual review score
  }

VISUAL REPRESENTATION:

  ┌──────────────┐        :WORKS_FOR        ┌──────────────┐
  │    Alice     │ ───────────────────────► │   TechCorp   │
  │   :Person    │   since: 2019-03-15      │   :Company   │
  └──────────────┘   role: Sr. Engineer     └──────────────┘
                     salary: 145000

NOTE: Direction is always stored, but queries can traverse either way:
  (alice)-[:WORKS_FOR]->(company)    // Follow the stored direction
  (company)<-[:WORKS_FOR]-(alice)    // Same relationship, written from the company side
  (alice)-[:WORKS_FOR]-(company)     // Either direction
```

Key Differences from Nodes:
Single Type Requirement: Nodes can have multiple labels; relationships have exactly ONE type. This reflects their semantic specificity—a connection is ONE kind of relationship.
Mandatory Direction: Relationships are always stored with direction, though queries can ignore it. This enables asymmetric relationship modeling (followers vs following).
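Traversal-First Access: Relationships are usually reached by expanding from an already-matched node, with the type filter applied during that expansion. Neo4j 4.3+ does provide a relationship type lookup index and supports property indexes on relationships, but most efficient queries still anchor on an indexed node and traverse outward, which shapes how queries are planned.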
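Endpoint Coupling: A relationship cannot exist without both of its endpoint nodes. In Neo4j, a plain DELETE of a node that still has relationships fails with an error; DETACH DELETE removes the node together with all of its attached relationships.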
While queries can traverse relationships in either direction, storage is directional. Neo4j stores relationships in a doubly-linked list from both endpoints, so direction doesn't affect traversal speed. However, direction affects semantics—:FOLLOWS, :MANAGES, and :PURCHASED have natural directions. Model them correctly to keep queries intuitive and maintainable.
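A toy adjacency model makes the direction and deletion rules concrete: each relationship is stored once with a start and an end node, both endpoints reference it, and deleting a connected node is refused unless the caller asks for a detach delete. In Neo4j, a plain `DELETE` of a node that still has relationships fails, while `DETACH DELETE` removes them together; this Python sketch mimics that behavior. `Graph`, `expand`, and `delete_node` are hypothetical names, and the layout is not Neo4j's record format.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Rel:
    rel_id: int
    rel_type: str
    start: int
    end: int


class Graph:
    def __init__(self) -> None:
        self.nodes: set[int] = set()
        self.rels: dict[int, Rel] = {}
        # Each node records the relationships touching it, at either end.
        self.adjacency: dict[int, set[int]] = {}
        self._ids = count()

    def add_node(self, node_id: int) -> None:
        self.nodes.add(node_id)
        self.adjacency.setdefault(node_id, set())

    def add_rel(self, rel_type: str, start: int, end: int) -> Rel:
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both endpoints must exist")
        rel = Rel(next(self._ids), rel_type, start, end)
        self.rels[rel.rel_id] = rel
        self.adjacency[start].add(rel.rel_id)
        self.adjacency[end].add(rel.rel_id)
        return rel

    def expand(self, node: int, rel_type: str, direction: str) -> list[int]:
        """direction: 'out' (->), 'in' (<-), or 'both' (-)."""
        neighbors: list[int] = []
        for rel_id in sorted(self.adjacency[node]):
            rel = self.rels[rel_id]
            if rel.rel_type != rel_type:
                continue
            if direction in ("out", "both") and rel.start == node:
                neighbors.append(rel.end)
            elif direction in ("in", "both") and rel.end == node:
                neighbors.append(rel.start)
        return neighbors

    def delete_node(self, node: int, detach: bool = False) -> None:
        attached = self.adjacency[node]
        if attached and not detach:
            raise RuntimeError(f"node {node} still has relationships")
        for rel_id in list(attached):
            rel = self.rels.pop(rel_id)
            self.adjacency[rel.start].discard(rel_id)
            self.adjacency[rel.end].discard(rel_id)
        del self.adjacency[node]
        self.nodes.discard(node)


ALICE, TECHCORP = 1, 2
g = Graph()
g.add_node(ALICE)
g.add_node(TECHCORP)
g.add_rel("WORKS_FOR", ALICE, TECHCORP)

print(g.expand(ALICE, "WORKS_FOR", "out"))     # [2]
print(g.expand(TECHCORP, "WORKS_FOR", "in"))   # [1]  same relationship, other side
print(g.expand(TECHCORP, "WORKS_FOR", "out"))  # []

try:
    g.delete_node(ALICE)                       # like DELETE on a connected node
except RuntimeError as err:
    print(err)                                 # node 1 still has relationships

g.delete_node(ALICE, detach=True)              # like DETACH DELETE
print(g.rels)                                  # {}
```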
Relationship types are the semantic backbone of your graph model. Well-designed types make queries expressive and efficient; poor choices lead to property filtering and complex workarounds.
Design Principles:
- Use verbs: `:MANAGES`, `:PURCHASED`, `:AUTHORED`, not `:Manager`, `:Purchase`, `:Authorship`.
- Be specific: `:PURCHASED`, `:VIEWED`, `:WISHLISTED`, not simply `:INTERACTED_WITH`.
- Use UPPER_SNAKE_CASE: `:WORKS_FOR`, `:REPORTED_BY`, following Neo4j conventions.
- Make direction meaningful: `:FOLLOWS` points from follower to followed, `:MANAGES` from manager to report.
- Don't encode values in type names (e.g., `:RATED_5_STARS`); use properties instead.
```cypher
// ANTIPATTERN: Generic relationships
// ==================================
// (a), (b), (c), (d) stand for previously matched nodes

// Everything is RELATED_TO
CREATE (a)-[:RELATED_TO {
  type: "manages",
  since: date("2020-01-01")
}]->(b);

CREATE (c)-[:RELATED_TO {
  type: "follows",
  since: date("2023-05-15")
}]->(d);

// Query requires property filtering:
MATCH (m)-[r:RELATED_TO {type: "manages"}]->(e)
RETURN m.name, e.name;

// Problems:
// 1. No type-based filtering (slower)
// 2. No type-specific indexes
// 3. Unclear semantics
// 4. Property typos cause silent bugs
```
```cypher
// BEST PRACTICE: Specific types
// ===============================
// (a), (b), (c), (d) stand for previously matched nodes

// Distinct relationship types
CREATE (a)-[:MANAGES {
  since: date("2020-01-01")
}]->(b);

CREATE (c)-[:FOLLOWS {
  since: date("2023-05-15")
}]->(d);

// Query is type-based:
MATCH (m)-[:MANAGES]->(e)
RETURN m.name, e.name;

// Benefits:
// 1. Type-based traversal (faster)
// 2. Clear, self-documenting model
// 3. Misspelled type names can surface as unknown-relationship-type planner warnings
// 4. Easier query optimization
```

Create separate relationship types when:

1. The relationships have different semantic meanings.
2. You frequently query only one type.
3. The relationship properties differ significantly.

Keep a single type when:

1. You usually query all connection types together.
2. The difference is a simple enumerable property.
3. Separate types would mean hundreds of relationship types.
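One reason specific types tend to traverse faster is that a store can group a node's relationships by type, so expanding `:MANAGES` never reads the `:FOLLOWS` entries, whereas a single `:RELATED_TO` type forces every entry to be read and its `type` property compared. The Python sketch below counts entries examined under both layouts for a hypothetical hub node; the grouping is an assumption loosely modeled on how Neo4j groups relationships by type for dense nodes, and the counts illustrate the idea rather than measure any database.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical hub node: many FOLLOWS edges, a few MANAGES edges.
edges = [("FOLLOWS", f"user{i}") for i in range(1000)]
edges += [("MANAGES", f"report{i}") for i in range(3)]

# Layout A: specific types, grouped by type at the node.
grouped: dict[str, list[str]] = defaultdict(list)
for rel_type, target in edges:
    grouped[rel_type].append(target)

# Layout B: one generic RELATED_TO type; the meaning lives in a property.
generic = [{"type": rel_type, "target": target} for rel_type, target in edges]


def expand_grouped(rel_type: str) -> tuple[list[str], int]:
    # Only the matching group is read. Returns (targets, entries_examined).
    targets = grouped.get(rel_type, [])
    return list(targets), len(targets)


def expand_generic(type_value: str) -> tuple[list[str], int]:
    # Every RELATED_TO entry is read and its property compared.
    targets = [e["target"] for e in generic if e["type"] == type_value]
    return targets, len(generic)


print(expand_grouped("MANAGES"))  # (['report0', 'report1', 'report2'], 3)
print(expand_generic("MANAGES"))  # (['report0', 'report1', 'report2'], 1003)
print(expand_generic("manages"))  # ([], 1003)
```

The last call shows the "property typos cause silent bugs" problem: a casing mistake in the property value simply returns nothing.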
One of the most powerful features of property graphs—and a key differentiator from basic graph models—is the ability to attach properties directly to relationships. This enables rich contextual data about connections without intermediate nodes.
What to Store on Relationships:
| Category | Examples | When to Use |
|---|---|---|
| Temporal Context | since, until, lastInteraction | When the relationship has a time dimension |
| Quantitative Measures | weight, strength, score, distance | When connections have numeric intensity |
| Transactional Data | price, quantity, orderId | When the relationship represents an event |
| Qualitative Descriptors | role, type, context | When relationships are subcategorized |
| Computational Results | similarity, confidence, correlation | When relationships are algorithmically derived |
```cypher
// RELATIONSHIP PROPERTIES IN PRACTICE
// ====================================

// 1. SOCIAL NETWORK - Connection Strength
CREATE (alice:Person {name: "Alice"})-[:KNOWS {
  since: date("2015-09-01"),
  context: "college_roommate",
  interactionFrequency: "weekly",
  trustScore: 0.95
}]->(bob:Person {name: "Bob"});

// 2. E-COMMERCE - Purchase Transaction
CREATE (customer:Customer {id: "C-001"})-[:PURCHASED {
  transactionId: "TXN-2024-001",
  purchasedAt: datetime("2024-01-15T14:30:00"),
  quantity: 2,
  unitPrice: 29.99,
  discount: 0.10,
  paymentMethod: "credit_card"
}]->(product:Product {sku: "WIDGET-A"});

// 3. RECOMMENDATION - Computed Similarity
CREATE (p1:Product {name: "Running Shoes"})-[:SIMILAR_TO {
  algorithm: "collaborative_filtering",
  similarityScore: 0.87,
  computedAt: datetime("2024-01-01"),
  basedOnPurchases: 15420,
  confidence: 0.92
}]->(p2:Product {name: "Running Socks"});

// 4. AUTHORIZATION - Access Permission
CREATE (user:User {email: "dev@company.com"})-[:HAS_ACCESS {
  role: "editor",
  grantedBy: "admin@company.com",
  grantedAt: datetime("2023-06-15"),
  expiresAt: datetime("2024-06-15"),
  permissions: ["read", "write", "delete"],
  auditRequired: true
}]->(resource:Document {id: "DOC-SECRET-001"});

// 5. COLLABORATION - Weighted Contribution
CREATE (author1:Author {name: "Dr. Smith"})-[:CO_AUTHORED {
  paperDoi: "10.1234/science.2024.001",
  contributionPercentage: 45,
  role: "lead_author",
  sections: ["Introduction", "Methods", "Conclusion"],
  correspondingAuthor: true
}]->(paper:Paper {title: "Breakthrough Research"});

// Query using relationship properties:
// Find strong, long-standing connections in the social network
MATCH (a:Person)-[k:KNOWS]->(b:Person)
WHERE k.trustScore > 0.8 AND k.since < date() - duration('P5Y')
RETURN a.name, b.name, k.context
ORDER BY k.trustScore DESC;
```

- Put on Node if: the property describes the entity itself and is independent of any relationship.
- Put on Relationship if: the property describes the connection and might differ for connections to different nodes (e.g., Alice's role at Company A differs from her role at Company B).
- Create Intermediate Node if: you need to connect the relationship itself to other nodes (e.g., a Purchase node that connects to Payment and Shipment nodes).
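The node-versus-relationship rule can be checked against the employment example it mentions. In this Python sketch (hypothetical classes and data), a single `role` property on Alice's node can hold only one value, while storing `role` on each `WORKS_FOR` relationship keeps a separate role per company.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    props: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    start: Node
    end: Node
    props: dict[str, object] = field(default_factory=dict)


alice = Node("Alice", {"email": "alice@example.com"})  # describes Alice herself
company_a = Node("Company A")
company_b = Node("Company B")

# Wrong place: a single node property can hold only one role.
alice_with_role_prop = dict(alice.props, role="Engineer")
alice_with_role_prop["role"] = "Advisor"  # overwrites: Company A's role is lost

# Right place: the role describes each connection.
works_for = [
    Relationship("WORKS_FOR", alice, company_a, {"role": "Engineer"}),
    Relationship("WORKS_FOR", alice, company_b, {"role": "Advisor"}),
]

roles = {r.end.name: r.props["role"] for r in works_for if r.start is alice}
print(roles)  # {'Company A': 'Engineer', 'Company B': 'Advisor'}
```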
Real-world modeling often encounters scenarios that don't fit neatly into simple node-relationship patterns. Here's how to handle common challenges:
```cypher
// SCENARIO 1: HYPEREDGES (N-ary Relationships)
// ==============================================
// Problem: A meeting connects MULTIPLE people simultaneously
// (Not just pairs)

// Solution: Intermediate "Event" node
CREATE (meeting:Meeting {
  id: "MTG-001",
  title: "Q4 Planning",
  startTime: datetime("2024-01-15T10:00:00"),
  duration: duration("PT2H")
})
CREATE (alice:Person {name: "Alice"})-[:ATTENDED {role: "organizer"}]->(meeting)
CREATE (bob:Person {name: "Bob"})-[:ATTENDED {role: "participant"}]->(meeting)
CREATE (carol:Person {name: "Carol"})-[:ATTENDED {role: "participant"}]->(meeting)
CREATE (meeting)-[:HELD_IN]->(room:Room {name: "Conference Room A"});

// Query: Who attended meetings with Alice?
MATCH (alice:Person {name: "Alice"})-[:ATTENDED]->(m:Meeting)<-[:ATTENDED]-(other:Person)
RETURN DISTINCT other.name;

// SCENARIO 2: TIME-VARYING RELATIONSHIPS
// =======================================
// Problem: Employment history - same person, same company, multiple periods

// Solution: Multiple relationships with temporal properties
// (MERGE reuses Alice's node if it already exists)
MERGE (alice:Person {name: "Alice"})
MERGE (techcorp:Company {name: "TechCorp"})

CREATE (alice)-[:EMPLOYED_BY {
  startDate: date("2015-03-01"),
  endDate: date("2018-06-30"),
  role: "Junior Developer",
  department: "Engineering"
}]->(techcorp)

CREATE (alice)-[:EMPLOYED_BY {
  startDate: date("2021-09-01"),
  endDate: null,  // Current employment: a null value is simply not stored
  role: "Engineering Manager",
  department: "Platform"
}]->(techcorp);

// Query: Current employers
MATCH (p:Person)-[e:EMPLOYED_BY]->(c:Company)
WHERE e.endDate IS NULL
RETURN p.name, c.name, e.role;

// SCENARIO 3: VERSIONED DATA
// ==========================
// Problem: Track historical product prices

// Solution A: Relationship-based versioning
CREATE (product:Product {sku: "WIDGET-A", name: "Widget A"})
CREATE (product)-[:HAS_PRICE {
  amount: 29.99,
  currency: "USD",
  effectiveFrom: date("2023-01-01"),
  effectiveTo: date("2023-06-30")
}]->(p1:PriceRecord)
CREATE (product)-[:HAS_PRICE {
  amount: 34.99,
  currency: "USD",
  effectiveFrom: date("2023-07-01"),
  effectiveTo: null
}]->(p2:PriceRecord);

// Solution B: Event sourcing pattern
// (a separate statement, so the product must be matched again)
MATCH (product:Product {sku: "WIDGET-A"})
CREATE (product)-[:PRICE_CHANGED {
  newPrice: 34.99,
  oldPrice: 29.99,
  changedAt: datetime("2023-07-01T00:00:00"),
  changedBy: "pricing_service",
  reason: "cost_increase"
}]->(priceEvent:PriceChange);

// SCENARIO 4: SELF-REFERENTIAL HIERARCHIES
// ========================================
// Problem: Organizational hierarchy, category trees, bill of materials

CREATE (ceo:Employee {name: "Alice", title: "CEO"})
CREATE (vp1:Employee {name: "Bob", title: "VP Engineering"})
CREATE (vp2:Employee {name: "Carol", title: "VP Sales"})
CREATE (director:Employee {name: "Dave", title: "Engineering Director"})
CREATE (engineer:Employee {name: "Eve", title: "Senior Engineer"})

CREATE (vp1)-[:REPORTS_TO]->(ceo)
CREATE (vp2)-[:REPORTS_TO]->(ceo)
CREATE (director)-[:REPORTS_TO]->(vp1)
CREATE (engineer)-[:REPORTS_TO]->(director);

// Query: All people in Eve's management chain
MATCH (eve:Employee {name: "Eve"})-[:REPORTS_TO*]->(manager)
RETURN manager.name, manager.title;

// Query: All direct and indirect reports of Bob
MATCH (bob:Employee {name: "Bob"})<-[:REPORTS_TO*]-(report)
RETURN report.name, report.title;
```

Intermediate nodes (reifying relationships as nodes) add modeling flexibility but increase traversal cost: every extra hop adds work to each query that crosses it. Use intermediate nodes when you need to connect a relationship to other nodes, version relationship history, or model true n-ary associations. Avoid over-engineering; simple direct relationships are usually sufficient.
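The two variable-length `REPORTS_TO*` queries in Scenario 4 amount to following edges transitively in one direction or the other. This Python sketch reproduces them with a breadth-first search over an edge list built from the Scenario 4 data; `reachable` is a hypothetical helper, and its visited set is a simplification (Cypher's variable-length matching instead forbids reusing a relationship within one path, which gives the same result on a tree like this one).

```python
from __future__ import annotations

from collections import defaultdict, deque

# (employee)-[:REPORTS_TO]->(manager) edges from Scenario 4.
REPORTS_TO = [("Bob", "Alice"), ("Carol", "Alice"),
              ("Dave", "Bob"), ("Eve", "Dave")]

outgoing: dict[str, list[str]] = defaultdict(list)
incoming: dict[str, list[str]] = defaultdict(list)
for employee, manager in REPORTS_TO:
    outgoing[employee].append(manager)
    incoming[manager].append(employee)


def reachable(start: str, adjacency: dict[str, list[str]]) -> list[str]:
    """Nodes reachable in one or more hops, in breadth-first order."""
    seen = {start}
    order: list[str] = []
    queue = deque(adjacency.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(adjacency.get(node, []))
    return order


# MATCH (eve)-[:REPORTS_TO*]->(manager)
print(reachable("Eve", outgoing))  # ['Dave', 'Bob', 'Alice']
# MATCH (bob)<-[:REPORTS_TO*]-(report)
print(reachable("Bob", incoming))  # ['Dave', 'Eve']
```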
We've comprehensively examined the building blocks of graph databases. Let's consolidate the essential insights:
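- Nodes can carry multiple labels (e.g., `:Person:Employee:Manager`), enabling flexible, overlapping categorization without schema changes.
- Relationships carry exactly one type, best named as a verb (e.g., `:PURCHASED`, `:MANAGES`) with specific semantics.

What's next: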
With nodes and edges understood, we'll explore a concrete implementation. The next page examines Neo4j—the most popular graph database—including its architecture, Cypher query language, and practical examples of graph operations.
You now have expert-level understanding of nodes and edges—their structure, property systems, label strategies, relationship type design, indexing approaches, and handling of complex scenarios. Next, we'll apply this knowledge using Neo4j.