Graph Databases: Modeling Connected Data

Level: Advanced

Duration: 75 mins

Topic: Graph Databases

Neo4j Example

The Graph Database That Changed Everything

When the developers of Neo4j set out to build a database optimized for connected data in 2000, they were solving problems that the broader industry wouldn't recognize for another decade. Today, Neo4j powers fraud detection at financial institutions, recommendation engines at e-commerce giants, and knowledge graphs at research organizations worldwide.

Neo4j isn't just the most popular graph database—it defined the property graph model now adopted across the industry. Understanding Neo4j means understanding the patterns, practices, and query language that have become the de facto standard for graph database work.

What You Will Learn

By the end of this page, you will understand Neo4j's architecture, master the Cypher query language from fundamentals to advanced patterns, learn how to perform CRUD operations, implement indexes and constraints, and understand deployment options including AuraDB (cloud), Desktop, and Enterprise configurations.

Neo4j Architecture

Neo4j is a native graph database—designed from the ground up for graph storage and processing, not a graph layer atop relational or document storage. This native approach provides the performance characteristics that distinguish Neo4j from non-native alternatives.

Core Architectural Components:

neo4j-architecture.txt
NEO4J ARCHITECTURE OVERVIEW
============================
 
┌─────────────────────────────────────────────────────────────────────┐
│                         CLIENT LAYER                                 │
├─────────────────────────────────────────────────────────────────────┤
│  • Bolt Protocol (binary, TLS supported)                            │
│  • HTTP API (REST, for legacy compatibility)                        │
│  • Official Drivers: Java, Python, JavaScript, .NET, Go             │
│  • Neo4j Browser (web-based query interface)                        │
│  • Neo4j Bloom (visual exploration tool)                            │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       QUERY ENGINE                                   │
├─────────────────────────────────────────────────────────────────────┤
│  • Cypher Parser → AST → Logical Plan → Physical Plan               │
│  • Cost-based Query Optimizer                                        │
│  • Query Cache (compiled query plans)                                │
│  • Parallel Execution Engine (Enterprise)                           │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    TRANSACTION LAYER                                 │
├─────────────────────────────────────────────────────────────────────┤
│  • ACID Transactions (full durability)                               │
│  • Read-committed isolation (lock-based)                             │
│  • Transaction Log (write-ahead log)                                 │
│  • Lock Manager (node/relationship level)                            │
│  • Deadlock Detection                                                │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     STORAGE ENGINE                                   │
├─────────────────────────────────────────────────────────────────────┤
│  Native Graph Storage:                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐               │
│  │  Node Store  │  │ Relationship │  │   Property   │               │
│  │  (15B/node)  │  │    Store     │  │    Store     │               │
│  │              │  │  (34B/rel)   │  │  (variable)  │               │
│  └──────────────┘  └──────────────┘  └──────────────┘               │
│                                                                      │
│  Index Infrastructure:                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐               │
│  │ Range Index  │  │  Full-Text   │  │  Spatial     │               │
│  │  (B+ Tree)   │  │   (Lucene)   │  │  (R-Tree)    │               │
│  └──────────────┘  └──────────────┘  └──────────────┘               │
│                                                                      │
│  Page Cache (off-heap memory, configurable size)                    │
└─────────────────────────────────────────────────────────────────────┘
 
CLUSTER ARCHITECTURE (Enterprise):
┌───────────────────────────────────────────────────────────────────┐
│                        RAFT CLUSTER                                │
├───────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐           │
│  │   LEADER    │◄──►│  FOLLOWER   │◄──►│  FOLLOWER   │           │
│  │  (writes)   │    │  (replicas) │    │  (replicas) │           │
│  └─────────────┘    └─────────────┘    └─────────────┘           │
│         ▲                                                          │
│         │ Automatic Leader Election                                │
│         │ Synchronous Replication (configurable)                   │
│                                                                    │
│  READ REPLICAS (async):                                            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Read     │ │ Read     │ │ Read     │ │ Read     │             │
│  │ Replica  │ │ Replica  │ │ Replica  │ │ Replica  │             │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘             │
└───────────────────────────────────────────────────────────────────┘
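
The fixed record sizes shown in the storage engine box (15 bytes per node, 34 bytes per relationship) are what make lookups by ID cheap: a record's position in its store file can be computed instead of searched for. A minimal sketch of that arithmetic follows. The function name and the flat layout are illustrative assumptions; the real store format has headers and other details this ignores.

```python
# Simplified illustration of fixed-size record addressing.
# Record sizes come from the architecture diagram above; everything else
# (names, flat layout with no file header) is an assumption for illustration.
NODE_RECORD_BYTES = 15
REL_RECORD_BYTES = 34


def record_offset(record_id: int, record_bytes: int) -> int:
    """Return the byte offset of a record in a store of fixed-size records."""
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return record_id * record_bytes


if __name__ == "__main__":
    print(record_offset(1_000_000, NODE_RECORD_BYTES))  # 15000000
    print(record_offset(1_000_000, REL_RECORD_BYTES))   # 34000000
```

Because the offset is a single multiplication, finding a record costs the same whether the store holds a thousand records or a billion.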

Page Cache is Critical

Neo4j performance depends heavily on having enough memory for the page cache to hold the working set. Optimal configuration: page cache should hold the entire graph store files (node, relationship, property, and index files). For a 50GB graph, configure at least 50GB page cache if possible. Monitor cache hit rates—below 95% typically indicates undersized memory.
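
The 95% rule of thumb can be checked with simple arithmetic once you have hit and fault counts from your monitoring. The sketch below assumes those two counters are available as plain integers; the function names are hypothetical and not part of any Neo4j API.

```python
def page_cache_hit_ratio(hits: int, faults: int) -> float:
    """Fraction of page requests served from memory: hits / (hits + faults)."""
    total = hits + faults
    if total == 0:
        return 1.0  # no requests yet, so nothing has missed
    return hits / total


def is_undersized(hits: int, faults: int, threshold: float = 0.95) -> bool:
    """Apply the 95% rule of thumb from the text above."""
    return page_cache_hit_ratio(hits, faults) < threshold


if __name__ == "__main__":
    print(page_cache_hit_ratio(900, 100))  # 0.9
    print(is_undersized(900, 100))         # True
```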

Cypher Query Language Fundamentals

Cypher is Neo4j's declarative query language, designed to be expressive and intuitive for working with graph patterns. Its ASCII-art syntax makes pattern matching readable:

(node)-[:RELATIONSHIP]->(anotherNode)

Cypher has been open-sourced as openCypher and adopted by other graph databases (for example Amazon Neptune, SAP HANA Graph, and RedisGraph), making it the closest thing the graph world has to SQL.

Core Clauses:

Essential Cypher Clauses
Clause	Purpose	SQL Equivalent
MATCH	Pattern matching to find data	SELECT ... FROM ... WHERE
WHERE	Filter matched patterns	WHERE clause
RETURN	Specify output data	SELECT columns
CREATE	Create nodes and relationships	INSERT
MERGE	Match or create (upsert)	INSERT ... ON CONFLICT
SET	Update properties	UPDATE
DELETE	Remove nodes/relationships	DELETE
WITH	Chain query parts (subquery)	CTE / subquery
ORDER BY	Sort results	ORDER BY
LIMIT/SKIP	Pagination	LIMIT/OFFSET
UNWIND	Expand lists to rows	LATERAL / UNNEST

cypher-basics.cypher

Cypher

// PATTERN SYNTAX EXPLAINED
// =========================
 
// Node pattern: (variable:Label {properties})
(p:Person {name: "Alice"})    // Node labeled Person with name Alice
(p:Person:Employee)           // Node with multiple labels
(p)                           // Any node
(:Person)                     // Any Person (no variable binding)
 
// Relationship pattern: -[variable:TYPE {properties}]->
-[:KNOWS]->                   // Outgoing KNOWS relationship
<-[:MANAGES]-                 // Incoming MANAGES relationship
-[:FRIEND]-                   // Either direction
-[r:WORKS_AT {since: 2020}]-> // Relationship bound to 'r' with property
 
// Combined patterns:
(alice:Person)-[:KNOWS]->(bob:Person)
(a)-[:MANAGES]->(b)<-[:REPORTS_TO]-(c)
 
// Variable-length paths:
(a)-[:KNOWS*1..3]->(b)        // 1 to 3 hops
(a)-[:KNOWS*]->(b)            // Any number of hops
(a)-[:KNOWS*..5]->(b)         // Up to 5 hops
 
// BASIC QUERIES
// =============
 
// 1. Find all Person nodes
MATCH (p:Person)
RETURN p.name, p.email
 
// 2. Find specific person
MATCH (p:Person {name: "Alice"})
RETURN p
 
// 3. Find Alice's friends
MATCH (alice:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
RETURN friend.name
 
// 4. Find friends of friends (excluding direct friends)
MATCH (me:Person {name: "Alice"})-[:KNOWS*2..2]->(foaf:Person)
WHERE NOT (me)-[:KNOWS]->(foaf)
  AND me <> foaf
RETURN DISTINCT foaf.name
 
// 5. Find shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: "Alice"})-[:KNOWS*]-(bob:Person {name: "Bob"})
)
RETURN path, length(path) AS hops
 
// 6. Pattern with relationship properties
MATCH (e:Employee)-[w:WORKS_FOR {since: 2020}]->(c:Company)
WHERE w.role = "Engineer"
RETURN e.name, c.name, w.role
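
To make query 4 (friends of friends) concrete, here is roughly what it computes, written as plain Python over a toy adjacency dict. The sample data and function name are hypothetical; this is a sketch of the query's logic, not of how Neo4j executes it.

```python
# Toy directed KNOWS graph as an adjacency dict (hypothetical sample data).
KNOWS = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Dave", "Carol"},
    "Carol": {"Eve", "Alice"},
    "Dave": set(),
    "Eve": set(),
}


def friends_of_friends(graph: dict[str, set[str]], me: str) -> set[str]:
    """People exactly two KNOWS hops away who are neither direct friends nor me."""
    direct = graph.get(me, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    # Mirrors WHERE NOT (me)-[:KNOWS]->(foaf) AND me <> foaf
    return two_hops - direct - {me}


if __name__ == "__main__":
    print(sorted(friends_of_friends(KNOWS, "Alice")))  # ['Dave', 'Eve']
```

Carol knows Alice back, so Alice is reachable in two hops from herself; that is why the Cypher query needs the `me <> foaf` check.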

Cypher vs GQL Standard

In 2024, ISO ratified GQL (Graph Query Language) as the international standard for graph database queries. GQL is heavily influenced by Cypher, and Neo4j plans full GQL support. For now, Cypher remains the production-ready language, and GQL will be largely syntactically compatible. Skills in Cypher transfer directly to GQL.

CRUD Operations

Let's examine the full lifecycle of data manipulation in Neo4j, from creation through deletion, with attention to best practices and common pitfalls.

crud-operations.cypher

Cypher

// ========================================
// CREATE - Adding Data
// ========================================
 
// Create a single node
CREATE (alice:Person:Employee {
    personId: "PER-001",
    name: "Alice Chen",
    email: "alice@techcorp.com",
    age: 32,
    skills: ["Python", "Go", "Machine Learning"],
    joinedAt: datetime("2019-03-15T09:00:00")
})
RETURN alice
 
// Create multiple nodes in one query
CREATE (bob:Person:Employee {personId: "PER-002", name: "Bob Singh"})
CREATE (carol:Person:Manager {personId: "PER-003", name: "Carol Davis"})
CREATE (techcorp:Company {companyId: "COM-001", name: "TechCorp Inc."})
 
// Create nodes and relationships together
CREATE (alice:Person {name: "Alice"})-[:KNOWS {since: date()}]->(bob:Person {name: "Bob"})
 
// Create relationships between existing nodes
MATCH (a:Person {name: "Alice"}), (c:Company {name: "TechCorp Inc."})
CREATE (a)-[:WORKS_FOR {since: date("2019-03-15"), role: "Senior Engineer"}]->(c)
 
// ========================================
// MERGE - Create If Not Exists (Upsert)
// ========================================
 
// MERGE checks for existence before creating
// Critical for idempotent data loading
 
// Merge on unique property (recommended)
MERGE (p:Person {personId: "PER-001"})
ON CREATE SET 
    p.name = "Alice Chen",
    p.createdAt = datetime()
ON MATCH SET
    p.lastSeen = datetime()
RETURN p
 
// Merge with full pattern (creates both nodes and relationship)
MERGE (a:Person {personId: "PER-001"})-[:WORKS_FOR]->(c:Company {companyId: "COM-001"})
 
// CAUTION: Merge relationship between existing nodes
// DO THIS (correct - match nodes first):
MATCH (a:Person {personId: "PER-001"})
MATCH (c:Company {companyId: "COM-001"})
MERGE (a)-[:WORKS_FOR]->(c)
 
// NOT THIS (may create duplicate nodes):
MERGE (a:Person {name: "Alice"})-[:WORKS_FOR]->(c:Company {name: "TechCorp"})
 
// ========================================
// SET - Updating Data
// ========================================
 
// Update single property
MATCH (p:Person {personId: "PER-001"})
SET p.email = "alice.chen@techcorp.com"
RETURN p
 
// Update multiple properties
MATCH (p:Person {personId: "PER-001"})
SET p.age = 33, p.lastModified = datetime()
RETURN p
 
// Replace all properties (destructive!)
MATCH (p:Person {personId: "PER-001"})
SET p = {personId: "PER-001", name: "Alice Chen", age: 33}
RETURN p
 
// Add to existing properties (non-destructive, recommended)
MATCH (p:Person {personId: "PER-001"})
SET p += {phone: "+1-555-0123", verified: true}
RETURN p
 
// Add/remove labels
MATCH (p:Person {personId: "PER-001"})
SET p:Manager, p:TechLead
REMOVE p:Employee
RETURN labels(p)
 
// ========================================
// REMOVE - Removing Properties/Labels
// ========================================
 
// Remove a property
MATCH (p:Person {personId: "PER-001"})
REMOVE p.temporaryField
RETURN p
 
// Remove a label
MATCH (p:Person:Manager {personId: "PER-001"})
REMOVE p:Manager
RETURN labels(p)
 
// ========================================
// DELETE - Removing Nodes/Relationships
// ========================================
 
// Delete a relationship
MATCH (a:Person {personId: "PER-001"})-[r:WORKS_FOR]->(c:Company)
DELETE r
 
// Delete a node (must have no relationships)
MATCH (p:Person {personId: "TEMP-001"})
DELETE p
 
// Delete node and ALL connected relationships (DETACH DELETE)
MATCH (p:Person {personId: "TEMP-001"})
DETACH DELETE p
 
// Delete multiple nodes by condition
MATCH (p:Person)
WHERE p.isInactive = true AND p.lastSeen < datetime() - duration('P1Y')
DETACH DELETE p
 
// ========================================
// RETURNING RESULTS
// ========================================
 
// Return specific properties
MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
RETURN p.name AS employee, c.name AS company, r.role AS position
 
// Return with computed fields
MATCH (p:Person)
RETURN p.name, 
       p.age, 
       CASE WHEN p.age >= 65 THEN "Senior" ELSE "Active" END AS status
 
// Aggregate results
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name AS company, count(p) AS employeeCount
ORDER BY employeeCount DESC
 
// Return graph structure
MATCH path = (a:Person)-[:KNOWS*1..3]-(b:Person)
RETURN path
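
The difference between `SET p = {...}` and `SET p += {...}` is easy to get wrong, so here is the same behavior modeled with Python dicts. The helper names are hypothetical; the sketch assumes the documented Cypher rule that a property set to null is removed.

```python
def set_replace(props: dict, new: dict) -> dict:
    """Mimics SET p = {...}: the old property map is discarded entirely."""
    return dict(new)


def set_merge(props: dict, new: dict) -> dict:
    """Mimics SET p += {...}: untouched keys stay, given keys overwrite,
    and keys given as None are dropped (Cypher removes properties set to null)."""
    merged = {**props, **new}
    return {key: value for key, value in merged.items() if value is not None}


if __name__ == "__main__":
    alice = {"personId": "PER-001", "name": "Alice Chen", "email": "alice@techcorp.com"}
    print(set_replace(alice, {"personId": "PER-001", "age": 33}))
    # {'personId': 'PER-001', 'age': 33}
    print(set_merge(alice, {"phone": "+1-555-0123", "email": None}))
    # {'personId': 'PER-001', 'name': 'Alice Chen', 'phone': '+1-555-0123'}
```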

MERGE Pitfalls

Common Mistake: MERGE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'}) creates new Alice and Bob nodes if ANY part of the pattern doesn't exist. Correct Approach: MATCH existing nodes first, then MERGE the relationship. Only MERGE with complete patterns during initial data load with indexed unique properties.
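
The all-or-nothing behavior of MERGE on a full pattern can be simulated in a few lines. The `TinyGraph` class below is a hypothetical in-memory stand-in, not Neo4j; it only models the rule described above.

```python
class TinyGraph:
    """In-memory stand-in for a graph with named nodes and KNOWS relationships."""

    def __init__(self):
        self.nodes = []    # node names; the list index serves as the node id
        self.rels = set()  # (start_id, end_id) pairs

    def create_node(self, name):
        self.nodes.append(name)
        return len(self.nodes) - 1

    def find(self, name):
        return [i for i, n in enumerate(self.nodes) if n == name]

    def merge_whole_pattern(self, a_name, b_name):
        """Mimics MERGE (a {name})-[:KNOWS]->(b {name}): match all or create all."""
        for a, b in self.rels:
            if self.nodes[a] == a_name and self.nodes[b] == b_name:
                return  # the full pattern already exists, nothing is created
        a = self.create_node(a_name)
        b = self.create_node(b_name)
        self.rels.add((a, b))

    def match_then_merge(self, a_name, b_name):
        """Mimics MATCH (a), (b) followed by MERGE (a)-[:KNOWS]->(b)."""
        for a in self.find(a_name):
            for b in self.find(b_name):
                self.rels.add((a, b))  # a set, so re-running adds nothing


def demo():
    risky, safe = TinyGraph(), TinyGraph()
    for g in (risky, safe):
        g.create_node("Alice")
        g.create_node("Bob")  # both people exist, but no relationship yet
    risky.merge_whole_pattern("Alice", "Bob")
    safe.match_then_merge("Alice", "Bob")
    return risky.nodes, safe.nodes


if __name__ == "__main__":
    risky_nodes, safe_nodes = demo()
    print(risky_nodes)  # ['Alice', 'Bob', 'Alice', 'Bob']
    print(safe_nodes)   # ['Alice', 'Bob']
```

The whole-pattern merge finds no existing Alice-KNOWS-Bob pattern, so it creates a second Alice and a second Bob; matching first reuses the existing nodes.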

Advanced Cypher Patterns

Beyond basic CRUD, Cypher provides sophisticated capabilities for complex graph analytics, path operations, and data transformation.

advanced-cypher.cypher

Cypher

// ========================================
// AGGREGATION AND GROUPING
// ========================================
 
// Basic aggregations
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name AS company,
       count(p) AS employees,
       avg(p.salary) AS avgSalary,
       min(p.joinedAt) AS firstHire,
       max(p.salary) AS highestSalary,
       collect(p.name) AS employeeNames
 
// Count distinct values
MATCH (p:Person)-[:PURCHASED]->(product:Product)
RETURN p.name, count(DISTINCT product) AS uniqueProducts
 
// Conditional aggregation
MATCH (p:Person)
RETURN 
    count(CASE WHEN p.age < 30 THEN 1 END) AS under30,
    count(CASE WHEN p.age >= 30 AND p.age < 50 THEN 1 END) AS between30and50,
    count(CASE WHEN p.age >= 50 THEN 1 END) AS over50
 
// ========================================
// WITH CLAUSE - Query Chaining
// ========================================
 
// Pipeline query stages
MATCH (p:Person)-[:WORKS_FOR]->(c:Company {name: "TechCorp"})
WITH p, c
WHERE p.age > 30
WITH p ORDER BY p.salary DESC LIMIT 10
MATCH (p)-[:KNOWS]->(colleague:Person)
RETURN p.name, collect(colleague.name) AS topColleagues
 
// Subquery-like behavior
MATCH (dept:Department)
CALL {
    WITH dept
    MATCH (dept)<-[:BELONGS_TO]-(e:Employee)
    RETURN count(e) AS empCount
}
RETURN dept.name, empCount
ORDER BY empCount DESC
 
// ========================================
// PATH OPERATIONS
// ========================================
 
// All shortest paths between nodes
MATCH paths = allShortestPaths(
    (alice:Person {name: "Alice"})-[:KNOWS*]-(bob:Person {name: "Bob"})
)
RETURN paths, length(paths) AS hops
 
// Variable-length path with filters
MATCH path = (start:Person {name: "Alice"})-[:KNOWS*1..6]-(end:Person)
WHERE ALL(node IN nodes(path) WHERE node.age > 25)
  AND ALL(rel IN relationships(path) WHERE rel.since > date("2020-01-01"))
RETURN end.name, length(path) AS distance
 
// Path operations
MATCH path = (a:Person)-[:KNOWS*2..4]->(b:Person)
RETURN 
    nodes(path) AS pathNodes,
    relationships(path) AS pathRels,
    length(path) AS hops,
    [n IN nodes(path) | n.name] AS names
 
// ========================================
// LIST OPERATIONS
// ========================================
 
// List comprehension
MATCH (p:Person)
RETURN p.name, [skill IN p.skills WHERE skill CONTAINS "Python" | skill] AS pythonSkills
 
// UNWIND - expand lists to rows
WITH ["Alice", "Bob", "Carol"] AS names
UNWIND names AS name
MERGE (p:Person {name: name})
RETURN p
 
// Pattern from list
MATCH (p:Product)
WHERE p.productId IN ["PROD-001", "PROD-002", "PROD-003"]
RETURN p.name
 
// Collect and process
MATCH (c:Customer)-[:PURCHASED]->(p:Product)
WITH c, collect(p) AS products
WHERE size(products) >= 3
RETURN c.name, [prod IN products | prod.name] AS purchasedProducts
 
// ========================================
// GRAPH DATA SCIENCE PROCEDURES
// ========================================
 
// PageRank (requires GDS library)
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
 
// Community detection (Louvain)
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS members
ORDER BY size(members) DESC
 
// Shortest path (Dijkstra)
MATCH (start:City {name: "New York"}), (end:City {name: "Los Angeles"})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
    sourceNode: start,
    targetNode: end,
    relationshipWeightProperty: 'distance'
})
YIELD totalCost, nodeIds, costs
RETURN totalCost, [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS path
 
// ========================================
// APOC UTILITIES (common procedures)
// ========================================
 
// Batch processing
CALL apoc.periodic.iterate(
    "MATCH (p:Person) WHERE p.needsUpdate = true RETURN p",
    "SET p.processed = true, p.processedAt = datetime()",
    {batchSize: 1000, parallel: true}
)
 
// Dynamic relationship creation
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CALL apoc.create.relationship(a, "CUSTOM_REL", {weight: 0.5}, b)
YIELD rel
RETURN rel
 
// Load JSON data
CALL apoc.load.json("https://api.example.com/users.json")
YIELD value
MERGE (p:Person {userId: value.id})
SET p.name = value.name, p.email = value.email

Performance Tips for Complex Queries

1. Use PROFILE/EXPLAIN to understand query plans.
2. Anchor patterns on indexed properties to avoid full scans.
3. Filter early with WHERE after the first MATCH when possible.
4. Limit variable-length paths (*1..5, not just *).
5. Use APOC for batch operations instead of thousands of individual queries.
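
Tip 4 is worth quantifying. Under a deliberately simple model in which every node has the same number of outgoing relationships, the number of candidate paths grows geometrically with the hop limit. The function below is a rough upper-bound sketch under that assumption, not a measurement of any real graph.

```python
def max_paths(branching: int, max_hops: int) -> int:
    """Upper bound on paths of length 1..max_hops when every node has
    `branching` outgoing relationships (uniform fan-out is an assumption)."""
    return sum(branching ** hops for hops in range(1, max_hops + 1))


if __name__ == "__main__":
    print(max_paths(10, 3))  # 1110
    print(max_paths(10, 6))  # 1111110
```

Doubling the hop limit from 3 to 6 multiplies the worst-case work by roughly a thousand here, which is why an unbounded `*` is risky on dense graphs.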

Indexes and Constraints

Proper indexing is critical for Neo4j performance. Without indexes, MATCH clauses with property filters require full label scans—acceptable for development but prohibitive at scale.

indexes-constraints.cypher

Cypher

// ========================================
// INDEX CREATION (Neo4j 5.x syntax)
// ========================================
 
// Range index (default, most common)
CREATE INDEX person_email FOR (p:Person) ON (p.email)
 
// Named index, created only if it does not already exist
// (range indexes take no indexConfig settings; spatial.* settings apply to point indexes)
CREATE INDEX person_id_idx IF NOT EXISTS 
FOR (p:Person) ON (p.personId)
 
// Composite index (multiple properties)
CREATE INDEX product_category_status 
FOR (p:Product) ON (p.category, p.status)
 
// Text index (for text search operations)
CREATE TEXT INDEX person_bio FOR (p:Person) ON (p.bio)
 
// Full-text index (Lucene-backed)
CREATE FULLTEXT INDEX product_search 
FOR (p:Product) ON EACH [p.name, p.description, p.tags]
OPTIONS {
    indexConfig: {
        `fulltext.analyzer`: "english"
    }
}
 
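// Point index (spatial queries); spatial.* settings are valid only on point indexes
CREATE POINT INDEX location_coords 
FOR (l:Location) ON (l.coordinates)
OPTIONS {
    indexConfig: {
        `spatial.wgs-84.min`: [-180.0, -90.0],
        `spatial.wgs-84.max`: [180.0, 90.0]
    }
}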
 
// Relationship property index (available since Neo4j 4.3)
CREATE INDEX rel_since FOR ()-[w:WORKS_FOR]-() ON (w.since)
 
// ========================================
// CONSTRAINT CREATION
// ========================================
 
// Unique constraint (creates index automatically)
CREATE CONSTRAINT person_email_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE
 
// Node key constraint (composite uniqueness + existence; Enterprise Edition)
CREATE CONSTRAINT order_key
FOR (o:Order) REQUIRE (o.orderId, o.region) IS NODE KEY
 
// Existence constraint (Enterprise Edition)
CREATE CONSTRAINT person_name_required
FOR (p:Person) REQUIRE p.name IS NOT NULL
 
// Type constraint (Neo4j 5.9+, Enterprise Edition)
CREATE CONSTRAINT person_age_integer
FOR (p:Person) REQUIRE p.age IS :: INTEGER
 
// Relationship property existence (Enterprise Edition)
CREATE CONSTRAINT employment_since_required
FOR ()-[w:WORKS_FOR]-() REQUIRE w.since IS NOT NULL
 
// ========================================
// INDEX MANAGEMENT
// ========================================
 
// List all indexes
SHOW INDEXES
 
// List with details (in SHOW commands, YIELD comes before WHERE)
SHOW INDEXES
YIELD name, type, entityType, labelsOrTypes, properties, state
WHERE type = 'RANGE'
 
// Drop index
DROP INDEX person_email
 
// Drop constraint (removes associated index)
DROP CONSTRAINT person_email_unique
 
// Check index usage in query
PROFILE MATCH (p:Person {email: "alice@example.com"}) RETURN p
 
// Example output shows "NodeIndexSeek" when index is used
// vs "NodeByLabelScan" + "Filter" when not indexed
 
// ========================================
// PERFORMANCE DIAGNOSTICS
// ========================================
 
// Explain query plan (without executing)
EXPLAIN MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email STARTS WITH "alice"
RETURN p.name, c.name
 
// Profile query (executes and shows db hits)
PROFILE MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email STARTS WITH "alice"
RETURN p.name, c.name
 
// Monitor index population status
SHOW INDEXES WHERE state <> 'ONLINE'

Index Type Selection Guide
Query Pattern	Index Type	When to Use
Exact match, range queries	Range (default)	Most lookup patterns
Multi-property lookup	Composite	WHERE a.x = ? AND a.y = ?
Text CONTAINS/ENDS WITH	Text	Substring matching
Natural language search	Full-text	Search box functionality
Distance/bounding box	Point	Geo-spatial queries
Label existence check	Token lookup	Fast `MATCH (n:Label)` scans

Index Overhead Considerations

Every index adds write overhead—properties are indexed synchronously during commits. For write-heavy workloads, balance read performance against write latency. Create indexes based on actual query patterns, not theoretical completeness. Monitor index population status after creation (SHOW INDEXES WHERE state <> 'ONLINE').
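
The read/write trade-off can be seen in a toy model. The `LabelStore` class below is hypothetical and uses a Python dict as a stand-in for Neo4j's B+ tree range index; it only illustrates that an index turns a scan into a direct lookup at the cost of extra work on every write.

```python
from collections import defaultdict


class LabelStore:
    """Toy store for one label, with an optional index on the 'email' property."""

    def __init__(self, indexed: bool):
        self.rows = []
        self.index = defaultdict(list) if indexed else None
        self.index_writes = 0

    def insert(self, row: dict) -> None:
        self.rows.append(row)
        if self.index is not None:
            self.index[row["email"]].append(row)  # extra work on every write
            self.index_writes += 1

    def find_by_email(self, email: str):
        """Return (matches, rows_examined)."""
        if self.index is not None:
            hits = self.index.get(email, [])
            return hits, len(hits)          # analogous to an index seek
        hits = [r for r in self.rows if r["email"] == email]
        return hits, len(self.rows)         # analogous to a label scan + filter


if __name__ == "__main__":
    scan, seek = LabelStore(indexed=False), LabelStore(indexed=True)
    for i in range(1000):
        row = {"email": f"user{i}@example.com"}
        scan.insert(row)
        seek.insert(row)
    print(scan.find_by_email("user500@example.com")[1])  # 1000
    print(seek.find_by_email("user500@example.com")[1])  # 1
    print(seek.index_writes)                             # 1000
```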

Driver Integration

Neo4j provides official drivers for major programming languages, all using the Bolt binary protocol for efficient, encrypted communication.

neo4j_python.py
Python
# Neo4j Python Driver
# pip install neo4j
 
from neo4j import GraphDatabase
from contextlib import contextmanager
 
class Neo4jConnection:
    def __init__(self, uri: str, user: str, password: str):
        self._driver = GraphDatabase.driver(uri, auth=(user, password))
    
    def close(self):
        self._driver.close()
    
    @contextmanager
    def session(self, database: str = "neo4j"):
        session = self._driver.session(database=database)
        try:
            yield session
        finally:
            session.close()
 
    # Transaction functions are recommended for production
    def create_person(self, person_id: str, name: str, email: str):
        def _create(tx, person_id, name, email):
            result = tx.run("""
                MERGE (p:Person {personId: $personId})
                ON CREATE SET p.name = $name, p.email = $email, 
                              p.createdAt = datetime()
                RETURN p
            """, personId=person_id, name=name, email=email)
            return result.single()
        
        with self.session() as session:
            return session.execute_write(_create, person_id, name, email)
    
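    def find_connections(self, person_id: str, max_hops: int = 3):
        # Cypher does not accept parameters as variable-length path bounds,
        # so validate the hop count and embed it in the query text.
        # The upper limit of 10 is an arbitrary safety cap.
        if not isinstance(max_hops, int) or not 1 <= max_hops <= 10:
            raise ValueError("max_hops must be an integer between 1 and 10")
        query = f"""
            MATCH path = (start:Person {{personId: $personId}})
                         -[:KNOWS*1..{max_hops}]-
                         (connected:Person)
            WHERE start <> connected
            RETURN connected.name AS name,
                   min(length(path)) AS distance
            ORDER BY distance
        """

        def _find(tx, person_id):
            result = tx.run(query, personId=person_id)
            return [dict(record) for record in result]

        with self.session() as session:
            return session.execute_read(_find, person_id)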
 
# Usage
if __name__ == "__main__":
    conn = Neo4jConnection(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="password"
    )
    
    try:
        # Create nodes
        conn.create_person("P001", "Alice Chen", "alice@example.com")
        conn.create_person("P002", "Bob Singh", "bob@example.com")
        
        # Query connections
        connections = conn.find_connections("P001", max_hops=2)
        for c in connections:
            print(f"{c['name']} - {c['distance']} hops away")
    finally:
        conn.close()

Driver Best Practices

1. Use transaction functions (execute_read and execute_write in the Python driver; executeRead and executeWrite in the Java and JavaScript drivers) for automatic retry on transient failures.
2. Share one driver instance across your application; it is thread-safe and manages connection pooling.
3. Close sessions after use to return connections to the pool.
4. Use parameterized queries to prevent injection and enable query plan caching.
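
To show why transaction functions help, here is a simplified sketch of retry-on-transient-failure. It is not the driver's actual implementation; `RetryableError` and `run_with_retry` are hypothetical names, and the backoff policy is an assumption.

```python
import time


class RetryableError(Exception):
    """Stand-in for a transient failure such as a leader switch (hypothetical)."""


def run_with_retry(work, max_attempts: int = 3, base_delay: float = 0.0):
    """Call work() until it succeeds or attempts run out, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the failure to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RetryableError("temporary failure")
        return "ok"

    print(run_with_retry(flaky), calls["n"])  # ok 3
```

Because the unit of work may run more than once, it should be idempotent, which is one reason the earlier example uses MERGE rather than CREATE inside its transaction function.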

Deployment Options

Neo4j offers deployment options ranging from local development to managed cloud services for enterprise production workloads.

Neo4j Deployment Comparison
Option	Best For	Features	Considerations
Neo4j Desktop	Development, learning, prototyping	Free, local graphs, multiple databases, plugin management	Single machine, no production support
Community Edition	Small projects, single-server production	Open source, full Cypher, APOC support	Single instance, no clustering, 34B node limit
Enterprise Edition	Production workloads, high availability	Clustering, causal consistency, advanced security, online backup	Commercial license required
AuraDB Free	Learning, small prototypes	Managed, 200K nodes, automatic suspend	Limited resources, pauses after 3 days inactive
AuraDB Professional	Production SaaS applications	Managed, auto-scaling, backups, SLA	Pay per use, limited configuration
AuraDB Enterprise	Enterprise/mission-critical	Dedicated infrastructure, VPC, SSO, compliance	Premium pricing, custom contracts

docker-deployment.yml
YAML
# Docker Compose for Neo4j Development
version: '3.8'
 
services:
  neo4j:
    image: neo4j:5.15-community
    container_name: neo4j-dev
    ports:
      - "7474:7474"   # HTTP (Browser)
      - "7687:7687"   # Bolt (Driver)
    environment:
      # Authentication
      - NEO4J_AUTH=neo4j/development_password
      
      # Memory configuration (Neo4j 5.x setting names)
      - NEO4J_server_memory_pagecache_size=1G
      - NEO4J_server_memory_heap_initial__size=512M
      - NEO4J_server_memory_heap_max__size=1G
      
      # Enable APOC
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_dbms_security_procedures_unrestricted=apoc.*
      
      # Allow LOAD CSV to read file:/// URLs
      - NEO4J_dbms_security_allow__csv__import__from__file__urls=true
    
    volumes:
      - neo4j_data:/data
      - neo4j_logs:/logs
      - neo4j_import:/var/lib/neo4j/import
    
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:7474"]
      interval: 30s
      timeout: 10s
      retries: 5
 
volumes:
  neo4j_data:
  neo4j_logs:
  neo4j_import:
 
# Run: docker-compose up -d
# Access: http://localhost:7474 (Neo4j Browser)

Memory Sizing Guidelines

Page Cache: Should ideally hold the entire graph store (node + relationship + property files). Check file sizes in data/databases/<db>/.
Heap: 2-8GB is typically sufficient; larger for heavy queries or GDS algorithms.
OS Reserve: Leave 1-2GB for the operating system.
Formula: Total RAM = Page Cache + Heap + OS Reserve
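
The sizing formula is simple enough to script. The sketch below applies it in both directions; the function names and default values are assumptions picked from the ranges given above, not Neo4j recommendations for any specific workload.

```python
def total_ram_gb(store_gb: float, heap_gb: float = 4.0, os_reserve_gb: float = 2.0) -> float:
    """Total RAM = Page Cache + Heap + OS Reserve, with page cache sized to the store."""
    return store_gb + heap_gb + os_reserve_gb


def page_cache_budget_gb(ram_gb: float, heap_gb: float = 4.0, os_reserve_gb: float = 2.0) -> float:
    """What is left for the page cache on a machine with a fixed amount of RAM."""
    budget = ram_gb - heap_gb - os_reserve_gb
    if budget <= 0:
        raise ValueError("heap and OS reserve already exceed available RAM")
    return budget


if __name__ == "__main__":
    print(total_ram_gb(50, heap_gb=8, os_reserve_gb=2))          # 60
    print(page_cache_budget_gb(64, heap_gb=8, os_reserve_gb=2))  # 54
```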

Summary: Neo4j Example

We've thoroughly explored Neo4j—from architecture through practical implementation. Let's consolidate the key insights:

Key Takeaways

• Native graph storage delivers performance: Neo4j's fixed-size record stores and index-free adjacency make each relationship hop a constant-time pointer lookup, so traversal cost depends on the part of the graph visited rather than on total database size.
• Cypher is intuitive and powerful: ASCII-art pattern syntax makes graph queries readable, while advanced features (variable-length paths, aggregation, APOC) handle complex analytics.
• MERGE is for upserts, not pattern matching: always MATCH existing nodes before MERGEing relationships to avoid accidental node creation.
• Indexes are essential for performance: create range indexes on properties used in WHERE clauses, full-text indexes for search functionality, and point indexes for spatial queries.
• Transaction functions provide resilience: use execute_read/execute_write (executeRead/executeWrite in the Java and JavaScript drivers) for automatic retry on transient failures in driver code.
• Deployment options span the spectrum: from local Docker for development to AuraDB Enterprise for mission-critical production workloads.

What's next:

With Neo4j fundamentals mastered, we'll explore Graph Queries in depth—advanced Cypher patterns, graph algorithms, path analysis, and the analytical capabilities that make graph databases indispensable for connected data problems.

Page Complete

You now have practical, production-ready knowledge of Neo4j—its architecture, Cypher query language, CRUD operations, indexing strategies, driver integration, and deployment options. Next, we'll dive into advanced graph query techniques.

│  │  (B+ Tree)   │  │   (Lucene)   │  │  (R-Tree)    │               │
│  └──────────────┘  └──────────────┘  └──────────────┘               │
│                                                                      │
│  Page Cache (off-heap, configurable size)                           │
└─────────────────────────────────────────────────────────────────────┘
 
CLUSTER ARCHITECTURE (Enterprise):
┌───────────────────────────────────────────────────────────────────┐
│                        RAFT CLUSTER                                │
├───────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐           │
│  │   LEADER    │◄──►│  FOLLOWER   │◄──►│  FOLLOWER   │           │
│  │  (writes)   │    │  (replicas) │    │  (replicas) │           │
│  └─────────────┘    └─────────────┘    └─────────────┘           │
│         ▲                                                          │
│         │ Automatic Leader Election                                │
│         │ Synchronous Replication (configurable)                   │
│                                                                    │
│  READ REPLICAS (async):                                            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Read     │ │ Read     │ │ Read     │ │ Read     │             │
│  │ Replica  │ │ Replica  │ │ Replica  │ │ Replica  │             │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘             │
└───────────────────────────────────────────────────────────────────┘
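
The fixed record sizes shown in the storage engine box (15 bytes per node, 34 bytes per relationship) are what make lookups by ID cheap: the position of a record can be computed instead of searched for. The sketch below only illustrates that arithmetic. The sizes come from the diagram, the function name is hypothetical, and real store files vary by Neo4j version and record format.

```python
# Record sizes as quoted in the architecture diagram; treat them as illustrative.
NODE_RECORD_BYTES = 15
REL_RECORD_BYTES = 34


def record_offset(record_id: int, record_bytes: int) -> int:
    """Byte offset of a fixed-size record: one multiplication, no index lookup."""
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return record_id * record_bytes


if __name__ == "__main__":
    # Locating node 1000 or relationship 1000 costs the same as locating record 0.
    print(record_offset(1_000, NODE_RECORD_BYTES))  # 15000
    print(record_offset(1_000, REL_RECORD_BYTES))   # 34000
```

Because the offset does not depend on how many records exist, following a pointer from a node to its first relationship record stays constant-time as the store grows.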

Page Cache is Critical

Cypher Query Language Fundamentals

Cypher is Neo4j's declarative query language, designed to be expressive and intuitive for working with graph patterns. Its ASCII-art syntax makes pattern matching readable:

(node)-[:RELATIONSHIP]->(anotherNode)

Cypher has been open-sourced as openCypher and adopted by other graph databases (Amazon Neptune, SAP HANA Graph, RedisGraph), which is why it is often described as the SQL of the graph world.

Core Clauses:

Essential Cypher Clauses
Clause	Purpose	SQL Equivalent
MATCH	Pattern matching to find data	SELECT ... FROM ... WHERE
WHERE	Filter matched patterns	WHERE clause
RETURN	Specify output data	SELECT columns
CREATE	Create nodes and relationships	INSERT
MERGE	Match or create (upsert)	INSERT ... ON CONFLICT
SET	Update properties	UPDATE
DELETE	Remove nodes/relationships	DELETE
WITH	Chain query parts (subquery)	CTE / subquery
ORDER BY	Sort results	ORDER BY
LIMIT/SKIP	Pagination	LIMIT/OFFSET
UNWIND	Expand lists to rows	LATERAL / UNNEST

cypher-basics.cypher

Cypher

// PATTERN SYNTAX EXPLAINED
// =========================
 
// Node pattern: (variable:Label {properties})
(p:Person {name: "Alice"})    // Node labeled Person with name Alice
(p:Person:Employee)           // Node with multiple labels
(p)                           // Any node
(:Person)                     // Any Person (no variable binding)
 
// Relationship pattern: -[variable:TYPE {properties}]->
-[:KNOWS]->                   // Outgoing KNOWS relationship
<-[:MANAGES]-                 // Incoming MANAGES relationship
-[:FRIEND]-                   // Either direction
-[r:WORKS_AT {since: 2020}]-> // Relationship bound to 'r' with property
 
// Combined patterns:
(alice:Person)-[:KNOWS]->(bob:Person)
(a)-[:MANAGES]->(b)<-[:REPORTS_TO]-(c)
 
// Variable-length paths:
(a)-[:KNOWS*1..3]->(b)        // 1 to 3 hops
(a)-[:KNOWS*]->(b)            // One or more hops (no upper bound)
(a)-[:KNOWS*..5]->(b)         // Up to 5 hops
 
// BASIC QUERIES
// =============
 
// 1. Find all Person nodes
MATCH (p:Person)
RETURN p.name, p.email
 
// 2. Find specific person
MATCH (p:Person {name: "Alice"})
RETURN p
 
// 3. Find Alice's friends
MATCH (alice:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
RETURN friend.name
 
// 4. Find friends of friends (excluding direct friends)
MATCH (me:Person {name: "Alice"})-[:KNOWS*2..2]->(foaf:Person)
WHERE NOT (me)-[:KNOWS]->(foaf)
  AND me <> foaf
RETURN DISTINCT foaf.name
 
// 5. Find shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name: "Alice"})-[:KNOWS*]-(bob:Person {name: "Bob"})
)
RETURN path, length(path) AS hops
 
// 6. Pattern with relationship properties
MATCH (e:Employee)-[w:WORKS_FOR {since: 2020}]->(c:Company)
WHERE w.role = "Engineer"
RETURN e.name, c.name, w.role
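
To make the semantics of query 4 (friends of friends) concrete, here is a minimal pure-Python sketch of the same computation on an adjacency dictionary. The `KNOWS` data and the function name are hypothetical and stand in for the graph; this mirrors what the query returns, not how Neo4j executes it.

```python
# Hypothetical directed KNOWS edges: person -> set of people they know.
KNOWS = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Dave", "Carol"},
    "Carol": {"Erin", "Alice"},
    "Dave": set(),
    "Erin": set(),
}


def friends_of_friends(graph: dict, me: str) -> set:
    """People exactly two outgoing KNOWS hops away, minus direct friends and `me`."""
    direct = graph.get(me, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    # Same filters as the Cypher WHERE clause:
    #   NOT (me)-[:KNOWS]->(foaf)  and  me <> foaf
    return two_hops - direct - {me}


if __name__ == "__main__":
    print(sorted(friends_of_friends(KNOWS, "Alice")))  # ['Dave', 'Erin']
```

Carol is reachable in two hops through Bob but is dropped because she is also a direct friend, which is exactly what the `WHERE NOT (me)-[:KNOWS]->(foaf)` filter does.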

Cypher vs GQL Standard

CRUD Operations

Let's examine the full lifecycle of data manipulation in Neo4j, from creation through deletion, with attention to best practices and common pitfalls.

crud-operations.cypher

Cypher

// ========================================
// CREATE - Adding Data
// ========================================
 
// Create a single node
CREATE (alice:Person:Employee {
    personId: "PER-001",
    name: "Alice Chen",
    email: "alice@techcorp.com",
    age: 32,
    skills: ["Python", "Go", "Machine Learning"],
    joinedAt: datetime("2019-03-15T09:00:00")
})
RETURN alice
 
// Create multiple nodes in one query
CREATE (bob:Person:Employee {personId: "PER-002", name: "Bob Singh"})
CREATE (carol:Person:Manager {personId: "PER-003", name: "Carol Davis"})
CREATE (techcorp:Company {companyId: "COM-001", name: "TechCorp Inc."})
 
// Create nodes and relationships together
CREATE (alice:Person {name: "Alice"})-[:KNOWS {since: date()}]->(bob:Person {name: "Bob"})
 
// Create relationships between existing nodes
MATCH (a:Person {name: "Alice"}), (c:Company {name: "TechCorp Inc."})
CREATE (a)-[:WORKS_FOR {since: date("2019-03-15"), role: "Senior Engineer"}]->(c)
 
// ========================================
// MERGE - Create If Not Exists (Upsert)
// ========================================
 
// MERGE checks for existence before creating
// Critical for idempotent data loading
 
// Merge on unique property (recommended)
MERGE (p:Person {personId: "PER-001"})
ON CREATE SET 
    p.name = "Alice Chen",
    p.createdAt = datetime()
ON MATCH SET
    p.lastSeen = datetime()
RETURN p
 
// Merge with full pattern: if the complete pattern has no match, MERGE creates
// ALL of it, including new nodes, even when matching nodes already exist
MERGE (a:Person {personId: "PER-001"})-[:WORKS_FOR]->(c:Company {companyId: "COM-001"})
 
// CAUTION: Merge relationship between existing nodes
// DO THIS (correct - match nodes first):
MATCH (a:Person {personId: "PER-001"})
MATCH (c:Company {companyId: "COM-001"})
MERGE (a)-[:WORKS_FOR]->(c)
 
// NOT THIS (may create duplicate nodes):
MERGE (a:Person {name: "Alice"})-[:WORKS_FOR]->(c:Company {name: "TechCorp"})
 
// ========================================
// SET - Updating Data
// ========================================
 
// Update single property
MATCH (p:Person {personId: "PER-001"})
SET p.email = "alice.chen@techcorp.com"
RETURN p
 
// Update multiple properties
MATCH (p:Person {personId: "PER-001"})
SET p.age = 33, p.lastModified = datetime()
RETURN p
 
// Replace all properties (destructive!)
MATCH (p:Person {personId: "PER-001"})
SET p = {personId: "PER-001", name: "Alice Chen", age: 33}
RETURN p
 
// Add to existing properties (non-destructive, recommended)
MATCH (p:Person {personId: "PER-001"})
SET p += {phone: "+1-555-0123", verified: true}
RETURN p
 
// Add/remove labels
MATCH (p:Person {personId: "PER-001"})
SET p:Manager, p:TechLead
REMOVE p:Employee
RETURN labels(p)
 
// ========================================
// REMOVE - Removing Properties/Labels
// ========================================
 
// Remove a property
MATCH (p:Person {personId: "PER-001"})
REMOVE p.temporaryField
RETURN p
 
// Remove a label
MATCH (p:Person:Manager {personId: "PER-001"})
REMOVE p:Manager
RETURN labels(p)
 
// ========================================
// DELETE - Removing Nodes/Relationships
// ========================================
 
// Delete a relationship
MATCH (a:Person {personId: "PER-001"})-[r:WORKS_FOR]->(c:Company)
DELETE r
 
// Delete a node (must have no relationships)
MATCH (p:Person {personId: "TEMP-001"})
DELETE p
 
// Delete node and ALL connected relationships (DETACH DELETE)
MATCH (p:Person {personId: "TEMP-001"})
DETACH DELETE p
 
// Delete multiple nodes by condition
MATCH (p:Person)
WHERE p.isInactive = true AND p.lastSeen < datetime() - duration('P1Y')  // lastSeen is a datetime, so compare against datetime()
DETACH DELETE p
 
// ========================================
// RETURNING RESULTS
// ========================================
 
// Return specific properties
MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
RETURN p.name AS employee, c.name AS company, r.role AS position
 
// Return with computed fields
MATCH (p:Person)
RETURN p.name, 
       p.age, 
       CASE WHEN p.age >= 65 THEN "Senior" ELSE "Active" END AS status
 
// Aggregate results
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name AS company, count(p) AS employeeCount
ORDER BY employeeCount DESC
 
// Return graph structure
MATCH path = (a:Person)-[:KNOWS*1..3]-(b:Person)
RETURN path
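
The difference between `SET p = {...}` and `SET p += {...}` is easy to miss, so here is a small Python analogue that uses plain dictionaries as stand-ins for node property maps. The function names are hypothetical; only the replace-versus-merge behaviour is the point.

```python
def set_replace(node_props: dict, new_props: dict) -> dict:
    """Analogue of `SET p = {...}`: properties missing from new_props are dropped."""
    return {k: v for k, v in new_props.items() if v is not None}


def set_merge(node_props: dict, new_props: dict) -> dict:
    """Analogue of `SET p += {...}`: keep existing properties, add or overwrite given ones."""
    merged = {**node_props, **new_props}
    # In Cypher, assigning null to a property removes it; None plays that role here.
    return {k: v for k, v in merged.items() if v is not None}


if __name__ == "__main__":
    alice = {"personId": "PER-001", "name": "Alice Chen",
             "email": "alice@techcorp.com", "age": 32}

    replaced = set_replace(alice, {"personId": "PER-001", "name": "Alice Chen", "age": 33})
    print("email" in replaced)   # False: the email property is gone

    merged = set_merge(alice, {"phone": "+1-555-0123", "verified": True})
    print("email" in merged)     # True: existing properties survive
```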

MERGE Pitfalls
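
The "DO THIS / NOT THIS" pair in the MERGE section can be reproduced with a toy in-memory graph. The sketch below is an illustration under stated assumptions: `ToyGraph` and its methods are hypothetical and do not reflect how Neo4j implements MERGE, but the all-or-nothing matching rule they model is the documented behaviour.

```python
class ToyGraph:
    """Tiny in-memory graph used only to illustrate MERGE semantics."""

    def __init__(self):
        self.nodes = []   # each node is a (label, properties) tuple
        self.rels = []    # each relationship is (start_id, type, end_id)

    def create_node(self, label, props):
        self.nodes.append((label, dict(props)))
        return len(self.nodes) - 1

    def _matches(self, node_id, label, props):
        node_label, node_props = self.nodes[node_id]
        return node_label == label and all(
            node_props.get(key) == value for key, value in props.items()
        )

    def match_node(self, label, props):
        """Return the id of the first matching node, or None."""
        for node_id in range(len(self.nodes)):
            if self._matches(node_id, label, props):
                return node_id
        return None

    def merge_rel(self, start_id, rel_type, end_id):
        """MERGE (a)-[:TYPE]->(b) where a and b were already bound by MATCH."""
        rel = (start_id, rel_type, end_id)
        if rel not in self.rels:
            self.rels.append(rel)

    def merge_pattern(self, a_label, a_props, rel_type, b_label, b_props):
        """MERGE on a whole unbound pattern: match all of it or create all of it."""
        for start_id, existing_type, end_id in self.rels:
            if (existing_type == rel_type
                    and self._matches(start_id, a_label, a_props)
                    and self._matches(end_id, b_label, b_props)):
                return
        start_id = self.create_node(a_label, a_props)
        end_id = self.create_node(b_label, b_props)
        self.rels.append((start_id, rel_type, end_id))


def seed():
    """Both nodes exist, but there is no WORKS_FOR relationship yet."""
    graph = ToyGraph()
    graph.create_node("Person", {"name": "Alice"})
    graph.create_node("Company", {"name": "TechCorp"})
    return graph


# Pitfall: merging the full pattern duplicates both nodes.
pitfall = seed()
pitfall.merge_pattern("Person", {"name": "Alice"}, "WORKS_FOR",
                      "Company", {"name": "TechCorp"})

# Recommended: bind the nodes first, then merge only the relationship.
safe = seed()
alice = safe.match_node("Person", {"name": "Alice"})
techcorp = safe.match_node("Company", {"name": "TechCorp"})
safe.merge_rel(alice, "WORKS_FOR", techcorp)
safe.merge_rel(alice, "WORKS_FOR", techcorp)  # re-running changes nothing

print(len(pitfall.nodes), len(safe.nodes))  # 4 2
```

A uniqueness constraint on the merged property turns the pitfall into an error instead of silent duplication, which is one more reason to merge on constrained keys.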

Advanced Cypher Patterns

Beyond basic CRUD, Cypher provides sophisticated capabilities for complex graph analytics, path operations, and data transformation.

advanced-cypher.cypher

Cypher

// ========================================
// AGGREGATION AND GROUPING
// ========================================
 
// Basic aggregations
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name AS company,
       count(p) AS employees,
       avg(p.salary) AS avgSalary,
       min(p.joinedAt) AS firstHire,
       max(p.salary) AS highestSalary,
       collect(p.name) AS employeeNames
 
// Count distinct values
MATCH (p:Person)-[:PURCHASED]->(product:Product)
RETURN p.name, count(DISTINCT product) AS uniqueProducts
 
// Conditional aggregation
MATCH (p:Person)
RETURN 
    count(CASE WHEN p.age < 30 THEN 1 END) AS under30,
    count(CASE WHEN p.age >= 30 AND p.age < 50 THEN 1 END) AS between30and50,
    count(CASE WHEN p.age >= 50 THEN 1 END) AS over50
 
// ========================================
// WITH CLAUSE - Query Chaining
// ========================================
 
// Pipeline query stages
MATCH (p:Person)-[:WORKS_FOR]->(c:Company {name: "TechCorp"})
WITH p, c
WHERE p.age > 30
WITH p ORDER BY p.salary DESC LIMIT 10
MATCH (p)-[:KNOWS]->(colleague:Person)
RETURN p.name, collect(colleague.name) AS topColleagues
 
// Subquery-like behavior
MATCH (dept:Department)
CALL {
    WITH dept
    MATCH (dept)<-[:BELONGS_TO]-(e:Employee)
    RETURN count(e) AS empCount
}
RETURN dept.name, empCount
ORDER BY empCount DESC
 
// ========================================
// PATH OPERATIONS
// ========================================
 
// All shortest paths between nodes
MATCH paths = allShortestPaths(
    (alice:Person {name: "Alice"})-[:KNOWS*]-(bob:Person {name: "Bob"})
)
RETURN paths, length(paths) AS hops
 
// Variable-length path with filters
MATCH path = (start:Person {name: "Alice"})-[:KNOWS*1..6]-(end:Person)
WHERE ALL(node IN nodes(path) WHERE node.age > 25)
  AND ALL(rel IN relationships(path) WHERE rel.since > date("2020-01-01"))
RETURN end.name, length(path) AS distance
 
// Path operations
MATCH path = (a:Person)-[:KNOWS*2..4]->(b:Person)
RETURN 
    nodes(path) AS pathNodes,
    relationships(path) AS pathRels,
    length(path) AS hops,
    [n IN nodes(path) | n.name] AS names
 
// ========================================
// LIST OPERATIONS
// ========================================
 
// List comprehension
MATCH (p:Person)
RETURN p.name, [skill IN p.skills WHERE skill CONTAINS "Python" | skill] AS pythonSkills
 
// UNWIND - expand lists to rows
WITH ["Alice", "Bob", "Carol"] AS names
UNWIND names AS name
MERGE (p:Person {name: name})
RETURN p
 
// Pattern from list
MATCH (p:Product)
WHERE p.productId IN ["PROD-001", "PROD-002", "PROD-003"]
RETURN p.name
 
// Collect and process
MATCH (c:Customer)-[:PURCHASED]->(p:Product)
WITH c, collect(p) AS products
WHERE size(products) >= 3
RETURN c.name, [prod IN products | prod.name] AS purchasedProducts
 
// ========================================
// GRAPH DATA SCIENCE PROCEDURES
// ========================================
 
// PageRank (requires GDS library)
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
 
// Community detection (Louvain)
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS members
ORDER BY size(members) DESC
 
// Shortest path (Dijkstra)
MATCH (start:City {name: "New York"}), (end:City {name: "Los Angeles"})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
    sourceNode: start,
    targetNode: end,
    relationshipWeightProperty: 'distance'
})
YIELD totalCost, nodeIds, costs
RETURN totalCost, [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS path
 
// ========================================
// APOC UTILITIES (common procedures)
// ========================================
 
// Batch processing
CALL apoc.periodic.iterate(
    "MATCH (p:Person) WHERE p.needsUpdate = true RETURN p",
    "SET p.processed = true, p.processedAt = datetime()",
    {batchSize: 1000, parallel: true}
)
 
// Dynamic relationship creation
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CALL apoc.create.relationship(a, "CUSTOM_REL", {weight: 0.5}, b)
YIELD rel
RETURN rel
 
// Load JSON data
CALL apoc.load.json("https://api.example.com/users.json")
YIELD value
MERGE (p:Person {userId: value.id})
SET p.name = value.name, p.email = value.email
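
For readers who want to see what a PageRank score means before calling `gds.pageRank.stream`, here is a minimal power-iteration sketch in pure Python. It is a textbook formulation under stated assumptions (hypothetical `links` data, damping factor 0.85, scores normalised to sum to 1), not the GDS implementation, so the numbers will not match GDS output.

```python
def pagerank(graph: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Textbook PageRank by power iteration.

    `graph` maps each node to the list of nodes it links to; every target
    must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link structure: most people point at Carol.
    links = {
        "Alice": ["Carol"],
        "Bob": ["Carol"],
        "Carol": ["Alice"],
        "Dave": ["Carol"],
    }
    scores = pagerank(links)
    for name, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{name}: {score:.3f}")
```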

Performance Tips for Complex Queries
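
One tip that follows directly from the `apoc.periodic.iterate` example (with its `batchSize: 1000`) is to keep individual transactions small. The same idea can be applied on the client side: split the input into fixed-size chunks and send each chunk as one parameterised `UNWIND` query. The helper below is a hypothetical sketch of the chunking step only; it does not talk to a database.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    rows = [{"personId": f"PER-{i:05d}"} for i in range(2_500)]
    for chunk in batches(rows, 1_000):
        # Each chunk would become the $rows parameter of a single transaction, e.g.
        #   UNWIND $rows AS row MERGE (p:Person {personId: row.personId})
        print(len(chunk))
```

Smaller batches reduce memory pressure and lock hold times per transaction; the right size depends on the workload, so 1000 is a starting point rather than a rule.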

Indexes and Constraints

Proper indexing is critical for Neo4j performance. Without indexes, MATCH clauses with property filters require full label scans—acceptable for development but prohibitive at scale.
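
The gap between a label scan and an index seek can be pictured with ordinary Python data structures. In this sketch a list of dictionaries plays the role of all `Person` nodes, and a sorted key list with binary search stands in for a range index. The class and function names are hypothetical, and a real range index is a B+ tree on disk, not a Python list.

```python
import bisect


def label_scan(rows, email):
    """Without an index: inspect rows one by one until a match is found."""
    checked = 0
    for row in rows:
        checked += 1
        if row["email"] == email:
            return row, checked
    return None, checked


class RangeIndex:
    """Sorted keys plus binary search, a stand-in for a range index."""

    def __init__(self, rows, key):
        self._rows = rows
        self._entries = sorted((row[key], position) for position, row in enumerate(rows))
        self._keys = [entry_key for entry_key, _ in self._entries]

    def seek(self, value):
        """Exact match, like `WHERE p.email = $email`."""
        pos = bisect.bisect_left(self._keys, value)
        if pos < len(self._keys) and self._keys[pos] == value:
            return self._rows[self._entries[pos][1]]
        return None

    def starts_with(self, prefix):
        """Prefix match, like `WHERE p.email STARTS WITH $prefix`."""
        pos = bisect.bisect_left(self._keys, prefix)
        found = []
        for key, position in self._entries[pos:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            found.append(self._rows[position])
        return found


if __name__ == "__main__":
    people = [{"email": f"user{i:04d}@example.com", "name": f"User {i}"}
              for i in range(10_000)]
    people.append({"email": "alice@example.com", "name": "Alice"})

    row, checked = label_scan(people, "alice@example.com")
    print(row["name"], checked)                      # Alice 10001

    index = RangeIndex(people, "email")
    print(index.seek("alice@example.com")["name"])   # Alice
    print(len(index.starts_with("user000")))         # 10
```

The scan touched every row before finding Alice, while the seek needs roughly log2(10001), about 14, comparisons. The same contrast shows up in `PROFILE` output as `NodeByLabelScan` versus `NodeIndexSeek`.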

indexes-constraints.cypher

Cypher

// ========================================
// INDEX CREATION (Neo4j 5.x syntax)
// ========================================
 
// Range index (default, most common)
CREATE INDEX person_email FOR (p:Person) ON (p.email)
 
// Named index, created only if absent (safe to re-run)
// Range indexes take no indexConfig settings; spatial.* options apply to POINT indexes
CREATE INDEX person_id_idx IF NOT EXISTS 
FOR (p:Person) ON (p.personId)
 
// Composite index (multiple properties)
CREATE INDEX product_category_status 
FOR (p:Product) ON (p.category, p.status)
 
// Text index (for text search operations)
CREATE TEXT INDEX person_bio FOR (p:Person) ON (p.bio)
 
// Full-text index (Lucene-backed)
CREATE FULLTEXT INDEX product_search 
FOR (p:Product) ON EACH [p.name, p.description, p.tags]
OPTIONS {
    indexConfig: {
        `fulltext.analyzer`: "english"
    }
}
 
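// Point index (spatial queries); the spatial.* settings bound the indexed area
CREATE POINT INDEX location_coords 
FOR (l:Location) ON (l.coordinates)
OPTIONS {
    indexConfig: {
        `spatial.wgs-84.min`: [-180.0, -90.0],
        `spatial.wgs-84.max`: [180.0, 90.0]
    }
}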
 
// Relationship property index (available since Neo4j 4.3)
CREATE INDEX rel_since FOR ()-[w:WORKS_FOR]-() ON (w.since)
 
// ========================================
// CONSTRAINT CREATION
// ========================================
 
// Unique constraint (creates index automatically)
CREATE CONSTRAINT person_email_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE
 
// Node key constraint (composite uniqueness + existence; Enterprise Edition)
CREATE CONSTRAINT order_key
FOR (o:Order) REQUIRE (o.orderId, o.region) IS NODE KEY
 
// Existence constraint (Enterprise Edition)
CREATE CONSTRAINT person_name_required
FOR (p:Person) REQUIRE p.name IS NOT NULL
 
// Type constraint (Neo4j 5.9+, Enterprise Edition)
CREATE CONSTRAINT person_age_integer
FOR (p:Person) REQUIRE p.age IS :: INTEGER
 
// Relationship property existence (Enterprise Edition)
CREATE CONSTRAINT employment_since_required
FOR ()-[w:WORKS_FOR]-() REQUIRE w.since IS NOT NULL
 
// ========================================
// INDEX MANAGEMENT
// ========================================
 
// List all indexes
SHOW INDEXES
 
// List with details (YIELD must come before WHERE)
SHOW INDEXES
YIELD name, type, entityType, labelsOrTypes, properties, state
WHERE type = 'RANGE'
 
// Drop index
DROP INDEX person_email
 
// Drop constraint (removes associated index)
DROP CONSTRAINT person_email_unique
 
// Check index usage in query
PROFILE MATCH (p:Person {email: "alice@example.com"}) RETURN p
 
// Example output shows "NodeIndexSeek" when index is used
// vs "NodeByLabelScan" + "Filter" when not indexed
 
// ========================================
// PERFORMANCE DIAGNOSTICS
// ========================================
 
// Explain query plan (without executing)
EXPLAIN MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email STARTS WITH "alice"
RETURN p.name, c.name
 
// Profile query (executes and shows db hits)
PROFILE MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email STARTS WITH "alice"
RETURN p.name, c.name
 
// Monitor index population status
SHOW INDEXES WHERE state <> 'ONLINE'

Index Type Selection Guide
Query Pattern	Index Type	When to Use
Exact match, range queries	Range (default)	Most lookup patterns
Multi-property lookup	Composite	WHERE a.x = ? AND a.y = ?
Text CONTAINS/ENDS WITH	Text	Substring matching
Natural language search	Full-text	Search box functionality
Distance/bounding box	Point	Geo-spatial queries
Label existence check	Token lookup	Fast `MATCH (n:Label)` scans

Index Overhead Considerations

Driver Integration

Neo4j provides official drivers for major programming languages, all using the Bolt binary protocol for efficient, encrypted communication.

neo4j_python.py
Python
# Neo4j Python Driver
# pip install neo4j
 
from neo4j import GraphDatabase
from contextlib import contextmanager
 
class Neo4jConnection:
    def __init__(self, uri: str, user: str, password: str):
        self._driver = GraphDatabase.driver(uri, auth=(user, password))
    
    def close(self):
        self._driver.close()
    
    @contextmanager
    def session(self, database: str = "neo4j"):
        session = self._driver.session(database=database)
        try:
            yield session
        finally:
            session.close()
 
    # Transaction functions are recommended for production
    def create_person(self, person_id: str, name: str, email: str):
        def _create(tx, person_id, name, email):
            result = tx.run("""
                MERGE (p:Person {personId: $personId})
                ON CREATE SET p.name = $name, p.email = $email, 
                              p.createdAt = datetime()
                RETURN p
            """, personId=person_id, name=name, email=email)
            return result.single()
        
        with self.session() as session:
            return session.execute_write(_create, person_id, name, email)
    
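    def find_connections(self, person_id: str, max_hops: int = 3):
        # Neo4j 5 rejects parameters inside variable-length bounds
        # (-[:KNOWS*1..$maxHops]-), so validate the value as an integer
        # and embed it in the query text; personId stays a parameter.
        max_hops = int(max_hops)
        if max_hops < 1:
            raise ValueError("max_hops must be at least 1")
        query = f"""
            MATCH path = (start:Person {{personId: $personId}})
                         -[:KNOWS*1..{max_hops}]-
                         (connected:Person)
            WHERE start <> connected
            RETURN connected.name AS name,
                   min(length(path)) AS distance
            ORDER BY distance
        """

        def _find(tx, person_id):
            # min() above keeps one row per name: the shortest distance found
            result = tx.run(query, personId=person_id)
            return [dict(record) for record in result]
        
        with self.session() as session:
            return session.execute_read(_find, person_id)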
 
# Usage
if __name__ == "__main__":
    conn = Neo4jConnection(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="password"
    )
    
    try:
        # Create nodes
        conn.create_person("P001", "Alice Chen", "alice@example.com")
        conn.create_person("P002", "Bob Singh", "bob@example.com")
        
        # Query connections
        connections = conn.find_connections("P001", max_hops=2)
        for c in connections:
            print(f"{c['name']} - {c['distance']} hops away")
    finally:
        conn.close()

Driver Best Practices
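
Transaction functions (`execute_read` / `execute_write`) are recommended because the driver can re-run them when a transient failure occurs. The sketch below shows the general retry-with-backoff idea in plain Python. It is not the driver's code: `TransientError` here is a local stand-in class, and `run_with_retry` and its parameters are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Local stand-in for a retryable failure such as a deadlock or leader switch."""


def run_with_retry(work, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call `work()` until it succeeds or `max_attempts` is exhausted.

    Waits base_delay * 2**(attempt - 1) plus random jitter between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_write():
        # Fails twice, then succeeds, to simulate a short-lived cluster hiccup.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TransientError("leader changed")
        return "committed"

    print(run_with_retry(flaky_write), attempts["count"])  # committed 3
```

Because the function may run more than once, the work inside it should be idempotent. This is one reason the `create_person` example uses `MERGE` rather than `CREATE`.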

Deployment Options

Neo4j offers deployment options ranging from local development to managed cloud services for enterprise production workloads.

Neo4j Deployment Comparison
Option	Best For	Features	Considerations
Neo4j Desktop	Development, learning, prototyping	Free, local graphs, multiple databases, plugin management	Single machine, no production support
Community Edition	Small projects, single-server production	Open source, full Cypher, APOC support	Single instance, no clustering, 34B node limit
Enterprise Edition	Production workloads, high availability	Clustering, causal consistency, advanced security, online backup	Commercial license required
AuraDB Free	Learning, small prototypes	Managed, 200K nodes, automatic suspend	Limited resources, pauses after 3 days inactive
AuraDB Professional	Production SaaS applications	Managed, auto-scaling, backups, SLA	Pay per use, limited configuration
AuraDB Enterprise	Enterprise/mission-critical	Dedicated infrastructure, VPC, SSO, compliance	Premium pricing, custom contracts

docker-deployment.yml
YAML
# Docker Compose for Neo4j Development
version: '3.8'
 
services:
  neo4j:
    image: neo4j:5.15-community
    container_name: neo4j-dev
    ports:
      - "7474:7474"   # HTTP (Browser)
      - "7687:7687"   # Bolt (Driver)
    environment:
      # Authentication
      - NEO4J_AUTH=neo4j/development_password
      
      # Memory configuration (Neo4j 5.x setting names: server.memory.*)
      - NEO4J_server_memory_pagecache_size=1G
      - NEO4J_server_memory_heap_initial__size=512M
      - NEO4J_server_memory_heap_max__size=1G
      
      # Enable APOC
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_dbms_security_procedures_unrestricted=apoc.*
      
      # Allow LOAD CSV to read file:/// URLs (local files in the import directory)
      - NEO4J_dbms_security_allow__csv__import__from__file__urls=true
    
    volumes:
      - neo4j_data:/data
      - neo4j_logs:/logs
      - neo4j_import:/var/lib/neo4j/import
    
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:7474"]
      interval: 30s
      timeout: 10s
      retries: 5
 
volumes:
  neo4j_data:
  neo4j_logs:
  neo4j_import:
 
# Run: docker-compose up -d
# Access: http://localhost:7474 (Neo4j Browser)

Memory Sizing Guidelines
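
As a rough starting point for choosing a page cache size, the record sizes quoted in the architecture diagram (15 bytes per node, 34 bytes per relationship) give a lower bound on how much store data there is to cache. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the function names and the 20% headroom factor are hypothetical, and property, index, and string data must be added on top.

```python
NODE_RECORD_BYTES = 15   # from the architecture diagram
REL_RECORD_BYTES = 34    # from the architecture diagram
GIB = 1024 ** 3


def estimate_store_bytes(nodes: int, relationships: int, property_bytes: int = 0) -> int:
    """Lower-bound store size: fixed-size records plus a caller-supplied property estimate."""
    return nodes * NODE_RECORD_BYTES + relationships * REL_RECORD_BYTES + property_bytes


def fits_in_page_cache(store_bytes: int, page_cache_bytes: int, headroom: float = 1.2) -> bool:
    """True if the store, plus an assumed growth headroom, fits in the page cache."""
    return store_bytes * headroom <= page_cache_bytes


if __name__ == "__main__":
    store = estimate_store_bytes(nodes=10_000_000, relationships=50_000_000)
    print(store)                               # 1850000000
    print(fits_in_page_cache(store, 1 * GIB))  # False
```

With the 1G page cache from the Docker Compose file, a graph of 10 million nodes and 50 million relationships would not fit fully in memory even before properties are counted, so some traversals would hit disk.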

Summary: Neo4j Example

We've thoroughly explored Neo4j—from architecture through practical implementation. Let's consolidate the key insights:

Key Takeaways

•Native graph storage delivers performance — Neo4j's fixed-size record stores and index-free adjacency enable O(1) relationship traversal regardless of total database size.
•Cypher is intuitive and powerful — ASCII-art pattern syntax makes graph queries readable, while advanced features (variable-length paths, aggregation, APOC) handle complex analytics.
•MERGE is for upserts, not pattern matching — Always MATCH existing nodes before MERGEing relationships to avoid accidental node creation.
•Indexes are essential for performance — Create range indexes on properties used in WHERE clauses; full-text indexes for search functionality; point indexes for spatial queries.
•Transaction functions provide resilience: Use execute_read/execute_write in the Python driver (executeRead/executeWrite in the Java and JavaScript drivers) for automatic retry on transient failures.
•Deployment options span the spectrum — From local Docker for development to AuraDB Enterprise for mission-critical production workloads.
