Problem-solving ability provides the framework for thinking through system design challenges. But frameworks need content—you can't design what you don't understand. System design knowledge is the raw material from which designs are built.
When interviewers evaluate your system design knowledge, they're not checking whether you've memorized CAP theorem or can recite the properties of consistent hashing. They're assessing whether you possess working knowledge—the kind of knowledge that activates when you encounter a real design decision, that connects to other concepts, and that you can apply in novel contexts.
This distinction matters profoundly. Candidates with surface-level knowledge can define terms but falter when asked to apply concepts. Candidates with working knowledge seamlessly integrate their understanding into design decisions, explaining not just what a concept is but why it matters for the problem at hand.
By the end of this page, you will understand the major knowledge domains that interviewers expect, how depth of understanding is evaluated, the difference between memorization and working knowledge, and how to demonstrate mastery authentically. You'll know precisely what technical breadth and depth to develop for system design interviews.
System design knowledge spans an enormous territory. No one masters everything—but interviewers expect fluency across a set of foundational domains. These domains form the building blocks from which any system can be constructed.
The core knowledge domains are:
Distributed Systems Fundamentals — The theoretical underpinnings: CAP theorem, consistency models, consensus protocols, network partitions, distributed clocks.
Storage Systems — Relational databases, NoSQL datastores, caching layers, object storage, database internals (indexing, transactions, replication, sharding).
Compute and Processing — Stateless services, workers, batch processing, stream processing, serverless architectures, container orchestration.
Networking and Communication — Load balancing, API gateways, DNS, CDNs, RPC vs messaging, synchronous vs asynchronous communication.
Reliability Engineering — Redundancy, failover, health checking, circuit breakers, rate limiting, graceful degradation, chaos engineering principles.
Scalability Patterns — Horizontal vs vertical scaling, caching strategies, partitioning, replication, database scaling patterns, eventual consistency implications.
Security Fundamentals — Authentication, authorization, encryption at rest and in transit, secrets management, threat modeling basics.
While you need breadth across these domains, you don't need expert-level depth in all of them. Interviewers expect you to have solid working knowledge across the board, with deeper expertise in a few areas. A Staff Engineer might have exceptional depth in distributed storage but only working knowledge of ML systems—and that's appropriate.
| Domain | Core Concepts | Why It Matters in Interviews |
|---|---|---|
| Distributed Systems | CAP theorem, Consistency models, Consensus (Paxos/Raft), Clock synchronization | Every non-trivial system is distributed; understanding trade-offs here is foundational |
| Storage Systems | ACID, Indexing, Replication, Sharding, LSM trees vs B-trees | Data is central to most systems; storage decisions cascade through the design |
| Compute/Processing | Stateless design, Worker queues, Stream processing, Container orchestration | Computing at scale requires specific patterns; naive approaches don't scale |
| Networking | Load balancing algorithms, DNS resolution, CDN edge caching, Protocol trade-offs | Users connect through networks; networking decisions affect latency and availability |
| Reliability | Redundancy patterns, Circuit breakers, Health checks, Graceful degradation | Systems must handle failures; reliability knowledge prevents naive designs |
| Scalability | Horizontal scaling, Caching tiers, Partitioning strategies, Read replicas | Scale is the defining challenge; scalability patterns are essential vocabulary |
| Security | AuthN/AuthZ, Encryption, Token validation, Threat models | Security constraints shape architecture; basic security knowledge is expected |
The distinction between working knowledge and memorization is perhaps the most important concept on this page. Interviewers can instantly tell the difference, and it dramatically affects their evaluation.
Memorization looks like: reciting definitions on request, listing properties without context, and naming technologies without being able to explain when or why you would use them.
Working knowledge looks like: invoking a concept because the design calls for it, explaining why it matters for the problem at hand, and connecting it to alternatives and their trade-offs.
Working knowledge comes from applying concepts, not just reading about them. For each concept you study, ask: 'In what situations would I use this? What would go wrong if I didn't? What are the alternatives, and when would I prefer them?' Better yet, review real-world system architectures (company engineering blogs, conference talks) and identify how these concepts manifest in practice.
Every interesting system design problem involves distribution. Users are distributed globally. Data must be replicated for reliability. Processing must be parallelized for scale. Distributed systems knowledge is non-negotiable.
The core concepts interviewers expect you to understand deeply include the CAP theorem and its practical implications, consistency models (strong, eventual, causal), consensus protocols such as Paxos and Raft, the realities of network partitions and partial failure, and the limits of clock synchronization across machines.
Candidates often stumble by: (1) Treating the network as reliable when it isn't; (2) Assuming clocks are synchronized when they diverge; (3) Ignoring partial failure scenarios where some components work and others don't; (4) Underestimating the cost of coordination across nodes. These assumptions lead to designs that fail under real-world conditions.
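Treating the network as unreliable has a concrete shape in code. Below is a minimal Python sketch of calling an unreliable dependency with a bounded timeout per attempt and exponential backoff with jitter between retries; `flaky_op` and its failure pattern are illustrative stand-ins, not a real client library:

```python
import random
import time

def call_with_retries(op, attempts=4, base_delay=0.05, timeout=1.0):
    """Call an unreliable operation, retrying with exponential backoff.

    Assumes the network can fail or hang: each attempt gets a deadline,
    and we back off between retries instead of hammering the dependency.
    """
    for attempt in range(attempts):
        try:
            return op(timeout=timeout)
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter to avoid synchronized retry storms.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)

# A fake remote call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_op(timeout):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

result = call_with_retries(flaky_op)
```

The same pattern generalizes: every cross-node call in a design should have an explicit timeout and a retry policy, because the alternative is hanging or cascading failure.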
In an interview, you demonstrate distributed systems mastery by naming the trade-offs your design makes (consistency versus availability under partition), stating your failure assumptions explicitly, and explaining how the system behaves when nodes, networks, or clocks misbehave.
Data is central to every system, and storage decisions profoundly impact every other aspect of the design. Interviewers pay particular attention to your storage knowledge because poor storage choices cascade into performance problems, scaling limitations, and operational headaches.
The storage knowledge you need:
| Storage Type | Characteristics | Ideal Use Cases |
|---|---|---|
| Relational (PostgreSQL, MySQL) | ACID transactions, structured schema, complex queries, mature tooling | Transactional systems, complex relationships, reporting needs |
| Document (MongoDB, DynamoDB) | Flexible schema, nested documents, horizontal scaling | Evolving data models, read-heavy workloads, hierarchical data |
| Key-Value (Redis, Memcached) | Extreme speed, simple operations, in-memory option | Caching, session storage, real-time counters |
| Wide-Column (Cassandra, HBase) | Massive write throughput, time-series optimization | Time-series data, IoT events, high-write workloads |
| Graph (Neo4j, Neptune) | Relationship-first model, traversal queries | Social networks, recommendation engines, knowledge graphs |
| Object Storage (S3, GCS) | Infinite scale, blob storage, cheap at rest | Media files, backups, data lake storage |
| Search Engines (Elasticsearch) | Full-text search, aggregations, log analysis | Search features, logging, analytics |
Beyond choosing databases—understanding internals:
Interviewers value candidates who understand why storage systems behave as they do:
B-trees vs LSM trees: B-trees (used in PostgreSQL, traditional RDBMS) optimize for read-heavy workloads with random writes. LSM trees (used in Cassandra, RocksDB) optimize for write-heavy workloads by sequentializing writes. Knowing this helps you choose databases appropriately.
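To make the LSM write path concrete, here is a toy Python sketch (not a real storage engine): writes land in an in-memory memtable, and when it fills, the whole memtable is flushed as a sorted, immutable snapshot, which is why LSM writes are sequential and fast:

```python
# Toy LSM-style store: writes go to an in-memory memtable; when it fills,
# it is flushed as a sorted, immutable "SSTable" (here, a sorted dict).
# Reads check the memtable first, then SSTables newest-to-oldest.

class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # newest last; each is an immutable snapshot
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # random writes land in memory (cheap)
        if len(self.memtable) >= self.memtable_limit:
            # Flush: one sequential write of sorted data (the LSM advantage).
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            if key in table:
                return table[key]
        return None

db = TinyLSM()
db.put("a", 1)
db.put("b", 2)   # second write triggers a flush
db.put("a", 3)   # newer value shadows the flushed one
```

Real engines add write-ahead logs, bloom filters, and background compaction on top of this core idea; the read path touching multiple tables is exactly why LSM reads are slower than B-tree reads.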
Indexing fundamentals: Indexes speed up reads but slow down writes. Composite indexes have ordering implications. Full-text indexes use inverted structures. Understanding indexes helps you make schema decisions.
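The ordering implication of composite indexes can be shown with a small sketch: a sorted list of `(city, age, name)` tuples supports efficient lookups by the leading column via binary search, while lookups by `age` alone would require a full scan. The data and column names are purely illustrative:

```python
import bisect

# A composite index on (city, age): entries sorted by city first, then age.
# Prefix lookups (city only) are efficient; lookups by age alone are not,
# because the ordering is determined by the leading column.
rows = [
    ("paris", 30, "alice"),
    ("tokyo", 25, "bob"),
    ("paris", 22, "carol"),
    ("tokyo", 41, "dave"),
]
index = sorted((city, age, name) for city, age, name in rows)

def find_by_city(city):
    # Binary search for the contiguous range of keys starting with `city`.
    lo = bisect.bisect_left(index, (city,))
    hi = bisect.bisect_left(index, (city + "\x00",))
    return [name for _, _, name in index[lo:hi]]
```

This is the intuition behind the rule that a composite index on `(a, b)` serves queries filtering on `a`, or on `a` and `b`, but not on `b` alone.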
Transaction isolation levels: From READ UNCOMMITTED through SERIALIZABLE, each level offers different trade-offs between consistency and performance. Know what anomalies each level allows and when to use stricter vs. relaxed isolation.
Replication and consistency: Synchronous replication guarantees consistency but adds latency. Asynchronous replication is faster but risks data loss. Read replicas introduce replication lag that affects query semantics.
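A toy simulation makes replication lag tangible. The `Primary`/`Replica` classes below are illustrative, not a real database: the primary appends writes to a log that the replica applies later, so a read against the replica in between sees stale data:

```python
# Toy primary/replica pair with asynchronous replication: writes go to the
# primary and are applied to the replica later, so replica reads can be stale.

class Primary:
    def __init__(self):
        self.data = {}
        self.log = []          # replication log, shipped asynchronously

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0       # position in the primary's log

    def catch_up(self, log):
        # Apply pending log entries (simulating delayed replication).
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)

primary, replica = Primary(), Replica()
primary.write("balance", 100)
stale = replica.data.get("balance")   # None: the write has not replicated yet
replica.catch_up(primary.log)
fresh = replica.data.get("balance")   # 100 once the replica catches up
```

This is the window in which a user can write to the primary, then read from a replica and not see their own write, which is why read-your-writes guarantees require routing or session tricks.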
Don't just say 'we'll use PostgreSQL.' Explain why: 'We need ACID transactions for the payment flow, complex queries for the admin dashboard, and the data model is highly relational. PostgreSQL fits these requirements. For the activity feed, which is write-heavy and can tolerate eventual consistency, I'd consider Cassandra or a similar wide-column store.'
Scalability is often the central challenge in system design interviews. The question isn't 'can you build a system that works?' but 'can you build a system that works at scale?'
Core scalability concepts include horizontal versus vertical scaling, caching strategies and tiers, partitioning (sharding) data across nodes, replication and read replicas, and the eventual-consistency implications that distribution introduces.
Performance characteristics you should know:
Latency hierarchy: L1 cache (~1ns) → L2 cache (~4ns) → RAM (~100ns) → SSD (~100μs) → HDD (~10ms) → Network round-trip (~1-100ms depending on distance). This hierarchy explains why caching matters and why network calls are expensive.
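A quick back-of-envelope calculation using rough figures from this hierarchy (the constants are order-of-magnitude approximations, not measurements) shows why cache hit rate dominates average request latency:

```python
# Back-of-envelope latency math with assumed rough figures, in seconds.
RAM = 100e-9          # ~100 ns memory read
SSD = 100e-6          # ~100 us SSD read
NET_RTT = 1e-3        # ~1 ms network round-trip within a region

def request_latency(cache_hit_rate):
    hit = NET_RTT + RAM                 # one hop to the cache, read from memory
    miss = NET_RTT + NET_RTT + SSD      # cache miss, then a DB hop + SSD read
    return cache_hit_rate * hit + (1 - cache_hit_rate) * miss

no_cache = request_latency(0.0)         # ~2.1 ms average
with_cache = request_latency(0.9)       # roughly halved at a 90% hit rate
```

The exact numbers matter less than the structure: network round-trips are thousands of times more expensive than memory reads, so every hop you remove from the common path pays off.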
Throughput vs Latency: Systems can be optimized for high throughput (many requests per second) or low latency (fast individual responses). Sometimes these conflict—batching improves throughput but may add latency.
Amdahl's Law implications: The speedup from parallelization is limited by the serial portion of the workload. If 10% of your work is inherently sequential, you can never achieve more than 10x speedup regardless of parallelization.
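The law is easy to apply directly; a two-line sketch shows how quickly returns diminish:

```python
# Amdahl's Law: with serial fraction s, speedup on n workers is
# 1 / (s + (1 - s) / n); as n grows without bound, the limit is 1 / s.

def amdahl_speedup(serial_fraction, workers):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

cap = 1.0 / 0.10                      # 10% serial work: ceiling of 10x
s_100 = amdahl_speedup(0.10, 100)     # ~9.17x even with 100 workers
```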
Little's Law: For a stable system, L = λW (L = items in system, λ = arrival rate, W = time in system). This helps reason about queue sizes, service times, and capacity planning.
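Applied directly, Little's Law sizes concurrency: if requests arrive at 200 per second and each spends 50 ms in the system, about 10 are in flight on average, which bounds thread pools, connection pools, and queue depths:

```python
# Little's Law: L = lambda * W
# L = average items in the system, lambda = arrival rate, W = time in system.

def in_flight(arrival_rate_per_s, time_in_system_s):
    return arrival_rate_per_s * time_in_system_s

concurrent = in_flight(200, 0.050)    # ~10 requests in flight on average
```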
Common interview mistakes: (1) Adding more servers without explaining how they coordinate; (2) Ignoring database bottlenecks while only scaling compute; (3) Using synchronous processing for high-volume ingest; (4) Assuming caching solves all problems without addressing invalidation; (5) Not considering the cost of horizontal distribution (network latency, data movement).
Scalability matters, but a system that scales and then crashes is useless. Reliability and availability are equally critical dimensions that interviewers evaluate.
Key reliability concepts:
| Availability | Annual Downtime | Monthly Downtime | Practical Meaning |
|---|---|---|---|
| 99.9% (three nines) | 8.76 hours | 43.8 minutes | Acceptable for non-critical internal tools |
| 99.95% | 4.38 hours | 21.9 minutes | Standard for most consumer applications |
| 99.99% (four nines) | 52.6 minutes | 4.38 minutes | High-availability requirements, financial services |
| 99.999% (five nines) | 5.26 minutes | 26.3 seconds | Critical infrastructure, telecom, life-safety systems |
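The downtime budgets in this table follow from simple arithmetic, which is worth being able to do on the spot:

```python
# Downtime budget: downtime = (1 - availability) * period.
MIN_PER_YEAR = 365 * 24 * 60

def annual_downtime_minutes(availability):
    return (1 - availability) * MIN_PER_YEAR

three_nines = annual_downtime_minutes(0.999)    # ~525.6 min (~8.76 hours)
four_nines = annual_downtime_minutes(0.9999)    # ~52.6 min
```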
Patterns for achieving reliability:
Redundancy at every layer: Single points of failure (SPOFs) are the enemy of availability. Every critical component should have redundant instances—load balancers, application servers, database replicas, even multiple data centers.
Health checking and automatic failover: Systems must detect when components fail and route around them. Load balancers perform health checks; database clusters promote replicas when leaders fail; circuit breakers stop calling unhealthy services.
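A minimal circuit-breaker sketch in Python shows the core state machine; this is deliberately simplified relative to production resilience libraries (no half-open probe limits, no per-error classification):

```python
import time

# Minimal circuit breaker: after `threshold` consecutive failures the circuit
# "opens" and calls fail fast; after `reset_after` seconds one trial call is
# let through ("half-open") to probe whether the dependency has recovered.

class CircuitBreaker:
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, op):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one probe call
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # any success resets the count
        return result

breaker = CircuitBreaker(threshold=2, reset_after=60)

def failing():
    raise ConnectionError("downstream unhealthy")

for _ in range(2):                       # two failures trip the breaker
    try:
        breaker.call(failing)
    except ConnectionError:
        pass

tripped = breaker.opened_at is not None  # True: further calls now fail fast
```

Failing fast is the point: instead of tying up threads waiting on a dead dependency, callers get an immediate error they can handle with a fallback.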
Graceful degradation: When parts of a system fail or become overloaded, the system should degrade gracefully rather than fail completely. Disable non-essential features to preserve core functionality.
Blast radius containment: Failures should be isolated to prevent cascade. Use bulkheads to separate critical paths, implement timeouts to prevent hanging, and shed load before resources exhaust completely.
Chaos engineering mindset: Assume failures will happen and design for them. Proactively inject failures in non-production environments to discover weaknesses before they cause incidents.
In interviews, proactively address failure scenarios: 'What happens if the database goes down? We have a read replica that can be promoted. What if the entire region fails? We replicate to a secondary region with a recovery time objective of 15 minutes.' Interviewers notice when candidates think about reliability without prompting—it signals production experience.
Systems are composed of many components—but those components must communicate. Networking and communication patterns form another essential knowledge domain.
Communication paradigms: synchronous request/response (RPC, REST) versus asynchronous messaging (queues, pub/sub). Synchronous calls are simple and give immediate answers but couple caller to callee; asynchronous messaging decouples services and absorbs bursts at the cost of added latency and complexity.
Networking components you should understand:
Load balancers: Distribute traffic across instances. L4 (TCP) vs L7 (HTTP) balancing. Algorithms: round-robin, least connections, consistent hashing, weighted.
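Of the algorithms named above, consistent hashing is the least obvious, so here is a brief sketch; the virtual-node count and hash function are illustrative choices:

```python
import bisect
import hashlib

# Consistent hashing sketch: each server is placed at many points on a hash
# ring (virtual nodes); a key belongs to the first server clockwise from its
# hash. Adding or removing a server remaps only a small slice of keys.

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=50):
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def node_for(self, key):
        # Wrap around the ring with the modulo when we pass the last point.
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["server-a", "server-b", "server-c"])
owner = ring.node_for("user:42")          # deterministic: same key, same node
stable = ring.node_for("user:42") == owner
```

This is why consistent hashing appears in both load balancing and sharded caches: naive `hash(key) % n` remaps almost every key when `n` changes, while the ring remaps roughly `1/n` of them.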
API gateways: Entry point for external traffic. Handle authentication, rate limiting, routing, protocol translation. Important for microservices architectures.
CDNs: Cache static (and increasingly dynamic) content at edge locations. Reduce latency for global users, offload origin traffic. Understand TTLs, cache invalidation, edge compute.
Service discovery: How services find each other in dynamic environments. DNS-based, registry-based (Consul, Eureka), Kubernetes built-in. Essential for container orchestration.
DNS: The internet's address book. Understand TTL implications, DNS-based failover, latency-based routing, GeoDNS for multi-region.
Use synchronous communication when: the user needs an immediate response; operations must be transactional; the dependency is highly reliable. Use asynchronous communication when: the operation can be completed later; you need to decouple services; you're dealing with high-volume events; you need guaranteed delivery even if consumers are temporarily down.
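The asynchronous side of this guidance can be sketched with a producer handing work to a background worker through a queue; the in-process thread and sentinel-based shutdown are simplifications of what a real message broker provides:

```python
import queue
import threading

# Asynchronous processing sketch: the producer enqueues work and returns
# immediately; a background worker drains the queue at its own pace, so the
# caller is decoupled from the consumer's speed and availability.

tasks = queue.Queue()
processed = []

def worker():
    while True:
        item = tasks.get()
        if item is None:                 # sentinel: shut down cleanly
            break
        processed.append(f"handled {item}")
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

for event in ["signup", "purchase", "logout"]:
    tasks.put(event)                     # returns immediately (async handoff)

tasks.put(None)
t.join()                                 # a real worker would run indefinitely
```

In a distributed system, the queue would be a durable broker (e.g., Kafka or SQS-style), which adds the guaranteed-delivery property mentioned above even when consumers are temporarily down.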
Interviewers use several techniques to evaluate whether your knowledge is superficial or deep:
Probing questions: After you make a design choice, expect follow-up questions that test the depth of your understanding, such as 'Why this database and not an alternative?', 'What happens when this component fails?', and 'How does this behave at ten times the load?'
Increasing precision requests: The interviewer may ask you to be more specific: which consistency level you would choose, how long cache entries live, how many shards you would start with, or what exactly happens during failover.
Red flags include: using buzzwords without explanation; inability to answer 'why' questions; defensiveness when probed; mixing up related concepts (e.g., confusing replication with sharding); overconfidence about things you don't actually understand. It's far better to say 'I don't know the specifics here' than to bluff and get caught.
We've explored the second dimension of what interviewers evaluate: system design knowledge. The key insights: you need breadth across the seven core domains with deeper expertise in a few; working knowledge that you can apply beats memorized definitions; interviewers probe depth through follow-up and precision questions; and admitting a gap honestly is better than bluffing.
What's next:
Problem-solving ability determines how you approach challenges; system design knowledge determines what tools you have available. The third dimension—Trade-off Analysis—is where these combine. Next, we'll explore how interviewers evaluate your ability to navigate competing concerns, make defensible choices, and articulate why one design is better than another for a given context.
You now understand the knowledge landscape that interviewers expect, the difference between memorization and working knowledge, and how depth is assessed. Building this knowledge base is a long-term investment—continue deepening your understanding through study, practice, and real-world experience.