Design a content moderation system that automatically detects and acts on policy-violating content (text, images, videos, audio) across a social platform at scale. The system combines ML models for multi-label classification (NLP transformers for text, CNNs for images, frame sampling + ASR for video), perceptual hash matching (e.g. PhotoDNA) against known-violating content, a configurable policy engine with region- and content-type-specific thresholds, and a priority-based human review queue with moderator assignment and well-being features. It must also handle adversarial evasion, appeals, and regulatory compliance (GDPR, DSA, CSAM reporting).
| Metric | Value |
|---|---|
| Content items moderated per day | 3+ billion |
| Pipeline throughput | 35,000+ items/sec |
| Text classification latency | < 100ms |
| Image classification latency | < 500ms |
| Video classification latency | ~2s per minute of video |
| Hash check latency | < 50ms |
| Auto-moderation rate | 95%+ (only 5% needs human review) |
| Human moderators | 15,000+ |
| Throughput per moderator | ~200 items/day |
| False positive rate (target) | < 1% |
| Policy categories | 10+ (hate, nudity, violence, CSAM, spam, etc.) |
| Languages supported | 50+ |
**Multi-modal content analysis:** moderate text (posts, comments, messages, bios), images (photos, memes, profile pictures), videos (uploaded clips, live streams), and audio (voice messages, podcast uploads); detect policy violations across all content types in near real-time
**Automated ML-based detection:** classify content against policy categories — hate speech, nudity/sexual content, violence/gore, spam/scam, self-harm/suicide, terrorism/extremism, misinformation, harassment/bullying, child exploitation (CSAM), copyright infringement; output: violation category + confidence score (0–1)
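A minimal sketch of the classifier's output shape described above: one `(category, confidence)` pair per label the model fires on. The `Violation` class name and the example scores are illustrative, not a real model's output.

```python
from dataclasses import dataclass

@dataclass
class Violation:
    """One label emitted by the multi-label classifier."""
    category: str      # e.g. "hate_speech", "nudity", "spam"
    confidence: float  # model confidence in [0.0, 1.0]

# A single content item can trigger several labels at once;
# downstream policy logic consumes the full list:
example_output = [Violation("hate_speech", 0.91), Violation("spam", 0.12)]
```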
**Policy-based decision engine:** given ML confidence scores, apply platform policies to decide action: AUTO_APPROVE (score < 0.3), SEND_TO_REVIEW (0.3–0.85), AUTO_REMOVE (score > 0.85); thresholds configurable per category and per region (different legal requirements); escalation rules for high-severity content (CSAM → immediate removal + law enforcement report)
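The threshold logic above can be sketched as a small decision function. Threshold values and category names are illustrative, and per-region overrides are omitted for brevity; a real policy engine would load this configuration dynamically.

```python
# Default thresholds from the requirement: approve below 0.3, remove above 0.85.
DEFAULT_THRESHOLDS = {"approve_below": 0.3, "remove_above": 0.85}

# Per-category overrides (illustrative). CSAM is zero-tolerance: any positive
# score triggers removal plus a law-enforcement report.
CATEGORY_OVERRIDES = {
    "csam": {"approve_below": 0.0, "remove_above": 0.0},
    "spam": {"approve_below": 0.5, "remove_above": 0.95},
}

def decide(category: str, score: float) -> str:
    """Map an ML confidence score to a moderation action."""
    t = {**DEFAULT_THRESHOLDS, **CATEGORY_OVERRIDES.get(category, {})}
    if category == "csam" and score > t["remove_above"]:
        return "AUTO_REMOVE_AND_REPORT"  # escalation rule for high-severity content
    if score > t["remove_above"]:
        return "AUTO_REMOVE"
    if score < t["approve_below"]:
        return "AUTO_APPROVE"
    return "SEND_TO_REVIEW"
```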
**Human review queue:** content flagged for human review routed to a queue; moderators see content + ML signals + context (user history, reports); moderator actions: approve, remove, escalate, warn user, restrict account; prioritise queue by severity (CSAM/terrorism first, spam last)
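One way to sketch the severity-ordered queue is a min-heap keyed by `(severity, arrival order)`, so CSAM/terrorism surfaces first and ties are served FIFO. The severity ranks below are illustrative assumptions, not the platform's actual taxonomy.

```python
import heapq
import itertools

# Illustrative severity ranks: lower number = reviewed first.
SEVERITY = {"csam": 0, "terrorism": 0, "violence": 1, "hate": 2, "nudity": 3, "spam": 9}

class ReviewQueue:
    """Priority queue: high-severity content first, FIFO within a severity level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # monotonic tie-breaker = arrival order

    def push(self, item_id: str, category: str) -> None:
        rank = SEVERITY.get(category, 5)  # unknown categories get a middle rank
        heapq.heappush(self._heap, (rank, next(self._counter), item_id))

    def pop(self) -> str:
        _, _, item_id = heapq.heappop(self._heap)
        return item_id
```

In practice each shard of such a queue would live in a datastore rather than process memory, but the ordering key is the essential idea.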
**User reporting:** users report content with reason (hate speech, spam, nudity, etc.); reports aggregated — if multiple users report the same content, priority increases; reporter notified of outcome; false reporters tracked to prevent abuse of reporting system
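Report aggregation might be sketched as a counter that escalates review priority as independent reports accumulate. The thresholds (3 and 10 reports) are illustrative assumptions.

```python
from collections import Counter

# Running count of reports per content item (in practice, a shared store).
report_counts: Counter = Counter()

def record_report(content_id: str) -> str:
    """Register one user report and return the resulting queue tier."""
    report_counts[content_id] += 1
    n = report_counts[content_id]
    if n >= 10:
        return "HIGH_PRIORITY_REVIEW"  # many independent reports: likely violating
    if n >= 3:
        return "NORMAL_REVIEW"
    return "QUEUED"  # too few reports to act on yet
```

A real system would also weight reports by reporter trust score, which is how tracked false reporters lose influence.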
**Appeals process:** users whose content is removed can appeal the decision; appeal routed to a different (senior) moderator for unbiased re-review; appeal outcome: uphold removal or restore content; SLA: 24-hour response for appeals
**Proactive vs reactive moderation:** proactive — scan all uploaded content before or immediately after publication (pre-publish for high-risk categories); reactive — act on user reports and trending signals; live streams moderated in real-time with frame sampling and audio transcription
**Content fingerprinting / hashing:** compute perceptual hashes (pHash, PhotoDNA for CSAM) of images and videos; compare against database of known violating content; exact and near-duplicate matching; enables instant removal of previously identified violating content without re-running ML
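Near-duplicate matching over perceptual hashes can be sketched as a Hamming-distance check on 64-bit hashes: a re-encoded or slightly cropped image flips only a few bits. The hash values and the distance cutoff below are illustrative; production systems use PhotoDNA/pHash libraries and index the hash database (e.g. with BK-trees) rather than scanning it linearly.

```python
# Illustrative database of 64-bit perceptual hashes of known violating content.
KNOWN_VIOLATING = {0xF0F0F0F0F0F0F0F0, 0x123456789ABCDEF0}

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def matches_known(h: int, max_distance: int = 5) -> bool:
    """Exact or near-duplicate: within max_distance bit flips of a known hash."""
    return any(hamming(h, k) <= max_distance for k in KNOWN_VIOLATING)
```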
**Contextual and behavioural signals:** consider context beyond content — user's account age, history of violations, follower count, content engagement velocity (viral content needs faster review); network analysis — coordinated inauthentic behaviour (bot farms, brigading) detection
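One way these signals could feed review prioritisation is a composite score where engagement velocity and violation history boost the raw ML confidence. The formula and weights below are purely illustrative assumptions.

```python
import math

def review_priority(ml_score: float, prior_violations: int, views_per_hour: float) -> float:
    """Combine ML confidence with behavioural signals (weights are illustrative)."""
    velocity_boost = math.log10(1 + views_per_hour)   # viral content gets reviewed sooner
    history_boost = min(prior_violations, 5) * 0.2    # repeat offenders ranked higher, capped
    return ml_score + 0.1 * velocity_boost + history_boost
```

The log on view velocity keeps a viral post from drowning out everything else while still moving it up the queue.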
**Transparency and compliance reporting:** generate transparency reports (content removed by category, by region, government requests, appeal outcomes); comply with regulations — GDPR (right to erasure), DSA (Digital Services Act — EU, systemic risk assessment), COPPA (children's safety); audit trail for every moderation decision
Non-functional requirements define the system qualities critical to your users. Frame them as "The system should be able to..." statements. These will guide your deep dives later.

- Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
- Frame NFRs for this specific system: "low-latency search under 100ms" is far more valuable than just "low latency".
- Add concrete numbers ("P99 response time < 500ms", "99.9% availability", "10M DAU"). Concrete targets drive architectural decisions.
- Choose the 3–5 most critical NFRs. Every system should be "scalable"; ask what makes THIS system's scaling uniquely challenging.