Cloud Security Architecture - Learning Module

Identity and Access Management

The Keys to Your Cloud Kingdom

In 2018, security researchers discovered that Tesla's Kubernetes dashboard was exposed to the internet without authentication. Attackers had been using Tesla's cloud infrastructure to mine cryptocurrency. The vulnerability wasn't in Kubernetes itself; it was an access control misconfiguration. This incident illustrates a universal truth: identity and access management (IAM) is the most critical security control in cloud computing.

Every interaction with cloud resources—every API call, every service invocation, every data access—is mediated by IAM. A single overly permissive policy can expose your entire infrastructure. A single misconfigured role can give attackers the keys to everything. Conversely, a well-designed IAM architecture creates defense in depth that can contain breaches and limit blast radius.

IAM is simultaneously the most powerful security tool and the most common source of security failures in cloud environments. Mastering it is non-negotiable for any cloud architect.

What You Will Learn

By the end of this page, you will understand IAM architectures across major cloud providers, the principle of least privilege and how to implement it, identity federation and single sign-on, service identities and machine-to-machine authentication, and common IAM anti-patterns that lead to breaches.

IAM Fundamentals: Authentication vs. Authorization

Before diving into cloud-specific implementations, we must establish clear definitions of the two distinct functions IAM provides:

Authentication: Proving who (or what) you are

"You claim to be 'admin@company.com'—prove it."
Involves credentials: passwords, MFA tokens, certificates, API keys
Answers the question: "Is this entity who they claim to be?"

Authorization: Determining what you can do

"You've proven you're 'admin@company.com'—now, what are you allowed to access?"
Involves policies, roles, permissions, and access control lists
Answers the question: "Is this authenticated entity allowed to perform this action on this resource?"

These two functions are often conflated, but they're fundamentally different. A strong password doesn't matter if the authenticated user has permission to do things they shouldn't. Precise permissions don't matter if anyone can claim any identity.

Authentication vs. Authorization
Aspect	Authentication	Authorization
Question	Who are you?	What can you do?
Mechanism	Credentials, MFA, certificates	Policies, roles, permissions
Failure mode	Impersonation, credential theft	Privilege escalation, over-permission
Point in flow	Before authorization	After authentication
Cloud examples	Login, assume role, federated identity	IAM policies, resource policies, ACLs
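
To make the separation concrete, here is a minimal sketch that keeps the two checks in separate functions. The in-memory user store, the fixed salt, and the permission tuples are assumptions made for illustration; a real system would delegate to an identity provider and a policy engine.

```python
import hashlib
import hmac

# Hypothetical in-memory stores, for illustration only.
SALT = b"demo-salt"  # a real system uses a unique random salt per user


def hash_password(password: str) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, 100_000)


USERS = {"admin@company.com": hash_password("correct horse battery staple")}
PERMISSIONS = {"admin@company.com": {("read", "reports"), ("write", "reports")}}


def authenticate(username: str, password: str) -> bool:
    """Authentication: is this entity who it claims to be?"""
    stored = USERS.get(username)
    if stored is None:
        return False
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, hash_password(password))


def authorize(username: str, action: str, resource: str) -> bool:
    """Authorization: may this authenticated entity perform this action?"""
    return (action, resource) in PERMISSIONS.get(username, set())


def handle_request(username: str, password: str, action: str, resource: str) -> str:
    """Run authentication first, then authorization, and report which one failed."""
    if not authenticate(username, password):
        return "401 unauthenticated"
    if not authorize(username, action, resource):
        return "403 forbidden"
    return "200 ok"
```

A correct password combined with an action that was never granted yields "403 forbidden", which shows that passing authentication says nothing about what the identity may do.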

Identity Types in Cloud Environments:

Cloud IAM systems must handle multiple types of identities:

Human users — Employees, contractors, administrators who access cloud consoles or use CLI tools
Service accounts / IAM roles — Non-human identities for automated processes, applications, and services
Federated identities — Users authenticated by external identity providers (corporate AD, social logins)
Temporary credentials — Short-lived tokens issued for specific sessions or tasks
Anonymous access — Unauthenticated access to public resources (limited but sometimes necessary)

Each identity type requires different handling. Human users need MFA and password policies. Service accounts need tight scoping and rotation. Federated identities need trust relationships and attribute mapping. A mature IAM architecture addresses all of these.

The Root Account Danger

Every cloud account has a 'root' or 'owner' account with unlimited privileges. This account should be protected with MFA, used only for initial setup and emergency recovery, and never for day-to-day operations. Many breaches escalate because root credentials were compromised or left exposed.

The Principle of Least Privilege

The Principle of Least Privilege (PoLP) is the foundational concept of IAM security: grant only the minimum permissions necessary to perform a task, and nothing more. This principle is simple to state but challenging to implement—it requires ongoing effort, tooling, and organizational commitment.

Why Least Privilege Matters:

Limits Blast Radius — If credentials are compromised, attackers can only do what those credentials allow. An over-privileged service account with admin access gives attackers everything. A narrowly-scoped account limits damage.
Prevents Lateral Movement — Attackers who gain initial access try to move laterally to higher-value targets. Least privilege makes each hop harder, creating multiple opportunities to detect and stop intrusions.
Reduces Accidental Damage — Overly broad permissions enable costly mistakes. A developer with production delete permissions can accidentally destroy data. Least privilege protects against human error.
Simplifies Auditing — Narrow permissions create clear audit trails. When each identity has specific, documented permissions, anomalies stand out. Broad permissions make normal and malicious activity indistinguishable.

Implementing Least Privilege

•Start with zero permissions — Begin with no access and add only what's required. Never start with broad access and try to restrict later.
•Use specific resource ARNs — Instead of * for resources, specify exact resources or use tagging-based conditions to limit scope.
•Constrain actions precisely — Grant only the actions needed. If a function only reads from S3, don't grant s3:*; grant s3:GetObject.
•Apply time-based constraints — Use session duration limits and credential expiration. Temporary access should be genuinely temporary.
•Implement just-in-time access — For sensitive operations, require users to explicitly request elevated access with approval workflows.
•Review and revoke regularly — Permissions accumulate over time. Regular reviews should remove unused permissions before they become attack vectors.
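
As one possible illustration of the first three points, the sketch below generates a policy document that grants a single read action on a single bucket prefix. The function name, bucket name, and prefix are hypothetical; only the policy grammar and the s3:GetObject action come from AWS.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Build a policy allowing only s3:GetObject under one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsOnly",
                "Effect": "Allow",
                # One specific action instead of s3:*
                "Action": ["s3:GetObject"],
                # One specific bucket and prefix instead of Resource "*"
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


# Hypothetical bucket and prefix, used only as an example.
policy = read_only_bucket_policy("my-bucket", "reports/")
policy_json = json.dumps(policy, indent=2)
```

Generating policies from a small function like this keeps the scope explicit in code review: widening access requires changing an argument or the function itself, not editing a wildcard by hand.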

The Permission Creep Problem:

In practice, permissions tend to expand over time—a phenomenon called permission creep:

A developer needs temporary access to debug an issue → gets added to a group
A new feature requires a new permission → gets added to an existing role
An old project is deprecated → but its permissions remain
An employee changes teams → retains old permissions while gaining new ones

Without active management, every identity eventually accumulates far more permissions than needed. Organizations must implement:

Regular access reviews — At least quarterly, ideally monthly for sensitive resources
Automated unused permission detection — Tools that flag permissions that haven't been used
Time-limited permissions — Permissions that automatically expire unless renewed
Separation of duties — No single identity can perform all steps of sensitive workflows
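
Two of these controls reduce to simple set and date comparisons. The sketch below assumes you can already export which permissions an identity holds, which it has actually used, and when each grant expires; the function names and data shapes are hypothetical.

```python
from datetime import datetime, timezone


def unused_permissions(granted: set, used: set) -> set:
    """Permissions that were granted but never appear in the usage data."""
    return granted - used


def expired_grants(grants: dict, now: datetime) -> list:
    """Return permissions whose expiry time has passed.

    `grants` maps a permission name to the datetime at which it expires.
    """
    return sorted(perm for perm, expires in grants.items() if expires <= now)


# Example data (hypothetical identity and permissions).
granted = {"s3:GetObject", "s3:PutObject", "dynamodb:Query"}
used = {"s3:GetObject"}
grants = {
    "s3:PutObject": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "dynamodb:Query": datetime(2030, 1, 1, tzinfo=timezone.utc),
}
review_time = datetime(2025, 6, 1, tzinfo=timezone.utc)
```

With this data, `unused_permissions(granted, used)` flags the two permissions that were never exercised, and `expired_grants(grants, review_time)` flags the one grant that should already have been removed.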

Least Privilege in Practice

A practical approach: when defining permissions for a new service or role, write down exactly what operations it needs to perform, then grant those specific permissions. If the service fails with 'access denied,' add only the specific missing permission rather than broadening the policy until the error goes away.

AWS IAM Architecture

Amazon Web Services implements one of the most comprehensive IAM systems in cloud computing. Understanding AWS IAM in depth provides a model that transfers to other clouds, as the concepts are universal even if terminology differs.

Core AWS IAM Components:

AWS IAM Building Blocks

•Users — Individual identities with credentials. Each user can have passwords (for console), access keys (for API/CLI), and MFA devices. Users should represent individual humans, not shared accounts.
•Groups — Collections of users that share the same permissions. Groups simplify management: add a user to the 'Developers' group instead of granting permissions to each developer individually.
•Roles — Identities that can be assumed by entities (users, services, external accounts). Roles don't have permanent credentials; they provide temporary security tokens when assumed.
•Policies — JSON documents that define permissions. Policies specify what actions are allowed or denied on which resources under what conditions.
•Identity Providers — External identity systems (SAML, OIDC) that can be trusted for authentication, enabling federation with corporate directories.

AWS IAM Policy Structure:

AWS policies use a JSON structure that, once understood, applies to nearly all AWS authorization decisions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3Read",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        }
      }
    }
  ]
}

Policy Elements Explained:

Element	Purpose	Best Practice
Effect	Allow or Deny	Explicit denies override allows
Action	API operations	Specify exact actions, avoid wildcards
Resource	Target ARNs	Use specific ARNs or tag conditions
Condition	Contextual constraints	Add IP, time, MFA, tag conditions
Principal	Who the policy applies to	Used in resource policies

Policy Evaluation Logic:

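AWS evaluates the policy types that apply to a request in a specific order. SCPs, permission boundaries, and session policies can only limit permissions; they never grant access on their own: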

Explicit Deny — Any explicit deny immediately stops evaluation with denial
Organizations SCPs — Service Control Policies provide account-wide guardrails
Resource Policies — Policies attached directly to resources (S3 bucket policies, KMS key policies)
Identity Policies — Policies attached to users, groups, or roles
Permission Boundaries — Maximum permissions an identity can have
Session Policies — Temporary additional constraints for assumed roles

If no policy explicitly allows an action, it's denied by default. This default-deny approach means permissions must be explicitly granted.
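
The two rules that matter most, explicit deny wins and everything else is denied by default, can be sketched in a few lines. This is a deliberately simplified model and not AWS's actual engine: it ignores conditions, principals, SCPs, permission boundaries, and session policies, and it matches wildcards with Python's fnmatch, which is case-sensitive.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value: str) -> bool:
    """True if value matches any pattern; '*' and '?' act as wildcards."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def evaluate(statements: list, action: str, resource: str) -> str:
    """Return 'ExplicitDeny', 'Allow', or 'ImplicitDeny' for one request."""
    decision = "ImplicitDeny"  # default deny: nothing is allowed until granted
    for stmt in statements:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"  # an explicit deny overrides any allow
            decision = "Allow"
    return decision


# Hypothetical statements for illustration.
statements = [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::my-bucket/*"},
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::my-bucket/secret/*"},
]
```

Reading an object under `secret/` is denied even though the first statement allows it, and writing any object is denied simply because no statement allows it.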

Roles Over Users for Machines

For EC2 instances, Lambda functions, and other AWS services, always use IAM roles rather than IAM user credentials. Roles provide temporary credentials that rotate automatically, eliminating the risk of long-lived access keys being exposed or forgotten in code.

IAM Patterns Across Cloud Providers

While AWS IAM provides a comprehensive reference model, Microsoft Entra ID (formerly Azure Active Directory) and Google Cloud IAM implement similar concepts with different abstractions. Understanding these differences is essential for multi-cloud and hybrid architectures.

Azure (Entra ID) Model

•Tenant — Top-level identity container (one Entra ID directory); the subscriptions inside it are the closer analogue of AWS accounts
•Users & Groups — Similar to AWS, managed in Entra ID
•Service Principals — Application identities (like AWS roles)
•Managed Identities — Azure-assigned identities for resources
•RBAC Roles — Predefined roles: Owner, Contributor, Reader
•Scopes — Hierarchy: Management Group → Subscription → Resource Group → Resource

Google Cloud IAM Model

•Organization — Top-level, tied to Google Workspace domain
•Members (called principals in current Google Cloud documentation) — Users, groups, service accounts, domains
•Service Accounts — Machine identities (like AWS roles)
•Roles — Predefined, custom, and basic roles
•Policies — Bind members to roles at resource level
•Resource Hierarchy — Org → Folder → Project → Resource

IAM Concept Mapping Across Providers
Concept	AWS	Azure	GCP
Machine identity	IAM Role	Service Principal / Managed Identity	Service Account
Human identity	IAM User	Entra ID User	Cloud Identity User
Permission grouping	IAM Group, Role	Entra ID Group, RBAC Role	Group, Role
Policy attachment	Identity or Resource	Scope (hierarchical)	Resource (hierarchical)
Federation	SAML, OIDC Providers	B2B, B2C, SAML	Workforce Identity Federation
Cross-account access	Assume Role	Lighthouse, B2B	Cross-project IAM

Key Architectural Differences:

AWS: Policy-centric. Permissions are defined in JSON policy documents attached to identities or resources. Very flexible but can become complex with many policies interacting.

Azure: RBAC-centric. Built around predefined roles assigned at scopes. Integrates deeply with Entra ID (formerly Azure AD), making it natural for organizations already using Microsoft identity.

GCP: Resource-hierarchy-centric. Permissions inherit down the organization → folder → project → resource hierarchy. Policies are simple bindings of members to roles at each level.
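
Hierarchical inheritance can be sketched as a walk from a resource up to the root, collecting every role bound along the way. The node names and members below are hypothetical, and the model is simplified: it ignores deny policies and IAM conditions.

```python
# Hypothetical hierarchy: each node points to its parent.
PARENT = {"project-a": "folder-eng", "folder-eng": "org", "org": None}

# Role bindings attached at each level (role -> set of members).
BINDINGS = {
    "org": {"roles/viewer": {"group:all-staff"}},
    "folder-eng": {"roles/editor": {"group:engineers"}},
    "project-a": {"roles/owner": {"user:alice"}},
}


def effective_roles(member: str, node: str) -> set:
    """Collect roles bound to `member` at `node` and at every ancestor."""
    roles = set()
    while node is not None:
        for role, members in BINDINGS.get(node, {}).items():
            if member in members:
                roles.add(role)
        node = PARENT[node]  # move one level up the hierarchy
    return roles
```

A binding made at the folder level shows up on every project beneath it, which is why broad roles granted high in the hierarchy are hard to take back lower down.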

Multi-Cloud IAM Strategies:

When operating across clouds, organizations typically:

Federate from a central IdP — Use one identity provider (corporate AD, Okta, etc.) and federate to all clouds. Users authenticate once and access any cloud.
Standardize role definitions — Create equivalent roles across clouds: 'Developer' in AWS ≈ 'Contributor' in Azure ≈ custom role in GCP with similar permissions.
Use cloud-native tools for each — Accept that tools differ but apply consistent principles: least privilege, MFA, regular review.
Centralize monitoring — Aggregate IAM audit logs from all clouds to detect suspicious patterns.

Identity Federation and Single Sign-On

Identity Federation allows users to authenticate with one system (the Identity Provider, or IdP) and access resources in another system (the Service Provider, or SP) without maintaining separate credentials. For cloud security, federation is essential—it keeps identity management centralized while enabling access to distributed cloud resources.

Why Federation Matters:

Single source of truth — User accounts exist in one place (corporate directory). No duplicate accounts to manage or forget.
Consistent security policies — Password policies, MFA requirements, and access reviews happen once, centrally.
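Immediate offboarding — Disable a user in the IdP and they can no longer sign in to any federated system. Sessions and tokens already issued remain valid until they expire or are revoked, so keep session lifetimes short.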
Reduced credential exposure — Users don't have separate cloud credentials that could be compromised.
Audit consolidation — Authentication events are logged centrally.

Federation Protocols

•SAML 2.0 — Security Assertion Markup Language. XML-based, mature, widely supported. Common for enterprise applications and cloud console access.
•OIDC — OpenID Connect. Built on OAuth 2.0, uses JWT tokens. Lighter weight, JSON-based, preferred for modern applications and APIs.
•OAuth 2.0 — Authorization framework (not authentication). Often combined with OIDC. Grants limited access without sharing credentials.
•WS-Federation — Microsoft-originated, used with ADFS. Being superseded by SAML/OIDC in many environments.

Federation Architecture:

┌─────────────────┐         ┌──────────────────┐         ┌─────────────────┐
│                 │         │                  │         │                 │
│  User Browser   │ ──1──▸  │  Cloud Console   │ ──2──▸  │  Corporate IdP  │
│                 │         │(Service Provider)│         │  (e.g., Okta)   │
│                 │ ◂──5──  │                  │ ◂──4──  │                 │
│                 │         │                  │         │                 │
└─────────────────┘         └──────────────────┘         └─────────────────┘
                                                                  │
                                                                  │ 3
                                                                  ▼
                                                         ┌─────────────────┐
                                                         │  Corporate AD / │
                                                         │  User Directory │
                                                         └─────────────────┘

Flow:
1. User accesses cloud console
2. Console redirects to IdP for authentication
3. IdP authenticates user against corporate directory
4. IdP returns signed assertion (SAML) or ID token (OIDC)
5. Console grants access based on assertion/token claims

Attribute Mapping:

The IdP includes attributes (claims) in the assertion that the cloud provider uses for authorization:

IdP Attribute	Cloud Mapping	Authorization Use
Email	User identifier	Unique user identity
Groups	IAM roles/groups	Permission assignment
Department	Tags/labels	Cost allocation, policies
Manager	Audit trails	Access approval workflows
MFA status	Session conditions	Require MFA for sensitive ops

Proper attribute mapping is critical—it determines what roles users assume and what resources they can access.
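
A mapping like the one in the table can be sketched as a lookup from IdP groups to cloud roles, with the MFA claim gating sensitive roles. The group names, role names, and the `mfa` claim key are all hypothetical; each IdP and cloud provider has its own claim names and mapping configuration.

```python
# Hypothetical mapping from IdP group names to cloud role names.
GROUP_TO_ROLE = {
    "cloud-admins": "AdminRole",
    "developers": "DeveloperRole",
    "finance": "BillingReadOnlyRole",
}

# Roles that may only be assumed when the session was established with MFA.
SENSITIVE_ROLES = {"AdminRole"}


def roles_for(claims: dict) -> set:
    """Translate verified IdP claims into the set of roles the user may assume."""
    roles = {
        GROUP_TO_ROLE[group]
        for group in claims.get("groups", [])
        if group in GROUP_TO_ROLE  # unknown groups grant nothing
    }
    if not claims.get("mfa", False):
        roles -= SENSITIVE_ROLES  # no MFA, no sensitive roles
    return roles
```

Groups that are not in the mapping grant nothing, which keeps the mapping default-deny: a new group in the directory has no cloud access until someone adds it deliberately.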

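Automated Provisioning with SCIM

With SCIM (System for Cross-domain Identity Management), user accounts can be automatically provisioned and deprovisioned in cloud systems as they're added to or removed from the IdP. This eliminates manual account management and supports prompt offboarding. Just-in-time provisioning is a related but different mechanism: it creates an account the first time a user signs in through federation and does not, by itself, remove accounts.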
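
SCIM itself is an HTTP and JSON protocol, but the decision a provisioning connector makes on each sync is a set comparison. The sketch below shows only that decision, under the assumption that the connector can list IdP users with an active flag and list existing cloud accounts; the function name and data shapes are hypothetical.

```python
def plan_sync(idp_users: dict, cloud_accounts: set) -> dict:
    """Decide which cloud accounts to create or disable.

    `idp_users` maps a username to True (active) or False (disabled in the IdP).
    `cloud_accounts` is the set of usernames that currently exist in the cloud.
    """
    active = {user for user, is_active in idp_users.items() if is_active}
    return {
        "create": sorted(active - cloud_accounts),   # in the IdP, missing in cloud
        "disable": sorted(cloud_accounts - active),  # in cloud, gone or disabled in IdP
    }


# Example data (hypothetical users).
idp_users = {"alice": True, "bob": False, "carol": True}
cloud_accounts = {"alice", "bob", "dave"}
```

Here `plan_sync(idp_users, cloud_accounts)` would create `carol` and disable both `bob`, who was deactivated in the IdP, and `dave`, who no longer exists there.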

Service Identities and Workload IAM

While human IAM gets significant attention, service identities—the machine-to-machine authentication mechanisms—are equally critical. In modern cloud architectures, the majority of API calls come from automated systems, not humans. Securing these non-human identities requires specific patterns.

The Service Identity Challenge:

Unlike humans, machines can't:

Enter passwords interactively
Use push notifications for MFA
Recognize phishing attempts
Make context-aware security decisions

But machines also have advantages:

Can use cryptographic credentials humans can't memorize
Can rotate credentials automatically
Can be tightly scoped to specific tasks
Don't take credentials home or write them on sticky notes

Service Identity Patterns

•Instance/Workload Identities — Cloud-assigned identities for compute resources (EC2 instance roles, GCE service accounts, Azure Managed Identities). The cloud platform handles credential issuance and rotation. Always prefer these over static credentials.
•Service Accounts with Keys — Static credentials for services, especially for on-premises or multi-cloud access. Require careful rotation and secure storage in secrets managers.
•Workload Identity Federation — External identities (Kubernetes service accounts, GitHub Actions, etc.) exchange tokens for cloud credentials without storing long-lived secrets.
•Certificate-Based Authentication — X.509 certificates for mutual TLS. Common in service mesh architectures where every service has a cryptographic identity.

AWS: Instance Profiles and IAM Roles

┌─────────────────────────────────────────────┐
│               EC2 Instance                   │
│                                              │
│  ┌────────────────────────────────────────┐ │
│  │ Application requests credentials from   │ │
│  │ Instance Metadata Service (IMDS):       │ │
│  │ http://169.254.169.254/...             │ │
│  └────────────────────────────────────────┘ │
│                     │                        │
│                     ▼                        │
│  ┌────────────────────────────────────────┐ │
│  │ IMDS returns temporary credentials:     │ │
│  │ - Access Key ID                         │ │
│  │ - Secret Access Key                     │ │
│  │ - Session Token                         │ │
│  │ - Expiration (typically 6 hours)        │ │
│  └────────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────┐
│         AWS STS (Security Token Service)     │
│   Manages temporary credential issuance     │
│   based on IAM role trust policy            │
└─────────────────────────────────────────────┘

Security Best Practices for Service Identities:

Practice	Rationale	Implementation
Use instance/managed identities	No credentials in code or config	Attach roles to compute resources
Implement IMDSv2	Prevents SSRF credential theft	Require session tokens for metadata
Rotate static credentials	Limits window of compromise	Automate rotation via secrets manager
Least privilege scoping	Limits blast radius	Grant only required permissions
Audit credential usage	Detect anomalies	CloudTrail, Cloud Audit Logs
Use condition keys	Enforce context requirements	Source IP, VPC, encryption context

SSRF and Credential Theft

Server-Side Request Forgery (SSRF) attacks can steal instance credentials by making the server request its own metadata service. The Capital One breach exploited this. Mitigate by using IMDSv2, implementing WAF rules, and never trusting user-provided URLs for server-side requests.
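
The reason IMDSv2 helps is visible in the request shapes. A session token must first be obtained with a PUT request carrying a TTL header, and every metadata read must then present that token in a header. Many SSRF bugs only let an attacker choose the URL of a GET request, so they can do neither. The sketch below builds both requests with the standard library; the helper names are hypothetical, and `fetch` only works when run on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET request for a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(request: urllib.request.Request, timeout: float = 2.0) -> str:
    """Send a request to IMDS; only reachable from inside an EC2 instance."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode()
```

On an instance, `token = fetch(token_request())` followed by `fetch(metadata_request("iam/security-credentials/", token))` returns the name of the attached role. In application code, prefer the AWS SDK, which performs this exchange for you.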

IAM Anti-Patterns and Common Mistakes

Understanding what NOT to do is as important as knowing best practices. These anti-patterns appear repeatedly in breach reports and audit findings.

Critical IAM Anti-Patterns

•Wildcard permissions — "Action": "*" or "Resource": "*" grants far more than intended. Attackers dream of finding these overly permissive policies.
•Shared credentials — Multiple people or systems using the same access keys. Makes accountability impossible and rotation dangerous.
•Embedded credentials — Access keys hardcoded in application code, config files, or worse, committed to version control.
•No MFA for humans — Console access without MFA is trivially exploitable via password compromise. MFA should be mandatory for all human users.
•Long-lived access keys — Static credentials that are never rotated. Every day they exist is another day they could be compromised.
•Unused permissions — Permissions granted long ago for a specific task that were never revoked. Creates unnecessary attack surface.
•Cross-account trust without conditions — Trusting another account without scoping by external ID, source IP, or other conditions.
•Root account usage — Using the root/owner account for day-to-day work instead of creating appropriately scoped admin roles.

Real-World Breach Case Studies:

Case 1: Uber (2016) Developers committed AWS credentials to a private GitHub repository. Attackers who gained access to that repository found the credentials and used them to reach an S3 bucket containing records of 57 million riders and drivers. The fix: never commit credentials, use instance roles, scan repos for secrets.

Case 2: Capital One (2019) A misconfigured WAF role had excessive permissions. An SSRF vulnerability allowed an attacker to steal instance credentials via the metadata service. Those credentials had access to S3 buckets with sensitive data. The fix: least privilege, IMDSv2, validation that roles can't access data they shouldn't.

Case 3: Twitch (2021) An exposed server configuration allowed access to internal Git repositories and AWS credentials. Overly broad internal access meant one exposure led to complete code and data exfiltration. The fix: network segmentation, credential scoping, zero trust internal architecture.

Common Themes:

Overly permissive credentials
Credentials in code or configuration
Lateral movement enabled by excessive permissions
Detection failures allowing extended dwell time

Credential Scanning is Essential

Implement automated scanning for credentials in code repos, container images, and configuration files. Tools like git-secrets, trufflehog, and cloud provider secret scanners can catch exposed credentials before attackers do. This is not optional—it's essential.
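
To show the core idea behind such scanners, here is a minimal sketch that searches text for strings shaped like AWS access key IDs: the prefix AKIA (long-term keys) or ASIA (temporary keys) followed by 16 uppercase letters or digits. Dedicated tools cover many more secret types and add entropy checks; the function name is hypothetical, and the key in the sample is the placeholder used in AWS documentation.

```python
import re

# AWS access key IDs: 'AKIA' or 'ASIA' followed by 16 uppercase letters/digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def scan_text(text: str) -> list:
    """Return (line_number, match) pairs for every key-shaped string found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


# Sample config text containing the AWS documentation example key.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
```

A check like this is cheap enough to run as a pre-commit hook, so a key is caught before it ever reaches the repository history.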

Summary: Mastering Cloud IAM

Identity and Access Management is the cornerstone of cloud security—the mechanism through which every action is authenticated and authorized. Getting IAM right prevents breaches; getting it wrong enables them.

Key Takeaways

•Authentication and authorization are distinct — Proving identity and determining permissions are separate functions that both must be robust.
•Least privilege is non-negotiable — Grant minimum permissions required. Start with zero access and add only what's needed.
•Federation centralizes identity — Use corporate identity providers for SSO. One source of truth, immediate offboarding, consistent policies.
•Service identities need special handling — Use instance/managed identities over static credentials. Rotate what you can't eliminate.
•Avoid anti-patterns religiously — No wildcards, no shared credentials, no embedded secrets, mandatory MFA, regular access reviews.
•Audit everything — Enable logging, set up alerts for anomalies, regularly review who has access to what.

What's Next:

With IAM controlling who can access cloud resources, we next examine network security—the controls that determine what traffic can reach those resources in the first place. VPCs, security groups, network ACLs, and related controls form the network perimeter that complements IAM's identity perimeter.

Page Complete

You now understand cloud Identity and Access Management deeply—from fundamentals through provider-specific implementations to federation and service identities. This knowledge is essential for designing secure cloud architectures that protect resources while enabling legitimate access.