RBAC, encryption, and deploying on AWS, Azure, and GCP!
Role-Based Access Control (RBAC) in Airflow lets you decide who can see and do what. Instead of giving every user full power, you assign roles (e.g. Admin, User, Viewer) and each role has a set of permissions (e.g. can edit DAGs, can trigger runs, can only view).
Imagine a library. The librarian can add books, remove books, and see who borrowed what. A member can only browse and borrow. A guest can only look through the window. RBAC is like that: different "keys" (roles) open different doors (permissions).
| Role | Typical permissions |
|---|---|
| Admin | Full access: users, roles, DAGs, config, variables, connections |
| Op | Everything User can do, plus manage Connections, Variables, Pools, and XComs; cannot manage users or roles |
| User | View DAGs, trigger runs, and clear tasks within their DAG access |
| Viewer | Read-only: see DAGs, task states, logs; cannot trigger or edit |
| Public | No permissions by default (the role given to anonymous users) |
You can create custom roles and assign specific permissions (e.g. "can edit only DAGs in folder team_a/"). This keeps production safe when many people use the same Airflow instance.
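As a sketch of what that looks like from the CLI (assuming Airflow 2.3+, where `roles add-perms` exists; the role name, DAG id, and username below are made up):

```
# Create an empty custom role (hypothetical name)
airflow roles create TeamA_Editor

# Grant it read and edit on one specific DAG (per-DAG resources are named "DAG:<dag_id>")
airflow roles add-perms TeamA_Editor --action can_read --resource "DAG:team_a_daily_load"
airflow roles add-perms TeamA_Editor --action can_edit --resource "DAG:team_a_daily_load"

# Attach the role to an existing user
airflow users add-role --username alice --role TeamA_Editor
```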
Airflow stores Variables and Connections (passwords, API keys) in the metadata database. By default they can be stored in plain text. For production, you should encrypt them using Fernet (symmetric encryption).
Fernet is like a lockbox. You put your secret (e.g. a password) inside and lock it with a key. Only someone with the same key can open the box and read the password. Airflow uses one key (the Fernet key) to lock all secrets before saving them to the database.
You generate a Fernet key once, set it in airflow.cfg (or an env var), and Airflow automatically encrypts Variables and Connections when saving and decrypts when reading. Never commit the Fernet key to Git!
If you lose the Fernet key, you cannot decrypt existing secrets. Store the key in a secure secret manager (e.g. AWS Secrets Manager, HashiCorp Vault) and rotate it with care.
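Airflow also supports rotating the key in place: list the new key first and the old key second, then re-encrypt the existing rows (a sketch; the key values are placeholders):

```
# airflow.cfg — Airflow encrypts with the first key and can decrypt with any listed key
[core]
fernet_key = <new_fernet_key>,<old_fernet_key>

# Re-encrypt existing Variables and Connections with the new key, then remove the old key
airflow rotate-fernet-key
```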
LDAP (Lightweight Directory Access Protocol) lets you plug Airflow into your company's existing user directory (e.g. Active Directory). Users log in with their corporate username and password; you don't have to create accounts manually in Airflow.
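A minimal sketch of what LDAP login can look like in webserver_config.py (Airflow's UI auth is handled by Flask AppBuilder; the server address, bind user, and search base below are placeholders for your directory):

```
# webserver_config.py — LDAP authentication via Flask AppBuilder (values are placeholders)
from flask_appbuilder.security.manager import AUTH_LDAP

AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://ldap.company.com"          # your directory server
AUTH_LDAP_SEARCH = "ou=users,dc=company,dc=com"       # where to look up users
AUTH_LDAP_BIND_USER = "cn=airflow,dc=company,dc=com"  # service account used for lookups
AUTH_LDAP_BIND_PASSWORD = "service-account-password"  # inject from a secret, don't hard-code
AUTH_LDAP_UID_FIELD = "sAMAccountName"                # login attribute (Active Directory)

# Auto-create Airflow accounts on first login, defaulting to the Viewer role
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Viewer"
```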
HTTPS is a must in production. It encrypts traffic between the browser and the Airflow web server so passwords and session cookies can't be sniffed. You typically put a reverse proxy (e.g. Nginx, Caddy) or a load balancer in front of the web server and terminate SSL there.
Use LDAP (or OAuth/OIDC) for authentication and HTTPS everywhere. Never run the Airflow UI over plain HTTP in production.
Instead of storing secrets only in the Airflow metadata database, you can use a secret backend so that Variables and Connections are fetched from an external store (e.g. AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, HashiCorp Vault).
Benefits: central key management, rotation without touching Airflow config, and tighter access control. You configure the backend in airflow.cfg or via environment variables; Airflow then pulls secrets from that backend when tasks run.
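For example, pointing Airflow at AWS Secrets Manager is a small config change (a sketch assuming the amazon provider package is installed; the prefixes are conventions you choose):

```
# airflow.cfg — look up Connections/Variables in AWS Secrets Manager before the metadata DB
[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables"}

# A Connection with id "my_postgres" is then fetched as the secret "airflow/connections/my_postgres"
```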
Running Airflow in the cloud usually means either a managed service (someone else runs Airflow for you) or self-hosted (you run it on VMs, Kubernetes, etc.). The three main managed options are:
| Cloud | Managed service | Notes |
|---|---|---|
| AWS | Amazon Managed Workflows for Apache Airflow (MWAA) | Managed Airflow; you choose version, worker size, and DAGs; AWS handles scheduler, workers, DB, and scaling. |
| Azure | Azure Data Factory with Airflow (or self-host on AKS) | Integrated with Azure data services; you can also run open-source Airflow on AKS. |
| GCP | Cloud Composer (Airflow managed) | Fully managed Airflow on GKE; automatic upgrades, monitoring, and integration with BigQuery, GCS, etc. |
Managed = Like a gym membership: the gym (AWS/Azure/GCP) provides the building, equipment, and maintenance; you just show up and run your workouts (DAGs). Self-hosted = You buy the equipment and set up the gym yourself; more control but more work.
Managed — Pros: no server or DB maintenance, automatic patches and upgrades, built-in monitoring, an SLA. Cons: less control over versions and config; can be costlier at very large scale.
Self-hosted — Pros: full control, cost and topology you can optimize, any Airflow version or plugin. Cons: you own availability, backups, upgrades, and security.
Choose managed when you want to focus on pipelines and have a small team. Choose self-hosted when you need strict control, custom integrations, or have dedicated platform engineers.
Figure: who can do what, a simple view of Airflow roles from full control (Admin) to read-only (Viewer).
Figure: where Airflow can run, managed services vs self-hosted across the big three clouds.
Configure authentication, create users, and assign roles via airflow.cfg and the UI or CLI (RBAC itself is always on in Airflow 2+).
```
# airflow.cfg — API authentication backends (Airflow 2.3+ syntax)
[api]
auth_backends = airflow.api.auth.backend.session,airflow.api.auth.backend.basic_auth

# RBAC is always enabled in Airflow 2+.
# Only Airflow 1.10 needed: [webserver] rbac = True

# Create a user and assign a role via the CLI
airflow users create \
    --username data_engineer \
    --password <secure_password> \
    --firstname Data \
    --lastname Engineer \
    --role Op \
    --email engineer@company.com

# List roles (Admin UI: Security → List Roles)
airflow roles list
```
Generate a Fernet key and set it so Airflow can encrypt Variables and Connections.
```
# Generate a Fernet key (Python)
from cryptography.fernet import Fernet
print(Fernet.generate_key().decode())

# Example output (use your own!):
# xK8s2Fg...base64...=

# Set in airflow.cfg
[core]
fernet_key = your_generated_fernet_key_here

# Or via environment variable (recommended for production)
# AIRFLOW__CORE__FERNET_KEY=your_generated_fernet_key_here
```
Never commit the Fernet key to Git. Use a secret manager (e.g. AWS Secrets Manager) and inject it via env vars or a secure config service.
Run the web server behind a reverse proxy that terminates SSL (example: Nginx).
```
# Example: Nginx proxy to Airflow (HTTPS)
server {
    listen 443 ssl;
    server_name airflow.company.com;

    ssl_certificate     /etc/ssl/certs/airflow.crt;
    ssl_certificate_key /etc/ssl/private/airflow.key;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
In airflow.cfg set base_url to your HTTPS URL so links in the UI and emails use HTTPS.
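A sketch of the matching airflow.cfg settings (the hostname is a placeholder):

```
# airflow.cfg — make generated links use HTTPS and trust the proxy's X-Forwarded-* headers
[webserver]
base_url = https://airflow.company.com
enable_proxy_fix = True
```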
Try these to reinforce Security & Cloud concepts. Use ELI5 thinking!
Your team has: (1) a data engineer who triggers DAGs and debugs, (2) an analyst who only needs to see dashboards and logs, (3) you as the platform owner. Which Airflow roles would you assign?
Engineer = needs to "run" and "fix" = Op. Analyst = only "look" = Viewer. You = full control = Admin.
Data engineer → Op (trigger DAGs, view/clear tasks, see logs). Analyst → Viewer (read-only). Platform owner → Admin (users, config, roles).
Your DAG uses a Connection that stores a database password. Someone gets read-only access to the Airflow metadata database. What risk do you have if Fernet is NOT used vs if it IS used?
Without Fernet: Passwords are in plain text in the DB → the attacker can steal them. With Fernet: Only encrypted values are stored → without the Fernet key they cannot decrypt; risk is much lower.
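A tiny illustration of the difference, using the same cryptography library Airflow relies on (a standalone sketch, not Airflow internals):

```
from cryptography.fernet import Fernet

key = Fernet.generate_key()                       # the Fernet key kept in Airflow's config
stored = Fernet(key).encrypt(b"db_password_123")  # what ends up in the metadata database
print(stored)                                     # ciphertext: useless to someone with only DB access
print(Fernet(key).decrypt(stored))                # with the key, Airflow recovers b'db_password_123'
```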
Startup of 5 people, one data engineer. They need Airflow for 20 DAGs. Would you recommend managed (e.g. MWAA or Composer) or self-hosted? Why?
Small team + not many DAGs = you want to spend time on data, not servers. Managed = someone else handles upgrades and outages.
Recommend managed. One data engineer shouldn't also own DB backups, Airflow upgrades, and scheduler availability. MWAA or Composer gives a known cost and SLA so the team can focus on pipelines.
You have 50 Connections and 30 Variables. Security wants to rotate DB passwords every 90 days. How does using a secret backend (e.g. AWS Secrets Manager) help?
With a secret backend, Airflow fetches Connection and Variable values from the external store at runtime. You rotate the password in one place (Secrets Manager); there's no need to update 50 Connections in the Airflow UI. That means fewer mistakes, and the secret manager gives you an audit trail.
Test your Security & Cloud knowledge. Click the answer you think is correct.