Manage configuration, secrets, and external system credentials like a pro!
When you write a DAG that connects to a database or calls an API, it's tempting to stick the password or API key right in the code. Don't do it! Hardcoding credentials is dangerous and inflexible.
Imagine writing your phone number in permanent marker on every document you own. If you change numbers, you'd have to track down every single paper and fix it! And anyone who sees a document knows your number.
Hardcoding configs is the same: Your password lives in 10 DAG files. You change it for security? You must update 10 files. Someone checks your code into Git? The password is now in version history forever. Oops!
- Security: Credentials in code can leak via Git, logs, or screenshots.
- Flexibility: Different environments (dev, staging, prod) need different values; hardcoding means a separate DAG file per environment.
- Maintainability: One change (a new API key) should be made in one place, not scattered across dozens of files.
Airflow Variables are key-value pairs stored in Airflow's metadata database. You set them once, and any DAG can read them. Perfect for configuration that changes between environments or over time.
- Web UI: Go to the Airflow web UI, click Admin → Variables, and add a new variable with a key and value. Great for quick edits and non-technical users.
- CLI: airflow variables set key value — sets a variable from the command line. Useful for CI/CD pipelines and automation scripts.
- Environment variable: Set AIRFLOW_VAR_<KEY> (e.g., AIRFLOW_VAR_MY_CONFIG). Airflow reads these at runtime. Great for Docker/Kubernetes setups where you inject secrets via env vars.
- Variable.get('key') — returns the value as a string
- Variable.get('config', deserialize_json=True) — for JSON variables, returns a Python dict
- Variable.get('key', default_var='fallback') — uses a default if the variable doesn't exist

A short sketch of these calls in action follows the table below, which summarizes when to reach for each mechanism.

| Use Case | Use This | Why |
|---|---|---|
| System credentials (DB, API) | Connections | Airflow encrypts and manages them; operators expect conn_id |
| Config values (retry count, batch size) | Variables | Changeable via UI/CLI without editing DAG code |
| Run-time overrides (specific date) | Params | Per-DAG-run values; users can tweak when triggering |
| Highly sensitive (prod passwords) | Env vars / Secret Backends | Never stored in Airflow DB; injected at runtime |
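To make those three forms concrete, here is a minimal sketch of reading variables inside a task callable. The variable names (my_config, config_json, retry_count) are just examples that match the code samples later in this lesson.

```python
from airflow.models import Variable

def read_config(**context):
    # Plain string variable.
    env = Variable.get("my_config")

    # JSON variable parsed into a Python dict.
    config = Variable.get("config_json", deserialize_json=True)

    # Fall back to a default when the variable isn't defined.
    retries = Variable.get("retry_count", default_var="3")

    print(f"env={env}, batch_size={config['batch_size']}, retries={retries}")
```

Keep in mind that Variable.get returns a string unless you ask for JSON deserialization, so cast numeric values yourself.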
Connections store credentials for external systems: databases, APIs, cloud providers. Each connection has a Connection ID plus fields like host, port, login, password, schema, and optional extra JSON.
| Field | Purpose |
|---|---|
| Connection ID | Unique name used in DAGs (e.g., postgres_default) |
| Conn Type | postgres, mysql, http, aws, google_cloud, etc. |
| Host | Server address |
| Port | Connection port |
| Login | Username |
| Password | Password (stored encrypted) |
| Schema | Database schema (for DB connections) |
| Extra | JSON for additional params (e.g., AWS region) |
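To see how these fields fit together, here is a small sketch (with placeholder values) that builds a Connection object in Python and prints its URI form — the same format you could export as an AIRFLOW_CONN_* environment variable.

```python
from airflow.models import Connection

# Placeholder values illustrating the fields from the table above.
conn = Connection(
    conn_id="my_postgres",
    conn_type="postgres",
    host="localhost",
    port=5432,
    login="myuser",
    password="mypassword",
    schema="analytics",
    extra='{"sslmode": "require"}',  # Extra: JSON for additional params
)

# URI form, e.g. for AIRFLOW_CONN_MY_POSTGRES
print(conn.get_uri())
```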
You can create connections from the CLI or via environment variables:

- CLI: airflow connections add my_conn --conn-type postgres --conn-host localhost ...
- Environment variable: AIRFLOW_CONN_<CONN_ID> — set to a connection URI (e.g., postgres://user:pass@host:5432/db)

Operators have parameters like postgres_conn_id, http_conn_id, aws_conn_id, etc. You pass the Connection ID, and the operator (via its underlying Hook) fetches the credentials automatically.
When you write PostgresOperator(postgres_conn_id='my_postgres', ...), the operator creates a PostgresHook. The hook calls get_connection('my_postgres'), loads host, port, login, and password from the connection, and opens a real connection to PostgreSQL. You never see the password in your DAG!
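Here is a rough sketch of that mechanism — roughly what the operator does for you, not code you normally write yourself. It assumes the Postgres provider is installed and a connection with ID my_postgres exists.

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

# What PostgresOperator does internally, more or less:
# build a hook from the connection ID, let it look up the stored
# credentials, and run the SQL against the database.
hook = PostgresHook(postgres_conn_id="my_postgres")
hook.run("SELECT 1")  # credentials come from the Connection, never from DAG code
```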
Hooks are the interface between Airflow and external systems. They encapsulate connection logic and provide methods to interact with the system (run queries, upload files, etc.).
- PostgresHook — connect to PostgreSQL; run SQL, bulk load data
- S3Hook — upload/download files from AWS S3
- HttpHook — make HTTP requests (GET, POST) to APIs
- SlackHook — send messages to Slack
To get connection details inside a task: hook.get_connection(conn_id) returns the full Connection object.
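For example, here is a minimal sketch of inspecting a connection inside a task, assuming a connection with ID my_api already exists. BaseHook.get_connection works for any connection type, without opening a real connection to the external system.

```python
from airflow.hooks.base import BaseHook

def inspect_connection(**context):
    # Fetch the stored Connection object (no network call to the external system).
    conn = BaseHook.get_connection("my_api")
    print(conn.host, conn.port, conn.login)

    # The Extra field is available as a parsed dict.
    extras = conn.extra_dejson
    return extras
```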
For high-security environments, Airflow can fetch Variables and Connections from Secret Backends such as HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager instead of its own database.
You configure the backend in airflow.cfg. Airflow then looks up variables and connections from the external vault instead of the metadata DB.
Never hardcode credentials in DAG files! Use connections for system credentials (databases, APIs, cloud). Use variables for configuration values (batch size, retry count). Use environment variables or Secret Backends for highly sensitive data. Follow the principle of least privilege — give connections only the permissions they need.
How variables flow from sources into your DAG tasks:
Variables: Set → Stored → Retrieved → Used in tasks
Connections feed Hooks, which Operators use to talk to external systems
Different hooks connect Airflow to different external systems
```bash
# Set via CLI
airflow variables set my_config "production"
airflow variables set retry_count 5

# JSON variable
airflow variables set config_json '{"batch_size": 1000, "timeout": 60}'

# Set via environment variable (prefix AIRFLOW_VAR_)
# export AIRFLOW_VAR_MY_CONFIG=production
```
```python
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from datetime import datetime

with DAG(
    dag_id="variables_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    def use_variables(**context):
        env = Variable.get('my_config')
        config = Variable.get('config_json', deserialize_json=True)
        batch_size = config['batch_size']
        print(f"Env: {env}, Batch: {batch_size}")

    task = PythonOperator(task_id="use_vars", python_callable=use_variables)
```
```bash
# Add a PostgreSQL connection via CLI
airflow connections add postgres_warehouse \
    --conn-type postgres \
    --conn-host mydb.example.com \
    --conn-port 5432 \
    --conn-login myuser \
    --conn-password secret123 \
    --conn-schema analytics

# Via environment variable (connection URI)
# export AIRFLOW_CONN_POSTGRES_WAREHOUSE="postgres://user:pass@host:5432/dbname"
```
```python
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

with DAG(
    dag_id="connections_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    run_sql = PostgresOperator(
        task_id="run_query",
        postgres_conn_id="postgres_warehouse",
        sql="SELECT 1",
    )

    call_api = SimpleHttpOperator(
        task_id="call_api",
        http_conn_id="my_api",
        method="GET",
        endpoint="/data",
    )
```
```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

def custom_task(**context):
    hook = PostgresHook(postgres_conn_id="postgres_warehouse")
    conn = hook.get_connection("postgres_warehouse")
    host = conn.host

    # Use hook to run query
    result = hook.get_records("SELECT * FROM users LIMIT 10")
    return result
```
You need to store: (a) your PostgreSQL password, (b) a batch size of 1000 records, (c) an API key for a paid service. Which should go in Variables, Connections, or a Secret Backend?
(a) PostgreSQL password → Connections (it's a credential for a system). (b) Batch size → Variables (config value). (c) API key → Connections (credential) or Secret Backend if highly sensitive.
Write the exact CLI command to set a variable named env with value staging.
airflow variables set env staging
Write Python code to read a variable timeout, defaulting to 30 if it doesn't exist.
timeout = Variable.get('timeout', default_var='30')
What environment variable would you set to provide a PostgreSQL connection with user admin, password secret, host db.local, port 5432, database mydb? Use connection ID pg_main.
AIRFLOW_CONN_PG_MAIN=postgres://admin:secret@db.local:5432/mydb — The connection ID is uppercased in the env var name. The URI format is conn_type://login:password@host:port/schema.