MODULE 9 OF 15

Variables & Connections

Manage configuration, secrets, and external system credentials like a pro!

The Problem: Hardcoding Configs

When you write a DAG that connects to a database or calls an API, it's tempting to stick the password or API key right in the code. Don't do it! Hardcoding credentials is dangerous and inflexible.

Explain Like I'm 5

Imagine writing your phone number in permanent marker on every document you own. If you change numbers, you'd have to track down every single paper and fix it! And anyone who sees a document knows your number.

Hardcoding configs is the same: Your password lives in 10 DAG files. You change it for security? You must update 10 files. Someone checks your code into Git? The password is now in version history forever. Oops!

Why This Matters

Security: Credentials in code can leak via Git, logs, or screenshots.
Flexibility: Different environments (dev, staging, prod) need different values. Hardcoding means separate DAG files per environment.
Maintainability: One change (new API key) should be in one place, not scattered across dozens of files.

Airflow Variables

Airflow Variables are key-value pairs stored in Airflow's metadata database. You set them once, and any DAG can read them. Perfect for configuration that changes between environments or over time.

Setting Variables

Three Ways to Set Variables

1. Via the UI (Admin → Variables)

Go to the Airflow web UI and click Admin → Variables. Add a new variable with a key and value. Great for quick edits and non-technical users.

2. Via the CLI

airflow variables set key value — sets a variable from the command line. Useful for CI/CD pipelines and automation scripts.

3. Via Environment Variables

Set AIRFLOW_VAR_<KEY> (e.g., AIRFLOW_VAR_MY_CONFIG). Airflow reads these at runtime. Great for Docker/Kubernetes, where you inject secrets via env vars.

Reading Variables in DAGs

  • Variable.get('key') — Returns the value as a string
  • Variable.get('config', deserialize_json=True) — For JSON variables, returns a Python dict
  • Variable.get('key', default_var='fallback') — Use a default if the variable doesn't exist
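
Inside a task, those three forms look like this (a minimal sketch; the variable names are just examples):

from airflow.models import Variable

env = Variable.get("my_config")                               # plain string
config = Variable.get("config_json", deserialize_json=True)   # parsed into a dict
timeout = Variable.get("timeout", default_var="30")           # fallback if unset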

When to Use Variables vs Params vs Env Vars

Use Case | Use This | Why
System credentials (DB, API) | Connections | Airflow encrypts and manages them; operators expect conn_id
Config values (retry count, batch size) | Variables | Changeable via UI/CLI without editing DAG code
Run-time overrides (specific date) | Params | Per-DAG-run values; users can tweak when triggering
Highly sensitive (prod passwords) | Env vars / Secret Backends | Never stored in Airflow DB; injected at runtime
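
Params are the only row above without an example later in this module, so here is a minimal sketch (the dag_id and param name are invented for illustration):

from airflow import DAG
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from datetime import datetime

with DAG(
    dag_id="params_demo",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    params={"run_date": Param("2024-01-01", type="string")},  # overridable when triggering
) as dag:

    def use_param(**context):
        # Resolved per DAG run: the value from the trigger form, or the default above
        print(context["params"]["run_date"])

    PythonOperator(task_id="use_param", python_callable=use_param)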

Airflow Connections

Connections store credentials for external systems: databases, APIs, cloud providers. Each connection has a Connection ID plus fields like host, port, login, password, schema, and optional extra JSON.

Field | Purpose
Connection ID | Unique name used in DAGs (e.g., postgres_default)
Conn Type | postgres, mysql, http, aws, google_cloud, etc.
Host | Server address
Port | Connection port
Login | Username
Password | Password (stored encrypted)
Schema | Database schema (for DB connections)
Extra | JSON for additional params (e.g., AWS region)

Setting Connections

  • UI: Admin → Connections → Add
  • CLI: airflow connections add my_conn --conn-type postgres --conn-host localhost ...
  • Env var: AIRFLOW_CONN_<CONN_ID> — Set to a connection URI (e.g., postgres://user:pass@host:5432/db)

Using Connections in Operators

Operators have parameters like postgres_conn_id, http_conn_id, aws_conn_id, etc. You pass the Connection ID, and the operator (via its underlying Hook) fetches the credentials automatically.

How Hooks Use Connections

When you write PostgresOperator(postgres_conn_id='my_postgres', ...), the operator creates a PostgresHook. The hook calls get_connection('my_postgres'), loads host, port, login, and password from the connection, and opens a real connection to PostgreSQL. You never see the password in your DAG!
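
Conceptually, the operator's work boils down to something like this (a simplified sketch, not Airflow's actual source):

from airflow.providers.postgres.hooks.postgres import PostgresHook

# Roughly what PostgresOperator does behind the scenes
hook = PostgresHook(postgres_conn_id="my_postgres")
hook.run("SELECT 1")  # credentials come from the stored connection; no password in code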

Hooks Explained

Hooks are the interface between Airflow and external systems. They encapsulate connection logic and provide methods to interact with the system (run queries, upload files, etc.).

  • PostgresHook — Connect to PostgreSQL; run SQL, bulk load
  • S3Hook — Upload/download files from AWS S3
  • HttpHook — Make HTTP requests (GET, POST) to APIs
  • SlackHook — Send messages to Slack

To get connection details inside a task: hook.get_connection(conn_id) returns the full Connection object.
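
For instance, using HttpHook inside a task looks roughly like this (the connection ID my_api is an assumption; it must exist in Airflow):

from airflow.providers.http.hooks.http import HttpHook

def fetch_data(**context):
    hook = HttpHook(method="GET", http_conn_id="my_api")  # assumes a 'my_api' connection exists
    response = hook.run(endpoint="/data")                 # returns a requests.Response
    return response.json()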

Secret Backends

For high-security environments, Airflow can fetch Variables and Connections from Secret Backends instead of its own database:

  • HashiCorp Vault — Popular enterprise secrets manager
  • AWS Secrets Manager — Native AWS integration
  • GCP Secret Manager — For Google Cloud deployments

You configure the backend in airflow.cfg. Airflow then looks up variables and connections from the external vault instead of the metadata DB.
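
As an illustration, a Vault setup might look roughly like this in airflow.cfg (the URL and paths are placeholders; exact options depend on your provider version):

[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "url": "http://127.0.0.1:8200"}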

Best Practices

Summary

Never hardcode credentials in DAG files! Use connections for system credentials (databases, APIs, cloud). Use variables for configuration values (batch size, retry count). Use environment variables or Secret Backends for highly sensitive data. Follow the principle of least privilege — give connections only the permissions they need.

Variable Flow

How variables flow from sources into your DAG tasks:

UI / CLI / Env → Metadata DB (Variables) → Variable.get() → Your Task

Variables: Set → Stored → Retrieved → Used in tasks

Connection Architecture

Operator (conn_id=) → Hook (get_connection()) → Airflow Connections (DB / API creds) → Postgres / S3 / API

Connections feed Hooks, which Operators use to talk to external systems

Hooks Connecting to External Systems

PostgresHook → PostgreSQL · S3Hook → S3 · HttpHook → HTTP APIs · SlackHook → Slack

Different hooks connect Airflow to different external systems

Variables: Setting & Reading

# Set via CLI
airflow variables set my_config "production"
airflow variables set retry_count 5

# JSON variable
airflow variables set config_json '{"batch_size": 1000, "timeout": 60}'

# Set via environment variable (prefix AIRFLOW_VAR_)
# export AIRFLOW_VAR_MY_CONFIG=production

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from datetime import datetime

with DAG(
    dag_id="variables_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    def use_variables(**context):
        env = Variable.get('my_config')
        config = Variable.get('config_json', deserialize_json=True)
        batch_size = config['batch_size']
        print(f"Env: {env}, Batch: {batch_size}")

    task = PythonOperator(task_id="use_vars", python_callable=use_variables)
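
Variables are also available in templated fields via Jinja, so you often don't need Variable.get() at all. A sketch reusing the variables set above (this task would live inside a DAG like the one shown):

from airflow.operators.bash import BashOperator

# {{ var.value.<key> }} renders a string variable; {{ var.json.<key> }} a JSON one
echo_config = BashOperator(
    task_id="echo_config",
    bash_command="echo {{ var.value.my_config }} batch={{ var.json.config_json.batch_size }}",
)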

Connections: CLI & Usage

# Add a PostgreSQL connection via CLI
airflow connections add postgres_warehouse \
  --conn-type postgres \
  --conn-host mydb.example.com \
  --conn-port 5432 \
  --conn-login myuser \
  --conn-password secret123 \
  --conn-schema analytics

# Via environment variable (connection URI)
# export AIRFLOW_CONN_POSTGRES_WAREHOUSE="postgres://user:pass@host:5432/dbname"

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

with DAG(
    dag_id="connections_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    run_sql = PostgresOperator(
        task_id="run_query",
        postgres_conn_id="postgres_warehouse",
        sql="SELECT 1",
    )

    call_api = SimpleHttpOperator(
        task_id="call_api",
        http_conn_id="my_api",
        method="GET",
        endpoint="/data",
    )
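
Note that the call_api task assumes an HTTP connection named my_api already exists, for example (the hostname is a placeholder):

# export AIRFLOW_CONN_MY_API="http://api.example.com"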

Using Hooks & get_connection()

from airflow.providers.postgres.hooks.postgres import PostgresHook

def custom_task(**context):
    hook = PostgresHook(postgres_conn_id="postgres_warehouse")
    conn = hook.get_connection("postgres_warehouse")
    host = conn.host
    # Use hook to run query
    result = hook.get_records("SELECT * FROM users LIMIT 10")
    return result
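
Inside a DAG, you would wire this up like any Python callable, e.g. PythonOperator(task_id="custom_query", python_callable=custom_task).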

Practice Exercises

Exercise 1: Pick the Right Tool

Scenario

You need to store: (a) your PostgreSQL password, (b) a batch size of 1000 records, (c) an API key for a paid service. Which should go in Variables, Connections, or a Secret Backend?

Answer

(a) PostgreSQL password → Connections (it's a credential for a system). (b) Batch size → Variables (config value). (c) API key → Connections (credential) or Secret Backend if highly sensitive.

Exercise 2: Set a Variable via CLI

Challenge

Write the exact CLI command to set a variable named env with value staging.

Answer

airflow variables set env staging

Exercise 3: Use Variable.get() with a Default

Challenge

Write Python code to read a variable timeout, defaulting to 30 if it doesn't exist.

Answer

timeout = Variable.get('timeout', default_var='30'). Note that Variable.get returns strings, so wrap the result in int() if you need a number.

Exercise 4: Connection URI

Challenge

What environment variable would you set to provide a PostgreSQL connection with user admin, password secret, host db.local, port 5432, database mydb? Use connection ID pg_main.

Answer

AIRFLOW_CONN_PG_MAIN=postgres://admin:secret@db.local:5432/mydb — the connection ID is uppercased in the env var name (pg_main → PG_MAIN), and the URI format is conn_type://login:password@host:port/schema.

Module 9 Quiz

Test your understanding!

1. Where are Airflow Variables stored?

2. How do you set a Variable via environment variable?

3. What does Variable.get('config', deserialize_json=True) return?

4. Connections are best used for ___?

5. How do you pass a connection to PostgresOperator?

6. What is a Hook in Airflow?

7. Which is NOT a Secret Backend option?

8. What is the principle of least privilege?