Connecting to Backend Databases

The Hook

Figure: A stylized pipeline connecting a database cylinder to a Python logo, symbolizing a robust connection.


How do I direct my SQL commands to a database system and connect my Python application to it?

Today’s Journey (1/2)

  1. Setting Up the Database
    • Environment setup and virtualization
    • Docker containers vs virtual machines
    • Running PostgreSQL with Docker Compose
  2. Connecting the Application
    • Using psycopg driver
    • DB-API 2.0 protocol
    • Transactions and cursors

Today’s Journey (2/2)

  3. SQLAlchemy Framework
    • Core concepts: Engine, Metadata, Tables
    • Building queries in Python
    • Connection pooling
  4. Database Migrations
    • Version control for schema
    • Migration scripts with dbmate
    • Safe schema evolution

Part 1: Setting Up the Database

The Problem: Environment Complexity

Software needs an environment

  • CPU architecture (x86/x64, ARM)
  • Operating system
  • Memory and I/O devices
  • Installed binaries and libraries
  • Configuration files

The challenge: Conflicts between different software requirements

The Solution: Containerization

Docker packages everything together

  • Application binaries
  • Libraries and dependencies
  • Configuration files
  • All in one isolated container

Figure: A transparent shipping container revealing neatly packed binaries, libraries, and configurations.

Benefits of Containerization

  • Better development experience
    • Focus on real problems, not environment setup
  • Easier collaboration
    • Every developer has the same environment
  • Project isolation
    • Switch between projects without conflicts
  • Simplified deployment
    • Same container works in development and production

Virtualization Background

  • Virtualization (1966)
    • Dividing expensive mainframes into multiple virtual computers
    • Foundation of cloud computing
  • Containerization (2008-2013)
    • Internal project at dotCloud (France)
    • Open-sourced as Docker by Solomon Hykes (March 2013)

Figure: A timeline showing the evolution from mainframes to virtual machines and finally to containers.


Dockerfile vs Image vs Container


Dockerfile: A recipe for building a Docker image

Image: A template for creating a container

Container: A running instance of an image

Container vs Virtual Machine

Evolution of Containers


Both are isolated environments created from an image

Difference: What the image contains

  • VM Image: includes a full operating system
  • Container Image: shares the host OS kernel

Containers are lighter

  • Less resource consumption
  • Faster startup/shutdown
  • Easier to manage

Running PostgreSQL with Docker

Step 1: Create environment file (.env)

POSTGRES_HOST=127.0.0.1
POSTGRES_DB=mydatabase
POSTGRES_USER=user
POSTGRES_PASSWORD=secret

Docker Compose Configuration

Step 2: Define services in compose.yaml

services:
  db:
    image: postgres:17.6
    env_file:
      - .env
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Understanding Docker Compose Elements

  • image: postgres:17.6 - Image name and version (tag) from Docker Hub
  • env_file - References our .env file
  • ports: "5432:5432" - Maps host port to container port
  • volumes: pgdata - Persists data outside the container

Starting and Stopping

Start the database:

docker compose up

Stop the database:

docker compose down

Connecting to the Database

Many Approaches:

  1. CLI: psql (text interface)
  2. GUI: Adminer or VS Code Database Client
  3. DB-API: psycopg driver and sqlalchemy framework
  4. MCP: DBHub (Bytebase) MCP server for agentic access

Part 2: Connecting the Application

The Question

How do I connect a Python application to the database to query and store data?

Answer: Use the psycopg package

DB-API 2.0 Protocol

Standardized database interface in Python: fetchone(), fetchmany(), fetchall(), execute(), commit(), rollback(), etc.

  • psycopg implements this protocol for PostgreSQL
  • Drivers for other databases implement the same protocol: sqlite3 (SQLite, in the standard library), mysqlclient (MySQL), etc., so the patterns transfer
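Because the standard library's sqlite3 module implements the same DB-API 2.0 protocol, the interface can be sketched without a running PostgreSQL server. In this minimal sketch, swapping sqlite3.connect(...) for psycopg.connect(...) leaves every other call unchanged (the one visible difference: sqlite3 uses ? placeholders where psycopg uses %s):

```python
import sqlite3  # stdlib DB-API 2.0 driver; psycopg exposes the same interface

# connect() -> Connection, .cursor() -> Cursor (PEP 249)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE users (name TEXT, age INTEGER)")
cur.execute("INSERT INTO users VALUES (?, ?)", ("Ada", 36))  # placeholder style
conn.commit()

cur.execute("SELECT name, age FROM users")
rows = cur.fetchall()
print(rows)  # [('Ada', 36)]

conn.close()
```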

Step 1: Import and Install

Install the package:

uv add "psycopg[binary]"

Import in Python:

import psycopg

Step 2: Create Connection

Connection string format (URI):

conn = psycopg.connect(
    'postgresql://user:password@localhost:5432/dbname'
)

Or with parameters (conninfo):

conn = psycopg.connect(
    conninfo=(
      'user=user password=secret '
      'host=localhost port=5432 dbname=testdb')
)

Connection String Sources

Two places connection info can come from:

  1. Environment variables: psycopg (via libpq) reads PGHOST, PGUSER, PGPASSWORD, PGDATABASE, etc.
  2. Connection string parameters (URI or conninfo)

Note: the POSTGRES_* variables in our .env configure the Docker image; psycopg does not read those names itself, so pass their values explicitly.
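One way to bridge the two is to load the .env file yourself and build a conninfo string from it. A minimal standard-library sketch, assuming the POSTGRES_* key names from the .env above (real projects often use the python-dotenv package instead; load_env and conninfo_from_env are illustrative helper names):

```python
from pathlib import Path


def load_env(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def conninfo_from_env(env: dict[str, str]) -> str:
    """Map the Docker-style POSTGRES_* names onto psycopg conninfo keys."""
    return (
        f"host={env['POSTGRES_HOST']} dbname={env['POSTGRES_DB']} "
        f"user={env['POSTGRES_USER']} password={env['POSTGRES_PASSWORD']}"
    )
```

With these in place, the connection becomes psycopg.connect(conninfo_from_env(load_env(".env"))).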

Step 3: Working with Data

Create a cursor (cur) from the connection (conn):

with conn.cursor() as cur:
    cur.execute("SELECT name, age FROM users WHERE age > 18")
    rows = cur.fetchall()

Why execute AND fetch?

According to the PEP 249 specification:

  • execute(): Submits the SQL to the database, parses it, and identifies the matching rows. The database creates a “result set” or a pointer (cursor) to those rows.
  • fetch*(): These methods are the actual data transfer mechanism. They move the data from the database server/buffer into your Python environment as objects (tuple or list).
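This split matters for large result sets: execute() identifies the rows, and you can then transfer them in batches instead of materializing everything at once. A sketch using the stdlib sqlite3 driver (the same PEP 249 calls apply to psycopg):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE numbers (n INTEGER)")
cur.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(10)])

# execute() runs the query; no rows have crossed into Python yet
cur.execute("SELECT n FROM numbers ORDER BY n")

batches = []
while True:
    batch = cur.fetchmany(4)  # transfer at most 4 rows per call
    if not batch:             # empty list means the result set is exhausted
        break
    batches.append(batch)

print([len(b) for b in batches])  # [4, 4, 2]
conn.close()
```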

Reading Data

Four ways to fetch rows:

# Fetch one row
row = cur.fetchone()

# Fetch many rows
rows = cur.fetchmany(10)

# Fetch all rows
all_rows = cur.fetchall()

# Or iterate
for row in cur:
    print(row)

Writing Data

Using transactions:

with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO test (num, data) VALUES (%s, %s)",
            (100, "abc'def")
        )
        # Transaction commits automatically
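The flip side of the automatic commit is the automatic rollback: if an exception escapes the block, none of the statements take effect. The all-or-nothing behavior can be sketched with sqlite3, managing the transaction explicitly (the failing second statement is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (num INTEGER, data TEXT)")
conn.commit()

try:
    cur = conn.cursor()
    cur.execute("INSERT INTO test VALUES (?, ?)", (100, "abc'def"))
    cur.execute("INSERT INTO broken_table VALUES (1)")  # fails: no such table
    conn.commit()
except sqlite3.OperationalError:
    conn.rollback()  # undoes the first INSERT too: all-or-nothing

count = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(count)  # 0 -- the whole transaction was rolled back
```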

Problems with Raw Drivers

What psycopg doesn’t solve:

  1. SQL written as plain strings (raw SQL) → typos and syntax errors surface only at runtime
  2. Nothing prevents building SQL by string concatenation instead of placeholders → SQL injection risk
  3. No built-in connection pooling → creating a new connection per query is expensive
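The injection risk is worth seeing concretely: SQL built by string formatting lets input rewrite the query, while placeholders keep input as data. A sketch with sqlite3 (the same distinction applies to psycopg's %s placeholders; the data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and matches every row
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# Safe: the driver sends the input as a bound value, never as SQL
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe), len(safe))  # 2 0
```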

Part 3: SQLAlchemy Framework

What is SQLAlchemy?

Python’s most popular database framework

  • Raises database work to a higher level
  • Solves driver limitations
  • Two parts: Core and ORM

Figure: A diagram showing raw SQL transforming through SQLAlchemy into clean Python code.


The Engine

Central component: manages connections

import os

from sqlalchemy import create_engine, URL

engine = create_engine(
    url=URL.create(
        "postgresql+psycopg",
        username=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOST"),
        database=os.getenv("POSTGRES_DB"),
    )
)

Usually a singleton (one per database)

Dialect vs Driver

Connection string: postgresql+psycopg

  • Dialect (postgresql): SQL language variant
    • Also: mysql, sqlite, mssql, oracle
  • Driver (psycopg): Low-level communication
    • Also: asyncpg, pymysql, pyodbc

Separation allows flexibility

Defining Schema with Metadata

Metadata: describes your database structure

from sqlalchemy.schema import MetaData, Table, Column
from sqlalchemy.types import Integer, String, Date

metadata = MetaData()

users_table = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100)),
    Column("birth_date", Date)
)

Creating Tables

Apply definitions to the database:

metadata.create_all(engine)

Translates to SQL:

CREATE TABLE users (
    id INTEGER NOT NULL,
    name VARCHAR(100),
    birth_date DATE,
    PRIMARY KEY (id)
);

Foreign Keys

Defining relationships:

from sqlalchemy.schema import ForeignKey

user_emails_table = Table(
    "user_emails",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", ForeignKey("users.id"), nullable=False),
    Column("email", String, nullable=False),
)

Inserting Data

Building insert statements:

import datetime

from sqlalchemy.sql.expression import insert

stmt = (
    insert(users_table)
    .values(
        name="Adam Banana",
        birth_date=datetime.date(1990, 1, 1)
    )
    .returning(users_table.c.id)
)

Executing Statements

Sending to database:

with engine.connect() as conn:
    result = conn.execute(stmt)
    conn.commit()
    row = result.fetchone()
    user_id = row[0]

Selecting Data

Building SELECT queries:

from sqlalchemy.sql.expression import select

stmt = (
    select(users_table.c.name, users_table.c.birth_date)
    .where(users_table.c.id == user_id)
)

with engine.connect() as conn:
    result = conn.execute(stmt)
    row = result.fetchone()

Aggregations

Using SQL functions:

from sqlalchemy.sql.functions import count

stmt = (
    select(
        user_emails_table.c.user_id,
        count(user_emails_table.c.email)
    )
    .group_by(user_emails_table.c.user_id)
)

Raw SQL When Needed

SQLAlchemy allows raw SQL:

from sqlalchemy import text

sql_text = text(
    "SELECT name, salary FROM employees "
    "WHERE department = :dept AND salary > :min_sal"
)

result = conn.execute(
    sql_text,
    {"dept": "Sales", "min_sal": 50000}
)

Part 4: Database Migrations

The Problem

Schema changes (CREATE TABLE, ALTER TABLE, DROP TABLE, etc.) applied manually are dangerous:

  • data loss risk, hard to roll back (undo)
  • not reproducible, out-of-sync between:
    • different environments (development, staging, production)
    • different developers working on the same codebase
  • not auditable, no history of:
    • what was changed
    • who changed it
    • when it was changed
    • why it was changed

Version Control for Code

Git tracks code changes:

  • Line-by-line history
  • Time travel (undo)
  • Review before merging
  • Collaboration

We need the same for database schema.

Figure: A split-view document showing code diffs with red deletions and green additions under a branching icon.


Migration Scripts

Solution: Migration scripts

  • Each change is a script
  • Tracked in git
  • Reversible (up and down)
  • Applied automatically

Tool: dbmate

Installing dbmate

On Linux:

sudo curl -fsSL -o /usr/local/bin/dbmate \
  https://github.com/amacneil/dbmate/releases/latest/download/dbmate-linux-amd64
sudo chmod +x /usr/local/bin/dbmate

Configuration

Set DATABASE_URL in .env:

DATABASE_URL="postgres://user:password@127.0.0.1:5432/testdb?sslmode=disable"

dbmate reads this automatically

Creating a Migration File

Generate new migration file:

dbmate new "add_users_table"

Creates a file with two sections:

-- migrate:up


-- migrate:down

Writing the Migration SQL

We write the SQL for both directions. This ensures we can always roll back.

-- migrate:up
CREATE TABLE users (id SERIAL PRIMARY KEY);

-- migrate:down
DROP TABLE users;
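What a tool like dbmate does with this file can be illustrated in a few lines: split the script on its -- migrate:up / -- migrate:down markers and run the relevant section. A toy sketch against sqlite3, with SERIAL swapped for SQLite's INTEGER PRIMARY KEY so it runs without PostgreSQL (dbmate itself also records applied versions in a schema_migrations table, which this omits):

```python
import sqlite3

# A dbmate-style migration file as a string, for illustration only
MIGRATION = """
-- migrate:up
CREATE TABLE users (id INTEGER PRIMARY KEY);

-- migrate:down
DROP TABLE users;
"""


def split_migration(script: str) -> tuple[str, str]:
    """Return the (up, down) SQL sections of a dbmate-style migration."""
    up_part, _, down_part = script.partition("-- migrate:down")
    up_sql = up_part.split("-- migrate:up", 1)[1]
    return up_sql.strip(), down_part.strip()


def tables(conn: sqlite3.Connection) -> list[str]:
    """List the table names currently in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
up, down = split_migration(MIGRATION)

conn.executescript(up)    # like `dbmate up`: apply the change
print(tables(conn))       # ['users']

conn.executescript(down)  # like `dbmate down`: roll it back
print(tables(conn))       # []
```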

Applying Migrations

Run pending migrations:

dbmate up

Roll back last migration:

dbmate down

Check status:

dbmate status

Version Controlled Migrations Workflow (1/2)

1. Create migration:

dbmate new "add_email_column"

2. Write up and down SQL:

-- migrate:up
ALTER TABLE users ADD COLUMN email VARCHAR;

-- migrate:down
ALTER TABLE users DROP COLUMN email;

Version Controlled Migrations Workflow (2/2)

3. Update Python models:

users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("email", String),  # NEW
)

4. Commit both together:

git add db/migrations/xxx_add_email_column.sql
git add src/models.py
git commit -m "schema: add email column to users"

Key Takeaways

1. Database Setup

  • Containers solve environment problems
    • Package everything together
    • Isolated from conflicts
    • Same environment everywhere
  • Docker Compose simplifies orchestration
    • Define services in YAML
    • Volumes persist data
    • One command to start/stop

2. Database Connections

  • DB-API 2.0 is the standard
    • psycopg implements it for PostgreSQL
    • Patterns transfer to other databases
  • Transactions ensure data integrity
    • Automatic commit/rollback
    • Context managers handle cleanup

3. SQLAlchemy Benefits

  • Type-safe database code
    • Python objects instead of SQL strings
    • IDE autocomplete and error checking
  • Connection pooling
    • Reuses expensive connections
    • Better performance
  • Database portability
    • Switch databases with minimal code changes
    • Dialect handles SQL differences

4. Migration Best Practices

  • Version control your schema
    • Every change is a script
    • Tracked in git
    • Reversible
  • Keep migrations and models in sync
    • Commit migration and code changes together
    • Ensures consistency across environments

The Complete Picture

Figure: A system diagram connecting a Dockerized PostgreSQL database, Python app via SQLAlchemy, and Git-tracked migrations.


You now have a professional workflow: a containerized PostgreSQL database managed with Docker Compose, a Python application connected through psycopg and SQLAlchemy, and schema changes tracked as dbmate migrations in git.