AI Pros Bootcamp – Connecting to Backend Databases

The Hook

Figure: A stylized pipeline connecting a database cylinder to a Python logo, symbolizing a robust connection.

How do I direct my SQL commands to a database system and connect my Python application to it?

Today’s Journey (1/2)

Setting Up the Database
- Environment setup and virtualization
- Docker containers vs virtual machines
- Running PostgreSQL with Docker Compose
Connecting the Application
- Using psycopg driver
- DB-API 2.0 protocol
- Transactions and cursors

Today’s Journey (2/2)

SQLAlchemy Framework
- Core concepts: Engine, Metadata, Tables
- Building queries in Python
- Connection pooling
Database Migrations
- Version control for schema
- Migration scripts with dbmate
- Safe schema evolution

Part 1: Setting Up the Database

The Problem: Environment Complexity

Software needs an environment

CPU architecture (x86/x64, ARM)
Operating system
Memory and I/O devices
Installed binaries and libraries
Configuration files

The challenge: Conflicts between different software requirements

The Solution: Containerization

Docker packages everything together

Application binaries
Libraries and dependencies
Configuration files
All in one isolated container

Figure: A transparent shipping container revealing neatly packed binaries, libraries, and configurations. (Click to Enlarge) — Figure: A transparent shipping container revealing neatly packed binaries, libraries, and configurations. **(Click to Enlarge)**

Benefits of Containerization

Better development experience
- Focus on real problems, not environment setup
Easier collaboration
- Every developer has the same environment
Project isolation
- Switch between projects without conflicts
Simplified deployment
- Same container works in development and production

Virtualization Background

Virtualization (1966)
- Dividing expensive mainframes into multiple virtual computers
- Foundation of cloud computing
Containerization (2008-2013)
- Internal project at dotCloud (France)
- Open-sourced as Docker by Solomon Hykes (March 2013)

Figure: A timeline showing the evolution from mainframes to virtual machines and finally to containers.

Dockerfile vs Image vs Container

Dockerfile: A recipe for building a Docker image

Image: A template for creating a container

Container: A running instance of an image

Container vs Virtual Machine

Both are isolated environments from an image

Difference: What the image contains

VM Image: includes a full operating system
Container Image: shares the host OS kernel

Containers are lighter

Less resource consumption
Faster startup/shutdown
Easier to manage

Running PostgreSQL with Docker

Step 1: Create environment file (.env)

POSTGRES_HOST=127.0.0.1
POSTGRES_DB=mydatabase
POSTGRES_USER=user
POSTGRES_PASSWORD=secret

Docker Compose Configuration

Step 2: Define services in compose.yaml

services:
  db:
    image: postgres:17.6
    env_file:
      - .env
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Understanding Docker Compose Elements

image: postgres:17.6 - Image name and version from DockerHub
env_file - References our .env file
ports: "5432:5432" - Maps host port to container port
volumes: pgdata - Persists data outside the container

Starting and Stopping

Start the database:

docker compose up

Stop the database:

docker compose down

Connecting to the Database

Many Approaches:

CLI: psql (text interface)
GUI: Adminer or VS Code Database Client
DB-API: psycopg driver and sqlalchemy framework
MCP: DBHub (Bytebase) MCP server for agentic access

Part 2: Connecting the Application

The Question

How do I connect a Python application to the database to query and store data?

Answer: Use the psycopg package

DB-API 2.0 Protocol

Standardized database interface in Python: fetchone(), fetchmany(), fetchall(), execute(), commit(), rollback(), etc.

psycopg implements this protocol for PostgreSQL
Works with other databases: SQLite, MySQL, etc.

Step 1: Import and Install

Install the package:

uv add "psycopg[binary]"

Import in Python:

import psycopg

Step 2: Create Connection

Connection string format (URI):

conn = psycopg.connect(
    'postgresql://user:password@localhost:5432/dbname'
)

Or with parameters (conninfo):

conn = psycopg.connect(
    conninfo=(
      'user=user password=secret '
      'host=localhost port=5432 dbname=testdb')
)

Connection String Sources

Two places connection info can come from:

Environment variables (from .env file)
Connection string parameters (URI or conninfo)

Both are checked by psycopg

Step 3: Working with Data

Create a cursor (`cur`) from the connection (`conn`):

with conn.cursor() as cur:
    cur.execute("SELECT name, age FROM users WHERE age > 18")
    rows = cur.fetchall()

Why `execute` AND `fetch`?

According to the PEP 249 specification:

execute(): Submits the SQL to the database, parses it, and identifies the matching rows. The database creates a “result set” or a pointer (cursor) to those rows.
fetch*(): These methods are the actual data transfer mechanism. They move the data from the database server/buffer into your Python environment as objects (tuple or list).

Reading Data

Four ways to fetch rows:

# Fetch one row
row = cur.fetchone()

# Fetch many rows
rows = cur.fetchmany(10)

# Fetch all rows
all_rows = cur.fetchall()

# Or iterate
for row in cur:
    print(row)

Writing Data

Using transactions:

with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO test (num, data) VALUES (%s, %s)",
            (100, "abc'def")
        )
        # Transaction commits automatically

Problems with Raw Drivers

What psycopg doesn’t solve:

SQL written as strings (raw SQL) → typos, syntax errors
Parameter passing (not using placeholders) → SQL injection risks
Connection management (not using connection pooling) → expensive to create connections

Part 3: SQLAlchemy Framework

What is SQLAlchemy?

Python’s most popular database framework

Raises database work to a higher level
Solves driver limitations
Two parts: Core and ORM

Figure: A diagram showing raw SQL transforming through SQLAlchemy into clean Python code.

The Engine

Central component: manages connections

from sqlalchemy import create_engine, URL

engine = create_engine(
    url=URL.create(
        "postgresql+psycopg",
        username=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOST"),
        database=os.getenv("POSTGRES_DB"),
    )
)

Usually a singleton (one per database)

Dialect vs Driver

Connection string: postgresql+psycopg

Dialect (postgresql): SQL language variant
- Also: mysql, sqlite, mssql, oracle
Driver (psycopg): Low-level communication
- Also: asyncpg, pymysql, pyodbc

Separation allows flexibility

Defining Schema with Metadata

Metadata: describes your database structure

from sqlalchemy.schema import MetaData, Table, Column
from sqlalchemy.types import Integer, String, Date

metadata = MetaData()

users_table = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100)),
    Column("birth_date", Date)
)

Creating Tables

Apply definitions to the database:

metadata.create_all(engine)

Translates to SQL:

CREATE TABLE users (
    id INTEGER NOT NULL,
    name VARCHAR(100),
    birth_date DATE,
    PRIMARY KEY (id)
);

Foreign Keys

Defining relationships:

from sqlalchemy.schema import ForeignKey

user_emails_table = Table(
    "user_emails",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", ForeignKey("users.id"), nullable=False),
    Column("email", String, nullable=False),
)

Inserting Data

Building insert statements:

from sqlalchemy.sql.expression import insert

stmt = (
    insert(users_table)
    .values(
        name="Adam Banana",
        birth_date=datetime.date(1990, 1, 1)
    )
    .returning(users_table.c.id)
)

Executing Statements

Sending to database:

with engine.connect() as conn:
    result = conn.execute(stmt)
    conn.commit()
    row = result.fetchone()
    user_id = row[0]

Selecting Data

Building SELECT queries:

from sqlalchemy.sql.expression import select

stmt = (
    select(users_table.c.name, users_table.c.birth_date)
    .where(users_table.c.id == user_id)
)

with engine.connect() as conn:
    result = conn.execute(stmt)
    row = result.fetchone()

Aggregations

Using SQL functions:

from sqlalchemy.sql.functions import count

stmt = (
    select(
        user_emails_table.c.user_id,
        count(user_emails_table.c.email)
    )
    .group_by(user_emails_table.c.user_id)
)

Raw SQL When Needed

SQLAlchemy allows raw SQL:

from sqlalchemy import text

sql_text = text(
    "SELECT name, salary FROM employees "
    "WHERE department = :dept AND salary > :min_sal"
)

result = conn.execute(
    sql_text,
    {"dept": "Sales", "min_sal": 50000}
)

Part 4: Database Migrations

The Problem

Schema changes (CREATE TABLE, ALTER TABLE, DROP TABLE, …etc.) applied manually are dangerous:

data loss risk, hard to roll back (undo)
not reproducible, out-of-sync between:
- different environments (development, staging, production)
- different developers working on the same codebase
not auditable, no history of:
- what was changed
- who changed it
- when it was changed
- why it was changed

Version Control for Code

Git tracks code changes:

Line-by-line history
Time travel (undo)
Review before merging
Collaboration

We need the same for database schema.

Figure: A split-view document showing code diffs with red deletions and green additions under a branching icon.

Migration Scripts

Solution: Migration scripts

Each change is a script
Tracked in git
Reversible (up and down)
Applied automatically

Tool: dbmate

Installing `dbmate`

On Linux:

sudo curl -fsSL -o /usr/local/bin/dbmate \
  https://github.com/amacneil/dbmate/releases/latest/download/dbmate-linux-amd64
sudo chmod +x /usr/local/bin/dbmate

Configuration

Set DATABASE_URL in .env:

DATABASE_URL="postgres://user:password@127.0.0.1:5432/testdb?sslmode=disable"

dbmate reads this automatically

Creating a Migration File

Generate new migration file:

dbmate new "add_users_table"

Creates a file with two sections:

-- migrate:up


-- migrate:down

Writing the Migration SQL

We write the SQL for both directions. This ensures we can always roll back.

-- migrate:up
CREATE TABLE users (id SERIAL PRIMARY KEY);

-- migrate:down
DROP TABLE users;

Applying Migrations

Run pending migrations:

dbmate up

Roll back last migration:

dbmate down

Check status:

dbmate status

Version Controlled Migrations Workflow (1/2)

1. Create migration:

dbmate new "add_email_column"

2. Write up and down SQL:

-- migrate:up
ALTER TABLE users ADD COLUMN email VARCHAR;

-- migrate:down
ALTER TABLE users DROP COLUMN email;

Version Controlled Migrations Workflow (2/2)

3. Update Python models:

users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("email", String),  # NEW
)

4. Commit both together:

git add db/migrations/xxx_add_email_column.sql
git add src/models.py
git commit -m "schema: add email column to users"

Key Takeaways

1. Database Setup

Containers solve environment problems
- Package everything together
- Isolated from conflicts
- Same environment everywhere
Docker Compose simplifies orchestration
- Define services in YAML
- Volumes persist data
- One command to start/stop

2. Database Connections

DB-API 2.0 is the standard
- psycopg implements it for PostgreSQL
- Patterns transfer to other databases
Transactions ensure data integrity
- Automatic commit/rollback
- Context managers handle cleanup

3. SQLAlchemy Benefits

Type-safe database code
- Python objects instead of SQL strings
- IDE autocomplete and error checking
Connection pooling
- Reuses expensive connections
- Better performance
Database portability
- Switch databases with minimal code changes
- Dialect handles SQL differences

4. Migration Best Practices

Version control your schema
- Every change is a script
- Tracked in git
- Reversible
Keep migrations and models in sync
- Commit migration and code changes together
- Ensures consistency across environments

The Complete Picture

Figure: A system diagram connecting a Dockerized PostgreSQL database, Python app via SQLAlchemy, and Git-tracked migrations.

You now have a Professional workflow:

Isolated database environment (PostgreSQL in Docker)
Type-safe database code (sqlalchemy with psycopg)
Version-controlled schema (dbmate and git)