Machine Learning

AI Professionals Bootcamp | Week 3

2025-12-31

Announcements / admin

  • Today’s theme: training is not done until prediction works
  • We will use yesterday’s artifacts:
    • schema/input_schema.json
    • tables/holdout_input.*
  • Goal: ml-baseline predict works on new files with guardrails

Note

Don’t commit generated artifacts: models/runs/, outputs/, data/processed/.

Day 4: Predict CLI + inference contracts

Goal: run ml-baseline predict on new data reliably (schema guardrails + correct outputs).

Bootcamp • SDAIA Academy

Today’s Flow

  • Session 1 (60m): Inference mental model (inputs, outputs, thresholds)
  • Asr Prayer (20m)
  • Session 2 (60m): Schema validation + alignment (fail fast)
  • Maghrib Prayer (20m)
  • Session 3 (60m): Predict command end-to-end (run registry + sanity checks)
  • Isha Prayer (20m)
  • Hands-on (120m): Implement/verify predict and test on holdout + “new” files

Learning Objectives

By the end of today, you can:

  • Explain the inference contract (what goes in, what comes out)
  • Run ml-baseline predict on holdout_input.* successfully
  • Enforce input_schema.json (missing required columns → error; forbidden columns → error)
  • Preserve optional ID columns in prediction outputs
  • Do 3 sanity checks on prediction outputs (rows, columns, ranges)
  • Add 1–2 helpful failure messages that save debugging time

Warm-up (5 minutes)

Run prediction on your saved holdout input.

macOS/Linux

run_id=$(cat models/registry/latest.txt)
holdout=$(ls models/runs/$run_id/tables/holdout_input.* | head -n 1)
uv run ml-baseline predict --run latest --input "$holdout" --output outputs/preds.csv
head -n 6 outputs/preds.csv

Windows PowerShell

$run_id = Get-Content models/registry/latest.txt
$holdout = (Get-ChildItem "models/runs/$run_id/tables" -Filter "holdout_input.*" | Select-Object -First 1).FullName
uv run ml-baseline predict --run latest --input $holdout --output outputs/preds.csv
Get-Content outputs/preds.csv -TotalCount 6

Checkpoint: you produced outputs/preds.csv and it has prediction (and maybe score).

Where today fits in the Week 3 loop

Define → Split → Baseline → Train → Evaluate → Save → Predict → Report

Today: Predict uses yesterday’s schema + holdout_input.

Session 1

Inference mental model (inputs, outputs, thresholds)

Session 1 objectives

  • Define “inference” in 1 sentence
  • Describe the input contract (no target; required features must exist)
  • Describe the output contract (predictions file)
  • Understand what a threshold does (classification)

Training vs inference (the only difference that matters)

Training

  • input: features + target (y)
  • output: a saved model (pipeline)
  • you compute metrics (holdout)

Inference

  • input: features only
  • output: predictions file
  • you must handle missing/extra columns safely

Warning

If the target leaks into inference input, your system becomes “cheat mode” and metrics become meaningless.

Inference contract (what goes in / what comes out)

Input file (--input)

  • must include: schema.required_feature_columns
  • may include: schema.optional_id_columns (pass-through)
  • must NOT include: schema.forbidden_columns (usually the target)

Output file (--output)

  • always includes: prediction
  • classification also includes: score
  • includes ID columns if they were provided in the input

Classification outputs: score vs prediction

Score

  • a probability-like number
  • used to rank cases

Prediction

  • a 0/1 decision
  • made by score >= threshold

Example

score   threshold   prediction
0.82    0.50        1
0.49    0.50        0
0.82    0.90        0

The same score can become a different decision if you change the threshold.
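In code, a threshold is just a comparison. A minimal pandas sketch (scores taken from the table above; variable names are illustrative):

```python
import pandas as pd

# Scores from the model (same values as the table above)
scores = pd.Series([0.82, 0.49, 0.82])

# The threshold turns each score into a 0/1 decision.
preds_050 = (scores >= 0.50).astype(int)  # threshold 0.50 → 1, 0, 1
preds_090 = (scores >= 0.90).astype(int)  # threshold 0.90 → 0, 0, 0
```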

Thresholds: keep it simple today

Minimum ✅

  • use a fixed threshold (0.50) for classification

Optional ⭐

  • choose a threshold that maximizes F1 (max_f1)
  • choose a threshold that meets a business rule (e.g., precision ≥ 0.80)

Tip

Pick one decision policy and document it in your model card.

Quick Check

Question: If we increase the threshold from 0.50 to 0.80, what usually happens?

  1. More predicted positives, higher recall
  2. Fewer predicted positives, higher precision (often)
  3. Nothing changes

Answer: 2 (fewer predicted positives; precision often increases, recall often decreases).

Micro-exercise: design your prediction output (6 minutes)

In pairs:

  1. Decide which columns your outputs/preds.csv must contain
  2. Decide which columns should be “pass-through” IDs
  3. Write one sentence: “A teammate can use this output to ___.”

Checkpoint: you can explain score vs prediction in one sentence.

Solution (example)

  • Classification output columns: user_id (optional), score, prediction
  • Pass-through IDs: columns like user_id, customer_id, transaction_id
  • Sentence: “A teammate can use this file to rank customers by score and take action on predicted positives.”

Session 1 recap

  • Inference = run a saved model on features-only input
  • Output = predictions (and score for classification)
  • Threshold converts score → decision

Asr break

20 minutes

When you return: open schema/input_schema.json in your latest run.

Session 2

Schema validation + alignment (fail fast)

Session 2 objectives

  • Explain why we validate inference inputs
  • Implement/verify validate_and_align(df, schema)
  • Fail fast on:
    • forbidden columns
    • missing required features
  • Preserve optional IDs in the output

Real-world CSV problems (that break models)

  • A required column is missing
  • A column name changed (typo / rename)
  • The target column was accidentally included
  • A numeric column arrives as text ("12", "N/A")

Tip

Your schema turns “mysterious errors” into clear messages.

What validate_and_align(...) must do

input df
  → check forbidden (fail)
  → check missing required (fail)
  → select optional IDs (passthrough)
  → coerce dtypes (simple)
  → return (X, ids)

This function is the “seatbelt” for your predict command.

Example: validation rules

If inference input contains:

  • is_high_value (target) → error: forbidden column
    • missing avg_spend_30d → error: missing required feature
  • extra column notes → ignore (unless you choose to fail) ⭐ optional

Warning

Do not silently fill missing required features. Fail fast.

Friendly failures: prefer ValueError over assert

Hard to read

  • assert not missing
  • can be skipped with Python optimizations (python -O)
  • the failure message might be unclear

Better

  • raise ValueError("Missing required columns: ...")
  • always runs
  • clear for teammates

Asserts are okay for learning — but clear errors are better for shipping.
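A minimal sketch of the "Better" style (the helper name check_forbidden is hypothetical, not part of the project):

```python
def check_forbidden(columns, forbidden) -> None:
    """Fail fast with a message that names the exact bad columns."""
    present = [c for c in columns if c in forbidden]
    if present:
        raise ValueError(f"Forbidden columns present in inference input: {present}")
```

Because the message lists the offending columns, a teammate can fix the input file without reading your source code.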

Micro-exercise: what should the error say? (6 minutes)

You receive this inference file columns:

["user_id", "country", "avg_spend_30d", "is_high_value"]
  1. What is the problem?
  2. Write the error message you want a teammate to see.

Checkpoint: your message includes the exact bad column name.

Solution (example)

  • Problem: target column is_high_value is present (forbidden)
  • Example error message:
    • Forbidden columns present in inference input: ['is_high_value']

Quick Check

Question: Should we automatically add a missing required feature column as zeros?

Answer: Usually no. That hides data problems and can silently degrade predictions.

Session 2 recap

  • Schema validation prevents silent bugs
  • Fail fast on forbidden/missing columns
  • Return (X, ids) so IDs can be preserved in outputs

Maghrib break

20 minutes

When you return: be ready to run predict on a file with an intentional mistake.

Session 3

Predict end-to-end (run registry + sanity checks)

Session 3 objectives

  • Explain what a run folder is and why we load from it
  • Use --run latest to predict without guessing paths
  • Do sanity checks on prediction outputs

Run folders: prediction must be reproducible

A trained run lives at:

models/runs/<run_id>/
  model/model.joblib
  schema/input_schema.json
  run_meta.json

And the “pointer” lives at:

models/registry/latest.txt

Tip

latest.txt lets you predict without copy/pasting long paths.

Predict command anatomy

Example

uv run ml-baseline predict \
  --run latest \
  --input data/processed/features.csv \
  --output outputs/preds.csv

Under the hood:

  1. resolve run dir (latest → models/runs/<run_id>)
  2. load schema + model
  3. read input table
  4. validate + align columns
  5. predict + write output
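The five steps above can be sketched as one function. This is a simplified, hypothetical version of run_predict: it assumes CSV input and the schema field names from Session 2, and keeps only minimal column alignment (the fail-fast guardrails live in validate_and_align):

```python
import json
from pathlib import Path

import joblib
import pandas as pd

def run_predict(run_dir: Path, input_path: Path, output_path: Path) -> None:
    # 1-2. load schema + model from the run folder
    schema = json.loads((run_dir / "schema" / "input_schema.json").read_text())
    model = joblib.load(run_dir / "model" / "model.joblib")
    # 3. read the input table
    df = pd.read_csv(input_path)
    # 4. align: required features in schema order; optional IDs pass through
    #    (full fail-fast validation belongs in validate_and_align)
    X = df[schema["required_feature_columns"]]
    ids = df[[c for c in schema.get("optional_id_columns", []) if c in df.columns]]
    # 5. predict and write predictions next to the pass-through IDs
    out = ids.copy()
    out["prediction"] = model.predict(X)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    out.to_csv(output_path, index=False)
```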

Sanity checks after predict

Minimum checks you should do every time:

  1. Row count: output rows == input rows
  2. Columns: output contains expected columns
  3. Ranges:
    • classification: 0 ≤ score ≤ 1
    • regression: predictions are finite (no all-NaN)

Tip

If row counts don’t match, stop. Something is wrong.
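These checks are easy to script. A quick sketch (assumes CSV files and the output contract above; for a throwaway check script, plain asserts are fine):

```python
import pandas as pd

def sanity_check(input_path, output_path) -> None:
    """Quick post-predict checks; raises AssertionError on any violation."""
    inp = pd.read_csv(input_path)
    out = pd.read_csv(output_path)
    # 1. row count: every input row got a prediction
    assert len(out) == len(inp), f"row mismatch: {len(out)} != {len(inp)}"
    # 2. columns: the output contract is honored
    assert "prediction" in out.columns, "output is missing 'prediction'"
    # 3. ranges: classification scores must live in [0, 1]
    if "score" in out.columns:
        assert out["score"].between(0.0, 1.0).all(), "score outside [0, 1]"
```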

Optional ⭐: a tiny “skew check” idea

Not required today, but good to know:

  • Compare today’s inference input to yesterday’s holdout_input:
    • missingness rates
    • numeric ranges (min/max)
    • new/unseen categories

We’ll do richer checks in Week 7 (MLOps). Today we just ship predict.
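If you are curious, a tiny version of such a check might look like this (not required today; skew_report is a hypothetical helper comparing missingness and numeric ranges):

```python
import pandas as pd

def skew_report(train_path, new_path) -> pd.DataFrame:
    """Compare missingness and numeric ranges between two tables."""
    train = pd.read_csv(train_path)
    new = pd.read_csv(new_path)
    rows = []
    for col in train.columns.intersection(new.columns):
        row = {"column": col,
               "train_missing": train[col].isna().mean(),
               "new_missing": new[col].isna().mean()}
        if pd.api.types.is_numeric_dtype(train[col]):
            row.update(train_min=train[col].min(), train_max=train[col].max(),
                       new_min=new[col].min(), new_max=new[col].max())
        rows.append(row)
    return pd.DataFrame(rows)
```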

Session 3 recap

  • A run folder contains everything needed for inference
  • --run latest uses models/registry/latest.txt
  • Always do sanity checks after prediction

Isha break

20 minutes

When you return: start Hands-on Task 1 immediately.

Hands-on

Implement/verify predict end-to-end

Hands-on success criteria (today)

Minimum ✅

  • uv run ml-baseline predict --run latest ... writes an output file
  • Input schema is enforced:
    • forbidden target column → clear error
    • missing required feature → clear error
  • Output includes optional IDs if they exist in the input
  • uv run pytest passes
  • 1+ commit pushed to GitHub

Optional ⭐

  • Improve error messages (ValueError + actionable hint)
  • Add --threshold override behavior (classification)
  • Add a tiny skew-check script (missingness + ranges)

Project touch points (Day 4)

src/ml_baseline/
  predict.py     # run_predict + resolve_run_dir
  schema.py      # InputSchema + validate_and_align
  io.py          # read_tabular / write_tabular
models/runs/<run_id>/
  model/
  schema/
outputs/
  preds.csv

Task 1 — Predict on holdout_input (15 minutes)

  1. Find holdout_input.* inside your latest run
  2. Run predict on it
  3. Inspect the output file

macOS/Linux

run_id=$(cat models/registry/latest.txt)
holdout=$(ls models/runs/$run_id/tables/holdout_input.* | head -n 1)
uv run ml-baseline predict --run latest --input "$holdout" --output outputs/preds.csv
head -n 6 outputs/preds.csv

Windows PowerShell

$run_id = Get-Content models/registry/latest.txt
$holdout = (Get-ChildItem "models/runs/$run_id/tables" -Filter "holdout_input.*" | Select-Object -First 1).FullName
uv run ml-baseline predict --run latest --input $holdout --output outputs/preds.csv
Get-Content outputs/preds.csv -TotalCount 6

Checkpoint: output file exists and includes prediction.

Task 2 — Create an intentional failure (10 minutes)

Goal: prove your guardrails work.

  1. Copy holdout_input.* to outputs/bad_input.csv
  2. Add the target column name (or delete one required feature)
  3. Run predict again

Checkpoint: predict fails with a clear message.

Hint: easiest way to create a bad file

  • If you have holdout_input.csv, open it and add a column header:
    • is_high_value
  • Or delete one required feature header
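If hand-editing the CSV is awkward, a few lines of pandas do the same (make_bad_input is a hypothetical helper; column names match the earlier examples):

```python
import pandas as pd

def make_bad_input(holdout_path, bad_path) -> None:
    """Copy the holdout input and inject the forbidden target column."""
    df = pd.read_csv(holdout_path)
    df["is_high_value"] = 0  # forbidden column: predict should now fail fast
    # or instead, drop a required feature:
    # df = df.drop(columns=["avg_spend_30d"])
    df.to_csv(bad_path, index=False)
```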

Tip

You’re testing the contract, not the model.

Task 3 — Verify/implement resolve_run_dir (10 minutes)

Open: src/ml_baseline/predict.py

Ensure:

  • --run latest loads models/registry/latest.txt
  • it returns models/runs/<run_id>
  • it errors clearly when latest doesn’t exist

Checkpoint: ml-baseline show-run latest prints run_meta.json.

Solution (example logic)

from pathlib import Path

def resolve_run_dir(run: str, models_dir: Path) -> Path:
    """Turn the --run argument into a concrete run directory."""
    if run == "latest":
        p = models_dir / "registry" / "latest.txt"
        if not p.exists():
            raise FileNotFoundError("No latest.txt found. Train a model first.")
        run_id = p.read_text(encoding="utf-8").strip()
        return models_dir / "runs" / run_id
    return Path(run).resolve()

Task 4 — Verify/implement validate_and_align (25 minutes)

Open: src/ml_baseline/schema.py

Minimum behavior:

  1. Fail if forbidden columns exist
  2. Fail if required columns are missing
  3. Return:
    • X with required features in schema order
    • ids with optional ID columns (if present)

Checkpoint: predict works on good input and fails on bad input.

Solution pattern (high level)

- forbidden = ...
- missing = ...
- ids = df[optional_ids]
- coerce dtypes (optional)
- X = df[required_features]
- return X, ids

Keep it boring. Reliability beats cleverness.
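Filled in as real Python, the pattern might look like this (a sketch assuming the schema is a plain dict using the field names from Session 2; dtype coercion omitted):

```python
import pandas as pd

def validate_and_align(df: pd.DataFrame, schema: dict):
    # 1. fail if forbidden columns exist
    forbidden = [c for c in schema.get("forbidden_columns", []) if c in df.columns]
    if forbidden:
        raise ValueError(f"Forbidden columns present in inference input: {forbidden}")
    # 2. fail if required columns are missing
    required = schema["required_feature_columns"]
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # 3. IDs pass through (if present); X keeps required features in schema order
    ids = df[[c for c in schema.get("optional_id_columns", []) if c in df.columns]]
    X = df[required]
    return X, ids
```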

Task 5 — Output contract (10 minutes)

After prediction, check:

  • output row count equals input row count
  • columns are correct:
    • classification: score, prediction (+ IDs)
    • regression: prediction (+ IDs)

Checkpoint: you can point to one row and explain it.

Task 6 — Tests + small doc update (15 minutes)

  1. Run tests:

uv run pytest

  2. Update reports/model_card.md:
    • add a short “How to predict” section (1–3 commands)
    • list what inference input must contain

Checkpoint: tests pass and model card explains prediction.

Vibe coding (safe version)

  1. Write the plan in 5 bullets (no code yet)
  2. Implement the smallest piece
  3. Run → break → read error → fix
  4. Commit
  5. Repeat

Warning

Do not ask GenAI to write your solution code. Ask it to explain concepts or errors.

Git checkpoint (2 minutes)

  • git status
  • commit with message: "w3d4: predict cli + schema guardrails"
  • push to GitHub

Checkpoint: repo shows the new commit online.

Debug playbook (predict edition)

  1. Confirm files exist:
    • models/registry/latest.txt
    • models/runs/<run_id>/model/model.joblib
    • models/runs/<run_id>/schema/input_schema.json
  2. Print columns of your input file
  3. Compare with schema required list
  4. If it fails, fix schema validation first

Tip

Most predict bugs are column contract bugs, not model bugs.

Stretch goals (optional ⭐)

  • Add --threshold override in predict (classification)
  • Add a “strict mode” (--strict) to fail on extra columns
  • Add a tiny skew-check script that prints:
    • missingness rate per column
    • numeric min/max

Exit Ticket

In 1–2 sentences each:

  1. What is the difference between a score and a prediction?
  2. Name 2 ways inference input can break a model.
  3. What file makes --run latest possible?

What to do after class (Day 4 assignment)

Due: before Day 5 (Jan 1, 2026)

  1. Run predict on 2 inputs:
    • your holdout_input.*
    • one “new” file you create (copy + edit 5 rows)
  2. Write 3 bullet points in reports/model_card.md:
    • required columns
    • forbidden columns
    • how to run predict
  3. Commit + push

Deliverable: GitHub repo link + screenshot of outputs/preds.csv (first 5 rows).

Tip

Tomorrow you’ll polish reporting + submission. Today is about reliable inference.

Thank You!