AI Professionals Bootcamp | Week 3
2026-01-01
You may use Generative AI only for clarifying questions.
Warning
Today is a submission day. Your repo must reflect your skill.
What you submit today (minimum ✅)

- updated `reports/model_card.md`
- updated `reports/eval_summary.md`
- `uv run pytest` passes
- `uv run ruff check .` passes (install the dev extra if needed)
- pushed to GitHub (public)
Note
Capstone teams + project ideas are finalized by end of Week 5 (Jan 15, 2026).
Goal: turn your working ML system into a reviewable, shippable repo.
Bootcamp • SDAIA Academy
By the end of today, you can:
- `model_card.md` with: problem, data contract, split, metrics, limitations
- `eval_summary.md` with: baseline vs model, key caveats, recommendation
- `ruff` + `pytest` + end-to-end CLI demo

Confirm end-to-end still works on your latest run.
macOS/Linux

```bash
uv run ml-baseline make-sample-data
uv run ml-baseline train --target is_high_value
```

Windows PowerShell

```powershell
$run_id = Get-Content models/registry/latest.txt
$holdout = (Get-ChildItem "models/runs/$run_id/tables" -Filter "holdout_input.*" | Select-Object -First 1).FullName
uv run ml-baseline predict --run latest --input $holdout --output outputs/preds.csv
```

Checkpoint: `outputs/preds.csv` exists.
Your system is “done” when:
- `ml-baseline train` saves a run folder with model + schema + metrics
- `ml-baseline predict` produces a predictions file on new input

Tip
A working model with no documentation is hard to trust.
Model card = data contract + honest story
Update `model_card.md` to match your actual run artifacts.

A model card is a short document that answers:
Note
Think: “If I leave the company, can a teammate understand and rerun this?”
Keep it simple and scannable:
Use your artifacts to fill the model card:
- `schema/input_schema.json`
- `run_meta.json` + `models/registry/latest.txt`
- `metrics/baseline_holdout.json` + `metrics/holdout_metrics.json`
- `tables/holdout_predictions.*`

Tip
Copy numbers from artifacts. Do not “estimate” your metrics.
From your latest run folder, find:
- `run_id`

Checkpoint: you can point to the exact file for each fact.
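A minimal sketch for pulling those facts straight from the artifacts, assuming the layout above (your metric keys will vary):

```python
import json
from pathlib import Path

def load_run_facts(repo_root: str = ".") -> dict:
    """Load the latest run_id and its metrics from the artifact layout above.

    Assumed layout (adjust to your repo):
      models/registry/latest.txt                      -> run_id
      models/runs/<run_id>/metrics/baseline_holdout.json
      models/runs/<run_id>/metrics/holdout_metrics.json
    """
    root = Path(repo_root)
    run_id = (root / "models/registry/latest.txt").read_text().strip()
    metrics_dir = root / "models/runs" / run_id / "metrics"
    return {
        "run_id": run_id,
        "baseline": json.loads((metrics_dir / "baseline_holdout.json").read_text()),
        "model": json.loads((metrics_dir / "holdout_metrics.json").read_text()),
    }

# Usage: facts = load_run_facts(); copy numbers from facts["baseline"] / facts["model"]
```

Copying from the loaded dicts (instead of retyping by memory) is how you avoid "estimated" metrics in the card.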
- `models/registry/latest.txt` (or `run_meta.json`)
- `models/runs/<run_id>/metrics/baseline_holdout.json`
- `models/runs/<run_id>/metrics/holdout_metrics.json`

Write 3–5 bullets that are true now:
Warning
Avoid vague limits like “model may be biased.” Be specific: “performance may drop for ___ because ___.”
When you return: open reports/model_card.md and your latest run folder side-by-side.
Turn artifacts into a clear evaluation summary
- `tables/holdout_predictions.*`

What is `eval_summary.md`? It is a short decision memo:
Keep it short. The goal is clarity, not word count.
Choose 1–2 “primary” metrics.
Classification (common)

- F1 or recall/precision (pick based on the decision)
- ROC-AUC is OK as a secondary metric

Regression (common)

- MAE as primary
Tip
Report the baseline metric next to the model metric.
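A small helper can render the two numbers side by side for the write-up (the metric names below are placeholders; use whatever keys your JSON files contain):

```python
def metrics_table(baseline: dict, model: dict) -> str:
    """Render baseline vs model metrics as a Markdown table (shared keys only)."""
    rows = ["| metric | baseline | model |", "|---|---|---|"]
    for key in sorted(baseline.keys() & model.keys()):
        rows.append(f"| {key} | {baseline[key]:.3f} | {model[key]:.3f} |")
    return "\n".join(rows)

# Example with made-up numbers:
print(metrics_table({"f1": 0.41, "roc_auc": 0.50}, {"f1": 0.68, "roc_auc": 0.81}))
```

Pasting the output directly into `eval_summary.md` keeps the baseline comparison next to every model number.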
Open your holdout_metrics.json and answer:
Checkpoint: your sentence is understandable by a non-ML teammate.
Use holdout_predictions.* to find:
Note
You don’t need perfect analysis. You need evidence you looked.
If your holdout_metrics.json includes a CI field (example: roc_auc_ci or mae_ci):
Optional items do not block submission.
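If you do report a CI, a tiny formatter keeps the write-up consistent. This assumes the CI field is a `[low, high]` pair at a 95% level; check your own `holdout_metrics.json`:

```python
def format_metric_with_ci(name: str, value: float, ci) -> str:
    """Render a metric with its interval for the report.

    Assumes `ci` is a [low, high] pair and the interval is 95% --
    both are assumptions about your metrics file, not guarantees.
    """
    low, high = ci
    return f"{name} = {value:.3f} (95% CI {low:.3f}-{high:.3f})"

# Example with made-up numbers:
print(format_metric_with_ci("roc_auc", 0.81, [0.78, 0.84]))
```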
Remember: `eval_summary.md` is a decision memo.

When you return: run `python -m json.tool` on your metrics files and copy the numbers into your report.
Final quality gate (tests, ruff, reproducibility)
From repo root:
- `uv run pytest`
- `uv run ruff check .`
- `uv run ruff format .`

Note
If ruff is missing: uv sync --extra dev.
- `src/` package structure
- `input_schema.json`

Tip
When tests fail, read the assertion message first. It usually tells you what artifact is missing.
Answer in one sentence each:
Checkpoint: you can point to the exact file paths.
- `uv run ml-baseline train --target <your_target>`
- `uv run ml-baseline predict --run latest --input <file> --output outputs/preds.csv`
- `reports/model_card.md` and `reports/eval_summary.md`

When you return: start Hands-on Task 1 and don’t stop until the checklist is green.
Fill reports + run checks + submit
Minimum ✅

- `reports/model_card.md` complete (no blanks)
- `reports/eval_summary.md` complete (baseline vs model)
- `uv run pytest` passes
- `uv run ruff check .` passes
- pushed to GitHub (public)
Optional ⭐

- add confidence intervals to the write-up (if available)
- add 1 small error slice (metrics by a segment column)
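For the optional error slice, a sketch of per-segment accuracy over rows from your predictions file (the column names here are assumptions; match your file's headers):

```python
from collections import defaultdict

def accuracy_by_segment(rows, segment_col: str, y_col: str = "y_true", pred_col: str = "y_pred") -> dict:
    """Compute accuracy per segment: one small error slice for the write-up.

    `rows` is a list of dicts, e.g. from csv.DictReader over holdout_predictions.csv.
    Column names are placeholders -- check your own file.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        seg = row[segment_col]
        totals[seg] += 1
        hits[seg] += int(row[y_col] == row[pred_col])
    return {seg: hits[seg] / totals[seg] for seg in totals}
```

A big accuracy gap between segments is exactly the kind of specific limitation the model card asks for.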
- `models/registry/latest.txt`
- `models/runs/<run_id>/metrics/baseline_holdout.json`
- `models/runs/<run_id>/metrics/holdout_metrics.json`

Checkpoint: you can open both JSON files.
`reports/eval_summary.md` (25 minutes)

- `tables/holdout_predictions.*`

Checkpoint: your summary includes numbers and a recommendation.
- sort by score and look at wrong predictions (classification)
- compute `abs(pred - y)` and sort (regression)

You can do this in a quick notebook, or export a small CSV sample and inspect it.
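The regression version of that sort can be sketched with pandas (column names are illustrative; match your predictions file):

```python
import pandas as pd

def worst_predictions(df: pd.DataFrame, y_col: str, pred_col: str, n: int = 10) -> pd.DataFrame:
    """Return the n rows with the largest absolute error.

    `y_col` and `pred_col` are whatever your holdout_predictions file calls
    the true target and the prediction -- placeholders here, not fixed names.
    """
    out = df.assign(abs_error=(df[pred_col] - df[y_col]).abs())
    return out.sort_values("abs_error", ascending=False).head(n)

# Usage: worst_predictions(pd.read_csv("outputs/preds.csv"), "y_true", "y_pred")
```

Skimming these rows for a shared pattern (a segment, a range, a data-quality issue) gives you the evidence the summary asks for.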
In reports/eval_summary.md you have:
`reports/model_card.md` (30 minutes)

Checkpoint: a teammate can run your commands without asking you questions.
# Model Card — Week 3 Baseline
## 1) What is the prediction?
- **Target (y):** `__________`
- **Unit of analysis:** one row = __________
- **Decision supported:** __________
## 2) Data contract (inference)
- **ID passthrough columns:** __________
- **Required feature columns (X):** __________
- **Forbidden columns:** `__________` (target + leakage)
## 3) Training recipe
- **Split strategy:** random holdout (test_size=___, seed=___)
- **Baseline:** Dummy (most_frequent / mean)
- **Model family:** __________ (pipeline: preprocessing + estimator)
## 4) Results (holdout)
- **Baseline:** __________
- **Model:** __________
## 5) Limitations + failure modes
- …
- …
- …
## 6) How to run
```bash
uv run ml-baseline train --target __________
uv run ml-baseline predict --run latest --input <file> --output outputs/preds.csv
```

- `uv sync --extra dev`
- `uv run ruff check .`
- `uv run pytest`

Checkpoint: both commands exit with code 0.
Tip
If you used ruff format ., re-run tests after formatting.
- `git status`

Checkpoint: your latest commit is visible online.
Warning
Do not commit models/runs/ or outputs/.
If something fails:
- `ruff` not found → `uv sync --extra dev`
- schema errors → check `schema/input_schema.json`
- run commands from the repo root (where `pyproject.toml` is)

⭐ If your minimum is done:
- add a small error slice: metrics by a segment column (example: `country`)
- try `--threshold-strategy max_f1` (classification) and report the chosen threshold

In 1–2 sentences:
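One plausible way a `max_f1` strategy works, sketched from scratch (your CLI's actual implementation may differ): scan candidate thresholds and keep the one with the best F1.

```python
def best_f1_threshold(y_true, scores, grid=None):
    """Pick the score threshold that maximizes F1.

    A from-scratch sketch of a max_f1 strategy -- not the CLI's real code.
    `y_true` is 0/1 labels; `scores` are predicted probabilities.
    """
    grid = grid if grid is not None else sorted(set(scores))
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [int(s >= t) for s in scores]
        tp = sum(p and y for p, y in zip(preds, y_true))
        fp = sum(p and not y for p, y in zip(preds, y_true))
        fn = sum((not p) and y for p, y in zip(preds, y_true))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Whatever threshold you end up using, report it: a probability model without its decision threshold is only half a recipe.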
If you call `predict` with a missing column, what should happen?

Due: today (Jan 1, 2026)
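One reasonable answer to the missing-column question is to fail fast with a clear error. A sketch, assuming the schema JSON carries a `required` list (your `input_schema.json` keys may differ):

```python
import json
from pathlib import Path

def validate_columns(input_columns, schema_path: str = "schema/input_schema.json") -> None:
    """Raise a clear error if the input is missing required columns.

    Assumes the schema file has a "required" list of column names --
    check the actual keys in your own input_schema.json.
    """
    schema = json.loads(Path(schema_path).read_text())
    missing = [c for c in schema["required"] if c not in input_columns]
    if missing:
        raise ValueError(f"Input is missing required columns: {missing}")
```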
- `reports/model_card.md`
- `reports/eval_summary.md`
- `uv run pytest`
- `uv run ruff check .`

Deliverable: GitHub repo link.
Tip
Before you submit, clone your repo into a new folder and run the quickstart commands. If it works there, it will grade well.