AI Professionals Bootcamp | Week 3
2025-12-31
Yesterday's artifacts: schema/input_schema.json and tables/holdout_input.*

Today's outcome: ml-baseline predict works on new files with guardrails.

Note
Don’t commit generated artifacts: models/runs/, outputs/, data/processed/.
Goal: run ml-baseline predict on new data reliably (schema guardrails + correct outputs).
Bootcamp • SDAIA Academy
Today: predict and test on holdout + “new” files.

By the end of today, you can:
- Run ml-baseline predict on holdout_input.* successfully
- Enforce input_schema.json (missing required columns → error; forbidden columns → error)

Run prediction on your saved holdout input.
macOS/Linux
Windows PowerShell
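The command for either tab likely looks like this (the input path inside your run is an assumption; the flags are the ones used throughout today):

```bash
# macOS/Linux — on Windows PowerShell, replace the trailing \ with `
uv run ml-baseline predict \
  --run latest \
  --input models/runs/<run_id>/tables/holdout_input.csv \
  --output outputs/preds.csv
```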
Checkpoint: you produced outputs/preds.csv and it has prediction (and maybe score).
Today: Predict uses yesterday’s schema + holdout_input.
Inference mental model (inputs, outputs, thresholds)
Training
- input: features + target (y)
- output: a saved model (pipeline)
- you compute metrics (holdout)

Inference
- input: features only
- output: predictions file
- you must handle missing/extra columns safely
Warning
If the target leaks into inference input, your system becomes “cheat mode” and metrics become meaningless.
Input file (--input)
- must include: schema.required_feature_columns
- may include: schema.optional_id_columns (pass-through)
- must NOT include: schema.forbidden_columns (usually the target)

Output file (--output)
- always includes: prediction
- classification also includes: score
- includes ID columns if they were provided in the input
Score
- a probability-like number
- used to rank cases

Prediction
- a 0/1 decision
- made by score >= threshold
Example
| score | threshold | prediction |
|---|---|---|
| 0.82 | 0.50 | 1 |
| 0.49 | 0.50 | 0 |
| 0.82 | 0.90 | 0 |
The same score can become a different decision if you change the threshold.
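The table above boils down to a one-line decision rule. A minimal sketch (the function name is illustrative, not part of ml-baseline):

```python
def decide(score: float, threshold: float = 0.50) -> int:
    """Turn a probability-like score into a 0/1 decision."""
    return int(score >= threshold)

print(decide(0.82, 0.50))  # 1
print(decide(0.49, 0.50))  # 0
print(decide(0.82, 0.90))  # 0 -- same score, stricter threshold
```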
Minimum ✅
- use a fixed threshold (0.50) for classification

Optional ⭐
- choose threshold to maximize F1 (max_f1)
- choose threshold to meet a business rule (e.g., precision ≥ 0.80)
Tip
Pick one decision policy and document it in your model card.
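If you go for the optional max_f1 policy, it can be sketched in pure Python: scan candidate thresholds drawn from the scores themselves and keep the one with the best F1 on labeled holdout data (the helper name is hypothetical):

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) that maximizes F1 on labeled holdout data."""
    best_t, best_f1 = 0.50, -1.0
    for t in sorted(set(scores)):  # each observed score is a candidate cutoff
        preds = [int(s >= t) for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Whatever policy you choose, freeze the resulting threshold and record it in your model card.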
Question: If we increase the threshold from 0.50 to 0.80, what usually happens?
Answer: B (fewer positives; precision often increases; recall often decreases).
In pairs:
- decide what outputs/preds.csv must contain

Checkpoint: you can explain score vs prediction in one sentence.
Expected columns: user_id (optional), score, prediction. Typical ID columns: user_id, customer_id, transaction_id.

When you return: open schema/input_schema.json in your latest run.
Schema validation + alignment (fail fast)
You will implement validate_and_align(df, schema). Real inference files can be messy (e.g., values arriving as "12" or "N/A").

Tip
Your schema turns “mysterious errors” into clear messages.
This function is the “seatbelt” for your predict command. Here is what validate_and_align(...) must do.
If inference input contains:
- is_high_value (the target) → error: forbidden column
- missing avg_spend_30d → error: missing required feature
- extra notes column → ignore (unless you choose to fail; ⭐ optional)

Warning
Do not silently fill missing required features. Fail fast.
Prefer ValueError over assert.

Hard to read
- assert not missing
- can be skipped with Python optimizations (python -O)
- message might be unclear

Better
- raise ValueError("Missing required columns: ...")
- always runs
- clear for teammates
Asserts are okay for learning — but clear errors are better for shipping.
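Putting the pieces together, here is a minimal sketch of the guardrail. It assumes the schema dict mirrors input_schema.json with keys required_feature_columns, optional_id_columns, and forbidden_columns (key names are taken from today's slides; your file may differ):

```python
import pandas as pd

def validate_and_align(df: pd.DataFrame, schema: dict) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Fail fast on contract violations, then return (X, ids):
    features in schema order, ID columns passed through."""
    forbidden = [c for c in schema.get("forbidden_columns", []) if c in df.columns]
    if forbidden:
        raise ValueError(f"Forbidden columns present in inference input: {forbidden}")
    missing = [c for c in schema["required_feature_columns"] if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    id_cols = [c for c in schema.get("optional_id_columns", []) if c in df.columns]
    ids = df[id_cols]
    X = df[schema["required_feature_columns"]]  # also reorders to schema order
    return X, ids
```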
You receive an inference file with these columns:
Checkpoint: your message includes the exact bad column name.
Example: is_high_value is present (forbidden). Your error message should read like:

Forbidden columns present in inference input: ['is_high_value']

Question: Should we automatically add a missing required feature column as zeros?
Answer: Usually no. That hides data problems and can silently degrade predictions.
Return (X, ids) so IDs can be preserved in outputs.

When you return: be ready to run predict on a file with an intentional mistake.
Predict end-to-end (run registry + sanity checks)
Use --run latest to predict without guessing paths.

A trained run lives at: models/runs/<run_id>
And the “pointer” lives at: models/registry/latest.txt
Tip
latest.txt lets you predict without copy/pasting long paths.
Example
Under the hood:
- latest resolves to models/runs/<run_id>
- predict loads the model and schema from that run directory

Minimum checks you should do every time:
- output row count matches input row count
- 0 ≤ score ≤ 1

Tip
If row counts don’t match, stop. Something is wrong.
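The minimum checks can be wired into a tiny helper (a sketch; the function name is illustrative):

```python
import pandas as pd

def sanity_check(inp: pd.DataFrame, out: pd.DataFrame) -> None:
    """Fail fast if predictions don't line up with the input."""
    if len(out) != len(inp):
        raise ValueError(f"Row count mismatch: {len(inp)} input rows vs {len(out)} predictions")
    if "prediction" not in out.columns:
        raise ValueError("Output is missing the 'prediction' column")
    if "score" in out.columns and not out["score"].between(0, 1).all():
        raise ValueError("Some scores fall outside [0, 1]")
```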
Not required today, but good to know:
Compare new inference inputs against holdout_input: do missingness and value ranges look similar?
We’ll do richer checks in Week 7 (MLOps). Today we just ship predict.
Remember: --run latest uses models/registry/latest.txt.

When you return: start Hands-on Task 1 immediately.
Implement/verify predict end-to-end
Minimum ✅
- uv run ml-baseline predict --run latest ... writes an output file
- Input schema is enforced:
  - forbidden target column → clear error
  - missing required feature → clear error
- Output includes optional IDs if they exist in the input
- uv run pytest passes
- 1+ commit pushed to GitHub
Optional ⭐
- Improve error messages (ValueError + actionable hint)
- Add --threshold override behavior (classification)
- Add a tiny skew-check script (missingness + ranges)
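The optional skew check can be as small as comparing missingness and ranges per column (a sketch; the column list would come from your schema):

```python
import pandas as pd

def skew_report(train: pd.DataFrame, new: pd.DataFrame, columns) -> pd.DataFrame:
    """Compare missingness and value ranges between training and new data."""
    rows = []
    for c in columns:
        rows.append({
            "column": c,
            "train_missing": float(train[c].isna().mean()),
            "new_missing": float(new[c].isna().mean()),
            "train_range": (train[c].min(), train[c].max()),
            "new_range": (new[c].min(), new[c].max()),
        })
    return pd.DataFrame(rows)
```

Eyeball the report: a column that was never missing in training but is often missing in new data is a red flag.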
Find holdout_input.* inside your latest run, then run predict on it.

macOS/Linux
Windows PowerShell
Checkpoint: output file exists and includes prediction.
Goal: prove your guardrails work.
1. Copy holdout_input.* to outputs/bad_input.csv
2. Open the copy and add a column header: is_high_value
3. Run predict again

Checkpoint: predict fails with a clear message.

Tip
You’re testing the contract, not the model.
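Creating the deliberately bad file can itself be scripted (a sketch; the helper name and paths are placeholders for your own run layout):

```python
import pandas as pd

def make_bad_input(df: pd.DataFrame, forbidden: str = "is_high_value") -> pd.DataFrame:
    """Return a copy with the forbidden target column re-added on purpose."""
    bad = df.copy()
    bad[forbidden] = 0
    return bad

# Usage (paths are placeholders):
# good = pd.read_csv("tables/holdout_input.csv")
# make_bad_input(good).to_csv("outputs/bad_input.csv", index=False)
```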
resolve_run_dir (10 minutes)

Open: src/ml_baseline/predict.py
Ensure:
- --run latest loads models/registry/latest.txt
- the run_id resolves to models/runs/<run_id>

Checkpoint: ml-baseline show-run latest prints run_meta.json.
validate_and_align (25 minutes)

Open: src/ml_baseline/schema.py
Minimum behavior:
- returns X with required features in schema order
- returns ids with optional ID columns (if present)

Checkpoint: predict works on good input and fails on bad input.
Keep it boring. Reliability beats cleverness.
After prediction, check:
- classification: score, prediction (+ IDs)
- regression: prediction (+ IDs)

Checkpoint: you can point to one row and explain it.
Update reports/model_card.md:

Checkpoint: tests pass and the model card explains prediction.
Warning
Do not ask GenAI to write your solution code. Ask it to explain concepts or errors.
Run git status, then commit with a message like "w3d4: predict cli + schema guardrails".

Checkpoint: repo shows the new commit online.
- models/registry/latest.txt
- models/runs/<run_id>/model/model.joblib
- models/runs/<run_id>/schema/input_schema.json

Tip
Most predict bugs are column contract bugs, not model bugs.
- --threshold override in predict (classification)
- a strict mode (--strict) to fail on extra columns

In 1–2 sentences each:
- What makes --run latest possible?

Due: before Day 5 (Jan 1, 2026)
Run predict on 2 inputs:
- holdout_input.*

Update reports/model_card.md:
Deliverable: GitHub repo link + screenshot of outputs/preds.csv (first 5 rows).
Tip
Tomorrow you’ll polish reporting + submission. Today is about reliable inference.