AI Professionals Bootcamp | Week 1
2025-12-18
Goal: Publish a clean GitHub repo for your CSV Profiler (CLI + Streamlit) with a clear README and a reproducible setup.
Bootcamp • SDAIA Academy
By the end of today, you can:
- status, add, commit, log, diff
- restore, revert (and when to avoid reset)
- origin, push, pull
Warning
No Generative AI for coding this week.
Allowed:
- clarifying questions (concepts, error meaning, docs navigation)
- official documentation
- your notes + course slides
Not allowed:
- “write this code for me”
- “fix my code” with pasted solutions
- copying generated code into your repo
A public (or instructor-accessible) GitHub repository that contains:
- csv_profiler/ package (either src/csv_profiler/ or csv_profiler/ at repo root)
- report.json
- report.md
- .gitignore

PYTHONPATH
Today you will run two entry points:
- CLI: writes report.json + report.md
- Streamlit GUI: upload a CSV and export the reports

- uv venv -p 3.11 creates a project virtual environment in .venv/
- uv pip install -r requirements.txt installs the packages listed in requirements.txt

- python -m some_package.some_module means: run that module as a program
- If your package is under src/, Python won’t find it automatically → we temporarily set PYTHONPATH=src
Tip
If you do not have a src/ folder (your csv_profiler/ folder is at repo root), you can skip PYTHONPATH.
PYTHONPATH (only needed for src/ layout)
Mac/Linux (bash/zsh)
Windows PowerShell
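The two commands for this step can be sketched as follows, based on the README template later in this deck. The export line is for bash/zsh; the PowerShell form is shown as a comment because it is not valid shell syntax.

```shell
# Mac/Linux (bash/zsh): make src/ importable for commands run in this terminal session
export PYTHONPATH=src

# Windows PowerShell equivalent (run it in PowerShell, not bash):
#   $env:PYTHONPATH="src"

# Confirm the value that Python will see
echo "$PYTHONPATH"
```

The setting only lasts for the current terminal session, so repeat it when you open a new terminal.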
From the repo root:
Mac/Linux (bash/zsh)
Windows PowerShell
Tip
If you do not have a src/ folder, skip the PYTHONPATH lines.
Git Essentials (commit history you can trust)
Git gives you:
Three places
One rule: only committed work is “saved”.
Question: If you edited app.py but didn’t commit it… is it “saved”?
Answer: It’s only on your machine. Git history doesn’t know it yet.
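A minimal sketch of one file moving through the three places, assuming they are the working directory, the staging area, and the commit history. It runs in a throwaway repo; the identity values and the commit message are placeholders.

```shell
# Throwaway repo so nothing here touches your real project
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity

echo 'print("hello")' > app.py
git status --short    # "?? app.py" -> exists only in the working directory

git add app.py
git status --short    # "A  app.py" -> staged, not yet committed

git commit -q -m "Add app.py"
git status --short    # prints nothing -> committed, working tree clean
```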
Check:
From your repo root:
You should see:
- the current branch (main)
git status is your “dashboard”
Run it constantly:
Practice reading:
Or stage everything:
View staged changes:
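The staging commands for these steps can be sketched like this, in a throwaway repo. The file names notes.txt and todo.txt are made up for the example.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "line 1" > notes.txt
git add notes.txt && git commit -q -m "Add notes"

echo "line 2" >> notes.txt   # edit a tracked file
echo "draft" > todo.txt      # create a new, untracked file

git diff                     # unstaged changes (notes.txt only)
git add notes.txt            # stage one file
git add .                    # or stage everything in the current folder
git diff --staged            # view staged changes
```

After git add, plain git diff shows nothing; the changes now appear under git diff --staged.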
A good commit message:
Examples:
- Add Typer CLI entrypoint
- Render Markdown report
- Fix numeric parsing for empty strings

.gitignore (protect your repo)
You usually should NOT commit:
- .venv/ (virtual env)
- __pycache__/
- outputs/ (generated reports)
- .env (secrets)
If you commit these, your repo becomes:
- huge
- noisy
- sometimes unsafe

.gitignore for this project
Tip
Python sometimes creates __pycache__/ folders and *.py[cod] files on your local machine to help it run your code faster. These files are specific to your machine, so keep them local and don’t commit them.
Note
*.py[cod] means any file with the extension .pyc, .pyo, or .pyd.
.gitignore (8 minutes)
- Create a .gitignore file in the repo root
- .venv/
- __pycache__/
- outputs/
Checkpoint: git status no longer lists .venv/ contents.

.gitignore
Create .gitignore:
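A sketch of one way to create the file, using only the patterns named in the slides above. It runs in a throwaway repo and fakes a virtual environment and a report so you can see the effect.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main

# Patterns taken from the slides above
cat > .gitignore <<'EOF'
.venv/
__pycache__/
*.py[cod]
outputs/
.env
EOF

# Simulate generated content, then confirm Git ignores it
mkdir -p .venv outputs
touch .venv/pyvenv.cfg outputs/report.json
git status --short                        # lists only .gitignore
git check-ignore -v outputs/report.json   # shows which pattern matched
```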
Then:
A compact view:
Undo unstaged changes (restore file from last commit):
Unstage a staged file:
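Both undo steps can be sketched with git restore, here in a throwaway repo. The generic forms are git restore <file> and git restore --staged <file>.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "v1" > app.py
git add app.py && git commit -q -m "Add app.py"

echo "oops" > app.py
git restore app.py            # discard the unstaged edit
cat app.py                    # back to the committed content

echo "v2" > app.py
git add app.py
git restore --staged app.py   # unstage; the edit stays in the file
git status --short            # " M app.py" -> modified, not staged
```

Note that git restore <file> throws the edit away for good, while git restore --staged keeps it in your working directory.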
Warning
Avoid git reset --hard unless you really know what you’re doing.
revert vs reset
| Tool | What it does | Safe after pushing? | Use it when |
|---|---|---|---|
| git revert <hash> | Creates a new commit that undoes changes | ✅ | You already pushed a bad commit |
| git reset --hard <hash> | Moves branch pointer + rewrites history | ❌ | You have not pushed yet |
Show a file at a previous commit:
Show details of a commit:
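Both lookups can be done with git show. The generic forms are git show <hash>:<path> and git show <hash>; the sketch below uses HEAD~1 and HEAD in place of real hashes.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "v1" > app.py && git add app.py && git commit -q -m "Add app.py"
echo "v2" > app.py && git commit -q -am "Update app.py"

git show HEAD~1:app.py    # the file as it was one commit ago
git show --stat HEAD      # details of the latest commit, with a per-file summary
```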
You pushed a commit that breaks the CLI. You want to undo it safely.
A. git reset --hard HEAD~1
B. git revert HEAD
Answer: B (git revert) on shared branches.
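A small sketch of option B in a throwaway repo. The file cli.py and the commit messages are made up; the point is that the bad commit stays in history and a new commit undoes it.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "ok" > cli.py && git add cli.py && git commit -q -m "Add CLI"
echo "broken" > cli.py && git commit -q -am "Break CLI"

git revert --no-edit HEAD   # new commit that undoes "Break CLI"
cat cli.py                  # content is back to the working version
git log --oneline           # three commits; nothing was rewritten
```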
Rename last commit message (no new content):
Add a missed file to the last commit:
Stop tracking a file without deleting it (useful after fixing .gitignore):
Warning
Only amend commits that you have not pushed yet.
Be careful: git rm without --cached deletes the file.
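The three fixes above can be sketched in a throwaway repo where nothing has been pushed. The file names and messages are illustrative only.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "v1" > app.py
echo "scratch" > notes.log
git add . && git commit -q -m "Add ap.py"    # typo in message; notes.log added by mistake

git commit -q --amend -m "Add app.py"        # rename the last commit message

echo "typer" > requirements.txt
git add requirements.txt
git commit -q --amend --no-edit              # add a missed file to the last commit

git rm -q --cached notes.log                 # stop tracking; the file stays on disk
git commit -q -m "Stop tracking notes.log"
ls notes.log                                 # still there
```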
A branch is:
Common workflow:
- main: stable
- feature/...: new work
Good patterns:
- feature/<short-name>
- fix/<short-name>
- docs/<short-name>
Avoid:
- test
- final_final2
- wip

git switch
You will see two actions:

git stash (park work temporarily)
When you must switch context but you’re not ready to commit:
Warning
Use stash short-term. Prefer commits for real progress.
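A minimal stash round trip, sketched in a throwaway repo: park an unfinished edit, confirm the working tree is clean, then bring the edit back.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "v1" > app.py && git add app.py && git commit -q -m "Add app.py"

echo "half-finished" >> app.py
git stash             # park the edit
git status --short    # prints nothing: working tree is clean
git stash list        # shows the parked entry
git stash pop         # bring the edit back and drop the stash entry
```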
- Create a branch feature/readme
- Merge it into main
Checkpoint: git log --oneline --graph shows a merge.
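One possible run of this exercise, sketched in a throwaway repo. The --no-ff flag is an assumption added here: without it, Git fast-forwards when main has no new commits, and the graph shows no merge commit.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "# CSV Profiler" > README.md
git add README.md && git commit -q -m "Add README"

git switch -q -c feature/readme       # create the branch and switch to it
echo "## Setup" >> README.md
git commit -q -am "Add Setup section"

git switch -q main                    # back to the stable branch
git merge --no-ff -m "Merge feature/readme" feature/readme
git log --oneline --graph             # the graph shows the merge commit
```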
- status + diff to stay oriented
- .gitignore keeps repos clean
20 minutes
GitHub Workflow (remote, push, README)
Think of GitHub as: “Google Drive for Git repos” (but with workflows).
Prefer clone because:
Clone:
Even solo, PRs can be useful for:
- review before merging into main
- discussion + feedback
What does origin mean?
A. Your current branch
B. The default remote name
C. A GitHub feature
Answer: B — it’s the default remote name (just a label).
origin
A “remote” is a named URL.
You usually have:
- origin → your GitHub repo
Check:
On GitHub:
- csv-profiler (example)
Copy the URL from GitHub, then:
Now your future pushes can be:
Which comes first?
A. git push
B. git remote add origin ...
Answer: Add remote first (git remote add origin ...), then push.
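The order can be tried offline. In this sketch a local bare repository stands in for the GitHub URL (an assumption made so the example runs without an account); on GitHub you would paste the URL you copied instead.

```shell
work="$(mktemp -d)"
git init -q --bare -b main "$work/remote.git"   # local stand-in for the GitHub repo

git init -q -b main "$work/csv-profiler" && cd "$work/csv-profiler"
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "# CSV Profiler" > README.md
git add README.md && git commit -q -m "Add README"

git remote add origin "$work/remote.git"   # 1) add the remote first
git remote -v                              # check: origin is listed for fetch and push
git push -q -u origin main                 # 2) first push; -u remembers the upstream
git push -q                                # later pushes can be just this
```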
If you see authentication errors:
Tip
Ask a clarifying question to the instructor if auth blocks you. Don’t spend 30 minutes stuck.
Before starting work each day:
If you are behind, Git updates your local branch.
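A sketch of clone and pull working together, again with a local bare repository standing in for GitHub. The folder names other and mine are made up: other plays a second machine or teammate, mine is your copy.

```shell
work="$(mktemp -d)"
git init -q --bare -b main "$work/remote.git"   # local stand-in for GitHub

# Someone pushes the first commit
git init -q -b main "$work/other" && cd "$work/other"
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "# CSV Profiler" > README.md
git add README.md && git commit -q -m "Add README"
git remote add origin "$work/remote.git" && git push -q -u origin main

# You clone once; origin is set up for you
git clone -q "$work/remote.git" "$work/mine"

# Later, a new commit lands on the remote
echo "## Setup" >> README.md
git commit -q -am "Add Setup" && git push -q

# Start of day: pull to catch up
cd "$work/mine"
git pull -q
git log --oneline     # now shows both commits
```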
A good README answers:
# CSV Profiler
Generate a profiling report for a CSV file.
## Features
- CLI: JSON + Markdown report
- Streamlit GUI: upload CSV + export reports
## Setup
uv venv -p 3.11
uv pip install -r requirements.txt
## Run CLI
# If you have a src/ folder:
# Mac/Linux: export PYTHONPATH=src
# Windows: $env:PYTHONPATH="src"
uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs
## Run GUI
# If you have a src/ folder:
# Mac/Linux: export PYTHONPATH=src
# Windows: $env:PYTHONPATH="src"
uv run streamlit run app.py

Add these sections:
- ## Setup
- ## Run CLI
- ## Run GUI
- ## Output Files
Checkpoint: A new student can follow your README without asking you questions.
Then commit:
- assets/ or images/
- .env in .gitignore
- outputs/ ignored
- origin is the remote name you’ll use most
20 minutes
Polish + Submission Readiness
You’re “done” when:
- .gitignore
- requirements.txt (or equivalent)
We will roughly do:
- git clone ...
- data/sample.csv
If any step is confusing → points lost.

data/sample.csv should prove your app works
Include a tiny CSV with:
Examples:
- If your package is under src/, set PYTHONPATH=src
  - Mac/Linux: export PYTHONPATH=src
  - Windows PowerShell: $env:PYTHONPATH="src"
- If uv commands fail:
  - run uv venv in the repo
  - check that .venv/ exists
In README, add:
Checkpoint: Another student can run it in < 2 minutes.
## Manual Test Plan
1. Setup:
- `uv venv -p 3.11`
- `uv pip install -r requirements.txt`
2. CLI:
- (If you have a `src/` folder: set `PYTHONPATH=src` first)
- `uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs`
3. Verify:
- `outputs/report.json` and `outputs/report.md` exist
4. GUI:
- (If you have a `src/` folder: set `PYTHONPATH=src` first)
- `uv run streamlit run app.py`
5. Export:
- download JSON + Markdown from the UI

requirements.txt is a plain text list of the packages (and versions) your project needs.
- uv pip freeze prints “what’s installed” in your project environment
- ">" means “write this output into a file” (it will overwrite the file)
From the repo root (where your .venv/ is):
Then commit:
Question: Should requirements.txt include your .venv/?
Answer: No. requirements.txt is text. .venv/ stays untracked.
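Putting the two pieces above together, the freeze step is uv pip freeze > requirements.txt. The sketch below shows only the overwrite behavior of ">", using a stand-in function with placeholder package names so it runs without uv installed.

```shell
cd "$(mktemp -d)"

# Stand-in for `uv pip freeze`; the package names and versions are placeholders
fake_freeze() { printf 'package-a==1.0\npackage-b==2.0\n'; }

echo "old contents" > requirements.txt
fake_freeze > requirements.txt    # ">" overwrites: the old line is gone
cat requirements.txt

# In the real project, from the repo root:
#   uv pip freeze > requirements.txt
```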
Example:
Bad:
- .env committed
Good:
- .env in .gitignore
- .env.example committed (no secrets)
Warning
Once a secret is in Git history, removing it is hard. Treat repos as public.
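The "Good" pattern can be sketched in a throwaway repo. The variable name API_KEY and its values are hypothetical.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity

echo ".env" > .gitignore
echo "API_KEY=real-secret-value" > .env        # stays on your machine only
echo "API_KEY=your-key-here" > .env.example    # safe template, no real secret

git add . && git commit -q -m "Add env template"
git ls-files    # lists .env.example and .gitignore, but not .env
```

The .env pattern matches only the file named exactly .env, so .env.example is still tracked.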
- .venv/ or huge files

| Area | Points | What we look for |
|---|---|---|
| CLI works | 30 | Reads CSV, writes JSON + MD, helpful errors |
| Streamlit works | 30 | Upload CSV, preview, export JSON + MD |
| Code quality | 15 | Clear functions/modules, reasonable naming |
| Reproducibility | 15 | README + requirements, fresh-clone runnable |
| Git/GitHub hygiene | 10 | commits, .gitignore, pushed on time |
Passing (Week 1): ≥ 70/100
Conflict happens when:
Signs:
- CONFLICT (content) message
- <<<<<<<, =======, >>>>>>>

- git status
- git add <file>
- git commit

- Create branch feature/conflict
- Edit README.md
- Switch to main
- Edit the same line on main
- Merge into main → conflict appears

git switch -c feature/conflict
# edit README.md (change SAME line)
git add README.md
git commit -m "Edit README on branch"
git switch main
# edit README.md (change SAME line differently)
git add README.md
git commit -m "Edit README on main"
git merge feature/conflict
# resolve file
git add README.md
git commit -m "Resolve merge conflict"

- requirements.txt
20 minutes
Week 1 Project — Finalize + Push
Goal: Push a polished Week 1 repo by tonight, 11:59pm.
Deliverable: A GitHub link that anyone can run.
Work style:
- work in pairs (review each other’s README + commands)
- ask instructors clarifying questions quickly

- data/sample.csv exists
Checkpoint: Your repo can be tested without extra files.
Run both (from repo root):
Mac/Linux (bash/zsh)
Windows PowerShell
Checkpoint: Both run without editing code.
- outputs/report.json
- outputs/report.md
- sample.csv
If one fails: fix it before touching GitHub.

requirements.txt (10 minutes)
Checkpoint: file exists and is not empty.
Commit:
.gitignore is correct (10 minutes)
Verify these are NOT tracked:
- .venv/
- outputs/
- __pycache__/
Check:
If you already committed .venv/ or outputs/ by mistake:
Then ensure .gitignore contains those patterns.
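A sketch of the check and the fix in a throwaway repo. Using git ls-files for the check is an assumption (git status also works); -r is needed because outputs/ is a folder.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q -b main
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
mkdir -p outputs && echo "{}" > outputs/report.json
git add . && git commit -q -m "Add outputs by mistake"

git ls-files                     # check: outputs/report.json is tracked
git rm -r -q --cached outputs    # stop tracking; the files stay on disk
echo "outputs/" >> .gitignore
git add .gitignore && git commit -q -m "Stop tracking outputs/"
git ls-files                     # only .gitignore remains
```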
Your README must include:
Checkpoint: Your partner can follow it without help.
## Setup
uv venv -p 3.11
uv pip install -r requirements.txt
## Run CLI
# If you have a src/ folder:
# Mac/Linux: export PYTHONPATH=src
# Windows: $env:PYTHONPATH="src"
uv run python -m csv_profiler.cli profile data/sample.csv --out-dir outputs
## Run GUI
# If you have a src/ folder:
# Mac/Linux: export PYTHONPATH=src
# Windows: $env:PYTHONPATH="src"
uv run streamlit run app.py

Commit:
Checkpoint: You can open GitHub and see your files.
If you already had a remote but it’s wrong:
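One way to repoint an existing remote is git remote set-url. Both URLs below are placeholders, not real repositories.

```shell
work="$(mktemp -d)"
git init -q -b main "$work/repo" && cd "$work/repo"

git remote add origin "https://example.com/wrong/repo.git"      # placeholder: the wrong URL
git remote set-url origin "https://example.com/right/repo.git"  # placeholder: the corrected URL
git remote -v                                                    # confirm the new URL
```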
A tag is a human-friendly name for a specific commit.
Generic pattern:
Create a “submission tag” so it’s easy to find:
Checkpoint: GitHub shows the tag under Releases/Tags.
If push is rejected, first push commits:
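A sketch of the tag steps, assuming an annotated tag with the generic pattern git tag -a <name> -m "<message>". The tag name week1-submission is hypothetical, and a local bare repository stands in for GitHub.

```shell
work="$(mktemp -d)"
git init -q --bare -b main "$work/remote.git"   # local stand-in for GitHub
git init -q -b main "$work/repo" && cd "$work/repo"
git config user.name "Student"               # placeholder identity
git config user.email "student@example.com"  # placeholder identity
echo "# CSV Profiler" > README.md
git add README.md && git commit -q -m "Add README"
git remote add origin "$work/remote.git"

git push -q -u origin main                           # push commits first
git tag -a week1-submission -m "Week 1 submission"   # hypothetical tag name
git push -q origin week1-submission                  # tags are pushed separately
git tag --list                                       # confirm the tag exists locally
```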
On GitHub:
- src/csv_profiler/...
- app.py
- requirements.txt
- .gitignore
- data/sample.csv
Checkpoint: Repo looks “professional”.
Option A: src/ layout (common in bootcamps)
Option B: “flat” layout (also acceptable for Week 1)
Send the following to the instructor/portal:
Checkpoint: Submission sent before 11:59pm.
Pick ONE:
- int vs float
- --delimiter option in CLI
You can now:
- uv
Next week: Data Work (ETL + EDA)
In 1–2 sentences: