Python for Data — Set Up the Lab

By the end of this chapter you'll have a clean, isolated Python data-science environment on your machine — the exact same toolbox a working business analyst opens every morning: a notebook, a virtual environment, and pandas ready to crunch a spreadsheet into answers.

Install Python → make a virtual environment → activate it → pip install the data stack → launch Jupyter → run one cell. That's a lab.

Why a whole "lab" and not just python.exe?

You've already stood up a LAMP server, so you know the drill: real work needs a real environment, not a global free-for-all. Python's version of "lock it down" is the virtual environment: a self-contained folder that holds one project's exact packages plus a pointer to the Python interpreter that runs them. Different project, different folder, zero version fights. This is the single habit that separates people who enjoy Python from people who fight it.

🐘 PHP: A venv is the spiritual cousin of Composer's vendor/ directory — per-project dependencies that don't leak into the rest of the system. The difference is a venv also pins the interpreter, not just the libraries.

Step 1: Get Python itself

You want Python 3.11 or newer. Check what you've got first — open a terminal and ask:

python --version
# or, on many systems:
python3 --version

If that prints Python 3.11.x (or higher), you're set. If it says "command not found" or shows something older like 3.8, grab the installer from python.org/downloads. On Windows, tick "Add Python to PATH" during install; skipping that box is one of the most common reasons beginners can't find python afterwards.
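If you'd rather ask Python itself, the same check can be sketched in a few lines. This is a minimal sketch: the helper name is_new_enough and the MINIMUM constant are made up for illustration, and only the standard sys module is assumed.

```python
import sys

# The version floor this chapter asks for (major, minor).
MINIMUM = (3, 11)


def is_new_enough(version_info=sys.version_info, minimum=MINIMUM):
    """Return True when the given version meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


# Report on the interpreter that is running this very script.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets the 3.11 floor:", is_new_enough())
```

Save it as a file and run it with whichever command your machine answers to; the answer describes that specific interpreter, not every Python on the system.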

Windows vs. Linux naming: On Windows the command is usually python; on Lubuntu/macOS it's often python3 (and pip3). Wherever you see python below, use whichever one your machine answers to.
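Not sure which name your machine answers to? Once any Python runs, a small sketch like this can report where each command name points. The function name find_python_commands is hypothetical; shutil.which is the standard-library lookup that mimics how the terminal searches PATH.

```python
import shutil


def find_python_commands(names=("python", "python3")):
    """Map each candidate command name to its full path, or None if absent."""
    return {name: shutil.which(name) for name in names}


for name, path in find_python_commands().items():
    # A missing entry just means that name is not on PATH for this terminal.
    print(f"{name:8} -> {path or 'not found on PATH'}")
```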

Step 2: Make a home for the project

Pick a folder for your analytics work and create a virtual environment inside it:

mkdir ba-lab
cd ba-lab
python -m venv .venv

python -m venv .venv means "run the built-in venv module and build an environment in a folder called .venv." The leading dot hides the folder from default listings on Linux and macOS; on Windows it is an ordinary visible folder. Nothing is installed globally: it all lives in that one folder, which you can delete anytime to start fresh.
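To see what that command actually builds, here is a rough sketch that calls the same venv module from Python, inside a throwaway temp directory so your real project is untouched. The helper name build_env is made up, and with_pip=False is an assumption chosen to keep the demo fast (the command-line form installs pip by default).

```python
import os
import tempfile
import venv
from pathlib import Path


def build_env(project_dir):
    """Create a .venv folder inside project_dir and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # Symlink the interpreter on Linux/macOS, copy it on Windows.
    # with_pip=False skips bootstrapping pip so the demo stays quick.
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as scratch:
    env_dir = build_env(scratch)
    created = sorted(entry.name for entry in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").is_file()
    print("Contents of .venv:", created)
    print("pyvenv.cfg present:", has_config)
```

The pyvenv.cfg file is the small text file that records which base interpreter the environment belongs to; the rest is a scripts folder and an empty place for packages.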

Step 3: Activate it

Creating the venv isn't enough; you have to step into it so your terminal uses that Python instead of the system one.

# Windows (PowerShell):
.venv\Scripts\Activate.ps1

# Lubuntu / macOS:
source .venv/bin/activate

Your prompt should now show (.venv) at the front. That little tag is your proof you're inside the lab. Type deactivate any time to step back out.
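The (.venv) tag is only a prompt decoration, so it can help to confirm from inside Python as well. A minimal sketch, assuming nothing beyond the standard sys module (the function name in_virtualenv is made up): inside a venv, sys.prefix points at the environment folder while sys.base_prefix still points at the original install.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter is running from a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtualenv())
```

Run it before and after activating: the interpreter path should switch to one inside your ba-lab/.venv folder.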

PowerShell blocked it? If you get a "running scripts is disabled" error, run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once, then try activating again. It's a Windows safety setting, not a Python problem.

Step 4: Install the data stack

With the venv active, install the core business-analytics toolkit in one shot:

pip install pandas numpy matplotlib seaborn jupyterlab scikit-learn openpyxl

Quick tour of what you just installed, because names matter:

pandas — spreadsheets in code. The single most important library in this whole module.
numpy — fast numeric arrays; pandas is built on top of it.
matplotlib + seaborn — charts. Seaborn makes matplotlib look good with less effort.
jupyterlab — the notebook app where analysts actually live.
scikit-learn — your first machine-learning models (Chapter 10).
openpyxl — lets pandas read and write real .xlsx Excel files.
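To confirm that all seven packages landed in the environment, one option is to ask Python's package metadata for their versions. This is a sketch, not the only way to check: the names DATA_STACK, installed_version and stack_report are made up for illustration, while importlib.metadata is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# The packages from the pip install line, by their install names.
DATA_STACK = [
    "pandas", "numpy", "matplotlib", "seaborn",
    "jupyterlab", "scikit-learn", "openpyxl",
]


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def stack_report(packages=DATA_STACK):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


for name, found in stack_report().items():
    print(f"{name:14} {found or 'MISSING'}")
```

If anything prints MISSING, the usual culprit is that the venv was not active when pip ran.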

Step 5: Launch the notebook

jupyter lab

This opens JupyterLab in your browser. A notebook is a stack of cells — you type code in one, press Shift+Enter, and the output appears right underneath. It's a conversation with your data: ask a question, see the answer, ask the next one. This back-and-forth is why notebooks won the data-science world.

1. In JupyterLab, click the big Python 3 tile under "Notebook" to make a new notebook.
2. In the first cell, type import pandas as pd and press Shift+Enter.
3. No error = pandas loaded. In the next cell, run pd.__version__ to check.
4. It prints a version like 2.2.1. Your lab is live.

Hello, Data

Goal: prove the whole stack works end to end by turning three rows of numbers into a chart — in four cells.

Cell 1: import pandas as pd
Cell 2: build a tiny table —
sales = pd.DataFrame({'month': ['Jan','Feb','Mar'], 'revenue': [1200, 1800, 1500]})
Cell 3: just type sales and run it — Jupyter renders a clean table
Cell 4: sales.plot(x='month', y='revenue', kind='bar') — a bar chart appears inline

You just did the entire analytics loop (build a table, inspect it, visualise it) in under a minute. Everything after this is just doing it with bigger, messier, more interesting data.
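As one possible "next question" for the same three-row table, here is a small sketch that asks for the total revenue and the best month. It reuses the sales table from the cells above; the variable names total and best_month are just illustrative.

```python
import pandas as pd

# Same tiny table as in the Hello, Data cells.
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "revenue": [1200, 1800, 1500],
})

# Total revenue across all three months.
total = int(sales["revenue"].sum())

# idxmax() gives the row label of the largest revenue; look up its month.
best_month = sales.loc[sales["revenue"].idxmax(), "month"]

print(total)       # 4500
print(best_month)  # Feb
```

Each line is one question and one answer, which is the same rhythm you'll use on real data.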

You have Python installed, a virtual environment you can activate, the data stack inside it, and a notebook that renders a chart. That is a professional analytics setup. Keep this ba-lab folder — we'll build in it for the rest of the module.

Freeze your environment so it's reproducible: run pip freeze > requirements.txt. Open the file — it's the exact recipe of every package and version you installed. Anyone (including future-you on a new laptop) can recreate this lab with pip install -r requirements.txt. That one file is how teams stay in sync.
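To demystify what ends up in that file, here is a rough approximation of the freeze step using only the standard library. It is a sketch, not pip's actual implementation: pip freeze handles extra cases (editable installs, skipping pip itself), and the function name freeze_lines is made up.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' strings for every installed distribution."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip any entry whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Show the first few pinned requirements from the current environment.
for line in freeze_lines()[:5]:
    print(line)
```

Each name==version pair is a pin: it tells pip to install exactly that release, which is what makes the environment repeatable on another machine.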