Functions & Modules — Package Your Logic

A function is a named, reusable chunk of logic — the cure for copy-paste. In analytics you'll wrap every repeated calculation (a margin, a growth rate, a cleaning step) in one, so it's defined once, tested once, and trusted everywhere. You'll also learn to pull in other people's functions via import, which is most of what data-science code actually is.

def name(args): defines a function, return hands a value back, default arguments make parameters optional, lambda is a throwaway one-liner, and import borrows code from the ecosystem.

Defining a function

def profit_margin(revenue, cost):
    profit = revenue - cost
    return profit / revenue

m = profit_margin(1800, 1100)
print(f"{m:.1%}")      # 38.9%

def opens the definition, the parameters sit in the parentheses, the colon and indentation mark the body, and return sends a value back to the caller. A function with no return hands back None (Python's "nothing", like PHP's null).
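To make the None behaviour concrete, here is a minimal sketch. log_margin is a hypothetical helper invented for this illustration; it prints the margin but has no return statement.

```python
def profit_margin(revenue, cost):
    """Return the margin so the caller can keep using it."""
    return (revenue - cost) / revenue


def log_margin(revenue, cost):
    """Hypothetical helper: prints the margin but never returns it."""
    print(f"{(revenue - cost) / revenue:.1%}")


returned = profit_margin(1800, 1100)   # a float the caller can reuse
printed = log_margin(1800, 1100)       # prints 38.9%, but hands back None

print(returned is None)   # False
print(printed is None)    # True
```

If a calculation needs to feed into another one, return it; printing alone leaves the caller with nothing to work with.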

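🐘 PHP: Nearly identical to function profitMargin($revenue, $cost) { ... return ...; }, just with def instead of function, no $, no braces, no semicolons. Same call syntax. Both languages let you pass functions around as values, though PHP does it through closures and callables, while a Python function is an ordinary object you can assign to a variable.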

Default and keyword arguments

Give a parameter a default and it becomes optional. Callers can also pass arguments by name, which makes call sites self-documenting — a habit worth forming early.

def grow(value, rate=0.10, periods=1):
    return value * (1 + rate) ** periods

grow(1000)                       # 1100.0  -> defaults used
grow(1000, 0.05)                 # 1050.0
grow(1000, periods=3)            # ≈ 1331.0 -> skip rate, name periods (float rounding may add a tiny tail)
grow(1000, rate=0.05, periods=2) # ≈ 1102.5 -> fully explicit

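That ** is exponentiation (1.1 ** 3 is 1.1 cubed). The results are floats, so a value like 1331.0 can print with a tiny rounding tail (1.1 ** 3 evaluates to 1.3310000000000004); format with an f-string or round() when displaying. Note how grow(1000, periods=3) reads almost like English: keyword arguments are a readability gift, especially when a function has several knobs.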

The one trap: never use a mutable default like def f(items=[]). That empty list is created once, when the def statement runs, and the same list object is then reused by every call that omits the argument. It quietly accumulates data between calls and causes baffling bugs. The fix is the standard idiom: default to None, then build inside.

def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
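To watch the trap happen, here is a small side-by-side sketch. add_item_buggy is a hypothetical name for the broken version; add_item is the None idiom shown above.

```python
def add_item_buggy(item, items=[]):   # the list is built once, at definition time
    items.append(item)
    return items


def add_item(item, items=None):       # the None idiom
    if items is None:
        items = []
    items.append(item)
    return items


first = add_item_buggy("a")
second = add_item_buggy("b")
print(second)            # ['a', 'b'] -> "a" leaked in from the earlier call
print(first is second)   # True       -> both names point at the one shared list

print(add_item("a"))     # ['a']
print(add_item("b"))     # ['b']      -> a fresh list on every call
```

The same caution applies to any mutable default, such as a dict or a set.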

Docstrings — document the why

A string literal placed as the first statement of a function body is its docstring (triple quotes are the convention, so it can grow to several lines). It is official documentation that help(func) and your editor will surface. Worth writing for anything non-obvious.

def cagr(start, end, years):
    """Compound annual growth rate as a decimal (0.12 = 12%)."""
    return (end / start) ** (1 / years) - 1
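As a quick sketch of where that text ends up: Python stores the docstring on the function object, so it can be read back at runtime. The sample figures below are made up for illustration.

```python
import inspect


def cagr(start, end, years):
    """Compound annual growth rate as a decimal (0.12 = 12%)."""
    return (end / start) ** (1 / years) - 1


# The docstring lives on the function object itself.
print(cagr.__doc__)

# inspect.getdoc() returns the same text with indentation cleaned up;
# help(cagr) in an interactive session shows it under the signature.
print(inspect.getdoc(cagr))

# Made-up figures: doubling over three years.
print(f"{cagr(1000, 2000, 3):.1%}")   # 26.0%
```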

Lambdas — disposable one-line functions

A lambda is a tiny anonymous function for when defining a whole def would be overkill — most often as the key= that tells sorted(), min(), or max() what to compare. You met one last chapter; here's the shape:

people = [{"name": "Ana", "spend": 900}, {"name": "Bo", "spend": 300}]

# sort by spend, biggest first
people.sort(key=lambda p: p["spend"], reverse=True)

# the single biggest spender
top = max(people, key=lambda p: p["spend"])

Read lambda p: p["spend"] as "given a person p, hand back their spend." Keep lambdas to one short expression — if it grows, promote it to a named def.
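One way to picture "promote it to a named def" is the sketch below. The third person and the 500 threshold are made-up assumptions, and tier_then_name is a hypothetical function used only for this illustration.

```python
# Made-up sample data for illustration.
people = [
    {"name": "Ana", "spend": 900},
    {"name": "Bo", "spend": 300},
    {"name": "Cy", "spend": 950},
]

# One short expression: a lambda is enough.
by_spend = sorted(people, key=lambda p: p["spend"], reverse=True)
print([p["name"] for p in by_spend])   # ['Cy', 'Ana', 'Bo']


# The rule has grown past one short expression, so it gets a name and a docstring.
def tier_then_name(person):
    """Sort key: big spenders (500 or more) first, then alphabetical by name."""
    tier = 0 if person["spend"] >= 500 else 1
    return (tier, person["name"])


ranked = sorted(people, key=tier_then_name)   # pass the function itself, no parentheses
print([p["name"] for p in ranked])     # ['Ana', 'Cy', 'Bo']
```

A named key function can also be tested on its own, which a lambda buried in a sort call cannot.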

🐘 PHP: A lambda is PHP's fn($p) => $p['spend'] arrow function. Same purpose — a quick callback for sorting and mapping — just spelled lambda x: ....

Modules — borrowing the ecosystem

The real reason Python rules data science is its libraries, and you reach them with import. There are a few forms, and the aliasing convention matters because everyone uses the same nicknames:

import statistics                 # use as statistics.mean(...)
from statistics import mean       # pull one name in directly
import pandas as pd               # the universal alias
import numpy as np                # also universal

from statistics import mean, median

data = [10, 20, 30, 40]
print(mean(data))     # 25
print(median(data))   # 25.0 -> average of the two middle values, so a float

The standard library ships with batteries included (statistics, math, random, datetime, csv, json), no install needed. pip install then adds third-party packages such as pandas and NumPy on top. Sticking to the conventional aliases (pd, np, and plt for matplotlib.pyplot) means your code looks like every tutorial and every colleague's code, which matters more than it sounds.
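As a rough sketch of the standard library in use, the snippet below touches three of those modules with no install step. The data values are the same illustrative list used above.

```python
import json
import math
import statistics
from statistics import mean

data = [10, 20, 30, 40]

# Both import forms reach the very same function object.
print(statistics.mean is mean)           # True

# math: numeric helpers such as square roots.
print(math.sqrt(16))                     # 4.0

# json: turn a dict into text and back again.
text = json.dumps({"total": sum(data), "count": len(data)})
print(json.loads(text)["total"])         # 100
```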

🐘 PHP: import is the grown-up version of require. Instead of pasting a file's contents into the current scope, it loads the module as its own namespace (running the file only the first time, like require_once), and you call into it by name. from x import y is like a targeted use statement. Composer brings packages; pip brings packages; import is how you actually use them.

A Reusable Metrics Toolkit

Goal: build your own importable module of business metrics, then use it from a separate script — exactly how real projects grow.

Create metrics.py in ba-lab with three documented functions:

def margin(revenue, cost):
    """Profit margin as a decimal."""
    return (revenue - cost) / revenue

def growth(old, new):
    """Period-over-period growth as a decimal."""
    return (new - old) / old

def cagr(start, end, years):
    """Compound annual growth rate as a decimal."""
    return (end / start) ** (1 / years) - 1

Create report.py in the same folder and import your own module:

import metrics

print(f"Margin: {metrics.margin(1800, 1100):.1%}")
print(f"Growth: {metrics.growth(1200, 1500):.1%}")
print(f"CAGR:   {metrics.cagr(1000, 2000, 3):.1%}")

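Run it: python report.py (on some systems the command is python3). It should print a margin of 38.9%, growth of 25.0%, and a CAGR of 26.0%.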

You just wrote a library and consumed it — the same relationship pandas has with your notebook, only now you're on the authoring side. This is how a pile of scripts becomes a maintainable project.

You can define functions with defaults and keyword args, dodge the mutable-default trap, write lambdas for sorting, and import both standard-library and your own modules. That's the toolkit for keeping analytics code DRY and trustworthy.

Add a fourth function to metrics.py: def summary(rows) that takes a list-of-dicts (each with a "rev" key) and returns a dict with the total, average, max, and count. Reuse sum(), max(), and len() inside it. Now you've got a single call that profiles a dataset — the seed of what DataFrame.describe() does for you automatically in two chapters' time.
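If you want something to compare your attempt against, here is one possible sketch. The key names in the returned dict, the choice to return None for empty input, and the sample rows are all assumptions; the exercise leaves them open.

```python
def summary(rows):
    """Total, average, max, and count of the "rev" values in a list of dicts."""
    revs = [row["rev"] for row in rows]
    count = len(revs)
    total = sum(revs)
    return {
        "total": total,
        "average": total / count if count else None,  # assumption: None when there are no rows
        "max": max(revs) if revs else None,           # max() raises on an empty list
        "count": count,
    }


sample = [{"rev": 1200}, {"rev": 1500}, {"rev": 900}]   # made-up rows
print(summary(sample))
# {'total': 3600, 'average': 1200.0, 'max': 1500, 'count': 3}
```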