Python Programming Refresher¶
This refresher covers the Python concepts you'll need for the Jupyter notebook sessions on Day 2, where you'll collect sensor data, train ML models, and export TFLite models.
No prior Python experience? This page starts from the very basics — what Python is, how to write your first program, and builds up to the ML/DL concepts you'll use in the workshop. If you've only written "Hello World" in school, you're in the right place.
If you're already comfortable with Python, skim the basics and jump to the ML/DL section.
Why Python in This Workshop?¶
Python is used in two places:
- TinyML Training (Day 2) — You'll use Jupyter notebooks to train a neural network on sensor data and export it as a TFLite model
- Data Collection & Analysis — Parsing serial data, visualizing sensor readings, and preparing datasets
The firmware on the ESP32-S3 is always C — Python is only used on your laptop for the ML pipeline.
What Is Python?¶
Python is a programming language that reads almost like English. Unlike C, where you have to declare types and manage memory manually, Python handles a lot of that for you.
Your First Python Program¶
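print("Hello, World!")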
That's it. One line. No #include, no main(), no semicolons. Python is designed to be simple.
# Variables — no type declaration needed
name = "ESP32" # Python figures out this is a string
temperature = 25.5 # this is a float
count = 42 # this is an integer
is_active = True # this is a boolean (True/False)
# You can change a variable to a different type (Python is flexible)
x = 10 # x is an integer
x = 3.14 # now x is a float — Python allows this
x = "hello" # now x is a string — Python allows this too
Comments in Python
In Python, comments start with #. Everything after # on that line is ignored by Python.
Python vs C — Quick Comparison¶
| Feature | C | Python |
|---|---|---|
| Hello World | 4 lines (include, main, printf, return) | 1 line (print()) |
| Types | Must declare (int x = 5;) | Automatic (x = 5) |
| Semicolons | Required at end of every statement | Not needed (but allowed) |
| Blocks | Curly braces { } | Indentation (spaces/tabs) |
| Memory | Manual (malloc / free) | Automatic (garbage collector) |
| Speed | Very fast (compiled) | Slower (interpreted) — but fast enough for ML training |
| Use in workshop | ESP32 firmware | ML model training on your laptop |
Variables and Types¶
Python is dynamically typed — no need to declare types, but you can add type hints for clarity.
# Basic types
temperature = 25.5 # float — decimal number
count = 42 # int — whole number
label = "normal" # str — text (string)
is_active = True # bool — True or False
# Type hints (optional but helpful for readability)
temperature: float = 25.5
readings: list[float] = []
# Check the type of a variable
print(type(temperature)) # <class 'float'>
print(type(count)) # <class 'int'>
print(type(label)) # <class 'str'>
Converting Between Types¶
# String to number
age_str = "25"
age = int(age_str) # 25 (now an integer)
price = float("9.99") # 9.99 (now a float)
# Number to string
msg = "Count: " + str(42) # "Count: 42"
# Float to int (truncates — does NOT round)
x = int(3.9) # 3 (not 4!)
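If you want actual rounding, use round() instead:
# round() rounds to the nearest value (and can keep decimal places)
y = round(3.9)       # 4
z = round(3.456, 2)  # 3.46 — second argument = number of decimal places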
Data Structures¶
Lists — Ordered, Changeable Collections¶
A list is like an array in C, but it can grow/shrink and hold mixed types.
# Create a list
temperatures = [22.5, 23.1, 24.0, 25.5, 26.2]
# Access by index (starts at 0, just like C)
first = temperatures[0] # 22.5
last = temperatures[-1] # 26.2 (negative index = from the end)
# Slice — get a range
middle = temperatures[1:4] # [23.1, 24.0, 25.5]
# 1:4 means "from index 1 up to (not including) index 4"
# Modify
temperatures.append(27.0) # add to end → [22.5, ..., 27.0]
temperatures.insert(0, 21.0) # insert at index 0
temperatures.remove(24.0) # remove by value
temperatures.pop() # remove and return last item
# Length
n = len(temperatures) # number of items
# List comprehensions — create lists with a formula
hot = [t for t in temperatures if t > 25.0]
# [25.5, 26.2] — only values above 25 (27.0 was removed by pop() above)
fahrenheit = [(t * 9/5) + 32 for t in temperatures]
# Convert every temperature to Fahrenheit
Dictionaries — Key-Value Pairs¶
A dictionary (dict) maps keys to values — like a real dictionary maps words to definitions.
# Create — key-value pairs
sensor_reading = {
"temperature": 25.5,
"humidity": 60.0,
"label": "normal"
}
# Access by key
temp = sensor_reading["temperature"] # 25.5
# Safe access — returns default if key doesn't exist
pressure = sensor_reading.get("pressure", 1013.25) # 1013.25
# Add or update
sensor_reading["timestamp"] = 1713427200 # new key
sensor_reading["temperature"] = 26.0 # update existing
# Iterate over all key-value pairs
for key, value in sensor_reading.items():
print(f"{key}: {value}")
# Check if a key exists
if "temperature" in sensor_reading:
print("Temperature is present")
Tuples — Immutable (Unchangeable) Sequences¶
# Tuples can't be modified after creation — useful for fixed groupings
point = (3.14, 2.71)
x, y = point # unpacking — x=3.14, y=2.71
# Common in ML: (features, label)
sample = ([25.5, 60.0], "normal")
features, label = sample
Control Flow¶
Conditionals — Making Decisions¶
value = 75
# if / elif / else — note the indentation!
if value > 80:
    print("High")
elif value > 50:  # elif = "else if"
    print("Normal")
else:
    print("Low")
# Ternary expression — one-line conditional
status = "hot" if temperature > 30 else "normal"
Indentation matters in Python!
Python uses indentation (spaces) to define code blocks instead of curly braces { }. 4 spaces is the standard. If your indentation is wrong, Python will give an IndentationError.
Loops — Repeating Actions¶
# for loop — iterate over items in a list
for temp in temperatures:
print(f"Reading: {temp}°C")
# for with enumerate — when you need the index too
for i, temp in enumerate(temperatures):
print(f"Sample {i}: {temp}°C")
# for with range — repeat a specific number of times
for epoch in range(50): # epoch goes from 0 to 49
print(f"Training epoch {epoch}")
# while loop — repeat while a condition is true
retry_count = 0
while retry_count < 5:
print(f"Attempt {retry_count}")
retry_count += 1
Functions¶
# Basic function
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32
result = celsius_to_fahrenheit(25.5) # 77.9
# Function with default arguments
import time  # for time.sleep below

def read_sensor(gpio, samples=10, delay_ms=100):
    readings = []
    for _ in range(samples):
        readings.append(analog_read(gpio))  # analog_read stands in for your ADC read function
        time.sleep(delay_ms / 1000)
    return readings
# Call with defaults
data = read_sensor(2) # uses defaults: samples=10, delay_ms=100
data = read_sensor(2, samples=20) # override samples only
data = read_sensor(2, delay_ms=50) # override delay only
# Multiple return values (tuple unpacking)
def min_max(data):
    return min(data), max(data)
lo, hi = min_max([22.5, 25.5, 26.2]) # lo=22.5, hi=26.2
# Lambda — short anonymous function (useful for sorting/filtering)
readings = [(25.5, "normal"), (35.0, "hot"), (20.0, "cold")]
sorted_readings = sorted(readings, key=lambda x: x[0]) # sort by temperature
String Formatting¶
device_id = "XIAO-S3-01"
temp = 25.5
humidity = 60.0
# f-strings (preferred — Python 3.6+) — put variable inside {}
print(f"Device {device_id}: {temp:.1f}°C") # Device XIAO-S3-01: 25.5°C
# Multiple values
print(f"Temp={temp:.1f}, Humidity={humidity:.0f}%")
# Formatting options
print(f"Hex: {255:#04x}") # 0xff
print(f"Pad: {5:03d}") # 005
print(f"Percent: {0.85:.0%}") # 85%
File I/O¶
You'll read sensor data from CSV files and write training results.
# Read a file line by line
with open("sensor_data.csv", "r") as f: # "r" = read mode
header = f.readline().strip().split(",")
for line in f:
values = line.strip().split(",")
temp, hum, label = float(values[0]), float(values[1]), values[2]
# Write to a file
with open("output.csv", "w") as f: # "w" = write mode (overwrites!)
f.write("temperature,humidity,label\n")
for row in dataset:
f.write(f"{row[0]},{row[1]},{row[2]}\n")
# Using csv module (handles quoting and edge cases properly)
import csv
with open("sensor_data.csv", "r") as f:
reader = csv.DictReader(f)
for row in reader:
temp = float(row["temperature"])
label = row["label"]
The with statement
with open(...) as f: automatically closes the file when you're done, even if an error occurs. Always use with — never f = open(...) without closing.
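For comparison, here is what the manual version looks like and why with is safer:
# Without `with` — you must close manually, and close() is skipped if an error occurs first
f = open("sensor_data.csv", "r")
data = f.read()
f.close()
# With `with` — the file is closed automatically, even on errors
with open("sensor_data.csv", "r") as f:
    data = f.read()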
Working with NumPy¶
NumPy is the foundation for all numerical computing in Python. You'll use it for data manipulation and as the basis for TensorFlow/Keras.
import numpy as np
# Create arrays
data = np.array([22.5, 23.1, 24.0, 25.5, 26.2])
zeros = np.zeros(100) # 100 zeros
ones = np.ones((3, 4)) # 3×4 matrix of ones
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Statistics
mean = np.mean(data) # 24.26
std = np.std(data) # standard deviation
minimum = np.min(data) # 22.5
maximum = np.max(data) # 26.2
# Normalization (critical for ML — see ML section below)
normalized = (data - np.mean(data)) / np.std(data)
# Min-max scaling (0 to 1)
scaled = (data - np.min(data)) / (np.max(data) - np.min(data))
# Reshape (ML models need specific input shapes)
features = data.reshape(-1, 1) # column vector: [[22.5], [23.1], ...]
# Boolean indexing — filter data with conditions
hot_readings = data[data > 25.0] # [25.5, 26.2]
Working with Matplotlib¶
Visualize your sensor data and training curves.
import matplotlib.pyplot as plt
# Line plot — sensor readings over time
temperatures = [22.5, 23.1, 24.0, 25.5, 26.2, 27.0]
plt.plot(temperatures, marker='o')
plt.xlabel("Sample Index")
plt.ylabel("Temperature (°C)")
plt.title("Sensor Readings")
plt.grid(True)
plt.show()
# Training loss curve
loss = [0.8, 0.6, 0.45, 0.35, 0.28]  # your training losses, one value per epoch
epochs = range(1, len(loss) + 1)     # x and y must have the same length
plt.plot(epochs, loss)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Progress")
plt.show()
# Scatter plot — feature visualization
humidities = [45.0, 50.2, 55.0, 60.0, 65.3, 80.0]  # one value per temperature sample
labels = [0, 0, 0, 1, 1, 1]                        # class index per sample
plt.scatter(temperatures, humidities, c=labels, cmap='viridis')
plt.xlabel("Temperature (°C)")
plt.ylabel("Humidity (%)")
plt.colorbar(label="Class")
plt.show()
Basics of Machine Learning¶
Machine Learning (ML) is teaching a computer to find patterns in data without being explicitly programmed with rules.
The Intuition — Learning Like a Human¶
Imagine you're teaching a child to recognize cats:
- Traditional programming: You write rules — "if it has pointy ears, whiskers, and a tail, it's a cat." But what about a lynx? A cartoon cat? A cat from behind? You can never write enough rules.
- Machine Learning: You show the child 1000 photos of cats and 1000 photos of dogs. The child figures out the patterns on their own. You never told them how to tell the difference — they just learned from examples.
That's the key idea: instead of writing rules, you provide examples, and the computer learns the rules by itself.
In our workshop, instead of cats and dogs, we'll classify sensor readings — is the environment normal, hot, or humid?
A Simple Example — Predicting the Weather¶
Let's say you want to predict whether it will rain today. You look at two things:
- Humidity (how much water is in the air)
- Air pressure (how heavy the air is pushing down)
You collect data for 100 days:
| Day | Humidity (%) | Pressure (hPa) | Rained? |
|---|---|---|---|
| 1 | 45 | 1020 | No |
| 2 | 82 | 1005 | Yes |
| 3 | 50 | 1018 | No |
| 4 | 90 | 998 | Yes |
| 5 | 55 | 1015 | No |
| ... | ... | ... | ... |
After seeing enough examples, a pattern emerges:
- High humidity + Low pressure → Rain
- Low humidity + High pressure → No rain
You didn't write this rule — the computer found it by looking at the data. That's ML!
How Is This Different from Regular Programming?¶
Traditional Programming:
Input + Rules written by you → Output
Machine Learning:
Input + Output (examples) → Rules learned by the computer
In traditional programming, you figure out the rules and code them. In ML, the computer figures out the rules from examples. This is powerful when the rules are too complex to write by hand (like recognizing a cat) or when you don't even know what the rules are.
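To make this concrete, here is a toy sketch (not the neural networks we'll use later) that "learns" a rain rule from the weather table above by searching for the humidity threshold that best matches the examples:
# Traditional programming: YOU write the rule
def will_rain_rules(humidity):
    return humidity > 75  # threshold chosen by hand

# Machine learning (toy version): the computer finds the rule from examples
examples = [(45, False), (82, True), (50, False), (90, True), (55, False)]  # (humidity, rained?)

def learn_threshold(examples):
    best_threshold, best_correct = 0, -1
    for threshold in range(0, 101):  # try every possible threshold
        correct = sum((humidity > threshold) == rained for humidity, rained in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

threshold = learn_threshold(examples)
print(f"Learned rule: rain if humidity > {threshold}%")  # found from data, not written by hand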
Types of Machine Learning¶
| Type | What It Does | Analogy | Example in Workshop |
|---|---|---|---|
| Supervised Learning | Learn from labeled examples (input + correct answer) | Studying with a textbook that has answer keys | "This sensor reading is hot" → learn to classify new readings |
| Unsupervised Learning | Find patterns without labels (no correct answers given) | Exploring a new city without a map — you find neighborhoods on your own | Group similar sensor readings together without knowing the categories |
| Reinforcement Learning | Learn by trial and error with rewards | Training a dog — treat for good behavior, nothing for bad | (Not used in this workshop) |
We use supervised learning in this workshop — we train the model on data where we already know the correct answer (the label).
Supervised Learning — The Two Flavors¶
Classification — predicting a category (a label from a fixed set of options).
Examples:
- Is this email spam or not spam?
- Is this sensor reading normal, hot, or humid?
- Is this image a cat, dog, or bird?
Regression — predicting a number (a continuous value).
Examples:
- What will the temperature be tomorrow? → 28.3°C
- How much will this house sell for? → $350,000
- How many units will we sell next month? → 1,247
In this workshop, we use classification — we classify sensor readings into categories like "normal", "hot", and "humid".
The ML Pipeline (What You'll Do)¶
Think of the ML pipeline like cooking a meal:
1. Collect Data → Buy ingredients (read sensor values from ESP32)
2. Preprocess Data → Wash, chop, measure (normalize, clean, split into train/test)
3. Train Model → Cook the recipe (feed data to a neural network)
4. Evaluate Model → Taste test (check accuracy on test data)
5. Export Model → Pack leftovers (convert to TFLite for ESP32)
6. Deploy Model → Serve the dish (run inference on the ESP32-S3)
Each step matters — bad ingredients = bad food, bad data = bad model.
Key ML Terms — Explained Simply¶
| Term | Meaning | Analogy |
|---|---|---|
| Feature | An input variable (e.g., temperature, humidity) | A clue that helps you guess the answer |
| Label | The correct answer (e.g., "hot", "normal") | The answer key at the back of the textbook |
| Training | Showing the model many examples so it learns | Studying with flashcards |
| Inference | Using the trained model to make a prediction | Taking the exam |
| Loss | How wrong the model's predictions are | Your exam score (lower = better, unlike school!) |
| Epoch | One complete pass through all training data | Going through all flashcards once |
| Batch | A small group of examples processed together | Studying a few flashcards at a time |
| Accuracy | Percentage of correct predictions | Your grade on the exam |
| Overfitting | Model memorizes training data but fails on new data | Memorizing answers without understanding — you ace the practice test but fail the real exam |
| Underfitting | Model is too simple to learn the patterns | Studying only 5 flashcards — you didn't learn enough |
| Weights | Numbers the model adjusts during training to make better predictions | The "knowledge" the model has — like notes you take while studying |
| Bias | A number added to each neuron's output (like a baseline offset) | The "starting point" — like always guessing the most common answer before you learn anything |
Overfitting vs Underfitting — The Goldilocks Problem¶
Underfitting (too simple) Just Right Overfitting (too complex)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ · · · │ │ · · │ │ · · │
│ · · · │ │ · · · │ │ · · · │
│ · · · │ │ · · · │ │ · · · │
│ │ │ · · │ │ · · │
│ ─ ─ ─ ─ ─ ─ ─ │ │ ~~~~~~~~~~~ │ │ /\/\//\/\/\/\ │
│ (straight line │ │ (smooth curve │ │ (wiggly curve │
│ misses pattern)│ │ fits pattern)│ │ memorizes noise)│
└─────────────────┘ └─────────────────┘ └─────────────────┘
- Underfitting: The model is too simple — like trying to fit a straight line through curved data. It can't capture the pattern.
- Overfitting: The model is too complex — it memorizes every tiny detail, including noise. It fails on new, unseen data.
- Just right: The model captures the real pattern without memorizing noise.
How to avoid overfitting
- Use more training data — more examples means less memorization
- Use fewer neurons/layers — simpler model = less capacity to memorize
- Use early stopping — stop training when validation loss starts increasing
- Use dropout — randomly turn off some neurons during training (forces the model to learn robust patterns); see the Keras sketch below
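For reference, here is roughly what the last two techniques look like in Keras (covered later on this page); the layer sizes are just illustrative:
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(2,)),
    keras.layers.Dropout(0.2),  # randomly drop 20% of these neurons each training step
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(3, activation='softmax')
])

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',        # watch the validation loss...
    patience=5,                # ...stop after 5 epochs without improvement
    restore_best_weights=True  # roll back to the best weights seen
)
# Then pass it to training: model.fit(..., validation_split=0.2, callbacks=[early_stop])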
Data Preprocessing — Why It Matters¶
ML models work best when input data is small numbers centered around 0. Raw sensor data (like temperature = 45.5, humidity = 89.0) needs preprocessing.
Think of it like this: if one feature ranges from 0–100 and another from 0–1, the model will think the first feature is 100× more important just because the numbers are bigger. Normalization fixes this by putting everything on the same scale.
import numpy as np
# Raw sensor data
temperatures = np.array([22.5, 23.1, 24.0, 25.5, 26.2, 35.0, 40.1])
humidity = np.array([45.0, 50.2, 55.0, 60.0, 65.3, 80.0, 90.0])
labels = np.array([0, 0, 0, 0, 0, 1, 1]) # 0=normal, 1=hot
# Step 1: Combine features into a 2D array (samples × features)
X = np.column_stack([temperatures, humidity])
# [[22.5, 45.0], [23.1, 50.2], ...]
# Step 2: Normalize — scale each feature to mean=0, std=1
# Formula: normalized = (value - mean) / standard_deviation
# This transforms values so they're centered around 0 with similar ranges
X_mean = X.mean(axis=0) # mean of each column
X_std = X.std(axis=0) # std of each column
X_normalized = (X - X_mean) / X_std
# Before normalization: temperature=40.1, humidity=90.0
# After normalization:  temperature≈1.92, humidity≈1.75
# Now both features have similar ranges — the model treats them equally!
# Step 3: Split into training and testing sets (80/20)
# CRITICAL: The model must be tested on data it has NEVER seen before
# Otherwise, you're just testing its memorization, not its learning
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, labels, test_size=0.2, random_state=42
)
print(f"Training samples: {len(X_train)}") # used to teach the model
print(f"Testing samples: {len(X_test)}") # used to check if it actually learned
Always save the mean and std!
When you deploy the model on the ESP32, you need to normalize sensor readings using the same mean and std from training. Save these values — you'll hardcode them in your C firmware.
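One convenient way to do that is to print the values as C constants at the end of your notebook, using the X_mean and X_std computed in the preprocessing example above (the constant names here are just a suggestion):
# Print the normalization constants as C code you can paste into firmware
print("// Normalization constants from training:")
print(f"const float FEATURE_MEAN[2] = {{{X_mean[0]:.4f}f, {X_mean[1]:.4f}f}};")
print(f"const float FEATURE_STD[2]  = {{{X_std[0]:.4f}f, {X_std[1]:.4f}f}};")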
Why Split Data into Train and Test?¶
Imagine you're studying for a math exam:
- Training data = practice problems you study with
- Test data = the actual exam questions you've never seen
If the exam used the exact same questions as the practice problems, you could just memorize the answers and score 100% — but you wouldn't actually understand the math. That's overfitting.
By testing on unseen data, you check whether the model truly learned the patterns or just memorized the examples.
Basics of Deep Learning¶
Deep Learning is a subset of ML that uses neural networks — layers of interconnected "neurons" that learn patterns. The "deep" in deep learning just means many layers stacked together.
What Is a Neural Network?¶
Think of a neural network as a team of people in a factory, each doing a simple job, passing their work to the next person:
Raw Material Assembly Line Final Product
(temperature, → Person 1 → Person 2 → "hot" (95%)
humidity) | | "normal" (4%)
"Is temp "Is temp high "humid" (1%)
above 25?" AND humid?"
- Person 1 looks at the raw numbers and notices simple things: "temperature is above average"
- Person 2 combines those observations: "high temperature AND high humidity"
- Final person makes the call: "This is hot — 95% sure"
Each person (neuron) does a simple calculation. But together, they can learn very complex patterns.
A Real Example — The Sensor Classifier¶
Let's say we want to classify sensor readings into three categories: normal, hot, or humid.
Our neural network would look like this:
Input Layer Hidden Layer 1 Hidden Layer 2 Output Layer
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐
│ Temp: 35.5│──┬──→ │ Neuron 1 │──┬─→│ Neuron 1 │──┬─→│ normal: │
│ │ │ │ (is it hot?)│ │ │ (hot+humi?)│ │ │ 0.04 │
│ Humid: 85 │──┤──→ │ Neuron 2 │──┤─→│ Neuron 2 │──┤─→│ hot: │
│ │ │ │ (is it humid│ │ │ (only hot?)│ │ │ 0.93 │
└──────────┘ └──→ │ Neuron 3 │──┘─→│ Neuron 3 │──┘─→│ humid: │
│ (both high?) │ │ (only humid?│ │ 0.03 │
└──────────────┘ └──────────────┘ └──────────┘
- 2 inputs → temperature and humidity
- 3 neurons in hidden layer 1 → each detects a different simple pattern
- 3 neurons in hidden layer 2 → each combines patterns from layer 1
- 3 outputs → one probability per category (they sum to 1.0 = 100%)
The highest probability wins. Here, "hot" at 93% is the prediction.
Neurons — The Building Blocks¶
A neuron is like a tiny decision-maker. It takes inputs, does some math, and produces an output.
┌─────────┐
temperature ─────→│ │
(35.5) │ NEURON │────→ output (e.g., 0.8 = "yes, it's hot")
humidity ───────→│ │
(85.0) └─────────┘
Inside the neuron:
output = activation( (temperature × weight₁) + (humidity × weight₂) + bias )
Let's break this down:
- Weights — How important each input is. If weight₁ is large, temperature matters a lot. If weight₂ is small, humidity doesn't matter much.
- Bias — A baseline value. Think of it as the neuron's "default answer" before looking at any inputs.
- Activation function — Decides the final output format. Without it, the neuron can only do linear math (like a calculator). With it, the neuron can make decisions (like a brain).
A concrete example:
# A single neuron doing its calculation
temperature = 35.5
humidity = 85.0
weight1 = 0.8 # temperature is important
weight2 = 0.3 # humidity is less important
bias = -15.0 # baseline offset
# Step 1: Weighted sum
weighted_sum = (temperature * weight1) + (humidity * weight2) + bias
# = (35.5 × 0.8) + (85.0 × 0.3) + (-15.0)
# = 28.4 + 25.5 - 15.0
# = 38.9
# Step 2: Apply activation function (ReLU)
output = max(0, weighted_sum) # ReLU: if negative, output 0; if positive, pass through
# = max(0, 38.9) = 38.9
# This neuron says: "Yes, this is hot!" (large positive output)
During training, the model adjusts the weights and bias to make better predictions. That's what "learning" means in ML — finding the right weights and biases.
Activation Functions — The Decision Makers¶
Without an activation function, a neural network is just a fancy calculator — it can only do multiplication and addition. Activation functions add non-linearity, which means the network can learn complex, curvy patterns instead of just straight lines.
| Activation | What It Does | Formula | When to Use | Analogy |
|---|---|---|---|---|
| ReLU | If input is negative, output 0. If positive, pass it through. | max(0, x) | Hidden layers (most common) | A bouncer at a club — negative vibes get blocked, positive vibes pass through |
| Sigmoid | Squishes any number into the range 0.0 to 1.0 | 1 / (1 + e^(-x)) | Binary classification output (yes/no) | A dimmer switch — any brightness becomes a value between off (0) and full (1) |
| Softmax | Converts a list of numbers into probabilities that sum to 1.0 | e^x / sum(e^all) | Multi-class classification output | A pie chart — each slice is a percentage, all slices add up to 100% |
| Tanh | Squishes any number into the range -1.0 to 1.0 | (e^x - e^(-x)) / (e^x + e^(-x)) | Sometimes used in hidden layers | Like sigmoid but centered at 0 — can express "against" (-1) or "for" (+1) |
import numpy as np
# ReLU — the most popular activation function
def relu(x):
    return max(0, x)
print(relu(-5)) # 0 (negative → blocked)
print(relu(0)) # 0 (zero → zero)
print(relu(3.7)) # 3.7 (positive → passed through)
# Sigmoid — for binary classification (yes/no)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
print(sigmoid(-5)) # 0.007 (very unlikely)
print(sigmoid(0)) # 0.5 (50/50 — not sure)
print(sigmoid(5)) # 0.993 (very likely)
# Softmax — for multi-class classification (pick one of many)
def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return exp_x / exp_x.sum()
logits = np.array([2.0, 5.0, 1.0]) # raw scores for [normal, hot, humid]
probs = softmax(logits)
print(probs) # [0.05, 0.94, 0.02] → 94% chance it's "hot"
How Training Works — Step by Step¶
Training a neural network is like learning to throw darts blindfolded, with someone telling you how far off you are each time:
- Forward pass (throw the dart): Input data flows through the network → produces a prediction
- Calculate loss (measure how far from the bullseye): Compare prediction to the correct answer → measure how wrong it is
- Backward pass / backpropagation (figure out what went wrong): Calculate how each weight contributed to the error — "the dart went left because my elbow was too high"
- Update weights (adjust your aim): Adjust weights slightly to reduce the error — "move elbow down a tiny bit"
- Repeat for many epochs until the loss stops decreasing — "keep practicing until you hit the bullseye consistently"
Epoch 1: Loss = 0.80 (very wrong — dart missed the board)
Epoch 5: Loss = 0.60 (getting closer — on the board now)
Epoch 10: Loss = 0.45 (getting better — hitting the outer ring)
Epoch 30: Loss = 0.12 (pretty good — near the bullseye)
Epoch 50: Loss = 0.05 (accurate! — hitting the bullseye consistently)
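Here is the whole loop in miniature: a toy model with a single weight w, trained with squared-error loss on one example (Keras does the same thing with thousands of weights at once):
# Toy training loop: learn w so that prediction = w * x matches y
x, y = 2.0, 10.0   # one training example (the ideal w is 5.0)
w = 0.0            # start with an uninformed weight
learning_rate = 0.1

for epoch in range(20):
    prediction = w * x                   # 1. forward pass
    loss = (prediction - y) ** 2         # 2. calculate loss
    gradient = 2 * (prediction - y) * x  # 3. backward pass: d(loss)/dw
    w = w - learning_rate * gradient     # 4. update the weight

print(f"Learned w = {w:.3f}")  # ≈ 5.000 — the loop found the right weight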
The Learning Rate — How Big of a Step to Take¶
The learning rate controls how much the weights change after each training step:
- Too high (e.g., 1.0): The model jumps around wildly, never settling on good weights — like taking huge steps and overshooting the target
- Too low (e.g., 0.00001): The model takes tiny steps, taking forever to learn — like inching toward the target at a snail's pace
- Just right (e.g., 0.001): The model makes steady progress toward good weights — like walking confidently toward the target
# The Adam optimizer automatically adjusts the learning rate — you rarely need to change it
model.compile(
    optimizer='adam',  # Adam = Adaptive Moment Estimation — smart learning rate
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
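If you do want to set the learning rate yourself, pass an optimizer object instead of the string:
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),  # explicit learning rate
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)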
Why "Deep" Learning?¶
A neural network with 1-2 hidden layers is just a "neural network" (sometimes called shallow). When you stack 3 or more layers, it's called deep learning. More layers = more complex patterns the network can learn.
Shallow (1-2 layers): Deep (3+ layers):
Input → Hidden → Output Input → H1 → H2 → H3 → ... → Output
Learns simple patterns Learns hierarchical patterns:
like "is X > threshold?" H1: edges
H2: shapes
H3: objects
H4: scenes
In our workshop, we use a small deep network (2 hidden layers) — enough to classify sensor data, but small enough to run on the ESP32-S3.
How Big Should My Network Be?¶
| Network Size | Neurons | Pros | Cons | Use Case |
|---|---|---|---|---|
| Tiny | 4-8 per layer | Fast to train, fits on microcontroller | May underfit (too simple) | Simple 2-feature classification |
| Small | 16-32 per layer | Good balance, fits on ESP32 | Limited complexity | Our workshop models |
| Medium | 64-128 per layer | Can learn complex patterns | Won't fit on microcontroller | Phone/edge device models |
| Large | 256+ per layer | Very powerful | Needs GPU, won't fit on MCU | Cloud/server models |
Bigger is not always better
A huge network will memorize your training data (overfitting) instead of learning real patterns. Start small and only add more neurons if the model underfits.
TensorFlow / Keras Basics¶
TensorFlow is Google's ML framework. Keras is its high-level API that makes building neural networks easy.
Building a Model¶
import tensorflow as tf
from tensorflow import keras
import numpy as np
# Define a simple neural network
model = keras.Sequential([
    # Input layer + first hidden layer
    # Dense = fully connected (every input connects to every neuron)
    # 16 = number of neurons
    # relu = activation function (most common for hidden layers)
    # input_shape=(2,) = 2 input features (temperature, humidity)
    keras.layers.Dense(16, activation='relu', input_shape=(2,)),
    # Second hidden layer — 8 neurons
    keras.layers.Dense(8, activation='relu'),
    # Output layer — 3 neurons (one per class: normal, hot, humid)
    # softmax = converts outputs to probabilities that sum to 1.0
    keras.layers.Dense(3, activation='softmax')
])
# Print a summary of the model
model.summary()
Compiling the Model¶
Before training, you must tell Keras how to train:
model.compile(
    optimizer='adam',  # How to update weights (adam is the best default)
    loss='sparse_categorical_crossentropy',  # How to measure error (for integer labels)
    metrics=['accuracy']  # What to track during training
)
| Setting | What It Means | Why This Choice |
|---|---|---|
| optimizer='adam' | Algorithm that adjusts weights | Best general-purpose optimizer |
| loss='sparse_categorical_crossentropy' | Error function for multi-class with integer labels | Labels are 0, 1, 2 (not one-hot) |
| metrics=['accuracy'] | Track percentage of correct predictions | Easy to understand |
Training the Model¶
# Train the model
history = model.fit(
    X_train,               # input features (normalized)
    y_train,               # correct labels
    epochs=50,             # number of complete passes through the data
    batch_size=32,         # examples per gradient update
    validation_split=0.2   # use 20% of training data to monitor overfitting
)
# Plot training progress
import matplotlib.pyplot as plt
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
Evaluating and Predicting¶
# Evaluate on test data (data the model has never seen)
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2%}")
# Make a prediction on new data
new_reading = np.array([[25.5, 60.0]])        # raw temperature, humidity
new_reading = (new_reading - X_mean) / X_std  # normalize with the SAME mean/std used in training!
prediction = model.predict(new_reading)
# prediction = [[0.05, 0.90, 0.05]] → 90% chance it's class 1
predicted_class = np.argmax(prediction) # 1 (the class with highest probability)
class_names = ["normal", "hot", "humid"]
print(f"Prediction: {class_names[predicted_class]}")
Exporting to TFLite (For ESP32)¶
The trained model must be converted to TFLite format — a small, optimized version that runs on microcontrollers.
# Convert to TFLite with INT8 quantization
# INT8 = 8-bit integers → much smaller model, faster inference on ESP32
# Step 1: Create a representative dataset generator
# This provides sample inputs so the converter can calibrate the quantization
def representative_dataset():
    for i in range(len(X_train)):
        yield [X_train[i:i+1].astype(np.float32)]
# Step 2: Configure the converter
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT] # enable quantization
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # input will be int8 on ESP32
converter.inference_output_type = tf.int8 # output will be int8 on ESP32
# Step 3: Convert and save
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"Model size: {len(tflite_model)} bytes")
# Typically 2-5 KB for a small model — fits easily on ESP32!
Why INT8 quantization?
- Float32 model: ~10 KB, requires floating-point math (slow on ESP32)
- INT8 model: ~3 KB, uses integer math only (fast on ESP32)
- The ESP32-S3 has no dedicated hardware for float neural network inference — its vector instructions accelerate integer math — so INT8 is essential for real-time performance.
What Happens on the ESP32¶
The TFLite model runs on the ESP32-S3 like this:
1. Read sensor (e.g., temperature=35.5, humidity=80.0)
2. Normalize using saved mean/std from training
3. Quantize: convert float to int8
4. Run inference through TFLite Micro interpreter
5. Dequantize: convert int8 output back to float
6. Get predicted class (e.g., "hot")
7. Take action (e.g., turn on fan, send alert)
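To embed model.tflite in your C firmware, it is typically converted to a byte array first. A minimal Python sketch of that step (the shell tool xxd -i model.tflite does the same; the model_tflite name is just a convention):
# Convert model.tflite into a C header for the firmware
with open("model.tflite", "rb") as f:
    data = f.read()

with open("model_data.h", "w") as f:
    f.write("const unsigned char model_tflite[] = {\n")
    for i in range(0, len(data), 12):  # 12 bytes per line
        f.write("  " + ", ".join(f"0x{b:02x}" for b in data[i:i+12]) + ",\n")
    f.write("};\n")
    f.write(f"const unsigned int model_tflite_len = {len(data)};\n")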
Jupyter Notebook Quick Reference¶
Jupyter notebooks let you run Python code cell-by-cell with inline visualizations.
Running a Notebook¶
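From a terminal in your workshop folder (assuming Jupyter is installed; if not, run pip install notebook first):
jupyter notebook   # opens the notebook interface in your browser
jupyter lab        # alternative, newer interface — either works for this workshop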
Notebook Cells¶
- Code cell — Python code, run with Shift+Enter
- Markdown cell — Documentation and explanations
- Output — Appears below the cell (plots, print statements, tables)
Magic Commands¶
%matplotlib inline # show plots inside the notebook
!pip install tensorflow # run shell command
%%timeit # time a cell's execution
Serial Data Parsing¶
You'll parse CSV data streamed from the ESP32 over serial.
import serial
import csv
# Open serial port
ser = serial.Serial('/dev/ttyUSB0', 115200, timeout=1)
# Read and parse lines until you press Ctrl+C
dataset = []
try:
    with open("sensor_log.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "temperature", "humidity", "label"])
        while True:
            line = ser.readline().decode('utf-8').strip()
            if not line:
                continue
            # Parse: "1713427200,25.5,60.0,normal"
            parts = line.split(",")
            if len(parts) == 4:
                timestamp, temp, hum, label = parts
                writer.writerow([timestamp, temp, hum, label])
                dataset.append((float(temp), float(hum), label))
except KeyboardInterrupt:
    pass  # Ctrl+C ends collection
ser.close()