
Python Programming Refresher

This refresher covers the Python concepts you'll need for the Jupyter notebook sessions on Day 2, where you'll collect sensor data, train ML models, and export TFLite models.

No prior Python experience? This page starts from the very basics — what Python is, how to write your first program, and builds up to the ML/DL concepts you'll use in the workshop. If you've only written "Hello World" in school, you're in the right place.

If you're already comfortable with Python, skim the basics and jump to the ML/DL section.


Why Python in This Workshop?

Python is used in two places:

  1. TinyML Training (Day 2) — You'll use Jupyter notebooks to train a neural network on sensor data and export it as a TFLite model
  2. Data Collection & Analysis — Parsing serial data, visualizing sensor readings, and preparing datasets

The firmware on the ESP32-S3 is always C — Python is only used on your laptop for the ML pipeline.


What Is Python?

Python is a programming language that reads almost like English. Unlike C, where you have to declare types and manage memory manually, Python handles a lot of that for you.

Your First Python Program

print("Hello, World!")

That's it. One line. No #include, no main(), no semicolons. Python is designed to be simple.

# Variables — no type declaration needed
name = "ESP32"          # Python figures out this is a string
temperature = 25.5      # this is a float
count = 42              # this is an integer
is_active = True        # this is a boolean (True/False)

# You can change a variable to a different type (Python is flexible)
x = 10          # x is an integer
x = 3.14        # now x is a float — Python allows this
x = "hello"     # now x is a string — Python allows this too

Comments in Python

In Python, comments start with #. Everything after # on that line is ignored by Python.

# This is a comment
temperature = 25.5  # This is also a comment (inline)

Python vs C — Quick Comparison

| Feature | C | Python |
|---|---|---|
| Hello World | 4 lines (include, main, printf, return) | 1 line (print()) |
| Types | Must declare (int x = 5;) | Automatic (x = 5) |
| Semicolons | Required at end of every statement | Not needed (but allowed) |
| Blocks | Curly braces { } | Indentation (spaces/tabs) |
| Memory | Manual (malloc / free) | Automatic (garbage collector) |
| Speed | Very fast (compiled) | Slower (interpreted) — but fast enough for ML training |
| Use in workshop | ESP32 firmware | ML model training on your laptop |

Variables and Types

Python is dynamically typed — no need to declare types, but you can add type hints for clarity.

# Basic types
temperature = 25.5          # float — decimal number
count = 42                  # int — whole number
label = "normal"            # str — text (string)
is_active = True            # bool — True or False

# Type hints (optional but helpful for readability)
temperature: float = 25.5
readings: list[float] = []

# Check the type of a variable
print(type(temperature))    # <class 'float'>
print(type(count))          # <class 'int'>
print(type(label))          # <class 'str'>

Converting Between Types

# String to number
age_str = "25"
age = int(age_str)          # 25 (now an integer)
price = float("9.99")       # 9.99 (now a float)

# Number to string
msg = "Count: " + str(42)   # "Count: 42"

# Float to int (truncates — does NOT round)
x = int(3.9)               # 3 (not 4!)

Data Structures

Lists — Ordered, Changeable Collections

A list is like an array in C, but it can grow/shrink and hold mixed types.

# Create a list
temperatures = [22.5, 23.1, 24.0, 25.5, 26.2]

# Access by index (starts at 0, just like C)
first = temperatures[0]     # 22.5
last = temperatures[-1]     # 26.2 (negative index = from the end)

# Slice — get a range
middle = temperatures[1:4]  # [23.1, 24.0, 25.5]
# 1:4 means "from index 1 up to (not including) index 4"

# Modify
temperatures.append(27.0)   # add to end → [22.5, ..., 27.0]
temperatures.insert(0, 21.0)  # insert at index 0
temperatures.remove(24.0)  # remove by value
temperatures.pop()          # remove and return last item

# Length
n = len(temperatures)      # number of items

# List comprehensions — create lists with a formula
hot = [t for t in temperatures if t > 25.0]
# [25.5, 26.2] (only values above 25; note 27.0 was removed by pop() earlier)

fahrenheit = [(t * 9/5) + 32 for t in temperatures]
# Convert every temperature to Fahrenheit

Dictionaries — Key-Value Pairs

A dictionary (dict) maps keys to values — like a real dictionary maps words to definitions.

# Create — key-value pairs
sensor_reading = {
    "temperature": 25.5,
    "humidity": 60.0,
    "label": "normal"
}

# Access by key
temp = sensor_reading["temperature"]  # 25.5

# Safe access — returns default if key doesn't exist
pressure = sensor_reading.get("pressure", 1013.25)  # 1013.25

# Add or update
sensor_reading["timestamp"] = 1713427200  # new key
sensor_reading["temperature"] = 26.0      # update existing

# Iterate over all key-value pairs
for key, value in sensor_reading.items():
    print(f"{key}: {value}")

# Check if a key exists
if "temperature" in sensor_reading:
    print("Temperature is present")

Tuples — Immutable (Unchangeable) Sequences

# Tuples can't be modified after creation — useful for fixed groupings
point = (3.14, 2.71)
x, y = point  # unpacking — x=3.14, y=2.71

# Common in ML: (features, label)
sample = ([25.5, 60.0], "normal")
features, label = sample

Control Flow

Conditionals — Making Decisions

value = 75

# if / elif / else — note the indentation!
if value > 80:
    print("High")
elif value > 50:        # elif = "else if"
    print("Normal")
else:
    print("Low")

# Ternary expression — one-line conditional
status = "hot" if temperature > 30 else "normal"

Indentation matters in Python!

Python uses indentation (spaces) to define code blocks instead of curly braces { }. 4 spaces is the standard. If your indentation is wrong, Python will give an IndentationError.

# ✅ Correct indentation
if value > 80:
    print("High")      # 4 spaces — this line belongs to the if block
    print("Very high") # 4 spaces — same block

# ❌ Wrong indentation — this will cause an error
if value > 80:
print("High")  # IndentationError: expected an indented block

Loops — Repeating Actions

# for loop — iterate over items in a list
for temp in temperatures:
    print(f"Reading: {temp}°C")

# for with enumerate — when you need the index too
for i, temp in enumerate(temperatures):
    print(f"Sample {i}: {temp}°C")

# for with range — repeat a specific number of times
for epoch in range(50):      # epoch goes from 0 to 49
    print(f"Training epoch {epoch}")

# while loop — repeat while a condition is true
retry_count = 0
while retry_count < 5:
    print(f"Attempt {retry_count}")
    retry_count += 1

Functions

# Basic function
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

result = celsius_to_fahrenheit(25.5)  # 77.9

# Function with default arguments
# (analog_read is a stand-in for whatever actually reads your sensor)
import time

def read_sensor(gpio, samples=10, delay_ms=100):
    readings = []
    for _ in range(samples):
        readings.append(analog_read(gpio))
        time.sleep(delay_ms / 1000)   # sleep takes seconds, hence / 1000
    return readings

# Call with defaults
data = read_sensor(2)                  # uses defaults: samples=10, delay_ms=100
data = read_sensor(2, samples=20)      # override samples only
data = read_sensor(2, delay_ms=50)     # override delay only

# Multiple return values (tuple unpacking)
def min_max(data):
    return min(data), max(data)

lo, hi = min_max([22.5, 25.5, 26.2])  # lo=22.5, hi=26.2

# Lambda — short anonymous function (useful for sorting/filtering)
readings = [(25.5, "normal"), (35.0, "hot"), (20.0, "cold")]
sorted_readings = sorted(readings, key=lambda x: x[0])  # sort by temperature

String Formatting

device_id = "XIAO-S3-01"
temp = 25.5
humidity = 60.2

# f-strings (preferred — Python 3.6+) — put variable inside {}
print(f"Device {device_id}: {temp:.1f}°C")  # Device XIAO-S3-01: 25.5°C

# Multiple values
print(f"Temp={temp:.1f}, Humidity={humidity:.0f}%")

# Formatting options
print(f"Hex: {255:#04x}")     # 0xff
print(f"Pad: {5:03d}")         # 005
print(f"Percent: {0.85:.0%}")  # 85%

File I/O

You'll read sensor data from CSV files and write training results.

# Read a file line by line
with open("sensor_data.csv", "r") as f:   # "r" = read mode
    header = f.readline().strip().split(",")
    for line in f:
        values = line.strip().split(",")
        temp, hum, label = float(values[0]), float(values[1]), values[2]

# Write to a file
with open("output.csv", "w") as f:        # "w" = write mode (overwrites!)
    f.write("temperature,humidity,label\n")
    for row in dataset:
        f.write(f"{row[0]},{row[1]},{row[2]}\n")

# Using csv module (handles quoting and edge cases properly)
import csv

with open("sensor_data.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        temp = float(row["temperature"])
        label = row["label"]

The with statement

with open(...) as f: automatically closes the file when you're done, even if an error occurs. Always use with — never f = open(...) without closing.
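As a quick sketch of why this matters, the with form below behaves exactly like the hand-written try/finally above it (a self-contained example using a temporary file):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# What `with` does for you, written out by hand:
f = open(path, "w")
try:
    f.write("temperature,humidity\n")
finally:
    f.close()            # runs even if write() raised an exception

# The same guarantee with `with`: closing is automatic
with open(path) as f:
    first_line = f.read()

print(first_line)  # temperature,humidity
```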


Working with NumPy

NumPy is the foundation for all numerical computing in Python. You'll use it for data manipulation and as the basis for TensorFlow/Keras.

import numpy as np

# Create arrays
data = np.array([22.5, 23.1, 24.0, 25.5, 26.2])
zeros = np.zeros(100)           # 100 zeros
ones = np.ones((3, 4))          # 3×4 matrix of ones
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]

# Statistics
mean = np.mean(data)       # 24.26
std = np.std(data)         # standard deviation
minimum = np.min(data)     # 22.5
maximum = np.max(data)     # 26.2

# Normalization (critical for ML — see ML section below)
normalized = (data - np.mean(data)) / np.std(data)

# Min-max scaling (0 to 1)
scaled = (data - np.min(data)) / (np.max(data) - np.min(data))

# Reshape (ML models need specific input shapes)
features = data.reshape(-1, 1)  # column vector: [[22.5], [23.1], ...]

# Boolean indexing — filter data with conditions
hot_readings = data[data > 25.0]  # [25.5, 26.2]

Working with Matplotlib

Visualize your sensor data and training curves.

import matplotlib.pyplot as plt

# Line plot — sensor readings over time
temperatures = [22.5, 23.1, 24.0, 25.5, 26.2, 27.0]
plt.plot(temperatures, marker='o')
plt.xlabel("Sample Index")
plt.ylabel("Temperature (°C)")
plt.title("Sensor Readings")
plt.grid(True)
plt.show()

# Training loss curve
loss = [0.8, 0.6, 0.45, 0.35, 0.28]  # your training losses
epochs = range(1, len(loss) + 1)     # x-axis must match the number of loss values
plt.plot(epochs, loss)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Progress")
plt.show()

# Scatter plot — feature visualization
# (assumes humidities and labels are lists the same length as temperatures)
plt.scatter(temperatures, humidities, c=labels, cmap='viridis')
plt.xlabel("Temperature (°C)")
plt.ylabel("Humidity (%)")
plt.colorbar(label="Class")
plt.show()

Basics of Machine Learning

Machine Learning (ML) is teaching a computer to find patterns in data without being explicitly programmed with rules.

The Intuition — Learning Like a Human

Imagine you're teaching a child to recognize cats:

  • Traditional programming: You write rules — "if it has pointy ears, whiskers, and a tail, it's a cat." But what about a lynx? A cartoon cat? A cat from behind? You can never write enough rules.
  • Machine Learning: You show the child 1000 photos of cats and 1000 photos of dogs. The child figures out the patterns on their own. You never told them how to tell the difference — they just learned from examples.

That's the key idea: instead of writing rules, you provide examples, and the computer learns the rules by itself.

In our workshop, instead of cats and dogs, we'll classify sensor readings — is the environment normal, hot, or humid?

A Simple Example — Predicting the Weather

Let's say you want to predict whether it will rain today. You look at two things:

  • Humidity (how much water is in the air)
  • Air pressure (how heavy the air is pushing down)

You collect data for 100 days:

| Day | Humidity (%) | Pressure (hPa) | Rained? |
|---|---|---|---|
| 1 | 45 | 1020 | No |
| 2 | 82 | 1005 | Yes |
| 3 | 50 | 1018 | No |
| 4 | 90 | 998 | Yes |
| 5 | 55 | 1015 | No |
| ... | ... | ... | ... |

After seeing enough examples, a pattern emerges:

  • High humidity + Low pressure → Rain
  • Low humidity + High pressure → No rain

You didn't write this rule — the computer found it by looking at the data. That's ML!
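The same idea can be sketched as a toy "learner" in plain Python (purely illustrative, not the neural network we'll train later): it searches for the humidity threshold that best separates the rainy days from the dry ones in the table above.

```python
# Toy learner: find the humidity threshold that best predicts rain
days = [            # (humidity %, pressure hPa, rained?)
    (45, 1020, False),
    (82, 1005, True),
    (50, 1018, False),
    (90,  998, True),
    (55, 1015, False),
]

def errors(threshold):
    """Count wrong predictions for the rule: humidity > threshold means rain."""
    return sum((hum > threshold) != rained for hum, _, rained in days)

# Try every whole-number threshold and keep the one with the fewest mistakes
best = min(range(101), key=errors)
print(f"Learned rule: rain if humidity > {best}%, mistakes: {errors(best)}")
```

A real model does the same thing in spirit, just with many weights adjusted gradually instead of one threshold searched exhaustively.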

How Is This Different from Regular Programming?

Traditional Programming:
  Input + Rules written by you → Output

Machine Learning:
  Input + Output (examples) → Rules learned by the computer

In traditional programming, you figure out the rules and code them. In ML, the computer figures out the rules from examples. This is powerful when the rules are too complex to write by hand (like recognizing a cat) or when you don't even know what the rules are.

Types of Machine Learning

| Type | What It Does | Analogy | Example in Workshop |
|---|---|---|---|
| Supervised Learning | Learn from labeled examples (input + correct answer) | Studying with a textbook that has answer keys | "This sensor reading is hot" → learn to classify new readings |
| Unsupervised Learning | Find patterns without labels (no correct answers given) | Exploring a new city without a map — you find neighborhoods on your own | Group similar sensor readings together without knowing the categories |
| Reinforcement Learning | Learn by trial and error with rewards | Training a dog — treat for good behavior, nothing for bad | (Not used in this workshop) |

We use supervised learning in this workshop — we train the model on data where we already know the correct answer (the label).

Supervised Learning — The Two Flavors

Classification: predicting a category (a label from a fixed set of options).

Examples:

  • Is this email spam or not spam?
  • Is this sensor reading normal, hot, or humid?
  • Is this image a cat, dog, or bird?
# Classification example — predict the category
# Input: temperature and humidity
# Output: one of ["normal", "hot", "humid"]

features = [35.5, 85.0]  # temperature, humidity
prediction = model.predict(features)
# → "hot"  (a category)

Regression: predicting a number (a continuous value).

Examples:

  • What will the temperature be tomorrow? → 28.3°C
  • How much will this house sell for? → $350,000
  • How many units will we sell next month? → 1,247
# Regression example — predict a number
# Input: humidity and pressure
# Output: temperature (a number, not a category)

features = [85.0, 1005]  # humidity, pressure
prediction = model.predict(features)
# → 31.2  (a number)

In this workshop, we use classification — we classify sensor readings into categories like "normal", "hot", and "humid".

The ML Pipeline (What You'll Do)

Think of the ML pipeline like cooking a meal:

1. Collect Data      → Buy ingredients (read sensor values from ESP32)
2. Preprocess Data   → Wash, chop, measure (normalize, clean, split into train/test)
3. Train Model       → Cook the recipe (feed data to a neural network)
4. Evaluate Model    → Taste test (check accuracy on test data)
5. Export Model      → Pack leftovers (convert to TFLite for ESP32)
6. Deploy Model      → Serve the dish (run inference on the ESP32-S3)

Each step matters — bad ingredients = bad food, bad data = bad model.

Key ML Terms — Explained Simply

| Term | Meaning | Analogy |
|---|---|---|
| Feature | An input variable (e.g., temperature, humidity) | A clue that helps you guess the answer |
| Label | The correct answer (e.g., "hot", "normal") | The answer key at the back of the textbook |
| Training | Showing the model many examples so it learns | Studying with flashcards |
| Inference | Using the trained model to make a prediction | Taking the exam |
| Loss | How wrong the model's predictions are | Your exam score (lower = better, unlike school!) |
| Epoch | One complete pass through all training data | Going through all flashcards once |
| Batch | A small group of examples processed together | Studying a few flashcards at a time |
| Accuracy | Percentage of correct predictions | Your grade on the exam |
| Overfitting | Model memorizes training data but fails on new data | Memorizing answers without understanding — you ace the practice test but fail the real exam |
| Underfitting | Model is too simple to learn the patterns | Studying only 5 flashcards — you didn't learn enough |
| Weights | Numbers the model adjusts during training to make better predictions | The "knowledge" the model has — like notes you take while studying |
| Bias | A number added to each neuron's output (like a baseline offset) | The "starting point" — like always guessing the most common answer before you learn anything |
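Some of these terms are simple enough to compute directly. Here's accuracy on a handful of made-up predictions (the labels and predictions below are invented for illustration):

```python
# Accuracy: fraction of predictions that match the labels (made-up data)
y_true = [0, 1, 1, 0, 1]   # labels (the correct answers)
y_pred = [0, 1, 0, 0, 1]   # the model's predictions (inference results)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(f"Accuracy: {accuracy:.0%}")  # 80%
```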

Overfitting vs Underfitting — The Goldilocks Problem

Underfitting (too simple)     Just Right              Overfitting (too complex)
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  · · ·          │    │  · ·            │    │  · ·            │
│      · · ·      │    │    · · ·        │    │    · · ·        │
│          · · ·  │    │        · · ·    │    │        · · ·    │
│                 │    │            · ·  │    │            · ·  │
│  ─ ─ ─ ─ ─ ─ ─ │    │   ~~~~~~~~~~~  │    │  /\/\//\/\/\/\  │
│  (straight line │    │  (smooth curve │    │  (wiggly curve │
│   misses pattern)│    │   fits pattern)│    │   memorizes noise)│
└─────────────────┘    └─────────────────┘    └─────────────────┘
  • Underfitting: The model is too simple — like trying to fit a straight line through curved data. It can't capture the pattern.
  • Overfitting: The model is too complex — it memorizes every tiny detail, including noise. It fails on new, unseen data.
  • Just right: The model captures the real pattern without memorizing noise.

How to avoid overfitting

  • Use more training data — more examples means less memorization
  • Use fewer neurons/layers — simpler model = less capacity to memorize
  • Use early stopping — stop training when validation loss starts increasing
  • Use dropout — randomly turn off some neurons during training (forces the model to learn robust patterns)

Data Preprocessing — Why It Matters

ML models work best when input data is small numbers centered around 0. Raw sensor data (like temperature = 45.5, humidity = 89.0) needs preprocessing.

Think of it like this: if one feature ranges from 0–100 and another from 0–1, the model will think the first feature is 100× more important just because the numbers are bigger. Normalization fixes this by putting everything on the same scale.

import numpy as np

# Raw sensor data
temperatures = np.array([22.5, 23.1, 24.0, 25.5, 26.2, 35.0, 40.1])
humidity = np.array([45.0, 50.2, 55.0, 60.0, 65.3, 80.0, 90.0])
labels = np.array([0, 0, 0, 0, 0, 1, 1])  # 0=normal, 1=hot

# Step 1: Combine features into a 2D array (samples × features)
X = np.column_stack([temperatures, humidity])
# [[22.5, 45.0], [23.1, 50.2], ...]

# Step 2: Normalize — scale each feature to mean=0, std=1
# Formula: normalized = (value - mean) / standard_deviation
# This transforms values so they're centered around 0 with similar ranges
X_mean = X.mean(axis=0)   # mean of each column
X_std = X.std(axis=0)     # std of each column
X_normalized = (X - X_mean) / X_std

# Before normalization: temperature=40.1, humidity=90.0
# After normalization:  temperature≈1.92, humidity≈1.75
# Now both features have similar ranges — the model treats them equally!

# Step 3: Split into training and testing sets (80/20)
# CRITICAL: The model must be tested on data it has NEVER seen before
# Otherwise, you're just testing its memorization, not its learning
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, labels, test_size=0.2, random_state=42
)

print(f"Training samples: {len(X_train)}")  # used to teach the model
print(f"Testing samples: {len(X_test)}")    # used to check if it actually learned

Always save the mean and std!

When you deploy the model on the ESP32, you need to normalize sensor readings using the same mean and std from training. Save these values — you'll hardcode them in your C firmware.

# Save these values — you'll need them in your ESP32 C code!
print(f"Temperature mean: {X_mean[0]:.4f}, std: {X_std[0]:.4f}")
print(f"Humidity mean: {X_mean[1]:.4f}, std: {X_std[1]:.4f}")

Why Split Data into Train and Test?

Imagine you're studying for a math exam:

  • Training data = practice problems you study with
  • Test data = the actual exam questions you've never seen

If the exam used the exact same questions as the practice problems, you could just memorize the answers and score 100% — but you wouldn't actually understand the math. That's overfitting.

By testing on unseen data, you check whether the model truly learned the patterns or just memorized the examples.


Basics of Deep Learning

Deep Learning is a subset of ML that uses neural networks — layers of interconnected "neurons" that learn patterns. The "deep" in deep learning just means many layers stacked together.

What Is a Neural Network?

Think of a neural network as a team of people in a factory, each doing a simple job, passing their work to the next person:

Raw Material          Assembly Line                    Final Product
(temperature,    →    Person 1    →    Person 2    →    "hot" (95%)
 humidity)              |               |              "normal" (4%)
                   "Is temp        "Is temp high        "humid" (1%)
                    above 25?"      AND humid?"
  • Person 1 looks at the raw numbers and notices simple things: "temperature is above average"
  • Person 2 combines those observations: "high temperature AND high humidity"
  • Final person makes the call: "This is hot — 95% sure"

Each person (neuron) does a simple calculation. But together, they can learn very complex patterns.

A Real Example — The Sensor Classifier

Let's say we want to classify sensor readings into three categories: normal, hot, or humid.

Our neural network would look like this:

Input Layer          Hidden Layer 1        Hidden Layer 2        Output Layer
┌──────────┐       ┌──────────────┐      ┌──────────────┐      ┌──────────┐
│ Temp: 35.5│──┬──→ │  Neuron 1    │──┬─→│  Neuron 1    │──┬─→│ normal:  │
│           │  │    │  (is it hot?)│  │  │  (hot+humi?)│  │  │   0.04   │
│ Humid: 85 │──┤──→ │  Neuron 2    │──┤─→│  Neuron 2    │──┤─→│ hot:     │
│           │  │    │  (is it humid│  │  │  (only hot?)│  │  │   0.93   │
└──────────┘  └──→ │  Neuron 3    │──┘─→│  Neuron 3    │──┘─→│ humid:   │
                   │  (both high?) │      │  (only humid?│      │   0.03   │
                   └──────────────┘      └──────────────┘      └──────────┘
  • 2 inputs → temperature and humidity
  • 3 neurons in hidden layer 1 → each detects a different simple pattern
  • 3 neurons in hidden layer 2 → each combines patterns from layer 1
  • 3 outputs → one probability per category (they sum to 1.0 = 100%)

The highest probability wins. Here, "hot" at 93% is the prediction.

Neurons — The Building Blocks

A neuron is like a tiny decision-maker. It takes inputs, does some math, and produces an output.

                    ┌─────────┐
  temperature ─────→│         │
       (35.5)       │  NEURON │────→ output (e.g., 0.8 = "yes, it's hot")
   humidity ───────→│         │
       (85.0)       └─────────┘

Inside the neuron:
  output = activation( (temperature × weight₁) + (humidity × weight₂) + bias )

Let's break this down:

  1. Weights — How important each input is. If weight₁ is large, temperature matters a lot. If weight₂ is small, humidity doesn't matter much.
  2. Bias — A baseline value. Think of it as the neuron's "default answer" before looking at any inputs.
  3. Activation function — Decides the final output format. Without it, the neuron can only do linear math (like a calculator). With it, the neuron can make decisions (like a brain).

A concrete example:

# A single neuron doing its calculation
temperature = 35.5
humidity = 85.0
weight1 = 0.8     # temperature is important
weight2 = 0.3     # humidity is less important
bias = -15.0      # baseline offset

# Step 1: Weighted sum
weighted_sum = (temperature * weight1) + (humidity * weight2) + bias
# = (35.5 × 0.8) + (85.0 × 0.3) + (-15.0)
# = 28.4 + 25.5 - 15.0
# = 38.9

# Step 2: Apply activation function (ReLU)
output = max(0, weighted_sum)  # ReLU: if negative, output 0; if positive, pass through
# = max(0, 38.9) = 38.9

# This neuron says: "Yes, this is hot!" (large positive output)

During training, the model adjusts the weights and bias to make better predictions. That's what "learning" means in ML — finding the right weights and biases.
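Here's what "finding the right weights" looks like in miniature. This is a hand-rolled sketch of gradient descent on a single weight (real frameworks do the same thing across thousands of weights via backpropagation), learning the relationship y = 2x:

```python
# Minimal gradient descent: learn w so that prediction = w * x matches y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true outputs (y = 2x)

w = 0.0                      # start with a bad guess
learning_rate = 0.01

for step in range(200):
    # Gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad   # nudge w downhill

print(f"Learned weight: {w:.3f}")  # close to 2.0
```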

Activation Functions — The Decision Makers

Without an activation function, a neural network is just a fancy calculator — it can only do multiplication and addition. Activation functions add non-linearity, which means the network can learn complex, curvy patterns instead of just straight lines.

| Activation | What It Does | Formula | When to Use | Analogy |
|---|---|---|---|---|
| ReLU | If input is negative, output 0. If positive, pass it through. | max(0, x) | Hidden layers (most common) | A bouncer at a club — negative vibes get blocked, positive vibes pass through |
| Sigmoid | Squishes any number into the range 0.0 to 1.0 | 1 / (1 + e^(-x)) | Binary classification output (yes/no) | A dimmer switch — any brightness becomes a value between off (0) and full (1) |
| Softmax | Converts a list of numbers into probabilities that sum to 1.0 | e^x / sum(e^all) | Multi-class classification output | A pie chart — each slice is a percentage, all slices add up to 100% |
| Tanh | Squishes any number into the range -1.0 to 1.0 | (e^x - e^(-x)) / (e^x + e^(-x)) | Sometimes used in hidden layers | Like sigmoid but centered at 0 — can express "against" (-1) or "for" (+1) |
import numpy as np

# ReLU — the most popular activation function
def relu(x):
    return max(0, x)

print(relu(-5))   # 0   (negative → blocked)
print(relu(0))    # 0   (zero → zero)
print(relu(3.7))  # 3.7 (positive → passed through)

# Sigmoid — for binary classification (yes/no)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(-5))   # 0.007  (very unlikely)
print(sigmoid(0))    # 0.5    (50/50 — not sure)
print(sigmoid(5))   # 0.993  (very likely)

# Softmax — for multi-class classification (pick one of many)
def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return exp_x / exp_x.sum()

logits = np.array([2.0, 5.0, 1.0])  # raw scores for [normal, hot, humid]
probs = softmax(logits)
print(probs)  # [0.04, 0.93, 0.03] → 93% chance it's "hot"

How Training Works — Step by Step

Training a neural network is like learning to throw darts blindfolded, with someone telling you how far off you are each time:

  1. Forward pass (throw the dart): Input data flows through the network → produces a prediction
  2. Calculate loss (measure how far from the bullseye): Compare prediction to the correct answer → measure how wrong it is
  3. Backward pass / backpropagation (figure out what went wrong): Calculate how each weight contributed to the error — "the dart went left because my elbow was too high"
  4. Update weights (adjust your aim): Adjust weights slightly to reduce the error — "move elbow down a tiny bit"
  5. Repeat for many epochs until the loss stops decreasing — "keep practicing until you hit the bullseye consistently"
Epoch 1:  Loss = 0.80  (very wrong — dart missed the board)
Epoch 5:  Loss = 0.60  (getting closer — on the board now)
Epoch 10: Loss = 0.45  (getting better — hitting the outer ring)
Epoch 30: Loss = 0.12  (pretty good — near the bullseye)
Epoch 50: Loss = 0.05  (accurate! — hitting the bullseye consistently)

The Learning Rate — How Big of a Step to Take

The learning rate controls how much the weights change after each training step:

  • Too high (e.g., 1.0): The model jumps around wildly, never settling on good weights — like taking huge steps and overshooting the target
  • Too low (e.g., 0.00001): The model takes tiny steps, taking forever to learn — like inching toward the target at a snail's pace
  • Just right (e.g., 0.001): The model makes steady progress toward good weights — like walking confidently toward the target
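The effect is easy to demonstrate on a one-variable problem: minimizing f(x) = x², whose minimum is at x = 0 (a sketch of plain gradient descent, not of how Adam works internally):

```python
# Gradient descent on f(x) = x^2 (gradient: 2x) with different learning rates
def descend(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x   # one step downhill along the gradient
    return x

print(descend(0.4))      # very close to 0: converges
print(descend(0.00001))  # still near 1.0: barely moved (too low)
print(descend(1.1))      # far from 0: every step overshoots (too high)
```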
# The Adam optimizer automatically adjusts the learning rate — you rarely need to change it
model.compile(
    optimizer='adam',  # Adam = Adaptive Moment Estimation — smart learning rate
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

Why "Deep" Learning?

A neural network with 1-2 hidden layers is usually just called a (shallow) neural network. When you stack 3 or more layers, it's called deep learning. More layers = more complex patterns the network can learn.

Shallow (1-2 layers):     Deep (3+ layers):
  Input → Hidden → Output    Input → H1 → H2 → H3 → ... → Output

Learns simple patterns      Learns hierarchical patterns:
like "is X > threshold?"    H1: edges
                             H2: shapes
                             H3: objects
                             H4: scenes

In our workshop, we use a small deep network (2 hidden layers) — enough to classify sensor data, but small enough to run on the ESP32-S3.

How Big Should My Network Be?

| Network Size | Neurons | Pros | Cons | Use Case |
|---|---|---|---|---|
| Tiny | 4-8 per layer | Fast to train, fits on microcontroller | May underfit (too simple) | Simple 2-feature classification |
| Small | 16-32 per layer | Good balance, fits on ESP32 | Limited complexity | Our workshop models |
| Medium | 64-128 per layer | Can learn complex patterns | Won't fit on microcontroller | Phone/edge device models |
| Large | 256+ per layer | Very powerful | Needs GPU, won't fit on MCU | Cloud/server models |

Bigger is not always better

A huge network will memorize your training data (overfitting) instead of learning real patterns. Start small and only add more neurons if the model underfits.


TensorFlow / Keras Basics

TensorFlow is Google's ML framework. Keras is its high-level API that makes building neural networks easy.

Building a Model

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Define a simple neural network
model = keras.Sequential([
    # Input layer + first hidden layer
    # Dense = fully connected (every input connects to every neuron)
    # 16 = number of neurons
    # relu = activation function (most common for hidden layers)
    # input_shape=(2,) = 2 input features (temperature, humidity)
    keras.layers.Dense(16, activation='relu', input_shape=(2,)),

    # Second hidden layer — 8 neurons
    keras.layers.Dense(8, activation='relu'),

    # Output layer — 3 neurons (one per class: normal, hot, humid)
    # softmax = converts outputs to probabilities that sum to 1.0
    keras.layers.Dense(3, activation='softmax')
])

# Print a summary of the model
model.summary()

Compiling the Model

Before training, you must tell Keras how to train:

model.compile(
    optimizer='adam',                          # How to update weights (adam is the best default)
    loss='sparse_categorical_crossentropy',     # How to measure error (for integer labels)
    metrics=['accuracy']                        # What to track during training
)

| Setting | What It Means | Why This Choice |
|---|---|---|
| optimizer='adam' | Algorithm that adjusts weights | Best general-purpose optimizer |
| loss='sparse_categorical_crossentropy' | Error function for multi-class with integer labels | Labels are 0, 1, 2 (not one-hot) |
| metrics=['accuracy'] | Track percentage of correct predictions | Easy to understand |

Training the Model

# Train the model
history = model.fit(
    X_train,               # input features (normalized)
    y_train,               # correct labels
    epochs=50,             # number of complete passes through the data
    batch_size=32,         # examples per gradient update
    validation_split=0.2   # use 20% of training data to monitor overfitting
)

# Plot training progress
import matplotlib.pyplot as plt
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

Evaluating and Predicting

# Evaluate on test data (data the model has never seen)
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2%}")

# Make a prediction on new data
new_reading = np.array([[25.5, 60.0]])  # temperature, humidity (normalized!)
prediction = model.predict(new_reading)
# prediction = [[0.05, 0.90, 0.05]] → 90% chance it's class 1

predicted_class = np.argmax(prediction)  # 1 (the class with highest probability)
class_names = ["normal", "hot", "humid"]
print(f"Prediction: {class_names[predicted_class]}")

Exporting to TFLite (For ESP32)

The trained model must be converted to TFLite format — a small, optimized version that runs on microcontrollers.

# Convert to TFLite with INT8 quantization
# INT8 = 8-bit integers → much smaller model, faster inference on ESP32

# Step 1: Create a representative dataset generator
# This provides sample inputs so the converter can calibrate the quantization
def representative_dataset():
    for i in range(len(X_train)):
        yield [X_train[i:i+1].astype(np.float32)]

# Step 2: Configure the converter
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8    # input will be int8 on ESP32
converter.inference_output_type = tf.int8   # output will be int8 on ESP32

# Step 3: Convert and save
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

print(f"Model size: {len(tflite_model)} bytes")
# Typically 2-5 KB for a small model — fits easily on ESP32!

Why INT8 quantization?

  • Float32 model: ~10 KB, requires floating-point math (slow on ESP32)
  • INT8 model: ~3 KB, uses integer math only (fast on ESP32)
  • The ESP32-S3 has a floating-point unit, but the SIMD instructions that accelerate neural-network math operate on integers, so INT8 is essential for real-time performance.

What Happens on the ESP32

The TFLite model runs on the ESP32-S3 like this:

1. Read sensor (e.g., temperature=35.5, humidity=80.0)
2. Normalize using saved mean/std from training
3. Quantize: convert float to int8
4. Run inference through TFLite Micro interpreter
5. Dequantize: convert int8 output back to float
6. Get predicted class (e.g., "hot")
7. Take action (e.g., turn on fan, send alert)
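Steps 2, 3, and 5 can be sketched in Python. Note that the scale and zero-point values below are invented for illustration; the real ones come from your model's quantization parameters:

```python
# Sketch of the normalize -> quantize -> dequantize steps (illustrative values)
temp_raw = 35.5

# Step 2: normalize with the mean/std saved from training (assumed values)
temp_mean, temp_std = 28.06, 6.27
temp_norm = (temp_raw - temp_mean) / temp_std

# Step 3: quantize float -> int8 using the model's scale and zero point
scale, zero_point = 0.05, -5          # made-up quantization parameters
q = round(temp_norm / scale) + zero_point
q = max(-128, min(127, q))            # clamp to the int8 range

# Step 5: dequantize the int8 value back to a float
recovered = (q - zero_point) * scale
print(temp_norm, q, recovered)        # recovered is close to temp_norm
```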

Jupyter Notebook Quick Reference

Jupyter notebooks let you run Python code cell-by-cell with inline visualizations.

Running a Notebook

  1. Open the provided .ipynb link in Google Colab
  2. Click Runtime → Run all to execute all cells
  3. Or press Shift+Enter to run one cell at a time
  4. Free GPU available: Runtime → Change runtime type → T4 GPU
To run locally instead of Colab:

pip install jupyter
jupyter notebook training.ipynb

Notebook Cells

  • Code cell — Python code, run with Shift+Enter
  • Markdown cell — Documentation and explanations
  • Output — Appears below the cell (plots, print statements, tables)

Magic Commands

%matplotlib inline    # show plots inside the notebook
!pip install tensorflow  # run shell command
%%timeit             # time a cell's execution

Serial Data Parsing

You'll parse CSV data streamed from the ESP32 over serial.

import serial
import csv

# Open serial port
ser = serial.Serial('/dev/ttyUSB0', 115200, timeout=1)

# Read and parse lines
dataset = []
with open("sensor_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "temperature", "humidity", "label"])

    try:
        while True:  # press Ctrl+C to stop collecting
            line = ser.readline().decode('utf-8').strip()
            if not line:
                continue

            # Parse: "1713427200,25.5,60.0,normal"
            parts = line.split(",")
            if len(parts) == 4:
                timestamp, temp, hum, label = parts
                writer.writerow([timestamp, temp, hum, label])
                dataset.append((float(temp), float(hum), label))
    except KeyboardInterrupt:
        pass  # Ctrl+C ends collection; the with block still closes the file

ser.close()

Further Reading