
    Educational Resource

    Math for Machine Minds

    A textbook refresher for math used in AI and machine learning

    Chapter 1 — Algebra Refresher: The Language of Math Machines

    Sections 1–3

    🎯 Learning Objectives

    By the end of these sections, you will be able to:

    1. Explain variables, constants, coefficients, and expressions.
    2. Simplify algebraic expressions and evaluate linear equations.
    3. Solve linear and quadratic equations algebraically and programmatically.
    4. Connect algebraic manipulation to machine learning intuition.
    Machine Learning Angle (MLA): Algebra is the grammar of ML. A linear model is just an algebraic sentence: \hat{y} = w_0 + w_1x_1 + \dots + w_dx_d

    1. Numbers, Operations & Order of Operations

    Numbers you'll meet often: integers (\mathbb{Z}), rationals (\mathbb{Q}), reals (\mathbb{R}). Computers approximate the reals with floating-point numbers (IEEE 754).

    Order of operations (PEMDAS): Parentheses → Exponents → Multiplication/Division → Addition/Subtraction.

    Example:

    3 + 2 \cdot (5 - 1)^2 = 3 + 2 \cdot 4^2 = 3 + 32 = 35

    Properties Table

    Property | Formula | Example
    Distributive | a(b+c) = ab + ac | 2(3+5) = 16
    Commutative | a+b = b+a | 4+7 = 7+4
    Associative | (a+b)+c = a+(b+c) | (2+3)+4 = 2+(3+4)

    TypeScript Example

    export const pemdasExample = (): number => 3 + 2 * Math.pow(5 - 1, 2); // 35
    MLA: Loss functions and kernel computations rely on strict order of operations. Misplacing parentheses can completely change training results.

    2. Variables, Constants, Coefficients & Expressions

    • Variable: placeholder for many possible values (x).
    • Constant: fixed number (\pi, e, 42).
    • Coefficient: multiplier on a variable (the 3 in 3x).
    • Expression: combination of numbers and symbols without an equals sign.

    Simplify example:

    2x + 3x - 4 + (x - 2) = (2x + 3x + x) + (-4 - 2) = 6x - 6

    TypeScript Example

    // evaluate a*x + b for given x
    export const linearExpr = (a: number, b: number, x: number): number => a * x + b;
    
    // Try it
    console.log(linearExpr(3, 2, 4)); // 3*4+2 = 14
    MLA: In linear regression, coefficients are weights, and constants are bias terms. Combining like terms parallels model simplification.

    3. Equations — Solving for the Unknown

    An equation says two expressions are equal. The goal: isolate the variable.

    Linear Example

    3x + 5 = 11 \Rightarrow 3x = 6 \Rightarrow x = 2

    Quadratic Example

    x^2 - 5x + 6 = 0 \Rightarrow (x - 2)(x - 3) = 0 \Rightarrow x \in \{2, 3\}

    General Quadratic Formula

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

    TypeScript Helper

    export const solveLinear = (a: number, b: number): number => {
      // Solve a*x + b = 0
      if (a === 0) throw new Error("No unique solution");
      return -b / a;
    };
    
    // Usage
    console.log(solveLinear(3, 5)); // -1.666...
    MLA: Solving for x mirrors finding model parameters analytically (closed-form ridge regression).

    4. Inequalities — Because Life Isn't Always Equal

    Inequalities describe ranges of possible values instead of single solutions. They use the symbols >, <, \ge, \le.

    Golden Rule

    When you multiply or divide an inequality by a negative number, flip the sign!

    Example:

    -2x < 6 \Rightarrow x > -3
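    The flip rule is easy to encode. A minimal sketch (the helper name solveLinearInequality is illustrative, not from any library) that solves ax < c for x and flips the sign when a is negative:

```typescript
// Solve a*x < c for x. Dividing by a negative a flips the inequality sign.
export function solveLinearInequality(a: number, c: number): string {
  if (a === 0) throw new Error("Not a linear inequality in x");
  const bound = c / a;
  return a > 0 ? `x < ${bound}` : `x > ${bound}`;
}

console.log(solveLinearInequality(-2, 6)); // "x > -3"
```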

    Compound Inequalities

    -1 \le x < 2

    means "x is between -1 and 2, including -1 but not 2."

    Interval notation: [-1, 2)

    Visual Intuition: Think of inequalities as "zones of truth" on a number line.

    TypeScript Example

    export function inInterval(x: number, a: number, b: number): boolean {
      return a <= x && x < b;
    }
    
    // Example
    console.log(inInterval(1.5, -1, 2)); // true
    console.log(inInterval(2.5, -1, 2)); // false
    MLA: Regularization often introduces inequality-like constraints. For example, "keep weights small" (|w| < \lambda) ensures stability and prevents overfitting.

    5. Functions — Input–Output Machines

    A function is a rule that takes an input and produces an output.

    f: X \to Y

    If y = f(x), then y depends on x.

    5.1 Domain and Range

    • Domain: All valid inputs.
    • Range: All possible outputs.

    Example: f(x) = \frac{1}{x}; the domain excludes x = 0.
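    In code, a domain restriction becomes a guard clause. A minimal sketch for f(x) = 1/x (the function name reciprocal is our own choice):

```typescript
// Evaluate f(x) = 1/x, rejecting inputs outside the domain.
export const reciprocal = (x: number): number => {
  if (x === 0) throw new Error("x = 0 is not in the domain of 1/x");
  return 1 / x;
};

console.log(reciprocal(4)); // 0.25
```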

    5.2 Composition and Inverses

    Composition stacks functions:

    (f \circ g)(x) = f(g(x))

    Inverse functions undo each other:

    f^{-1}(f(x)) = x

    5.3 Common Function Shapes

    Function | Formula | Key Idea
    Linear | f(x) = mx + b | Straight line
    Quadratic | f(x) = ax^2 + bx + c | Parabola
    Exponential | f(x) = a \cdot b^x | Rapid growth
    Logarithmic | f(x) = \log_b x | Inverse of exponential

    Function Composition Example

    export const compose = <T, U, V>(f: (u: U) => V, g: (t: T) => U) => (x: T): V => f(g(x));
    
    export const f = (x: number) => 2 * x + 1;
    export const g = (x: number) => x * x;
    
    console.log(compose(f, g)(3)); // f(g(3)) = 2*(3^2)+1 = 19

    Inverse Example

    f(x)=4x9f1(x)=x+94f(x)=4x-9 \Rightarrow f^{-1}(x)=\frac{x+9}{4}

    TypeScript Check

    export const fInv = (x: number): number => (x + 9) / 4;
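    A quick round-trip check confirms the inverse property f^{-1}(f(x)) = x (both functions are redefined here so the snippet stands alone):

```typescript
// f(x) = 4x - 9 and its inverse; applying one after the other returns x.
const f4 = (x: number): number => 4 * x - 9;
const fInv4 = (x: number): number => (x + 9) / 4;

console.log(fInv4(f4(2))); // 2
```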
    MLA: In ML, "composition" corresponds to layer stacking, and "inverse" appears in autoencoders or when reconstructing features.

    6. Polynomials, Factoring, and Roots

    Polynomials are multi-term expressions built from powers of x:

    p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0

    The degree is the highest exponent (e.g., quadratic = degree 2). Roots (or zeros) are the values that make p(x) = 0.

    6.1 Factoring Example

    x^2 - 9 = (x - 3)(x + 3)

    6.2 Special Identities

    Identity | Formula
    Difference of squares | a^2 - b^2 = (a-b)(a+b)
    Perfect square | (a+b)^2 = a^2 + 2ab + b^2
    Sum of cubes | a^3 + b^3 = (a+b)(a^2 - ab + b^2)
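    These identities are easy to spot-check numerically. A tiny sketch verifying the difference of squares for sample values:

```typescript
// Numeric spot-check of a^2 - b^2 = (a - b)(a + b).
const a = 7, b = 3;
console.log(a * a - b * b === (a - b) * (a + b)); // true
```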

    6.3 Polynomial Evaluation (Horner's Method)

    For efficiency, evaluate p(x) = ((a_n x + a_{n-1})x + \dots)x + a_0.

    Horner's Method in TypeScript

    export const evalPoly = (coeffs: number[], x: number): number => {
      // coeffs = [a_n, a_{n-1}, ..., a_1, a_0]
      let result = 0;
      for (const coeff of coeffs) {
        result = result * x + coeff;
      }
      return result;
    };
    
    // Example: p(x) = 2x^2 + 3x + 1, evaluate at x=4
    console.log(evalPoly([2, 3, 1], 4)); // 2*16 + 3*4 + 1 = 45
    MLA: Polynomial basis functions expand feature spaces (kernel tricks, polynomial regression).

    6.4 Finding Roots

    For quadratics, factor if possible; otherwise use:

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

    TypeScript Example

    export function solveQuadratic(a: number, b: number, c: number): [number, number] | string {
      const disc = b * b - 4 * a * c;
      if (disc < 0) return "No real roots";
      const sqrtDisc = Math.sqrt(disc);
      const x1 = (-b + sqrtDisc) / (2 * a);
      const x2 = (-b - sqrtDisc) / (2 * a);
      return [x1, x2];
    }
    
    console.log(solveQuadratic(1, -5, 6)); // [3, 2]
    MLA: Polynomial intuition helps when understanding kernel expansions, Taylor approximations, and model stability near critical points.

    7. Exponents & Logarithms

    7.1 Exponents Recap

    Exponents represent repeated multiplication.

    a^m \cdot a^n = a^{m+n}, \quad \frac{a^m}{a^n} = a^{m-n}, \quad (a^m)^n = a^{mn}

    Example:

    2^3 \cdot 2^4 = 2^7 = 128

    Negative and fractional powers:

    a^{-n} = \frac{1}{a^n}, \quad a^{1/n} = \sqrt[n]{a}

    7.2 Logarithms — The Undo Button for Exponents

    y = \log_a x \iff a^y = x

    If 2^3 = 8, then \log_2 8 = 3.

    Key Identities:

    \log_a(xy) = \log_a x + \log_a y
    \log_a\left(\frac{x}{y}\right) = \log_a x - \log_a y
    \log_a(x^k) = k \log_a x

    Natural Logarithm:

    \ln x = \log_e x, where e \approx 2.71828.

    7.3 TypeScript Utilities

    TypeScript Implementation

    export const power = (a: number, n: number): number => Math.pow(a, n);
    export const logBase = (x: number, a: number): number => Math.log(x) / Math.log(a);
    
    // Examples
    console.log(power(2, 3)); // 8
    console.log(logBase(8, 2)); // 3
    MLA: Exponentials model growth/decay (learning rates, activations). Logarithms appear in log-loss and entropy calculations.

    8. Linear Equations in Two Variables — Lines and Slopes

    A line can be written as:

    y = mx + b

    Where:

    • m = slope (rise/run)
    • b = y-intercept (the value of y when x = 0)

    8.1 Finding the Equation from Two Points

    Given points A(x_1, y_1) and B(x_2, y_2):

    m = \frac{y_2 - y_1}{x_2 - x_1}, \quad b = y_1 - m x_1

    TypeScript Example

    export function lineThrough(p1: [number, number], p2: [number, number]) {
      const [x1, y1] = p1, [x2, y2] = p2;
      if (x1 === x2) return { vertical: true, x: x1 };
      const m = (y2 - y1) / (x2 - x1);
      const b = y1 - m * x1;
      return { m, b, vertical: false };
    }
    
    console.log(lineThrough([2, 3], [6, 11])); // { m: 2, b: -1, vertical: false }

    8.2 Parallel and Perpendicular Lines

    • Parallel lines: same slope (m_1 = m_2).
    • Perpendicular lines: slopes are negative reciprocals (m_1 m_2 = -1).
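    These slope conditions translate directly to code. A minimal sketch (helper names are our own; the tolerance guards against floating-point error):

```typescript
// Parallel: equal slopes. Perpendicular: product of slopes is -1.
export const areParallel = (m1: number, m2: number): boolean => m1 === m2;
export const arePerpendicular = (m1: number, m2: number): boolean =>
  Math.abs(m1 * m2 + 1) < 1e-12;

console.log(areParallel(2, 2)); // true
console.log(arePerpendicular(2, -0.5)); // true
```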
    MLA: In high dimensions, linear equations generalize to hyperplanes, which separate data classes in linear models.

    9. Systems of Linear Equations — Working in Teams

    A system is a group of equations with shared variables.

    Example:

    \begin{aligned}2x + y &= 7 \\ x - y &= 1\end{aligned}

    We can solve by substitution, elimination, or matrix methods.

    9.1 Substitution Example

    x = y + 1

    Substitute into 2x + y = 7: \quad 2(y + 1) + y = 7 \Rightarrow 3y = 5 \Rightarrow y = \frac{5}{3},\ x = \frac{8}{3}.

    9.2 Determinant (Cramer's Rule for 2×2)

    For:

    \begin{aligned}a_1x + b_1y &= c_1 \\ a_2x + b_2y &= c_2\end{aligned}

    Determinant:

    D = a_1 b_2 - a_2 b_1

    Solutions:

    x = \frac{c_1 b_2 - c_2 b_1}{D}, \quad y = \frac{a_1 c_2 - a_2 c_1}{D}

    9.3 TypeScript Solver

    TypeScript Implementation

    export function solve2x2(a: number, b: number, c: number, d: number, e: number, f: number) {
      const det = a * d - b * c;
      if (Math.abs(det) < 1e-12) throw new Error("No unique solution");
      const x = (e * d - b * f) / det;
      const y = (a * f - e * c) / det;
      return { x, y };
    }
    
    console.log(solve2x2(2, 1, 1, -1, 7, 1)); // { x: 8/3, y: 5/3 }
    MLA: Solving systems is how we train linear models. Ridge regression simply adds a term \lambda I to stabilize this solve.

    10. Distance and Vectors — A Gentle Preview

    A vector is an ordered list of numbers. In 2D: \mathbf{v} = (v_1, v_2).

    10.1 Vector Operations

    Addition:

    \mathbf{u} + \mathbf{v} = (u_1 + v_1,\ u_2 + v_2)

    Scaling:

    c\mathbf{v} = (cv_1,\ cv_2)

    10.2 Euclidean Distance

    Between points \mathbf{x} and \mathbf{z}:

    d(\mathbf{x}, \mathbf{z}) = \sqrt{(x_1 - z_1)^2 + (x_2 - z_2)^2}

    TypeScript Example

    export const l2 = (v: number[]) => Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
    export const dist = (a: number[], b: number[]) => l2(a.map((ai, i) => ai - b[i]));
    
    console.log(dist([3, -1], [-1, 2])); // 5
    MLA: RBF kernels depend on vector distances: K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \|\mathbf{x} - \mathbf{z}\|^2).

    11. Why Algebra Powers ELM/KELM

    Let's connect all your algebra knowledge to the math inside AsterMind's ELM and KELM engines.

    11.1 Extreme Learning Machines (ELM)

    The ELM's core equation looks very familiar:

    \hat{Y} = H\beta

    Here:

    • H is a hidden-layer matrix of nonlinear features (like \sigma(Wx + b))
    • \beta are the output weights we solve for
    • \hat{Y} are the predictions

    To find \beta, ELM doesn't "train" with epochs — it solves algebraically:

    \beta = (H^T H + \lambda I)^{-1} H^T Y

    That's just algebra on steroids — using matrix inverses (Chapter 2) and ridge regression (Chapter 4).
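    To make the closed-form solve concrete without matrix machinery, here is the scalar special case: with a single hidden feature h, the formula collapses to \beta = (h^T h + \lambda)^{-1} h^T y. This is a sketch under that one-feature assumption (the name ridge1d is our own):

```typescript
// Scalar ridge solve: beta = (h·h + lambda)^(-1) (h·y),
// the 1-feature case of (H^T H + lambda I)^(-1) H^T Y.
export function ridge1d(h: number[], y: number[], lambda: number): number {
  const hth = h.reduce((s, hi) => s + hi * hi, 0);
  const hty = h.reduce((s, hi, i) => s + hi * y[i], 0);
  return hty / (hth + lambda);
}

console.log(ridge1d([1, 2, 3], [2, 4, 6], 0)); // 2 (exact fit: y = 2h)
```

    With lambda > 0 the estimate shrinks slightly below 2 — that is the stabilizing effect of the ridge term.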

    11.2 Kernel ELM (KELM)

    In Kernel ELM, we swap H for a kernel matrix K that measures similarities between data points:

    \beta = (K + \lambda I)^{-1} Y

    So when you learned to solve for x in ax + b = 0, you were already learning how KELM computes its weights!

    Key idea: Algebra isn't old math — it's the foundation of fast, closed-form learning.

    11.3 Recursive Least Squares (RLS)

    Online learning adds new data and updates \beta recursively:

    \beta_k = \beta_{k-1} + K_k (Y_k - H_k \beta_{k-1})

    Still algebra: we're updating a parameter by adding a correction term.
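    The correction-term structure is visible even in a scalar sketch. Note the simplification: real RLS also updates the gain K_k each step, while here we fix it to a constant to isolate the "parameter plus correction" pattern:

```typescript
// One step of the update beta_k = beta_{k-1} + gain * (y_k - h_k * beta_{k-1}),
// with a fixed gain (true RLS recomputes the gain from a covariance estimate).
export function rlsStep(beta: number, h: number, y: number, gain: number): number {
  return beta + gain * (y - h * beta);
}

let beta = 0;
for (const [h, y] of [[1, 2], [2, 4], [3, 6]] as [number, number][]) {
  beta = rlsStep(beta, h, y, 0.5);
}
console.log(beta); // 2 — converges to the true slope of y = 2h
```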

    Every adaptive system in ARS — from schema updates to Starfish activations — is built on these algebraic building blocks.

    Chapter 2 — Linear Algebra: The Geometry of Data

    🎯 Learning Objectives

    By the end of this chapter, you will be able to:

    1. Perform vector and matrix operations fluently.
    2. Compute determinants and inverses.
    3. Understand eigenvalues and eigenvectors.
    4. Apply linear algebra concepts to machine learning problems.

    §1 Vectors — Data in Line Form

    A vector is just an ordered list of numbers — think of it as a spreadsheet column that describes a single thing (like a form entry or feature row).

    \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}

    1.1 Operations

    Addition:

    \mathbf{u} + \mathbf{v} = \begin{bmatrix}u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3\end{bmatrix}

    Scalar multiplication:

    c\mathbf{v} = \begin{bmatrix}c v_1 \\ c v_2 \\ c v_3\end{bmatrix}

    Dot product:

    \mathbf{u} \cdot \mathbf{v} = \sum_i u_i v_i

    Length (Euclidean norm):

    \|\mathbf{v}\|_2 = \sqrt{v_1^2 + v_2^2 + v_3^2}

    TypeScript Examples:

    export const addVec = (a: number[], b: number[]): number[] =>
      a.map((ai, i) => ai + b[i]);
    
    export const scaleVec = (a: number[], c: number): number[] =>
      a.map((ai) => ai * c);
    
    export const dot = (a: number[], b: number[]): number =>
      a.reduce((sum, ai, i) => sum + ai * b[i], 0);
    
    export const norm = (a: number[]): number =>
      Math.sqrt(dot(a, a));
    
    // Example
    console.log(addVec([1, 2, 3], [4, 5, 6])); // [5,7,9]
    console.log(dot([1, 2], [3, 4])); // 11
    console.log(norm([3, 4])); // 5
    MLA: Vectors are the atoms of ML — feature vectors, weight vectors, gradient vectors, all the same idea.

    §2 Matrices

    A matrix is a 2D array of numbers arranged in rows and columns:

    A = \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{bmatrix}

    Dimensions: m \times n (m rows, n columns).

    Special Matrices

    • Square matrix: m = n
    • Identity matrix: I, with 1s on the diagonal, 0s elsewhere
    • Zero matrix: all elements are 0
    • Diagonal matrix: non-zero only on the diagonal
    • Symmetric matrix: A = A^T
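    The identity matrix is worth building once in code, since it reappears in every ridge term (\lambda I). A small sketch (the helper name identity is our own):

```typescript
type Matrix = number[][];

// Build the n×n identity matrix: 1s on the diagonal, 0s elsewhere.
export const identity = (n: number): Matrix =>
  Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 1 : 0))
  );

console.log(identity(2)); // [[1, 0], [0, 1]]
```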

    Matrix Operations

    Addition: Element-wise, same dimensions required

    C = A + B \Rightarrow c_{ij} = a_{ij} + b_{ij}

    Scalar Multiplication:

    B = cA \Rightarrow b_{ij} = c \cdot a_{ij}

    Transpose: Flip rows and columns

    (A^T)_{ij} = A_{ji}

    TypeScript Examples

    type Matrix = number[][];
    
    export const matrixAdd = (A: Matrix, B: Matrix): Matrix =>
      A.map((row, i) => row.map((val, j) => val + B[i][j]));
    
    export const matrixScale = (c: number, A: Matrix): Matrix =>
      A.map(row => row.map(val => c * val));
    
    export const transpose = (A: Matrix): Matrix => {
      const m = A.length, n = A[0].length;
      return Array.from({ length: n }, (_, j) =>
        Array.from({ length: m }, (_, i) => A[i][j])
      );
    };
    
    // Example
    const A = [[1, 2], [3, 4]];
    console.log(transpose(A)); // [[1, 3], [2, 4]]
    MLA: Data matrices store samples as rows, features as columns. Weight matrices transform input spaces to output spaces.

    §3 Matrix Multiplication

    Matrix multiplication is not element-wise. For C = AB, we compute:

    c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}

    Requirement: Number of columns in A must equal number of rows in B.

    Example

    \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}

    Matrix Multiplication

    export const matrixMultiply = (A: Matrix, B: Matrix): Matrix => {
      const m = A.length, n = B[0].length;
      return Array.from({ length: m }, (_, i) =>
        Array.from({ length: n }, (_, j) =>
          A[i].reduce((sum, aik, k) => sum + aik * B[k][j], 0)
        )
      );
    };
    
    const A = [[1, 2], [3, 4]];
    const B = [[5, 6], [7, 8]];
    console.log(matrixMultiply(A, B)); // [[19, 22], [43, 50]]

    Properties

    • Associative: (AB)C = A(BC)
    • Distributive: A(B+C) = AB + AC
    • NOT commutative: AB \ne BA (in general)
    • Transpose property: (AB)^T = B^T A^T
    MLA: Matrix multiplication represents layer transformations in neural networks. Forward pass: y = Wx + b.

    §4 Determinants

    The determinant is a scalar value computed from a square matrix. It indicates if the matrix is invertible.

    2×2 Determinant

    \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc

    3×3 Determinant (Expansion Along the First Row)

    \det(A) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})

    Computing 2×2 Determinant

    export const det2x2 = (a: number, b: number, c: number, d: number): number =>
      a * d - b * c;
    
    console.log(det2x2(3, 8, 4, 6)); // 3*6 - 8*4 = -14

    Properties

    • If \det(A) = 0, the matrix is singular (not invertible)
    • If \det(A) \ne 0, the matrix is invertible
    • \det(AB) = \det(A)\det(B)
    • \det(A^T) = \det(A)
    MLA: Determinants measure volume scaling. In ML, they appear in covariance matrices and Jacobians for change-of-variables in probability.

    §5 Matrix Inverses

    The inverse of a matrix A, denoted A^{-1}, satisfies:

    AA^{-1} = A^{-1}A = I

    Only square matrices with \det(A) \ne 0 have inverses.

    2×2 Inverse Formula

    A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

    2×2 Matrix Inverse

    export const inverse2x2 = (
      a: number, b: number,
      c: number, d: number
    ): number[][] | null => {
      const det = a * d - b * c;
      if (det === 0) return null;
      return [
        [d / det, -b / det],
        [-c / det, a / det]
      ];
    };
    
    const inv = inverse2x2(4, 7, 2, 6);
    console.log(inv); // [[0.6, -0.7], [-0.2, 0.4]]

    Properties

    • (A^{-1})^{-1} = A
    • (AB)^{-1} = B^{-1}A^{-1}
    • (A^T)^{-1} = (A^{-1})^T
    MLA: Matrix inversion solves the normal equations in linear regression: \beta = (X^T X)^{-1} X^T y.

    §6 Eigenvalues & Eigenvectors

    An eigenvector of a matrix A is a vector that only gets scaled (not rotated) when multiplied by A:

    A\vec{v} = \lambda\vec{v}

    where \lambda is the eigenvalue (scaling factor).

    Finding Eigenvalues

    Solve the characteristic equation:

    \det(A - \lambda I) = 0

    Example: 2×2 Matrix

    For A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}:

    \det\begin{bmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{bmatrix} = (4-\lambda)(3-\lambda) - 2 = 0

    Solving: \lambda^2 - 7\lambda + 10 = 0 \Rightarrow \lambda = 5, 2

    Properties

    • Sum of eigenvalues = trace of matrix (sum of diagonal elements)
    • Product of eigenvalues = determinant of matrix
    • Symmetric matrices have real eigenvalues and orthogonal eigenvectors
    MLA: Principal Component Analysis (PCA) finds eigenvectors of the covariance matrix to identify directions of maximum variance.

    §7 Orthogonality & Pseudoinverse

    Orthogonal Vectors

    Vectors \vec{u} and \vec{v} are orthogonal if:

    \vec{u} \cdot \vec{v} = 0

    Orthonormal Basis

    A set of vectors is orthonormal if they are mutually orthogonal and each has unit length.

    Moore-Penrose Pseudoinverse

    For non-square or singular matrices, the pseudoinverse A^{\dagger} generalizes the inverse:

    A^{\dagger} = (A^T A)^{-1} A^T

    (for full column rank matrices)
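    The formula can be sketched in code for the special case of an m×2 matrix with full column rank, where A^T A is 2×2 and invertible by the closed-form rule above (all helper names here are our own):

```typescript
type Matrix = number[][];

const transpose = (A: Matrix): Matrix =>
  A[0].map((_, j) => A.map((row) => row[j]));

const matMul = (A: Matrix, B: Matrix): Matrix =>
  A.map((row) =>
    B[0].map((_, j) => row.reduce((s, aik, k) => s + aik * B[k][j], 0))
  );

// Pseudoinverse of an m×2 matrix with independent columns:
// pinv(A) = (A^T A)^(-1) A^T, inverting the 2×2 Gram matrix directly.
export function pinvTall2(A: Matrix): Matrix {
  const At = transpose(A);
  const [[a, b], [c, d]] = matMul(At, A);
  const det = a * d - b * c;
  if (Math.abs(det) < 1e-12) throw new Error("Columns are not independent");
  const gramInv: Matrix = [[d / det, -b / det], [-c / det, a / det]];
  return matMul(gramInv, At);
}

// For an invertible square matrix, the pseudoinverse equals the inverse:
console.log(pinvTall2([[1, 0], [0, 2]])); // [[1, 0], [0, 0.5]]
```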

    Properties

    • A A^{\dagger} A = A
    • A^{\dagger} A A^{\dagger} = A^{\dagger}
    • If A is invertible, A^{\dagger} = A^{-1}
    MLA: The pseudoinverse is central to ELM: \beta = H^{\dagger} T. It provides least-squares solutions even when exact solutions don't exist.

    Chapter 3 — The Geometry of Learning

    Sections 1–3 (Vectors, Hyperplanes, and the Geometry of Classification)

    🎯 Learning Objectives

    By the end of these sections, you will be able to:

    1. Visualize data as vectors in geometric space.
    2. Understand hyperplanes as geometric decision boundaries.
    3. Compute distances, projections, and dot products as geometric tools.
    4. Translate geometric concepts into TypeScript code used in ELM and KELM.
    5. Recognize how ELM decision functions create linear and nonlinear boundaries.
    Machine Learning Angle (MLA): Every model — from simple ELMs to deep transformers — carves shapes in space. Algebra describes them; geometry reveals how they separate and generalize.

    §1 Vectors as Points and Directions

    We've met vectors before as "ordered lists." Now we'll see them as arrows in space.

    1.1 Points and Directions

    • Point: A position in space — like (x, y) in 2D or (x, y, z) in 3D.
    • Vector: An arrow with direction and length.

    ASCII sketch (2D example):

    (0,0) → (3,2)
    
    y ↑
      |
    3 |        *
    2 |      *
    1 |   *
    0 +-----------→ x
        0   1   2   3

    Here, the arrow from (0,0) to (3,2) is the vector v = (3,2).

    1.2 Magnitude (Length)

    \|\mathbf{v}\| = \sqrt{x^2 + y^2}

    Example: \|(3, 2)\| = \sqrt{13} \approx 3.606

    TypeScript Example

    export const magnitude = (v: number[]): number =>
      Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
    
    console.log(magnitude([3, 2])); // ≈ 3.606

    1.3 Normalization

    To make a vector length 1:

    \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

    TypeScript Example

    export const normalize = (v: number[]): number[] => {
      const len = magnitude(v);
      return v.map((vi) => vi / len);
    };
    
    console.log(normalize([3, 2])); // ≈ [0.832, 0.555]
    MLA: Feature normalization (unit vectors) keeps all features on equal footing, stabilizing kernel and distance computations.

    §2 Hyperplanes and Linear Decision Boundaries

    2.1 What Is a Hyperplane?

    A hyperplane is the higher-dimensional cousin of a line.

    Dimension | Object | Equation
    1D | Point | x = a
    2D | Line | w_1x_1 + w_2x_2 + b = 0
    3D | Plane | w_1x_1 + w_2x_2 + w_3x_3 + b = 0
    nD | Hyperplane | \mathbf{w} \cdot \mathbf{x} + b = 0

    The normal vector \mathbf{w} is perpendicular to the hyperplane — it defines its orientation.

    ASCII illustration (2D line separating classes):

      Class +1 (Above line)
              o   o
           o
      --------------------  ←  w·x + b = 0
             x   x   x
      Class -1 (Below line)

    2.2 Signed Distance from a Point to a Hyperplane

    For a point \mathbf{x} and hyperplane \mathbf{w} \cdot \mathbf{x} + b = 0:

    d = \frac{\mathbf{w} \cdot \mathbf{x} + b}{\|\mathbf{w}\|}
    • d > 0 → the point is on the "positive" side
    • d < 0 → the point is on the "negative" side

    TypeScript Example

    export const signedDistance = (w: number[], b: number, x: number[]): number => {
      const dotProd = w.reduce((s, wi, i) => s + wi * x[i], 0);
      const wMag = Math.sqrt(w.reduce((s, wi) => s + wi * wi, 0));
      return (dotProd + b) / wMag;
    };
    
    const w = [2, 1], b = -6;
    console.log(signedDistance(w, b, [3, 2])); // 0.894 → positive class
    MLA: This distance is the margin used in SVMs and ELMs to measure confidence. Larger margin = more confident classification.

    2.3 Classifying with Hyperplanes

    Decision rule:

    \text{class}(\mathbf{x}) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)

    TypeScript Example

    export const classify = (w: number[], b: number, x: number[]): number => {
      const score = w.reduce((s, wi, i) => s + wi * x[i], 0) + b;
      return Math.sign(score); // +1 or -1
    };
    
    console.log(classify([2, 1], -6, [3, 2])); // +1
    MLA: ELM's output layer is just this: linear hyperplanes stacking the decisions from hidden neurons.

    §3 Dot Products, Projections & Angles

    3.1 The Dot Product

    The dot product measures how much two vectors "align."

    \mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n

    TypeScript Example

    export const dotProduct = (u: number[], v: number[]): number =>
      u.reduce((s, ui, i) => s + ui * v[i], 0);
    
    console.log(dotProduct([1, 2, 3], [4, 5, 6])); // 32
    MLA: The dot product is at the core of neural networks — every weighted sum \mathbf{w} \cdot \mathbf{x} is a dot product.

    3.2 Vector Projection

    Projection of vector \mathbf{a} onto \mathbf{b}:

    \text{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|^2}\mathbf{b}

    TypeScript Example

    export const project = (a: number[], b: number[]): number[] => {
      const dotProd = a.reduce((s, ai, i) => s + ai * b[i], 0);
      const bNormSq = b.reduce((s, bi) => s + bi * bi, 0);
      return b.map((bi) => (dotProd / bNormSq) * bi);
    };
    
    console.log(project([2, 3], [1, 0])); // projection onto x-axis => [2,0]

    ASCII illustration:

    y ↑
    |        • a(2,3)
    |       /
    |      /   → projection (shadow of a on b)
    |-----•-----------→ b-axis (1,0)

    3.3 Angle Between Vectors

    The angle \theta between vectors:

    \cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}

    TypeScript Example

    export const angleBetween = (u: number[], v: number[]): number => {
      const dotProd = u.reduce((s, ui, i) => s + ui * v[i], 0);
      const magU = Math.sqrt(u.reduce((s, ui) => s + ui * ui, 0));
      const magV = Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
      return Math.acos(dotProd / (magU * magV)); // radians
    };
    
    console.log(angleBetween([1, 0], [0, 1])); // ≈ 1.5708 radians (90°)
    MLA: The angle between feature vectors corresponds to correlation. Orthogonal features (90° apart) are statistically independent — a holy grail in model design.

    End of Sections 1–3

    Next: Feature Space, Kernels, and Visualizing ELM Decision Boundaries.

    Chapter 3 — The Geometry of Learning

    Sections 4–7 (Feature Space, Kernels, Visualizing ELM Boundaries, Practice Problems & Summary)

    🎯 Learning Objectives

    By the end of this part, you'll be able to:

    1. Explain what "feature space" means and how kernels warp it.
    2. Understand geometric intuition behind kernel-based ELMs.
    3. Visualize linear and nonlinear decision boundaries.
    4. Practice computing hyperplanes and distances for classification tasks.
    5. Connect geometry, algebra, and ELM's architecture into one mental model.

    §4 Feature Space and Kernels — Bending the World

    4.1 What Is Feature Space?

    Each input vector \mathbf{x} = (x_1, x_2, \ldots, x_d) is a point in feature space. When we add features (like polynomial or trigonometric ones), we move to higher dimensions.

    Example:

    Original input: x = [a, b]

    Augmented feature space: [a, b, a^2, ab, b^2]
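    This augmentation is an explicit feature map. A small illustrative sketch (the name phi2 is our own, echoing the \phi notation used below):

```typescript
// Explicit degree-2 feature map phi(x) for x = [a, b]:
// [a, b] -> [a, b, a^2, a*b, b^2].
export const phi2 = ([a, b]: [number, number]): number[] =>
  [a, b, a * a, a * b, b * b];

console.log(phi2([2, 3])); // [2, 3, 4, 6, 9]
```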

    ASCII metaphor:

    Feature space (2D)
       x2 ↑
          o  o
       o
    ---+----------→ x1
       o
          o

    Now imagine lifting it to 3D — each point's height (x_3) could represent a^2 + b^2. That's feature mapping.

    MLA: In ELM/KELM, hidden-layer activations (tanh, sigmoid, etc.) implicitly map data into a new curved feature space where classes are linearly separable.

    4.2 Kernels — Computing in the Warped Space

    A kernel function computes similarity without explicitly transforming the data.

    K(x, z) = \phi(x) \cdot \phi(z)

    Common kernels:

    Type | Formula | Geometry
    Linear | K(x, z) = x \cdot z | Flat space
    Polynomial | K(x, z) = (x \cdot z + c)^d | Curved surfaces
    RBF (Gaussian) | K(x, z) = \exp(-\gamma \|x - z\|^2) | Infinite-dimensional smoothness

    TypeScript Example:

    export const rbfKernel = (x: number[], z: number[], gamma: number): number => {
      const distSq = x.reduce((s, xi, i) => s + Math.pow(xi - z[i], 2), 0);
      return Math.exp(-gamma * distSq);
    };
    
    console.log(rbfKernel([1, 2], [1.5, 2.5], 0.5)); // ≈ 0.7788

    ASCII sketch:

    Flat space (linear):      Curved space (kernel):
      x x x                        x
       o                            x
      x x x                        x
    MLA: Kernels let ELMs separate complex shapes (like circles or spirals) by treating curved surfaces as "flat" in a hidden dimension.

    §5 Visualizing ELM Decision Boundaries

    ELMs learn decision surfaces that look like folded sheets in high-dimensional space.

    5.1 Linear Boundary Example (2D)

    f(x_1, x_2) = w_1x_1 + w_2x_2 + b

    When f(x) = 0, you're on the boundary.

    ASCII Visualization:

      +1 side (positive class)
         o o o
      ----------   ←  f(x)=0
         x x x
      -1 side (negative class)

    5.2 Nonlinear Boundary with RBF Kernel

    With kernel mapping, the surface bends:

     Curved separation (2D)
     
       +++++
       +   +   o o o
       +   +  o
       +++++------------
           o    o o

    The boundary now wraps around data points, forming islands of classification.

    5.3 ELM Decision Function

    f(x) = \sum_{i=1}^{N} \beta_i K(x, x_i) + b
    • K(x, x_i) measures similarity to each training point.
    • \beta_i are learned output weights.
    • The decision boundary is where f(x) = 0.

    TypeScript Example (Mini KELM Predictor):

    export function kelmPredict(Xtrain: number[][], beta: number[], b: number, x: number[], gamma: number): number {
      const f = Xtrain.reduce((sum, xi, i) => sum + beta[i] * rbfKernel(x, xi, gamma), 0) + b;
      return Math.sign(f);
    }
    MLA: ELMs learn not just "weights" but geometry — reshaping data space to make categories separable.

    §6 Practice Problems — Geometry in Action

    1. Compute the distance from the point x = (2, 3) to the line 2x_1 + x_2 - 6 = 0.
    2. Normalize the vector v = (3, 4).
    3. Compute \text{proj}_b(a) for a = (2, 3), b = (1, 0).
    4. Find the angle between u = (1, 0) and v = (1, 1).
    5. For f(x) = 2x_1 + x_2 - 6, find f(3, 2) and predict the class.
    6. Given w = (2, 1), b = -6, what is the margin for x = (3, 2)?
    7. Compute the RBF kernel between x = [1, 2] and z = [2, 3] with \gamma = 0.5.
    8. Describe geometrically what \gamma controls in the RBF kernel.
    9. Plot (conceptually) the boundary x_1^2 + x_2^2 = 9. What shape is it?
    10. In your own words, explain how kernels make nonlinear separation possible.

    ✅ Answer Key (Sketch Solutions)

    1. d = \frac{2(2) + 3 - 6}{\sqrt{5}} \approx 0.447
    2. \hat{v} = (0.6, 0.8)
    3. \text{proj}_b(a) = (2, 0)
    4. \theta = 45^\circ
    5. f(3, 2) = 2 \cdot 3 + 2 - 6 = 2 \Rightarrow \text{class} = +1
    6. d = \frac{2 \cdot 3 + 1 \cdot 2 - 6}{\sqrt{5}} \approx 0.894
    7. K = \exp(-0.5((1-2)^2 + (2-3)^2)) = \exp(-1) \approx 0.3679
    8. \gamma controls the "tightness" of curvature — higher \gamma = tighter decision zones.
    9. A circle of radius 3 centered at the origin.
    10. Kernels lift flat input space into a curved feature space where linear cuts become curved boundaries in the original space.

    §7 Glossary & Chapter Summary

    📘 Glossary

    Term | Definition
    Feature Space | The abstract space where each dimension represents a feature.
    Hyperplane | A flat boundary separating two regions in feature space.
    Margin | Distance from a point to the decision boundary.
    Kernel | Function that computes inner products in a transformed feature space.
    RBF Kernel | Exponential similarity function producing smooth curved boundaries.
    Projection | Shadow of one vector on another.
    Angle Between Vectors | Measure of similarity (cosine of the angle = correlation).
    Decision Boundary | Geometric surface dividing predicted classes.
    Nonlinear Mapping | Transforming inputs so that they become linearly separable.
    Feature Normalization | Scaling feature vectors to unit length.

    🧠 Chapter Summary

    • Geometry is the soul of learning — every model is a shape transformer.
    • Linear boundaries (hyperplanes) are easy; kernels bend them into curves.
    • Distances, projections, and angles govern how models compare data.
    • ELM/KELM decision functions can be visualized as hyperplanes in hidden space — smooth, separable surfaces built from algebraic magic.
    • Once you understand geometry, model behavior becomes visible math, not black boxes.

    You now see what the machine sees — data as geometry.

    🎓 Conclusion

    Congratulations! You've completed the core mathematical foundations needed for machine learning. These concepts form the backbone of algorithms from linear regression to deep neural networks.

    Continue exploring by implementing these concepts in your own projects, and see how they power the learning machines you build with AsterMind's ELM technology.

    Published: November 30, 2025
    Last updated: December 10, 2025