
    Educational Resource

    Math for Machine Minds

    A textbook refresher for math used in AI and machine learning

    Chapter 1 — Algebra Refresher: The Language of Math Machines

    Sections 1–3

    🎯 Learning Objectives

    By the end of these sections, you will be able to:

    1. Explain variables, constants, coefficients, and expressions.
    2. Simplify algebraic expressions and evaluate linear equations.
    3. Solve linear and quadratic equations algebraically and programmatically.
    4. Connect algebraic manipulation to machine learning intuition.
    Machine Learning Angle (MLA): Algebra is the grammar of ML. A linear model is just an algebraic sentence: \hat{y} = w_0 + w_1x_1 + \dots + w_dx_d

    1. Numbers, Operations & Order of Operations

    Numbers you'll meet often: integers (\mathbb{Z}), rationals (\mathbb{Q}), reals (\mathbb{R}). Computers approximate the reals with floating-point numbers (IEEE 754).

    Order of operations (PEMDAS): Parentheses → Exponents → Multiplication/Division → Addition/Subtraction.

    Example:

    3 + 2 \cdot (5 - 1)^2 = 3 + 2 \cdot 4^2 = 3 + 32 = 35

    Properties Table

    Property | Formula | Example
    Distributive | a(b+c) = ab + ac | 2(3+5) = 16
    Commutative | a+b = b+a | 4+7 = 7+4
    Associative | (a+b)+c = a+(b+c) | (2+3)+4 = 2+(3+4)

    TypeScript Example

    export const pemdasExample = (): number => 3 + 2 * Math.pow(5 - 1, 2); // 35
    MLA: Loss functions and kernel computations rely on strict order of operations. Misplacing parentheses can completely change training results.

    2. Variables, Constants, Coefficients & Expressions

    • Variable: placeholder for many possible values (x).
    • Constant: fixed number (\pi, e, 42).
    • Coefficient: multiplier on a variable (the 3 in 3x).
    • Expression: combination of numbers and symbols without an equals sign.

    Simplify example:

    2x + 3x - 4 + (x - 2) = (2x + 3x + x) + (-4 - 2) = 6x - 6

    TypeScript Example

    // evaluate a*x + b for given x
    export const linearExpr = (a: number, b: number, x: number): number => a * x + b;
    
    // Try it
    console.log(linearExpr(3, 2, 4)); // 3*4+2 = 14
    MLA: In linear regression, coefficients are weights, and constants are bias terms. Combining like terms parallels model simplification.

    3. Equations — Solving for the Unknown

    An equation says two expressions are equal. The goal: isolate the variable.

    Linear Example

    3x + 5 = 11 \Rightarrow 3x = 6 \Rightarrow x = 2

    Quadratic Example

    x^2 - 5x + 6 = 0 \Rightarrow (x - 2)(x - 3) = 0 \Rightarrow x \in \{2, 3\}

    General Quadratic Formula

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

    TypeScript Helper

    export const solveLinear = (a: number, b: number): number => {
      // Solve a*x + b = 0
      if (a === 0) throw new Error("No unique solution");
      return -b / a;
    };
    
    // Usage
    console.log(solveLinear(3, 5)); // -1.666...
    MLA: Solving for x mirrors finding model parameters analytically (closed-form ridge regression).

    4. Inequalities — Because Life Isn't Always Equal

    Inequalities describe ranges of possible values instead of single solutions. They use the symbols >, <, \ge, \le.

    Golden Rule

    When you multiply or divide an inequality by a negative number, flip the sign!

    Example:

    -2x < 6 \Rightarrow x > -3
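    The flip rule is easy to encode. A minimal sketch (the helper name solveLinearInequality is illustrative, not from any library) that solves ax < c for x and flips the sign when a is negative:

```typescript
// Solve a*x < c for x. Dividing by a negative a flips the inequality sign.
export function solveLinearInequality(a: number, c: number): string {
  if (a === 0) throw new Error("Not a linear inequality in x");
  const bound = c / a;
  return a > 0 ? `x < ${bound}` : `x > ${bound}`;
}

console.log(solveLinearInequality(-2, 6)); // "x > -3"
```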

    Compound Inequalities

    -1 \le x < 2

    means "x is between -1 and 2, including -1 but not 2."

    Interval notation: [-1, 2)

    Visual Intuition: Think of inequalities as "zones of truth" on a number line.

    TypeScript Example

    export function inInterval(x: number, a: number, b: number): boolean {
      return a <= x && x < b;
    }
    
    // Example
    console.log(inInterval(1.5, -1, 2)); // true
    console.log(inInterval(2.5, -1, 2)); // false
    MLA: Regularization often introduces inequality-like constraints. For example, "keep weights small" (|w| < \lambda) ensures stability and prevents overfitting.

    5. Functions — Input–Output Machines

    A function is a rule that takes an input and produces an output.

    f: X \to Y

    If y = f(x), then y depends on x.

    5.1 Domain and Range

    • Domain: All valid inputs.
    • Range: All possible outputs.

    Example: f(x) = \frac{1}{x}; the domain excludes x = 0.
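    In code, a domain restriction becomes a guard clause. A minimal sketch for f(x) = 1/x (the function name reciprocal is our own choice):

```typescript
// Evaluate f(x) = 1/x, rejecting inputs outside the domain.
export const reciprocal = (x: number): number => {
  if (x === 0) throw new Error("x = 0 is not in the domain of 1/x");
  return 1 / x;
};

console.log(reciprocal(4)); // 0.25
```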

    5.2 Composition and Inverses

    Composition stacks functions:

    (f \circ g)(x) = f(g(x))

    Inverse functions undo each other:

    f^{-1}(f(x)) = x

    5.3 Common Function Shapes

    Function | Formula | Key Idea
    Linear | f(x) = mx + b | Straight line
    Quadratic | f(x) = ax^2 + bx + c | Parabola
    Exponential | f(x) = a \cdot b^x | Rapid growth
    Logarithmic | f(x) = \log_b x | Inverse of exponential

    Function Composition Example

    export const compose = <T, U, V>(f: (u: U) => V, g: (t: T) => U) => (x: T): V => f(g(x));
    
    export const f = (x: number) => 2 * x + 1;
    export const g = (x: number) => x * x;
    
    console.log(compose(f, g)(3)); // f(g(3)) = 2*(3^2)+1 = 19

    Inverse Example

    f(x)=4x9f1(x)=x+94f(x)=4x-9 \Rightarrow f^{-1}(x)=\frac{x+9}{4}

    TypeScript Check

    export const fInv = (x: number): number => (x + 9) / 4;
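    A quick round-trip check confirms the inverse property f^{-1}(f(x)) = x (both functions are redefined here so the snippet stands alone):

```typescript
// f(x) = 4x - 9 and its inverse; applying one after the other returns x.
const f4 = (x: number): number => 4 * x - 9;
const fInv4 = (x: number): number => (x + 9) / 4;

console.log(fInv4(f4(2))); // 2
```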
    MLA: In ML, "composition" corresponds to layer stacking, and "inverse" appears in autoencoders or when reconstructing features.

    6. Polynomials, Factoring, and Roots

    Polynomials are multi-term expressions built from powers of x:

    p(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0

    The degree is the highest exponent (e.g., quadratic = degree 2). Roots (or zeros) are the values that make p(x) = 0.

    6.1 Factoring Example

    x^2 - 9 = (x - 3)(x + 3)

    6.2 Special Identities

    Identity | Formula
    Difference of squares | a^2 - b^2 = (a-b)(a+b)
    Perfect square | (a+b)^2 = a^2 + 2ab + b^2
    Sum of cubes | a^3 + b^3 = (a+b)(a^2 - ab + b^2)
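    These identities are easy to spot-check numerically. A tiny sketch verifying the difference of squares for sample values:

```typescript
// Numeric spot-check of a^2 - b^2 = (a - b)(a + b).
const a = 7, b = 3;
console.log(a * a - b * b === (a - b) * (a + b)); // true
```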

    6.3 Polynomial Evaluation (Horner's Method)

    For efficiency, evaluate p(x) = ((a_n x + a_{n-1})x + \dots)x + a_0.

    Horner's Method in TypeScript

    export const evalPoly = (coeffs: number[], x: number): number => {
      // coeffs = [a_n, a_{n-1}, ..., a_1, a_0]
      let result = 0;
      for (const coeff of coeffs) {
        result = result * x + coeff;
      }
      return result;
    };
    
    // Example: p(x) = 2x^2 + 3x + 1, evaluate at x=4
    console.log(evalPoly([2, 3, 1], 4)); // 2*16 + 3*4 + 1 = 45
    MLA: Polynomial basis functions expand feature spaces (kernel tricks, polynomial regression).

    6.4 Finding Roots

    For quadratics, factor if possible; otherwise use:

    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

    TypeScript Example

    export function solveQuadratic(a: number, b: number, c: number): [number, number] | string {
      const disc = b * b - 4 * a * c;
      if (disc < 0) return "No real roots";
      const sqrtDisc = Math.sqrt(disc);
      const x1 = (-b + sqrtDisc) / (2 * a);
      const x2 = (-b - sqrtDisc) / (2 * a);
      return [x1, x2];
    }
    
    console.log(solveQuadratic(1, -5, 6)); // [3, 2]
    MLA: Polynomial intuition helps when understanding kernel expansions, Taylor approximations, and model stability near critical points.

    7. Exponents & Logarithms

    7.1 Exponents Recap

    Exponents represent repeated multiplication.

    a^m \cdot a^n = a^{m+n}, \quad \frac{a^m}{a^n} = a^{m-n}, \quad (a^m)^n = a^{mn}

    Example:

    2^3 \cdot 2^4 = 2^7 = 128

    Negative and fractional powers:

    a^{-n} = \frac{1}{a^n}, \quad a^{1/n} = \sqrt[n]{a}

    7.2 Logarithms — The Undo Button for Exponents

    y = \log_a x \iff a^y = x

    If 2^3 = 8, then \log_2 8 = 3.

    Key Identities:

    \log_a(xy) = \log_a x + \log_a y
    \log_a\left(\frac{x}{y}\right) = \log_a x - \log_a y
    \log_a(x^k) = k \log_a x

    Natural Logarithm:

    \ln x = \log_e x, where e \approx 2.71828.

    7.3 TypeScript Utilities

    TypeScript Implementation

    export const power = (a: number, n: number): number => Math.pow(a, n);
    export const logBase = (x: number, a: number): number => Math.log(x) / Math.log(a);
    
    // Examples
    console.log(power(2, 3)); // 8
    console.log(logBase(8, 2)); // 3
    MLA: Exponentials model growth/decay (learning rates, activations). Logarithms appear in log-loss and entropy calculations.

    8. Linear Equations in Two Variables — Lines and Slopes

    A line can be written as:

    y = mx + b

    Where:

    • m = slope (rise/run)
    • b = y-intercept (the value of y when x = 0)

    8.1 Finding the Equation from Two Points

    Given points A(x_1, y_1) and B(x_2, y_2):

    m = \frac{y_2 - y_1}{x_2 - x_1}, \quad b = y_1 - m x_1

    TypeScript Example

    export function lineThrough(p1: [number, number], p2: [number, number]) {
      const [x1, y1] = p1, [x2, y2] = p2;
      if (x1 === x2) return { vertical: true, x: x1 };
      const m = (y2 - y1) / (x2 - x1);
      const b = y1 - m * x1;
      return { m, b, vertical: false };
    }
    
    console.log(lineThrough([2, 3], [6, 11])); // { m: 2, b: -1, vertical: false }

    8.2 Parallel and Perpendicular Lines

    • Parallel lines: same slope (m_1 = m_2).
    • Perpendicular lines: slopes are negative reciprocals (m_1 m_2 = -1).
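    These slope conditions translate directly to code. A minimal sketch (helper names are our own; the tolerance guards against floating-point error):

```typescript
// Parallel: equal slopes. Perpendicular: product of slopes is -1.
export const areParallel = (m1: number, m2: number): boolean => m1 === m2;
export const arePerpendicular = (m1: number, m2: number): boolean =>
  Math.abs(m1 * m2 + 1) < 1e-12;

console.log(areParallel(2, 2)); // true
console.log(arePerpendicular(2, -0.5)); // true
```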
    MLA: In high dimensions, linear equations generalize to hyperplanes, which separate data classes in linear models.

    9. Systems of Linear Equations — Working in Teams

    A system is a group of equations with shared variables.

    Example:

    \begin{aligned}2x + y &= 7 \\ x - y &= 1\end{aligned}

    We can solve by substitution, elimination, or matrix methods.

    9.1 Substitution Example

    x = y + 1

    Substitute into 2x + y = 7: \quad 2(y + 1) + y = 7 \Rightarrow 3y = 5 \Rightarrow y = \frac{5}{3},\ x = \frac{8}{3}.

    9.2 Determinant (Cramer's Rule for 2×2)

    For:

    \begin{aligned}a_1x + b_1y &= c_1 \\ a_2x + b_2y &= c_2\end{aligned}

    Determinant:

    D = a_1 b_2 - a_2 b_1

    Solutions:

    x = \frac{c_1 b_2 - c_2 b_1}{D}, \quad y = \frac{a_1 c_2 - a_2 c_1}{D}

    9.3 TypeScript Solver

    TypeScript Implementation

    export function solve2x2(a: number, b: number, c: number, d: number, e: number, f: number) {
      const det = a * d - b * c;
      if (Math.abs(det) < 1e-12) throw new Error("No unique solution");
      const x = (e * d - b * f) / det;
      const y = (a * f - e * c) / det;
      return { x, y };
    }
    
    console.log(solve2x2(2, 1, 1, -1, 7, 1)); // { x: 8/3, y: 5/3 }
    MLA: Solving systems is how we train linear models. Ridge regression simply adds a term \lambda I to stabilize this solve.

    10. Distance and Vectors — A Gentle Preview

    A vector is an ordered list of numbers. In 2D: \mathbf{v} = (v_1, v_2).

    10.1 Vector Operations

    Addition:

    \mathbf{u} + \mathbf{v} = (u_1 + v_1,\ u_2 + v_2)

    Scaling:

    c\mathbf{v} = (cv_1,\ cv_2)

    10.2 Euclidean Distance

    Between points \mathbf{x} and \mathbf{z}:

    d(\mathbf{x}, \mathbf{z}) = \sqrt{(x_1 - z_1)^2 + (x_2 - z_2)^2}

    TypeScript Example

    export const l2 = (v: number[]) => Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
    export const dist = (a: number[], b: number[]) => l2(a.map((ai, i) => ai - b[i]));
    
    console.log(dist([3, -1], [-1, 2])); // 5
    MLA: RBF kernels depend on vector distances: K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \|\mathbf{x} - \mathbf{z}\|^2).

    11. Why Algebra Powers ELM/KELM

    Let's connect all your algebra knowledge to the math inside AsterMind's ELM and KELM engines.

    11.1 Extreme Learning Machines (ELM)

    The ELM's core equation looks very familiar:

    \hat{Y} = H\beta

    Here:

    • H is a hidden-layer matrix of nonlinear features (like \sigma(Wx + b))
    • \beta are the output weights we solve for
    • \hat{Y} are the predictions

    To find \beta, ELM doesn't "train" with epochs — it solves algebraically:

    \beta = (H^T H + \lambda I)^{-1} H^T Y

    That's just algebra on steroids — using matrix inverses (Chapter 2) and ridge regression (Chapter 4).
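    To make the closed-form solve concrete without matrix machinery, here is the scalar special case: with a single hidden feature h, the formula collapses to \beta = (h^T h + \lambda)^{-1} h^T y. This is a sketch under that one-feature assumption (the name ridge1d is our own):

```typescript
// Scalar ridge solve: beta = (h·h + lambda)^(-1) (h·y),
// the 1-feature case of (H^T H + lambda I)^(-1) H^T Y.
export function ridge1d(h: number[], y: number[], lambda: number): number {
  const hth = h.reduce((s, hi) => s + hi * hi, 0);
  const hty = h.reduce((s, hi, i) => s + hi * y[i], 0);
  return hty / (hth + lambda);
}

console.log(ridge1d([1, 2, 3], [2, 4, 6], 0)); // 2 (exact fit: y = 2h)
```

    With lambda > 0 the estimate shrinks slightly below 2 — that is the stabilizing effect of the ridge term.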

    11.2 Kernel ELM (KELM)

    In Kernel ELM, we swap H for a kernel matrix K that measures similarities between data points:

    \beta = (K + \lambda I)^{-1} Y

    So when you learned to solve for x in ax + b = 0, you were already learning how KELM computes its weights!

    Key idea: Algebra isn't old math — it's the foundation of fast, closed-form learning.

    11.3 Recursive Least Squares (RLS)

    Online learning adds new data and updates \beta recursively:

    \beta_k = \beta_{k-1} + K_k (Y_k - H_k \beta_{k-1})

    Still algebra: we're updating a parameter by adding a correction term.
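    The correction-term structure is visible even in a scalar sketch. Note the simplification: real RLS also updates the gain K_k each step, while here we fix it to a constant to isolate the "parameter plus correction" pattern:

```typescript
// One step of the update beta_k = beta_{k-1} + gain * (y_k - h_k * beta_{k-1}),
// with a fixed gain (true RLS recomputes the gain from a covariance estimate).
export function rlsStep(beta: number, h: number, y: number, gain: number): number {
  return beta + gain * (y - h * beta);
}

let beta = 0;
for (const [h, y] of [[1, 2], [2, 4], [3, 6]] as [number, number][]) {
  beta = rlsStep(beta, h, y, 0.5);
}
console.log(beta); // 2 — converges to the true slope of y = 2h
```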

    Every adaptive system in ARS — from schema updates to Starfish activations — is built on these algebraic building blocks.

    Chapter 2 — Linear Algebra: The Geometry of Data

    🎯 Learning Objectives

    By the end of this chapter, you will be able to:

    1. Perform vector and matrix operations fluently.
    2. Compute determinants and inverses.
    3. Understand eigenvalues and eigenvectors.
    4. Apply linear algebra concepts to machine learning problems.

    §1 Vectors — Data in Line Form

    A vector is just an ordered list of numbers — think of it as a spreadsheet column that describes a single thing (like a form entry or feature row).

    \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}

    1.1 Operations

    Addition:

    \mathbf{u} + \mathbf{v} = \begin{bmatrix}u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3\end{bmatrix}

    Scalar multiplication:

    c\mathbf{v} = \begin{bmatrix}c v_1 \\ c v_2 \\ c v_3\end{bmatrix}

    Dot product:

    \mathbf{u} \cdot \mathbf{v} = \sum_i u_i v_i

    Length (Euclidean norm):

    \|\mathbf{v}\|_2 = \sqrt{v_1^2 + v_2^2 + v_3^2}

    TypeScript Examples:

    export const addVec = (a: number[], b: number[]): number[] =>
      a.map((ai, i) => ai + b[i]);
    
    export const scaleVec = (a: number[], c: number): number[] =>
      a.map((ai) => ai * c);
    
    export const dot = (a: number[], b: number[]): number =>
      a.reduce((sum, ai, i) => sum + ai * b[i], 0);
    
    export const norm = (a: number[]): number =>
      Math.sqrt(dot(a, a));
    
    // Example
    console.log(addVec([1, 2, 3], [4, 5, 6])); // [5,7,9]
    console.log(dot([1, 2], [3, 4])); // 11
    console.log(norm([3, 4])); // 5
    MLA: Vectors are the atoms of ML — feature vectors, weight vectors, gradient vectors, all the same idea.

    §2 Matrices

    A matrix is a 2D array of numbers arranged in rows and columns:

    A = \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{bmatrix}

    Dimensions: m \times n (m rows, n columns).

    Special Matrices

    • Square matrix: m = n
    • Identity matrix: I, with 1s on the diagonal, 0s elsewhere
    • Zero matrix: all elements are 0
    • Diagonal matrix: non-zero only on the diagonal
    • Symmetric matrix: A = A^T
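    The identity matrix is worth building once in code, since it reappears in every ridge term (\lambda I). A small sketch (the helper name identity is our own):

```typescript
type Matrix = number[][];

// Build the n×n identity matrix: 1s on the diagonal, 0s elsewhere.
export const identity = (n: number): Matrix =>
  Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 1 : 0))
  );

console.log(identity(2)); // [[1, 0], [0, 1]]
```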

    Matrix Operations

    Addition: Element-wise, same dimensions required

    C = A + B \Rightarrow c_{ij} = a_{ij} + b_{ij}

    Scalar Multiplication:

    B = cA \Rightarrow b_{ij} = c \cdot a_{ij}

    Transpose: Flip rows and columns

    (A^T)_{ij} = A_{ji}

    TypeScript Examples

    type Matrix = number[][];
    
    export const matrixAdd = (A: Matrix, B: Matrix): Matrix =>
      A.map((row, i) => row.map((val, j) => val + B[i][j]));
    
    export const matrixScale = (c: number, A: Matrix): Matrix =>
      A.map(row => row.map(val => c * val));
    
    export const transpose = (A: Matrix): Matrix => {
      const m = A.length, n = A[0].length;
      return Array.from({ length: n }, (_, j) =>
        Array.from({ length: m }, (_, i) => A[i][j])
      );
    };
    
    // Example
    const A = [[1, 2], [3, 4]];
    console.log(transpose(A)); // [[1, 3], [2, 4]]
    MLA: Data matrices store samples as rows, features as columns. Weight matrices transform input spaces to output spaces.

    §3 Matrix Multiplication

    Matrix multiplication is not element-wise. For C = AB, we compute:

    c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}

    Requirement: Number of columns in A must equal number of rows in B.

    Example

    \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}

    Matrix Multiplication

    export const matrixMultiply = (A: Matrix, B: Matrix): Matrix => {
      const m = A.length, n = B[0].length;
      return Array.from({ length: m }, (_, i) =>
        Array.from({ length: n }, (_, j) =>
          A[i].reduce((sum, aik, k) => sum + aik * B[k][j], 0)
        )
      );
    };
    
    const A = [[1, 2], [3, 4]];
    const B = [[5, 6], [7, 8]];
    console.log(matrixMultiply(A, B)); // [[19, 22], [43, 50]]

    Properties

    • Associative: (AB)C = A(BC)
    • Distributive: A(B+C) = AB + AC
    • NOT commutative: AB \ne BA (in general)
    • Transpose property: (AB)^T = B^T A^T
    MLA: Matrix multiplication represents layer transformations in neural networks. Forward pass: y = Wx + b.

    §4 Determinants

    The determinant is a scalar value computed from a square matrix. It indicates if the matrix is invertible.

    2×2 Determinant

    \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc

    3×3 Determinant (Expansion Along the First Row)

    \det(A) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})

    Computing 2×2 Determinant

    export const det2x2 = (a: number, b: number, c: number, d: number): number =>
      a * d - b * c;
    
    console.log(det2x2(3, 8, 4, 6)); // 3*6 - 8*4 = -14

    Properties

    • If \det(A) = 0, the matrix is singular (not invertible)
    • If \det(A) \ne 0, the matrix is invertible
    • \det(AB) = \det(A)\det(B)
    • \det(A^T) = \det(A)
    MLA: Determinants measure volume scaling. In ML, they appear in covariance matrices and Jacobians for change-of-variables in probability.

    §5 Matrix Inverses

    The inverse of a matrix A, denoted A^{-1}, satisfies:

    AA^{-1} = A^{-1}A = I

    Only square matrices with \det(A) \ne 0 have inverses.

    2×2 Inverse Formula

    A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

    2×2 Matrix Inverse

    export const inverse2x2 = (
      a: number, b: number,
      c: number, d: number
    ): number[][] | null => {
      const det = a * d - b * c;
      if (det === 0) return null;
      return [
        [d / det, -b / det],
        [-c / det, a / det]
      ];
    };
    
    const inv = inverse2x2(4, 7, 2, 6);
    console.log(inv); // [[0.6, -0.7], [-0.2, 0.4]]

    Properties

    • (A^{-1})^{-1} = A
    • (AB)^{-1} = B^{-1}A^{-1}
    • (A^T)^{-1} = (A^{-1})^T
    MLA: Matrix inversion solves the normal equations in linear regression: \beta = (X^T X)^{-1} X^T y.

    §6 Eigenvalues & Eigenvectors

    An eigenvector of a matrix A is a vector that only gets scaled (not rotated) when multiplied by A:

    A\vec{v} = \lambda\vec{v}

    where \lambda is the eigenvalue (scaling factor).

    Finding Eigenvalues

    Solve the characteristic equation:

    \det(A - \lambda I) = 0

    Example: 2×2 Matrix

    For A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}:

    \det\begin{bmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{bmatrix} = (4-\lambda)(3-\lambda) - 2 = 0

    Solving: \lambda^2 - 7\lambda + 10 = 0 \Rightarrow \lambda = 5, 2

    Properties

    • Sum of eigenvalues = trace of matrix (sum of diagonal elements)
    • Product of eigenvalues = determinant of matrix
    • Symmetric matrices have real eigenvalues and orthogonal eigenvectors
    MLA: Principal Component Analysis (PCA) finds eigenvectors of the covariance matrix to identify directions of maximum variance.

    §7 Orthogonality & Pseudoinverse

    Orthogonal Vectors

    Vectors \vec{u} and \vec{v} are orthogonal if:

    \vec{u} \cdot \vec{v} = 0

    Orthonormal Basis

    A set of vectors is orthonormal if they are mutually orthogonal and each has unit length.

    Moore-Penrose Pseudoinverse

    For non-square or singular matrices, the pseudoinverse A^{\dagger} generalizes the inverse:

    A^{\dagger} = (A^T A)^{-1} A^T

    (for full column rank matrices)
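    The formula can be sketched in code for the special case of an m×2 matrix with full column rank, where A^T A is 2×2 and invertible by the closed-form rule above (all helper names here are our own):

```typescript
type Matrix = number[][];

const transpose = (A: Matrix): Matrix =>
  A[0].map((_, j) => A.map((row) => row[j]));

const matMul = (A: Matrix, B: Matrix): Matrix =>
  A.map((row) =>
    B[0].map((_, j) => row.reduce((s, aik, k) => s + aik * B[k][j], 0))
  );

// Pseudoinverse of an m×2 matrix with independent columns:
// pinv(A) = (A^T A)^(-1) A^T, inverting the 2×2 Gram matrix directly.
export function pinvTall2(A: Matrix): Matrix {
  const At = transpose(A);
  const [[a, b], [c, d]] = matMul(At, A);
  const det = a * d - b * c;
  if (Math.abs(det) < 1e-12) throw new Error("Columns are not independent");
  const gramInv: Matrix = [[d / det, -b / det], [-c / det, a / det]];
  return matMul(gramInv, At);
}

// For an invertible square matrix, the pseudoinverse equals the inverse:
console.log(pinvTall2([[1, 0], [0, 2]])); // [[1, 0], [0, 0.5]]
```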

    Properties

    • A A^{\dagger} A = A
    • A^{\dagger} A A^{\dagger} = A^{\dagger}
    • If A is invertible, A^{\dagger} = A^{-1}
    MLA: The pseudoinverse is central to ELM: \beta = H^{\dagger} T. It provides least-squares solutions even when exact solutions don't exist.

    Chapter 3 — The Geometry of Learning

    Sections 1–3 (Vectors, Hyperplanes, and the Geometry of Classification)

    🎯 Learning Objectives

    By the end of these sections, you will be able to:

    1. Visualize data as vectors in geometric space.
    2. Understand hyperplanes as geometric decision boundaries.
    3. Compute distances, projections, and dot products as geometric tools.
    4. Translate geometric concepts into TypeScript code used in ELM and KELM.
    5. Recognize how ELM decision functions create linear and nonlinear boundaries.
    Machine Learning Angle (MLA): Every model — from simple ELMs to deep transformers — carves shapes in space. Algebra describes them; geometry reveals how they separate and generalize.

    §1 Vectors as Points and Directions

    We've met vectors before as "ordered lists." Now we'll see them as arrows in space.

    1.1 Points and Directions

    • Point: A position in space — like (x, y) in 2D or (x, y, z) in 3D.
    • Vector: An arrow with direction and length.

    ASCII sketch (2D example):

    (0,0) → (3,2)
    
    y ↑
      |
    3 |        *
    2 |      *
    1 |   *
    0 +-----------→ x
        0   1   2   3

    Here, the arrow from (0,0) to (3,2) is the vector v = (3,2).

    1.2 Magnitude (Length)

    \|\mathbf{v}\| = \sqrt{x^2 + y^2}

    Example: \|(3, 2)\| = \sqrt{13} \approx 3.606

    TypeScript Example

    export const magnitude = (v: number[]): number =>
      Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
    
    console.log(magnitude([3, 2])); // ≈ 3.606

    1.3 Normalization

    To make a vector length 1:

    \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

    TypeScript Example

    export const normalize = (v: number[]): number[] => {
      const len = magnitude(v);
      return v.map((vi) => vi / len);
    };
    
    console.log(normalize([3, 2])); // ≈ [0.832, 0.555]
    MLA: Feature normalization (unit vectors) keeps all features on equal footing, stabilizing kernel and distance computations.

    §2 Hyperplanes and Linear Decision Boundaries

    2.1 What Is a Hyperplane?

    A hyperplane is the higher-dimensional cousin of a line.

    Dimension | Object | Equation
    1D | Point | x = a
    2D | Line | w_1x_1 + w_2x_2 + b = 0
    3D | Plane | w_1x_1 + w_2x_2 + w_3x_3 + b = 0
    nD | Hyperplane | \mathbf{w} \cdot \mathbf{x} + b = 0

    The normal vector \mathbf{w} is perpendicular to the hyperplane — it defines its orientation.

    ASCII illustration (2D line separating classes):

      Class +1 (Above line)
              o   o
           o
      --------------------  ←  w·x + b = 0
             x   x   x
      Class -1 (Below line)

    2.2 Signed Distance from a Point to a Hyperplane

    For a point \mathbf{x} and hyperplane \mathbf{w} \cdot \mathbf{x} + b = 0:

    d = \frac{\mathbf{w} \cdot \mathbf{x} + b}{\|\mathbf{w}\|}
    • d > 0 → the point is on the "positive" side
    • d < 0 → the point is on the "negative" side

    TypeScript Example

    export const signedDistance = (w: number[], b: number, x: number[]): number => {
      const dotProd = w.reduce((s, wi, i) => s + wi * x[i], 0);
      const wMag = Math.sqrt(w.reduce((s, wi) => s + wi * wi, 0));
      return (dotProd + b) / wMag;
    };
    
    const w = [2, 1], b = -6;
    console.log(signedDistance(w, b, [3, 2])); // 0.894 → positive class
    MLA: This distance is the margin used in SVMs and ELMs to measure confidence. Larger margin = more confident classification.

    2.3 Classifying with Hyperplanes

    Decision rule:

    \text{class}(\mathbf{x}) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)

    TypeScript Example

    export const classify = (w: number[], b: number, x: number[]): number => {
      const score = w.reduce((s, wi, i) => s + wi * x[i], 0) + b;
      return Math.sign(score); // +1 or -1
    };
    
    console.log(classify([2, 1], -6, [3, 2])); // +1
    MLA: ELM's output layer is just this: linear hyperplanes stacking the decisions from hidden neurons.

    §3 Dot Products, Projections & Angles

    3.1 The Dot Product

    The dot product measures how much two vectors "align."

    \mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n

    TypeScript Example

    export const dotProduct = (u: number[], v: number[]): number =>
      u.reduce((s, ui, i) => s + ui * v[i], 0);
    
    console.log(dotProduct([1, 2, 3], [4, 5, 6])); // 32
    MLA: The dot product is at the core of neural networks — every weighted sum \mathbf{w} \cdot \mathbf{x} is a dot product.

    3.2 Vector Projection

    Projection of vector \mathbf{a} onto \mathbf{b}:

    \text{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|^2}\mathbf{b}

    TypeScript Example

    export const project = (a: number[], b: number[]): number[] => {
      const dotProd = a.reduce((s, ai, i) => s + ai * b[i], 0);
      const bNormSq = b.reduce((s, bi) => s + bi * bi, 0);
      return b.map((bi) => (dotProd / bNormSq) * bi);
    };
    
    console.log(project([2, 3], [1, 0])); // projection onto x-axis => [2,0]

    ASCII illustration:

    y ↑
    |        • a(2,3)
    |       /
    |      /   → projection (shadow of a on b)
    |-----•-----------→ b-axis (1,0)

    3.3 Angle Between Vectors

    The angle \theta between vectors:

    \cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}

    TypeScript Example

    export const angleBetween = (u: number[], v: number[]): number => {
      const dotProd = u.reduce((s, ui, i) => s + ui * v[i], 0);
      const magU = Math.sqrt(u.reduce((s, ui) => s + ui * ui, 0));
      const magV = Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
      return Math.acos(dotProd / (magU * magV)); // radians
    };
    
    console.log(angleBetween([1, 0], [0, 1])); // ≈ 1.5708 radians (90°)
    MLA: The angle between feature vectors corresponds to correlation. Orthogonal features (90° apart) are statistically independent — a holy grail in model design.

    End of Sections 1–3

    Next: Feature Space, Kernels, and Visualizing ELM Decision Boundaries.

    Chapter 3 — The Geometry of Learning

    Sections 4–7 (Feature Space, Kernels, Visualizing ELM Boundaries, Practice Problems & Summary)

    🎯 Learning Objectives

    By the end of this part, you'll be able to:

    1. Explain what "feature space" means and how kernels warp it.
    2. Understand geometric intuition behind kernel-based ELMs.
    3. Visualize linear and nonlinear decision boundaries.
    4. Practice computing hyperplanes and distances for classification tasks.
    5. Connect geometry, algebra, and ELM's architecture into one mental model.

    §4 Feature Space and Kernels — Bending the World

    4.1 What Is Feature Space?

    Each input vector \mathbf{x} = (x_1, x_2, \ldots, x_d) is a point in feature space. When we add features (like polynomial or trigonometric ones), we move to higher dimensions.

    Example:

    Original input: x = [a, b]

    Augmented feature space: [a, b, a^2, ab, b^2]
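    This augmentation is an explicit feature map. A small illustrative sketch (the name phi2 is our own, echoing the \phi notation used below):

```typescript
// Explicit degree-2 feature map phi(x) for x = [a, b]:
// [a, b] -> [a, b, a^2, a*b, b^2].
export const phi2 = ([a, b]: [number, number]): number[] =>
  [a, b, a * a, a * b, b * b];

console.log(phi2([2, 3])); // [2, 3, 4, 6, 9]
```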

    ASCII metaphor:

    Feature space (2D)
       x2 ↑
          o  o
       o
    ---+----------→ x1
       o
          o

    Now imagine lifting it to 3D — each point's height (x_3) could represent a^2 + b^2. That's feature mapping.

    MLA: In ELM/KELM, hidden-layer activations (tanh, sigmoid, etc.) implicitly map data into a new curved feature space where classes are linearly separable.

    4.2 Kernels — Computing in the Warped Space

    A kernel function computes similarity without explicitly transforming the data.

    K(x, z) = \phi(x) \cdot \phi(z)

    Common kernels:

    Type | Formula | Geometry
    Linear | K(x, z) = x \cdot z | Flat space
    Polynomial | K(x, z) = (x \cdot z + c)^d | Curved surfaces
    RBF (Gaussian) | K(x, z) = \exp(-\gamma \|x - z\|^2) | Infinite-dimensional smoothness

    TypeScript Example:

    export const rbfKernel = (x: number[], z: number[], gamma: number): number => {
      const distSq = x.reduce((s, xi, i) => s + Math.pow(xi - z[i], 2), 0);
      return Math.exp(-gamma * distSq);
    };
    
    console.log(rbfKernel([1, 2], [1.5, 2.5], 0.5)); // ≈ 0.7788

    ASCII sketch:

    Flat space (linear):      Curved space (kernel):
      x x x                        x
       o                            x
      x x x                        x
    MLA: Kernels let ELMs separate complex shapes (like circles or spirals) by treating curved surfaces as "flat" in a hidden dimension.

    §5 Visualizing ELM Decision Boundaries

    ELMs learn decision surfaces that look like folded sheets in high-dimensional space.

    5.1 Linear Boundary Example (2D)

    f(x_1, x_2) = w_1x_1 + w_2x_2 + b

    When f(x) = 0, you're on the boundary.

    ASCII Visualization:

      +1 side (positive class)
         o o o
      ----------   ←  f(x)=0
         x x x
      -1 side (negative class)

    5.2 Nonlinear Boundary with RBF Kernel

    With kernel mapping, the surface bends:

     Curved separation (2D)
     
       +++++
       +   +   o o o
       +   +  o
       +++++------------
           o    o o

    The boundary now wraps around data points, forming islands of classification.

    5.3 ELM Decision Function

    f(x) = \sum_{i=1}^{N} \beta_i K(x, x_i) + b
    • K(x, x_i) measures similarity to each training point.
    • \beta_i are learned output weights.
    • The decision boundary is where f(x) = 0.

    TypeScript Example (Mini KELM Predictor):

    export function kelmPredict(Xtrain: number[][], beta: number[], b: number, x: number[], gamma: number): number {
      const f = Xtrain.reduce((sum, xi, i) => sum + beta[i] * rbfKernel(x, xi, gamma), 0) + b;
      return Math.sign(f);
    }
    MLA: ELMs learn not just "weights" but geometry — reshaping data space to make categories separable.

    §6 Practice Problems — Geometry in Action

    1. Compute the distance from the point x = (2, 3) to the line 2x_1 + x_2 - 6 = 0.
    2. Normalize the vector v = (3, 4).
    3. Compute \text{proj}_b(a) for a = (2, 3), b = (1, 0).
    4. Find the angle between u = (1, 0) and v = (1, 1).
    5. For f(x) = 2x_1 + x_2 - 6, find f(3, 2) and predict the class.
    6. Given w = (2, 1), b = -6, what is the margin for x = (3, 2)?
    7. Compute the RBF kernel between x = [1, 2] and z = [2, 3] with \gamma = 0.5.
    8. Describe geometrically what \gamma controls in the RBF kernel.
    9. Plot (conceptually) the boundary x_1^2 + x_2^2 = 9. What shape is it?
    10. In your own words, explain how kernels make nonlinear separation possible.

    ✅ Answer Key (Sketch Solutions)

    1. d = \frac{2(2) + 3 - 6}{\sqrt{5}} \approx 0.447
    2. \hat{v} = (0.6, 0.8)
    3. \text{proj}_b(a) = (2, 0)
    4. \theta = 45^\circ
    5. f(3, 2) = 2 \cdot 3 + 2 - 6 = 2 \Rightarrow \text{class} = +1
    6. d = \frac{2 \cdot 3 + 1 \cdot 2 - 6}{\sqrt{5}} \approx 0.894
    7. K = \exp(-0.5((1-2)^2 + (2-3)^2)) = \exp(-1) \approx 0.3679
    8. \gamma controls the "tightness" of curvature — higher \gamma = tighter decision zones.
    9. A circle of radius 3 centered at the origin.
    10. Kernels lift flat input space into a curved feature space where linear cuts become curved boundaries in the original space.

    §7 Glossary & Chapter Summary

    📘 Glossary

    Term | Definition
    Feature Space | The abstract space where each dimension represents a feature.
    Hyperplane | A flat boundary separating two regions in feature space.
    Margin | Distance from a point to the decision boundary.
    Kernel | Function that computes inner products in a transformed feature space.
    RBF Kernel | Exponential similarity function producing smooth curved boundaries.
    Projection | Shadow of one vector on another.
    Angle Between Vectors | Measure of similarity (cosine of the angle = correlation).
    Decision Boundary | Geometric surface dividing predicted classes.
    Nonlinear Mapping | Transforming inputs so that they become linearly separable.
    Feature Normalization | Scaling feature vectors to unit length.

    🧠 Chapter Summary

    • Geometry is the soul of learning — every model is a shape transformer.
    • Linear boundaries (hyperplanes) are easy; kernels bend them into curves.
    • Distances, projections, and angles govern how models compare data.
    • ELM/KELM decision functions can be visualized as hyperplanes in hidden space — smooth, separable surfaces built from algebraic magic.
    • Once you understand geometry, model behavior becomes visible math, not black boxes.

    You now see what the machine sees — data as geometry.

    🎓 Conclusion

    Congratulations! You've completed the core mathematical foundations needed for machine learning. These concepts form the backbone of algorithms from linear regression to deep neural networks.

    Continue exploring by implementing these concepts in your own projects, and see how they power the learning machines you build with AsterMind's ELM technology.

    Published: November 30, 2025
    Last updated: December 10, 2025