A textbook refresher for math used in AI and machine learning
By the end of these sections, you will be able to:
Machine Learning Angle (MLA): Algebra is the grammar of ML. A linear model is just an algebraic sentence: y = w·x + b
Numbers you'll meet often: integers (ℤ), rationals (ℚ), reals (ℝ). Computers approximate reals with floating-point numbers (IEEE 754).
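Because floating-point arithmetic only approximates the reals, strict equality checks on computed values can fail. A minimal sketch (the helper name `approxEqual` is ours, not from the text):

```typescript
// Floating-point numbers are approximations: 0.1 + 0.2 is not exactly 0.3.
console.log(0.1 + 0.2); // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false

// Compare with a tolerance (epsilon) instead of strict equality.
export const approxEqual = (a: number, b: number, eps = 1e-9): boolean =>
  Math.abs(a - b) < eps;

console.log(approxEqual(0.1 + 0.2, 0.3)); // true
```

This is why numerical code (including the linear solvers later in this chapter) compares against a small tolerance rather than zero.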
Order of operations (PEMDAS): Parentheses → Exponents → Multiplication/Division → Addition/Subtraction.
Example: 3 + 2 × (5 − 1)² = 3 + 2 × 16 = 35
| Property | Formula | Example |
|---|---|---|
| Distributive | a(b + c) = ab + ac | 2(3 + 4) = 2·3 + 2·4 = 14 |
| Commutative | a + b = b + a | 2 + 3 = 3 + 2 |
| Associative | (a + b) + c = a + (b + c) | (1 + 2) + 3 = 1 + (2 + 3) |
TypeScript Example
export const pemdasExample = (): number => 3 + 2 * Math.pow(5 - 1, 2); // 35
MLA: Loss functions and kernel computations rely on strict order of operations. Misplacing parentheses can completely change training results.
Simplify example: 2x + 3x + 2 = 5x + 2 (combine like terms).
TypeScript Example
// evaluate a*x + b for given x
export const linearExpr = (a: number, b: number, x: number): number => a * x + b;
// Try it
console.log(linearExpr(3, 2, 4)); // 3*4+2 = 14
MLA: In linear regression, coefficients are weights, and constants are bias terms. Combining like terms parallels model simplification.
An equation says two expressions are equal. The goal: isolate the variable.
TypeScript Helper
export const solveLinear = (a: number, b: number): number => {
// Solve a*x + b = 0
if (a === 0) throw new Error("No unique solution");
return -b / a;
};
// Usage
console.log(solveLinear(3, 5)); // -1.666...
MLA: Solving for x mirrors finding model parameters analytically (closed-form ridge regression).
Inequalities describe ranges of possible values instead of single solutions. They use the symbols: <, ≤, >, ≥.
When you multiply or divide an inequality by a negative number, flip the sign!
Example:
-1 ≤ x < 2 means "x is between -1 and 2, including -1 but not 2."
Interval notation: [-1, 2)
Visual Intuition: Think of inequalities as "zones of truth" on a number line.
TypeScript Example
export function inInterval(x: number, a: number, b: number): boolean {
return a <= x && x < b;
}
// Example
console.log(inInterval(1.5, -1, 2)); // true
console.log(inInterval(2.5, -1, 2)); // false
MLA: Regularization often introduces inequality-like constraints. For example, "keep weights small" (‖w‖ ≤ c) ensures stability and prevents overfitting.
A function is a rule that takes an input and produces an output.
If y = f(x), then y depends on x.
Example: f(x) = 1/x – the domain excludes x = 0.
Composition stacks functions: (f ∘ g)(x) = f(g(x))
Inverse functions undo each other: f⁻¹(f(x)) = x
| Function | Formula | Key Idea |
|---|---|---|
| Linear | f(x) = mx + b | Straight line |
| Quadratic | f(x) = ax² + bx + c | Parabola |
| Exponential | f(x) = aˣ | Rapid growth |
| Logarithmic | f(x) = log_a(x) | Inverse of exponential |
Function Composition Example
export const compose = <T, U, V>(f: (u: U) => V, g: (t: T) => U) => (x: T): V => f(g(x));
export const f = (x: number) => 2 * x + 1;
export const g = (x: number) => x * x;
console.log(compose(f, g)(3)); // f(g(3)) = 2*(3^2)+1 = 19
TypeScript Check
export const fInv = (x: number): number => (x + 9) / 4; // undoes h(x) = 4x - 9, since ((4x - 9) + 9) / 4 = x
MLA: In ML, "composition" corresponds to layer stacking, and "inverse" appears in autoencoders or when reconstructing features.
Polynomials are multi-term expressions built from powers of x: p(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + … + a₁x + a₀
The degree is the highest exponent (e.g., quadratic = degree 2). Roots (or zeros) are the values of x that make p(x) = 0.
| Identity | Formula |
|---|---|
| Difference of squares | a² − b² = (a + b)(a − b) |
| Perfect square | (a + b)² = a² + 2ab + b² |
| Sum of cubes | a³ + b³ = (a + b)(a² − ab + b²) |
For efficiency, evaluate p(x) = (…((aₙx + aₙ₋₁)x + aₙ₋₂)x + …)x + a₀ (Horner's method).
Horner's Method in TypeScript
export const evalPoly = (coeffs: number[], x: number): number => {
// coeffs = [a_n, a_{n-1}, ..., a_1, a_0]
let result = 0;
for (const coeff of coeffs) {
result = result * x + coeff;
}
return result;
};
// Example: p(x) = 2x^2 + 3x + 1, evaluate at x=4
console.log(evalPoly([2, 3, 1], 4)); // 2*16 + 3*4 + 1 = 45
MLA: Polynomial basis functions expand feature spaces (kernel tricks, polynomial regression).
For quadratics, factor if possible; otherwise use the quadratic formula: x = (−b ± √(b² − 4ac)) / (2a)
TypeScript Example
export function solveQuadratic(a: number, b: number, c: number): [number, number] | string {
const disc = b * b - 4 * a * c;
if (disc < 0) return "No real roots";
const sqrtDisc = Math.sqrt(disc);
const x1 = (-b + sqrtDisc) / (2 * a);
const x2 = (-b - sqrtDisc) / (2 * a);
return [x1, x2];
}
console.log(solveQuadratic(1, -5, 6)); // [3, 2]
MLA: Polynomial intuition helps when understanding kernel expansions, Taylor approximations, and model stability near critical points.
Exponents represent repeated multiplication: aⁿ = a · a · … · a (n factors).
If aⁿ = x, then log_a(x) = n,
where a > 0 and a ≠ 1.
TypeScript Implementation
export const power = (a: number, n: number): number => Math.pow(a, n);
export const logBase = (x: number, a: number): number => Math.log(x) / Math.log(a);
// Examples
console.log(power(2, 3)); // 8
console.log(logBase(8, 2)); // 3
MLA: Exponentials model growth/decay (learning rates, activations). Logarithms appear in log-loss and entropy calculations.
A line can be written as: y = mx + b
Where: m is the slope and b is the y-intercept.
Given points (x₁, y₁) and (x₂, y₂): m = (y₂ − y₁) / (x₂ − x₁), b = y₁ − m·x₁
TypeScript Example
export function lineThrough(p1: [number, number], p2: [number, number]) {
const [x1, y1] = p1, [x2, y2] = p2;
if (x1 === x2) return { vertical: true, x: x1 };
const m = (y2 - y1) / (x2 - x1);
const b = y1 - m * x1;
return { m, b, vertical: false };
}
console.log(lineThrough([2, 3], [6, 11])); // { m: 2, b: -1, vertical: false }
MLA: In high dimensions, linear equations generalize to hyperplanes, which separate data classes in linear models.
A system is a group of equations with shared variables.
Example: 2x + y = 7, x − y = 1
We can solve by substitution, elimination, or matrix methods.
Substitute x = 1 + y (from the second equation) into 2x + y = 7: 2(1 + y) + y = 7, so y = 5/3 and x = 8/3.
For: ax + by = e, cx + dy = f
Determinant: det = ad − bc
Solutions: x = (ed − bf) / det, y = (af − ec) / det
TypeScript Implementation
export function solve2x2(a: number, b: number, c: number, d: number, e: number, f: number) {
const det = a * d - b * c;
if (Math.abs(det) < 1e-12) throw new Error("No unique solution");
const x = (e * d - b * f) / det;
const y = (a * f - e * c) / det;
return { x, y };
}
console.log(solve2x2(2, 1, 1, -1, 7, 1)); // { x: 8/3, y: 5/3 }
MLA: Solving systems is how we train linear models. Ridge regression simply adds a λI term to stabilize this solve.
A vector is an ordered list of numbers. In 2D: v = (v₁, v₂).
Addition: u + v = (u₁ + v₁, u₂ + v₂)
Scaling: c · v = (c·v₁, c·v₂)
Between points a and b, the Euclidean distance is d(a, b) = ‖a − b‖ = √(Σᵢ (aᵢ − bᵢ)²)
TypeScript Example
export const l2 = (v: number[]) => Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
export const dist = (a: number[], b: number[]) => l2(a.map((ai, i) => ai - b[i]));
console.log(dist([3, -1], [-1, 2])); // 5
MLA: RBF kernels depend on vector distances: K(x, z) = exp(−γ‖x − z‖²).
Let's connect all your algebra knowledge to the math inside AsterMind's ELM and KELM engines.
The ELM's core equation looks very familiar: Hβ = T
Here: H is the hidden-layer output matrix, β the output weights, and T the target matrix.
To find β, ELM doesn't "train" with epochs — it solves algebraically: β = H†T, or with ridge regularization β = (HᵀH + λI)⁻¹HᵀT.
That's just algebra on steroids — using matrix inverses (Chapter 2) and ridge regression (Chapter 4).
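As a minimal sketch of that closed-form solve, assume a single hidden unit, so the matrix H collapses to a vector h and the ridge solution becomes one line (the function name `ridgeScalar` is ours, not AsterMind's API):

```typescript
// Ridge solution in the scalar case: beta = (h·h + lambda)^(-1) * (h·t)
// h: hidden-layer outputs for each sample, t: targets, lambda: ridge strength.
export const ridgeScalar = (h: number[], t: number[], lambda: number): number => {
  const hh = h.reduce((s, hi) => s + hi * hi, 0);
  const ht = h.reduce((s, hi, i) => s + hi * t[i], 0);
  return ht / (hh + lambda);
};

// Targets are exactly 2x the hidden outputs, so with lambda = 0 beta = 2.
console.log(ridgeScalar([1, 2, 3], [2, 4, 6], 0)); // 2
```

With lambda > 0 the result shrinks slightly toward zero, which is exactly the stabilizing effect ridge regression provides in the full matrix case.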
In Kernel ELM, we swap H with a kernel matrix K that measures similarities between data points: Kᵢⱼ = k(xᵢ, xⱼ)
So when you learned to solve for x in Ax = b, you were already learning how KELM computes its weights!
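A hedged sketch of building that kernel (Gram) matrix from training data; for readability this uses a plain dot-product kernel, though KELM would typically plug in the RBF kernel instead (the helper names are ours):

```typescript
// Build the Gram matrix K where K[i][j] = k(x_i, x_j) for all training pairs.
export const gramMatrix = (
  X: number[][],
  k: (a: number[], b: number[]) => number
): number[][] => X.map((xi) => X.map((xj) => k(xi, xj)));

// A linear kernel is just the dot product.
const linearK = (a: number[], b: number[]): number =>
  a.reduce((s, ai, i) => s + ai * b[i], 0);

console.log(gramMatrix([[1, 0], [0, 1]], linearK)); // [[1, 0], [0, 1]]
```

Note that K is symmetric (Kᵢⱼ = Kⱼᵢ), which is what makes the ridge-style solve well behaved.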
Key idea: Algebra isn't old math — it's the foundation of fast, closed-form learning.
Online learning adds new data and updates recursively: β_new = β_old + (correction term)
Still algebra: we're updating a parameter by adding a correction term.
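A toy illustration of that idea, assuming the simplest possible scalar model and a least-mean-squares correction step; the learning rate `eta` and the name `lmsStep` are our choices for illustration, not the actual ELM online update:

```typescript
// One online update: move beta toward reducing the error (t - beta*x).
export const lmsStep = (beta: number, x: number, t: number, eta: number): number =>
  beta + eta * (t - beta * x) * x;

// Repeatedly feeding the same sample (x=1, t=2) drives beta toward 2.
let beta = 0;
for (let i = 0; i < 100; i++) beta = lmsStep(beta, 1, 2, 0.5);
console.log(beta); // ~2
```

Each step is "parameter plus correction term", exactly the algebraic pattern described above.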
Every adaptive system in ARS — from schema updates to Starfish activations — is built on these algebraic building blocks.
By the end of this chapter, you will be able to:
A vector is just an ordered list of numbers — think of it as a spreadsheet column that describes a single thing (like a form entry or feature row).
Addition: a + b = (a₁ + b₁, …, aₙ + bₙ)
Scalar multiplication: c · a = (c·a₁, …, c·aₙ)
Dot product: a · b = Σᵢ aᵢbᵢ
Length (Euclidean norm): ‖a‖ = √(a · a)
export const addVec = (a: number[], b: number[]): number[] =>
a.map((ai, i) => ai + b[i]);
export const scaleVec = (a: number[], c: number): number[] =>
a.map((ai) => ai * c);
export const dot = (a: number[], b: number[]): number =>
a.reduce((sum, ai, i) => sum + ai * b[i], 0);
export const norm = (a: number[]): number =>
Math.sqrt(dot(a, a));
// Example
console.log(addVec([1, 2, 3], [4, 5, 6])); // [5,7,9]
console.log(dot([1, 2], [3, 4])); // 11
console.log(norm([3, 4])); // 5
MLA: Vectors are the atoms of ML — feature vectors, weight vectors, gradient vectors, all the same idea.
A matrix is a 2D array of numbers arranged in rows and columns, e.g. A = [[a₁₁, a₁₂], [a₂₁, a₂₂]].
Dimensions: m × n (m rows, n columns).
Addition: element-wise, same dimensions required: (A + B)ᵢⱼ = Aᵢⱼ + Bᵢⱼ
Scalar multiplication: (cA)ᵢⱼ = c · Aᵢⱼ
Transpose: flip rows and columns: (Aᵀ)ᵢⱼ = Aⱼᵢ
Matrix Operations
type Matrix = number[][];
export const matrixAdd = (A: Matrix, B: Matrix): Matrix =>
A.map((row, i) => row.map((val, j) => val + B[i][j]));
export const matrixScale = (c: number, A: Matrix): Matrix =>
A.map(row => row.map(val => c * val));
export const transpose = (A: Matrix): Matrix => {
const m = A.length, n = A[0].length;
return Array.from({ length: n }, (_, j) =>
Array.from({ length: m }, (_, i) => A[i][j])
);
};
// Example
const A = [[1, 2], [3, 4]];
console.log(transpose(A)); // [[1, 3], [2, 4]]
MLA: Data matrices store samples as rows, features as columns. Weight matrices transform input spaces to output spaces.
Matrix multiplication is not element-wise. For C = AB, we compute: Cᵢⱼ = Σₖ Aᵢₖ Bₖⱼ
Requirement: Number of columns in A must equal number of rows in B.
Matrix Multiplication
export const matrixMultiply = (A: Matrix, B: Matrix): Matrix => {
const m = A.length, n = B[0].length, p = B.length;
return Array.from({ length: m }, (_, i) =>
Array.from({ length: n }, (_, j) =>
A[i].reduce((sum, aik, k) => sum + aik * B[k][j], 0)
)
);
};
const A = [[1, 2], [3, 4]];
const B = [[5, 6], [7, 8]];
console.log(matrixMultiply(A, B)); // [[19, 22], [43, 50]]
MLA: Matrix multiplication represents layer transformations in neural networks. Forward pass: y = Wx + b.
The determinant is a scalar value computed from a square matrix; it indicates whether the matrix is invertible. For a 2×2 matrix [[a, b], [c, d]]: det = ad − bc.
Computing 2×2 Determinant
export const det2x2 = (a: number, b: number, c: number, d: number): number =>
a * d - b * c;
console.log(det2x2(3, 8, 4, 6)); // 3*6 - 8*4 = -14
MLA: Determinants measure volume scaling. In ML, they appear in covariance matrices and Jacobians for change-of-variables in probability.
The inverse of matrix A, denoted A⁻¹, satisfies: A·A⁻¹ = A⁻¹·A = I
Only square matrices with det(A) ≠ 0 have inverses.
2×2 Matrix Inverse
export const inverse2x2 = (
a: number, b: number,
c: number, d: number
): number[][] | null => {
const det = a * d - b * c;
if (det === 0) return null;
return [
[d / det, -b / det],
[-c / det, a / det]
];
};
const inv = inverse2x2(4, 7, 2, 6);
console.log(inv); // [[0.6, -0.7], [-0.2, 0.4]]
MLA: Matrix inversion solves normal equations in linear regression: w = (XᵀX)⁻¹Xᵀy.
An eigenvector of matrix A is a vector v that only gets scaled (not rotated) when multiplied by A: Av = λv
where λ is the eigenvalue (scaling factor).
Solve the characteristic equation: det(A − λI) = 0
For example, take A = [[2, 1], [1, 2]]: det([[2 − λ, 1], [1, 2 − λ]]) = (2 − λ)² − 1 = 0
Solving: λ = 1 or λ = 3, with eigenvectors (1, −1) and (1, 1) respectively.
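For 2×2 matrices the characteristic equation is a quadratic, λ² − trace(A)·λ + det(A) = 0, so the eigenvalues can be computed directly. A small sketch assuming real eigenvalues (the function name `eigen2x2` is ours):

```typescript
// Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial
// lambda^2 - (a + d)*lambda + (ad - bc) = 0, via the quadratic formula.
export const eigen2x2 = (a: number, b: number, c: number, d: number): [number, number] => {
  const tr = a + d;            // trace
  const det = a * d - b * c;   // determinant
  const disc = Math.sqrt(tr * tr - 4 * det); // assumes real eigenvalues
  return [(tr + disc) / 2, (tr - disc) / 2];
};

console.log(eigen2x2(2, 1, 1, 2)); // [3, 1]
```

A symmetric matrix (like a covariance matrix) always has real eigenvalues, so this assumption holds in the PCA setting mentioned below.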
MLA: Principal Component Analysis (PCA) finds eigenvectors of the covariance matrix to identify directions of maximum variance.
Vectors u and v are orthogonal if: u · v = 0
A set of vectors is orthonormal if they are mutually orthogonal and each has unit length.
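A quick way to check these properties numerically; the tolerance `eps` is our choice, since floating-point dot products are rarely exactly zero:

```typescript
// Orthogonal: dot product is (numerically) zero.
export const isOrthogonal = (u: number[], v: number[], eps = 1e-9): boolean =>
  Math.abs(u.reduce((s, ui, i) => s + ui * v[i], 0)) < eps;

// Unit length: Euclidean norm is (numerically) one.
export const isUnit = (v: number[], eps = 1e-9): boolean =>
  Math.abs(Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0)) - 1) < eps;

console.log(isOrthogonal([1, 0], [0, 1])); // true
console.log(isUnit([0.6, 0.8]));           // true
```

An orthonormal set passes both checks for every vector and every pair.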
For non-square or singular matrices, the pseudoinverse A† generalizes the inverse:
A† = (AᵀA)⁻¹Aᵀ (for full column rank matrices)
MLA: The pseudoinverse is central to ELM: β = H†T. It provides least-squares solutions even when exact solutions don't exist.
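A hedged sketch of that pseudoinverse for the special case of a matrix with exactly two columns, so AᵀA is 2×2 and can be inverted with the formula from the previous section (all helper names are ours):

```typescript
type Mat = number[][];

// Minimal matrix multiply and transpose helpers.
const matMul = (A: Mat, B: Mat): Mat =>
  A.map((row) => B[0].map((_, j) => row.reduce((s, aik, k) => s + aik * B[k][j], 0)));
const T = (A: Mat): Mat => A[0].map((_, j) => A.map((row) => row[j]));

// Pseudoinverse A+ = (A^T A)^(-1) A^T for a matrix with 2 columns.
// Assumes full column rank (det of A^T A nonzero).
export const pinvTall2 = (A: Mat): Mat => {
  const [[a, b], [c, d]] = matMul(T(A), A); // A^T A is 2x2
  const det = a * d - b * c;
  if (Math.abs(det) < 1e-12) throw new Error("Not full column rank");
  const inv = [[d / det, -b / det], [-c / det, a / det]];
  return matMul(inv, T(A));
};

// For an invertible square matrix, the pseudoinverse equals the inverse:
console.log(pinvTall2([[1, 0], [0, 2]])); // [[1, 0], [0, 0.5]]
```

Feeding a tall matrix (more rows than columns) returns the least-squares solution map, which is exactly what β = H†T computes in ELM.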
By the end of these sections, you will be able to:
Machine Learning Angle (MLA): Every model — from simple ELMs to deep transformers — carves shapes in space. Algebra describes them; geometry reveals how they separate and generalize.
We've met vectors before as "ordered lists." Now we'll see them as arrows in space.
ASCII sketch (2D example):
(0,0) → (3,2)
y ↑
|
3 | *
2 | *
1 | *
0 +-----------→ x
 0 1 2 3
Here, the arrow from (0,0) to (3,2) is the vector v = (3,2).
Example: for v = (3, 2), the magnitude is ‖v‖ = √(3² + 2²) = √13 ≈ 3.606.
TypeScript Example
export const magnitude = (v: number[]): number =>
Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
console.log(magnitude([3, 2])); // 3.606
To make a vector length 1, divide by its magnitude: v̂ = v / ‖v‖
TypeScript Example
export const normalize = (v: number[]): number[] => {
const len = magnitude(v);
return v.map((vi) => vi / len);
};
console.log(normalize([3, 2])); // [0.832, 0.554]
MLA: Feature normalization (unit vectors) keeps all features on equal footing, stabilizing kernel and distance computations.
A hyperplane is the higher-dimensional cousin of a line.
| Dimension | Object | Equation |
|---|---|---|
| 1D | Point | x = c |
| 2D | Line | w₁x₁ + w₂x₂ + b = 0 |
| 3D | Plane | w₁x₁ + w₂x₂ + w₃x₃ + b = 0 |
| nD | Hyperplane | w · x + b = 0 |
The normal vector w is perpendicular to the hyperplane — it defines its orientation.
ASCII illustration (2D line separating classes):
Class +1 (Above line)
o o
o
-------------------- ← w·x + b = 0
x x x
 Class -1 (Below line)
For a point x and hyperplane w · x + b = 0, the signed distance is: d(x) = (w · x + b) / ‖w‖
TypeScript Example
export const signedDistance = (w: number[], b: number, x: number[]): number => {
const dotProd = w.reduce((s, wi, i) => s + wi * x[i], 0);
const wMag = Math.sqrt(w.reduce((s, wi) => s + wi * wi, 0));
return (dotProd + b) / wMag;
};
const w = [2, 1], b = -6;
console.log(signedDistance(w, b, [3, 2])); // 0.894 → positive class
MLA: This distance is the margin used in SVMs and ELMs to measure confidence. Larger margin = more confident classification.
Decision rule: ŷ = sign(w · x + b)
TypeScript Example
export const classify = (w: number[], b: number, x: number[]): number => {
const score = w.reduce((s, wi, i) => s + wi * x[i], 0) + b;
return Math.sign(score); // +1 or -1
};
console.log(classify([2, 1], -6, [3, 2])); // +1
MLA: ELM's output layer is just this: linear hyperplanes stacking the decisions from hidden neurons.
The dot product measures how much two vectors "align": u · v = Σᵢ uᵢvᵢ = ‖u‖ ‖v‖ cos θ
TypeScript Example
export const dotProduct = (u: number[], v: number[]): number =>
u.reduce((s, ui, i) => s + ui * v[i], 0);
console.log(dotProduct([1, 2, 3], [4, 5, 6])); // 32
MLA: The dot product is at the core of neural networks — every weighted sum is a dot product.
Projection of vector a onto b: proj_b(a) = ((a · b) / ‖b‖²) b
TypeScript Example
export const project = (a: number[], b: number[]): number[] => {
const dotProd = a.reduce((s, ai, i) => s + ai * b[i], 0);
const bNormSq = b.reduce((s, bi) => s + bi * bi, 0);
return b.map((bi) => (dotProd / bNormSq) * bi);
};
console.log(project([2, 3], [1, 0])); // projection onto x-axis => [2,0]
ASCII illustration:
y ↑
| • a(2,3)
| /
| / → projection (shadow of a on b)
  |-----•-----------→ b-axis (1,0)
The angle between vectors: θ = arccos((u · v) / (‖u‖ ‖v‖))
TypeScript Example
export const angleBetween = (u: number[], v: number[]): number => {
const dotProd = u.reduce((s, ui, i) => s + ui * v[i], 0);
const magU = Math.sqrt(u.reduce((s, ui) => s + ui * ui, 0));
const magV = Math.sqrt(v.reduce((s, vi) => s + vi * vi, 0));
return Math.acos(dotProd / (magU * magV)); // radians
};
console.log(angleBetween([1, 0], [0, 1])); // ≈ 1.5708 radians (90°)
MLA: The angle between feature vectors corresponds to correlation. Orthogonal features (90° apart) are uncorrelated — a holy grail in model design.
End of Sections 1–3
Next: Feature Space, Kernels, and Visualizing ELM Decision Boundaries.
By the end of this part, you'll be able to:
Each input vector is a point in feature space. When we add features (like polynomial or trigonometric ones), we move to higher dimensions.
Example:
Original input: x = (x₁, x₂)
Augmented feature space: φ(x) = (x₁, x₂, x₁², x₂², x₁x₂)
ASCII metaphor:
Feature space (2D)
x2 ↑
o o
o
---+----------→ x1
o
      o
Now imagine lifting it to 3D — each point's height (x₃) could represent x₁² + x₂². That's feature mapping.
MLA: In ELM/KELM, hidden-layer activations (tanh, sigmoid, etc.) implicitly map data into a new curved feature space where classes are linearly separable.
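As a concrete sketch, here is one possible explicit polynomial feature map; the exact augmentation used in the text is an assumption on our part, and the name `featureMap` is ours:

```typescript
// Map a 2D input (x1, x2) into a 5D polynomial feature space:
// phi(x1, x2) = (x1, x2, x1^2, x2^2, x1*x2)
export const featureMap = ([x1, x2]: [number, number]): number[] =>
  [x1, x2, x1 * x1, x2 * x2, x1 * x2];

console.log(featureMap([2, 3])); // [2, 3, 4, 9, 6]
```

A linear separator in this 5D space corresponds to a curved (quadratic) boundary back in the original 2D space, which is the whole point of the lift.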
A kernel function computes similarity without explicitly transforming the data.
Common kernels:
| Type | Formula | Geometry |
|---|---|---|
| Linear | K(x, z) = x · z | Flat space |
| Polynomial | K(x, z) = (x · z + c)ᵈ | Curved surfaces |
| RBF (Gaussian) | K(x, z) = exp(−γ‖x − z‖²) | Infinite-dimensional smoothness |
TypeScript Example:
export const rbfKernel = (x: number[], z: number[], gamma: number): number => {
const distSq = x.reduce((s, xi, i) => s + Math.pow(xi - z[i], 2), 0);
return Math.exp(-gamma * distSq);
};
console.log(rbfKernel([1, 2], [1.5, 2.5], 0.5)); // exp(-0.5 * 0.5) ≈ 0.7788
ASCII sketch:
Flat space (linear): Curved space (kernel):
x x x x
o x
x x x x
MLA: Kernels let ELMs separate complex shapes (like circles or spirals) by treating curved surfaces as "flat" in a hidden dimension.
ELMs learn decision surfaces that look like folded sheets in high-dimensional space.
When f(x) = 0, you're on the boundary.
ASCII Visualization:
+1 side (positive class)
o o o
---------- ← f(x)=0
x x x
   -1 side (negative class)
With kernel mapping, the surface bends:
Curved separation (2D)
+++++
+ + o o o
+ + o
+++++------------
      o o o
The boundary now wraps around data points, forming islands of classification.
TypeScript Example (Mini KELM Predictor):
export function kelmPredict(Xtrain: number[][], beta: number[], b: number, x: number[], gamma: number): number {
const f = Xtrain.reduce((sum, xi, i) => sum + beta[i] * rbfKernel(x, xi, gamma), 0) + b;
return Math.sign(f);
}
MLA: ELMs learn not just "weights" but geometry — reshaping data space to make categories separable.
| Term | Definition |
|---|---|
| Feature Space | The abstract space where each dimension represents a feature. |
| Hyperplane | A flat boundary separating two regions in feature space. |
| Margin | Distance from a point to the decision boundary. |
| Kernel | Function that computes inner products in transformed feature space. |
| RBF Kernel | Exponential similarity function producing smooth curved boundaries. |
| Projection | Shadow of one vector on another. |
| Angle Between Vectors | Measure of similarity (cosine of angle = correlation). |
| Decision Boundary | Geometric surface dividing predicted classes. |
| Nonlinear Mapping | Transforming inputs so that they become linearly separable. |
| Feature Normalization | Scaling feature vectors to unit length. |
You now see what the machine sees — data as geometry.
Congratulations! You've completed the core mathematical foundations needed for machine learning. These concepts form the backbone of algorithms from linear regression to deep neural networks.
Continue exploring by implementing these concepts in your own projects, and see how they power the learning machines you build with AsterMind's ELM technology.