Octonity

Understanding GPT from a Game Developer's Perspective

Build a mental model of GPT from the ground up using vectors, dot products, attention, and softmax — concepts every game developer already knows.

Mohammed Jawad Alsaedi
Game Developer
19 June 2026
5 min read

When developers first hear about GPT, Transformers, Attention, Embeddings, Neural Networks, and Large Language Models, it can sound incredibly complicated.

However, if you already work with vectors in game development, GPT becomes much easier to grasp.

In this article we will build a mental model of GPT from the ground up using concepts that every game developer already knows.

By the end, you'll understand:

  • Vectors
  • Dot products
  • Attention
  • Softmax
  • Contextual embeddings
  • Next-token prediction

without needing a PhD in machine learning.


The Big Picture

At its core, GPT does one thing:

Predict the next token

Example:

Ali likes ___

GPT predicts:

cats

But how?

Very roughly:

Text
 ↓
Tokens
 ↓
Vectors
 ↓
Attention
 ↓
Softmax
 ↓
Next Token Prediction

Let's start with vectors.


Vectors in Game Development

If you've ever worked with Unity, Godot, MonoGame, XNA, Unreal, or any custom engine, you've used vectors.

A simple 2D vector:

Vector2 position = new Vector2(3, 2);

Represents:

x = 3
y = 2

Visually:

        y
        ↑
  2     ● (3,2)
  1
  0 ─────────→ x
        1 2 3

Vectors are commonly used for:

  • Position
  • Velocity
  • Direction
  • Acceleration

Example:

Vector2 direction = new Vector2(1, 0);

Meaning:

Move Right

Length (Magnitude)

Suppose:

v = (3,4)

Length is:

√(x² + y²)

Result:

√(3² + 4²)
=
√25
=
5

C#:

Vector2 v = new Vector2(3, 4);

float length =
    MathF.Sqrt(
        v.X * v.X +
        v.Y * v.Y);

Game developers use this constantly for:

  • Distance checks
  • Collision detection
  • Movement calculations
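
A distance check like the ones listed above can be sketched in a few lines of JavaScript. The helper names `magnitude` and `isWithinRange` and the sample positions are made up for illustration; the only math involved is the length formula from this section.

```javascript
// Length (magnitude) of a 2D vector stored as {x, y}.
function magnitude(v) {
  return Math.sqrt(v.x * v.x + v.y * v.y);
}

// Hypothetical helper: true when two points are at most `range` apart.
function isWithinRange(a, b, range) {
  const delta = { x: b.x - a.x, y: b.y - a.y };
  return magnitude(delta) <= range;
}

// Made-up positions for the example.
const enemy = { x: 0, y: 0 };
const player = { x: 3, y: 4 };

console.log(magnitude(player));               // 5
console.log(isWithinRange(enemy, player, 5)); // true
console.log(isWithinRange(enemy, player, 4)); // false
```

Many engines compare squared lengths instead to skip the square root, but the idea is the same.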

Dot Product

The dot product tells us how aligned two vectors are. For unit-length vectors it equals the cosine of the angle between them, so it ranges from -1 to 1.

Formula:

A · B = (A.x × B.x) + (A.y × B.y)

Example:

A = (1,0)
B = (1,0)

Result:

1×1 + 0×0 = 1

Meaning:

Same direction

Example:

A = (1,0)
B = (0,1)

Result:

0

Meaning:

Perpendicular

Example:

A = (1,0)
B = (-1,0)

Result:

-1

Meaning:

Opposite directions
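
The three cases above are easy to check in code. This is a minimal sketch with vectors stored as plain `{x, y}` objects; the names `right`, `up`, and `left` are only labels for the example. The results are exactly 1, 0, and -1 because every vector here has length 1. For longer vectors the dot product also scales with their lengths.

```javascript
// Dot product of two 2D vectors stored as {x, y}.
function dot(a, b) {
  return a.x * b.x + a.y * b.y;
}

const right = { x: 1, y: 0 };
const up = { x: 0, y: 1 };
const left = { x: -1, y: 0 };

console.log(dot(right, right)); // 1  (same direction)
console.log(dot(right, up));    // 0  (perpendicular)
console.log(dot(right, left));  // -1 (opposite directions)
```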

Dot Product in Games

A common enemy vision implementation:

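// Example positions; in a real game these come from your entities.
Vector2 enemyPosition = new(0, 0);
Vector2 playerPosition = new(5, 0);

// Unit vector for the direction the enemy is facing.
Vector2 enemyForward = new(1, 0);

// Unit vector pointing from the enemy to the player.
Vector2 toPlayer =
    Vector2.Normalize(
        playerPosition -
        enemyPosition);

// For two unit vectors, the dot product is the cosine
// of the angle between them.
float dot =
    Vector2.Dot(
        enemyForward,
        toPlayer);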

Results:

1.0  → Player directly ahead
0.0  → Player to the side
-1.0 → Player behind
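
To turn those dot values into a yes/no vision check, one common approach is to compare the dot product against the cosine of half the field of view. The sketch below assumes a hypothetical `canSee` helper and made-up positions. It also assumes the enemy and the player are never at exactly the same position, since normalizing a zero-length vector would divide by zero.

```javascript
function dot(a, b) {
  return a.x * b.x + a.y * b.y;
}

// Scale a vector to length 1 (assumes the vector is not zero-length).
function normalize(v) {
  const len = Math.sqrt(dot(v, v));
  return { x: v.x / len, y: v.y / len };
}

// Hypothetical helper: is the player inside the enemy's vision cone?
// `forward` must be a unit vector; `fovDegrees` is the full cone angle.
function canSee(enemyPos, forward, playerPos, fovDegrees) {
  const toPlayer = normalize({
    x: playerPos.x - enemyPos.x,
    y: playerPos.y - enemyPos.y,
  });
  const threshold = Math.cos(((fovDegrees / 2) * Math.PI) / 180);
  return dot(forward, toPlayer) >= threshold;
}

const enemyPos = { x: 0, y: 0 };
const forward = { x: 1, y: 0 };

console.log(canSee(enemyPos, forward, { x: 5, y: 1 }, 90));  // true:  almost straight ahead
console.log(canSee(enemyPos, forward, { x: 0, y: 5 }, 90));  // false: directly to the side
console.log(canSee(enemyPos, forward, { x: -5, y: 0 }, 90)); // false: behind
```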

This exact mathematical operation is one of the most important operations inside GPT.


From Position Vectors to Meaning Vectors

In games:

(1,0)

represents:

Direction

In GPT:

(0.6,0.1)

represents:

Meaning

Imagine:

likes = (0.6,0.1)
cats  = (0.7,0.2)
car   = (-0.5,0.9)

Dot product:

likes · cats
=
0.6×0.7 + 0.1×0.2
=
0.44

Dot product:

likes · car
=
0.6×(-0.5) + 0.1×0.9
=
-0.21

Result:

likes ↔ cats = 0.44
likes ↔ car  = -0.21

The higher score means:

"cats" is more related to "likes" than "car" is
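
Here is the same comparison as a small script. The vectors are the toy values from this section, stored as arrays so that the `dot` function also works when there are more than two dimensions. The `embeddings` object is just a lookup table invented for the example.

```javascript
// Dot product for two arrays of equal length.
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Toy "meaning vectors" from the article, not real model values.
const embeddings = {
  likes: [0.6, 0.1],
  cats: [0.7, 0.2],
  car: [-0.5, 0.9],
};

for (const word of ["cats", "car"]) {
  const score = dot(embeddings.likes, embeddings[word]);
  console.log(`likes · ${word} = ${score.toFixed(2)}`);
}
// likes · cats = 0.44
// likes · car = -0.21
```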

Why Not Use 2 Dimensions?

Words contain a huge amount of information.

Example:

cat

Contains ideas such as:

animal
pet
cute
living
mammal
fur
domestic
small
playful

Two numbers aren't enough.

Real GPT models use vectors like:

[0.21, -0.55, 1.23, ...]

containing:

768 dimensions
1536 dimensions
3072 dimensions

or more.

Think of an RPG character:

float[] playerStats =
{
    health,
    mana,
    strength,
    agility,
    intelligence,
    defense,
    speed,
    luck
};

GPT uses the same idea.

Instead of stats, dimensions represent learned language features.
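
Nothing about the dot product changes when vectors get longer: you still multiply matching entries and add them up. The sketch below uses made-up 8-number "stat" vectors in the spirit of the RPG example, plus a 768-entry vector to show that the size is only a loop bound. All the names and numbers are invented for illustration.

```javascript
// Dot product for vectors of any length, stored as plain arrays.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Made-up stat vectors: health, mana, strength, agility,
// intelligence, defense, speed, luck.
const warrior = [9, 1, 8, 4, 2, 7, 3, 5];
const knight = [8, 2, 9, 3, 3, 8, 2, 4];
const mage = [3, 9, 2, 4, 9, 2, 4, 5];

console.log(dot(warrior, knight)); // 246 (similar builds)
console.log(dot(warrior, mage));   // 137 (less similar)

// The same function handles a 768-dimensional vector.
const big = new Array(768).fill(0.5);
console.log(dot(big, big)); // 192
```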


Attention: The Heart of GPT

Consider the sentence:

Ali likes cats

Suppose GPT is processing:

likes

GPT asks:

Which words should I pay attention to?

Let's use simple vectors:

Ali   = (1,0)
likes = (2,1)
cats  = (2,2)

Compare Against Every Word

likes vs Ali

(2,1)·(1,0)
=
2

likes vs likes

(2,1)·(2,1)
=
5

likes vs cats

(2,1)·(2,2)
=
6

Scores:

Ali   = 2
likes = 5
cats  = 6

These scores indicate relevance.
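
The comparison step above is just a loop of dot products. A minimal sketch, using the toy vectors from this section and a hypothetical `attentionScores` helper:

```javascript
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

const tokens = ["Ali", "likes", "cats"];
const vectors = [
  [1, 0], // Ali
  [2, 1], // likes
  [2, 2], // cats
];

// Score the token at `index` against every token in the sentence,
// including itself.
function attentionScores(allVectors, index) {
  return allVectors.map(other => dot(allVectors[index], other));
}

const scores = attentionScores(vectors, 1); // 1 is the position of "likes"
tokens.forEach((token, i) => console.log(token, scores[i]));
// Ali 2
// likes 5
// cats 6
```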


Softmax

The scores:

[2,5,6]

are not probabilities.

GPT converts them into probabilities using Softmax.

Formula:

Softmax(xᵢ)
=
exp(xᵢ) / Σⱼ exp(xⱼ)

where exp(x) means eˣ and the sum runs over all scores.

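Worked out:

e² ≈ 7.39
e⁵ ≈ 148.41
e⁶ ≈ 403.43
Sum ≈ 559.23

Result:

Ali   = 7.39 / 559.23   ≈ 1.3%
likes = 148.41 / 559.23 ≈ 26.5%
cats  = 403.43 / 559.23 ≈ 72.1%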

Meaning:

When processing "likes",
GPT should mostly focus on "cats".

Try It Yourself

[Interactive softmax demo with one slider per score. Increasing a score increases its probability.]

Default values:

Ali    Score: 2.0 → 1.32%
likes  Score: 5.0 → 26.54%
cats   Score: 6.0 → 72.14%

Softmax in JavaScript

The interactive demo is written in JavaScript. The function below turns the scores [2, 5, 6] into probabilities:

function softmax(values) {
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

const probs = softmax([2, 5, 6]);
const labels = ["Ali", "likes", "cats"];

probs.forEach((p, i) => {
  console.log(labels[i].padEnd(6), (p * 100).toFixed(2) + "%");
});

Same Logic in C#

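using System;
using System.Linq;

static double[] Softmax(double[] scores)
{
    // Subtracting the largest score first avoids overflow in Math.Exp
    // and does not change the final probabilities.
    double max = scores.Max();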

    double[] expValues =
        scores
            .Select(
                score =>
                    Math.Exp(score - max))
            .ToArray();

    double sum = expValues.Sum();

    return expValues
        .Select(
            value =>
                value / sum)
        .ToArray();
}

Example:

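double[] scores = { 2, 5, 6 };

double[] probabilities =
    Softmax(scores);

// Print each probability as a percentage with two decimals.
foreach (double p in probabilities)
    Console.WriteLine($"{p * 100:F2}%");

Output:

1.32%
26.54%
72.14%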

Congratulations.

You just implemented one of the most important mathematical operations used inside GPT.


Weighted Average

Now GPT combines information.

Weights:

Ali   = 0.013
likes = 0.265
cats  = 0.721

New vector:

0.013×Ali
+
0.265×likes
+
0.722×cats

This creates a new representation.

The word:

likes

now contains information from:

Ali
likes
cats

This is called a:

Contextual Embedding
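
To see the weighted average as actual numbers, here is a minimal sketch. It uses the toy vectors from the attention section and the softmax weights for "likes" rounded to four decimals. The `weightedAverage` helper is a made-up name, not part of any library.

```javascript
// Blend several vectors into one, scaling each by its weight.
function weightedAverage(weights, vectors) {
  const result = new Array(vectors[0].length).fill(0);
  vectors.forEach((vector, i) => {
    vector.forEach((value, d) => {
      result[d] += weights[i] * value;
    });
  });
  return result;
}

const vectors = [
  [1, 0], // Ali
  [2, 1], // likes
  [2, 2], // cats
];

// Softmax of the scores [2, 5, 6], rounded to four decimals.
const weights = [0.0132, 0.2654, 0.7214];

const contextual = weightedAverage(weights, vectors);
console.log(contextual.map(v => v.toFixed(2)).join(", "));
// 1.99, 1.71
```

The original vector for "likes" was (2, 1). The new one sits close to "cats" at (2, 2), because "cats" received most of the weight.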

What GPT Actually Does

A simplified GPT layer:

Words
 ↓
Embeddings
 ↓
Dot Products
 ↓
Softmax
 ↓
Attention Weights
 ↓
Weighted Average
 ↓
Contextual Embeddings

This process repeats many times.

Each layer refines the representations produced by the previous one.
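
The whole simplified pipeline fits in one short script. This is a sketch of the toy version described in this article, not of a real GPT layer: it has no learned weights, and every token looks at every other token. The function names are invented for the example.

```javascript
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

function softmax(values) {
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

function weightedAverage(weights, vectors) {
  return vectors[0].map((_, d) =>
    vectors.reduce((sum, vector, i) => sum + weights[i] * vector[d], 0)
  );
}

// One simplified attention pass over the whole sentence:
// dot products -> softmax -> weighted average, for every token.
function simplifiedAttention(vectors) {
  return vectors.map(current => {
    const scores = vectors.map(other => dot(current, other));
    const weights = softmax(scores);
    return weightedAverage(weights, vectors);
  });
}

const embeddings = [
  [1, 0], // Ali
  [2, 1], // likes
  [2, 2], // cats
];

const contextual = simplifiedAttention(embeddings);
console.log(contextual[1].map(v => v.toFixed(2)).join(", "));
// 1.99, 1.71  (the contextual embedding for "likes")
```

Feeding `contextual` back into `simplifiedAttention` mimics stacking another layer on top.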


The Final Prediction

Eventually GPT generates scores for possible next tokens.

Example:

dog = 2.1
cat = 5.8
car = 0.4

Softmax converts them:

dog ≈ 2.4%
cat ≈ 97.2%
car ≈ 0.4%

GPT chooses:

cat

Result:

Ali likes cat

(or more realistically, "cats")
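
The last step can be sketched the same way. The scores are the example values from this section. Picking the single most likely token is an assumption made for simplicity here: it is the "greedy" option, and real systems often sample from the probabilities instead.

```javascript
function softmax(values) {
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

const candidates = ["dog", "cat", "car"];
const scores = [2.1, 5.8, 0.4];

const probs = softmax(scores);

// Greedy choice: take the candidate with the highest probability.
const best = probs.indexOf(Math.max(...probs));

candidates.forEach((token, i) => {
  console.log(`${token}: ${(probs[i] * 100).toFixed(1)}%`);
});
console.log(`Next token: ${candidates[best]}`);
// dog: 2.4%
// cat: 97.2%
// car: 0.4%
// Next token: cat
```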


The Missing Pieces

This article intentionally simplified several concepts.

Real GPT also contains:

  • Query vectors (Q)
  • Key vectors (K)
  • Value vectors (V)
  • Causal masking (each token attends only to itself and earlier tokens)
  • Multi-head attention
  • Transformer blocks
  • Feed-forward neural networks
  • Residual connections
  • Layer normalization
  • Backpropagation
  • Training over trillions of tokens
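
To give a feel for how the first few items fit together, here is a rough sketch of attention with separate query, key, and value vectors. The 2×2 matrices `Wq`, `Wk`, and `Wv` are made-up numbers standing in for weights a real model would learn, and the helper names are hypothetical. Each token only looks at itself and earlier tokens, and the scores are divided by the square root of the vector size before softmax, as in standard scaled dot-product attention.

```javascript
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

function softmax(values) {
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

// Multiply a row vector by a matrix stored as an array of rows.
function project(vector, matrix) {
  return matrix[0].map((_, col) =>
    vector.reduce((sum, value, row) => sum + value * matrix[row][col], 0)
  );
}

// Made-up projection matrices; a real model learns these in training.
const Wq = [[1, 0], [0, 1]];
const Wk = [[0.5, 0], [0, 0.5]];
const Wv = [[1, 1], [0, 1]];

function causalAttention(embeddings) {
  const Q = embeddings.map(e => project(e, Wq));
  const K = embeddings.map(e => project(e, Wk));
  const V = embeddings.map(e => project(e, Wv));
  const scale = Math.sqrt(Q[0].length);

  return Q.map((q, i) => {
    // Token i may only see tokens 0..i (itself and earlier ones).
    const visibleKeys = K.slice(0, i + 1);
    const weights = softmax(visibleKeys.map(k => dot(q, k) / scale));
    // Weighted average of the visible value vectors.
    return V[0].map((_, d) =>
      weights.reduce((sum, w, j) => sum + w * V[j][d], 0)
    );
  });
}

const output = causalAttention([[1, 0], [2, 1], [2, 2]]);
console.log(output);
```

The first token can only attend to itself, so its output is simply its own value vector.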

However, all of these build on the concepts you've already learned:

Vectors
 ↓
Dot Products
 ↓
Weighted Averages
 ↓
Probabilities

Key Takeaway

For a game developer, GPT is not magic.

It is mostly:

Vector Mathematics
+
Probability
+
A Lot of Training

The same mathematics used for:

  • Movement
  • Steering
  • Enemy vision
  • Collision systems

is also used to power modern AI.

The only difference is what the vectors represent.

Game Development:
Vector = Position or Direction

GPT:
Vector = Meaning

Once you understand vectors, dot products, and softmax, the foundation of GPT becomes surprisingly approachable.

Mohammed Jawad Alsaedi
Game Developer at Octonity