Build a mental model of GPT from the ground up using vectors, dot products, attention, and softmax — concepts every game developer already knows.
When developers first hear about GPT, Transformers, Attention, Embeddings, Neural Networks, and Large Language Models, it can sound incredibly complicated.
However, if you already understand vectors from game development, GPT becomes much easier to understand.
In this article we will build a mental model of GPT from the ground up using concepts that every game developer already knows.
By the end, you'll understand how vectors, dot products, attention, and softmax fit together inside GPT, without needing a PhD in machine learning.
At its core, GPT does one thing:
Predict the next token
Example:
Ali likes ___
GPT predicts:
cats
But how?
Very roughly:
Text
↓
Tokens
↓
Vectors
↓
Attention
↓
Softmax
↓
Next Token Prediction
Let's start with vectors.
If you've ever worked with Unity, Godot, MonoGame, XNA, Unreal, or any custom engine, you've used vectors.
A simple 2D vector:
Vector2 position = new Vector2(3, 2);
Represents:
x = 3
y = 2
Visually:
y
↑
2 │        ● (3,2)
1 │
0 └──┬──┬──┬──→ x
     1  2  3
Vectors are commonly used for things like positions and directions.
Example:
Vector2 direction = new Vector2(1, 0);
Meaning:
Move Right
Suppose:
v = (3,4)
Length is:
√(x² + y²)
Result:
√(3² + 4²)
=
√25
=
5
C#:
Vector2 v = new Vector2(3, 4);
float length = MathF.Sqrt(v.X * v.X + v.Y * v.Y);
Game developers use this constantly, for example to measure the distance between two objects or to normalize a direction.
The dot product tells us how aligned two vectors are.
Formula:
A · B = (A.x × B.x) + (A.y × B.y)
Example:
A = (1,0)
B = (1,0)
Result:
1×1 + 0×0 = 1
Meaning:
Same direction
Example:
A = (1,0)
B = (0,1)
Result:
0
Meaning:
Perpendicular
Example:
A = (1,0)
B = (-1,0)
Result:
-1
Meaning:
Opposite directions
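The three cases above can be checked with a tiny helper. This is a minimal JavaScript sketch; the `dot` function name and the `[x, y]` array representation are choices made here for illustration and do not come from any engine API.

```javascript
// Dot product of two 2D vectors stored as [x, y] arrays.
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1];
}

const right = [1, 0];
console.log(dot(right, [1, 0]));  // 1  (same direction)
console.log(dot(right, [0, 1]));  // 0  (perpendicular)
console.log(dot(right, [-1, 0])); // -1 (opposite directions)
```

These clean values of 1, 0, and -1 only appear because both vectors have length 1. For longer vectors the result is scaled by both lengths.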
A common enemy vision implementation:
Vector2 enemyForward = new(1, 0);
Vector2 toPlayer = Vector2.Normalize(playerPosition - enemyPosition);
float dot = Vector2.Dot(enemyForward, toPlayer);
Results:
1.0 → Player directly ahead
0.0 → Player to the side
-1.0 → Player behind
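In practice the dot value is usually compared against a threshold to decide whether the player is inside a vision cone. The following is a rough JavaScript sketch of that idea. The `canSee` helper, its parameters, and the 45 degree half-angle are hypothetical and chosen only for illustration.

```javascript
// Scale a 2D vector [x, y] to length 1.
function normalize(v) {
  const length = Math.hypot(v[0], v[1]);
  return [v[0] / length, v[1] / length];
}

// True when the player lies inside the enemy's vision cone.
// Assumes enemyForward already has length 1 and that the player
// is not standing exactly on the enemy's position.
function canSee(enemyPos, enemyForward, playerPos, halfAngleDegrees) {
  const toPlayer = normalize([
    playerPos[0] - enemyPos[0],
    playerPos[1] - enemyPos[1],
  ]);
  const alignment =
    enemyForward[0] * toPlayer[0] + enemyForward[1] * toPlayer[1];
  // cos(halfAngle) is the smallest alignment still inside the cone.
  return alignment >= Math.cos((halfAngleDegrees * Math.PI) / 180);
}

console.log(canSee([0, 0], [1, 0], [5, 1], 45));  // true: nearly straight ahead
console.log(canSee([0, 0], [1, 0], [-5, 0], 45)); // false: player is behind
```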
This same operation, the dot product, is one of the most important operations inside GPT.
In games:
(1,0)
represents:
Direction
In GPT:
(0.6,0.1)
represents:
Meaning
Imagine:
likes = (0.6,0.1)
cats = (0.7,0.2)
car = (-0.5,0.9)
Dot product:
likes · cats
=
0.6×0.7 + 0.1×0.2
=
0.44
Dot product:
likes · car
=
0.6×(-0.5) + 0.1×0.9
=
-0.21
Result:
likes ↔ cats = 0.44
likes ↔ car = -0.21
GPT concludes:
"cats" is more related to "likes" than "car" is
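The comparison above can be reproduced in a few lines. This is a sketch using the toy 2D vectors from the example; the `embeddings` object is just a convenient container invented here, and real models learn their vectors rather than having them written by hand.

```javascript
// Toy 2D "meaning" vectors from the example above.
const embeddings = {
  likes: [0.6, 0.1],
  cats: [0.7, 0.2],
  car: [-0.5, 0.9],
};

function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1];
}

for (const word of ["cats", "car"]) {
  const score = dot(embeddings.likes, embeddings[word]);
  console.log(`likes · ${word} = ${score.toFixed(2)}`);
}
// likes · cats = 0.44
// likes · car = -0.21
```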
Words contain a huge amount of information.
Example:
cat
Contains ideas such as:
animal
pet
cute
living
mammal
fur
domestic
small
playful
Two numbers aren't enough.
Real GPT models use vectors like:
[0.21, -0.55, 1.23, ...]
containing:
768 dimensions
1536 dimensions
3072 dimensions
or more.
Think of an RPG character:
float[] playerStats =
{
    health,
    mana,
    strength,
    agility,
    intelligence,
    defense,
    speed,
    luck
};
GPT uses the same idea.
Instead of stats, dimensions represent learned language features.
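The dot product works the same way no matter how many dimensions a vector has. Here is a small sketch with made-up 4-dimensional vectors; the `dotN` name and the values are hypothetical and only show that the formula extends beyond 2D.

```javascript
// Dot product for vectors of any (equal) length.
function dotN(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Hypothetical values, chosen only for illustration.
const a = [1, 0, 2, -1];
const b = [3, 5, 1, 2];
console.log(dotN(a, b)); // 3
```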
Consider the sentence:
Ali likes cats
Suppose GPT is processing:
likes
GPT asks:
Which words should I pay attention to?
Let's use simple vectors:
Ali = (1,0)
likes = (2,1)
cats = (2,2)
(2,1)·(1,0)
=
2
(2,1)·(2,1)
=
5
(2,1)·(2,2)
=
6
Scores:
Ali = 2
likes = 5
cats = 6
These scores indicate relevance.
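The scoring step can be sketched as a single `map` over the sentence. This follows the article's simplification of scoring the raw vectors against each other; the variable names are illustrative assumptions.

```javascript
const words = ["Ali", "likes", "cats"];
const vectors = [[1, 0], [2, 1], [2, 2]];

function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1];
}

// Score every word against the word being processed ("likes").
const query = vectors[words.indexOf("likes")];
const scores = vectors.map(v => dot(query, v));

console.log(scores); // [ 2, 5, 6 ]
```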
The scores:
[2,5,6]
are not probabilities.
GPT converts them into probabilities using Softmax.
Formula:
Softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ)
Result:
Ali = 1.3%
likes = 26.5%
cats = 72.1%
Meaning:
When processing "likes",
GPT should mostly focus on "cats".
The same calculation in JavaScript turns the scores [2, 5, 6] into probabilities:
function softmax(values) {
  // Subtract the largest score first so Math.exp cannot overflow.
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

const probs = softmax([2, 5, 6]);
const labels = ["Ali", "likes", "cats"];
probs.forEach((p, i) => {
  console.log(labels[i].padEnd(6), (p * 100).toFixed(2) + "%");
});
C#:
using System;
using System.Linq;

static double[] Softmax(double[] scores)
{
    // Subtract the largest score first so Math.Exp cannot overflow.
    double max = scores.Max();
    double[] expValues = scores.Select(score => Math.Exp(score - max)).ToArray();
    double sum = expValues.Sum();
    return expValues.Select(value => value / sum).ToArray();
}
Example:
double[] scores = { 2, 5, 6 };
double[] probabilities = Softmax(scores);
Output:
1.32%
26.54%
72.14%
Congratulations.
You just implemented one of the most important mathematical operations used inside GPT.
Now GPT combines information.
Weights:
Ali = 0.013
likes = 0.265
cats = 0.721
New vector:
0.013×Ali
+
0.265×likes
+
0.721×cats
This creates a new representation.
The word:
likes
now contains information from:
Ali
likes
cats
This is called a:
Contextual Embedding
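The blending step is an ordinary weighted sum, computed one component at a time. The sketch below assumes a hypothetical `weightedAverage` helper and uses the softmax of [2, 5, 6] rounded to four decimals as the weights.

```javascript
// Blend vectors using attention weights (weights are assumed to sum to about 1).
function weightedAverage(weights, vectors) {
  const result = new Array(vectors[0].length).fill(0);
  vectors.forEach((vector, i) => {
    vector.forEach((value, d) => {
      result[d] += weights[i] * value;
    });
  });
  return result;
}

const vectors = [[1, 0], [2, 1], [2, 2]]; // Ali, likes, cats
const weights = [0.0132, 0.2654, 0.7214]; // softmax of [2, 5, 6], rounded

const contextual = weightedAverage(weights, vectors);
console.log(contextual.map(v => v.toFixed(2))); // [ '1.99', '1.71' ]
```

The new vector for "likes" sits close to (2, 2), the vector for "cats", because "cats" received most of the weight.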
A simplified GPT layer:
Words
↓
Embeddings
↓
Dot Products
↓
Softmax
↓
Attention Weights
↓
Weighted Average
↓
Contextual Embeddings
This process repeats many times.
Each layer takes the contextual embeddings from the previous layer and refines them further.
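Putting the steps together, one simplified layer can be sketched as a function that turns every word's vector into a contextual one. This is only an illustration of the flow described above: the `attentionLayer` name is made up here, and the sketch has no learned weights, so stacking it only keeps blending the same vectors, whereas each layer of a real model has its own trained parameters.

```javascript
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

function softmax(values) {
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

// One simplified attention layer: every word becomes a weighted
// average of all words, weighted by the softmax of the dot products.
function attentionLayer(embeddings) {
  return embeddings.map(query => {
    const weights = softmax(embeddings.map(key => dot(query, key)));
    return embeddings[0].map((_, d) =>
      embeddings.reduce((sum, vector, i) => sum + weights[i] * vector[d], 0)
    );
  });
}

let embeddings = [[1, 0], [2, 1], [2, 2]]; // Ali, likes, cats
for (let layer = 0; layer < 2; layer++) {
  embeddings = attentionLayer(embeddings); // "This process repeats"
}
console.log(embeddings);
```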
Eventually GPT generates scores for possible next tokens.
Example:
dog = 2.1
cat = 5.8
car = 0.4
Softmax converts them:
dog ≈ 2.4%
cat ≈ 97.2%
car ≈ 0.4%
GPT chooses:
cat
Result:
Ali likes cat
(or more realistically, "cats")
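The final step reuses softmax on the candidate scores and then picks a token. The sketch below always takes the most probable token, which is one possible strategy; real systems often sample from the probabilities instead. The `candidates` and `logits` names are assumptions made for this example.

```javascript
function softmax(values) {
  const max = Math.max(...values);
  const exp = values.map(v => Math.exp(v - max));
  const sum = exp.reduce((a, b) => a + b, 0);
  return exp.map(v => v / sum);
}

// Scores for candidate next tokens, taken from the example above.
const candidates = ["dog", "cat", "car"];
const logits = [2.1, 5.8, 0.4];

const probs = softmax(logits);

// Greedy choice: pick the token with the highest probability.
const best = probs.indexOf(Math.max(...probs));
console.log(candidates[best]); // cat

candidates.forEach((token, i) => {
  console.log(token.padEnd(4), (probs[i] * 100).toFixed(1) + "%");
});
```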
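This article intentionally simplified several concepts.
Real GPT also contains pieces not covered here, such as learned query, key, and value projections, scaling of the dot products, a causal mask so each token only attends to itself and earlier tokens, multiple attention heads, and feed-forward layers.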
However, all of these build on the concepts you've already learned:
Vectors
↓
Dot Products
↓
Weighted Averages
↓
Probabilities
For a game developer, GPT is not magic.
It is mostly:
Vector Mathematics
+
Probability
+
A Lot of Training
The same mathematics used for movement directions and enemy vision checks is also used to power modern AI.
The only difference is what the vectors represent.
Game Development:
Vector = Position or Direction
GPT:
Vector = Meaning
Once you understand vectors, dot products, and softmax, the foundation of GPT becomes surprisingly approachable.