Moderating Arabic comments: why dialect matters
Arabic is not one language but many dialects. We explain why smart moderation needs to understand each dialect on its own terms instead of translating first.
Most tools bolt on a translation layer and call it done. Here's why moderation has to think in the language the comment was written in.
Open almost any social tool's moderation settings and you'll find the same quiet assumption: that a comment can be understood by first turning it into English. Run it through a translation API, score the English, act on the score. It's tidy, it's cheap, and it falls apart the moment someone writes the way people actually write.
Moderation isn't sentiment analysis. The thing you're trying to catch — a threat, a slur, coordinated spam — lives in the exact words, the register, the dialect. A machine translation smooths all of that into bland, plausible English and the sharp edges that mattered are gone.
Consider Arabic. A phrase that's harmless in Modern Standard Arabic can be a targeted insult in Egyptian or Gulf dialect. Translate first and you get a clean English sentence that scores as "neutral" — while the original was the whole reason you wanted moderation in the first place.
Translate-then-moderate systematically under-flags exactly the content that's hardest to catch: dialect, slang, and code-switching. The cleaner the translation reads, the more confident — and wrong — the score.
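One cheap way to spot comments in that hard category is to look at the script itself before any scoring happens. The sketch below is an illustration only: `describeScripts` is a hypothetical helper, not part of any real moderation API. It assumes two rough signals are useful: Arabic and Latin letters appearing in the same comment, and Arabizi, where Arabic is typed in Latin letters with digits such as 2, 3, 5 and 7 standing in for sounds the Latin alphabet lacks.

```javascript
// Hypothetical helper for illustration; the heuristics are deliberately rough.
const describeScripts = (text) => {
  // Unicode property escapes need the `u` flag.
  const hasArabic = /\p{Script=Arabic}/u.test(text);
  const hasLatin = /\p{Script=Latin}/u.test(text);

  // Arabizi signal: a digit from 2, 3, 5, 7 sitting between letters
  // ("mar7aba"), or starting a word of letters ("7abibi").
  const looksArabizi =
    /[a-z][2357][a-z]|(?<![a-z0-9])[2357][a-z]{2,}/i.test(text);

  return { mixedScript: hasArabic && hasLatin, looksArabizi };
};

const samples = ["great post", "mar7aba ya 7abibi", "nice شكرا", "top 5 tips"];
for (const s of samples) {
  console.log(s.padEnd(20), describeScripts(s));
}
```

Treat this as a flag for closer attention, not a verdict. The digit pattern will also fire on ordinary strings such as "h2o", so it only makes sense as one input among several.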
The alternative is to never leave the source language. Octonity scores each comment natively across 30+ languages, so dialect and intent survive all the way to the decision. That means:
Here's the difference in miniature. A naive filter matches a banned word literally; a language-aware one normalises first. Run it to see why the literal match misses the obfuscated versions:

```javascript
const banned = "spam";

// Normalise the way a language-aware pass would: strip the tricks people use
// to slip past a literal match.
const normalise = (s) =>
  s
    .toLowerCase()
    // Map common digit-for-letter swaps back to letters ("sp4m" -> "spam").
    .replace(/[0-9]/g, (d) => ({ "0": "o", "1": "i", "3": "e", "4": "a" }[d] ?? d))
    // Drop everything else: spaces, punctuation, unmapped digits.
    .replace(/[^a-z]/g, "");

const comments = ["buy SPAM now", "s p a m", "sp4m!!", "great post"];
for (const c of comments) {
  const literal = c.toLowerCase().includes(banned);
  const aware = normalise(c).includes(banned);
  console.log(c.padEnd(14), "literal:", literal, " aware:", aware);
}
```

A creator with an audience across Cairo, Riyadh, and Dubai doesn't get one blunt filter. They get moderation that understands each community's register: hiding the abuse, surfacing the genuine questions, and never silently dropping a real comment because a translation layer flattened it.
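The normalise-first idea from the filter above carries over to Arabic script, where the same word is routinely typed several ways. What follows is a minimal sketch under stated assumptions, not a description of how Octonity works: `normaliseArabic` is a hypothetical helper, and the blocked term is the everyday word for "advertisement", used as a harmless stand-in. The strings are written with `\u` escapes because diacritics and tatweel are easy to miss on screen.

```javascript
// Hypothetical helper: folds common Arabic typing variants so that
// differently written forms of one word compare equal.
const normaliseArabic = (s) =>
  s
    .replace(/[\u064B-\u0652]/g, "")            // strip tashkeel (vowel and gemination marks)
    .replace(/\u0640/g, "")                     // strip tatweel (elongation stroke)
    .replace(/[\u0622\u0623\u0625]/g, "\u0627") // alef with madda or hamza -> bare alef
    .replace(/\u0649/g, "\u064A")               // alef maqsura -> yeh
    .replace(/(.)\1{2,}/gu, "$1");              // collapse 3+ repeats of one character

// Stand-in blocked term: إعلان ("advertisement").
const blocked = "\u0625\u0639\u0644\u0627\u0646";

const samples = {
  plain: "\u0625\u0639\u0644\u0627\u0646",                      // as stored
  tatweel: "\u0625\u0639\u0640\u0640\u0644\u0627\u0646",        // stretched with tatweel
  vowelled: "\u0625\u0650\u0639\u0652\u0644\u064E\u0627\u0646", // with diacritics
  noHamza: "\u0627\u0639\u0644\u0627\u0646",                    // bare alef, no hamza
  elongated: "\u0627\u0639\u0644\u0627\u0627\u0627\u0646",      // repeated alef for emphasis
  unrelated: "\u0634\u0643\u0631\u0627",                        // شكرا ("thanks")
};

const results = Object.entries(samples).map(([label, text]) => ({
  label,
  literal: text.includes(blocked),
  aware: normaliseArabic(text).includes(normaliseArabic(blocked)),
}));
console.table(results);
```

Only the `plain` sample passes the literal check, while the normalised comparison matches every variant and still leaves the unrelated word alone. Folding like this handles spelling, not meaning: it will not tell you whether a phrase is an insult in one dialect and harmless in another.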
That's the bar. Anything that starts by translating away the evidence is solving an easier problem than the one you have.