
AI Concepts

The fundamental building blocks of artificial intelligence—explained clearly and understood deeply.
🔤
Tokens
The atoms of AI language
AI models don't read words—they read tokens. A token is a chunk of text that could be a word, part of a word, or even a single character. "Understanding" might be one token, while "artificial" could be split into "art" and "ificial".
Why it matters:
Tokens determine cost (APIs charge per token), context limits (models have maximum token windows), and processing speed.
Example:
"Hello world" = 2 tokens | "Artificial intelligence" = 3-4 tokens depending on the model
Fundamental · Pricing · Context Windows
🔄
Transformers
The architecture that changed everything
Before transformers, AI struggled with long-range dependencies. The transformer architecture introduced self-attention—allowing models to weigh the importance of every word relative to every other word, regardless of distance.
The breakthrough:
The 2017 paper "Attention Is All You Need" proved you could build powerful language models without recurrent networks. GPT, BERT, Claude, Gemini—all built on transformers.
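At its core, self-attention is a few lines of linear algebra. A minimal NumPy sketch of scaled dot-product attention, simplified to a single head with no learned projections (real transformers project the input into separate query, key, and value matrices and run many heads in parallel):

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence.

    X: (seq_len, d) matrix of token vectors. Here queries, keys,
    and values are all X itself, to keep the sketch minimal.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X  # each output is a weighted mix of ALL tokens, near or far

X = np.random.randn(5, 8)        # 5 tokens, 8-dimensional
print(self_attention(X).shape)   # (5, 8)
```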
Architecture · Revolutionary · 2017
🪟
Context Window
The model's working memory
The context window is how much information an AI can "remember" at once. GPT-4 Turbo handles 128K tokens (~96,000 words); Gemini 1.5 Pro handles 2M tokens (~1.5 million words). Everything you send lives here: your prompt, the conversation history, documents.
Real-world impact:
Larger context = analyze entire codebases, process full books, maintain longer conversations without forgetting. But it costs more and runs slower.
Comparison:
• GPT-4 Turbo: 128K tokens ≈ 300 pages
• Claude Sonnet 4.5: 200K tokens ≈ 470 pages
• Gemini 1.5 Pro: 2M tokens ≈ 4,700 pages
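Because requests that exceed the window fail or get truncated, applications typically count tokens before sending. A rough sketch using tiktoken (the limit, helper name, and output reserve here are illustrative, not from any particular SDK):

```python
import tiktoken

CONTEXT_LIMIT = 128_000  # hypothetical target model's window; adjust per model
enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt, history, reserve_for_output=1_000):
    """Rough check: do prompt + history leave room for the model's reply?"""
    used = len(enc.encode(prompt)) + sum(len(enc.encode(m)) for m in history)
    return used + reserve_for_output <= CONTEXT_LIMIT

print(fits_in_context("Summarize this repo.", ["...earlier messages..."]))  # True
```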
Memory · Capacity · Tradeoffs
🎯
RLHF
Teaching AI what humans actually want
Reinforcement Learning from Human Feedback (RLHF) is how we make models helpful instead of just accurate. Humans rank different responses, and the model learns to prefer what people actually want—not just what's technically correct.
Before RLHF:
Model: "Here are 10 ways to hack someone's account."
After RLHF:
Model: "I can't help with that, but I can explain cybersecurity best practices."
Training · Alignment · Safety
🌡️
Temperature
The creativity dial
Temperature controls randomness in AI outputs. Low temperature (0.0-0.3) = focused and deterministic. High temperature (0.7-1.0) = creative and varied. In effect, it's a dial between "confident" (almost always pick the likeliest next token) and "exploratory" (give less likely tokens a real chance).
Temperature 0.0:
"The capital of France is Paris."
Temperature 1.0:
"Paris, that luminous city on the Seine, serves as France's capital—a beacon of culture and history."
Parameter · Creativity · Control
🧬
Embeddings
How AI understands meaning
Embeddings transform words into numbers—specifically, high-dimensional vectors. Words with similar meanings get similar vectors. "King" - "Man" + "Woman" ≈ "Queen" isn't magic—it's vector math in embedding space.
Why it's powerful:
Semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation) all depend on embeddings. They let AI find information based on meaning, not just keywords.
Real-world use:
Search for "fast car" and get results about "speedy vehicle" or "quick automobile" even though the exact words don't match.
Representation · Semantic Search · Vector Math